eheap.c is the complete source code for the heap described in this invention.
eheap.h provides definitions and examples of configuration constant definitions that are necessary to use eheap in an embedded or similar environment.
edemo.c is demonstration code for eheap.
The following discussion centers on embedded systems because their requirements are well understood. However, the methods presented herein should not be considered to be limited to embedded systems.
In a previous patent application (Moore U.S. Ser. No. 15/441,065 titled “Optimizable Heap for Embedded and Similar Systems with Enhanced Debugging and Self-Healing”, Aug. 23, 2018) an invention comprising several methods to improve heaps for embedded systems was presented. None of the methods presented in this patent application were included in that prior application. Nonetheless, information contained therein may be helpful to understand the requirements for embedded system heaps.
Heaps are well-known structures used in computer systems to provide dynamic memory allocation and deallocation. Due to the simple nature of most embedded systems, heaps have not been used extensively in them. However, embedded systems are becoming more complex and their need for dynamic memory allocation is increasing.
Embedded systems are characterized by the following requirements:
Although targeted at the above requirements, this invention is not limited to embedded systems. Other systems may share some of the above characteristics or require the enhanced performance and security offered by this invention and thus benefit from it. Therefore, the term “embedded system” should not be interpreted to exclude other systems.
Aligned Allocations: dlmalloc, created by Doug Lea, performs aligned allocations (gee.cs.oswego.edu/pub/misc/malloc-2.8.2.c, 5/27/09). Alignment is on power-of-two boundaries. For an aligned allocation, it seldom happens that the data block is already aligned. Hence, there usually is spare space between the chunk control block (CCB) and the aligned data block. dlmalloc requires that this space be at least as large as the minimum chunk size it allows, and the space is made into a new free chunk. This approach has two disadvantages:
These create size and performance problems, which may be prohibitive for embedded systems.
In another approach by Alexander Tomlinson (US Pub. 20130346719 titled “Systems and Methods for Efficient Memory Access”, Dec. 26, 2013), a pointer to the chunk control block is stored just below the aligned data block, and the space between it and the chunk control block is simply wasted. This has two disadvantages:
The first is likely to be too inefficient for limited-memory systems. The second introduces debugging complexity.
Numerous power-of-two data block size allocators exist (e.g. the Binary Buddy Algorithm: Kenneth Knowlton, “A Fast Storage Allocator”, Comm. of the ACM 8(10):623-625), for which data blocks are naturally aligned on powers of two. However, these allocators do not permit allocating data blocks of specified sizes with specified alignments.
MPU Region Allocations: The Cortex v7M processor architecture from ARM Ltd. is by far the dominant architecture for Micro Controller Units (MCUs) used in embedded systems. This architecture is the basis for hundreds of different types of MCUs, manufactured by dozens of semiconductor vendors, and billions of these MCUs have been shipped to date. The Cortex v7M architecture includes a Memory Protection Unit (MPU), which presents memory regions to application code. Such code cannot access memory outside of the MPU regions. In addition, each MPU region permits only certain types of accesses, such as execute never, read only, read/write, etc. Any violation of a memory address or an access attribute triggers a Memory Management Fault (MMF), which stops the application code and allows the operating system to take over.
Thus, an MPU effectively thwarts malware from gaining system control, improving the security of embedded systems that control automobiles, machinery, etc. This has become increasingly important due to the connection of more and more embedded systems to the Internet, known as the Internet of Things (IoT). Unfortunately, these connections give hackers access to highly vulnerable embedded systems.
Unfortunately, each Cortex v7M MPU region must be a power of two in size and must be aligned upon its size boundary. This makes the MPU difficult to use in embedded systems, which typically have very small memories compared to general purpose systems such as laptops, servers, and smart phones. As a consequence, the Cortex v7M MPU has been little-used in embedded systems despite growing security threats to embedded systems.
Heaps are typically used to allocate task stacks, buffers, messages, etc. It is highly desirable to put these into individual MPU regions so that malware can be easily caught and hackers stopped in their tracks. However, no known heap permits MPU region allocations.
Large Numbers of Small Data Blocks: Due to the increasing complexity of embedded systems, more and more application code is being written in object-oriented languages such as C++ and Java. These languages are easier to use for application code than the traditional C and assembly languages used heretofore.
An inherent characteristic of such languages is that applications often allocate hundreds, or even thousands, of small objects from the heap. These objects may be as small as 8 bytes. As a consequence, general-purpose heaps such as dlmalloc place major emphasis upon low overhead per inuse data block. These heaps typically have only 8 bytes of overhead on an 8-byte data block or 4 bytes of overhead on a 12-byte data block, giving overheads of 100% or 50%, respectively. However, even this much overhead may be too much for an embedded or similar system having very restricted amounts of memory if hundreds or thousands of small objects are needed.
In addition, the resulting dlmalloc heap structure is fragile since it is not possible to scan the heap in both directions in order to fix broken links. This is an acceptable weakness for general-purpose systems in which applications (e.g. word processors) run for only a few hours or a day, at a time. It is unacceptable for embedded systems, which are expected to run forever in harsh environments and with no operator presence. In such systems many single-bit memory errors are likely to occur during their operational lifetimes. Such errors can cause serious damage if not caught and fixed quickly.
Many heaps are based upon block pools. Examples are McMahon U.S. Pat. No. 5,784,699 titled “Dynamic Memory Allocation in a Computer Using a Bit Map Index”, Jul. 21, 1998 and Czajkowski U.S. Pat. No. 6,453,403 titled “System and Method for Memory Management Using Contiguous Fixed-Size Blocks”, Sep. 17, 2002. See Wilson, “Dynamic Storage Allocation: A Survey and Critical Review”, for other examples. The problem with these methods is that, while block pools are efficient for very small data blocks, for larger data blocks they require either a very large number of block pools or data blocks that are generally too large. Neither of these is good for limited-memory systems.
eheap Demonstration Code
eheap source code has been included to meet the full disclosure requirement. eheap.c is the complete code for eheap, and edemo.c demonstrates features relevant to this invention. Compile both, link, and run using IAR EWARM C/C++ for any supported evaluation board. They can also be compiled and run on a PC using Microsoft Visual C/C++; however, some editing may be required due to compiler differences.
The inventive subject matter consists of three parts:
The following information describes a simple embodiment of the invention sufficient to explain how it works. Other possible embodiments may be mentioned where useful to illustrate the scope of the invention. Drawings are not to scale for the sake of clarity. In the descriptions that follow, the term “heap memory” means the memory from which data is allocated. Heap memory is composed of chunks; each chunk contains a Chunk Control Block (CCB) followed by a data block. The data block is what is allocated to a program. The CCB contains information necessary to manage the chunk and the heap. A chunk with an allocated data block is called an “inuse” chunk. A chunk with a free data block is called a “free” chunk.
The heap structure described herein is a basic heap for which various search mechanisms may be used such as linear search, bin search, tree search, etc. The search mechanism is not important for this invention. The chunks in the heap must be linked together in order. Usually forward and backward pointers are used, but chunk and prechunk sizes or other methods may equally be used. The exact mechanism for doing this is also not important for this invention.
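For illustration only, the following sketch shows one possible chunk layout consistent with the description above, using forward and backward pointers to link chunks in address order. The structure and field names are hypothetical and are not taken from eheap.c; they merely indicate the kind of information a CCB carries.

```c
/* Illustrative sketch only: a possible Chunk Control Block (CCB) layout.
   Field names are hypothetical and not taken from eheap.c. */
typedef struct ccb {
    struct ccb *fwd;      /* next chunk in address order */
    struct ccb *bwd;      /* previous chunk in address order */
    unsigned    size;     /* total chunk size in bytes (CCB + data block) */
    unsigned    flags;    /* e.g. inuse flag, spare space flag */
} CCB;

/* The data block allocated to the program immediately follows the CCB. */
#define DATA_BLOCK(p)  ((void *)((char *)(p) + sizeof(CCB)))
```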
It often happens that a chunk is larger than necessary and residual space is left at the top of the chunk, above the data block. Typically, this space is merged with a free postchunk, split into a new free chunk, or left in the allocated chunk. The mechanisms for this are not important for this invention. Hence, for simplicity, residual space is not shown in most figures, but this should not be interpreted to mean that residual space is excluded from this invention.
Notes:
Heap Structure
Aligned Allocations
The lower chunk consisting of CCB 403 and inuse data block 404 is unchanged and its spare space flag 402 remains 0.
MPU Region Allocations
A region search starts with a requested size, s. The region size, 2^r, is determined such that 2^(r−1) is less than s and s is less than or equal to 2^r. Then the subregion size is 2^(r−n), assuming 2^n subregions per region. For simplicity, we assume 8 subregions per region, so that n=3. Some MPUs have more or fewer subregions. This invention works for any number of subregions.
Then the search size, ss, is determined such that, for N a positive integer, if s is greater than (N−1)*2^(r−3) and s is less than or equal to N*2^(r−3), then ss = N*2^(r−3). For example, if s=600 bytes, then 2^r=1024 bytes, a subregion is 128 bytes, and ss = 5*128 = 640 bytes. Then an aligned search is made using ss for the size and an=7 for the alignment number. When such a chunk is found, the final requirement is that all subregions of the data block must be in the same region.
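As a hedged illustration of the above calculation (not code from eheap.c; function and variable names are illustrative), the following sketch computes the search size, ss, and alignment number, an, from a requested size s, assuming 8 subregions per region:

```c
/* Hedged sketch of the region search-size calculation described above,
   assuming 8 subregions per region (n = 3). Names are illustrative. */
#include <stdio.h>

static unsigned region_search_size(unsigned s, unsigned *an)
{
    unsigned r = 0;
    while ((1u << r) < s)                 /* smallest r such that 2^r >= s */
        r++;
    if (r < 3)                            /* guard: region must hold 8 subregions */
        r = 3;
    unsigned sub = 1u << (r - 3);         /* subregion size = 2^(r-3) */
    unsigned ss = ((s + sub - 1) / sub) * sub;  /* round s up to N subregions */
    *an = r - 3;                          /* align on a subregion boundary only */
    return ss;
}

int main(void)
{
    unsigned an;
    unsigned ss = region_search_size(600, &an);
    printf("ss = %u bytes, an = %u\n", ss, an);   /* ss = 640 bytes, an = 7 */
    return 0;
}
```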
Region free data block 607 is aligned on subregion 1 boundary 614 and ends one byte below subregion 6 boundary 615. It contains 5 subregions 1 through 5. It should be appreciated that this is just an example. A region free data block must be an integral number of subregions large enough to hold the requested block size. In this case the requested size, s, is greater than 4 subregions and less than 5 subregions in size, so it has been rounded up to 5 subregions. In actual use, the final block size may be anything from 5 to 8 subregions and the data block can be aligned on any subregion boundary as long as the whole region free data block 607 fits within the same region. For example, instead of being aligned on subregion 1, data block 607 could be aligned on subregion 3. However, it could not be aligned on subregion 4, because the last subregion would be outside of region 612.
A unique aspect of this invention is that the region data block 607 need not be aligned on a region boundary, but rather can be aligned on a much smaller subregion boundary such as 614. This reduces the time to find a suitable free chunk and may reduce wasted space.
The main differences between aligned allocations and MPU region allocations are as follows:
So, the MPU region chunk search is similar to an aligned search, except that the data block found must lie entirely within an MPU region.
Integrated Block Pools
An external Pool Control Block (PCB12) 720 controls access to pool 706. It contains a last block pointer (PX) 721, a next block pointer (PN) 722, a first block pointer (PI) 723, the number of blocks in the pool (NUM) 724, the number of blocks inuse (INUSE) 725, the high-water mark (HWM) 726, and other useful fields.
An external start of heap pointer (SHP) 730 points to the first chunk of the heap.
Initially, PN field 722 of PCB12 720 points to the first block in pool 706; the first word of this block points to the next block, etc., until all blocks in pool 706 are linked together into a singly-linked free list, as shown in the dashed pool 706. The last block contains 0 in its first word. This is the common way that block pool free lists are structured. It is not part of this invention and is shown only for completeness.
A data block may be allocated from block pool 704 if the requested size is 8 bytes or less. If the alignment number, an, is 3 or less, the block pointed to by PN 712 is taken and its first word is loaded into PN 712. PN 712 will be 0 if the pool free list is now empty. If an is greater than 3, the free list pointed to by PN 712 is searched for up to N blocks to find a block with alignment 2^an, where N is a compile-time configuration constant chosen by the programmer. If an aligned block is not found or if block pool 704 is empty, the data block is taken from heap space 708.
A data block may be allocated from block pool 706 if the requested size is 12 bytes or less and greater than 8 bytes. If the alignment number, an, is 2 or less, the block pointed to by PN 722 is taken and its first word is loaded into PN 722. PN 722 will be 0 if the pool free list is now empty. If an is greater than 2, the free list pointed to by PN 722 is searched for up to N blocks to find a block with alignment 2^an, where N is a compile-time configuration constant chosen by the programmer. If an aligned block is not found or if block pool 706 is empty, the data block is taken from heap space 708.
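A hedged sketch of the pool allocation logic just described follows; the PCB layout, field names, and helper names are illustrative and may differ from eheap.c. For BP08, min_an would be 3; for BP12, min_an would be 2.

```c
/* Hedged sketch of allocation from an integrated block pool, following the
   description above. The PCB layout and names are illustrative. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void **pn;    /* PN: next free block (head of singly-linked free list) */
    /* other PCB fields (PX, PI, NUM, INUSE, HWM) omitted for brevity */
} PCB;

/* Returns a block of alignment 2^an from the pool, scanning at most max_scan
   blocks of the free list, or NULL if none is found (caller then falls back
   to heap space). */
static void *pool_alloc(PCB *pcb, unsigned an, unsigned min_an, unsigned max_scan)
{
    void **blk = pcb->pn;
    if (blk == NULL)
        return NULL;                        /* pool empty: use heap space */

    if (an <= min_an) {                     /* every pool block is aligned enough */
        pcb->pn = (void **)*blk;            /* first word of block becomes new PN */
        return blk;
    }

    /* Otherwise scan up to max_scan blocks for one aligned on 2^an. */
    void **prev = NULL;
    uintptr_t mask = ((uintptr_t)1 << an) - 1;
    for (unsigned i = 0; blk != NULL && i < max_scan; i++) {
        if (((uintptr_t)blk & mask) == 0) {
            if (prev == NULL)
                pcb->pn = (void **)*blk;    /* unlink list head */
            else
                *prev = *blk;               /* unlink from middle of list */
            return blk;
        }
        prev = blk;
        blk = (void **)*blk;
    }
    return NULL;                            /* not found: use heap space */
}
```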
In PCB08 710, NUM field 714 is set to the number of blocks in pool 704, INUSE field 715 keeps track of how many blocks are currently in use, and HWM field 716 records the largest value of INUSE 715 since operation began. If, after a long run, HWM field 716 equals NUM field 714, then block pool 704 has been exhausted and additional blocks have been allocated from heap space 708. The programmer can increase the size of block pool 704 until this no longer happens. He then has a good idea of the peak demand for blocks from block pool 704 and can set the block pool 704 size smaller to achieve a good balance between performance and memory usage. The same applies to block pool 706.
When a data block is freed, its block pointer, bp, is tested. If bp is greater than SHP 730, the data block is freed to heap space 708. Otherwise, if bp is greater than or equal to PI 723 it is freed to block pool BP12 706 by adding it to the free block list pointed to by PN 722. Otherwise, it is freed to block pool BP08 704 by adding it to the free block list pointed to by PN 712. This method is fast enough for a few block pools. If there are more block pools another method, such as binary search, can be employed to find the correct pool.
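The following is a minimal sketch of the free-side test described above. Names are illustrative and not taken from eheap.c; heap_free() stands in for the normal eheap free path, and freed blocks are pushed at the head of the list pointed to by PN (eheap may instead maintain the list differently, e.g. appending at PX).

```c
/* Hedged sketch of the free-side pool test described above. */
extern void heap_free(void *bp);            /* assumed heap free routine */

static void block_free(void *bp,
                       void *shp,           /* SHP 730: start of heap pointer */
                       void *pi12,          /* PI 723: first block of pool 706 */
                       void **pn08,         /* address of PN 712 (PCB08) */
                       void **pn12)         /* address of PN 722 (PCB12) */
{
    if ((char *)bp > (char *)shp) {
        heap_free(bp);                       /* block belongs to heap space 708 */
    } else if ((char *)bp >= (char *)pi12) {
        *(void **)bp = *pn12;                /* link onto BP12 free list */
        *pn12 = bp;
    } else {
        *(void **)bp = *pn08;                /* link onto BP08 free list */
        *pn08 = bp;
    }
}
```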
Free Operation with Spare Space
Aligned Search
When a candidate chunk 902+903 has been found, the next step is to find the first 2^an boundary 907 above the start of free space 906, where an is the requested alignment number. The distance between the start of free space 906 and boundary 907 is d 910. d is in the range of 0 to 2^an−1. In this case s+d exceeds free space 903, so chunk 902+903 is rejected and a search is made for another candidate free chunk.
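A hedged sketch of this candidate test is shown below (names are illustrative, not from eheap.c): p is the start of free space in the candidate chunk, space is the free space in bytes, s the requested size, and an the requested alignment number.

```c
/* Hedged sketch of the candidate test used by the aligned search. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool aligned_fit(uintptr_t p, size_t space, size_t s, unsigned an)
{
    uintptr_t mask = ((uintptr_t)1 << an) - 1;
    uintptr_t d = (0 - p) & mask;   /* distance up to next 2^an boundary: 0..2^an-1 */
    return s + d <= space;          /* chunk is rejected if s + d exceeds free space */
}
```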
It should be appreciated that the foregoing is but one embodiment of this invention, which has been chosen here to present a clear description of all of the features of the invention. It should be recognized by one skilled in the art that the introduction of spare space has the following advantages and ramifications:
The foregoing advantages make aligned allocations more practical in embedded systems with limited memory. Embedded systems frequently have heavy I/O loads and I/O controllers often require aligned buffers, packets, arrays, and structures. Being able to allocate these from a heap simplifies programming and increases software flexibility. In addition, doing cache-line aligned allocations from a heap in DRAM can result in significantly improved performance. Finally, aligned allocations are necessary to create dynamic regions for the Cortex v8M MPU.
It should be recognized by one skilled in the art that the introduction of MPU region allocations allows dynamic Cortex v7M MPU region creation, which is useful for:
Thus, the ramifications of this unique heap service are to improve the security of embedded and similar systems.
It should be recognized by one skilled in the art that the introduction of integrated block pools combines the fast access and low overhead of block pools for very small blocks with the flexibility of heaps for larger blocks. The advantages of this method are:
As a consequence, block pools need not be sized for peak demand, which is met by heap backup when pools become empty or cannot satisfy alignment requirements. Thus, a tradeoff can be made between the size of each pool and overall system performance. In essence, block pools have been made to look like part of the heap to application programmers. This simplifies their job and reduces programming errors, while achieving excellent performance and memory efficiency for the small data block allocations typical of object-oriented languages.
Number | Name | Date | Kind |
---|---|---|---|
5561786 | Morse | Oct 1996 | A |
5680582 | Slayden | Oct 1997 | A |
5784699 | McMahon | Jul 1998 | A |
6070202 | Minkoff | May 2000 | A |
6412053 | Bonola | Jun 2002 | B2 |
6757802 | Trainin | Jun 2004 | B2 |
7962707 | Kaakani | Jun 2011 | B2 |
8015385 | Schopp | Sep 2011 | B2 |
8838928 | Robin | Sep 2014 | B2 |
20040139272 | Rodriguez-Rivera | Jul 2004 | A1 |
20100205374 | Meka | Aug 2010 | A1 |
20140282589 | Kuang | Sep 2014 | A1 |
20190073145 | Angelino | Mar 2019 | A1 |
20200097646 | Buhren | Mar 2020 | A1 |
Entry |
---|
Lea, Doug, “A Memory Allocator”, last modified Apr. 4, 2000, retrieved from <gee.cs.oswego.edu/dl/html/malloc.html> (Year: 2000). |
Lea, Doug, “malloc-2.8.2.c”, Version 2.8.2, Jun. 12, 2005, retrieved from <gee.cs.oswego.edu/pub/misc/malloc-2.8.2.c> (Year: 2005). |
Yoonseo Choi and Hwansoo Han. 2006. Protected heap sharing for memory-constrained java environments. In Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems (CASES '06). Association for Computing Machinery, New York, NY, USA, 212-222. (Year: 2006). |
Ross McIlroy, Peter Dickman, and Joe Sventek. 2008. Efficient dynamic heap allocation of scratch-pad memory. In Proceedings of the 7th international symposium on Memory management (ISMM '08). Association for Computing Machinery, New York, NY, USA, 31-40. DOI:https://doi.org/10.1145/1375634.1375640 (Year: 2008). |
U. Seshua, N. Bussa and B. Vermeulen, “A Run-Time Memory Protection Methodology,” 2007 Asia and South Pacific Design Automation Conference, Yokohama, 2007, pp. 498-503, doi: 10.1109/ASPDAC.2007.358035. (Year: 2007). |
Number | Date | Country | |
---|---|---|---|
20200249852 A1 | Aug 2020 | US |