In a processor system, memory bandwidth is a precious resource, as it may directly translate into system performance and cost. Therefore, memory (e.g., buffer) management capabilities are usually included in modern processor systems. In use, a running application in a processor may send a request to a memory controller or management unit, which may then allocate a memory space of a certain size to the application. When the application no longer needs the memory space, the memory management unit may deallocate or free the memory space for future use. In practice, memory management may be one of the most common functions a networking hardware system and/or software feature performs to provide, e.g., high-performance packet processing. To implement memory management, various algorithms and data structures may be used to reduce the memory bandwidth required to fulfill a certain feature.
In one embodiment, the disclosure includes an apparatus comprising a memory configured to store a free list comprising a plurality of nodes, wherein at least one of the plurality of nodes is configured to store a plurality of node addresses, and wherein each of the plurality of node addresses corresponds to one node in the plurality of nodes.
In another embodiment, the disclosure includes a method of memory management comprising using a free list comprising a plurality of nodes and storing a plurality of node addresses in at least one of the plurality of nodes, wherein each of the plurality of node addresses corresponds to one node in the plurality of nodes.
In yet another embodiment, the disclosure includes an apparatus comprising a memory configured to store a free list, wherein the free list comprises a plurality of nodes including a first pointer node, a second pointer node, and a set of non-pointer nodes, wherein the first pointer node is configured to store a set of node addresses, wherein one of the set of node addresses points to the second pointer node, and wherein the rest of the set of node addresses point to corresponding ones of the set of non-pointer nodes, a processor coupled to the memory and configured to generate an allocation request, and a memory management unit coupled to the memory and configured to store identifying information of the free list, wherein the identifying information comprises a node address of the first pointer node, in response to the allocation request, remove the first pointer node and the corresponding ones of the set of non-pointer nodes from the free list by changing the node address of the first pointer node to a node address of the second pointer node, and store the node address of the first pointer node and the rest of the set of node addresses in a local buffer.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that, although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
There are a variety of data structures used for memory management today, including free lists, buddy blocks, bit vectors, etc. Specifically, a free list is a data structure used for dynamic memory allocation and deallocation. In a free list-based memory management scheme, free or unused memory may be organized in units of blocks, which may also be referred to herein as nodes. Each node represents a small region (e.g., 32 bytes, 1K bytes, etc.) of memory. Further, each node may be divided into a plurality of sections or parts of equal size. Since one or more of the sections may store node address(es) pointing to other node(s), the section size may depend on the length of a node address. For example, if each node address in the system is 4 bytes (4B) long, then the size of each section may be set to 4B to accommodate the address size.
In use, as any node in a memory pool may become a free node, one or more node addresses may be stored in one or more sections of each node. Each node address may point or correspond to another node; thus, nodes of free memory may be interlinked to form the free list.
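As a concrete illustration (the node size, section count, and type names here are assumptions for the sketch, not specifics from the disclosure), a 16-byte node with 4-byte node addresses could be modeled in C as four equal sections, any of which may hold the address of another node:

```c
#include <stdint.h>

/* Hypothetical sizing: a 16-byte node divided into 4-byte sections,
 * each large enough to hold one 4-byte node address. */
#define SECTIONS_PER_NODE 4

typedef struct {
    uint32_t section[SECTIONS_PER_NODE]; /* each section may hold a node address */
} node_t;
```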
In addition, identifying information 130 may be stored in a memory management unit and used by the traditional memory management scheme 100 to identify the free list 110. Such information may include at least two of the following three parameters: the head, the tail, and the length of the free list 110. In the traditional memory management scheme 100, to allocate a node to a processor, the node 112 (i.e., the head) may simply be removed from the free list 110. Removal of the node 112 may be realized by updating the head and length in the identifying information 130. For example, the node address of the node 114 (i.e., the second node) may be read from the node 112. Then, the head information in the identifying information 130 may be changed from the node 112 with index 0 to the node 114 with index 1, and the length information may be reduced by one. On the other hand, when the processor no longer needs a node, the traditional memory management scheme 100 may deallocate the node by adding it behind the tail of the free list 110. Addition of a node may be realized by updating the tail and length in the identifying information 130.
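The head-removal and tail-addition operations described above might be sketched as follows, assuming an index-based pool in which section 0 of each free node stores the address of the next free node; the pool size, sentinel value, and function names are hypothetical:

```c
#include <stdint.h>

#define NODE_COUNT 1024u        /* hypothetical pool size */
#define NULL_ADDR  0xFFFFFFFFu  /* hypothetical "no node" sentinel */

/* next_addr[i] models section 0 of node i: the address of the next free node. */
uint32_t next_addr[NODE_COUNT];

/* Identifying information of the free list, kept in the management unit. */
typedef struct { uint32_t head, tail, length; } free_list_info_t;

/* Traditional allocation: remove the head; one memory read per node. */
uint32_t alloc_one(free_list_info_t *fl) {
    if (fl->length == 0) return NULL_ADDR;
    uint32_t n = fl->head;
    fl->head = next_addr[n];  /* the second node becomes the new head */
    fl->length--;
    return n;
}

/* Traditional deallocation: append the node behind the tail; one write per node. */
void free_one(free_list_info_t *fl, uint32_t n) {
    next_addr[fl->tail] = n;
    fl->tail = n;
    fl->length++;
}
```

Note that each call moves exactly one node, which is the limitation discussed next.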
The traditional memory management scheme 100 may allocate and deallocate memory by removing a node from and adding a node to the free list 110. These operations may be relatively simple compared to those of other data structures. However, in the traditional memory management scheme 100, only one section in each node is utilized to store a node address, and the other sections are left unused. Consequently, each allocation request may only remove one node from the free list 110, and each deallocation request may only add one node to the free list 110. In other words, allocation and deallocation of multiple nodes may require multiple requests from the processor. Thus, with the free list 110 constructed as in the traditional memory management scheme 100, it may be difficult to reduce the memory bandwidth required to allocate and deallocate multiple nodes.
Disclosed herein are systems and methods for improved memory allocation and deallocation. In a disclosed memory management scheme, the data structure of a free list is modified compared to a traditional free list. The disclosed free list comprises one or more pointer nodes and a plurality of non-pointer nodes. Each pointer node is configured to store a plurality of node addresses or pointers. In an embodiment, one of the plurality of node addresses points to a next pointer node in a pointer chain of the free list, while the rest of the plurality of node addresses point to a set of non-pointer nodes. The set of non-pointer nodes may not contain any node addresses; however, each non-pointer node in the set may be located based on its corresponding node address stored in the pointer node. In the present disclosure, a plurality of nodes may be allocated (or deallocated) with a single allocation (or deallocation) request. In an embodiment, with one read operation in a memory, one pointer node as well as the set of non-pointer nodes indicated by that pointer node may be allocated. Similarly, with one write operation in the memory, one pointer node as well as its pointed-to set of non-pointer nodes may be deallocated. Further, identifying information of the free list, such as a head, a tail, and/or a length, may also be used in a memory management module to facilitate memory allocation and deallocation. In comparison with a traditional memory management scheme using a free list, a disclosed memory management scheme may require no additional memory space. At the same time, the disclosed memory management scheme may bring about various benefits, such as reducing a bandwidth requirement, improving system throughput, and lowering memory latency.
Although illustrated as a single processor, the processor 210 is not so limited and may comprise a plurality of processors. For example, the processor 210 may be implemented as one or more central processing unit (CPU) chips, cores (e.g., a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and/or digital signal processors (DSPs), and/or may be part of one or more ASICs. In practice, if the processor 210 comprises a plurality of cores, a request may be generated and sent by any of the plurality of cores. In addition, the processor 210 may be a network-based processor, such as one in a router, switch, data-center equipment, or gateway general packet radio service (GPRS) support node (GGSN) device.
The memory management unit 220 may process allocation and deallocation requests received from the processor 210. In an embodiment, the memory management unit 220 comprises a logic unit 222 and a local buffer 224. The logic unit 222 may receive a request from the processor 210 and make a logic decision to allocate or deallocate a memory space. The local buffer 224 may be used for temporary storage of data that is frequently accessed by the logic unit 222. For instance, in memory allocation, the local buffer 224 may store node addresses read from a node which has been removed from a free list. In memory deallocation, the local buffer 224 may store node addresses pointing to nodes which are to be added to the free list. The local buffer 224 may reside on a same chip (i.e., on-chip) with the logic unit 222. In addition, the logic unit 222 may reside on a same or different chip with the processor 210 and/or the memory 230.
At least one free list 232 may be stored in the memory 230. The free list 232 may be managed by the memory management unit 220, thus identifying information of the free list 232 may be stored in the memory management unit 220, e.g., in the local buffer 224. The free list 232 will be further described in paragraphs below.
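The state such a unit keeps might be sketched as below; the buffer capacity and the field names are assumptions made for illustration:

```c
#include <stdint.h>

#define LOCAL_BUF_ENTRIES 8u  /* hypothetical on-chip buffer capacity */

/* Identifying information of a free list (e.g., the free list 232). */
typedef struct { uint32_t head, tail, length; } free_list_info_t;

/* State held by the memory management unit: identifying information for one
 * free list, plus a small buffer of node addresses that were either read from
 * a removed pointer node (allocation) or are waiting to be written into the
 * tail of the free list (deallocation). */
typedef struct {
    free_list_info_t fl;
    uint32_t local_buffer[LOCAL_BUF_ENTRIES];
    uint32_t buffered;  /* number of valid addresses in local_buffer */
} mgmt_unit_state_t;
```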
In use, the memory 230 may be any form or type of memory. In an embodiment, the memory 230 is a buffer or a cache. In this case, the memory management unit 220 may be referred to as a buffer management unit or a cache management unit. In terms of location, the memory 230 may be an on-chip memory (i.e., on a same physical chip with the processor 210), such as a cache, special function register (SFR) memory, internal random access memory (RAM), or an off-chip memory, such as an external SFR memory, external RAM, hard drive, universal serial bus (USB) flash drive, etc. Further, if desired, a single memory chip may be divided into a plurality of parts or regions, and each region may be used as a separate smaller memory. Alternatively, if desired, a plurality of memory chips may be used in combination as a single larger memory. Thus, a disclosed memory management scheme may be performed within a single memory chip or across multiple memory chips.
An interconnect between the processor 210 and the memory management unit 220 may be any communication channel or switching fabric/switch which facilitates data communication. In practice, the interconnect may take a variety of forms, such as one or more buses, crossbars, unidirectional rings, bidirectional rings, etc. Likewise, an interconnect between the memory management unit 220 and the memory 230 may also be any communication channel. In the event that the processor 210 and the memory management unit 220, or the memory management unit 220 and the memory 230, are at different locations, the interconnect may take the form of a network channel, which may be any combination of routers and other processing equipment necessary to transmit signals between the processor 210 and the memory management unit 220, or between the memory management unit 220 and the memory 230. The interconnect may, for example, be the public Internet or a local Ethernet network.
As an example, the free list 310 comprises a total of (n+1) nodes, where n is a positive integer. For the purpose of illustration, each node is labeled by an index ranging from 0 to n. In the free list 310, the pointer node 312 is the first node with index 0 (also referred to as the head), and the pointer node 318 is the last node with index n (also referred to as the tail). Each pointer node in the free list 310 comprises four sections, and each section is configured to store one node address. For instance, the pointer node 312, whose address is simply labeled by its index 0, may store four node addresses labeled as 1, 2, 3, and 4. A first section of the node 312 may store the node address 4, which points to the pointer node 314, while the other sections of the node 312 may store the node addresses 1, 2, and 3, which point to the non-pointer nodes 320, 322, and 324, respectively.
Similarly, other pointer nodes of the free list 310 may also be configured to store four node addresses, e.g., the pointer node 316 with an index of 4 may store four node addresses pointing to one pointer node (with index 8) and three non-pointer nodes (with indexes 5, 6, and 7). Nevertheless, the last pointer node 318 with index n may be configured differently, since it may not point to any additional node. The last pointer node 318 may be configured to contain null addresses or no address. It should be noted that, depending on an addressing scheme used by the processor system, the node addresses stored in a pointer node, such as the pointer node 312, may be either physical or virtual addresses.
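Following the layout just described, with pointer nodes at every fourth index and section 0 chaining to the next pointer node, the free list 310 could be initialized as in this sketch; the total node count and the null-address value are assumptions:

```c
#include <stdint.h>

#define LAST_PTR  1024u        /* hypothetical index n of the last pointer node */
#define NULL_ADDR 0xFFFFFFFFu  /* hypothetical null node address */

/* node[i][s] is section s of node i; only pointer nodes use their sections. */
uint32_t node[LAST_PTR + 1][4];

/* Build the chain: pointer node p stores the address of the next pointer node
 * (p + 4) in section 0 and the addresses of its three non-pointer nodes
 * (p + 1, p + 2, p + 3) in sections 1-3. */
void init_free_list(void) {
    for (uint32_t p = 0; p < LAST_PTR; p += 4) {
        node[p][0] = p + 4;
        node[p][1] = p + 1;
        node[p][2] = p + 2;
        node[p][3] = p + 3;
    }
    for (int s = 0; s < 4; s++)
        node[LAST_PTR][s] = NULL_ADDR;  /* the last pointer node points nowhere */
}
```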
Although each pointer node in the free list 310 is illustrated with four sections, a pointer node may comprise any other number of sections, depending on the node size and the node address length.
The free list 310 may be stored in a memory (e.g., the memory 230 described above).
In an embodiment, when a processor (e.g., the processor 210) generates an allocation request for a memory space, the memory management unit may allocate the memory space to the processor by removing a plurality of nodes from the free list 310 at one time. For example, with a single read operation performed in the memory to read the contents of the pointer node 312, the pointer node 312 (i.e., the head) and the non-pointer nodes 320, 322, and 324, which are pointed to by the pointer node 312, may be allocated together to the processor. Removal of the plurality of nodes from the free list 310 may be realized by updating the identifying information 330. In an embodiment, the memory management unit may send a read request to the memory to access the node 312 for all stored addresses. The address pointing to the node 314 with index 4 may be updated in the identifying information 330 as a new head of the free list 310. Further, if length information is used in the identifying information 330, the length of the free list 310 may be reduced by four (provided that the free list 310 has at least eight nodes).
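Under the same hypothetical layout, the single-read allocation might look as follows: one read of the head pointer node yields four node addresses, the head advances to the next pointer node, and the four addresses are parked in the local buffer. The function and variable names are illustrative:

```c
#include <stdint.h>

#define SECTIONS 4u

typedef struct { uint32_t head, tail, length; } free_list_info_t;

extern uint32_t node[][SECTIONS];  /* the memory holding the free list */
uint32_t local_buffer[SECTIONS];   /* on-chip buffer in the management unit */

/* Allocate four nodes with a single read of the head pointer node. */
int alloc_four(free_list_info_t *fl) {
    if (fl->length < 2 * SECTIONS) return -1;  /* need at least eight nodes */
    uint32_t p = fl->head;
    local_buffer[0] = p;           /* the pointer node itself is allocated...  */
    local_buffer[1] = node[p][1];  /* ...together with the three non-pointer   */
    local_buffer[2] = node[p][2];  /* nodes it points to                       */
    local_buffer[3] = node[p][3];
    fl->head = node[p][0];         /* new head: the next pointer node          */
    fl->length -= SECTIONS;        /* four nodes have left the free list       */
    return 0;
}
```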
After being allocated to the processor, the nodes 312, 320, 322, and 324 may no longer be part of the free list 310. Instead, these nodes may be utilized by the processor to store various types of data. In some cases, whether the nodes are free or used by the processor, a first of the four sections in each node may be reserved for storage of a node address pointing to another node, while the other three sections may be used for storage of any type of data. In other cases, after the nodes are allocated, all four sections of each node may be available for storage of any type of data. It should be noted that, when non-pointer nodes (e.g., the node 320) are part of the free list 310, they may be empty or contain any type of data, since the content stored in the non-pointer nodes may not affect functioning of the free list 310.
In an embodiment, when the processor no longer needs a memory space, it may send a deallocation request to the memory management unit to deallocate (or release, or recycle) the memory space for future use. The memory management unit may then add a plurality of nodes to the free list 310 at one time. For example, with one write operation performed in the memory, a plurality of grouped or packed nodes may be added to the free list 310. To add the plurality of nodes, the memory management unit may send a write request to the memory to access the free list 310. A plurality of node addresses pointing to the nodes may be written into sections of the node 318 (i.e., the tail). Additionally, the identifying information 330 may be updated in the memory management unit. In an embodiment, the tail of the free list 310 may be changed from the pointer node 318 to a new pointer node (e.g., the pointer node 314 previously removed from the free list 310), and the length of the free list 310 may increase by four.
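The matching single-write deallocation might be sketched as below: four node addresses (e.g., those held in the local buffer) are written into the sections of the old tail, whose section 0 then points to the pointer node that becomes the new tail. The names are again illustrative:

```c
#include <stdint.h>

#define SECTIONS 4u

typedef struct { uint32_t head, tail, length; } free_list_info_t;

extern uint32_t node[][SECTIONS];  /* the memory holding the free list */

/* Deallocate four nodes with a single write to the old tail: addrs[0] becomes
 * the new tail pointer node, and addrs[1..3] are recycled as the non-pointer
 * nodes it points to. */
void free_four(free_list_info_t *fl, const uint32_t addrs[SECTIONS]) {
    uint32_t t = fl->tail;
    node[t][0] = addrs[0];  /* the old tail now chains to the new pointer node */
    node[t][1] = addrs[1];
    node[t][2] = addrs[2];
    node[t][3] = addrs[3];
    fl->tail = addrs[0];
    fl->length += SECTIONS;
}
```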
As shown in the figure, the node addresses read from the pointer node 312 (e.g., the addresses of the nodes 320, 322, and 324, as well as the address of the node 312 itself) may be stored temporarily in a local buffer 340 of the memory management unit, which may facilitate the allocation of nodes to the processor.
The local buffer 340 may also facilitate the deallocation of nodes. For example, a number of node addresses, which point to a number of nodes released by the processor, may be temporarily stored in the local buffer 340 before being written into the tail of the free list 310. The number of nodes may be nodes that have been previously removed from the free list 310 (e.g., the pointer node 312 and the non-pointer nodes 320, 322, and 324). Alternatively, the number of nodes may be nodes that have not been included in the free list 310 before.
In use, the head of the free list 310 may change after each memory allocation, and the tail of the free list 310 may change after each memory deallocation. Therefore, it is possible that any block or node in the memory may at some point end up being the head or the tail of the free list 310. Also, any node in the memory may at some point be a pointer node or a non-pointer node. In the present disclosure, manipulation of node addresses as described above may allow a node to be any node in the free list 310. Furthermore, although only one free list 310 is illustrated in the memory management scheme 300, more than one free list may be used in a disclosed memory management scheme. Multiple free lists may comprise a same or different number of pointer nodes and/or non-pointer nodes. Also, multiple free lists may comprise a same or different size of nodes. In different free lists, nodes may contain a same or different number of sections, and each section may have a same or different size.
Compared with the traditional memory management scheme 100, which may only allocate or deallocate one node with one request, the disclosed memory management scheme 300 may allocate or deallocate a plurality of nodes with one request. As a result, memory allocation and deallocation may be executed faster in the memory management scheme 300. Effective memory bandwidth is increased, or in other words, the memory bandwidth required to fulfill a certain hardware/software feature is lowered, which may lead to cost reduction. This improvement may lead to a performance boost in, e.g., dynamic RAM (DRAM), where the memory nodes may be large in size (e.g., 32B or higher). In a DRAM, if free nodes are 32B and a pointer is 4B, then eight node addresses pointing to eight free nodes may be contained within one node. The eight free nodes may be allocated or deallocated with a single request, thereby reducing the memory bandwidth requirement by 8-fold. Consequently, memory management performance may be boosted, leading to higher throughput and lower latency. Also, power consumption of the system may be reduced as a result of the reduced number of read and write operations in the memory. Furthermore, these benefits come at no cost of additional memory space.
Next, in step 420, the method 400 may receive a request, which may be generated by, e.g., a running application in the processor 210 described above. If the request is an allocation request, the method 400 may remove a plurality of nodes (e.g., one pointer node and its pointed-to non-pointer nodes) from the free list by reading the head pointer node, and the node addresses read from the head pointer node may be stored in a local buffer.
In step 460, the method 400 may determine whether the request is a deallocation request. If the condition in block 460 is met, the method 400 may proceed to step 470. Otherwise, the method 400 may end. In response to receiving the deallocation request, in step 470, a plurality of additional nodes may be added to the free list. In an embodiment, adding the plurality of additional nodes may be realized by writing node addresses of (or pointing to) the plurality of additional nodes into the last pointer node of the free list. If the node addresses of the plurality of additional nodes have been stored in the local buffer, next, in step 480, the node addresses may be removed or evicted from the local buffer. Eviction of the local buffer may leave room for temporary storage of other allocated nodes. It should be noted that, if the node addresses of the plurality of additional nodes have not been stored in the local buffer, they may be directly written into the last pointer node of the free list. In other words, step 480 may sometimes be skipped, if so desired. After deallocation of the additional nodes, the method 400 may end.
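Tying the steps together, the request-handling flow of the method 400 might be dispatched as in this sketch, which reuses the hypothetical alloc_four and free_four routines from the sketches above; the buffer eviction after a deallocation corresponds to step 480:

```c
#include <stdint.h>

typedef struct { uint32_t head, tail, length; } free_list_info_t;
typedef enum { REQ_ALLOC, REQ_DEALLOC } req_type_t;

/* Hypothetical routines and buffer from the earlier sketches. */
extern int  alloc_four(free_list_info_t *fl);
extern void free_four(free_list_info_t *fl, const uint32_t addrs[4]);
extern uint32_t local_buffer[4];

static uint32_t buffered = 0;  /* number of valid addresses in local_buffer */

void handle_request(free_list_info_t *fl, req_type_t req) {
    if (req == REQ_ALLOC) {            /* remove four nodes, buffer their addresses */
        if (alloc_four(fl) == 0)
            buffered = 4;
    } else {                           /* step 470: add four nodes to the free list */
        free_four(fl, local_buffer);
        buffered = 0;                  /* step 480: evict the local buffer */
    }
}
```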
The schemes described above may be implemented on any general-purpose network component, such as a computer or network component with sufficient processing power, memory resources, and network throughput capability to handle the necessary workload placed upon it.
The secondary storage 604 typically comprises one or more disk drives or tape drives and is used for non-volatile storage of data and as an over-flow data storage device if the RAM 608 is not large enough to hold all working data. The secondary storage 604 may be used to store programs that are loaded into the RAM 608 when such programs are selected for execution. The ROM 606 is used to store instructions and perhaps data that are read during program execution. The ROM 606 is a non-volatile memory device that typically has a small memory capacity relative to the larger memory capacity of the secondary storage 604. The RAM 608 is used to store volatile data and perhaps to store instructions. Access to both the ROM 606 and the RAM 608 is typically faster than to the secondary storage 604.
At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, Rl, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=Rl+k*(Ru−Rl), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 7 percent, . . . , 70 percent, 71 percent, 72 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. The use of the term about means ±10% of the subsequent number, unless otherwise stated. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of. Accordingly, the scope of protection is not limited by the description set out above but is defined by the claims that follow, that scope including all equivalents of the subject matter of the claims. Each and every claim is incorporated as further disclosure into the specification and the claims are embodiment(s) of the present disclosure. The discussion of a reference in the disclosure is not an admission that it is prior art, especially any reference that has a publication date after the priority date of this application. The disclosure of all patents, patent applications, and publications cited in the disclosure are hereby incorporated by reference, to the extent that they provide exemplary, procedural, or other details supplementary to the disclosure.
While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, units, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.
The present application claims priority to U.S. Provisional Patent Application No. 61/531,352 filed Sep. 6, 2011 by Sailesh Kumar et al. and entitled “High Performance Free Buffer Allocation and Deallocation”, which is incorporated herein by reference as if reproduced in its entirety.