In network communications systems, data is typically transmitted in packages called “packets” or “frames,” which may be routed over a variety of intermediate network nodes before reaching their destination. These intermediate nodes (e.g., controllers, base stations, routers, switches, and the like) are often complex computer systems in their own right, and may include a variety of specialized hardware and software components.
Often, multiple network elements will make use of a single resource. For example, multiple servers may attempt to send data over a single channel. In such situations, resource allocation, coordination, and management are important to ensure the smooth, efficient, and reliable operation of the system, and to protect against sabotage by malicious users.
Packets move through a network processor along a pipeline from one network processing unit to another. Each instance of the pipeline passes packets from one stage to the next, across network processing unit boundaries. Thus at any one time, a network processor may have dozens of packets in various stages of processing.
In a network processor application, the pipeline spans several network processing units. For example, a receive network processing unit reads a data stream from multiple ports, assembles packets, and stores the packets in memory by placing the packets onto a ring that may serve as a holding point for the remainder of the packet processing performed by a second group of network processing units. Traditionally, these rings use a pre-allocated memory region. In addition, conventional rings use a control structure that includes a head pointer that points to the current GET location, a tail pointer that points to the current PUT location, a count that defines the number of entries in the ring, and a size that defines the maximum size of the ring.
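Merely as an illustrative sketch, the conventional fixed-size ring control structure described above may be modeled as follows in C. The field names and widths are assumptions for illustration and are not taken from any particular device:

```c
#include <stdint.h>

/* Illustrative model of a conventional (fixed-size) ring control
 * structure: head (GET), tail (PUT), entry count, and maximum size. */
struct fixed_ring_ctrl {
    uint32_t head;  /* current GET location */
    uint32_t tail;  /* current PUT location */
    uint32_t count; /* number of entries currently in the ring */
    uint32_t size;  /* maximum size of the ring, in entries */
};

/* In a fixed ring, the head and tail pointers wrap at the ring size. */
static uint32_t ring_advance(uint32_t ptr, uint32_t size)
{
    return (ptr + 1) % size;
}
```

Because the size is fixed at allocation time, the wrap point never changes; the flexible ring structure described below removes this restriction.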
Reference will be made to the following drawings, in which:
Systems and methods are disclosed for dynamically changing ring size in network processing. It should be appreciated that these systems and methods can be implemented in numerous ways, several examples of which are described below. The following description is presented to enable any person skilled in the art to make and use the inventive body of work. The general principles defined herein may be applied to other embodiments and applications. Descriptions of specific embodiments and applications are thus provided only as examples, and various modifications will be readily apparent to those skilled in the art. Accordingly, the following description is to be accorded the widest scope, encompassing numerous alternatives, modifications, and equivalents. For purposes of clarity, technical material that is known in the art has not been described in detail so as not to unnecessarily obscure the inventive body of work.
Such a flexible ring structure is in contrast to conventional ring structures, which are fixed in size. With conventional ring structures, if multiple rings are defined in a given memory channel, the rings do not share unused capacity with one another. Such an approach causes underutilization of memory resources.
The exemplary ring structure 20 shown in
The ring control structure 34 also includes a tail or insert pointer that contains a tail address T. A write entry residue of 4B may be provided to cache an odd 4 write bytes when dealing with burst-of-4 memory, in which writes are performed in 8-byte increments. Thus, the odd 4 bytes are maintained in the write entry residue and are written to memory as a full 8 bytes when the next 4 bytes of a PUT request arrive. Although not shown, an optional read entry residue may be provided, as reads are similarly performed in 8-byte increments when dealing with burst-of-4 memory. The read entry residue caches an odd 4 read bytes, which are returned to the requester when a GET request for the next 4 bytes arrives.
The ring control structure 34 further includes a count of the number of 4-byte entries in the ring. The count may be a 3B parameter. ME#/TH#/signal# defines the external agent ID that is to be notified when a critical condition occurs. For example, if the threshold is reached or exceeded, or if the number of available memory blocks falls below a predefined threshold, the external agent can be notified to control whether to stop sending entries and/or to add memory blocks. It is noted that the configuration as illustrated and described herein is merely illustrative, and various other configurations may be similarly implemented.
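One possible packing of the flexible ring control structure fields described above is sketched below in C. The 4B write residue and 3B (24-bit) count follow the text; the exact bit widths of the threshold encoding and agent ID, and the overall layout, are assumptions for illustration only:

```c
#include <stdint.h>

/* Illustrative sketch of a flexible ring control structure.
 * Bit-field widths for threshold_enc and agent_id are assumed. */
struct flex_ring_ctrl {
    uint32_t head;              /* GET address H */
    uint32_t tail;              /* PUT address T */
    uint32_t write_residue;     /* caches odd 4 bytes until a full 8B write */
    uint32_t count : 24;        /* number of 4-byte entries (3B parameter) */
    uint32_t residue_valid : 1; /* write residue holds pending 4 bytes */
    uint32_t threshold_enc : 3; /* encoded notification threshold */
    uint32_t agent_id : 4;      /* ME#/TH#/signal# of external agent */
};
```

With this packing, each control structure occupies 16B, consistent with the local storage sizing discussed below.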
Depending on the number of flexible rings, local storage to hold the ring control structures 34 may be added. Merely as an example, for a 64 ring design, a total of 64×16B or 1 kB of internal memory to hold the corresponding ring control structures 34 can be added. The ring control structure 34 may be treated like control and status registers (CSRs). An external host may initialize, e.g., upon boot-up, the ring control registers with their predetermined base values.
The external host may also initialize a free block pool in external memory, such as in a dynamic random access memory (DRAM) channel or in a static random access memory (SRAM) channel. The external host may also assign the external service thread 30 (as shown in
Upon determining that its local free block pool is empty at boot-up, the local free block pool manager 28 generates and transmits a free block fill-up request to the external service thread 30 through its next neighbor FIFO in order to fill up the local free block pool. In response, the external service thread 30 returns a free block pool, e.g., a free block pool of up to 32B. The return by the external service thread 30 may be performed using a write to a dummy address in the SRAM channel, which an SRAM controller may then direct to the free block pool manager 28. Optionally, local blocks that become freed up and are no longer needed by the free block pool manager 28 may be transmitted to the external service thread 30 to be placed into the external free block pool.
During operation, when local blocks are freed up and no longer needed by the free block pool manager, the free block pool manager sends the freed-up blocks to the external service thread at block 48. The external service thread puts the freed-up blocks into the external free block pool at block 50.
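The refill interaction between the local free block pool manager and the external service thread may be sketched as follows, merely as an illustrative model. The next-neighbor FIFO and dummy-address write mechanics are abstracted into plain function calls; the pool capacity, block addresses, and all names are assumptions for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define LOCAL_POOL_CAP 8   /* assumed local pool capacity, in blocks */

struct local_pool {
    uint32_t blocks[LOCAL_POOL_CAP]; /* addresses of free blocks */
    size_t n;                        /* number of blocks on hand */
};

/* Stand-in for the external service thread returning a batch of
 * free blocks; the addresses here are fabricated for illustration. */
static size_t external_fill(uint32_t *out, size_t max)
{
    for (size_t i = 0; i < max; i++)
        out[i] = 0x1000 + (uint32_t)i * 0x100;
    return max;
}

/* On an empty pool, request a fill-up from the external service
 * thread before handing out a block. */
static uint32_t pool_get_block(struct local_pool *p)
{
    if (p->n == 0)
        p->n = external_fill(p->blocks, LOCAL_POOL_CAP);
    return p->blocks[--p->n];
}
```

In a real design, the fill-up request would traverse the next-neighbor FIFO and the returned blocks would arrive via the SRAM channel write described above; this sketch only models the bookkeeping.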
When the ring manager receives a PUT request for a corresponding ring at block 62, the ring manager either stores the 4B PUT request in a local write entry residue or writes the PUT request together with a previous PUT request (8B total) to the location defined by the tail pointer. In particular, when the ring manager receives a 4B PUT request, the ring manager may store the 4B PUT request in its local write entry residue when writes are to be performed in 8B increments. When the ring manager for this ring receives the next 4B PUT request, the ring manager writes a total of 8B. The write is performed at the location defined by the tail pointer. The ring manager increments Count and increments the tail pointer by 2 locations at block 66. In particular, Count is incremented on every long word PUT request.
If the incremented tail pointer is the last location of the currently attached block, as determined at decision block 68, the ring manager sends a request for a new block to the free block pool manager and receives the new block from it at block 70. The ring manager then stores the address of the new block received from the free block pool manager in the last location of the currently attached block at block 72. The ring manager then sets the tail pointer to the first entry of the new block, and the new block thereby also becomes attached (linked) at block 74.
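The PUT flow of blocks 62 through 74 may be sketched in C as follows, merely as an illustrative model. A toy word-addressed memory is assumed, with a block size of 7 long words (6 data words plus 1 link word) chosen so that the 8B write granularity lands exactly on the link slot; the block size, allocator, and all names are assumptions for illustration, not the claimed implementation:

```c
#include <stdint.h>

#define BLOCK_WORDS 7           /* 6 data words + 1 link word (assumed) */
static uint32_t mem[1024];      /* modeled external memory, word-addressed */

struct ring {
    uint32_t tail;      /* PUT location (word address) */
    uint32_t count;     /* long-word entries in the ring */
    uint32_t residue;   /* cached odd 4 bytes (write entry residue) */
    int residue_valid;
};

/* Toy free-block allocator standing in for the free block pool manager. */
static uint32_t block_base;
static uint32_t alloc_block(void)
{
    uint32_t b = block_base;
    block_base += BLOCK_WORDS;
    return b;
}

static void ring_put(struct ring *r, uint32_t word)
{
    if (!r->residue_valid) {        /* first 4B: cache in write residue */
        r->residue = word;
        r->residue_valid = 1;
        return;
    }
    mem[r->tail] = r->residue;      /* write 8B total at the tail pointer */
    mem[r->tail + 1] = word;
    r->residue_valid = 0;
    r->count += 2;                  /* Count tracks 4B (long-word) entries */
    r->tail += 2;
    /* Tail reached the block's last (link) location: attach a new block. */
    if (r->tail % BLOCK_WORDS == BLOCK_WORDS - 1) {
        uint32_t nb = alloc_block();
        mem[r->tail] = nb;          /* store link in the last location */
        r->tail = nb;               /* tail -> first entry of the new block */
    }
}
```

The threshold check of block 96 and the notification to the external agent are omitted here for brevity.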
When the ring manager receives a GET request for a corresponding ring at block 82, the ring manager uses the head pointer to issue a read of 8B from the external memory and returns the requested data to the requester at block 84. Upon obtaining the data from the external memory, the ring manager may return the requested 4B word to the requester and discard the remaining 4B. As noted, a read residue may be maintained such that, rather than discarding the remaining 4B word, the remaining 4B word is maintained in the read residue with the read residue valid bit set. Upon receiving the next GET request, the ring manager retrieves the requested data from the read residue.
At block 86, the ring manager decrements Count and increments the head pointer by 1 location. In particular, Count is decremented on every long word GET request.
If the incremented head pointer is the last location of the currently attached block, as determined at decision block 88, the ring manager reads the address stored in the last location of the currently attached block, i.e., the link address or pointer to the next attached block, at block 90. The ring manager sets the head pointer to the first entry of the next linked block at block 92 and returns the previously attached block to the free block pool manager at block 94.
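As a companion to the PUT flow, the GET flow of blocks 82 through 94 may be sketched as follows, again merely as an illustrative model: an 8B read is issued at the head pointer, the odd 4B is kept in a read residue, and the link word is followed when the head reaches the end of a block. The same toy word-addressed memory and 7-long-word block size are assumed, and all names are illustrative:

```c
#include <stdint.h>

#define BLOCK_WORDS 7           /* 6 data words + 1 link word (assumed) */
static uint32_t mem[1024];      /* modeled external memory, word-addressed */

struct ring_get {
    uint32_t head;          /* GET location (word address) */
    uint32_t count;         /* long-word entries remaining */
    uint32_t read_residue;  /* cached odd 4 bytes of the last 8B read */
    int residue_valid;
    uint32_t freed;         /* last block returned to the pool manager */
};

static uint32_t ring_get(struct ring_get *r)
{
    uint32_t word;
    if (r->residue_valid) {             /* second 4B of a prior 8B read */
        word = r->read_residue;
        r->residue_valid = 0;
    } else {                            /* issue an 8B read at the head */
        word = mem[r->head];
        r->read_residue = mem[r->head + 1];
        r->residue_valid = 1;
    }
    r->count--;                         /* Count decremented per LW GET */
    r->head++;                          /* head advances 1 location */
    /* Head reached the block's last (link) location: follow the link. */
    if (r->head % BLOCK_WORDS == BLOCK_WORDS - 1) {
        uint32_t old = r->head - (BLOCK_WORDS - 1); /* base of current block */
        r->head = mem[r->head];         /* head -> first entry of next block */
        r->freed = old;                 /* return old block to the free pool */
    }
    return word;
}
```

Here the return of the freed block is recorded in a field rather than handed to a pool manager, since only the pointer bookkeeping is being illustrated.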
After performing the PUT or GET process, if the Count is equal to the threshold as defined in the ring control structure by the 3-bit encoded threshold parameter for the ring as described above, the external agent defined by ME#/Thread#/Signal# is notified at block 96. In particular, the ring manager also notifies the external agent defined by ME#/Thread#/Signal# when, for example, the free block pool reaches a critically low threshold.
A ring size encoding of, e.g., 4 bits defines, in encoded form, a maximum ring size ranging from 128 LW (512B) to 16M LW (64 MB). It is noted that process 60 is implemented only when linked mode is selected in the ring control structure. If flat mode is selected instead, the block size for the corresponding ring is equal to the maximum size defined for the ring. In addition, the head and tail pointers wrap in alignment with the maximum size defined for the ring. An external service thread is not employed in flat mode.
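In flat mode, the pointer behavior reduces to the conventional modular wrap at the configured maximum size, which may be sketched as follows. Sizes are expressed in long words, and the function name is illustrative:

```c
#include <stdint.h>

/* In flat mode, the head and tail pointers simply wrap at the
 * maximum ring size; no link words or block attachment are needed. */
static uint32_t wrap_advance(uint32_t ptr, uint32_t step, uint32_t max_lw)
{
    return (ptr + step) % max_lw;
}
```

For example, with a 128 LW ring, a tail at location 126 advancing by 2 locations wraps back to location 0.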
The dynamic changing of the ring sizes allows the allocation of a pool of free memory to be shared amongst a set of rings depending on the current memory needs of each ring. Such dynamic changing of the ring sizes, rather than the allocation of dedicated memory for each ring, improves memory capacity utilization and thus reduces the overall memory capacity requirements for the rings, especially when the rings are used in a mutually exclusive way. For example, if an Ethernet packet is dropped, its parameters can go to the Ethernet rings, and if a POS packet is dropped, its parameters can go into the POS ring. Since a packet is inserted into only one ring, the total memory utilization for each packet is fixed, and this property can be exploited in implementing the dynamic changing of the ring sizes.
As noted, the systems and methods described herein can be implemented in a network processor for a variety of network processing devices such as routers, switches, and the like.
The network processor 100 shown in
Network processor 100 may also feature a variety of interfaces that carry packets between network processor 100 and other network components. For example, network processor 100 may include a switch fabric interface 102 (e.g., a Common Switch Interface (CSIX)) for transmitting packets to other processor(s) or circuitry connected to the fabric; an interface 105 (e.g., a System Packet Interface Level 4 (SPI-4) interface) that enables network processor 100 to communicate with physical layer and/or link layer devices; an interface 108 (e.g., a Peripheral Component Interconnect (PCI) bus interface) for communicating, for example, with a host; and/or the like. Network processor 100 may also include other components shared by the microengines, such as memory controllers 106, 112, a hash engine 101, and a scratch pad memory 103. One or more internal buses 114 are also provided to facilitate communication between the various components of the system.
It should be appreciated that
While several embodiments are described and illustrated herein, it will be appreciated that they are merely illustrative. Other embodiments are within the scope of the following claims.