This invention relates in general to memory storage devices and techniques supportive of electronic message communication and, more particularly, to enabling access of message packet data to a processor.
Network traffic processing systems are used to route electronic message traffic in communications networks. Communications networks that require electronic message routing include the Internet, intranets, extranets, computer networks, and telephony networks. The efficiency with which network traffic processing systems process messages and message components, e.g. packets, often has a significant effect on the overall efficiency of a communications network.
In addition, the increasing danger of intrusions into communications networks, and into protected sections of those networks, by software viruses, including software worms, is escalating both the potential traffic load that a network may bear and the complexity of the behavior of the network traffic processing systems employed to protect against virus intrusions.
There is, therefore, a long felt need to provide systems and methods that increase the efficiency with which network traffic processing systems process message traffic.
It is an object of the present invention to provide a system and method to process an electronic message, and a component of an electronic message. This object and other objects of the method of the present invention will be made apparent in light of the present disclosure. According to principles of the method of the present invention in a preferred embodiment, a system and method to dynamically allocate memory in a random access memory for reading to, and writing from, a network traffic processing system (“network processor system”) are provided. In certain preferred embodiments of the present invention, the network processor system includes a network controller processor, and optionally a system memory, communicatively coupled with a random access memory, e.g. a packet memory. The network processor system forms a software model of the memory locations of the packet memory, where the software model assigns a plurality of packet addresses to a plurality of separate blocks of memory addresses of the packet memory. The network processor system receives a first packet and then stores the first packet in a block of memory addresses of the packet memory that is associated with a first packet address of the software model.
Certain alternate preferred embodiments of the method of the present invention include the step of forming a packet group buffer in the software model, where the packet group buffer stores the packet addresses of individual packets stored in the packet memory. The packet memory may be or include a suitable random access memory, dynamic random access memory, or other memory device known in the art. Certain still alternate preferred embodiments of the method of the present invention may further or alternatively include (1.) determining the length of each packet associated with each packet address stored in the packet group buffer, and (2.) storing the length in a packet group buffer in a memory, where each length is stored in association with the corresponding packet address. The software model may present a plurality of packet group buffers, and a packet group buffer queue, where the packet group buffer queue contains designations of a plurality of packet group buffers. Each designated packet group buffer is selected for locating at least one packet from the packet addresses listed in the corresponding packet group buffer, for use in reading the packet from the packet memory. The network processor system may further include an on-chip memory, and the packet group buffer may be stored in the on-chip memory of the network processor system.
Certain yet alternate preferred embodiments of the present invention provide a network traffic processing system having a memory manager device, such as a DRAM controller module (“DCM”), where the memory manager device is communicatively coupled with a network controller processor of the system, and a random access memory. The memory manager device stores a software model of the random access memory and a device driver. The software model allocates memory blocks of the random access memory as uniquely addressed packet addresses. The device driver determines the unused memory blocks as designated by the packet addresses and then informs the network control processor of the packet addresses of the unused memory blocks. The system may also include a packet group buffer as defined in the software model, where the packet group buffer stores the packet addresses of individual packets stored in the random access memory. The packet group buffer may alternatively or additionally be stored in the random access memory of the network traffic processing system. The packet group buffer may further include a stored length of each packet associated with each packet address as stored in the packet group buffer, where the length of each packet is stored in the packet group buffer in association with the corresponding packet address.
Certain additional alternate preferred embodiments of the present invention provide a method to manage (1.) packet memory storage and (2.) access in and from a packet memory. Accordingly, a network processor system having a CPU and a system memory is provided, where the CPU requests a packet memory block designation from a FreeList stored in the system memory. The FreeList internally stores a plurality of packet memory block designations of packet memory blocks of the packet memory, where each listed packet memory block is available, i.e. “free”, to accept storage of a memory packet. The CPU receives a selected packet memory block designation from the FreeList and then writes the packet from the network processor to the packet memory block of the packet memory corresponding to the selected packet memory block designation.
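By way of illustration only, the following C sketch shows one possible software rendering of the FreeList-based storage flow just described: the FreeList hands out a free packet memory block designation, and the packet is then written into the corresponding block. The names freelist_init, freelist_pop, store_packet, NUM_PKT_BLOCKS and the array layout are hypothetical and are not taken from the specification; they merely model the behavior under stated assumptions.

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define PKT_BLOCK_SIZE 128u      /* one packet memory block, in bytes          */
#define NUM_PKT_BLOCKS 65536u    /* blocks tracked by the FreeList (assumed)   */

static uint8_t  pkt_mem[NUM_PKT_BLOCKS][PKT_BLOCK_SIZE]; /* the packet memory  */
static uint16_t free_list[NUM_PKT_BLOCKS];               /* free designations  */
static uint32_t free_top;                                /* entries on the list */

/* Populate the FreeList so that every block designation starts out "free".    */
static void freelist_init(void)
{
    for (uint32_t i = 0; i < NUM_PKT_BLOCKS; i++)
        free_list[i] = (uint16_t)i;
    free_top = NUM_PKT_BLOCKS;
}

/* CPU-side request: obtain an unused packet memory block designation.         */
static bool freelist_pop(uint16_t *block)
{
    if (free_top == 0)
        return false;            /* FreeList empty: no free block available    */
    *block = free_list[--free_top];
    return true;
}

/* Write a received packet into the block named by the selected designation.   */
static bool store_packet(const uint8_t *pkt, uint32_t len)
{
    uint16_t block;
    if (len > PKT_BLOCK_SIZE || !freelist_pop(&block))
        return false;
    memcpy(pkt_mem[block], pkt, len); /* packet now resides at that block      */
    return true;
}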
An updated copy, or data mirror, of the PGB FreeList 21 may be stored in the DRAM and/or the system memory in additional FreeList buffer(s), whereby the memory manager device, the packet memory and the system memory maintain substantially identical FreeLists substantially contemporaneously. Alternatively or additionally, an updated copy, or second data mirror, of the packet buffer group data structure may be recorded in a second packet buffer group memory of the memory manager device, the packet memory and/or the system memory, whereby the packet buffer group data structure and the second packet buffer group memory are maintained substantially identical and contemporaneous.
Certain still other alternate preferred embodiments of the method provide a packet group buffer queue data structure in the memory manager device, system memory and/or the packet memory. The packet group buffer queue data structure contains addresses of packet memory block designations, and stores at least one packet memory block designation of a packet scheduled for, or intended for, egress from the packet memory.
Other objects, advantages, and capabilities of the present invention will become more apparent as the description proceeds.
In summary, what has been described above are the preferred embodiments for a system and method for processing data packets in a network traffic message processing system. While the present invention has been described by reference to specific embodiments, it will be obvious that other alternative embodiments and methods of implementation or modification may be employed without departing from the true spirit and scope of the invention.
These, and further features of the invention, may be better understood with reference to the accompanying specification and drawings depicting the preferred embodiment, in which:
The following description is provided to enable any person skilled in the art to make and use the invention and sets forth the best modes contemplated by the inventor of carrying out his or her invention. Various modifications, however, will remain readily apparent to those skilled in the art, since the generic principles of the present invention have been defined herein.
Referring now generally to the drawings, and particularly to
Referring now generally to the Figures, and particularly
While packets are stored using 128 byte buffers of the DRAM 6, the DRAM 6 also stores address and packet length information for packets in special buffers 22 that serve as the units for egress queue enqueue and dequeue operations. These buffers are referred to as PGBs 22 (Packet Group Buffers). As shown in
The packet length field 22B in the PGB 22 is used by the egress packet schedulers in an IOM0 16A (as per
The PGB FreeList 23 is kept in the on-chip system memory 18, or CPM memory 18. PGBs 22 are consumed from the PGB FreeList 23 by the CPU 12 when a group of contiguous packets received from the fabric are enqueued for egress scheduling. PGBs are freed by the DCM 14 upon their request by the egress packet schedulers. DCM 14 contains a PGFL (PG FreeList) register that provides the memory address of the PG FreeList so that DCM 14 can update the free/use status of a PGB when the PGB is released upon egress packet scheduling.
Referring now generally to the Figures, and particularly
Each PGBQ 26 is implemented as a queue using read and write pointers into the DRAM memory 6. The read pointer points to the first non-empty location containing a PGB address 22A. This pointer is used to fetch the next PGB 22 upon a request by the egress packet schedulers in the IOM 16. The write pointer points to the first empty location where the address of a new PGB may be stored. All of the 96 PGBQs 26 are resident in contiguous memory in the DRAM 6 starting after the PGB area 24. The maximum number of entries in each PGBQ 26 is configured by a software model 28 and is indicated by a corresponding PGQN (PGB Queue NumEntries) register. The DCM 14 contains registers PGQN0-PGQN95 to indicate the maximum number of entries in the 96 PGBQs 26 (only 8 registers are used in 10 Gigabit Ethernet mode). The maximum number of entries in each PGBQ 26 is constrained to be a multiple of 16. Since each PGB pointer entry in the PGBQ 26 is 4 bytes, this implies that the memory required for a PGBQ 26 is always a multiple of 64 bytes. The PGQN registers are 16 bit registers, thus allowing each PGBQ 26 to contain a maximum of 64K PGBs.
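The following C sketch models a single PGBQ as a ring of PGB addresses with read and write pointers, as described above. The type and function names (pgbq_t, pgbq_enqueue, pgbq_dequeue) and the fixed queue depth are illustrative assumptions, not part of the specification.

#include <stdint.h>
#include <stdbool.h>

#define PGBQ_NUM_ENTRIES 1024u   /* a multiple of 16, per the PGQN constraint  */

typedef struct {
    uint32_t entries[PGBQ_NUM_ENTRIES]; /* 4-byte PGB pointer entries          */
    uint32_t rd;                        /* read pointer: next PGB to fetch     */
    uint32_t wr;                        /* write pointer: first empty location */
} pgbq_t;

/* Enqueue the address of a newly filled PGB. */
static bool pgbq_enqueue(pgbq_t *q, uint32_t pgb_addr)
{
    uint32_t next = (q->wr + 1) % PGBQ_NUM_ENTRIES;
    if (next == q->rd)
        return false;                   /* queue full                          */
    q->entries[q->wr] = pgb_addr;
    q->wr = next;
    return true;
}

/* Dequeue the next PGB address for the egress packet scheduler. */
static bool pgbq_dequeue(pgbq_t *q, uint32_t *pgb_addr)
{
    if (q->rd == q->wr)
        return false;                   /* queue empty                         */
    *pgb_addr = q->entries[q->rd];
    q->rd = (q->rd + 1) % PGBQ_NUM_ENTRIES;
    return true;
}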
The software model 28 may reside, be stored, or be distributively stored in the system memory 18, the DCM 14, the CPU 12, and/or elsewhere on the network processor 4. The software model 28 creates a data structure that associates selected memory blocks of the DRAM 6 with individual packet addresses. A PGQS (PGB Queue Size) register 29 is associated with each PGBQ 26 and indicates the current size of the corresponding PGBQ 26. The DCM 14 contains 96 registers PGQS0-PGQS95 which indicate the current size of the PGBQs 26 in terms of the number of packet buffers currently allocated to each of the PGBQs 26. The PGQS registers 29 are used by traffic management software of the first version 2 to enforce congestion based packet drops. These PGQS registers 29 are incremented when packets are enqueued by the CPU 12 upon their arrival from the fabric (not shown). The optional plurality of CPUs 12 may use parallel add operations to increment the PGQS registers. The PGQS registers 29 are decremented upon the release of a packet buffer by the DCM 14. The PGQS registers 29 are 32 bits wide; however, the upper 8 bits are reserved and always read zero.
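A minimal sketch of a congestion-based drop decision built on the PGQS counters described above is shown below. The per-queue drop_threshold array, the function names, and the exact drop policy are assumptions; the specification only states that traffic management software uses the PGQS registers to enforce congestion-based packet drops.

#include <stdint.h>
#include <stdbool.h>

#define NUM_PGBQ 96u

static uint32_t pgqs[NUM_PGBQ];           /* current size; lower 24 bits used  */
static uint32_t drop_threshold[NUM_PGBQ]; /* hypothetical per-queue limit      */

/* Called on packet arrival from the fabric: admit (and count) or drop. */
static bool pgbq_admit(unsigned q)
{
    if ((pgqs[q] & 0x00FFFFFFu) >= drop_threshold[q])
        return false;        /* congestion: traffic management drops packet    */
    pgqs[q]++;               /* CPU increments PGQS on enqueue                 */
    return true;
}

/* Called when the DCM releases a packet buffer after egress scheduling. */
static void pgbq_release(unsigned q)
{
    if ((pgqs[q] & 0x00FFFFFFu) > 0)
        pgqs[q]--;           /* DCM decrements PGQS on buffer release          */
}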
The DCM 14 also contains registers PGQB0-PGQB95 that indicate the beginning address of the PGBQs 26 in DRAM memory 6 and registers PGQE0-PGQE95 that indicate the ending address of the PGBQs 26 in DRAM 6. These registers are 32 bits wide; however, the upper 8 bits are reserved and always read as zero.
The enqueuing and dequeuing of PGBs 22 in each PGBQ 26 requires read and write pointers. Thus, the DCM 14 contains registers PGQR0-PGQR95 that contain the read pointers for all the PGBQs 26. The DCM 14 also contains registers PGQW0-PGQW95 that contain the write pointers for all the PGBQs 26. The read and write pointers are 32 bit registers; however, the upper 8 bits are reserved and always read zero. In addition to these registers, there are 32 bit registers PGQF0, PGQF1, and PGQF2 of the DCM 14 that indicate the empty/full status of the 96 PGBQs 26.
Referring now generally to the Figures, and particularly
The ReadRegion 34 and WriteRegion 36 of each PGBQ 26 are cached on-chip in the system memory 18. This area of CPM memory 18 is referred to as the PGBQCache 24. Since the ReadRegion 34 and WriteRegion 36 of each PGBQ are 64 bytes each, the size of the PGBQCache is 96*64*2=12 KB. The DCM 14 contains a 32-bit register, called the PGCA (PGBQCache Address) register, that contains the base address of the PGBQCache 24.
Referring now generally to the Figures, and particularly
The steps for PGB Dequeue are shown in
Referring now generally to the Figures, and particularly
Packet buffers are consumed from the FreeList 23 by the CPU 12 after processing of the packets that arrive from the fabric. The DCM 14 contains a PBFL (packet buffer FreeList) register that provides the memory address of the packet buffer FreeList 23. The CPU 12 reserves packet buffers by invoking a FreeList Malloc operation on the FreeList 23. The DCM 14 can update the free/use status of a packet buffer by invoking the FreeList free operation when the packet buffer is released upon egress packet scheduling.
An updated copy 28, or data mirror 28, of the PGB FreeList 23 may be stored in the DCM 14, the DRAM 6 and/or the system memory 18 in additional FreeList buffer(s) 38, whereby the DCM 14, the packet memory 6 and the system memory 18 maintain substantially identical FreeLists 23 substantially contemporaneously. Alternatively or additionally, an updated copy 38, or second data mirror 38, of the packet buffer group data structure may be recorded in a second packet buffer group memory 40 of the DCM 14, the packet memory 6 and/or the system memory 18, whereby the packet buffer group data structure and the second packet buffer group memory are maintained substantially identical and contemporaneous.
Referring now generally to the Figures and Particularly to
The SDU 44 interfaces between the CPMs 18, the MCUs 48 and the IOM0 16A for egress packet processing, off chip packet storage, and off chip table/queue storage. The CPMs 18 store the packets in a Packet Buffer area 24 using Packet DMA services. The CPM 18 finds in-sequence packets and queues (enQueues) the packets by means of PGBQs 26 for scheduling to the IOM0 16A. The DCM 14 maintains the pointers in the form of PGBs. The CPM 18 can also retrieve the packets back to the CPM memory 18 by using packet DMA commands. The IOM0 16A recovers (deQueues) the PGB 22 at the appropriate time and parses the PGB 22 to extract the packet pointer and size. The packets are then read out by the IOM0 16A in chunks of packet buffers and sent over the egress port.
The SDU 44 provides one or more of the following services:
The SDU 44 interfaces to the other modules and circuits in a semiconductor chip 46 via the ring interconnect 48. Transactions are received from a ring interconnect block (“RIB”) 50. RX queues from one or all of the on-chip modules 12, 14, 16, 18 & 20 may be polled in round robin sequence. The requests are stored in a request pool for each serving block and forwarded on availability. The RIB 50 interprets the OCC transaction and, depending on the transaction, sends it over to the Memory Operation block (MOPB), the PGBQ block (PGBQB) 52, the Packet DMA block 54, or the Flow Manager Block 56. If the target block is currently busy, the decoding stalls until the block is available. The responses are dispatched to the individual blocks directly.
Referring now generally to the Figures and particularly to
Referring now generally to the Figures and particularly to
Referring now generally to the Figures and particularly to
One or more of the following smart memory operations are supported by the second version 42:
A memory controller interface block 64 interfaces the two MCU channels to one or more other request decoder blocks. The memory controller interface block splits the transactions to the memory controller unit so that the load on each channel is balanced. In addition, the memory controller interface block optionally keeps track of outstanding requests for each channel. The memory controller interface block may also optionally maintain read response buffers for out of order responses and reassembly of the split transactions.
The blocks that require communication with another on chip block may send requests to one or more transaction encoder blocks. These requests are then encoded in OCC packet format and sent to the relevant module over the TX links.
Referring now generally to the Figures and particularly to
A transaction decoder block 66 (“TDB”) is a component of the RIRB 64. The TDB 66 decodes the requests from the Ring Bus and sends the appropriate commands to the other blocks for further processing.
The interface to the TDB block 66 is shown in Table 1. The data is transferred in 64-bit chunks. The first assertion of the dcm_rirb_rx_data_v signal signals the start of an OCC transaction transfer. The end of a transaction is signaled by the assertion of the dcm_rirb_rx_data_last signal along with the dcm_rirb_rx_data_v signal. The dcm_rirb_rx_data is valid for each cycle in which dcm_rirb_rx_data_v is asserted. The transaction decoder can stall the data transfer at any time by asserting dcm_tdb_rirb_stall, in which case the dcm_rirb block will continue to hold the same data during the next cycle.
The TDB starts decoding the transaction when the TDB receives the first data word. An internal counter of the TDB is reset at the start of the decode and is incremented for each word the TDB receives. The decoding process is primarily controlled by the first and second words of the transaction, which contain the destination address, the opcode and the counter value. The opcodes handled by the decoder block are shown in Table 2.
A command from the TDB block 66 to the PGBQB, PDB, FTMB or MOPB block can take one or more cycles. A command starts with the assertion of dcm_tdb_xxx_v and ends with the assertion of dcm_tdb_last. The OCC packet data is transferred from TDB on the dcm_tdb_data bus qualified by the dcm_tdb_xxxx_v signal. The intended block can hold off accepting new commands by asserting dcm_xxxx_tdb_stall.
The Ring Interconnect Tx block (RITB) 64 implements the transmit portion of the Ring Interconnect interface. There are three instances of the RITB, wherein each instance caters to one TX Link. One TX link may be dedicated to the IOM0 to ensure a low latency, high traffic path for the egress packet data. The other two TX links connect to the CPMs in clockwise and anti-clockwise routes. These two TX links carry balanced traffic due to the symmetry of processing on the CPMs.
The PGBQ block (“PGBQB”) contains the logic to implement the PGBQ 26 functionality. The PGBQB supports one or more of the following operations:
Referring now generally to the Figures and particularly to
A State SRAM holds the queue state. There can be a total of 96 PGBQs 46. For each PGBQ, a set of registers is stored in the SRAM. The registers are described in Table 3.
The registers take up a total of 11 bytes. Assuming a 128-bit wide SRAM and allocating 16 bytes per PGBQ, it takes a single access to retrieve the complete PGBQ state. The total memory used is (16*96)=1536 bytes.
A PGBQB SRAM holds two region blocks for each of the PGBQs 46. Each region block is 32 bytes and contains eight PGB pointer entries. The lower bits of the PGBQ_READ_INDEX and the PGBQ_WRITE_INDEX act as pointers to an entry within the block, while the PGBQ number is used to select a particular block in this area. Update of this block is described in the State Machine section. The total memory used is 96*2*32=6144 Bytes.
A PGB cache SRAM area holds the PGB entry for the Enqueue command. The packet pointers provided with the command are written to this area. Once the cache is full, it is written to DRAM. If a Dequeue command is detected before the cache-full condition and the cached entries are the only entries in the PGBQ, then the current set of entries is sent to the IOM. The total memory used is 96*128 Bytes, or 12288 Bytes. The total PGBQB SRAM size (state, region, and PGB cache areas) is therefore 1536+6144+12288=19968 Bytes.
A PGBQB Region Block (“PGBQB_RGNB”) interfaces to one or more PGBQ state machines to carry out PGBQ region related commands. The state machines send requests to read and write PGBQ regions to the Region Block. The PGBQB_RGNB block can process one request at a time from the state machine. The interface between the PGBQB_RGNB and the PGBQB_SM blocks is shown in Table 5: PGBQB_RGN Interface. The command types accepted by the PGBQB_RGNB block, as indicated by pgbqb_smb_rgnb_rwn, are as follows:
1. ReadRegion: This command reads the indicated region from the external RAM and returns it to the state machine.
2. WriteRegion: This command writes the indicated region to the external RAM and indicates completion of the operation to the state machine.
The operation of a PGBQB state machine is now discussed. There are four branches to the state machine, taken depending on the command and the current state of the queue, namely:
The branching may be done after a common MEM_RD_0 state, which loads the current state of the PGBQ.
Referring now generally to the Figures and particularly
PGBQB Enqueue State Machine:
PGBQB Dequeue State Machine:
The PGBQ_SM interface is shown in Table 6: PGBQ_SM Interface
A PGBQB packet composer block (“PGBQ_PCB”) receives the PGB address 22A or partial PGB information for a Dequeue command, or the state information for the PGBQ 26, from the state machine. In the case of a PGB address, the PGBQ_PCB sends the OCC header information to the IOM0 TEB and acquires a token from the IOM0 TEB. The token is then passed to the memory controller interface along with the request to fetch the PGB. The MCIB is instructed to send the data directly to the TEB of the IOM0. The PGB is then released to the Packet Buffer FreeList once the read request is posted to the MCU (indicated by the mci_clear signal). In the case of a StatusRead response, the PGBQ_PCB looks at the destination field of the packet to be formed and selects the TEB accordingly. The PGBQ_PCB then sends the information to the TEB indicating that the data should be picked up from the IP immediately; the TEB does not issue a token for such a request. The interface of the Packet Composer is shown in Table 7: PGBQ_PCB Interface.
Referring now generally to the Figures and particularly to
There are multiple instances of the AtoEBlock 74, considering the longer processing time and the amount of traffic throughput required (all egress packets and packets for temporary storage).
A single EtoABlock 76 is sufficient, as the required traffic is much lower (temporary storage packets only).
There is a single instance of the Packet Buffer Read Block 78, since the processing requirement is modest.
The AtoEBlock 74 supports two commands: AtoE conversion and AtoLin conversion. The data from the CPM memory 18 that was stored as an “A” formatted packet is moved to off chip memory. The off chip storage format can be E1/E2 or linear storage.
Referring now generally to the Figures and particularly to
Referring now generally to the Figures and particularly to
The sub blocks of the A parser may include one or more of the following:
>Response Decoder Module 82
A response decoder module 82 decodes the destination address and sends the link buffer data to memory. The header buffer is decoded. The valid links are queued in sequence in the Link Sequencer. The SoD and EoD fields are sent to the header module, and the payload is copied to the memory module. An indication of completion of the transfer is sent to the Link Sequencer.
>Link Sequencer 84
A link sequencer module 84 receives the buffer links in the input queue. Each entry is marked as a Header entry or an NBL entry. These entries are forwarded to a Request composer as and when space is available in the memory module. The entry is then moved to a pending queue. When the response for the request is available and the payload is written, the response decoder module informs the link sequencer of the same. The entry for which the data is written is then marked as available. The payload for a selected buffer 22 may then be scheduled for sending to a byte lane serializer 86. If the selected payload is from the header buffer, the header module is instructed to put the start and end address; for link blocks the start address is driven as “0” and the end address is computed from the remaining payload and the size of the data buffer. A dispatcher advances on detecting the last signal from the memory module. When the last byte of the packet is sent through the memory module, an eop signal is generated for clearing the pending information.
>Header Module
The header module stores the sod and the eod fields of packets. These fields are driven as the start and end addresses when instructed by the Link Sequencer 84.
>Byte Lane Serializer 86
The byte lane serializer module takes data from the Memory module. It takes 32 bit data along with the sod and eod positions, if valid, within the dword. It packs the data into 32 bit dwords and maintains pending data. The only restriction is that if sod and eod are both valid within a word, the eod position must not be less than the sod position. The packing process is reset on detection of the eop signal. The eop signal may have one cycle of latency; a new start should not be issued for one cycle after the eop signal is sent.
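A hedged C model of the byte-packing behavior just described is given below: valid bytes between the sod and eod positions of each incoming 32-bit word are accumulated and emitted as packed 32-bit dwords, with the state reset on eop. The API names, the little-endian byte-lane ordering, and the callback-based output are assumptions made only for this sketch.

#include <stdint.h>

typedef struct {
    uint8_t  pending[4];   /* bytes accumulated toward the next output dword   */
    unsigned count;        /* number of pending bytes (0..3)                   */
} serializer_t;

typedef void (*emit_fn)(uint32_t dword);

/* Reset the packing state, e.g. on eop. */
static void ser_reset(serializer_t *s) { s->count = 0; }

/* Feed one 32-bit input word. sod/eod are byte positions 0..3; a negative
 * value means "not valid in this word". If both are valid, eod >= sod.        */
static void ser_push(serializer_t *s, uint32_t data,
                     int sod, int eod, emit_fn emit)
{
    int first = (sod >= 0) ? sod : 0;
    int last  = (eod >= 0) ? eod : 3;

    for (int b = first; b <= last; b++) {
        s->pending[s->count++] = (uint8_t)(data >> (8 * b));
        if (s->count == 4) {                 /* a full packed dword is ready   */
            uint32_t out = 0;
            for (int i = 0; i < 4; i++)
                out |= (uint32_t)s->pending[i] << (8 * i);
            emit(out);
            s->count = 0;
        }
    }
}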
Memory Module 88
A memory module encapsulates memory and provides a view of buffers. It recognizes two kinds of buffers, viz. two “H” buffers of 96 bytes each and three “D” buffers of 128 bytes each. The H buffers store the payload data of the headers while the D buffers store the Link Buffer data.
The Write Interface consists of wr_data, wr_data_v, wr_buf, and wr_last. Internally the module contains a 7 bit counter indicating the offset in the buffer where data is written. This offset is reset to 0 on initial reset and every time wr_last is detected. Data writing takes place every time wr_data_v is asserted: wr_data is written at the current offset in wr_buf.
The Read Interface consists of rd_buf, rd_begin, rd_end, rd_v, rd_last, rd_data, rd_data_valid, rd_sod, rd_sod_v, rd_eod, and rd_eod_v. A read is triggered by setting rd_buf to point to the relevant buffer; rd_begin indicates the offset within the buffer, rd_end indicates the last address of the buffer, and rd_v is asserted to indicate that all of the above parameters are valid and the read shall start. The block then starts providing rd_data with rd_data_valid. The rd_sod indicates the byte position of the valid data in the word, and rd_sod_v indicates that rd_sod is valid for this word. When the internal offset matches rd_begin, the lower two address bits of rd_begin drive rd_sod. The rd_eod and rd_eod_v are similarly generated by comparing the internal offset with rd_end. The rd_last signal is generated by this module to indicate that it has completed the read operation.
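For illustration, a simple C model of the write path of this memory module is sketched below: two 96-byte H buffers, three 128-byte D buffers, and a 7-bit write offset that is reset on wr_last. The byte-wide data path, the buffer indexing convention, and the function name mem_write are assumptions of this sketch rather than details taken from the specification.

#include <stdint.h>

#define NUM_H 2
#define NUM_D 3
#define H_SIZE 96
#define D_SIZE 128

typedef struct {
    uint8_t  h[NUM_H][H_SIZE];  /* header payload buffers                      */
    uint8_t  d[NUM_D][D_SIZE];  /* link (data) buffers                         */
    unsigned wr_offset;         /* 7-bit write offset counter                  */
} mem_module_t;

/* Write-interface model: one call per cycle in which wr_data_v is asserted.
 * wr_buf 0..1 selects an H buffer, 2..4 selects a D buffer (assumed mapping). */
static void mem_write(mem_module_t *m, unsigned wr_buf,
                      uint8_t wr_data, int wr_last)
{
    if (wr_buf < NUM_H)
        m->h[wr_buf][m->wr_offset % H_SIZE] = wr_data;
    else
        m->d[wr_buf - NUM_H][m->wr_offset % D_SIZE] = wr_data;

    m->wr_offset = (m->wr_offset + 1) & 0x7F;   /* 7-bit counter wraps         */
    if (wr_last)
        m->wr_offset = 0;                       /* offset resets on wr_last    */
}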
Referring now generally to the Figures, and particularly to
>Packet Buffer Sequencer 90
This module is started by the top-level transaction decoder by providing the number of packet buffers and the type of packet, “E1” or “E2”. The module then fetches an adequate number of packet buffer pointers to construct the header of the E1 or E2 packet. The header information, including the NBLs, is sent to the memory controller interface for writing at the header block address. The payload extractor is activated during the payload area and the Data block areas. The memory address for the payload area is also provided by the packet buffer sequencer. It also supports a bypass mode where E formatting is not done but a continuous address is maintained and provided to a write logic.
>Payload Extractor 92
This module interfaces with the “A” packet parser. It pulls 32 bit data from the packet payload area on instruction from the Link Sequencer and forwards it to the write logic.
>Write Logic 94
The write logic module receives the 32 bit data from the link sequencer or the payload extractor and the associated address from the link sequencer. It assembles this information as a single write request to the memory interconnect block and dispatches it.
Referring now generally to the Figures, and particularly to
Regarding the DMA request broker 80, the pointer to the first header buffer or the linear area address is passed to the E packet parser 96. This communication also indicates that this first header is the start of the packet. In the case of a LintoA command, the DMA request broker 80 also sends the count information. The E packet parser 96 may include one or more of the following elements:
>Link Sequencer 98
A link sequencer block of the E parser 96 receives the first header pointer from the DMA Request broker 80. The packet buffer, along with the type, is forwarded to the Read Logic. The read size is kept at 128 Bytes for the first header buffer. The Read Logic extracts the header information and sends it back to the Link Sequencer 98 for queuing the next links. The Link Sequencer 98 also sends the actual PDL value to the A Composer.
>Read Logic 100
A read logic block 100 of the E parser makes a request to the MCI based on the packet buffer address and size provided by the Link Sequencer. The received data is then split into header data, which is sent to the Link Sequencer, and payload data, which is sent to a payload extractor.
>Payload Buffer 102
The payload is buffered in a payload buffer 102 for A frame composition. The payload is delivered to an A frame composer 104 on demand. The payload buffer also performs flow control by using stall when the TX is busy sending other information.
Referring now generally to the Figures and particularly to
>Link Sequencer 106
A composer link sequencer module 106 is invoked with the PDL and the on-chip buffer count as parameters. The module then obtains the on-chip buffer pointers from the cache. The first header block is filled with information about the NBLs. The NBLs are then queued in another queue for direct filling. The data for the payload area is extracted from a Payload Buffer 108.
>OCC Write Logic 110
The A composer OCC write logic 110 picks up the type of packet information. The link pointers and the other header information are picked up from the link sequencer 106, while the payload is received from the Payload Buffer 108. The composed packet is then forwarded to the TEB of TX0 or TX1 depending on the DA field.
An A composer packet buffer read block transfers the read request from the IOM to the MCI for UnCached accesses and frees the block to the packet buffer FreeList. The detailed steps are as follows:
A plurality of Packet Buffer and PGB FreeLists 23 are maintained in the off chip area. The FreeList(s) 23 provide packet buffers for both requirements. Each DCM 14 provides 8 segments (16 segments for 512 Mbit parts) of a maximum of 8 MByte of memory each. Each segment holds a maximum of 64K packet buffers. A 16 bit handle, along with 3 bits (4 bits for 512 Mbit parts) of segment information, uniquely identifies a packet buffer. The size of a segment must be a multiple of 64 packet buffers, i.e. 8 Kbytes. The handle 0xFFFF is considered an invalid handle. This means that in the case of 8 MByte mode one of the buffers cannot be used for storing data.
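The following C sketch illustrates one way the handle/segment identification described above could be represented: a 16-bit handle plus a small segment number, with 0xFFFF reserved as the invalid handle and 128-byte buffers within up to 8 MByte segments. The packed field layout, the function names, and the segment base table are assumptions of this sketch.

#include <stdint.h>
#include <stdbool.h>

#define INVALID_HANDLE 0xFFFFu

typedef struct {
    uint16_t handle;    /* index of the packet buffer within its segment       */
    uint8_t  segment;   /* 3-bit (or 4-bit for 512 Mbit parts) segment number  */
} pkt_buf_ref_t;

/* Pack segment and handle into one reference word (layout assumed).           */
static uint32_t pkt_buf_pack(pkt_buf_ref_t r)
{
    return ((uint32_t)r.segment << 16) | r.handle;
}

static bool pkt_buf_is_valid(pkt_buf_ref_t r)
{
    return r.handle != INVALID_HANDLE;
}

/* Convert a reference to a byte address: 128-byte buffers, up to 64K buffers
 * (8 MByte) per segment. The segment base table is a hypothetical input.      */
static uint32_t pkt_buf_addr(pkt_buf_ref_t r, const uint32_t segment_base[])
{
    return segment_base[r.segment] + (uint32_t)r.handle * 128u;
}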
Referring now generally to the Figures and particularly to
Top Level Processing may comprise one or more of the following processes and steps: Alloc Operation:
Free Operation:
Information maintained per FreeList segment may include the following:
Initialization of the FreeList may be done in software. During initialization, the Base Address may be set to the address of the first packet buffer in the segment. The Free Bucket chain may be established in the packet buffer area. The Next Free Index may be initialized to the first Bucket of the Free Bucket chain. The Curr Ptr is initialized to 0x80, whereby the indicated cache is listed as empty.
When the Next Free Index equals 0xFFFF and the Curr Ptr equals 0x80, the corresponding segment is listed as completely utilized and the FreeList is empty. This condition is detected at the top level and Alloc requests are not forwarded to the segment.
The Operations are described in pseudo-code as follows:
Alloc Operation:
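The original pseudo-code is not reproduced here. Purely as a hedged sketch of how an Alloc operation might behave given the per-segment state described above (Base Address, Next Free Index, Curr Ptr, and an on-chip bucket cache), consider the following C fragment. The bucket size of 128 entries, the read_free_bucket helper, and all names are assumptions, not the specification's own pseudo-code.

#include <stdint.h>
#include <stdbool.h>

#define BUCKET_ENTRIES 128u
#define CACHE_EMPTY    0x80u      /* Curr Ptr value meaning "cache empty"      */
#define INVALID_INDEX  0xFFFFu    /* Next Free Index value meaning "chain end" */

typedef struct {
    uint32_t base_addr;               /* address of the first packet buffer    */
    uint16_t next_free_index;         /* head of the Free Bucket chain         */
    uint8_t  curr_ptr;                /* next cached handle; 0x80 => empty     */
    uint16_t cache[BUCKET_ENTRIES];   /* cached free packet buffer handles     */
} freelist_seg_t;

/* Assumed helper: read one bucket of free handles from the packet buffer
 * area and return the index of the next bucket in the chain.                  */
extern uint16_t read_free_bucket(uint32_t base_addr, uint16_t bucket_index,
                                 uint16_t out[BUCKET_ENTRIES]);

static bool freelist_alloc(freelist_seg_t *s, uint16_t *handle)
{
    /* Segment exhausted: chain ended and the cache is empty.                  */
    if (s->next_free_index == INVALID_INDEX && s->curr_ptr == CACHE_EMPTY)
        return false;

    if (s->curr_ptr == CACHE_EMPTY) {            /* refill the on-chip cache   */
        s->next_free_index =
            read_free_bucket(s->base_addr, s->next_free_index, s->cache);
        s->curr_ptr = 0;
    }
    /* Hand out the next cached handle; when curr_ptr reaches 0x80 (128) the
     * cache is naturally marked empty again.                                  */
    *handle = s->cache[s->curr_ptr++];
    return true;
}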
Referring now generally to the Figures and particularly to
The following description is common to both MCIBs 114. The MCIB ports 118 for the SDU blocks 116 are of two types, viz. Request Ports 120 and Response Ports 122. Each block connecting to an SDU request port 120 must have a CMD connection and at least one of the WR_CMD or RD_CMD connections. The read responses are sent to the Response Ports 122. Each response port 122 has a unique number, and the read command indicates the response port to which the read data should be sent. The tag sent with the read command is returned to the response port while sending data. The read data is sent in sequence on the Response Port 122. Two acknowledgements, mci_done and mci_clear, are sent for commands. The mci_done is sent when the command is taken for processing and the block is allowed to post a new command. The mci_clear is sent when the command is dispatched to the MCU. The mci_clear is used by the blocks that read the packet buffer and free the buffers.
A request arbiter block arbitrates MCU requests from the different SDU blocks 116. A read request is given higher priority over a write request. Within read and write requests the arbitration is round robin. The rationale behind giving reads higher priority is based on the following assumptions:
Read Request Processing
When RdGnt is available from both channels, the read request is taken from the requestor. An entry consisting of the lower 2 bits of the address, the size of the request, the destination address and the destination tag is made in the Response Collector. The Response Collector returns the Refid for the stored entry. This Refid is then forwarded to both channels along with the read requests for the channels. The read request for an individual channel is composed as follows:
Channel 0 Address = ReqAddr[22:1] + ReqAddr[0]
Channel 0 Size = ReqSize[7:1] + (ReqSize[0] & !ReqAddr[0])
Channel 1 Address = ReqAddr[22:1]
Channel 1 Size = ReqSize[7:1] + (ReqSize[0] & ReqAddr[0])
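The C sketch below restates the channel split in executable form, under the assumption that the Channel 1 size term is ReqSize[0] & ReqAddr[0] (so that the two per-channel sizes always sum to the original size), and that ReqAddr is a 23-bit block address and ReqSize an 8-bit count of 16-byte blocks. The structure and function names are illustrative only.

#include <stdint.h>

typedef struct {
    uint32_t addr;   /* per-channel block address                              */
    uint32_t size;   /* per-channel size in 16-byte blocks                     */
} chan_req_t;

/* Split one read request across the two MCU channels, bit 0 of the address
 * selecting the starting channel and the odd block (if any) going to it.      */
static void split_read(uint32_t req_addr, uint32_t req_size,
                       chan_req_t *ch0, chan_req_t *ch1)
{
    uint32_t a0 = req_addr & 1u;          /* starting channel                  */
    uint32_t s0 = req_size & 1u;          /* odd number of blocks?             */

    ch0->addr = (req_addr >> 1) + a0;     /* ReqAddr[22:1] + ReqAddr[0]        */
    ch0->size = (req_size >> 1) + (s0 & !a0);

    ch1->addr = (req_addr >> 1);          /* ReqAddr[22:1]                     */
    ch1->size = (req_size >> 1) + (s0 & a0);
}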
Write Request Processing
The write requests to this block are all 16 byte requests. The arbitration logic checks for WrGnt from the channel indicated by ReqAddr[0]. If the WrGnt is available, the write is handed over to that channel.
A response collector block plays the main role in read request handling. It supports a maximum of 32 outstanding requests. It maintains three resources to consolidate the responses of the requests.
Refid FreeList
A 32-bit register acts as a FreeList of Refids. On an allocation request to the FreeList, the position of the first 0 bit is returned as the Refid and that bit is set. A free request clears the bit at the position indicated by the Refid.
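A short C sketch of this Refid FreeList bitmap is given below; the function names are assumed and only the behavior described above is modeled.

#include <stdint.h>

static uint32_t refid_bitmap;            /* bit i set => Refid i is in use     */

/* Allocate: return the position of the first clear bit (0..31) and set it,
 * or -1 when all 32 Refids are outstanding.                                   */
static int refid_alloc(void)
{
    for (int i = 0; i < 32; i++) {
        if (!(refid_bitmap & (1u << i))) {
            refid_bitmap |= (1u << i);   /* mark Refid i as allocated          */
            return i;
        }
    }
    return -1;
}

/* Free: clear the bit at the position indicated by the Refid.                 */
static void refid_free(int refid)
{
    refid_bitmap &= ~(1u << refid);
}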
Req Info Array
A req info array provides a 32-entry array that is maintained in a register file. The array may be indexed with the Refid. Each entry in the array contains the following fields:
Finish List
A finish list comprises a set of 64 flops arranged in two rows of 32 flops. Each flop indicates a response completion status. The two rows indicate the two channels. The bits are set on receipt of a completion status from the channel response blocks. Separate logic finds completed requests and dispatches them to the request dispatcher based on the destination field stored in the Req Info Array. All these bits are cleared during initialization and on transfer of the response data to the SDU blocks.
>Algorithms in Response Collector Block
The Request block generates an allocation request to this block when a read request is detected. An entry is allocated in the Refid FreeList. The information regarding the read request is then stored in the Req Info Array. When all the data for a given request is collected, the MCU Response Blocks indicate a completion to this block. The Finish list bits are set based on this information. The finish list is scanned continuously to find the next completion. If completion is detected on both channels for a request, the entry in the Req Info Array is pulled out. The destination field in the information is decoded. The rest of the entry, along with the Refid, is passed to the Request Dispatcher. The dispatcher sends a completion indication to the Response Collector. The entries in the FreeList and the finish list are updated to indicate completion upon arrival of this signal.
A write request is sent directly to the MCU. A read request is split into multiple read requests of 16 bytes each. A 4-bit entry is sent to an MCU Response block. The entry has one bit cleared per 16-byte request sent to the MCU; the rest of the bits are kept set. The requests are then sent to the MCU with the Refid. The WrGnt to a requestor is removed when the MCU is not able to accept any more write requests. The RdGnt is removed during read processing as well as when the MCU Response Block is not able to allocate space for the response data.
The MCU response block may maintain one or more of the following data structures.
The Response storage block may receive requests from either the MCU or the response dispatcher. Requests from the MCU are taken at higher priority. These requests write data into the storage area. The requests from the response dispatcher are read requests. These are served in round robin fashion.
All the Response storage blocks are initially placed in the FreeList. They are allocated when the MCU request block forwards the bit mask with a request. The entries are deposited back in this list when any of the Response dispatchers reads out a block from the storage.
The Response Completion Array is accessed using the Refid as an index. When an MCU request Block 124 sends a bit mask, the associated entry is written. The indices of the Response Data Storage are taken from the FreeList. Each time a response is received, the corresponding bit of the bit mask is set. When the bit mask becomes 4'b1111, an indication is generated to the Request Composer block. The indication is held for one clock cycle. When a request with all bits already set to 4'b1111 is received, an immediate indication is sent to the Request Composer block and no further allocation is done.
There is one response dispatcher block per destination. A response composer, after detecting completion on both channels, sends a request to the appropriate dispatcher. The Refid, address bits, size and tag information are passed as the request. The request dispatcher then makes requests to the MCU response blocks for reading the data. The requests are made in the appropriate sequence based on the values of the size and the address bits. The information may be sent to an SDU destination under flow control. The tag information is provided to the destination.
Each SDU Block 116 that has to communicate with other modules over the ring bus does so using a transaction encoder block. There is one transaction encoder block per TX link. When an SDU block intends to generate an OCC transaction on the RIB, the SDU block makes a request to the appropriate TEB. The OCC request consists of the following fields:
A request arbitration logic looks at the requesting ports in a round robin manner. If a request is valid, the following steps are taken:
A response arbitration state machine looks at the request arbitration port and the memory ports for a valid request. If the request arbitration port makes a request, the OCC header information is sent to the TX FIFO. If the size of the payload is non-zero, then the payload from the indicated block data port is transferred to the TX FIFO. This completes the transactions where the OCC payload is zero or all the data to be sent is from an on chip block. If the request is detected from the Memory Port, the tag associated with the port is used for accessing the Posted Req Array. The OCC header information from the array is sent to the TX FIFO. The “Size of Data from block” field and the “Port of Request” field are used for accessing the preamble data to be sent from the SDU block. The remainder of the payload is extracted from the memory port that has provided the tag. This mechanism covers the cases where an SDU block needs to send zero or more data words from the SDU block itself and the rest of the data from the RLDRAM.
Referring now generally to the Figures and particularly to
Conflict Determination for Hash Map Machine 132
Requests for the same hash map table are considered conflicting. The hash map state machine 132 stores the table index of the operation in progress for each state machine. This index is provided along with the request by the CPM.
Conflict Determination for the Data Operation Machine 134
An operation in each state machine 130 is indicated by a pair of registers, viz. the address register indicating the base address of the operation and the mask register. The mask register is generated from the size parameter passed along with the request. The mask is set to the exact size for naturally aligned accesses, while it spans two regions of the given size in the case of non-aligned accesses. Each incoming address is masked and compared with the relevant bits of the base address. The matching cases are the conflicting requests, which are stalled.
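As a small illustrative C sketch of this conflict check, the naturally aligned case is shown below: an in-flight operation is tracked as a (base, mask) pair, and an incoming address conflicts when it falls within the masked region. Only the aligned case is modeled; the two-region mask for non-aligned accesses and all names are left as assumptions.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;   /* base address of the operation in progress              */
    uint32_t mask;   /* covers the region the operation touches                */
} op_state_t;

/* Mask for a naturally aligned access of 'size' bytes (power of two).         */
static uint32_t region_mask(uint32_t size)
{
    return ~(size - 1u);
}

/* True when the incoming request hits the in-flight region and must stall.    */
static bool conflicts(const op_state_t *op, uint32_t incoming_addr)
{
    return (incoming_addr & op->mask) == (op->base & op->mask);
}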
The hash map operations are explained below as pseudo-code. The code assumes the following supporting blocks:
HashMapGet
Load Hash Map Context;
A data operation block performs three kinds of operation viz. Memory Read, Memory Write and Parallel Add.
Memory Read Operation
If the request size is greater than or equal to 16 bytes, a request is made to the TEB with the OCC header. The tag returned by the TEB is sent to the MCI along with the request, and the destination for the response data is set to the TEB.
If the request size is less than 16 Bytes, the request is made to the MCI and the data is picked up from the MCI. The appropriate data (after lane shifting) is sent to the TEB along with the OCC header.
Memory Write Operation
If the request size is greater than or equal to 16 bytes, the request is broken into 16 Byte requests and sent to the MCI.
If the request size is less than 16 Bytes, a read request is made to the MCI and the data is picked up from the MCI. The appropriate data is modified, and a write request is made to the MCI with the modified data.
Parallel Add Operation
A read request is made to the MCI to retrieve 64 bytes of data from the specified address. The adder block is triggered to perform the parallel add. The parallel add operation completes after “n” cycles. The results of the addition are written back to the same location.
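The C sketch below shows the read-add-write sequence of the parallel add in software form, assuming 16-bit lanes over the 64-byte block and treating mci_read and mci_write as stand-ins for the MCI read and write requests. The addend operands, lane width, and helper names are assumptions of this sketch.

#include <stdint.h>

#define PADD_BYTES 64
#define PADD_LANES (PADD_BYTES / 2)   /* 32 lanes of 16 bits (assumed)         */

/* Assumed helpers standing in for the MCI read and write requests.            */
extern void mci_read(uint32_t addr, void *buf, uint32_t len);
extern void mci_write(uint32_t addr, const void *buf, uint32_t len);

static void parallel_add(uint32_t addr, const uint16_t addend[PADD_LANES])
{
    uint16_t lanes[PADD_LANES];

    mci_read(addr, lanes, PADD_BYTES);             /* fetch the 64 bytes       */
    for (int i = 0; i < PADD_LANES; i++)
        lanes[i] = (uint16_t)(lanes[i] + addend[i]); /* lane-wise addition     */
    mci_write(addr, lanes, PADD_BYTES);            /* write results back       */
}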
The number of cycles taken by the parallel operation will be determined by the speed of the adders obtained from synthesis. The actual number will be 4 or 8, based on a 16 bit or 8 bit adder.
The following Table 8 shows the configuration registers for the PGBQ implementation. This set is replicated for each of the queues.
Note 1:
“_n” indicates the queue number (0-95)
Configuration registers for FreeList implementation are listed in Table 9.
Note 2:
“_n” indicates the segment number (0-15)
Configuration Registers for Hash Map implementation are noted in Table 10.
Note 3:
“_n” indicates the Hash Map Table number (0-31)
From a block level verification perspective, the following sequence may be preferred to develop and verify the blocks:
The MCI and the Packet Buffer FreeList Block do not have a direct command interface from the RIU. A thin wrapper should be developed to interface these blocks to the standard block level verification environment. Other blocks may optionally interface to the RIU and can be connected to a module level verification environment with a null RIU (an RIU that does not implement queues but connects one RIB RX port to these blocks directly, with a stall implementation). These blocks interface to the memory through the pre-verified MCI.
A block can be verified using a behavioral model of the MCU. The cases that can be verified are:
A packet buffer FreeList block does not have a direct connection from the RIU. It provides services to the PGBQ block and the Packet DMA block. This block shall be verified first. Conditions to check for:
A transaction encoder block shall be verified for the following conditions
The PGBQ block should be exercised by providing commands directly to the interface. It should be hooked to the previously verified Packet Buffer FreeList block, the TEB, and the MCI. The following are basic test cases.
The memory operations block can be verified with the following test cases:
Possible test cases for a packet DMA block may include the following:
The following memories may be used in various alternate preferred embodiments of the SDU block of the second version 42:
Many features have been listed with particular configurations, options, and embodiments. Any one or more of the features described may be added to or combined with any of the other embodiments or other standard devices to create alternate combinations and embodiments. The features of one of the functions may also be used with other functions. Although the examples given include many specificities, they are intended as illustrative of only one possible embodiment of the invention. Other embodiments and modifications will, no doubt, occur to those skilled in the art. Thus, the examples given should only be interpreted as illustrations of some of the preferred embodiments of the invention, and the full scope of the invention should be determined by the appended claims and their legal equivalents.