This invention relates to the field of computer memory access, with particular reference to command queuing methods and structures for DRAM memory access.
A number of functional units are utilized by a network processor to manage the flow of data. Among these are memory interfaces to provide memory arbitration; state machines to provide functionality of processing command messages; and receive, transmit and dispatch controllers, just to name a few. A memory interface attempts to provide buffer management and data movement at media speed. To eliminate the memory access bandwidth bottleneck, the main function of the memory interface is to provide an efficient memory access scheme while meeting the requirements of sustained data throughput at the required data rate at the memory controller interface to the memory device.
Network traffic management requires hardware implementation for scheduling the delivery of network packets, and for traffic shaping. For this, a computer employs a scheduler which is a computer program designed to perform advanced scheduling algorithms to control functions, such as network packet scheduling, traffic shaping, and initiation and termination of specified tasks. Hardware schedulers contain a plurality of network interface and switch ports, an internal memory for DRAM write access command queues as well as buffers for received packets, an internal memory for DRAM read access command queues and finite state machines for memory management. The system utilizes external SRAM and DRAM memory devices to store control blocks of scheduling elements. It is necessary to be able to quickly and accurately execute searches for programs with complex flow patterns.
A number of features are found in related art devices, but none of these devices embody the combination of features that are found in the present invention. For example, some conventional DRAM access arbiters consider only one access request across several memory banks at a time, thereby leading to low memory bandwidth utilization. Other DRAM access arbiters employ schemes to increase the memory access bandwidth, but access command queues are global queuing structures which contain access requests to all the memory banks of the memory device.
Neither of these schemes imposes any limit on how many access requests for the same memory bank can be present in the command queues, so the queues can fill up with access requests to a single memory bank. Because command queues are limited resources, new access requests for different memory banks cannot be inserted while the queues are full. This results in low memory bandwidth utilization for a period of time.
In some applications where a “cut-and-paste” processing model is used, the “paste” of a packet header also contributes to the write access traffic to the memory. In this case, flow control on the regular write access traffic is required in order to guarantee that the “paste” operation of packet data is given the highest priority to access the memory devices.
One of the objects of the present invention is to solve the problems associated with the above schemes. This is achieved by use of one or more arbiters to maximize the memory bandwidth available for the read and the write operations by avoiding consecutive accesses to the same memory bank and associated dead cycles.
This objective is accomplished by the use of a system and a method for (1) dividing the access requests across several memory banks into access requests per memory bank, so that each memory access to a given memory bank is independent of accesses to other memory banks and memory access bandwidth can be optimized; (2) providing a queuing structure for access requests per memory bank; (3) preventing accesses to a given memory bank from occupying the whole command queue by imposing a threshold on each per-bank access request queue, such that memory access bandwidth will not be degraded; (4) using bank rotation for writing receive packets to memory based on the write queue status; (5) providing a best fit for systems where multiple “users” independently access different memory locations; and (6) using the status of the write queue per memory bank for flow control of the system.
The invention relates to a system and a method of maximizing DRAM memory access utilization wherein the DRAM memory consists of a plurality of memory banks. The method comprises first dividing DRAM accesses into write accesses and read accesses. The read and write access requests are further divided into accesses per memory bank. A configurable threshold limit is imposed on the number of accesses to each memory bank. The writing of receive packets is rotated among the banks based on the write queue status. The status of the write queue for each memory bank may also be used for system flow control. The method also typically includes the additional steps of determining access windows based on the status of the command queues, and performing arbitration on each access window.
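The per-bank queuing, threshold, and bank rotation steps summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the class name `BankQueues`, the method names, and the threshold value are hypothetical, and the rotation policy shown (skip any bank whose write queue has reached the threshold) is only one plausible reading of "rotated among the banks based on the write queue status". Four banks are used because each FCRAM device described later contains four internal banks.

```python
from collections import deque


class BankQueues:
    """Per-bank command queues with a configurable threshold (hypothetical sketch)."""

    def __init__(self, num_banks=4, threshold=4):
        self.threshold = threshold          # assumed per-bank limit, configurable
        self.queues = [deque() for _ in range(num_banks)]
        self.next_bank = 0                  # rotation pointer for receive-packet writes

    def enqueue(self, bank, request):
        """Accept a request for `bank` unless that bank's queue is at the threshold."""
        queue = self.queues[bank]
        if len(queue) >= self.threshold:
            return False                    # one bank cannot occupy all queue space
        queue.append(request)
        return True

    def pick_write_bank(self):
        """Rotate over the banks, skipping any bank whose queue is at the threshold.

        Returns None when every queue is at the threshold, which a caller
        could treat as a flow-control (back-pressure) signal.
        """
        count = len(self.queues)
        for offset in range(count):
            bank = (self.next_bank + offset) % count
            if len(self.queues[bank]) < self.threshold:
                self.next_bank = (bank + 1) % count
                return bank
        return None
```

In this sketch a full queue for one bank never blocks requests for the other banks, and the `None` result from `pick_write_bank` stands in for the use of write queue status for system flow control.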
The invention also relates to an article of manufacture comprising a computer usable medium having a computer readable program embodied in said medium. The program when executed on a computer causes the computer to maximize access to DRAM memory using an arbiter that maximizes the memory bandwidth available for the read and the write operations by avoiding consecutive accesses to the same memory bank and associated dead cycles. The program causes the arbiter to divide DRAM accesses into write accesses and read accesses, to divide the access requests into accesses per memory bank, to impose a threshold limit on the number of accesses to each memory bank, and to rotate the writing of receive packets among the banks based on the write queue status. The program can cause the arbiter to use the status of the write queue for each memory bank for system flow control. It can cause the arbiter to determine access windows based on the status of the command queues, and perform arbitration on each access window. In one embodiment, the memory banks are embedded in fast cycle random access memory devices.
These, as well as other objects and advantages, will become apparent upon a full reading and understanding of the present invention.
The present invention will now be described with specific reference to the drawings in which
A network processor scheduler is typically provided with sufficient memory access bandwidth to avoid bottlenecks and conflicts caused by different functional entities attempting to simultaneously access the same memory. This is done by using a combination of static random access memory (SRAM) and dynamic random access memory (DRAM) devices. The present invention is specifically concerned with the controlled access to DRAM memory useful for supporting high (e.g., 10 Gbps) data rates. For these high rates, a particularly useful memory device is a fast cycle dynamic random access memory (FCRAM chip), a double data speed fast cycle dynamic random access memory (DDS FCRAM), or a reduced latency dynamic random access memory (RLDRAM). An FCRAM device is capable of delivering a random cycle time of 20 nanoseconds (ns), which is about four times faster than a conventional DRAM device. Another feature of the FCRAM is that it combines non-multiplexed addressing with complete address decoding and pipelining, thereby enabling both row and column addresses to be designated simultaneously, whereas with a conventional DRAM, there is a time lapse between these two activities.
In one embodiment, there are two FCRAM memory devices, one being the logical high part and the other the logical low part, with the two parts being generally transparent to the system software. Each FCRAM memory device contains four internal banks. A bank can be organized in a 4M×16-bit or 8M×8-bit format. The read and write accesses to each bank share the same bus. Each data chunk in the memory of the bank comprises 64 bytes, with 32 bytes in the high part and 32 bytes in the low part to achieve a wide bus. Typically, there is no need to read or to write both a high part and a low part in a given bank at the same time because the address buses of the high and low parts can be implemented independently.
As previously noted, the memory interface provides buffer management and data movement at media speed. To eliminate the memory access bandwidth bottleneck, the main function of the memory interface is to provide an efficient memory access scheme while meeting the requirements of sustained data throughput at the required data rate at the memory controller interface to the FCRAM memory device. This requires that a memory arbiter collect read requests from transmit FIFOs and write requests from receive FIFOs, and schedule efficient accesses to the memories. Because of hardware limitations, the access to FCRAM devices has the following timing constraints:
1. Consecutive accesses (read or write) to the same memory bank shall be spaced out by three dead memory cycles.
2. A write access following a read access shall be spaced out by two dead memory cycles to allow data bus turnaround. If the write operation is accessing the same memory bank as the previous read operation, one additional dead memory cycle is needed for a total of three dead cycles.
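The two timing constraints above can be expressed as a small cost function, shown here as a minimal sketch. The function names and the `(op, bank)` tuple encoding are hypothetical, and the sketch assumes that any transition not covered by the two constraints needs no dead cycles, which the text does not state explicitly.

```python
def dead_cycles(prev, nxt):
    """Dead memory cycles required between two back-to-back accesses.

    Each access is an (op, bank) tuple where op is "R" or "W".
    """
    prev_op, prev_bank = prev
    next_op, next_bank = nxt
    if prev_bank == next_bank:
        # Constraint 1: consecutive accesses to the same bank need three dead
        # cycles (this also covers the same-bank case of constraint 2).
        return 3
    if prev_op == "R" and next_op == "W":
        # Constraint 2: data bus turnaround for a write that follows a read.
        return 2
    return 0  # assumption: no penalty for any other transition


def total_dead_cycles(schedule):
    """Sum the dead cycles over an ordered list of accesses."""
    return sum(dead_cycles(a, b) for a, b in zip(schedule, schedule[1:]))
```

Under these assumptions, a schedule that reads banks 0 to 3 and then writes banks 0 and 1 incurs only the single two-cycle bus turnaround, whereas three accesses in a row to the same bank incur six dead cycles. This is the cost an arbiter avoids by not issuing consecutive accesses to the same bank.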
In order to provide a better understanding of the invention, reference is made to the drawings, and particularly with respect to
According to one feature of the present invention, a decision flow chart 200 is shown in
Returning to the queue status inquiry at 204, if the answer is ‘no’, thereby indicating that no queues are empty, the same question is asked at 220 as at 218, namely if any queue is overflowed. A ‘no’ answer then fixes the size of the access window at 18 memory cycles at 222 and the access window 210 is started. If, however, any queue is overflowed in response to the inquiry at 220, the ‘read overflow’ question at 224 is either answered ‘yes’ whereupon the read window is extended at 226 or ‘no’ whereupon the write queue is overflowed and the write window is extended at 228. Either way, the extended read or write window goes to the start access window at 210.
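The branch of the decision flow just described, in which no queue is empty, can be sketched as follows. The function name and return labels are hypothetical, the overflow flags are assumed to be computed elsewhere, and the sketch covers only this branch. The length of an extended window is deliberately not computed here because this passage does not state it.

```python
NORMAL_WINDOW_CYCLES = 18  # access window size when no queue is overflowed


def window_when_no_queue_empty(read_overflow, write_overflow):
    """Choose the access window type for the branch where no queue is empty.

    Returns "normal" (an 18-cycle window), "extend_read", or "extend_write".
    """
    if not (read_overflow or write_overflow):
        return "normal"
    if read_overflow:
        # The 'read overflow' question is asked first, so a read overflow
        # selects the extended read window.
        return "extend_read"
    # Some queue is overflowed and it is not a read queue, so it is a write queue.
    return "extend_write"
```

Whichever label is returned, the flow then proceeds to the start of the access window.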
As can be readily seen from
Referring first to
Turning first to
A third example of a data store access analysis is shown in
By its very nature, network traffic tends to be bursty. To accommodate the peak bandwidth requirement, it is desirable to extend either the read or the write access window as shown in
If the number of read requests exceeds the predetermined threshold, while all of the non-empty write request queues are below the threshold, the arbiter will extend the access window by another eight memory cycles to accommodate the increased number of read requests. This is shown in
The net result of the present invention is that the arbiter maximizes the memory bandwidth available for the read and the write operations by avoiding consecutive accesses to the same memory bank and associated dead cycles.
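The read-window extension rule described earlier for bursty traffic can be sketched as a single predicate. The function and parameter names are hypothetical, and the sketch assumes that the number of read requests is available as one count and that write queue occupancy is available per bank. The text gives only the condition and the eight-cycle extension.

```python
EXTENSION_CYCLES = 8  # extra memory cycles granted when the read window is extended


def read_window_extension(read_requests, write_depths, threshold):
    """Return the number of extra memory cycles to add for read requests.

    read_requests: number of pending read requests (assumed single count)
    write_depths:  pending write requests per bank
    threshold:     the predetermined queue threshold
    """
    nonempty_writes = [depth for depth in write_depths if depth > 0]
    writes_below = all(depth < threshold for depth in nonempty_writes)
    if read_requests > threshold and writes_below:
        return EXTENSION_CYCLES
    return 0
```

For example, with a threshold of 4, five pending reads and write depths of `[0, 1, 2, 0]` qualify for the extension, while the same reads with any write queue at depth 4 do not.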
The present invention can be realized in hardware, software, or a combination of the two. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software could be a general purpose computer system that, when loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods.
Computer program instructions or a computer program in the present context mean any expression, in any language, code (i.e., picocode instructions) or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following occur: (a) conversion to another language, code or notation; (b) reproduction in a different material form.
While the invention has been described in combination with specific embodiments thereof, there are many alternatives, modifications, and variations that are likewise deemed to be within the scope thereof. Accordingly, the invention is intended to embrace all such alternatives, modifications and variations as fall within the spirit and scope of the appended claims.
The present application is a continuation of application Ser. No. 10/899,937, filed Jul. 27, 2004, now U.S. Pat. No. 7,277,982.
Number | Date | Country |
---|---|---|
20070294471 A1 | Dec 2007 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 10899937 | Jul 2004 | US |
Child | 11832220 | | US |