This invention relates to random access memories (RAM), and in particular to systems and methods for controlling a RAM having multiple data banks.
Data processing systems such as personal computers, digital video players, and wireless communications devices often include multiple data processing clients which share access to random access memory (RAM). Conventional RAM includes static RAM (SRAM) and dynamic RAM (DRAM). Commonly-used DRAM includes synchronous DRAM (SDRAM) such as double-data-rate SDRAM (DDR-SDRAM). Technological improvements have led to great increases in computing speed and RAM capacity. Furthermore, typical systems include increasing numbers of clients sharing access to RAM. In this context, efficient usage of available RAM bandwidth is becoming increasingly important.
Some RAM units, such as typical SDRAM units, can include a plurality of memory banks. Each memory bank comprises an array of memory locations organized in pages (rows). Each memory location within the RAM is characterized by a row (page) address and a column address. Reading or writing data at a given location within the RAM typically requires a number of pre-read/write memory operations such as activate and precharge. Activate commands open pages within a bank, while precharge commands close the open page of a bank. Such operations can involve a latency overhead of a few to tens of clock cycles per read/write transaction. Typically, only one page at a time can be open within any given bank. Consequently, if one or more clients require consecutive access to different pages within the same bank, a significant number of clock cycles can be wasted as the first page is closed and the second page is opened. The latency overhead associated with pre-read/write commands can substantially constrain the utilization of available RAM bandwidth.
The present invention provides an improved memory controller for a multi-bank random access memory (RAM). In the preferred embodiment, the memory controller includes a transaction slicer for slicing complex client transactions into simple slices, wherein each slice fits within a single page of a memory bank, and a command scheduler for issuing preparatory memory commands (such as activate and precharge) in an order different from that of the client transactions corresponding to the commands. The slicing and out-of-order command scheduling allow a reduction in memory latency. The data transfer to and from clients can be kept in order.
The foregoing aspects and advantages of the present invention will become better understood upon reading the following detailed description and upon reference to the drawings where:
In the following description, a pipestage is understood to be a circuit which includes a finite state machine (FSM). A core is understood to be a circuit including plural interconnected pipestages. A set of elements is understood to contain one or more elements. Any reference to an element is understood to encompass at least one element. Any described connection can be a direct connection or an indirect connection through intermediary structures/logic. A complex memory request or command is understood to mean a memory request or command corresponding to more than one memory page within one bank. A simple memory request or command is understood to mean a memory request or command that corresponds to memory addresses within a single page within a bank. The statement that a first request is derived from a second request is understood to mean either that the first request is equal to the second request, or that the first request is generated by processing the second request and (optionally) other data.
The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.
Each of client arbiter 36, VPAT unit 38, and PMC 40 is preferably a data-driven core capable of communication with other cores according to a ready/request handshake protocol. In the preferred implementation, a token is transferred from a first core to a second core if and only if the first core asserts a ready signal to the second core, and the second core asserts a request signal to the first core on the same clock cycle (synchronously). For further information on the presently preferred ready/request protocol and core architecture see U.S. Pat. No. 6,145,073, herein incorporated by reference. Generally, client arbiter 36, VPAT unit 38 and PMC 40 could be interconnected using other desired protocols/connections.
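The transfer condition of the ready/request handshake can be illustrated with a short behavioral sketch. The function name `transfer_cycles` and the per-cycle boolean lists are illustrative assumptions and do not appear in the description or in the incorporated reference; the sketch models only the rule that a token moves on a clock cycle where both signals are asserted.

```python
from typing import List


def transfer_cycles(ready: List[bool], request: List[bool]) -> List[int]:
    """Return the clock cycles on which a token moves from the first core
    to the second core.

    ready[c]   -- first core asserts ready on cycle c (hypothetical trace)
    request[c] -- second core asserts request on cycle c (hypothetical trace)
    A transfer occurs if and only if both are asserted on the same cycle.
    """
    return [cycle
            for cycle, (rdy, req) in enumerate(zip(ready, request))
            if rdy and req]


# The first core offers a token on cycles 1-3; the second core can accept
# on cycles 0, 2 and 3. Only cycles where both signals coincide transfer.
ready = [False, True, True, True, False]
request = [True, False, True, True, False]
print(transfer_cycles(ready, request))
```

In this trace no token moves on cycle 0 (request without ready) or on cycle 1 (ready without request).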
Client arbiter 36 receives memory access (read/write) requests from plural clients, and arbitrates the requests. Client arbiter 36 may include appropriate buffers which can be allocated to different clients. Client arbiter 36 allows only one request at a time to proceed to VPAT unit 38, and attaches a client identification (ID) label to each request allowed to proceed. The client selection decision made by client arbiter 36 can be made according to a predetermined protocol, for example using a round-robin priority scheme. The client requests received by client arbiter 36 can have one of a plurality of virtual (client, logical) address formats, each corresponding to a mode of operation of memory controller 26. For example, modes such as linear, frame, field, and array can be useful for accessing stored video images. In linear mode, the client request includes a virtual (client) start address, a transaction length, and a transaction type (read/write). In frame mode, the client request can include start X and Y image coordinates, ΔX and ΔY image extents, and a transaction type. The field mode is similar to the frame mode, except that only every other line in the image is accessed. In array mode, the client request includes a virtual start address, a transaction length, a stride, and a period number N. Such a request corresponds to accessing a run of addresses of the transaction length beginning at the start address, then skipping a number of addresses equal to the stride minus the transaction length to arrive at a new start address, and then repeating the above process N times.
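The array-mode access pattern can be sketched as an expansion into (start address, length) runs. The function name `expand_array_request` is hypothetical, and the sketch assumes that the period number N is the total number of runs accessed; the description leaves open whether N counts the first run.

```python
from typing import List, Tuple


def expand_array_request(start: int, length: int, stride: int,
                         periods: int) -> List[Tuple[int, int]]:
    """Expand an array-mode request into (start_address, length) runs.

    Assumption: 'periods' (N) is the total number of runs. Consecutive
    runs begin 'stride' addresses apart, so (stride - length) addresses
    are skipped between the end of one run and the start of the next.
    """
    if length <= 0 or stride < length or periods < 0:
        raise ValueError("expected length > 0, stride >= length, periods >= 0")
    return [(start + i * stride, length) for i in range(periods)]


# Three runs of 16 addresses each, with starts 64 addresses apart.
runs = expand_array_request(start=0x100, length=16, stride=64, periods=3)
print(runs)
```

Each resulting run has the same shape as a linear-mode request (start address and transaction length), which is convenient for later processing stages.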
VPAT unit 38 receives one client request at a time from client arbiter 36, breaks each block request (e.g. frame, field or array mode request) into a set of linear requests, and translates any virtual addresses from the client requests into physical addresses to be sent to PMC 40. A physical address can be equal to, for example, the sum of a function of the received virtual address (e.g. f(X,Y), where X and Y are start image coordinates) and a base address for the corresponding client, wherein each client has a different base address. VPAT unit 38 sends the resulting linear requests to PMC 40.
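As a minimal sketch of this step, a frame-mode request can be broken into one linear request per image line, with each physical address formed as f(X,Y) plus the client base address. The choice f(X,Y) = Y·pitch + X (a plain raster layout with one address unit per pixel), the `pitch` parameter, and the function name `frame_to_linear` are assumptions made for illustration; the description does not fix a particular function f.

```python
from typing import List, Tuple


def frame_to_linear(x: int, y: int, dx: int, dy: int,
                    pitch: int, base: int) -> List[Tuple[int, int]]:
    """Break a frame-mode request into linear (physical_address, length) requests.

    x, y   -- start image coordinates
    dx, dy -- image extents (width of each line, number of lines)
    pitch  -- assumed number of addresses per full image line
    base   -- base address of the requesting client
    Assumption: f(X, Y) = Y * pitch + X.
    """
    return [(base + (y + row) * pitch + x, dx) for row in range(dy)]


# An 8-wide, 3-line window starting at (4, 2) in an image with 64
# addresses per line, for a client whose base address is 0x1000.
linear_requests = frame_to_linear(x=4, y=2, dx=8, dy=3, pitch=64, base=0x1000)
print(linear_requests)
```

Under the same assumptions, a field-mode request could be handled by stepping two lines at a time instead of one.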
Transaction slicer 52 is connected to command scheduler 54 over a plurality of dedicated parallel slice connections 66. Each slice connection 66 is dedicated to carrying a predetermined transaction slice from slicer 52 to command scheduler 54, as explained in further detail below. Command scheduler 54 is connected to command arbiter 64 through a plurality of dedicated parallel connections 68, each corresponding to one BFSM 60 or RFSM 58. Each connection 68 connects command scheduler 54 to a corresponding BFSM 60 or RFSM 58, and on to command arbiter 64.
Transaction slicer 52 can receive complex memory transactions from VPAT unit 38, and slice each complex transaction into a plurality of tile- or page-optimized slices. Each slice contains a slice address, a transaction (slice) length, and a transaction type. Transaction slicer 52 determines an appropriate optimized slicing of each received complex transaction, and slices the transaction accordingly. Each slice is chosen such that the entire slice fits within a memory page or tile, wherein each page within a memory bank contains an integer (2^n, n ≥ 0) number of tiles.
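The page-fitting property of the slices can be illustrated with a short sketch that cuts a linear transaction at page boundaries. The names `Slice` and `slice_transaction`, and the assumption that pages occupy consecutive, aligned ranges of `page_size` addresses, are illustrative only; tile-optimized slicing is not modeled.

```python
from typing import List, NamedTuple


class Slice(NamedTuple):
    address: int   # slice address
    length: int    # transaction (slice) length
    kind: str      # transaction type: "read" or "write"


def slice_transaction(address: int, length: int, kind: str,
                      page_size: int) -> List[Slice]:
    """Cut a linear transaction so that every slice fits in a single page.

    Assumption: page k covers addresses [k * page_size, (k + 1) * page_size).
    """
    slices = []
    remaining = length
    while remaining > 0:
        room = page_size - (address % page_size)  # addresses left in this page
        chunk = min(room, remaining)
        slices.append(Slice(address, chunk, kind))
        address += chunk
        remaining -= chunk
    return slices


# A 100-address read starting 24 addresses before a 1024-address page boundary.
result = slice_transaction(address=1000, length=100, kind="read", page_size=1024)
print(result)
```

The first slice ends exactly at the page boundary, so no slice requires more than one open page.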
Tiling can be better understood by considering two potential correspondences between logical address and physical memory location for an exemplary digital video image 80 illustrated in
In a tiled memory arrangement, illustrated in
Consider a client memory request for accessing an arbitrary subimage 90 contained within image 80. In the linear, non-tiled configuration shown in
Referring to
Command scheduler 54 can send pre-read/write (Activate and Precharge) commands for a slice before the corresponding pre-read/write commands for a previously received slice, if such command re-ordering allows a reduction in SDRAM latency. Preferably, in order to eliminate the need for extra buffering, command scheduler 54 ensures that all read and write commands sent out are in order, notwithstanding any re-ordering of pre-read/write commands. Requiring in-order read and write commands ensures that the client receives or stores requested data in order. Command scheduler 54 sends the SDRAM control commands to each corresponding BFSM 60. A priority is assigned to each command, in order to arbitrate for SDRAM access and to preserve the Read and Write order of execution. Only the instruction with the highest priority can execute a Read or a Write command, as described below.
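The combination of out-of-order preparatory commands and in-order reads can be sketched with a small cycle-based model. Everything in it is an assumption made for illustration: the timing constants `T_RP` and `T_RCD`, the one-command-per-cycle bus, the initially precharged banks, and the rule that a younger slice may prepare a bank only when no older pending slice targets that bank. It is not the scheduling policy of the described embodiment.

```python
from typing import Dict, List, Optional, Tuple

T_RP = 2    # assumed precharge-to-activate delay, in cycles
T_RCD = 2   # assumed activate-to-read delay, in cycles


def schedule(slices: List[Tuple[int, int]]) -> List[Tuple[int, str, int]]:
    """slices: (bank, row) pairs in client order.
    Returns (cycle, command, slice_index) entries, with command being
    "P" (precharge), "A" (activate) or "R" (read)."""
    open_row: Dict[int, Optional[int]] = {}  # bank -> open row; absent/None = precharged
    busy_until: Dict[int, int] = {}          # bank -> first cycle it accepts a command
    log: List[Tuple[int, str, int]] = []
    next_read = 0                            # oldest slice whose read is pending
    cycle = 0
    while next_read < len(slices):
        claimed = set()                      # banks needed by an older pending slice
        for i in range(next_read, len(slices)):
            bank, row = slices[i]
            if bank in claimed:
                continue                     # never disturb a bank an older slice needs
            claimed.add(bank)
            if busy_until.get(bank, 0) > cycle:
                continue                     # bank still completing P or A
            state = open_row.get(bank)
            if state == row:
                if i == next_read:           # reads issue strictly in order
                    log.append((cycle, "R", i))
                    next_read += 1
                    break
                continue                     # page is open; wait for this slice's turn
            if state is None:
                open_row[bank] = row
                busy_until[bank] = cycle + T_RCD
                log.append((cycle, "A", i))
            else:
                open_row[bank] = None
                busy_until[bank] = cycle + T_RP
                log.append((cycle, "P", i))
            break                            # at most one command per cycle
        cycle += 1
    return log


# Slices 0 and 1 hit different rows of bank 0; slice 2 targets bank 1.
log = schedule([(0, 5), (0, 7), (1, 3)])
for entry in log:
    print(entry)
```

In this run the activate command for slice 2 is issued before the precharge and activate commands for slice 1, because bank 1 is idle while bank 0 is busy, yet the read commands still issue in the order 0, 1, 2.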
Refresh control FSM (RFSM) 58 periodically generates memory refresh commands for refreshing the SDRAM banks. BFSMs 60 receive memory access commands from command scheduler 54, transmit the commands to command arbiter 64, and send SDRAM status information to command scheduler 54. Command arbiter 64 allows only one of FSMs 58, 60 to send commands to the SDRAM at any given time, according to a priority scheme which takes into account the refresh requirements of the SDRAM and the command priorities assigned by the command scheduler to different slice commands. For example, RFSM 58 can be set to have a higher priority than BFSMs 60 if needed, in order to ensure that RFSM 58 is able to transmit sufficient refresh commands to prevent loss of data from the SDRAM.
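One possible priority scheme of this kind can be sketched as follows. The `Candidate` record, the `refresh_urgent` flag, the source labels, and the convention that a larger priority value wins are all hypothetical; the description states only that the arbiter weighs refresh requirements and scheduler-assigned priorities, and that a single FSM is granted at a time.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    source: str     # "RFSM" or a bank FSM label such as "BFSM0" (illustrative)
    command: str    # e.g. "REFRESH", "ACTIVATE", "READ"
    priority: int   # assumed: larger value = higher scheduler-assigned priority


def arbitrate(candidates: List[Candidate],
              refresh_urgent: bool) -> Optional[Candidate]:
    """Grant at most one FSM access to the SDRAM for the current cycle."""
    refresh = [c for c in candidates if c.source == "RFSM"]
    bank = [c for c in candidates if c.source != "RFSM"]
    # Refresh wins when it is urgent, or when no bank FSM is competing.
    if refresh and (refresh_urgent or not bank):
        return refresh[0]
    if bank:
        return max(bank, key=lambda c: c.priority)
    return None


pending = [
    Candidate("BFSM0", "READ", 3),
    Candidate("BFSM1", "ACTIVATE", 1),
    Candidate("RFSM", "REFRESH", 0),
]
print(arbitrate(pending, refresh_urgent=False))
print(arbitrate(pending, refresh_urgent=True))
```

Raising `refresh_urgent` corresponds to giving RFSM 58 a higher priority than BFSMs 60 when refresh deadlines would otherwise be at risk.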
Slicer 52 (shown in
In slice command sequences 206, slices S0-3 correspond to read request (token) T0, slice S4 corresponds to token T1, slice S5 to token T2, and the last slice S′0 corresponds to token T3. Command sequence 206 illustrates out-of-order precharge and activation controlled by command scheduler 54: a precharge (P) command for slice S5 occurs on clock cycle #10, two clock cycles before a precharge command for slice S4; and an activate (A) command for slice S5 occurs on clock cycle #13, two clock cycles before an activate command for slice S4. As shown in
As shown in the data sequence 224 of
Stall cycles such as the one described above can be eliminated by allowing the reordering of the read and write commands, in addition to the reordering of activate and precharge commands.
It will be clear to one skilled in the art that the above embodiments may be altered in many ways without departing from the scope of the invention. For example, while the preceding discussion has used video images to illustrate the present invention, the above-described systems and methods are applicable to a variety of applications other than digital video processing, such as communications or general purpose computing. The above-described functionality can be implemented in many different ways. For example, interface protocols other than the presently preferred ready/request handshake protocol may be used. Different component modules can be combined into a single module. Suitable multi-bank RAMs include conventional DRAM, SDRAM, DDR-SDRAM, and other types of random access memory. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents.
This application claims the priority date of U.S. Provisional Application No. 60/311,735, filed Aug. 9, 2001, entitled “Superscalar Memory Controller with Out of Order Execution,” which is herein incorporated by reference.
Number | Name | Date | Kind |
---|---|---|---|
5564052 | Nguyen et al. | Oct 1996 | A |
5809563 | Yamada et al. | Sep 1998 | A |
5941983 | Gupta et al. | Aug 1999 | A |
6088772 | Harriman et al. | Jul 2000 | A |
6145073 | Cismas | Nov 2000 | A |
6219747 | Banks et al. | Apr 2001 | B1 |
6263430 | Trimberger et al. | Jul 2001 | B1 |
6297832 | Mizuyabu et al. | Oct 2001 | B1 |
6335950 | Kohn | Jan 2002 | B1 |
6456746 | Freeman | Sep 2002 | B1 |
6470433 | Prouty et al. | Oct 2002 | B1 |
6487640 | Lipasti | Nov 2002 | B1 |
6510497 | Strongin et al. | Jan 2003 | B1 |
6564304 | Van Hook et al. | May 2003 | B1 |
6615326 | Lin | Sep 2003 | B1 |
20010055427 | Freeman | Dec 2001 | A1 |
20020199072 | Fanning | Dec 2002 | A1 |
Number | Date | Country |
---|---|---|
20030033493 A1 | Feb 2003 | US |

Number | Date | Country |
---|---|---|
60311735 | Aug 2001 | US |