The inventive subject matter relates generally to dynamic random access memory (DRAM) and, more particularly, to apparatus to provide high-speed random read access, and to methods related thereto.
High-speed networks increasingly link computer-based nodes throughout the world. Such networks, Ethernet networks among them, may employ switches and routers to route data. It is desirable that network switches and routers operate at high speeds and that they also be competitively priced.
High-speed switches and routers may employ data structures, such as lookup tables (also referred to herein as “address tables”), to store and retrieve source addresses and destination addresses of data being moved through a network. The source and destination addresses may relate to data packets being sent from a network source to one or more network destinations. High-speed switches and routers need to perform frequent lookups on address tables. The lookup operations are read-intensive and must generally be performed at very high speeds.
In addition, the addresses may be random in nature, so that they may be mapped to any arbitrary location in memory. Further, relatively large address table sizes are needed for high-capacity switches.
Current high-speed switches and routers store address tables either on-chip or in off-chip memories. The off-chip memories can be static random access memories (“SRAMs”) or dynamic random access memories (“DRAMs”).
SRAMs provide random access at very high speeds. However, SRAMs are higher in cost than DRAMs. SRAM-based memory systems also typically suffer from lower memory density and higher power dissipation than DRAM-based memory systems.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a significant need in the art for apparatus, systems, and methods that provide high-speed random access reads and that are relatively low cost, relatively dense, and relatively power-efficient.
In the following detailed description of embodiments of the inventive subject matter, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific preferred embodiments in which the inventive subject matter may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the inventive subject matter, and it is to be understood that other embodiments may be utilized and that structural, mechanical, compositional, electrical, logical, and procedural changes may be made without departing from the spirit and scope of the inventive subject matter. Such embodiments of the inventive subject matter may be referred to, individually and/or collectively, herein by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the inventive subject matter is defined only by the appended claims.
Known SRAM-based switches and routers allow up to 100% read utilization of the input/output interface channels between the on-chip memory controller and the SRAM system. However, known DRAM-based designs cannot achieve 100% read utilization, due to the precharge and activation operations required by the DRAM banks.
The inventive subject matter provides for one or more methods to enable SRAM-like read access speeds on DRAMs for read-intensive memory applications. Embodiments of the inventive subject matter pertain to DRAM memory that is located on a separate chip from the memory controller.
Embodiments of the inventive subject matter offer the cost, density, and power advantages of DRAM together with SRAM-like read performance. In embodiments, higher read performance is traded off against lower write access speed.
The inventive subject matter enables embodiments to achieve 100% utilization of channels during read access. This may reduce the total channel requirement and the total system cost.
Various embodiments of apparatus (including circuits, computer systems, and network systems) and associated methods of accessing memory will now be described.
ASIC 102 comprises a memory read/write controller 104 (also referred to herein simply as a “memory controller”) to control memory read and write operations in DRAMs 111-112. Read/write controller 104 controls one or more I/O (input/output) channels 107-109. A “channel” is defined herein to mean a group of address, control, and data busses coupled between a memory controller and a group of one or more DRAMs being controlled by the memory controller. For example, regarding the embodiment shown in FIG. 1, I/O channel 107 includes an off-chip address/control bus 110 coupled between read/write controller 104 and DRAMs 111-112.
In addition, first and second off-chip data busses 114 and 116, respectively, are coupled between read/write controller 104 and DRAMs 111-112, respectively, through I/O channel 107. In an embodiment, each data bus 114 and 116 is 24 bits wide. Each data bus 114, 116 may also include additional bits (e.g. 4 bits in an embodiment) for error detection and correction.
In an embodiment, ASIC 102 controls three independent channels 107-109, and each channel 107-109 is coupled to a separate group of two DRAM instances (e.g. DRAMs 111-112). For simplicity of illustration, the groups of DRAM instances that would be coupled to I/O channels 108 and 109 are not shown in FIG. 1.
Still with reference to ASIC 102, read/write controller 104 may also be coupled to one or more other circuits 106, such as suitable read/write sequencing logic and address mapping/remapping logic, which may be located either on or off ASIC 102.
“Suitable”, as used herein, means having characteristics that are sufficient to produce the desired result(s). Suitability for the intended purpose can be determined by one of ordinary skill in the art using only routine experimentation.
Different architecture could be employed for the DRAM system 100 in other embodiments. For example, more or fewer than three channels controlling three groups of DRAM pairs could be used. Also, more or fewer than two DRAM instances per group could be used. Also, more or fewer functional units could be implemented on ASIC 102. Also, multiple ASICs, integrated circuits, or other logic elements could be employed in place of or in conjunction with ASIC 102.
In the following description, the term “instance” refers to an architectural or organizational unit of DRAM. In an embodiment, each instance is implemented with a single integrated circuit device or chip. For example, DRAM 111 and DRAM 112 may be referred to herein as Instance #1 and Instance #2, respectively.
In the embodiment illustrated in FIG. 1, each DRAM 111-112 comprises a plurality of memory banks, for example four banks per instance.
Each DRAM bank comprises at least one address bus, whose width depends upon the size of the memory. For example, a one-megabyte memory would typically have a 20-bit address bus.
Each DRAM bank also comprises at least one data bus, whose width depends upon the particular size of words stored therein. For example, if 32 bits are stored per memory location, a 32-bit data bus may be used. Alternatively, an 8-bit data bus could be used if a 4-cycle read/write access is performed.
In an embodiment, more than one instance can share the same address/control bus 110, as shown in FIG. 1.
Further, in an embodiment, each instance may comprise its own data bus 114 or 116, as shown in FIG. 1.
In an embodiment, DRAM Instance #1 and #2 may each contain several banks with access times of several cycles. For example, a typical DDR (double data rate) DRAM device operating at 250 MHz (megahertz) needs sixteen cycles for a read/write access of a bank.
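To summarize this organization in concrete form, the following illustrative C listing models one channel of the embodiment described above: two DRAM instances sharing a single address/control bus, each instance having four banks and its own 24-bit data bus (plus error-correction bits). The structure and all identifiers are assumptions made for exposition and are not taken from any particular DRAM device.

#include <stdio.h>

#define INSTANCES_PER_CHANNEL  2   /* e.g. DRAM 111 and DRAM 112              */
#define BANKS_PER_INSTANCE     4   /* four banks per DRAM instance            */
#define DATA_BUS_BITS         24   /* per-instance data bus width             */
#define ECC_BITS               4   /* optional error detection/correction     */
#define ACCESS_CYCLES         16   /* cycles per bank read/write at 250 MHz   */

struct dram_instance {
    int num_banks;            /* banks reached through the shared address bus  */
    int data_bus_bits;        /* width of this instance's private data bus     */
};

struct io_channel {
    int shared_addr_ctrl_bus; /* one address/control bus serves both instances */
    struct dram_instance instance[INSTANCES_PER_CHANNEL];
};

int main(void)
{
    struct io_channel channel = {
        .shared_addr_ctrl_bus = 1,
        .instance = {
            { BANKS_PER_INSTANCE, DATA_BUS_BITS + ECC_BITS },
            { BANKS_PER_INSTANCE, DATA_BUS_BITS + ECC_BITS },
        },
    };
    printf("%d instances per channel, %d banks each, %d-bit data busses, "
           "%d cycles per bank access\n",
           INSTANCES_PER_CHANNEL, channel.instance[0].num_banks,
           channel.instance[0].data_bus_bits, ACCESS_CYCLES);
    return 0;
}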
Known commercially available DRAMs typically operate in accordance with various constraints. For example, each bank has mandatory “overhead” operations that must be performed.
Such mandatory operations typically include bank/row activation (also known as “opening” the row). Before any READ or WRITE commands can be issued to a bank within a DDR DRAM, a row in that bank must be “opened” with an “active” or ACTIVATE command. The address bits registered coincident with the ACTIVATE command may be used to select the bank and row to be accessed.
Following the ACTIVATE command (and possibly one or more intentional NOP's (no operation)), a READ or WRITE command may be issued. The address bits registered coincident with the READ or WRITE command may be used to select the bank and starting column location for a burst access. A subsequent ACTIVATE command to a different row in the same bank can only be issued after the previous active row has been “closed” (precharged). Moreover, there is a mandatory wait period between accessing different banks of the same instance. However, a subsequent ACTIVATE command to a second bank in a second instance can be issued while the first bank in the first instance is being accessed.
The mandatory operations also typically include a “closing” operation, which may include precharging. Precharge may be performed in response to a specific precharge command, or it may be automatically initiated to ensure that precharge is initiated at the earliest valid stage within a burst access. For example, an auto precharge operation may be enabled to provide an automatic self-timed row precharge that is initiated at the end of a burst access. A bank undergoing precharge cannot be accessed until after expiration of a specified wait time.
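The open/access/close flow just described can be summarized by the following illustrative C fragment, which simply enumerates the generic DDR command sequence for one bank access. The names are assumptions for exposition, not a register-level controller implementation, and real devices additionally impose specific timing parameters between the steps.

#include <stdio.h>

/* Generic DDR command flow for a single bank access, per the description above. */
enum ddr_cmd { ACTIVATE, READ, WRITE, NOP, PRECHARGE };

static const enum ddr_cmd read_one_bank[] = {
    ACTIVATE,   /* "open" the addressed row in the bank                      */
    NOP,        /* wait out the activation time before a READ is legal       */
    READ,       /* burst read; may carry an auto-precharge flag              */
    NOP,        /* wait for the burst (and any precharge) to complete        */
    PRECHARGE,  /* "close" the row explicitly if auto precharge is not used  */
};

int main(void)
{
    static const char *names[] = { "ACTIVATE", "READ", "WRITE", "NOP", "PRECHARGE" };
    for (unsigned i = 0; i < sizeof read_one_bank / sizeof read_one_bank[0]; i++)
        printf("step %u: %s\n", i, names[read_one_bank[i]]);
    return 0;
}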
For known DDR DRAM systems, these mandatory operations, including “opening” and “closing” operations, represent significant overhead on any access, and they reduce the throughput and lower the overall bandwidth. The inventive subject matter provides a solution to the problem of enabling SRAM-like access speeds on DRAMs, as will now be discussed.
The inventive subject matter provides a technique to optimize read accesses in a DDR DRAM system by duplicating the data in several DRAM banks. It will be understood by those of ordinary skill in the art that, due to the data duplications, the write access efficiency will be reduced somewhat. However, because most memory accesses are read operations, overall efficiency is high.
Before discussing the operation of DRAM system 100 (FIG. 1), the manner in which data is stored in DRAMs 111-112 will first be described.
The data (e.g. address lookup tables) is duplicated in all of the eight banks of the first group of DRAMs (i.e. DRAMs 111-112). In an embodiment, a duplicator agent may be used to duplicate the data in all of the eight banks. One of ordinary skill in the art will be capable of implementing a suitable duplicator agent. The banks of more than one DRAM instance (i.e. both Instance #1 and Instance #2) may be written to concurrently, in an embodiment, depending upon the constraints of the particular DRAM devices/system.
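One way such a duplicator agent could operate is sketched below in C, using a small in-memory model of the eight banks of a group. The primitive dram_bank_write() and all other names are hypothetical; the essential point is that a write is declared complete only when every bank of the group holds the new data.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define NUM_INSTANCES       2
#define BANKS_PER_INSTANCE  4
#define BANK_SIZE_BYTES  1024   /* tiny model bank, for illustration only */

/* In-memory model of the eight banks of one group. */
static uint8_t bank_mem[NUM_INSTANCES][BANKS_PER_INSTANCE][BANK_SIZE_BYTES];

/* Hypothetical low-level primitive: write one entry into one bank. */
static bool dram_bank_write(int inst, int bank, uint32_t addr,
                            const void *data, size_t len)
{
    if (addr + len > BANK_SIZE_BYTES)
        return false;
    memcpy(&bank_mem[inst][bank][addr], data, len);
    return true;
}

/* Duplicator agent: the write is declared complete only after the same
 * data has been written into every bank of the group. */
static bool duplicated_write(uint32_t addr, const void *data, size_t len)
{
    for (int inst = 0; inst < NUM_INSTANCES; inst++)
        for (int bank = 0; bank < BANKS_PER_INSTANCE; bank++)
            if (!dram_bank_write(inst, bank, addr, data, len))
                return false;
    return true;
}

int main(void)
{
    uint32_t entry = 0xC0FFEEu;
    if (duplicated_write(0, &entry, sizeof entry))
        printf("entry duplicated into all %d banks\n",
               NUM_INSTANCES * BANKS_PER_INSTANCE);
    return 0;
}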
As mentioned earlier, a particular command sequence typically controls the operation of DDR DRAM devices. This command sequence may comprise (1) an ACTIVATE or “open bank” command; (2) a “read-write access” command, which may involve read and/or write operations on one or more organization units (e.g. pages) of the DRAM device, and which may consume a significant amount of time; and (3) a “closing” or “precharge” command, which may involve a precharge operation. These commands and operations are mentioned in the description below of the Timing Diagram.
To achieve maximum read access throughput, the individual banks of a group may be opened, accessed, and closed in a sequential manner, as illustrated in the Timing Diagram provided below.
The first row of the Timing Diagram represents sequential clock cycles within DRAM system 100. The remaining rows represent the commands issued to, and the operations performed by, each of the four banks of Instance #1 and Instance #2. The notations used in the Timing Diagram correspond to the ACTIVATE, READ, NOP, and precharge commands and operations discussed below.
The operation of an embodiment of the DRAM system will now be explained with reference to the above Timing Diagram.
As mentioned earlier, the DRAMs 111 and 112 operating at 250 MHz need sixteen cycles for a read/write access of a bank. This may be seen in the Timing Diagram wherein, for example, sixteen cycles occur between successive ACTIVATE commands to any given bank.
At time slot or cycle 0, the memory controller (e.g. read/write controller 104) issues an ACTIVATE command to the first bank of Instance #1, and that bank undergoes an activation operation during time slots 1-4.
At time slot 5, the memory controller issues a READ command to the first bank of Instance #1, and it undergoes a burst read operation during time slots 6-8.
At time slot 9, an intentional NOP is inserted.
At time slot 10, the first bank of Instance #1 executes an AUTO PRECHARGE command, and it undergoes a closing operation during time slots 11-14.
At time slot 15, an intentional NOP is inserted. The purpose of this intentional NOP is to properly align the timing of commands, so that two commands do not conflict with one another on the shared address/control bus.
At time slot 16 the memory controller issues an ACTIVATE command to the first bank of Instance #1, and it undergoes an ACTIVATE operation during time slots 17-20. By the conclusion of time slot 20, the first bank of Instance #1 will have been closed (precharged) and re-activated, and it will be ready for another read access in time slot 21. The operation of the first bank of Instance #1 continues in a similar fashion.
The operation of the second, third, and fourth banks of Instance #1, and of the first through fourth banks of Instance #2 may similarly be understood from the Timing Diagram.
It will be observed from the Timing Diagram that during any given time slot, overlapping read accesses may occur. For example, during time slots 7-8, read access operations are occurring concurrently for the first bank of Instance #1 and the first bank of Instance #2. During time slots 9-10, read access operations are occurring concurrently for the second bank of Instance #1 and the first bank of Instance #2. During time slots 11-12, read accesses are occurring concurrently for the second bank of Instance #1 and the second bank of Instance #2.
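The interleaving visible in the Timing Diagram can also be reproduced by a short C program that prints the command occupying the shared address/control bus in each cycle. The two-cycle stagger between banks, the five-cycle spacing from ACTIVATE to READ, and the assumption that precharge is initiated automatically by the READ command (and therefore needs no slot on the shared bus) are reconstructions from the description above, not an exact copy of the diagram.

#include <stdio.h>

#define PERIOD      16   /* cycles between successive ACTIVATEs to one bank */
#define NUM_BANKS    8   /* four banks in each of two instances             */
#define READ_DELAY   5   /* READ issued five cycles after the ACTIVATE      */

int main(void)
{
    for (int cycle = 0; cycle < 2 * PERIOD; cycle++) {
        int issued = 0;
        for (int b = 0; b < NUM_BANKS; b++) {
            int instance = b % 2;     /* banks alternate between Instance #1 and #2 */
            int bank     = b / 2;     /* bank index within its instance             */
            int offset   = 2 * b;     /* each bank starts two cycles after the last */
            if (cycle >= offset && (cycle - offset) % PERIOD == 0) {
                printf("cycle %2d: ACTIVATE Instance #%d, bank %d\n",
                       cycle, instance + 1, bank + 1);
                issued = 1;
            } else if (cycle >= offset + READ_DELAY &&
                       (cycle - offset - READ_DELAY) % PERIOD == 0) {
                printf("cycle %2d: READ     Instance #%d, bank %d\n",
                       cycle, instance + 1, bank + 1);
                issued = 1;
            }
        }
        if (!issued)
            printf("cycle %2d: (no command; initial setup)\n", cycle);
    }
    return 0;
}

Run over a longer window, the listing shows that after the initial setup period every cycle of the shared address/control bus carries either an ACTIVATE or a READ for some bank, and that READ commands alternate between the two instances every two cycles, which keeps both data busses continuously occupied.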
A read request from the memory controller over I/O channel 107 can be serviced by any bank in the group of DRAMs 111-112, so every read access issued over I/O channel 107 has at least one bank available to read from. The redundant data in all of the banks of the group allows true random access for read operations. Moreover, the access time becomes fixed irrespective of the overhead state (“opening” or “closing”) of any individual bank, because this arrangement ensures that at least one bank in the group is available for reading at any time.
A side effect of this arrangement is lower write efficiency, as a write operation needs to be performed on all of the banks of a group before such write operation is declared to be complete. In an embodiment of the inventive subject matter, memory reads typically consume approximately 90% of the time, and memory writes consume approximately 10% of the time. A write operation may be required, for example, when data (e.g. address lookup tables) are updated, e.g. when a new address is learned or when one or more addresses are “aged out” by a suitable aging mechanism.
Duplication of the data across multiple DDR DRAM banks reduces the effective memory density. However, because DRAM density is typically more than four times that of SRAM, the overall cost is still lower than that of an SRAM-based system. In this arrangement, the duplication factor depends upon various factors, including the organization of a single bank and the device bit configuration.
In general, for burst accesses, DRAM banks consume fewer cycles on the address/control bus than on their associated data bus. In other words, relatively few command cycles on the address/control bus are needed to generate a relatively greater number of data cycles. For example, in an embodiment, a DDR DRAM needs two command cycles on the address/control bus to generate four data cycles. The inventive subject matter makes use of this fact to increase the memory capacity served by each channel: the two otherwise unused cycles on the address/control bus are used to command a second device, which has its own separate data bus. This reduces the pin count of each channel. It is desirable for the address/control bus and the data busses to be utilized 100% of the time and never to be idle. By combining these techniques, the inventive subject matter provides SRAM-like read performance. The read sequence of an embodiment, as illustrated in the Timing Diagram, ensures that after an initial setup period of a few cycles, the data busses of each channel are always occupied.
In an embodiment represented by the above Timing Diagram, the overall DRAM system 100 provides an aggregate read rate of 375 MHz. Each bank, operating at 250 MHz, completes one read access every sixteen cycles (15.625 MHz per bank), so the four interleaved banks of an instance support a read rate of 62.5 MHz, each channel 107-109 with its two instances operates at 125 MHz, and the three channels together provide 375 MHz.
Because address/control bus 110 is shared in common between the two instances, and because four-word burst READ commands are issued to each bank of each Instance #1 and #2, READ commands to the two instances can be interleaved to maintain 100% read utilization on data busses 114 and 116.
Thus, the inventive subject matter duplicates data (e.g. address lookup tables) across multiple banks of DRAM within any one group, to maximize the read access bandwidth to the data. A read access efficiency equivalent to that of commercially available SRAM devices may be achieved at a relatively lower cost. In addition, the number of banks can be expanded because of the relatively higher density of DRAM compared with SRAM.
Computer system 200 can be of any type, including an end-user or client computer; a network node such as a switch, router, hub, concentrator, gateway, portal, and the like; a server; and any other kind of computer used for any purpose. The term “data transporter”, as used herein, means any apparatus used to move data and includes equipment of the types mentioned in the foregoing sentence.
Computer system 200 comprises, for example, at least one processor 202 that can be of any suitable type. As used herein, “processor” means any type of computational circuit, such as but not limited to a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor, or any other type of processor or processing circuit.
Computer system 200 further comprises, for example, suitable user interface equipment such as a display 204, a keyboard 206, a pointing device (not illustrated), voice-recognition device (not illustrated), and/or any other appropriate user interface equipment that permits a system user to input information into and receive information from computer system 200.
Computer system 200 further comprises memory 208 that can be implemented in one or more forms, such as a main memory implemented as a random access memory (RAM), read only memory (ROM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CDs), digital video disks (DVD), floppy diskettes, magnetic tape cartridges, and other types of data storage.
Computer system 200 further comprises a network interface element 212 to couple computer system 200 to network bus 216 via network interface bus 214. Network bus 216 provides communications links among the various nodes 301-306 and/or other components of a network 300 (refer to FIG. 3).
Computer system 200 can also include other hardware elements 210, depending upon the operational requirements of computer system 200. Hardware elements 210 could include any type of hardware, such as modems, printers, loudspeakers, scanners, plotters, and so forth.
Computer system 200 further comprises a plurality of types of software programs, such as operating system (O/S) software, middleware, application software, and any other types of software as required to perform the operational requirements of computer system 200. Computer system 200 further comprises data structures 230. Data structures 230 may be stored in memory 208. Data structures 230 may be stored in DRAMs, such as DRAM 111 and DRAM 112 (refer to FIG. 1).
Exemplary data structures, which may contain extensive address lookup tables used by high-speed switches and routers or other types of data transporters, were previously discussed in detail above regarding FIG. 1.
In this example, computer network 300 comprises a plurality of nodes 301-306. Nodes 301-306 are illustrated as being coupled to form a network. The particular manner in which nodes 301-306 are coupled is not important, and they can be coupled in any desired physical or logical configuration and through any desired type of wireline or wireless interfaces.
Network 300 may be a public or private network. Network 300 may be relatively small in size, such as a two-computer network within a home, vehicle, or enterprise. As used herein, an “enterprise” means any entity organized for any purpose, such as, without limitation, a business, educational, government, military, entertainment, or religious purpose. In an embodiment, network 300 comprises an Ethernet network.
Nodes 301-306 may comprise computers of any type, including end-user or client computers; network nodes such as switches, routers, hubs, concentrators, gateways, portals, and the like; servers; and other kinds of computers and data transporters used for any purpose.
In one embodiment, nodes 301-306 can be similar or identical to computer system 200 illustrated in FIG. 2.
Referring first to
In 402, a memory address is provided for a first portion of data. The memory address may be anywhere within the address space of one of a plurality of memory banks. In an embodiment, a group of memory banks (e.g. four banks) is provided for each DRAM instance (e.g. Instance #1 and Instance #2 of FIG. 1).
First and second groups of memory banks, one group per DRAM instance, may be coupled to a common address bus, e.g. address/control bus 110 in FIG. 1, and first and second data busses (e.g. data busses 114 and 116 in FIG. 1) may be coupled to the first and second groups, respectively.
In an embodiment, the data may comprise source and destination addresses within a lookup table maintained by a high-speed switch or router in an Ethernet network. However, in other embodiments, the data may comprise any other type of data, and any type of data transporter may be used.
The data is identical within each memory bank of the plurality of memory banks. As mentioned earlier, a suitable duplicator agent may be used to write identical data in each of the memory banks.
In an embodiment, each group of memory banks forms part of a double data rate dynamic random access memory (DDR DRAM). Each memory bank of a DDR DRAM requires at least one mandatory overhead operation in order to be accessed. The mandatory overhead operation typically comprises an activation operation and/or a precharging or closing operation, as described previously herein.
Referring now to
In 406, the first read access request is serviced by any of the plurality of memory banks, e.g. a first memory bank of a first group.
In 408, a second read access request for a second portion of data may be sent over the address bus, again when the address bus is not being used to convey address information. The second read access request is serviced at least partially concurrently with the servicing of the first read access request.
The second read access request may be serviced by any of the plurality of memory banks in a second group. For example, the second read access request may be serviced by a first memory bank of a second group.
In 410, data is conveyed from the first and second read accesses concurrently on the first and second data busses.
In 412, a third read access request for a third portion of data is sent over the address bus, again when the address bus is not being used to convey address information. The third read access request is serviced at least partially concurrently with the servicing of the second read access request. The third read access request is serviced by any of the plurality of memory banks in the first group except the memory bank that serviced the first read access request, if that memory bank is still active in servicing the first read access request or if it is currently inaccessible due to mandatory overhead operations. For example, the third read access request may be serviced by a second memory bank of the first group.
In 414, data is conveyed from the second and third read accesses concurrently on the first and second data busses.
In 416, the methods end.
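The bank-selection behavior described in 406 through 414 may be illustrated by the following C sketch, in which each incoming read is dispatched to whichever bank of the group becomes free soonest; because every bank holds an identical copy of the data, any bank is an acceptable choice. The sixteen-cycle busy time and all identifiers are assumptions made for illustration.

#include <stdio.h>

#define BANKS_PER_GROUP   8   /* two instances x four banks, all holding identical data */
#define BANK_BUSY_CYCLES 16   /* open + access + close time for one bank                */

static int busy_until[BANKS_PER_GROUP];  /* cycle at which each bank becomes free */

/* Dispatch a read arriving at 'cycle' to the bank that is free soonest. */
static int dispatch_read(int cycle)
{
    int best = 0;
    for (int b = 1; b < BANKS_PER_GROUP; b++)
        if (busy_until[b] < busy_until[best])
            best = b;
    int start = cycle > busy_until[best] ? cycle : busy_until[best];
    busy_until[best] = start + BANK_BUSY_CYCLES;
    return best;
}

int main(void)
{
    for (int req = 0; req < 10; req++) {
        int bank = dispatch_read(req * 2);   /* a new random read every two cycles */
        printf("request %d -> Instance #%d, bank %d\n",
               req, bank / 4 + 1, bank % 4 + 1);
    }
    return 0;
}

With eight banks and a sixteen-cycle busy time, a read arriving every two cycles always finds a free bank, which corresponds to the fixed read latency described above.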
It should be noted that the methods described herein do not have to be executed in the order described or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in serial or parallel fashion. In addition, although an “end” block is shown, it will be understood that the methods may be performed continuously.
The methods described herein may be implemented in hardware, software, or a combination of hardware and software.
Upon reading and comprehending the content of this disclosure, one of ordinary skill in the art will understand the manner in which one or more software programs may be accessed from a computer-readable medium in a computer-based system to execute the methods described herein. One of ordinary skill in the art will further understand the various programming languages that may be employed to create one or more software programs designed to implement and perform the methods disclosed herein. The programs may be structured in an object-oriented format using an object-oriented language such as Java, Smalltalk, or C++. Alternatively, the programs can be structured in a procedure-oriented format using a procedural language, such as assembly or C. The software components may communicate using any of a number of mechanisms well-known to those skilled in the art, such as application program interfaces or inter-process communication techniques, including remote procedure calls. The teachings of various embodiments are not limited to any particular programming language or environment, including Hypertext Markup Language (HTML) and Extensible Markup Language (XML). Thus, other embodiments may be realized.
For example, the computer system 200 shown in FIG. 2 may comprise a machine-readable medium (e.g. memory 208) storing one or more such software programs that, when executed by processor 202, cause the methods described herein to be performed.
The inventive subject matter provides for one or more methods to enable SRAM-like read access speeds on DRAMs for read-intensive memory applications. A memory circuit, data transporter, and an electronic system and/or data processing system that incorporates the inventive subject matter can perform read accesses at SRAM-like speed at relatively lower cost and at relatively higher density than comparable SRAM systems, and such apparatus may therefore be more commercially attractive.
As shown herein, the inventive subject matter may be implemented in a number of different embodiments, including a memory circuit, a data transporter, and an electronic system in the form of a data processing system, and various methods of operating a memory. Other embodiments will be readily apparent to those of ordinary skill in the art after reading this disclosure. The components, elements, sizes, characteristics, features, and sequence of operations may all be varied to suit particular system requirements.
For example, different memory architectures, including different DRAM sizes, speeds, and pin-outs, may be utilized. In an embodiment, for instance, the data structures are 192 bits wide, so a DDR DRAM device with a 24-bit data bus may be used with a four-cycle burst read operation; with 24 bits transferred on each of the two clock edges per cycle, the device returns 192 bits in four cycles.
As a further embodiment, the data need not necessarily be duplicated in each bank. If data accesses are distributed among the different banks (using a hash function, for instance), the overall method will still work, provided that requests are statistically uniformly distributed among the banks and properly scheduled.
As an example of one such embodiment, assume that we have a table T that needs to be accessed on reads. As explained earlier, we may keep eight copies of T on eight different banks. Alternatively, we may distribute the entries of T across the banks with a hash function H.
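A minimal illustrative choice for H is sketched below in C; any well-mixed hash that maps a lookup key onto the eight banks would serve. The multiplicative constant and the decision to hash the table index (rather than the stored entry itself) are assumptions made for illustration and are not taken from the text above.

#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS 8

/* Illustrative hash H: mix the key, then map it onto one of eight banks. */
static unsigned bank_for_key(uint32_t key)
{
    uint32_t h = key * 2654435761u;   /* Knuth-style multiplicative mixing */
    return (h >> 24) % NUM_BANKS;
}

int main(void)
{
    for (uint32_t key = 0; key < 8; key++)
        printf("key %u -> bank %u\n", (unsigned)key, bank_for_key(key));
    return 0;
}

Whatever form H takes, the same hash argument must be used when an entry is written and when it is later read, so that the read access is steered to the bank that actually holds the entry.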
Assuming that H is an efficient hash function, it will distribute the data across the banks substantially uniformly.
When access to an entry T[i] is desired, B = H(T[i]) is calculated to determine the bank to which the read access should be sent.
We may queue requests to the different banks and use the same mechanism to perform read accesses on the memory, so that the memory operates with relatively high efficiency. If accesses are uniformly distributed across all banks, each bank receives a similar number of requests, and the full bandwidth of the memory is properly used.
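One simple realization of such queuing is sketched below in C: a small FIFO of pending read addresses is kept per bank, and each queue is serviced when that bank's slot comes up in the read schedule. The queue depth and the function names are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS    8
#define QUEUE_DEPTH 16

/* One small FIFO of pending read addresses per bank. */
struct bank_queue {
    uint32_t addr[QUEUE_DEPTH];
    int head, tail, count;
};

static struct bank_queue queues[NUM_BANKS];

/* Queue a read for the bank selected by the hash; returns -1 to back-pressure
 * the requester if that bank's queue is full. */
static int enqueue_read(unsigned bank, uint32_t addr)
{
    struct bank_queue *q = &queues[bank];
    if (q->count == QUEUE_DEPTH)
        return -1;
    q->addr[q->tail] = addr;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return 0;
}

/* When a bank's slot comes up in the read schedule, issue its oldest
 * pending read, if any; returns 1 if a read was issued. */
static int service_bank(unsigned bank, uint32_t *addr_out)
{
    struct bank_queue *q = &queues[bank];
    if (q->count == 0)
        return 0;
    *addr_out = q->addr[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return 1;
}

int main(void)
{
    uint32_t a;
    enqueue_read(3, 0x100);
    enqueue_read(3, 0x104);
    for (unsigned b = 0; b < NUM_BANKS; b++)
        while (service_bank(b, &a))
            printf("bank %u: read 0x%x\n", b, (unsigned)a);
    return 0;
}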
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement that is calculated to achieve the same purpose may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the inventive subject matter. Therefore, it is manifestly intended that embodiments of the inventive subject matter be limited only by the claims and the equivalents thereof.
It is emphasized that the Abstract is provided to comply with 37 C.F.R. §1.72(b) requiring an Abstract that will allow the reader to ascertain the nature and gist of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.
In the foregoing Detailed Description, various features are occasionally grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the inventive subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate preferred embodiment.