1. Field of the Invention
The invention generally relates to accessing memory devices and, more particularly, to accessing double data rate (DDR) dynamic random access memory (DRAM) devices, such as DDR-II type DRAM devices.
2. Description of the Related Art
The evolution of sub-micron CMOS technology has resulted in an increasing demand for high-speed semiconductor memory devices, such as dynamic random access memory (DRAM) devices, pseudo static random access memory (PSRAM) devices, and the like. Herein, such memory devices are collectively referred to as DRAM devices.
Some types of DRAM devices have a synchronous interface, generally meaning that data is written to and read from the devices in conjunction with a clock pulse. Early synchronous DRAM (SDRAM) devices transferred a single bit of data per clock cycle (e.g., on a rising edge) and are appropriately referred to as single data rate (SDR) SDRAM devices. Later developed double-data rate (DDR) SDRAM devices included input/output (I/O) buffers that transfer a bit of data on both rising and falling edges of the clock signal, thereby doubling the effective data transfer rate. Still other types of SDRAM devices, referred to as DDR-II SDRAM devices, transfer two bits of data on each clock edge, typically by operating the I/O buffers at twice the frequency of the clock signal, again doubling the data transfer rate (to 4× the SDR data transfer rate).
Unfortunately, as memory speeds increase, operating the I/O buffers and processing the data at twice the clock frequency presents a number of challenges. For example, modern SDRAM devices support a number of different data transition modes (e.g., interleaved or sequential burst modes) that require data to be reordered before it is written to or after it is read from the memory array. Further, for various reasons (e.g., geometry, yield, and speed optimizations) these devices often have physical memory topologies employing “scrambling” techniques where logically adjacent addresses and/or data are not physically adjacent. This data reordering and scrambling affects when and how data is passed between data pads and a memory array and typically requires complex switching logic.
Because of this complexity, conventional data path switching logic is typically designed by synthesis, which generally refers to the process of converting a design from a high-level design language (e.g., VHDL) into actual gates. Unfortunately, synthesis design has shortcomings. As an example, it typically lumps all the combinational logic together, resulting in more gate delay and larger mask area, which hurts both performance and density. Furthermore, timing glitches and unnecessary switching operations in these designs often degrade speed performance and increase power consumption. These timing issues become more problematic as clock frequencies increase. In addition, the typically unstructured nature of logic designed by synthesis does not promote reuse, for example, across device family members with different organizations (e.g., x4, x8, and x16) or within a single device that supports different organizations.
Accordingly, what is needed is a flexible data path logic design capable of supporting switching operations required to transfer data between memory arrays and external data pads.
Embodiments of the present invention generally provide methods and devices for efficient transfer of data between data pads and memory arrays.
One embodiment provides a memory device generally including one or more memory arrays, a plurality of data pads, an input/output (I/O) buffer stage, and reordering logic. The I/O buffer stage has pad logic for receiving bits of data to be written to the memory arrays and for outputting bits of data sequentially on the plurality of pads, wherein N bits of data are received or transferred in a single cycle of an external clock signal. The reordering logic is driven by a core clock signal having a lower frequency than the external clock signal and is configured to reorder the N bits of data received on each data pad, based at least in part on a burst transfer type, prior to writing the N bits to the one or more memory arrays or prior to outputting the N bits sequentially on the plurality of pads.
Another embodiment provides a memory device generally including one or more memory arrays, a plurality of data pads, and a pipelined data path. The pipelined data path is configured for transferring data between the one or more memory arrays and the plurality of pads and comprises an input/output (I/O) buffer stage with pad logic for buffering bits of data exchanged sequentially between the data pads and an external device in conjunction with a data clock signal, and reordering logic for reordering bits of data received by or to be output by the pad logic in conjunction with a core clock signal having a lower frequency than the data clock signal.
Another embodiment provides a memory device capable of transferring multiple bits on each of a plurality of data pads in a single cycle of an external clock signal, generally including one or more memory arrays and reordering logic. The reordering logic is driven by a core clock signal having a lower frequency than the external clock signal and is configured to reorder bits of data received sequentially on the data pads to be written to the memory arrays and to reorder bits of data read from the memory arrays to be output sequentially on the data pads.
Another embodiment provides a method of exchanging data with a memory device. The method generally includes receiving N bits of data on each of a plurality of data pads within a single cycle of an external clock signal and reordering the N bits of data in conjunction with an internal core clock signal having a lower frequency than the external clock signal.
Another embodiment provides a method of exchanging data between data pads and one or more memory arrays. The method generally includes, during a write operation, generating, from an external clock signal, a core clock signal having a lower frequency than the external clock signal, sequentially receiving multiple bits of data to be written to the memory arrays on the data pads in a single cycle of the external clock signal, and reordering, in conjunction with the core clock signal, the sequentially received bits of data prior to being written to the memory arrays or prior to being output on the data pads.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
FIGS. 13A-D illustrate switch settings of the single stage shown in
Embodiments of the invention generally provide techniques and circuitry that support switching operations required to transfer data between memory arrays/banks and external data pads. In a write path, such switching operations may include latching in and assembling a number of bits sequentially received over a single data pad, reordering those bits based on a particular type of access mode (e.g., interleaved or sequential, even/odd), and performing scrambling operations based on chip organization (e.g., x4, x8, or x16) and the bank location being accessed. Similar operations may be performed (in reverse order) in a read path, to prepare and assemble data to be read out of a device.
By distributing these switching operations among different logic blocks in the data path, only a portion of the operations (e.g., latching in the data) need be performed at the data clock frequency, while the remaining operations (e.g., ordering and scrambling) may be performed at a lower frequency (e.g., ½ the external clock frequency). In addition, dividing these switching operations allows them to be performed in parallel (e.g., in a pipelined manner), rather than placing all the complex decoding in a single complex block operating in a serial fashion. As a result, this distributed logic approach may help reduce the speed bottleneck at the data path level and improve device (e.g., DDR-II SDRAM) performance.
As illustrated, the device 100 may include control logic 130 to receive a set of control signals 132 to access (e.g., read, write, or refresh) data stored in the arrays 110 at locations specified by a set of address signals 126. The address signals 126 may be latched in response to signals 132 and converted by addressing logic 120 into row address signals (RA) 122 and column address signals (CA) 124 used to access individual cells in the arrays 110.
Data read from and written to the arrays 110, presented as data signals (DQ0-DQ15) 142, may be transferred between external data pads and the arrays 110 via I/O buffering logic 135. As previously described, this transfer of data may require a number of switching operations, including assembling a number of sequentially received bits, reordering those bits based on a type of access mode (e.g., interleaved or sequential, even/odd), and performing scrambling operations based on chip organization (e.g., x4, x8, or x16) and the physical location (e.g., a particular bank or partition within a bank) of the data being accessed. While conventional systems may utilize a single complex logic block to perform all of these switching operations, embodiments of the present invention may distribute the operations among multiple logic blocks.
For some embodiments, these logic blocks may include simplified pad logic 150, near pad ordering logic 160, and intelligent array switching logic 170. The simplified pad logic 150 and near pad ordering logic 160 may be integrated within the I/O buffering logic 135. As illustrated, for some embodiments, only the simplified pad logic 150 may be operated at the data clock frequency (typically twice the external clock frequency for DDR-II), while the near pad ordering logic 160 and intelligent array switching logic 170 may be operated at a slower memory core frequency (typically ½ the external clock frequency).
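The clock relationships stated above can be checked with a little arithmetic. The following Python sketch assumes only the ratios given in the text (data clock at twice the external clock, core clock at half the external clock); the 200 MHz external frequency and the function name `clock_plan` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def clock_plan(f_ext_mhz):
    """Derive data-clock and core-clock frequencies (MHz) and periods (ns).

    Assumed ratios (from the DDR-II description): data clock = 2 x external,
    core clock = 1/2 x external, so the core clock is 1/4 of the data clock.
    """
    f_ext = Fraction(f_ext_mhz)
    f_data = 2 * f_ext   # clock driving the simplified pad logic 150
    f_core = f_ext / 2   # CLKCORE driving logic 160 and 170
    return {
        "data_mhz": f_data,
        "core_mhz": f_core,
        "data_period_ns": 1000 / f_data,
        "core_period_ns": 1000 / f_core,
    }

plan = clock_plan(200)  # hypothetical 200 MHz external clock
print(plan["data_mhz"], plan["core_mhz"])              # 400 100
print(plan["data_period_ns"], plan["core_period_ns"])  # 5/2 10
```

Under these ratios the core clock period is four times the data clock period, which is the timing relief that moving ordering and scrambling out of the pad logic is meant to exploit.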
In general, during a write operation, the simplified pad logic 150 is responsible only for receiving data bits presented serially on external pads and presenting those data bits in parallel (in the order received) to the near pad ordering logic 160. The near pad ordering logic 160 is responsible for (re)ordering these bits based on the particular access mode and presenting the ordered bits to the intelligent array switching logic 170. The intelligent array switching logic 170 is responsible for performing a 1:1 data scrambling function, transferring data from one set of data lines onto another set of data lines that leads to the memory bank arrays. As will be described in greater detail below, exactly how the data is scrambled may be determined by a specified chip organization (e.g., x4, x8, and x16) and a particular bank partition being accessed. These components operate in reverse along the read path (e.g., when transferring data in a read operation).
The cooperative functions of the simplified pad logic 150, near pad ordering logic 160, and intelligent array switching logic 170 may be described with reference to
As illustrated, the simplified pad logic 150 may include any suitable arrangement of components, such as first in first out (FIFO) latching buffers, configured to receive and assemble a number of data bits presented serially on an external pad. Each external data pad may have its own corresponding stage 152, driven by the data clock. As previously described, in a DDR-II DRAM device, data may be transferred on rising and falling edges of the data clock, such that four bits of data may be latched in each external clock cycle.
Once four bits are latched in (e.g., each external clock cycle) by each stage 152, these bits may be transferred to the near pad ordering logic 160 in parallel, in the order in which they were received, for possible reordering based on the type of access mode. In other words, the simplified pad logic 150 merely has to latch in data signals without performing any ordering or scrambling based on address signals, which may reduce the chance of noise glitches as the data signals transition at the (higher) data clock frequency. This approach may also simplify signal routing, as the address signals needed for ordering do not have to be routed to the pad logic.
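Behaviorally, a pad stage of this kind amounts to a serial-in, parallel-out buffer. The Python sketch below is a hypothetical model (the class `PadStage` and its `latch` method are illustrative names, not circuitry from the figures): it latches one bit per data-clock edge and, after four bits, hands them on in arrival order without consulting any address signals.

```python
from typing import List, Optional

class PadStage:
    """Hypothetical behavioral model of one simplified pad-logic stage.

    Latches one bit per data-clock edge and, once `depth` bits have arrived,
    presents them in parallel, in arrival order, with no reordering.
    """

    def __init__(self, depth: int = 4):
        self.depth = depth
        self._buf: List[int] = []

    def latch(self, bit: int) -> Optional[List[int]]:
        """Latch one bit; return the assembled word when `depth` bits are held."""
        self._buf.append(bit & 1)
        if len(self._buf) == self.depth:
            word, self._buf = self._buf, []
            return word  # index 0 is the first bit received
        return None

stage = PadStage()
outputs = [stage.latch(b) for b in [1, 0, 1, 1, 0, 0, 1, 0]]
print([o for o in outputs if o is not None])  # [[1, 0, 1, 1], [0, 0, 1, 0]]
```

Note that the model takes no address inputs at all, mirroring the point that ordering information need not be routed to the pads.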
As illustrated, data may be transferred between the simplified pad logic 150 and the near pad ordering logic 160 via a bus of data lines referred to as spine read/write data (SRWD) lines 151. Assuming a total of 16 external data pads DQ<15:0>, there will be 64 total SRWD lines 151 (i.e., a 4:1 fetch is performed for each data pad) for a DDR-II device (32 for a DDR-I device and 128 for a DDR-III device). Although the simplified pad logic 150 operates at the higher data clock frequency, because data is transferred only after four bits are received sequentially, the near pad ordering logic 160 may be operated at the lower memory core clock (CLKCORE) frequency.
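The SRWD line counts quoted above follow from multiplying the number of pads by the number of bits assembled per pad per transfer. A minimal Python sketch, assuming 16 pads and the per-pad widths implied by the text (2, 4, and 8 bits); the line-numbering scheme in the hypothetical helper `srwd_line` (lines grouped by pad) is an illustrative convention only.

```python
# Bits assembled per pad per transfer, implied by the line counts in the text.
BITS_PER_PAD = {"DDR-I": 2, "DDR-II": 4, "DDR-III": 8}

def srwd_line_count(num_pads: int, generation: str) -> int:
    """Total SRWD lines needed to carry one parallel transfer from every pad."""
    return num_pads * BITS_PER_PAD[generation]

def srwd_line(pad: int, index: int, generation: str = "DDR-II") -> int:
    """Hypothetical numbering: the lines for pad DQ<pad> are grouped together."""
    width = BITS_PER_PAD[generation]
    if not 0 <= index < width:
        raise ValueError("bit index outside the per-pad transfer width")
    return pad * width + index

for gen in ("DDR-I", "DDR-II", "DDR-III"):
    print(gen, srwd_line_count(16, gen))
# DDR-I 32
# DDR-II 64
# DDR-III 128
```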
As illustrated, the near pad ordering logic 160 may include, for each corresponding data pad, an arrangement of switches (herein referred to as a matrix) 162 to order the four bits of data it receives on the SRWD lines 151 according to the access mode of the current operation (sequential or interleaved, with starting column address bits CA0 and CA1 selecting even or odd mode). The ordered bits from each matrix 162 are output onto another set of data lines, illustratively a set of data lines (XRWDL) 161 running in a horizontal or “X” direction. In other words, each matrix 162 may perform a 1:1 data scrambling function between the SRWD lines 151 and the XRWD lines 161.
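Because each matrix 162 performs a 1:1 mapping, it can be modeled as a square grid of switches in which exactly one switch is closed in every row (SRWD input) and every column (XRWD output). The Python sketch below is an illustrative model under that assumption; the helper names and the example mapping `[1, 0, 3, 2]` are hypothetical and do not reproduce the actual switch settings shown in the figures.

```python
from typing import List, Sequence

Matrix = List[List[bool]]

def matrix_from_mapping(dest_of: Sequence[int]) -> Matrix:
    """Close switch (src, dst) for each SRWD input src routed to XRWD dest_of[src]."""
    n = len(dest_of)
    m = [[False] * n for _ in range(n)]
    for src, dst in enumerate(dest_of):
        m[src][dst] = True
    return m

def is_one_to_one(m: Matrix) -> bool:
    """True if every input row and every output column has exactly one closed switch."""
    n = len(m)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[r][c] for r in range(n)) == 1 for c in range(n))
    return rows_ok and cols_ok

def drive(m: Matrix, srwd: Sequence[int]) -> List[int]:
    """Propagate SRWD values through the closed switches onto the XRWD lines."""
    n = len(m)
    xrwd = [0] * n
    for src in range(n):
        for dst in range(n):
            if m[src][dst]:
                xrwd[dst] = srwd[src]
    return xrwd

m = matrix_from_mapping([1, 0, 3, 2])     # hypothetical switch setting
print(is_one_to_one(m))                   # True
print(drive(m, [10, 11, 12, 13]))         # [11, 10, 13, 12]
```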
The XRWD lines 161 are connected to the intelligent array switching logic 170, which scrambles these lines onto another set of data lines, illustratively a set of data lines (YRWDL) 171 running in the vertical or “Y” direction. Depending on the active bank 110 being written to and where it is located, upper or lower buffer stages 112U or 112L connect the active YRWD lines to read/write data lines (RWDLs) connected to the memory arrays 110. As illustrated, each bank may be divided into four partitions, with a particular partition selected by column address CA11 and row address RA13. For example, referring to bank 0 (the upper left bank 110₀), CA11=1 selects a partition in the upper half and CA11=0 selects a partition in the lower half, while RA13=1 selects a partition on the left side and RA13=0 selects a partition on the right side. This partitioning allows the arrays to be utilized efficiently not only for x16 organizations, but also for x4 and x8 organizations.
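The bank 0 example above reduces to a two-bit decode. A minimal Python sketch, assuming exactly the polarities given for bank 0 (other banks may be mirrored, and the bitline twisting described later changes which physical half a given line reaches); the function name `bank0_partition` is hypothetical.

```python
def bank0_partition(ca11: int, ra13: int) -> str:
    """Partition of bank 0 selected by CA11 (upper/lower) and RA13 (left/right)."""
    vertical = "upper" if ca11 & 1 else "lower"
    horizontal = "left" if ra13 & 1 else "right"
    return f"{vertical}-{horizontal}"

for ca11 in (0, 1):
    for ra13 in (0, 1):
        print(f"CA11={ca11} RA13={ra13} -> {bank0_partition(ca11, ra13)}")
```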
In any case, the intelligent array switching logic 170 also performs a 1:1 data scrambling function at the memory core frequency, writing data from the XRWD lines 161, via the YRWD lines 171, into the memory bank arrays through array read/write data (RWD) lines. As will be described in greater detail below, how the data is scrambled is determined by the chip organization (x4, x8, or x16). The data scrambling may also be determined by the particular partition within a given bank being accessed (the partition may be identified by row address RA13 and column address CA11), to account for bitline twisting between banks, shown in twist regions 114.
During a read access, the data propagates in the opposite direction through the intelligent array switching logic 170, near pad ordering logic 160, and simplified pad logic 150. In other words, data may be transferred from the memory arrays 110 to the XRWD lines 161 via the intelligent array switching logic 170, then to the SRWD lines 151 via the near pad ordering logic 160, and finally out to the data pads in sequence via the simplified pad logic 150. As illustrated, the near pad ordering logic 160 may include an arrangement of switches (e.g., a matrix) 164 for each corresponding data pad, in order to reorder the data bits. As a result, the simplified pad logic 150 may simply shift the data bits out in the order they were received (at the data clock rate) without performing any complicated logic operations and without long control signal lines routed to the pads.
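One way to see why the read matrices 164 can mirror the write matrices 162 is that the read reordering is simply the inverse of the write reordering. The Python sketch below assumes the same hypothetical 4-bit mapping is applied on both paths and ignores the array-side scrambling; the function names and the mapping `[2, 3, 0, 1]` are illustrative only.

```python
from typing import Iterator, List, Sequence

def write_reorder(srwd: Sequence[int], dest_of: Sequence[int]) -> List[int]:
    """Write matrix: SRWD bit i is placed on XRWD line dest_of[i]."""
    xrwd = [0] * len(srwd)
    for i, bit in enumerate(srwd):
        xrwd[dest_of[i]] = bit
    return xrwd

def read_reorder(xrwd: Sequence[int], dest_of: Sequence[int]) -> List[int]:
    """Read matrix (inverse mapping): SRWD slot i takes the bit on XRWD line dest_of[i]."""
    return [xrwd[dest_of[i]] for i in range(len(xrwd))]

def serialize(srwd: Sequence[int]) -> Iterator[int]:
    """Pad logic: shift bits out in index order, one per data-clock edge."""
    yield from srwd

dest_of = [2, 3, 0, 1]                 # hypothetical burst mapping
burst = [1, 1, 0, 1]                   # bits as received on the pad
stored = write_reorder(burst, dest_of)
out = list(serialize(read_reorder(stored, dest_of)))
print(stored, out)                     # [0, 1, 1, 1] [1, 1, 0, 1]
```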
Operations performed by the simplified pad logic 150, near pad ordering logic 160, and intelligent array switching logic 170 during write and read accesses are summarized in
Referring first to a write access, the simplified pad logic 150 receives data bits sequentially on an external pad (at the data clock frequency). After receiving four bits of data, the simplified pad logic presents the four bits of data in parallel to the near pad ordering logic 160 on the SRWD lines 151 in the order received. At step 306, the near pad ordering logic reorders the data bits onto the XRWD lines 161 based on the data pattern mode. At step 308, the intelligent array switching logic 170 performs a data scrambling function, based on chip organization and the particular bank location being accessed relative to the twist region 114, to write data to the memory array (via the YRWD lines 171).
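The write-path steps above compose naturally as three functions, one per logic block. The Python sketch below is hypothetical throughout: the scramble table `SCRAMBLE` is a placeholder (the actual XRWD-to-YRWD mappings depend on the organization and partition and are given only in the figures), and the 4-line width stands in for one pad's slice of the bus.

```python
from typing import Dict, List, Sequence, Tuple

# Placeholder scramble table: (organization, partition select) -> XRWD-to-YRWD
# mapping for one 4-line slice. These entries are illustrative only.
SCRAMBLE: Dict[Tuple[str, int], List[int]] = {
    ("x16", 0): [0, 1, 2, 3],  # straight through
    ("x16", 1): [1, 0, 3, 2],  # pairs swapped, e.g. to undo a bitline twist
}

def pad_assemble(serial_bits: Sequence[int]) -> List[int]:
    """Pad logic 150: present the first four received bits in arrival order."""
    return list(serial_bits[:4])

def order(srwd: Sequence[int], dest_of: Sequence[int]) -> List[int]:
    """Step 306, logic 160: route SRWD bit i onto XRWD line dest_of[i]."""
    xrwd = [0] * len(srwd)
    for i, bit in enumerate(srwd):
        xrwd[dest_of[i]] = bit
    return xrwd

def scramble(xrwd: Sequence[int], org: str, part: int) -> List[int]:
    """Step 308, logic 170: route XRWD line x onto YRWD line mapping[x]."""
    mapping = SCRAMBLE[(org, part)]
    yrwd = [0] * len(xrwd)
    for x, y in enumerate(mapping):
        yrwd[y] = xrwd[x]
    return yrwd

def write_path(serial_bits, dest_of, org, part):
    """Compose the three blocks; only pad_assemble would run at the data clock."""
    return scramble(order(pad_assemble(serial_bits), dest_of), org, part)

print(write_path([1, 0, 0, 1], [1, 0, 3, 2], "x16", 0))  # [0, 1, 1, 0]
```

Keeping each block a pure function of its inputs mirrors the pipelined structure: each stage can work on a new transfer while the next stage handles the previous one.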
Referring next to
Exemplary circuit configurations for the simplified pad logic 150, near pad ordering logic 160, and intelligent array switching logic 170 that are capable of performing the operations described above will now be described. While described separately, those skilled in the art will recognize that these logic blocks actually operate in parallel, thus forming an efficient pipelined data path with reduced latency.
As previously described, during a write access, each stage 162 of the near pad ordering logic 160 receives four bits of data from the simplified pad logic 150 and reorders the four bits based on a specified data access mode (i.e., sequential or interleaved burst mode). In a similar manner, during a read access, each stage 164 receives four bits of data from the intelligent array switching logic 170 and reorders them into the order in which they should be read out.
According to DDR-II operation, data bits are latched on both the rising and falling edges of the clock. Indexes 0, 1, 2, and 3 may be used to indicate the events where data are latched at the first clock rising edge, first clock falling edge, second clock rising edge, and second clock falling edge, respectively. As illustrated in
As described above, the data bits are handled sequentially at the pad level, in the order received or in the order in which they must be driven at the output. Therefore, these indexes are needed to identify the data order. For some embodiments, the stages 162 and 164 may be configured to reorder the data in accordance with a standard data pattern mode (e.g., as defined by JEDEC Standard JESD79-2A), which may specify a sequential or interleaved burst type, as well as a starting address (CA1 and CA0) within the burst. The burst type is programmable (e.g., via a mode register), while the start address is specified by a user (e.g., presented with the read/write operation).
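For a burst length of four, the data pattern modes reduce to a simple rule on the starting address CA1:CA0: sequential mode counts upward and wraps within the four-bit block, while interleaved mode XORs the start address with the bit index. The Python sketch below assumes the BL=4 ordering table of JESD79-2; it is not taken from the figures, and it deliberately does not cover burst length 8, where the sequential wrap differs. The function name `burst_order` is hypothetical.

```python
from typing import List

# Index i is the i-th bit latched on a pad (0 = first rising edge,
# 1 = first falling edge, 2 = second rising edge, 3 = second falling edge).
EDGES = ["1st rising", "1st falling", "2nd rising", "2nd falling"]

def burst_order(start: int, interleaved: bool) -> List[int]:
    """Column address (CA1:CA0) carried by each bit index in a 4-bit burst."""
    if not 0 <= start <= 3:
        raise ValueError("start address CA1:CA0 must be 0..3")
    if interleaved:
        return [start ^ i for i in range(4)]
    return [(start + i) % 4 for i in range(4)]

for idx, ca in enumerate(burst_order(1, interleaved=True)):
    print(f"index {idx} ({EDGES[idx]}): column address {ca}")
```

The returned list could serve directly as the source-to-destination mapping that sets the switches of a write stage 162, with the inverse mapping setting the corresponding read stage 164.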
Utilizing separate write and read stages 162 and 164, with identical switching structures, may help balance write and read timing. Locating these switching stages in the I/O buffer logic that connects chip center data lines (SRWD) to the data pads (DQs) may also save timing budget, by allowing the simplified pad logic 150 to merely shift data bits in and out at the data clock frequency without having to perform reordering operations.
As previously described, in modern DRAM devices, data scrambling is often employed for various reasons, resulting in logically adjacent addresses or data locations that are not physically adjacent. Such scrambling may allow optimal geometric layout of memory cells (e.g., folding), in an effort to balance bitline and word line lengths. Scrambling may also allow array area to be optimized by sharing contacts and well areas. One type of scrambling, referred to as bitline twisting, may be employed in an effort to reduce capacitive coupling between adjacent bitline pairs.
The intelligent array switching logic 170 may account for various types of scrambling by intelligently coupling XRWD lines to YRWD lines to perform the necessary scrambling. As illustrated in
Further, the switching logic 170 may comprise an array of single matrices to simplify the design and balance timing paths. For example, as illustrated in
In any case,
For example,
As illustrated in
As illustrated in FIGS. 13A-D, there are four cases for x4 organization. Not only are outer or inner half partitions of the memory bank arrays controlled by RA13, but upper or lower half partitions may also be selected by CA11. If CA11 is logic “1”, an upper half partition is accessed, while if CA11 is logic “0”, a lower half partition is accessed. In summary, each bank array is divided into four partitions: upper outer, upper inner, lower outer, and lower inner. Further, due to twisting of the RWDL lines between adjacent banks (see twisting regions 114 in
Due to the twisting, 32 bits of RWD lines flow through the lower half of the left memory bank array and the upper half of the right memory bank array, while the other 32 bits of RWD lines flow through the lower half of the right memory bank array and the upper half of the left memory bank array. In order to properly identify the particular partition being accessed (i.e., the upper or lower half of the array section in a given bank), CA11 and bank address bit 0 (BA0) may be logically XOR'd (e.g., using the + symbol to represent XOR, CA11+BA0=“0” if CA11 and BA0 are both logic “0” or both logic “1”, while CA11+BA0=“1” if CA11 and BA0 have opposite logic values). As a result, in each of the four cases for x4 organization, a one-quarter region in each adjacent bank is accessed.
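The half-select condition above is a single XOR gate. A minimal Python sketch with a hypothetical function name; which physical half corresponds to a result of 0 or 1 depends on the figures and is not assumed here.

```python
def half_select(ca11: int, ba0: int) -> int:
    """CA11 XOR BA0 (written CA11+BA0 in the text): 0 if equal, 1 if different."""
    return (ca11 ^ ba0) & 1

for ca11 in (0, 1):
    for ba0 in (0, 1):
        print(f"CA11={ca11} BA0={ba0} -> {half_select(ca11, ba0)}")
```

If two adjacent banks differ in BA0, the same CA11 value yields opposite results in the two banks, which is consistent with the RWD lines crossing from one half to the other at the twist.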
This overlapping switching scheme requires a minimal number of switches, which are turned on/off based on a minimal number of conditions, which may help minimize power consumption and reduce capacitive loading on the XRWD lines. Further, because SW8 may be turned on for all organizations, there is no extra delay penalty for x4 components, which typically share the same mask with the x16 and x8 components. Another beneficial aspect of the illustrated scheme is that one of the four RWD lines of the x4 switching scheme is placed between any two active RWD lines of the x8 switching scheme, which may reduce line-to-line coupling during switching, further improving switching performance.
While embodiments have been described above with specific reference to DDR-II DRAM devices, those skilled in the art will recognize that the same techniques and components may generally be used to advantage in any memory device that clocks data in at a higher clock speed than is required to process that data. Accordingly, embodiments of the present invention may also be used in DDR-I DRAM devices transferring two bits of data per clock cycle, as well as in later-generation DDR devices (e.g., DDR-III devices transferring eight bits of data per clock cycle).
Those skilled in the art will also recognize that, while one embodiment of a DRAM device utilizing separate simplified pad logic, near pad ordering logic, and intelligent array switching logic was described, other embodiments may include various other arrangements of distributed logic to achieve similar functionality. As an example, one embodiment may include separate simplified pad logic (operating at the data clock frequency) and a single logic unit (operating at the lower memory core clock frequency) that handles both the reordering and scrambling functions performed by the separate near pad ordering logic and intelligent array switching logic. Still another embodiment may integrate the reordering with the pad logic (both operating at the data clock frequency) and utilize intelligent array switching logic (operating at the lower memory core clock frequency) to perform the scrambling functions described herein.
Embodiments of the present invention may be utilized to reduce the data path speed stress of DRAM devices with high data clock frequencies. By separating high speed pad logic from switching logic that may perform various other logic functions (e.g., reordering and scrambling logic), the switching logic performing those functions may be allowed to operate at a lower clock frequency (e.g., ½ the external clock frequency or ¼ the data frequency), which may relax associated timing requirements and improve latency due to savings in the transition time of the data from memory arrays to the DQ pads and vice versa. By utilizing optimized switch arrangements, balanced delay times across read and write paths, as well as across different device organizations, may also be achieved.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.