TWO-BIT PER I/O LINE WRITE DATA BUS FOR DDR1 AND DDR2 OPERATING MODES IN A DRAM

Information

  • Patent Application
  • 20080137462
  • Publication Number
    20080137462
  • Date Filed
    January 25, 2008
  • Date Published
    June 12, 2008
Abstract
A data bus circuit for an integrated circuit memory includes a 4-bit bus per I/O pad that is used to connect the memory with an I/O block, but only two bits per I/O are utilized for writing. Four bits per I/O pad are used for reading. At every falling edge of an input data strobe, the last two bits are transmitted over the bus, which eliminates the need for the precise counting of input data strobe pulses. The data bus circuit is compatible with both DDR1 and DDR2 operating modes.
Description
BACKGROUND OF THE INVENTION

The present invention relates to integrated circuits, and more particularly to memories.



FIG. 1 illustrates a prior art DRAM (dynamic random access memory). Memory array 110 has DRAM memory cells arranged in rows and columns. Each cell has a capacitor and an access transistor connected in series. Each memory row corresponds to a wordline WL. To read the memory, the corresponding wordline WL is activated, and the data signals for the corresponding row appear on bitlines BL. The bitline signals are amplified by sense amplifiers (not shown). Y select circuit 130 selects one or more memory columns and couples the corresponding bitlines to a data path leading to a memory output terminal DQ. In a write operation, a reverse data path is provided from terminal DQ to the memory array.


To increase memory bandwidth, multiple data items can be prefetched in parallel from memory array 110 for a serial output on the DQ terminal. For example, in DDR (double data rate) synchronous DRAMs, two data bits are prefetched in parallel for sequential output on the rising and falling edges of a clock signal in a burst read operation (one bit is provided on terminal DQ on the rising edge, the other bit on the falling edge). Likewise, in a burst write operation, two data bits are received serially at the terminal DQ on the rising and falling edges of a clock cycle, and written to array 110 in parallel.


The parallel-to-serial and serial-to-parallel conversion of data within the memory is complicated by the requirement to provide different data ordering schemes in the DDR and some other kinds of memories. The DDR standard defines the following data sequences for the burst read and write operations (see JEDEC Standard JESD79D, JEDEC Solid State Technology Association, January 2004, incorporated herein by reference):









TABLE 1
DDR BURST OPERATIONS

                Starting CL    Data Sequence (i.e. Address Sequence) within the Burst
Burst Length    Address        Interleaved          Sequential
------------    -----------    ---------------      ---------------
2               A0
                0              0-1                  0-1
                1              1-0                  1-0
4               A1 A0
                00             0-1-2-3              0-1-2-3
                01             1-0-3-2              1-2-3-0
                10             2-3-0-1              2-3-0-1
                11             3-2-1-0              3-0-1-2
8               A2 A1 A0
                000            0-1-2-3-4-5-6-7      0-1-2-3-4-5-6-7
                001            1-0-3-2-5-4-7-6      1-2-3-4-5-6-7-0
                010            2-3-0-1-6-7-4-5      2-3-4-5-6-7-0-1
                011            3-2-1-0-7-6-5-4      3-4-5-6-7-0-1-2
                100            4-5-6-7-0-1-2-3      4-5-6-7-0-1-2-3
                101            5-4-7-6-1-0-3-2      5-6-7-0-1-2-3-4
                110            6-7-4-5-2-3-0-1      6-7-0-1-2-3-4-5
                111            7-6-5-4-3-2-1-0      7-0-1-2-3-4-5-6


Here A2, A1, A0 are the three least significant bits (LSB) of a burst operation's “starting address” An . . . A2A1A0 (or A<n:0>). For each burst length (2, 4, or 8), and each starting address, the DDR standard defines a sequential type ordering and an interleaved type ordering. The burst length and type are written to the memory mode register (not shown) before the burst begins. The data are read from, or written to, a block of 2, 4, or 8 memory locations. The block address is defined by the most significant address bits (bits A<n:3> for burst length of 8, bits A<n:2> for burst length of 4, bits A<n:1> for burst length of 2). The least significant address bits and the burst type define the data ordering within the block. For example, for the burst length of 4, the starting address A<n:0>=x . . . x01, and the interleaved type, the data are read or written at a block of four memory locations at addresses x . . . x00 through x . . . x11 in the order 1-0-3-2 (Table 1), i.e. the first data item is written to address x . . . x01, the second data item to address x . . . x00, the third data item to address x . . . x11, and the fourth data item to address x . . . x10 (the data ordering is the order of the address LSB's).
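The orderings of Table 1 can be reproduced with a short sketch. The function name `ddr_burst_order` is hypothetical, and the two rules used here (XOR of the burst counter with the starting LSBs for the interleaved type, and a wrapping count for the sequential type) are an observation that matches every row of Table 1, not wording taken from the standard.

```python
def ddr_burst_order(start, burst_length, interleaved):
    """Return the address-LSB order of a DDR burst, as listed in Table 1.

    `start` holds the starting address; only its low bits are used.
    This is an illustrative model, not circuitry from the patent.
    """
    if burst_length not in (2, 4, 8):
        raise ValueError("burst length must be 2, 4, or 8")
    mask = burst_length - 1
    start &= mask  # the LSBs order the data within the block
    if interleaved:
        # Interleaved type: burst counter XORed with the starting LSBs.
        return [start ^ i for i in range(burst_length)]
    # Sequential type: count up from the start, wrapping at the block end.
    return [(start + i) & mask for i in range(burst_length)]


# The example from the text: burst length 4, starting address x...x01.
print(ddr_burst_order(0b01, 4, interleaved=True))   # [1, 0, 3, 2]
print(ddr_burst_order(0b01, 4, interleaved=False))  # [1, 2, 3, 0]
```

For a burst length of 2 both rules give the same order, which agrees with the identical Interleaved and Sequential columns of Table 1 for that length.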



FIG. 1 illustrates a write data path for a DDR memory with a two bit prefetch as described in U.S. Pat. No. 6,621,747 issued Sep. 16, 2003 to Faue. Serial to parallel converter 132 performs a serial to parallel conversion on each pair of serial data bits received in a clock cycle on terminal DQ. Converter 132 drives a line IR with the first of the two bits (the bit received on the rising edge of the clock cycle), and drives another line IF with the second bit, received on the falling edge of the clock cycle. Lines IR, IF are shown at 138. Write data sort circuit 140 (WDSORT) re-orders the bits and drives a line G0 with the bit to be written to a memory location with A0=0, and the line G1 with the bit to be written to a location with A0=1. Lines G0, G1 are shown at 134. Y select circuit 130 selects the appropriate memory columns to write the two bits in parallel from lines 134 to their respective memory locations.


U.S. Pat. No. 6,115,321 (issued Sep. 5, 2000 to Koelling et al) describes a memory with a four bit prefetch. There are four lines 134 and four lines 138. Sorting circuit 140 is used for both the read and the write accesses. The proper data ordering for Table 1 is achieved via a cooperative operation of circuit 140 and Y select circuit 130.


U.S. Pat. No. 6,600,691 (issued Jul. 29, 2003 to Morzano et al) describes a read data path that can be used for a DDR2 memory. DDR2 is defined in JEDEC standard JESD79-2A (JEDEC Solid State Technology Association, January 2004) incorporated herein by reference. The DDR2 standard specifies a double data rate memory (one data item on each clock cycle edge) with a four bit prefetch with the following burst data sequences:









TABLE 2
DDR2 BURST OPERATIONS

                Starting CL    Data Sequence (i.e. Address Sequence) within the Burst
Burst Length    Address        Interleaved          Sequential
------------    -----------    ---------------      ---------------
4               A1 A0
                00             0-1-2-3              0-1-2-3
                01             1-0-3-2              1-2-3-0
                10             2-3-0-1              2-3-0-1
                11             3-2-1-0              3-0-1-2
8               A2 A1 A0
                000            0-1-2-3-4-5-6-7      0-1-2-3-4-5-6-7
                001            1-0-3-2-5-4-7-6      1-2-3-0-5-6-7-4
                010            2-3-0-1-6-7-4-5      2-3-0-1-6-7-4-5
                011            3-2-1-0-7-6-5-4      3-0-1-2-7-4-5-6
                100            4-5-6-7-0-1-2-3      4-5-6-7-0-1-2-3
                101            5-4-7-6-1-0-3-2      5-6-7-4-1-2-3-0
                110            6-7-4-5-2-3-0-1      6-7-4-5-2-3-0-1
                111            7-6-5-4-3-2-1-0      7-4-5-6-3-0-1-2


Improved burst operation circuitry for DDR, DDR2, and other memories is desirable.
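For reference, the DDR2 sequences of Table 2 can be reproduced with the sketch below. The function name `ddr2_burst_order` is hypothetical. The sequential rule used here (wrap within each group of four, with A2 toggling for the second group of a burst of 8) is inferred from the rows of Table 2 and is stated as an assumption rather than as the wording of the standard.

```python
def ddr2_burst_order(start, burst_length, interleaved):
    """Return the address-LSB order of a DDR2 burst, as listed in Table 2.

    Illustrative model only; the rules are inferred from the table rows.
    """
    if burst_length not in (4, 8):
        raise ValueError("DDR2 burst length must be 4 or 8")
    mask = burst_length - 1
    start &= mask
    if interleaved:
        # Interleaved type: burst counter XORed with the starting LSBs.
        return [start ^ i for i in range(burst_length)]
    order = []
    for i in range(burst_length):
        low = (start + i) & 0b11            # A1A0 wrap inside a group of four
        high = (start ^ i) & mask & ~0b11   # A2 toggles for the second group
        order.append(high | low)
    return order


print(ddr2_burst_order(0b001, 8, interleaved=False))  # [1, 2, 3, 0, 5, 6, 7, 4]
print(ddr2_burst_order(0b111, 8, interleaved=False))  # [7, 4, 5, 6, 3, 0, 1, 2]
```

The difference from the DDR sequential type is visible in the burst length of 8: the DDR2 sequence never crosses a four-item boundary inside one prefetch, which fits the four bit prefetch described above.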


SUMMARY OF THE INVENTION

This section summarizes some features of the invention. Other features are described in the subsequent sections. The invention is defined by the appended claims which are incorporated into this section by reference.


In some aspects of the invention, read and write sorting circuits are provided for a memory with a prefetch of four or more data items, each data item having one or more data bits (for a memory with multiple data terminals, four or more bits are prefetched for each data terminal). In the read sorting circuit, for each output data terminal, four or more transistors are provided to select from the four or more prefetched data bits and provide the selected bit for output in a burst operation. The transistors are connected in parallel between the nodes receiving the prefetched bits and a node providing the selected bit. A similar group of transistors is provided in the write sorting circuit. All of the read and write sorting transistors are controlled by signals that are functions of the starting burst address, the burst type (interleaved or sequential), and the burst length. These functions belong to a group of six functions and their inverses. In some DDR2 embodiments, the Y select signals do not relate to the data sorting, i.e. the Y select signals are only functions of the most significant address bits, not of the burst length, the burst type, or the least significant address bits. In some embodiments, the same data sorting circuitry is suitable for both the DDR and DDR2 operation. A metal mask option, a fuse, or other methods can be used to configure the memory for DDR or DDR2 as desired.


A memory may have a number of memory banks. Each bank has one or more memory arrays and the corresponding sense amplifiers and write buffers (the buffers adjacent to the array that write the data into the arrays). The memory banks are spread over a large area. This may result in a long data path between at least some of the arrays and the DQ terminal, specifically between the sense amplifier and write buffer circuitry and the DQ terminal. To speed up memory operation, buffers (amplifiers) can be placed somewhere in the middle of the data path. The inventors have observed that it is efficient to place the data buffers near the sorting circuitry because the sorting circuitry can weaken the data signals. Therefore, the sorting circuitry is placed in the middle portion of the data path defined by the G-lines (such as the lines G0, G1 in FIG. 1) and the I-lines (IR, IF). In some embodiments, at least some of the G-lines and/or at least some of the I-lines are used both for reading and writing. Each of the G-lines and I-lines runs uninterrupted from a driver's output to another driver's input, and each line is connected to the driver's output without a switch adjacent to the driver's output. If a switch is present in series with the line, the switch is placed adjacent to an input device (e.g. amplifier) that receives signals from the line, not adjacent to the output of the driver that drives the line.


A further embodiment of the invention is related to the data bus of the integrated circuit memory which is compatible with both DDR1 and DDR2 modes of operation.


Other features and advantages of the invention are described below. The invention is defined by the appended claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of a prior art memory circuit.



FIGS. 2-4 are block diagrams of memory circuits according to embodiments of the present invention.



FIGS. 5A, 5B, 5C, 5D, 6A, 6B, 7A, 7B, 7C, 7D, 8A, 8B, 8C, 8D, 9A, 9B are circuit diagrams of memory circuits according to embodiments of the present invention.



FIGS. 10 and 11 are timing diagrams of memory signals according to embodiments of the present invention.



FIGS. 12, 13 are block and circuit diagrams of memory circuits according to embodiments of the present invention.



FIGS. 14, 15 are timing diagrams of memory signals according to embodiments of the present invention.



FIGS. 17-20 are circuit diagrams of memory circuits related to a data bus embodiment of the present invention.


FIGS. 16 and 21-23 are timing diagrams of memory signals associated with the circuits of FIGS. 16 and 18-21.





DETAILED DESCRIPTION

The embodiments described in this section illustrate but do not limit the invention. The invention is not limited to particular circuitry, memory size or number of I/O terminals, and other details. The invention is defined by the appended claims.



FIG. 2 is a block diagram of a memory with a prefetch of four data items that provides the DDR2 (Table 2) burst operations. The memory has four data banks 210.0-210.3. Each bank has four memory arrays 110.00, 110.01, 110.10, and 110.11 corresponding to the address bit A1A0 values 00, 01, 10, and 11. Array 110.00 has memory locations with A1A0=00, array 110.01 has memory locations with A1A0=01, array 110.10 has memory locations with A1A0=10, and array 110.11 has memory locations with A1A0=11. A number of bitlines BL run horizontally through each array, and a number of wordlines WL run vertically. Each wordline runs through all the arrays of a memory bank. The wordlines are driven by row address decoders (not shown) as known in the art. Bitline sense amplifiers 220 amplify the signals on the bitlines. See e.g. U.S. Pat. No. 6,011,737 issued Jan. 4, 2000 to Li et al. and incorporated herein by reference. Y select circuit 130 selects a number of bitlines from each array corresponding to column address signals (not shown in FIG. 2). The Y select circuit consists of a number of pass transistors (not shown in FIG. 2) each of which couples a bitline BL to a line YS when the transistor is on. Address bits A1A0 are “don't care” for the Y select circuit. FIG. 2 shows only one external data terminal DQ, and each Y select circuit selects a single bit of data (e.g. a single bitline or a pair of bitlines depending on the memory architecture). If the memory has a number of DQ terminals (e.g. 4, 8, or 16 terminals as in the DDR2 standard), then each Y select circuit will select a data item of 4, 8, 16, or some other number of bits, one bit for each DQ terminal.


The YS lines can be connected to the respective G-lines 134 directly or through secondary sense amplifiers in blocks 230. Each block 230 includes sense amplifier and write buffer circuitry for one of the arrays 110.00-110.11. The memory includes four G lines G0E, G0D, G1E, and G1D for each data terminal DQ. Line G0E carries data to and from the arrays 110.00 of all the memory banks. Line G0D carries data to and from the arrays 110.01. Line G1E carries data to and from the arrays 110.10. Line G1D carries data to and from the arrays 110.11. If the memory has some number N of data terminals, then the same number N of G-lines can be provided for the arrays 110.00, N G-lines for the arrays 110.01, N G-lines for the arrays 110.10, and N G-lines for the arrays 110.11. For example, if N=16, there can be sixteen lines G0E<0:15> for arrays 110.00, sixteen lines G0D<0:15> for arrays 110.01, and so on.


In burst read operations, sorting circuit 140 couples the G lines 134 to I-lines 138 in accordance with Table 2. Four I-lines IR0 (clock cycle 0, rising edge), IF0 (clock cycle 0, falling edge), IR1 (clock cycle 1, rising edge), IF1 (clock cycle 1, falling edge) are provided for each terminal DQ. Parallel to serial converter 240 (e.g. a shift register) converts the parallel data on the I-lines to a serial format in the order IR0, IF0, IR1, and IF1. Data output buffer 250 converts the data signals to suitable voltage and current levels and provides the data on terminal (or terminals) DQ in two consecutive clock cycles. These clock cycles are marked as “CLOCK 0” and “CLOCK 1” in each read operation in the timing diagram in FIG. 10. These clock cycles are numbered as T+3, T+4 for a read command issued in cycle T, and as T+6, T+7 for a read command issued in cycle T+3. The CAS latency (defined in the DDR2 and DDR standards) is three clock cycles.


For the burst length of 8, the steps described above are repeated, and four more data items are transferred to terminal DQ from lines IR0, IF0, IR1, and IF1, in that order, so that 8 data items are output in 4 consecutive clock cycles.


I-lines 138 can also carry the write data. In the embodiment of FIG. 2, only two I-lines are used for the write data. These I-lines are IR0, IF0, but any two I-lines can be chosen. Alternatively, only one I-line can be used for the write data, or all the four I-lines can be used. It is also possible not to use the I-lines for the write data. The scheme of FIG. 2 (using exactly two I-lines) is believed to provide power and timing advantages. The write data is received serially on terminal DQ and latched and amplified by data input buffer 260. Serial-to-parallel converter 270 provides two data items received in one clock cycle to respective lines IR0 (rising edge data), IF0 (falling edge data). S/P converter 270 and circuits 240, 250, 260 are located in a peripheral region of the memory near the DQ terminal. S/P converter 280, located next to the sorting circuit 140 in the middle portion of the memory between the memory banks, performs a 2:4 data conversion. In the example of FIG. 11, four data items D0-D3 were received on terminal DQ in clock cycles T+1 and T+2 (marked as “CLOCK 0” and “CLOCK 1” respectively), on the rising and falling edges of CLOCK 0 and the rising and falling edges of CLOCK 1. When data strobe signal DQS goes low after the rising edge of clock cycle T+1, data D0 and D1 begin to be driven in parallel on respective lines IR0, IF0, and when DQS goes low after the rising edge of clock cycle T+2, data D2 and D3 begin to be driven in parallel on the same lines. Thus, line IR0 carries sequentially the rising edge data D0, D2, and line IF0 carries sequentially the falling edge data D1, D3. Starting some time in clock T+2, S/P converter 280 provides the data D0, D1, D2, D3 in parallel on respective lines WD0R, WD0F, WD1R, and WD1F. Sorting circuit 140 transfers these data to lines G0E, G0D, G1E, and G1D in parallel in accordance with Table 2. Write buffers in blocks 230 and Y select circuits 130 write the data to the memory cells in parallel.
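The two conversion stages of the write path can be sketched behaviorally as follows. This is a minimal model that ignores all timing; the function names `pair_on_strobe` and `two_to_four` are hypothetical, and the model assumes one (rising, falling) pair is forwarded per DQS cycle as in the FIG. 11 example.

```python
def pair_on_strobe(serial_items):
    """Model of S/P converter 270: one (IR0, IF0) pair per strobe cycle.

    The first item of each pair is the rising-edge data, the second the
    falling-edge data. Timing is not modeled.
    """
    if len(serial_items) % 2:
        raise ValueError("expected an even number of data items")
    return [(serial_items[i], serial_items[i + 1])
            for i in range(0, len(serial_items), 2)]


def two_to_four(pairs):
    """Model of S/P converter 280: 2:4 conversion onto the WD lines."""
    if len(pairs) % 2:
        raise ValueError("expected an even number of (IR0, IF0) pairs")
    words = []
    for i in range(0, len(pairs), 2):
        (d0, d1), (d2, d3) = pairs[i], pairs[i + 1]
        words.append({"WD0R": d0, "WD0F": d1, "WD1R": d2, "WD1F": d3})
    return words


pairs = pair_on_strobe(["D0", "D1", "D2", "D3"])
print(pairs)               # [('D0', 'D1'), ('D2', 'D3')]
print(two_to_four(pairs))  # [{'WD0R': 'D0', 'WD0F': 'D1', 'WD1R': 'D2', 'WD1F': 'D3'}]
```

In this model only two values travel between the peripheral region and the middle of the memory at any one time, which is the property the two I-line scheme relies on.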


Mode register 284 stores the burst length and type information, as defined in the DDR2 standard. Address latching circuit 288 latches the input addresses. Clock signal CLK clocks the memory operation. Those and other signals are defined in the DDR2 standard.



FIG. 3 explains the placement of sorting circuit 140 and S/P converter 280 in some embodiments. The data paths between buffers 230 and terminals DQ are long paths with long RC delays. Sorting circuit 140 and S/P converter 280 are placed in a middle portion of the path, so as to minimize the total RC delay from arrays 110 to terminal DQ. As shown, each G-line 134 has a parasitic capacitance CG associated with it, and each I-line 138 has a parasitic capacitance CI associated with it. In some embodiments, each of these capacitances is about 1 pF. LG denotes the maximum length between a block 230 and sorting circuit 140 along a G-line 134. LI is the maximum length between the circuits 140, 280 on the one hand and the circuits 240, 270 on the other hand along an I-line 138. In some embodiments, LG=LI. In some embodiments, 0.25*LG ≤ LI ≤ 4*LG.


Since the G-lines are used both for reading and writing, transistor switches can be provided to connect the G lines to the reading or writing circuitry as needed. Transistor switches can also be provided for the I-lines. Switches can also be used for the two I-lines for some purposes. To minimize the RC delay on each line, the switches are placed as close as possible to the input of a driver that receives signals from the line, and not at the output of a driver that drives the line. In FIG. 3, block 230 includes a sense amplifier driver 310 (a tri-state driver) that drives a G-line 134 in read operations, and also includes an amplifier (e.g. CMOS inverter) 320 that receives the data from the G-line in the write operations. G-line 134 is connected directly to the output of driver 310 and the input of write buffer 320. In some embodiments, the G-line length is at least ¼ of the total length of the conductive write path going through the G-line from the output of buffer 780 to the input of buffer 320. In some embodiments, the G-line length is ½, ¾, or even a greater portion of the total length of the conductive write path.


In the read data path, the G-line is connected to a transistor switch (pass gates 530-542 in FIGS. 5A-5D) positioned adjacent to an input of a driver 554 in sorting circuit 140 but not adjacent to G-line driver 310. The G-line length is at least ¼ of the total length of the conductive read path from the output of driver 310 to a high impedance input of driver 554 (the high impedance input is the gates of transistors, not shown, in CMOS logic gates 560, 564 described below). In some embodiments, the G-line length is ½, ¾, or even a greater portion of the total length of the conductive read path.


Similarly, in some embodiments, the I-lines and/or the WD lines are driven by drivers that have no switches adjacent to their outputs in series with the I-lines and/or the WD lines. Note the I-line drivers 554 in FIGS. 5A-5D for example. Other embodiments use switches in series with these I-lines and/or WD-lines, but the switches are placed near the other end of the lines, e.g. near the end close to an amplifier input. In some embodiments, the length of the I-line or WD-line is at least ¼ of the total length of the conductive path going through the I-line or the WD-line from the driver output to an amplifier input. In some embodiments, the length of the I-line or WD-line is at least ½, ¾, or even a greater portion of the total length of the conductive path going through the I-line or the WD-line from the driver output to an amplifier input.


As shown in FIG. 4, the four memory banks 210 define a region 410 which is the smallest rectangular region containing all the four banks. Sorting circuit 140 and S/P converter 280 are located within the region 410. Converters 240, 270 are located outside of this region, in a peripheral region of the memory, next to buffers 250, 260 and terminal DQ. In some embodiments, sorting circuit 140 and S/P converter 280 are located in a central region 420 surrounded by the four memory banks. More particularly, the memory has a region 430 running vertically between the banks 210.0, 210.1 and between the banks 210.2, 210.3. Another region 440 runs horizontally between the banks 210.0, 210.2 and between the banks 210.1, 210.3. Region 420 is the intersection of regions 430, 440.


In some embodiments, the circuits 140, 280 are outside of region 410. Also, a memory may have multiple circuits 140 and/or multiple circuits 280 for different banks 210 or groups of banks. E.g., a memory with eight memory banks may include one circuit 140 and one circuit 280 for each group of four banks. Some or all of circuits 140, 280 may be outside of region 410 (the smallest rectangular region containing all of the eight banks). Also, the DQ terminal may be inside the region 410 or 420. Also, different portions of a circuit 140, 280, or of some circuit may be located in different parts of the memory.



FIGS. 5A-5D illustrate portions of the read sorting circuitry in circuit 140. Circuits 510-R0 (FIG. 5A), 510-F0 (FIG. 5B), 510-R1 (FIG. 5C), 510-F1 (FIG. 5D) drive respective I-lines IR0, IF0, IR1, IF1. These four circuits 510 are identical except for the input signals at the gates of pass gates 530, 534, 538, 542. Each of these circuits 510 includes a multiplexer 520 selecting one of the lines G0E, G0D, G1E, G1D for connection to a node 550 at the input of a tri-state driver 554. Driver 554 drives the respective I-line. MUX 520 consists of four pass gates 530, 534, 538, 542. Each of these pass gates has one source/drain terminal connected to the respective line G0E, G0D, G1E, or G1D, and the other source/drain terminal connected to node 550. The four pass gates connected in parallel provide a low delay data path (one transistor delay). The invention is not limited to this structure however.


Node 550 is connected to one input of two-input NAND gate 560 and to one input of two-input NOR gate 564 in driver 554. The other inputs of gates 560, 564 receive respective complementary signals RGICLK, RGICLKB. RGICLK is high during burst reads, and it is low during burst writes to disable the drivers 554. The outputs of gates 560, 564 are connected respectively to the gates of PMOS transistor 566 and NMOS transistor 568. PMOS transistor 566 has its source connected to a voltage source VCC and its drain connected to the respective I-line. NMOS transistor 568 has its drain connected to the I-line and its source connected to ground (or some other reference voltage).
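The behavior of driver 554 follows from the gate connections just described and can be checked with a small logic-level sketch. The function name `tristate_driver` and the use of `None` for the high impedance state are illustrative assumptions; the model treats the transistors as ideal switches.

```python
def tristate_driver(node, rgiclk):
    """Logic-level model of driver 554.

    node   -- value on node 550 (0 or 1)
    rgiclk -- enable signal RGICLK (1 during burst reads)
    Returns 1 or 0 when the I-line is driven, None for high impedance.
    """
    rgiclkb = 1 - rgiclk                 # complementary signal RGICLKB
    nand_560 = 1 - (node & rgiclk)       # drives the gate of PMOS 566
    nor_564 = 1 - (node | rgiclkb)       # drives the gate of NMOS 568
    pmos_on = nand_560 == 0              # PMOS conducts on a low gate
    nmos_on = nor_564 == 1               # NMOS conducts on a high gate
    if pmos_on and nmos_on:
        raise AssertionError("both output transistors on")
    if pmos_on:
        return 1                         # I-line pulled to VCC
    if nmos_on:
        return 0                         # I-line pulled to ground
    return None                          # both off: high impedance


for node in (0, 1):
    print(node, tristate_driver(node, 1), tristate_driver(node, 0))
# prints "0 0 None" then "1 1 None"
```

With RGICLK high the driver is non-inverting, and with RGICLK low both output transistors are off regardless of node 550, so the I-line is left to the latch and to the write circuitry.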


The I-line is also connected to a latch formed by cross-coupled inverters 570, 574.


In some embodiments, all the logic gates (such as gates 560, 564) and the inverters in FIGS. 5A through 11 are CMOS circuitry, but this is not necessary.


Multiplexers 520 are controlled by signals SORT<0:5> and their complements SORTB<0:5> generated by the circuit of FIGS. 6A, 6B. Each SORT signal is a function of the address bits A<0:1> and the burst length and type signals stored in mode register 284 (FIG. 2). In FIG. 6A, signals AL<1:0> are latched versions of the address signals A<1:0>. Address signals AL<1:0> are generated by latching circuit 288 (FIG. 2) from signals A<1:0> provided in accordance with timing specified in the DDR and DDR2 standards as applicable. Signals ALB<0:1> are the complements of AL<0:1>. In FIG. 6B, the SEQUENTIAL signal is generated from the burst type signal in register 284. SEQUENTIAL is high (logic 1) if the burst type is sequential. The signal BURSTLENGTH2 is high if the burst length is 2. The DDR2 standard (Table 2) does not provide for the burst length of 2, so BURSTLENGTH2 is low for the DDR2 operation. In some embodiments, the memory also provides the DDR data sequences (Table 1). BURSTLENGTH2 is high in the DDR mode for the burst length of 2. If only the DDR2 operation must be provided, BURSTLENGTH2 can be permanently set to low with a metal option, an electrically or laser programmable fuse, or an electrically programmable cell such as EEPROM.


Signal BURSTLENGTH2 is inverted by inverter 610. The output of inverter 610 and the signal SEQUENTIAL are NANDed by NAND gate 614. The output INTERLEAVE of gate 614 is inverted by inverter 620 to provide a signal SEQUENTIALP. When BURSTLENGTH2 is low, signal INTERLEAVE is the complement of SEQUENTIAL, and SEQUENTIALP is the logic equivalent of SEQUENTIAL. When BURSTLENGTH2 is high, INTERLEAVE is also high and SEQUENTIALP is low. As shown in Table 1, the burst type is “don't care” for the burst length of 2.


The SORT signals asserted for a given A1A0 value and a given burst length are shown in Table 3 below. The last two columns show which of the SORT signals are asserted (high). The remaining SORT signals are low.









TABLE 3
SORT SIGNALS

                        STARTING
                        ADDRESS     SORT SIGNALS ASSERTED
BURST LENGTH            A1A0        Interleaved          Sequential
--------------------    --------    ---------------      ---------------
2 (DDR only)            00          SORT<0>              SORT<0>
                        01          SORT<1>&SORT<4>      SORT<1>&SORT<4>
                        10          SORT<2>              SORT<2>
                        11          SORT<3>&SORT<5>      SORT<3>&SORT<5>
4 or 8 (DDR or DDR2)    00          SORT<0>              SORT<0>
                        01          SORT<1>&SORT<4>      SORT<1>&SORT<5>
                        10          SORT<2>              SORT<2>
                        11          SORT<3>&SORT<5>      SORT<3>&SORT<4>


The circuit of FIG. 6A is one possible implementation of Table 3. Address signals ALB<0>, ALB<1> are ANDed by NAND gate 630 and inverter 634 to provide SORT<0>. Signals AL<0>, ALB<1> are ANDed by NAND gate 640 and inverter 644 to provide SORT<1>. Signals ALB<0>, AL<1> are ANDed by NAND gate 650 and inverter 654 to provide SORT<2>. Signals AL<0>, AL<1> are ANDed by NAND gate 660 and inverter 664 to provide SORT<3>. Pass gates 670, 674 are configured as a multiplexer selecting the output of gate 640 when INTERLEAVE is high, and the output of gate 660 when INTERLEAVE is low (when SEQUENTIALP is high). The multiplexer output is inverted by inverter 678 to provide SORT<4>. Pass gates 680, 684 are configured as a multiplexer selecting the output of gate 660 when INTERLEAVE is high, and the output of gate 640 when INTERLEAVE is low. The multiplexer output is inverted by inverter 688 to provide SORT<5>.
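The gate network of FIGS. 6A and 6B, as described in the text, can be modeled at the logic level and compared against Table 3. The helper names (`nand`, `inv`, `sort_signals`, `asserted`) are hypothetical, and the pass-gate multiplexers are modeled as ideal two-way selectors.

```python
def nand(a, b):
    return 1 - (a & b)


def inv(a):
    return 1 - a


def sort_signals(al1, al0, sequential, burstlength2):
    """Return [SORT<0>, ..., SORT<5>] for latched address bits AL<1:0>."""
    alb1, alb0 = inv(al1), inv(al0)
    # FIG. 6B: INTERLEAVE is forced high when the burst length is 2.
    interleave = nand(inv(burstlength2), sequential)
    # FIG. 6A: one NAND gate per A1A0 value, each followed by an inverter.
    g630 = nand(alb0, alb1)   # A1A0 = 00
    g640 = nand(al0, alb1)    # A1A0 = 01
    g650 = nand(alb0, al1)    # A1A0 = 10
    g660 = nand(al0, al1)     # A1A0 = 11
    mux_670_674 = g640 if interleave else g660   # inverted to give SORT<4>
    mux_680_684 = g660 if interleave else g640   # inverted to give SORT<5>
    return [inv(g630), inv(g640), inv(g650), inv(g660),
            inv(mux_670_674), inv(mux_680_684)]


def asserted(signals):
    """Indices of the SORT signals that are high."""
    return [i for i, s in enumerate(signals) if s]


# A1A0 = 01, burst length 4 or 8:
print(asserted(sort_signals(0, 1, sequential=0, burstlength2=0)))  # [1, 4]
print(asserted(sort_signals(0, 1, sequential=1, burstlength2=0)))  # [1, 5]
# Burst length 2: the burst type is "don't care".
print(asserted(sort_signals(0, 1, sequential=1, burstlength2=1)))  # [1, 4]
```

In effect SORT<4> and SORT<5> are copies of SORT<1> and SORT<3> that swap roles between the interleaved and sequential types, which is why only the rows with A0=1 differ between the last two columns of Table 3.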


Signals SORTB<0:5> are obtained by inverting SORT<0:5> with inverters (not shown).


In FIG. 5A, pass gate 530 is closed (conducting) when SORT<0> is high, and the pass gate is open otherwise. Pass gate 534 is closed when SORT<1> is high, and the pass gate is open otherwise. Pass gates 538, 542 are closed when the respective signals SORT<2>, SORT<3> are high, and the pass gates are open otherwise. In FIG. 5B, pass gates 530, 534, 538, 542 are closed when the respective signals SORT<4>, SORT<0>, SORT<5>, SORT<2> are high, and the pass gates are open otherwise. In FIG. 5C, pass gates 530, 534, 538, 542 are closed when the respective signals SORT<2>, SORT<3>, SORT<0>, SORT<1> are high, and the pass gates are open otherwise. In FIG. 5D, pass gates 530, 534, 538, 542 are closed when the respective signals SORT<5>, SORT<2>, SORT<4>, SORT<0> are high, and the pass gates are open otherwise.
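The pass-gate assignments of FIGS. 5A-5D can be checked against Table 2 with a behavioral sketch. The names `READ_GATES`, `sort_signals`, and `read_sort` are hypothetical; `sort_signals` is a condensed form of the burst length 4 or 8 rows of Table 3, and the pass gates are assumed to connect to lines G0E, G0D, G1E, G1D in the order 530, 534, 538, 542.

```python
G_LINES = ("G0E", "G0D", "G1E", "G1D")  # arrays with A1A0 = 00, 01, 10, 11

# SORT index controlling pass gates 530, 534, 538, 542 in each circuit 510.
READ_GATES = {
    "IR0": (0, 1, 2, 3),  # FIG. 5A
    "IF0": (4, 0, 5, 2),  # FIG. 5B
    "IR1": (2, 3, 0, 1),  # FIG. 5C
    "IF1": (5, 2, 4, 0),  # FIG. 5D
}


def sort_signals(start, sequential):
    """SORT<0:5> for burst length 4 or 8 (condensed from Table 3)."""
    s = [int(start == k) for k in range(4)]
    s.append(s[3] if sequential else s[1])  # SORT<4>
    s.append(s[1] if sequential else s[3])  # SORT<5>
    return s


def read_sort(g_data, start, sequential):
    """Return the value each I-line receives from the G-lines."""
    sort = sort_signals(start & 0b11, sequential)
    out = {}
    for i_line, gates in READ_GATES.items():
        closed = [g for g, idx in zip(G_LINES, gates) if sort[idx]]
        assert len(closed) == 1, "exactly one pass gate should conduct"
        out[i_line] = g_data[closed[0]]
    return out


# Label each G-line with the A1A0 value of the array it serves.
g_data = {"G0E": 0, "G0D": 1, "G1E": 2, "G1D": 3}
print(read_sort(g_data, 0b01, sequential=True))
# {'IR0': 1, 'IF0': 2, 'IR1': 3, 'IF1': 0}
```

Reading the result in the output order IR0, IF0, IR1, IF1 gives 1-2-3-0, the sequential entry of Table 2 for starting address 01. In every case exactly one pass gate per multiplexer conducts, so each I-line sees a single G-line.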


If the memory has multiple DQ terminals, e.g. N such terminals, each circuit 510 may contain a multiplexer circuit consisting of N multiplexers 520. Each multiplexer will be identical to a respective multiplexer 520 of FIG. 5A, 5B, 5C or 5D except for its data inputs and outputs. For example, in the case of FIG. 5A, N lines IR0<0:N-1> can be provided, one line for each DQ terminal. Likewise, there can be N lines G0E<0:N-1>, N lines G0D<0:N-1>, N lines G1E<0:N-1>, and N lines G1D<0:N-1>. The circuit 510-R0 will have N multiplexers 520, which can be labeled, for example, as 520.0, . . . 520.N-1. Each multiplexer 520.i will select one of the lines G0E<i>, G0D<i>, G1E<i>, G1D<i> and will couple the selected line to the line IR0<i>. All the multiplexers 520.i will receive the same SORT signals as in FIG. 5A.



FIGS. 7A-7D illustrate portions of the write sorting circuitry in circuit 140. The circuitry is controlled by the SORT signals (FIG. 6A), and the Table 3 above applies to both the read and the write operations. Circuits 710-0E (FIG. 7A), 710-0D (FIG. 7B), 710-1E (FIG. 7C), 710-1D (FIG. 7D) drive respective G-lines G0E, G0D, G1E, G1D. These four circuits 710 are identical except for the input signals at the gates of pass gates 730, 734, 738, 742. Each of these circuits 710 includes a multiplexer 720 selecting one of the lines WD0R, WD0F, WD1R, WD1F (FIG. 2). MUX 720 consists of four pass gates 730, 734, 738, 742. Each of these pass gates has one source/drain terminal connected to the respective line WD0R, WD0F, WD1R, or WD1F, and the other source/drain terminal connected to the multiplexer output node 750. The four pass gates connected in parallel provide a low delay data path (one transistor delay). The invention is not limited to this structure however.


The signal on node 750 is inverted by inverter 764. The output of inverter 764 is connected to a source/drain terminal of pass gate 768. Pass gate 768 is closed when a signal GWENL is high, and is open otherwise. Signal GWENL is used to capture and latch data following the write command in a clock cycle determined by the write latency specified by mode register 284 of FIG. 2 (the write latency is the CAS latency minus one clock cycle in DDR2). The signal L− at the PMOS gate of pass gate 768 is the inverse (the complement) of signal GWENL. The other source/drain terminal of pass gate 768 is connected to one terminal of a latch consisting of cross coupled inverters 772, 776. The other latch terminal is the input of a tri-state driver 780. Driver 780 drives the respective G-line G0E, G0D, G1E, or G1D when signal GWDRV is high. Driver 780 is disabled (high impedance) when GWDRV is low. In the driver, the signal from the latch 772, 776 is provided to one input of two-input NAND gate 784 and one input of two-input NOR gate 788. The other inputs of gates 784, 788 receive respective complementary signals GWDRV, DRV-. The outputs of gates 784, 788 are connected respectively to the gates of PMOS transistor 792 and NMOS transistor 796. PMOS transistor 792 has its source connected to voltage source VCC and its drain connected to the respective G-line. NMOS transistor 796 has its drain connected to the G-line and its source connected to ground (or some other reference voltage).


In FIGS. 7A-7D, the sorting circuits 710 receive the same two signals GWENL, GWDRV and their complements. In another embodiment, a separate pair of the GWENL, GWDRV signals is provided to each individual circuit 710, to allow selective enabling of some of the circuits 710 while disabling the remaining circuits 710. This is done to save power in the DDR operation described below in connection with Tables 4 and 5. The DDR operation has a prefetch of 2, so only two of the G-lines are needed to carry the write data, as described below.


In FIG. 7A, pass gate 730 is closed when SORT<0> is high, and the pass gate is open otherwise. Pass gate 734 is closed when SORT<4> is high, and the pass gate is open otherwise. Pass gates 738, 742 are closed when the respective signals SORT<2>, SORT<5> are high, and the pass gates are open otherwise. In FIG. 7B, pass gates 730, 734, 738, 742 are closed when the respective signals SORT<1>, SORT<0>, SORT<3>, SORT<2> are high, and the pass gates are open when these respective signals are low. In FIG. 7C, pass gates 730, 734, 738, 742 are closed when the respective signals SORT<2>, SORT<5>, SORT<0>, SORT<4> are high, and the pass gates are open otherwise. In FIG. 7D, pass gates 730, 734, 738, 742 are closed when the respective signals SORT<3>, SORT<2>, SORT<1>, SORT<0> are high, and the pass gates are open otherwise.


If the memory has multiple DQ terminals, e.g. N such terminals, each circuit 710 may contain a multiplexer circuit consisting of N multiplexers 720. Each multiplexer will be identical to a respective multiplexer 720 of FIG. 7A, 7B, 7C or 7D except for its data inputs and outputs. For example, in the case of FIG. 7A, N lines G0E<0:N-1> can be provided, one line for each DQ terminal. Likewise, there can be N lines WD0R<0:N-1>, N lines WD0F<0:N-1>, N lines WD1R<0:N-1>, and N lines WD1F<0:N-1>. The circuit 710-0E will have N multiplexers 720, which can be labeled, for example, as 720.0, . . . 720.N-1. Each multiplexer 720.i will select one of the lines WD0R<i>, WD0F<i>, WD1R<i>, WD1F<i> and will couple the selected line to the line G0E<i>. All the multiplexers 720.i will receive the same SORT signals as in FIG. 7A.



FIGS. 8A-8D illustrate one embodiment of S/P converter 270. Signal DQS (FIGS. 8B, 8D) is an input data strobe. The data on terminal DQ are latched by buffer 260 on each edge of DQS, as defined in the DDR2 standard and shown in FIG. 8D. Signal CLK is a clock signal, called CK in the DDR2 standard. The DQ data provided on the rising CLK edge are latched when DQS is high, and data provided on the falling CLK edge are latched when DQS is low. DI (FIG. 8A) is the output of buffer 260 (FIG. 2).


The circuits of FIGS. 8B, 8C generate control signals for the circuit of FIG. 8A. As shown in FIG. 8B, the DQS signal is inverted by inverter 804 to provide a signal C− on the inverter output. Signal C− is inverted by inverter 806 to provide a signal C. Signal DQSFFENB is asserted (active low) to enable DQS latching by the memory. The DQS latching circuitry is not shown. DQSFFENB and DQS are NORed by NOR gate 810 to provide a signal CDQS−. CDQS− is inverted by inverter 814 to provide CDQS (“controlled DQS”).
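As a rough illustration, the logic of FIG. 8B as described above can be modeled with ideal, zero-delay gates. The function name `dqs_controls` and the dictionary keys are assumptions made for this sketch; the real circuit has gate delays (for example between DQS and C) that the model ignores.

```python
def dqs_controls(dqs: int, dqsffenb: int) -> dict:
    """Zero-delay model of the control signals derived from DQS and DQSFFENB."""
    c_n = 1 - dqs                    # inverter 804: signal C-
    c = 1 - c_n                      # inverter 806: signal C
    cdqs_n = 1 - (dqsffenb | dqs)    # NOR gate 810: signal CDQS-
    cdqs = 1 - cdqs_n                # inverter 814: signal CDQS
    return {"C-": c_n, "C": c, "CDQS-": cdqs_n, "CDQS": cdqs}


if __name__ == "__main__":
    for dqsffenb in (0, 1):
        for dqs in (0, 1):
            print(dqsffenb, dqs, dqs_controls(dqs, dqsffenb))
```

Under this model CDQS follows DQS while DQSFFENB is asserted (low) and stays high while DQSFFENB is deasserted, which is one way to read the name "controlled DQS".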


Signal IDRVENB (FIG. 8C) is a logic equivalent of DQSFFENB. IDRVENB is provided to an input of a chain of serially connected inverters 818, 820, 822, 824. The output signal of inverter 822 is labeled IWEN. The output signal of inverter 824 is labeled IWENB.


In FIG. 8A, the input DI is connected to one source/drain terminal of pass gate 830. The pass gate is closed when C is low, to pass a data item that was received on the DQ terminal when DQS was high (as can be seen in FIG. 8B, due to inverters 804, 806 there is a time delay between DQS and C). The other source/drain terminal of pass gate 830 is connected to one terminal of a latch consisting of cross-coupled inverters 832, 834. The other terminal of the latch is connected to the input of inverter 836. The inverter output is connected to a source/drain terminal of pass gate 840 which is closed when C is high. The other source/drain terminal of pass gate 840 is connected to one terminal of a latch consisting of cross-coupled inverters 842, 846. The other terminal of the latch is connected to the input of inverter 850. The inverter output is connected to one source/drain terminal of pass gate 852 which is closed when CDQS is low. The other source/drain terminal of the pass gate is connected to one terminal of a latch consisting of cross-coupled inverters 854, 856. The other latch terminal is connected to the input of inverter 858. The inverter output 860 is connected to the input of a tri-state driver driving the line IR0 when IWEN is high. The driver is disabled when IWEN is low. The driver includes a NAND gate 862 which NANDs the signal on node 860 with the signal IWEN, and a NOR gate 864 which NORs the signal on node 860 with IWENB. The outputs of gates 862, 864 are connected to the respective gates of PMOS transistor 866 and NMOS transistor 868. PMOS transistor 866 has its source connected to VCC and its drain connected to line IR0. NMOS transistor 868 has its drain connected to line IR0 and its source connected to ground.


Input DI is connected to one source/drain terminal of pass gate 870. The pass gate is closed when C is high, to enable latching of a data item that was received on the DQ terminal when DQS was low. The other source/drain terminal of pass gate 870 is connected to one terminal of a latch consisting of cross-coupled inverters 872, 874. The other terminal of the latch is connected to the input of inverter 876. The inverter output is connected to a source/drain terminal of pass gate 882 which is closed when CDQS is low. The other source/drain terminal of the pass gate is connected to one terminal of a latch consisting of cross-coupled inverters 884, 886. The other latch terminal is connected to the input of inverter 888. The inverter output 890 is connected to the input of a tri-state driver driving the line IF0 when IWEN is high. The driver is disabled when IWEN is low. The driver includes a NAND gate 892 which NANDs the signal on node 890 with the signal IWEN, and a NOR gate 894 which NORs the signal on node 890 with IWENB. The outputs of gates 892, 894 are connected to the respective gates of PMOS transistor 896 and NMOS transistor 898. PMOS transistor 896 has its source connected to VCC and its drain connected to line IF0. NMOS transistor 898 has its drain connected to line IF0 and its source connected to ground.


When DQS becomes high and then becomes low, two bits of the DQ data received on the respective rising and falling CLK edges are driven on the respective lines IR0, IF0. See the timing diagram in FIG. 11.



FIGS. 9A and 9B illustrate S/P converter 280. FIG. 9A shows a data path from line IR0 to lines WD0R, WD1R. FIG. 9B shows a data path from line IF0 to lines WD0F, WD1F. The two data paths are identical circuits controlled by signals WDENL, SWENL, and their complements WDENLB, SWENLB. These signals are described below. In each of FIGS. 9A, 9B, the I-line IR0 or IF0 is connected to the input of inverter 910. The inverter output signal passes through pass gate 920, which is closed when WDENL is high, to one terminal of a latch formed by cross-coupled inverters 924, 926. The other terminal of the latch is connected to respective line WD1R or WD1F. This terminal is also connected to one source/drain terminal of pass gate 930, which is closed when SWENL is high. The other source/drain terminal of the pass gate is connected to one terminal of a latch formed by cross-coupled inverters 934, 936. The other terminal of the latch is connected to the input of inverter 940 whose output is connected to respective line WD0R or WD0F.


The WDENL signal is driven high to couple the lines IR0, IF0 to the WD lines. In each burst write operation, SWENL is driven high for the first two data items of the burst, i.e. items D0, D1 in FIG. 11, so that D0 is driven on WD0R and WD1R and D1 is driven on WD0F and WD1F. SWENL is low for the next two data items D2, D3 so that D2 is driven on WD1R and D3 is driven on WD1F while the items D0, D1 continue to be driven on WD0R, WD0F. If the burst length is 8 to write consecutive data D0-D7, SWENL is high for D4, D5 and low for D6, D7. As a result, D4 is initially driven on WD0R, WD1R, and D5 is initially driven on WD0F, WD1F, but then D6 and D7 overwrite D4 and D5 on the respective lines WD1R, WD1F so that the four data items D4-D7 are driven on the respective lines WD0R, WD0F, WD1R, WD1F in parallel.
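The effect of SWENL on the WD lines can be sketched behaviorally as follows. This is a simplified model, not the circuit itself: the function name `sp_converter_280` is hypothetical, each input pair stands for one (rising-edge, falling-edge) pair of data items arriving on the I-lines, and SWENL is assumed to be high for every even-numbered pair of a burst, as in the DDR2 case described above.

```python
def sp_converter_280(pairs):
    """Return the state of the four WD lines after each incoming data pair."""
    wd = {"WD0R": None, "WD0F": None, "WD1R": None, "WD1F": None}
    history = []
    for index, (rise, fall) in enumerate(pairs):
        swenl_high = index % 2 == 0          # assumed: high for pairs 0, 2, ...
        # The first latch always follows the incoming pair.
        wd["WD1R"], wd["WD1F"] = rise, fall
        # The second latch is updated only while SWENL is high.
        if swenl_high:
            wd["WD0R"], wd["WD0F"] = rise, fall
        history.append(dict(wd))
    return history


if __name__ == "__main__":
    burst = [("D0", "D1"), ("D2", "D3"), ("D4", "D5"), ("D6", "D7")]
    for state in sp_converter_280(burst):
        print(state)
```

After the second pair the four lines hold D0-D3 in parallel, and after the fourth pair they hold D4-D7, as in the burst length 8 example above.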



FIG. 17 shows burst write signal timing for two bursts of burst length 4 and write latency 1 as defined in the DDR2 standard. A write command (WRC) is issued on the rising edge of a clock cycle T, and another write command is issued on the rising edge of clock cycle T+2. For the write command in cycle T, DQS is active in cycles T+1 and T+2 to enable the writing of two data items (the burst length is 4). Signal AWSCLM05 is driven high in write burst operations. WDENL=(NOT CLK) AND AWSCLM05.



FIG. 10 is a timing diagram of two consecutive interleave read operations with a burst length of 4. DATAOUT is the DQ signal. A read command is issued in a clock cycle T with A<1:0>=01. Four data items D0-D3 are read out in parallel from one of the memory banks 210 and driven on the G-lines as a result of the read command. The SORT signals become valid around the same time that the data are driven onto the G-lines. D0-D3 are transferred in parallel to the I-lines, and then read out to the DQ terminal on the edges of clocks T+3 and T+4. DQS is driven high for the rising edge data, and low for the falling edge data, in accordance with the DDR2 standard.


Another read command is issued in clock cycle T+3 with A<1:0>=10. The read operation timing is the same as for the previous read.



FIG. 11 is a timing diagram of two consecutive burst write operations for a sequential burst type and a burst length of 4. DATAIN is the DQ signal. A write command is issued in a clock cycle T with A<1:0>=01. Four data items D0-D3 are latched from the DQ terminal on the rising and falling edges of clocks T+1, T+2 synchronously with the DQS signal, as specified in the DDR2 standard. Upon the falling edge of the DQS signal after the rising edge of clock cycle T+1, data items D0, D1 are driven on respective lines IR0, IF0 as described above, and then on respective lines WD0R, WD0F. Upon the falling edge of the DQS signal after the rising edge of clock cycle T+2, data items D2, D3 are driven on respective lines IR0, IF0, and then on respective lines WD1R, WD1F. The SORT signals become valid in cycle T+2, and the data are transferred to the G-lines and written to one of the memory banks. Another write command is issued in cycle T+3 with A<1:0>=10, and is performed with a similar timing.



FIG. 12 is a block diagram of Y select and decoding circuitry suitable for the DDR2 functionality. FIG. 13 is a block diagram of Y select and decoding circuitry suitable for a memory providing both the DDR2 and the DDR functionality. Identical circuits can be used for the four memory banks, and only one memory bank is shown. Y select circuit 130 includes four circuits 130.00, 130.01, 130.10, 130.11 for the respective arrays 110.00, 110.01, 110.10, 110.11. Each of these circuits 130.ij has pass transistors 1210 coupling the bitlines BL of the respective array to the respective line YS. The column address is denoted as A<c:0>, and its latched version as AL<c:0>. The memory bank 210 is selected by the row address. Bits AL<1:0> select an array 110.ij out of the four arrays of the memory bank. The remaining bits AL<c:2> select a column within the array. The column contains one bitline or a pair of bitlines for each DQ terminal. In FIG. 12, the columns having the same column address within the four arrays are activated simultaneously, so the gates of the pass transistors for these columns are tied together. Thus, each output of Y decoder 1220 is shown connected to four pass transistor gates in the respective four circuits 130.00-130.11. Y decoder 1220 receives column address signals AL<c:3> and a signal A2D generated by circuit 1230 from column address signal AL2 (i.e. AL<2>). If the DDR2 burst length is 4, then A2D=AL2. If the burst length is 8, then A2D=AL2 for the prefetch of the first four data items (i.e. when the first four data items are being transferred between the arrays 110 and the G-lines), and A2D is the inverse of AL2 for the prefetch of the last four data items. Y decoder 1220 includes a number of AND gate circuits that perform AND operations on groups of address signals and their complements in a manner known in the art. The Y decoder outputs are connected to the gates of pass transistors 1210 as shown.
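The rule for A2D stated above can be written as a short function. This is only a sketch of the stated rule, not of circuit 1230 itself; the function name `a2d` and the `prefetch_index` parameter (0 for the first four data items, 1 for the last four) are assumptions introduced for illustration.

```python
def a2d(al2: int, burst_length: int, prefetch_index: int = 0) -> int:
    """Value of A2D for a DDR2 burst, following the rule stated in the text."""
    if burst_length == 4:
        return al2                       # single prefetch: A2D = AL2
    if burst_length == 8:
        # First prefetch uses AL2, second prefetch uses its inverse.
        return al2 if prefetch_index == 0 else 1 - al2
    raise ValueError("this sketch covers only DDR2 burst lengths 4 and 8")


if __name__ == "__main__":
    print(a2d(1, 4), a2d(1, 8, 0), a2d(1, 8, 1))
```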



FIG. 13 shows the Y circuitry suitable for both the DDR2 operation (Table 2, prefetch of 4) and the DDR operation (Table 1, prefetch of 2). In the DDR mode, only two of arrays 110.ij are accessed at a time, and further the DDR sequential type bursts of burst length 8 may require simultaneous activation of columns with different address bits A<c:2>. Therefore, the gates of pass transistors 1210 in different circuits 130.ij are not tied together. Y decoder circuit 1310 generates the signals for the gates of pass transistors 1210 from the address bits AL<c:2>. In the DDR2 operation, the same signals can be generated as in FIG. 12. In the DDR operation, the signals are generated as defined by Table 1 and explained immediately below.


In the DDR burst read operation, two data items are read from two of the arrays 110.ij in the selected bank to the respective G-lines. Sorting circuit 140 (FIGS. 2, 5A-7D) transfers the data items to the lines IR0, IF0 in accordance with Table 1. P/S converter 240 converts the data to the serial format, and D0 buffer 250 sequentially provides the data on the DQ terminal on the rising and falling edges of a clock cycle synchronously with the DQS signal, as defined in the DDR standard.


In a burst write operation, buffer 260 latches the data item pairs received on the rising and falling clock edges. S/P converter 270 drives each data item pair on the lines IR0 (rising edge data), IF0 (falling edge data). In S/P 280 (FIGS. 9A, 9B), the signal SWENL is forced DC high in the DDR operation. Therefore, the rising edge data item is driven on both lines WD0R, WD1R, and the falling edge data item is driven on both lines WD0F, WD1F. Because each data item is provided on two of the lines, the design of sorting circuit 140 is simplified, and in particular the same SORT signals can be used for the DDR and DDR2 operation for the burst lengths of 4 and 8 as shown above in Table 3.


Tables 4 and 5 below show the G-lines for the DDR operation. The first column (Burst Length) is the same as in Table 1. In the second column (A1A0, Data Sequence, or A2A1A0, Data Sequence), A1A0 or A2A1A0 is the starting address. The Data Sequence is as in the last two columns (data sequence columns) in Table 1. Table 4 includes the interleaved type data sequences, and Table 5 the sequential type sequences.


The last five columns show the correspondence between the WD lines and the G-lines in different prefetch clock cycles. A prefetch clock cycle is a cycle in which data are transferred between the arrays 110 and the G-lines. If the burst length is 2, only one prefetch cycle CLK0 is present. For the burst length of 4, two prefetch clock cycles CLK0 and CLK1 are present. For the burst length of 8, four prefetch cycles CLK0, CLK1, CLK2, and CLK3 are present.


For the burst length of 2, starting address A1A0=00, the data sequence is 0-1. The data from lines WD0R, WD0F, WD1R, WD1F are transferred to the respective lines G0E, G0D, G1E, G1D as defined by the SORT signals (Table 3 and FIGS. 6A-7D). In the data sequence 0-1, the line G0E carries the data item 0, and G0D carries data item 1. This is shown as G0E(0), G0D(1) in Tables 4 and 5. The lines G1E, G1D will not be coupled to the arrays due to the action of the Y circuitry (FIG. 13). This is shown as G1E(none), G1D(none).


For A1A0=01, the operation is similar. For A1A0=10, the data sequence is shown as “2-3” instead of “0-1” because A1=1. The correspondence between the WD lines and the G-lines is the same as for A1A0=00, but this time the data from lines G1E (item 2) and G1D (item 3) are written to the arrays. Lines G0E, G0D carry the same data (because the lines WD0R, WD0F carry the same data as WD1R, WD1F) but lines G0E, G0D are not coupled to the arrays by the Y circuitry.


For A1A0=11, the operation is similar. The burst length 2 entries are the same in Tables 4 and 5.


For the burst length of 4 in Table 4, A1A0=00, the lines WD0R, WD0F, WD1R, WD1F are coupled to respective lines G0E, G0D, G1E, G1D. In clock CLK0, lines G0E (data sequence item 0) and G0D (item 1) are coupled to the respective arrays 110.00 and 110.01. In clock CLK1, lines G1E (item 2) and G1D (item 3) are coupled to the respective arrays 110.10, 110.11. The operation for the remaining starting addresses is similar. Lines G0E, G0D, G1E, G1D always carry the respective items 0, 1, 2, 3 of the data sequence.


For the burst length of 8, if A2=0, the data lines G0E, G0D, G1E, G1D carry the respective items 0-3 in cycles CLK0, CLK1, and the respective items 4-7 in cycles CLK2, CLK3. If A2=1, the lines G0E, G0D, G1E, G1D carry the respective items 4-7 in cycles CLK0, CLK1, and the respective items 0-3 in cycles CLK2, CLK3. Therefore, if A2=0, Y decoder 1310 (FIG. 13) selects the columns with A2=0 in cycles CLK0, CLK1, and the columns with A2=1 in cycles CLK2, CLK3. If A2=1, Y decoder 1310 (FIG. 13) selects the columns with A2=1 in cycles CLK0, CLK1, and the columns with A2=0 in cycles CLK2, CLK3.


In Table 5, for the burst length of 4, lines G0E, G0D, G1E, G1D always carry the respective items 0, 1, 2, 3. For the burst length of 8, line G0E carries item 0 or 4, line G0D carries item 1 or 5, line G1E carries item 2 or 6, and line G1D carries item 3 or 7. The Y circuitry may have to activate columns with different A2 bits in the same clock cycle. For example, for the starting address 001, clock CLK1, the lines G1D, G0E carry the respective items 3 (A2=0) and 4 (A2=1).
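The entries of Tables 4 and 5 follow a regular pattern that can be reproduced programmatically. The sketch below is an illustration under stated assumptions, not part of the described circuitry: the names `data_sequence` and `ddr_data_path` are hypothetical, the starting address is passed as an integer (A1A0 or A2A1A0), the interleaved sequence is taken as the starting address XORed with the item index, and the sequential sequence wraps within the burst length. Item k is assumed to map to the G-line at position k mod 4 in the order G0E, G0D, G1E, G1D, and "not used" columns are simply omitted.

```python
G_LINES = ("G0E", "G0D", "G1E", "G1D")
WD_LINES = (("WD0R", "WD0F"), ("WD1R", "WD1F"))


def data_sequence(start: int, burst_length: int, burst_type: str) -> list:
    """Data sequence for a burst, labeled as in Tables 4 and 5."""
    if burst_type == "interleaved":
        return [start ^ i for i in range(burst_length)]
    if burst_type == "sequential":
        mask = burst_length - 1
        base = start & ~mask             # address bits above the wrap boundary
        return [base | ((start + i) & mask) for i in range(burst_length)]
    raise ValueError(burst_type)


def ddr_data_path(start: int, burst_length: int, burst_type: str) -> list:
    """One row per prefetch clock cycle: which G-line each WD line feeds."""
    seq = data_sequence(start, burst_length, burst_type)
    rows = []
    for cycle in range(burst_length // 2):
        rise, fall = seq[2 * cycle], seq[2 * cycle + 1]
        wd_r, wd_f = WD_LINES[cycle % 2]  # WD0x in even cycles, WD1x in odd cycles
        rows.append({
            "cycle": f"CLK{cycle}",
            wd_r: f"{G_LINES[rise % 4]}({rise})",
            wd_f: f"{G_LINES[fall % 4]}({fall})",
        })
    return rows


if __name__ == "__main__":
    for row in ddr_data_path(0b001, 8, "sequential"):
        print(row)
```

For the sequential starting address 001 with a burst length of 8, the CLK1 row pairs G1D(3) with G0E(4), which is the case of mixed A2 bits mentioned above.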


A memory may provide both the DDR and DDR2 operations, or the memory may be configurable by a metal mask option, a fuse, or an input signal to provide only the DDR or DDR2 operation but not both.









TABLE 4

DDR INTERLEAVED TYPE DATA PATH

| Burst Len | A1A0 (A2A1A0 for burst length 8), Data Sequence | Prefetch Clock Cycle | WD0R | WD0F | WD1R | WD1F |
|---|---|---|---|---|---|---|
| 2 | 00, 0-1 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | 01, 1-0 | CLK0 | G0D(1) | G0E(0) | not used | not used |
| | 10, 2-3 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | 11, 3-2 | CLK0 | G1D(3) | G1E(2) | not used | not used |
| 4 | 00, 0-1-2-3 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | | CLK1 | not used | not used | G1E(2) | G1D(3) |
| | 01, 1-0-3-2 | CLK0 | G0D(1) | G0E(0) | not used | not used |
| | | CLK1 | not used | not used | G1D(3) | G1E(2) |
| | 10, 2-3-0-1 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | | CLK1 | not used | not used | G0E(0) | G0D(1) |
| | 11, 3-2-1-0 | CLK0 | G1D(3) | G1E(2) | not used | not used |
| | | CLK1 | not used | not used | G0D(1) | G0E(0) |
| 8 | 000, 0-1-2-3-4-5-6-7 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | | CLK1 | not used | not used | G1E(2) | G1D(3) |
| | | CLK2 | G0E(4) | G0D(5) | not used | not used |
| | | CLK3 | not used | not used | G1E(6) | G1D(7) |
| | 001, 1-0-3-2-5-4-7-6 | CLK0 | G0D(1) | G0E(0) | not used | not used |
| | | CLK1 | not used | not used | G1D(3) | G1E(2) |
| | | CLK2 | G0D(5) | G0E(4) | not used | not used |
| | | CLK3 | not used | not used | G1D(7) | G1E(6) |
| | 010, 2-3-0-1-6-7-4-5 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | | CLK1 | not used | not used | G0E(0) | G0D(1) |
| | | CLK2 | G1E(6) | G1D(7) | not used | not used |
| | | CLK3 | not used | not used | G0E(4) | G0D(5) |
| | 011, 3-2-1-0-7-6-5-4 | CLK0 | G1D(3) | G1E(2) | not used | not used |
| | | CLK1 | not used | not used | G0D(1) | G0E(0) |
| | | CLK2 | G1D(7) | G1E(6) | not used | not used |
| | | CLK3 | not used | not used | G0D(5) | G0E(4) |
| | 100, 4-5-6-7-0-1-2-3 | CLK0 | G0E(4) | G0D(5) | not used | not used |
| | | CLK1 | not used | not used | G1E(6) | G1D(7) |
| | | CLK2 | G0E(0) | G0D(1) | not used | not used |
| | | CLK3 | not used | not used | G1E(2) | G1D(3) |
| | 101, 5-4-7-6-1-0-3-2 | CLK0 | G0D(5) | G0E(4) | not used | not used |
| | | CLK1 | not used | not used | G1D(7) | G1E(6) |
| | | CLK2 | G0D(1) | G0E(0) | not used | not used |
| | | CLK3 | not used | not used | G1D(3) | G1E(2) |
| | 110, 6-7-4-5-2-3-0-1 | CLK0 | G1E(6) | G1D(7) | not used | not used |
| | | CLK1 | not used | not used | G0E(4) | G0D(5) |
| | | CLK2 | G1E(2) | G1D(3) | not used | not used |
| | | CLK3 | not used | not used | G0E(0) | G0D(1) |
| | 111, 7-6-5-4-3-2-1-0 | CLK0 | G1D(7) | G1E(6) | not used | not used |
| | | CLK1 | not used | not used | G0D(5) | G0E(4) |
| | | CLK2 | G1D(3) | G1E(2) | not used | not used |
| | | CLK3 | not used | not used | G0D(1) | G0E(0) |
















TABLE 5

DDR SEQUENTIAL TYPE DATA PATH

| Burst Len | A1A0 (A2A1A0 for burst length 8), Data Sequence | Prefetch Clock Cycle | WD0R | WD0F | WD1R | WD1F |
|---|---|---|---|---|---|---|
| 2 | 00, 0-1 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | 01, 1-0 | CLK0 | G0D(1) | G0E(0) | not used | not used |
| | 10, 2-3 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | 11, 3-2 | CLK0 | G1D(3) | G1E(2) | not used | not used |
| 4 | 00, 0-1-2-3 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | | CLK1 | not used | not used | G1E(2) | G1D(3) |
| | 01, 1-2-3-0 | CLK0 | G0D(1) | G1E(2) | not used | not used |
| | | CLK1 | not used | not used | G1D(3) | G0E(0) |
| | 10, 2-3-0-1 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | | CLK1 | not used | not used | G0E(0) | G0D(1) |
| | 11, 3-0-1-2 | CLK0 | G1D(3) | G0E(0) | not used | not used |
| | | CLK1 | not used | not used | G0D(1) | G1E(2) |
| 8 | 000, 0-1-2-3-4-5-6-7 | CLK0 | G0E(0) | G0D(1) | not used | not used |
| | | CLK1 | not used | not used | G1E(2) | G1D(3) |
| | | CLK2 | G0E(4) | G0D(5) | not used | not used |
| | | CLK3 | not used | not used | G1E(6) | G1D(7) |
| | 001, 1-2-3-4-5-6-7-0 | CLK0 | G0D(1) | G1E(2) | not used | not used |
| | | CLK1 | not used | not used | G1D(3) | G0E(4) |
| | | CLK2 | G0D(5) | G1E(6) | not used | not used |
| | | CLK3 | not used | not used | G1D(7) | G0E(0) |
| | 010, 2-3-4-5-6-7-0-1 | CLK0 | G1E(2) | G1D(3) | not used | not used |
| | | CLK1 | not used | not used | G0E(4) | G0D(5) |
| | | CLK2 | G1E(6) | G1D(7) | not used | not used |
| | | CLK3 | not used | not used | G0E(0) | G0D(1) |
| | 011, 3-4-5-6-7-0-1-2 | CLK0 | G1D(3) | G0E(4) | not used | not used |
| | | CLK1 | not used | not used | G0D(5) | G1E(6) |
| | | CLK2 | G1D(7) | G0E(0) | not used | not used |
| | | CLK3 | not used | not used | G0D(1) | G1E(2) |
| | 100, 4-5-6-7-0-1-2-3 | CLK0 | G0E(4) | G0D(5) | not used | not used |
| | | CLK1 | not used | not used | G1E(6) | G1D(7) |
| | | CLK2 | G0E(0) | G0D(1) | not used | not used |
| | | CLK3 | not used | not used | G1E(2) | G1D(3) |
| | 101, 5-6-7-0-1-2-3-4 | CLK0 | G0D(5) | G1E(6) | not used | not used |
| | | CLK1 | not used | not used | G1D(7) | G0E(0) |
| | | CLK2 | G0D(1) | G1E(2) | not used | not used |
| | | CLK3 | not used | not used | G1D(3) | G0E(4) |
| | 110, 6-7-0-1-2-3-4-5 | CLK0 | G1E(6) | G1D(7) | not used | not used |
| | | CLK1 | not used | not used | G0E(0) | G0D(1) |
| | | CLK2 | G1E(2) | G1D(3) | not used | not used |
| | | CLK3 | not used | not used | G0E(4) | G0D(5) |
| | 111, 7-0-1-2-3-4-5-6 | CLK0 | G1D(7) | G0E(0) | not used | not used |
| | | CLK1 | not used | not used | G0D(1) | G1E(2) |
| | | CLK2 | G1D(3) | G0E(4) | not used | not used |
| | | CLK3 | not used | not used | G0D(5) | G1E(6) |










FIG. 14 is a timing diagram of two consecutive DDR interleave read operations with a burst length of 4 and a CAS latency of 3. A read command is issued in a clock cycle T with A<1:0>=01. Two data items D0, D1 are read out in parallel from the respective arrays 110.01, 110.00 of one of the memory banks 210 and driven on the respective lines G0D, G0E as a result of the read command. The SORT signals become valid around the same time that the data D0, D1 are driven onto the two G-lines. Data D0, D1 are transferred in parallel to respective I-lines IR0, IF0, and then serially to the DQ terminal on the respective rising and falling edges of clock T+3. DQS is driven high for the rising edge data, and low for the falling edge data, in accordance with the DDR standard. I-lines IR1, IF1 are unused in the DDR read operations.


As a result of the read command in clock cycle T and of the rising edge of clock T+1, two data items D2, D3 are read out in parallel from the respective arrays 110.11, 110.10 and driven on the respective lines G1D, G1E. Data D2, D3 are transferred in parallel to respective I-lines IR0, IF0, and then read out to the DQ terminal on the respective rising and falling edges of clock T+4. DQS is driven high for the rising edge data, and low for the falling edge data.


Another read command is issued in clock cycle T+3 with A<1:0>=10. The read operation timing is similar.



FIG. 15 is a timing diagram of two consecutive burst write operations for a sequential burst type and a burst length of 4. A write command is issued in a clock cycle T with A<1:0>=01. DQSFFENB becomes asserted to enable the DQS latching, and four data items D0-D3 are latched from the DQ terminal on the rising and falling edges of clocks T+1, T+2 synchronously with the DQS signal, as specified in the DDR standard. Upon the falling edge of the DQS signal after the rising edge of clock cycle T+1, data items D0, D1 are driven onto respective lines IR0, IF0 as described above. Then item D0 is transferred to both lines WD0R, WD1R, and item D1 is transferred to both lines WD0F, WD1F. Upon the falling edge of the DQS signal after the rising edge of clock cycle T+2, data items D2, D3 are driven onto respective lines IR0, IF0, and then onto respective lines WD0R/WD1R, WD0F/WD1F. The SORT signals become valid in cycle T+1. The signals GWENL of circuits 710-0D, 710-1E are pulsed as a result of the rising edge of T+2, and the data items D0, D1 are transferred to respective lines G0D, G1E and then written in parallel to the respective arrays 110.01, 110.10 of one of the memory banks. The signals GWENL of circuits 710-0E, 710-1D are pulsed as a result of the rising edge of T+3, and the data items D2, D3 are transferred to respective lines G1D, G0E and then written in parallel to the respective arrays 110.11, 110.00 of the memory bank. Another write command is issued in cycle T+3 with A<1:0>=10, and is performed with a similar timing.


The invention is not limited to the embodiments described above. For example, the burst operations of Tables 1-5 can be provided in a single data rate memory, or in a memory with one data item read or written per clock cycle, per two clock cycles, or per any number of clock cycles. Different portions of sorting circuit 140 can be located in different parts of the memory. For example, multiplexers 510 (FIGS. 5A-5D) may be grouped together in one part of the memory, and multiplexers 720 (FIGS. 7A-7D) in another part. The circuitry of FIGS. 2-13 is exemplary and not limiting. CMOS and non-CMOS circuits can be used. Each I-line or G-line can be formed from one conductive layer or from multiple conductive layers separated by dielectric layers and interconnected through openings in the dielectric. The invention is not limited to a particular type of memory cell. The invention is applicable to DRAM (pseudo-SRAM) cells disclosed in U.S. Pat. No. 6,285,578 issued Sep. 4, 2001 to Huang and incorporated herein by reference, and to other DRAM and non-DRAM memory cells, known or to be invented.


Write data is captured on every edge (rising and falling) of the data input strobe. This strobe is nominally coincident with the clock. However, the DDR2 specification dictates that a new address is only supplied at a maximum of every other clock cycle. This allows the memory to be characterized as a “4-bit prefetch” design. Given one address (read or write), four distinct bits of data per I/O pad can be written or read from the part.


A further embodiment of the present invention is directed at the data bus that connects the I/O buffers to the main banks (arrays) of the memory chip and is operable in both DDR1 (DDR) and DDR2 modes of operation. Typically the I/O buffers are located away from a central access point connecting the memory banks, and a data bus is provided to send all the data connecting these main parts of the memory chip.


In the DDR2 mode, the four pieces of input data must be synchronized to the correct input address. This is made more difficult because while the input data strobe is nominally aligned with the main clock, this alignment is not exact and typically has a +/−25% skew specification.


The data bus circuitry described in further detail below is logically correct and cannot be broken by ambiguity in the input data strobe during hi-Z periods at the start and end of write cycles. The additional power consumed by the data bus is low. The main clock-based signals are kept close to the main memory arrays and are not routed to the individual I/O buffer sites thereby saving area.


A 4-bit bus per I/O pad is used to connect the memory with the I/O block, but only two bits per I/O are utilized for writing. Four bits per I/O pad are preferred for reading.


Every time the input data strobe falls, the “last” two bits are transmitted over the bus. This eliminates the need for the precise counting of input data strobe pulses.


At the memory access point (the end of the data bus closest to the memory array and away from the I/O interface), signals based on the main chip clock determine the first two bits and the second two bits used for every given write address. The first two bits are temporarily stored for a cycle so they can be combined with the final two bits and be driven to the memory bus as a four-bit wide word for the actual write operation for that given address.
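The assembly of the four-bit word at the memory access point can be pictured with the following behavioral sketch. It rests on simplifying assumptions that are not taken from the circuit diagrams: the bus is modeled as a list holding one two-bit pair per clock cycle, `first_valid_cycle` stands for the cycle that the clock-based signals identify as carrying the first two bits, and the function name `assemble_write_word` is hypothetical.

```python
def assemble_write_word(bus_pairs, first_valid_cycle):
    """Combine the pair held for one cycle with the following pair."""
    held = bus_pairs[first_valid_cycle]        # first two bits, stored for a cycle
    final = bus_pairs[first_valid_cycle + 1]   # final two bits of the same address
    return held + final                        # four-bit wide word for the write


if __name__ == "__main__":
    # Pairs marked "x" stand for whatever the bus carried outside the burst.
    bus = [("x", "x"), ("D0", "D1"), ("D2", "D3"), ("x", "x")]
    print(assemble_write_word(bus, 1))
```

Because the selection is made by cycle position rather than by counting strobe pulses, pairs transmitted outside the valid cycles are simply never picked up in this model.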


Two bits of the bus toggling every cycle consume the same amount of power as all four bits toggling every other cycle.


Circuit diagrams associated with the data bus circuitry embodiment of the present invention are shown in FIGS. 9A, 9B and 17-20. These circuits are individually briefly described below. The interaction and operation of these circuits is described after the individual descriptions of the circuits, and also in conjunction with the descriptions of the timing diagrams of FIGS. 16 and 21-22.


A circuit diagram of the data input path for the integrated circuit memory is shown in FIGS. 9A and 9B. The circuit of FIGS. 9A and 9B includes input signals IR0 and IF0, control signals WDENLB, WDENL, SWENLB, SWENL, and output signals WD1R, WD0R, WD1F, and WD0F. A first circuit path includes input inverter I1, pass gate M1/M2, latch I2/I3, pass gate M7/M8, latch I4/I5, and output inverter I6. A second circuit path includes input inverter I7, pass gate M3/M4, latch I7/I8, pass gate M5/M6, latch I9/I10, and output inverter I11.


A circuit diagram of the write “G-line” enable circuit is shown in FIGS. 17 and 18. The circuit of FIG. 17 includes input signals AWSCLM05, Q, and Q−. The circuit of FIG. 17 generates output signals WSEN and WGEN. A first circuit path includes NAND gate U1 and inverter I1. A second circuit path includes NAND gate U2 and inverter I2. The circuit of FIG. 18 includes input signals DDR2, AWSCLM05, and JCLKWD. The circuit of FIG. 18 includes NAND gates U1 and U2 coupled to D-type flip-flop DFFNS.


A circuit diagram of the write cycle circuit is shown in FIG. 19. Input signals include JCLK, AWSCLM05, SWEN, ENSWEN, and GWENG. Output signals include JCLKWD and WDEN. The write cycle circuit includes NAND gate U1, inverter I1, inverter string I2-I9 selected with metal mask path options R1-R10, NOR gates U2-U4, inverters I10 and I11, pass gate M1/2, and transistor M3. Inverter string I2-I9 can be programmed to provide a predetermined delay for the purpose of trapping write data on the IR0 and IF0 lines relative to the main chip clock. Pass gate M1/2, inverter I10, and transistor M3 form a circuit for the purpose of disabling the SWEN input if the circuit is configured in DDR1 mode. (If DDR1 mode is set, “ENSWEN”=0).


The “G-line” write circuits are shown in FIG. 20. The “G-line” write circuit receives the JCCLK, JCLKWD, and WGEN signals, and generates a plurality of GWEN output signals. The “G-line” write circuit includes NAND gates U6-U9 coupled to inverters I6-I11, NAND gate U4 coupled to inverter I4, and NAND gate U5 coupled to inverter string I5, I10, and I11. The purpose of the “G-line” write circuit is to generate the SWENL and GWENx signals. The SWENL signal is the clock that fires in the first half of a DDR2 operation, moving the IR0 and IF0 data, trapped by the WDENL signal, to the temporary WD0R and WD0F holding place. The GWENx signals fire in the subsequent cycle, driving the four WD bits onto the four G-lines. There are actually five GWEN signals in an embodiment of the invention. The GWENm signal is used for the masking bits, and has no address associated with it. The GWEN<0:3> signals are only enabled based on the state of the CLEV and CLOD addresses. For the DDR2 mode of operation, all of these addresses are high and all GWEN<0:3> signals fire. For the DDR1 mode of operation, only two of these four C1 signals are high and therefore only two of the G-lines are driven by their designated GWENx driver circuits. Which of the two C1EV or OD signals are activated is a function of the starting column address.


Note that the total gate count used in generating the SWENL or GWENx signals from JCCLK is identical. This keeps the timing of the SWENL and GWENx signals identically matched relative to the chip clock (JCCLK), even though they never fire in the same cycle. The top half of the circuit generates the SWENL signal. SWENL is enabled if WSEN and JCLKWD=1 (U1) and JCCLK=1 (U2) and ENSWEN=1 (U3). The SWENL signal is a redriven version of the SWEN signal (I2 and I3). If ENSWEN=0 (DDR1 mode), then SWEN=1 all the time (U3).
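The enable condition described above can be summarized as a small truth function. The following Python sketch is a behavioral illustration only, not a gate-level model of FIG. 20; the function name `swen_level` and the use of Python booleans for signal levels are assumptions made for this example.

```python
def swen_level(wsen: bool, jclkwd: bool, jcclk: bool, enswen: bool) -> bool:
    """Behavioral sketch of the SWEN level (hypothetical helper, not the netlist)."""
    if not enswen:
        # DDR1 mode (ENSWEN=0): the description states SWEN is 1 all the time.
        return True
    # DDR2 mode: SWEN fires only when WSEN, JCLKWD, and JCCLK are all high.
    return bool(wsen and jclkwd and jcclk)
```

In this sketch, clearing `enswen` overrides the other three inputs, which mirrors the statement that SWEN is held on whenever the DDR1 configuration is selected.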


The GWENG signal is generated in the same way, except that the WGEN signal is used instead of the WSEN signal. The GWENG signal is then compared with the C1EV/OD addresses to generate the four GWEN<0:3> signals. The GWENM (masking) signal is a redriven version of the GWENG signal (I10 and I11).
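The address gating of the GWEN signals can be sketched behaviorally as follows. This Python example assumes that each of the four GWEN<0:3> outputs is the GWENG pulse qualified by one address enable; the function name `gwen_outputs`, the list encoding of the four C1 enables, and the index assignment of enables to outputs are assumptions for illustration and are not taken from FIG. 20.

```python
from typing import List, Sequence, Tuple

def gwen_outputs(gweng: bool, c1_enables: Sequence[bool]) -> Tuple[List[bool], bool]:
    """Return (GWEN<0:3>, GWENM) for one cycle (hypothetical behavioral helper)."""
    if len(c1_enables) != 4:
        raise ValueError("expected four C1 address enables")
    # Each GWENx fires only when GWENG fires and its address enable is high.
    gwen = [bool(gweng and en) for en in c1_enables]
    # The masking signal has no address qualification; it follows GWENG.
    gwenm = bool(gweng)
    return gwen, gwenm
```

With all four enables high (the DDR2 case described above), every GWENx follows GWENG. With only two enables high (the DDR1 case), only the two corresponding outputs fire, while the masking signal fires in both cases.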


In an embodiment of the present invention, the WGEN and WSEN signals are generated using the circuits shown in FIGS. 17 and 18. In the D-type flip-flop shown in FIG. 18, the D input is set to the inverted Q output (Q−) in the DDR2 mode of operation. This ensures that Q and Q− toggle every cycle (also known as a “toggle” flip-flop configuration). When Q is high and the write state is active (AWSCLM05), then WSEN is active (shown in FIG. 17), thereby generating the SWEN pulse. On the next cycle, Q− is active, so WGEN is valid, thereby generating a GWENx pulse. Thus, the SWEN and GWEN pulses fire on alternating cycles. It is important to note that, in the reset phase (non-writing phase), Q is high first, ensuring that the process always starts with a SWEN pulse, followed by a GWEN pulse. In the DDR1 mode of operation, the toggle flip-flop is disabled and both Q and Q− are high. The GWEN signals fire on every cycle in the DDR1 mode.
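The alternation of the two enables can be illustrated with a cycle-by-cycle sketch. The Python example below assumes that WSEN and WGEN are simply Q and Q− qualified by the write-active state, and that the flip-flop returns to Q high whenever writing is inactive; the function name `enable_sequence` and the list-of-booleans input are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

def enable_sequence(write_active: Sequence[bool], ddr2: bool = True) -> List[Tuple[bool, bool]]:
    """Return one (WSEN, WGEN) pair per cycle (hypothetical behavioral helper)."""
    q = True  # reset (non-writing) phase: Q is high first
    result: List[Tuple[bool, bool]] = []
    for active in write_active:
        if not active:
            q = True  # assumed: the flip-flop is held in reset while not writing
            result.append((False, False))
        elif ddr2:
            # Toggle configuration: WSEN follows Q, WGEN follows Q-.
            result.append((q, not q))
            q = not q
        else:
            # DDR1: toggle disabled, Q and Q- both high, so WGEN is valid every cycle.
            result.append((True, True))
    return result
```

For example, `enable_sequence([False, True, True, True, True])` yields an idle cycle followed by WSEN, WGEN, WSEN, WGEN in that order, which corresponds to the statement that the process always starts with a SWEN pulse.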


The timing diagram in FIG. 23 shows the JCLKWD, AWSCLM05, Q, Q−, WSEN, WGEN, SWEN, and GWEN signals as described above.


Following a WRITE command, the number of cycles that elapse before the input data is valid is variable. This is known as the “write latency”. Because the write latency is variable, all the timing for the data bus circuit according to the present invention is relative to the cycles in which input data is actually valid.


In operation, the DINFF circuit (shown in FIGS. 8A-C, with an accompanying timing diagram in FIG. 8D) traps the buffered data input signal from the I/O pin on each DQS edge, and outputs two new I-line data bits on each falling edge of the DQS signal. Only two of the four I-line bits per I/O are used in this write sequence.
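As a rough behavioral illustration of the trapping described above, the following Python sketch pairs the bit captured on each rising DQS edge with the bit captured on the following falling edge, and emits the pair at the falling edge. The function name `dinff_pairs`, the `(edge, bit)` input encoding, and the assignment of the rising-edge bit to IR0 and the falling-edge bit to IF0 are assumptions for this example; the actual circuit is the one shown in FIGS. 8A-C.

```python
from typing import Iterable, List, Optional, Tuple

def dinff_pairs(samples: Iterable[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return one (IR0, IF0) pair per falling DQS edge (hypothetical helper)."""
    rise_bit: Optional[int] = None
    pairs: List[Tuple[int, int]] = []
    for edge, bit in samples:
        if edge == "rise":
            rise_bit = bit  # trap the data present at the rising edge
        elif edge == "fall":
            if rise_bit is None:
                raise ValueError("falling edge seen before any rising edge")
            pairs.append((rise_bit, bit))  # output the two newest bits together
        else:
            raise ValueError(f"unknown edge {edge!r}")
    return pairs
```

Note that the sketch keeps no count of strobe pulses: each falling edge simply outputs the two most recently trapped bits.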


The two I-lines (IR0 and IF0) are input to the circuit of FIGS. 9A and 9B. The WDENL signal goes low some fixed delay from the rising chip clock edge. This fixed delay is such that the I-data from the previous cycle is trapped at that point. This trapped data is referred to as WD1R and WD1F. The WDENL signal remains low, keeping the WD1R/F lines fixed, until the subsequent SWEN or GWEN signal is finished. The WDENL signal is controlled such that it only falls on cycles following those in which data was expected to be gathered and output by the DQS strobe.


On the next cycle after the first cycle associated with the DQS strobe, the SWEN signal pulses. The SWEN signal moves the WD1 R/F data to the WD0 R/F lines respectively. The WD0 R/F lines are the temporary holding positions while the I-lines toggle again in the next cycle. When SWEN returns low, WDENL is allowed to go high and the ILAT input is opened up again to receive more “I” data. On the next cycle, two after the first DQS, WDENL again falls, trapping the I-line data into the WD1 R/F positions. On this second cycle, however, the GWEN signal pulses (not SWEN) and all four WD lines are driven to a “G” line, which connects to the array for writing into the selected sense amplifiers. Again, while the GWEN signal is active, the WDENL signal is held low so the WD data cannot change.
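The store-then-drive sequence described above can be sketched as a simple two-phase loop for the DDR2 case. In the Python example below, each input element stands for the (WD1R, WD1F) pair trapped in one cycle; the function name `ddr2_write_words` and the ordering of the four bits in each output word (WD0R, WD0F, WD1R, WD1F) are assumptions made for illustration, since the assignment of WD bits to individual G-lines is not specified here.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]
Word = Tuple[int, int, int, int]

def ddr2_write_words(trapped: Sequence[Pair]) -> List[Word]:
    """Combine trapped bit pairs into 4-bit G-line words (hypothetical helper)."""
    wd0: Pair = (0, 0)  # temporary holding position, written on SWEN cycles
    words: List[Word] = []
    for cycle, wd1 in enumerate(trapped):
        if cycle % 2 == 0:
            # SWEN cycle: move the trapped WD1 R/F data to the WD0 R/F lines.
            wd0 = wd1
        else:
            # GWEN cycle: drive the held pair plus the newly trapped pair.
            words.append((wd0[0], wd0[1], wd1[0], wd1[1]))
    return words
```

For four trapped pairs the sketch produces two 4-bit words, one per GWEN cycle; a trailing pair that has only been stored by SWEN is not driven until its partner arrives.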


The process as described above repeats for all subsequent write commands.


It is important to note that the data bus circuit of the present invention works both in DDR1 and DDR2 operating modes. In the DDR2 operating mode, all four G-lines are used for writing, and in the DDR1 mode, only two G-lines are used for writing. Both modes only use two I-lines for writing.


A first timing diagram is shown in FIG. 16 showing the relationship between the WRC, DQS, YCLKW, AWSCLM05, WDEN, SWEN, and GWEN signals. The WRC signal is the “write command” issued by the user. The DQS signal is the data input strobe signal. The YCLKW signal coincides with the time at which the particular data on the G-bus is written to the array. The AWSCLM05 signal is an internal write timing signal, showing when to start the SWEN and GWEN signal process as described above.


A second timing diagram is shown in FIG. 21 showing the relationship between the CHIP CLOCK, WDENL, SWEN, and GWEN signals. The WDENL signal traps the data on the data bus. As previously described, the SWEN control signal is used to direct the first two bits of data into temporary storage, and the GWEN control signal is used to combine the last two bits of data and drive all four bits of data onto the data bus.


A third timing diagram is shown in FIG. 22, showing the relationship between the DQS strobe signal and the data for three conditions. In the first set of waveforms, the strobe signal is operating under normal conditions and the correct data is trapped. In the second set of waveforms, the strobe signal is early, but the correct data is still trapped. In the third set of waveforms, the strobe signal arrives late, and again the correct data is trapped.


A 4-bit bus connecting the I/O sections of the chip to the main memory interface has been described according to an embodiment of the present invention in which only two bits are used for writing. The technique of the present invention can be extended to an N-bit read/write bus, wherein two or a subset of M bits is used for writing.


Data is trapped using a fixed delay from the clock edge immediately following every cycle associated with a DQS input pulse. The trapped data is alternately stored for a cycle, and the stored data plus new trapped data is driven into the array for writing. The trapping signal (WDENL) is held low while either the storing signal (SWEN) or the driving signal (GWEN) is active.


Based on a DDR1 configure signal, the SWEN signal is held permanently on, thus allowing the GWEN signal to fire every cycle so as to support DDR1 operation.


The basic GWEN signal is combined with column address information to create several GWENx signals, one for each possible address combination such that in the DDR1 mode, only the necessary GWEN signals fire (and thus only the minimum number of G-lines toggle to save power). The GWEN address information defaults such that for DDR2 chips, all GWENx signals fire and all the G-lines toggle.


While there have been described above the principles of the present invention in conjunction with specific components, circuitry and bias techniques, it is to be clearly understood that the foregoing description is made only by way of example and not as a limitation to the scope of the invention. Particularly, it is recognized that the teachings of the foregoing disclosure will suggest other modifications to those persons skilled in the relevant art. Such modifications may involve other features which are already known per se and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure herein also includes any novel feature or any novel combination of features disclosed either explicitly or implicitly or any generalization or modification thereof which would be apparent to persons skilled in the relevant art, whether or not such relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as confronted by the present invention. The applicants hereby reserve the right to formulate new claims to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.

Claims
  • 1. An integrated circuit memory comprising: a data bus having a first end coupled to an I/O section and a second end coupled to a main memory interface; circuitry at the second end of the data bus for receiving control signals based on a main chip clock to determine a first group of two data bits and a second group of two bits to be used for every given write address.
  • 2. The integrated circuit memory of claim 1 further comprising circuitry for temporarily storing the first group of two data bits.
  • 3. The integrated circuit memory of claim 2 further comprising circuitry for combining the first group with the second group to provide a four-bit data word driven on the memory bus.
  • 4. The integrated circuit memory of claim 2 further comprising circuitry for allowing both DDR1 and DDR2 modes of operation.
RELATED APPLICATION

The present application claims priority from, and is a divisional of, U.S. patent application Ser. No. 11/177,537 filed on Jul. 8, 2005. The present invention is also related to co-pending U.S. patent application Ser. No. 10/794,782 filed Mar. 3, 2004 for: “Data Sorting In Memories” the disclosure of which is herein specifically incorporated in its entirety by this reference.

Divisions (1)
Number Date Country
Parent 11177537 Jul 2005 US
Child 12020352 US