1. Field of the Invention
This invention relates to a high performance domino static random access memory (SRAM) in which the core memory cells are organized into sub-arrays accessed by local bit lines connected to global bit lines, and more particularly to an improved domino SRAM.
2. Description of Background
A static semiconductor memory typically includes a six-transistor cell in which four transistors are configured as a cross-coupled latch for storing data. The remaining two transistors are used to obtain access to the memory cell. During a read access, differential data stored in the memory cell is transferred to the attached bit line pair. A sense amplifier senses the differential voltage that develops across the bit line pair. During a write access, data is written into the memory cell through the differential bit line pair. Typically, one side of the bit line pair is driven to a logic low level and the other side is driven to a logic high level. The cells are arranged in an array that has a grid formed of bit lines and word lines, with the memory cells disposed at intersections of the bit lines and the word lines. The bit lines and the word lines are selectively asserted or negated to enable at least one cell to be read or written.
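The array organization described above can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the class name SramArray and its methods are hypothetical, the model is purely logical (no voltages, timing, or sense amplifier), and it is not taken from any referenced circuit.

```python
class SramArray:
    """Logic-level model of a grid of 6T cells (hypothetical names)."""

    def __init__(self, rows, cols):
        # Each cell's cross-coupled latch is modeled as one stored bit.
        self.cells = [[0] * cols for _ in range(rows)]

    def read(self, word_line, bit_line):
        """Assert one word line; return the (true, complement) bit line pair."""
        stored = self.cells[word_line][bit_line]
        return stored, 1 - stored

    def write(self, word_line, bit_line, value):
        """Drive the bit line pair differentially, then assert the word line."""
        if value not in (0, 1):
            raise ValueError("value must be 0 or 1")
        true_side, complement_side = value, 1 - value
        # One side is driven low and the other high; the latch follows the pair.
        self.cells[word_line][bit_line] = true_side


array = SramArray(rows=4, cols=4)
array.write(word_line=2, bit_line=1, value=1)
print(array.read(word_line=2, bit_line=1))
```

A cell that has never been written returns the pair (0, 1) in this model, because the sketch initializes every latch to '0'; a real cell powers up in an indeterminate state.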
As will be appreciated by those skilled in the art, in prior art domino SRAM design the cells are arranged into groups of cells, typically on the order of eight to sixteen cells per group. Each cell in a group is connected to a local bit line pair. The local bit line pair for each group of cells is coupled to a global bit line pair. Rather than use a sense amplifier to detect a differential voltage when reading a cell, in a domino SRAM the local bit lines are precharged and then discharged by the cell in a read operation; this discharge is detected and determines the state of the cell. The local bit line, the precharge means, and the detection means define a dynamic node of the domino SRAM. Domino SRAMs of the type discussed here are explained in greater detail in U.S. Pat. Nos. 5,729,501, 6,058,065 and 6,657,886, which are incorporated herein by reference.
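The precharge-and-discharge read described above can be sketched at the logic level. This is a minimal illustration, not the circuit of the referenced patents: the function names are hypothetical, the names blt and blc stand for the true and complement local bit lines, and the sketch assumes that a cell holding '1' discharges the complement line.

```python
def domino_read(stored_bit):
    """Return the value detected on a precharged local bit line pair.

    Hypothetical logic-level model: 1 = precharged high, 0 = discharged.
    """
    blt, blc = 1, 1  # precharge both local bit lines high
    # Assumption: the selected cell discharges the line on its low side.
    if stored_bit == 1:
        blc = 0
    else:
        blt = 0
    # Detection: a discharged complement line means the cell holds '1'.
    return 1 if blc == 0 else 0


def read_group(group, selected_row):
    """Read one cell from a group sharing one local bit line pair."""
    if not 8 <= len(group) <= 16:
        raise ValueError("sketch assumes 8 to 16 cells per local bit line pair")
    return domino_read(group[selected_row])


group = [0, 1, 0, 1, 0, 1, 0, 1]
print(read_group(group, selected_row=1))
```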
In a domino SRAM array, in the read operation the cell must produce a bit line voltage large enough to drive off the SRAM macro with no help from a sense amplifier. In this situation, the “write” operation becomes the primary design focus due to a situation called “Fast Read before Write”.
The problem occurs when a cell is slow to write but very fast to read, which can result in both of the local bit lines being pulled down to ground, making the cell un-writable. For example, during a write to the opposite state, the “write transistor” in the “local bit selector” pulls down on one “local bit line”, while the cell pulls down on the opposite “local bit line”, resulting in both “local bit lines” being pulled down to ground, thereby preventing the cell from being written. A cell that is slow to write but very fast to read results from manufacturing process variations. Due to device parametric variations, the PFET could be skewed to the strong side and the NFET to the weak side, making it more difficult for the NFET pass gate to overcome the PFET in a write operation. If the device and metal capacitance is on the low side, and the NFET pass gate threshold voltage Vt is low, the cell could have a fast read.
A similar problem can occur when a timing mismatch takes place between the “row” select and the “column” select lines. For example, if the row line becomes active before the write signal arrives at the “local bit select”, the cell is in read mode before the write can occur, resulting in a similar situation where both “local bit lines” are pulled down to ground, leaving the cell in an “un-writeable” state. (Remember, 6T cells are good at pulling down on their local bit lines, but poor at pulling up because their pass gates are NFETs.) This “Fast Read before Write” is not a problem in traditional SRAM designs using sense amplifiers because the “bit selector” used there has bit line clamps to prevent this from occurring. Also, the traditional approach has more cells on a bit line (i.e. on the order of 128-to-256 cells vs. 8-to-16 cells in our new approach), making the bit lines much more capacitive and much slower to develop a voltage differential, which makes the “Fast Read before Write” situation less likely even without the clamps. One way to minimize the problem in Domino Read SRAMs is to “push-out” the “row” select signal to guarantee the “write data” is available to the local bit line before the cell is selected. However, some cells will still cause a “Fast Read before Write” because they are “slow to write but very fast to read” even though they are within the normal manufacturing window. The push-out solution therefore results in a performance slow-down and does not solve or prevent the un-writeable state.
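The capacitance argument above can be made concrete with a first-order estimate, t = C × ΔV / I, where C is the bit line capacitance, ΔV the swing to be developed, and I the cell read current. The Python sketch below is illustrative only: the function name and every numeric value (capacitance per cell, cell current, swing) are assumptions chosen for the example and are not figures from this specification. Only the ratio between the two cases is meaningful.

```python
def discharge_time(cells_per_bit_line,
                   cap_per_cell_f=1e-15,   # assumed: 1 fF of loading per cell
                   cell_current_a=50e-6,   # assumed: 50 uA cell read current
                   swing_v=0.5):           # assumed: 0.5 V swing to develop
    """First-order time for a cell to move the bit line by swing_v."""
    c_total = cells_per_bit_line * cap_per_cell_f
    return c_total * swing_v / cell_current_a


t_local = discharge_time(16)          # short domino local bit line
t_traditional = discharge_time(256)   # long traditional bit line
ratio = t_traditional / t_local
print(f"traditional bit line is about {ratio:.0f}x slower to develop the swing")
```

Because the estimate is linear in the number of cells, a 16-cell local bit line develops its swing roughly sixteen times faster than a 256-cell bit line under these assumptions, which is why a fast-reading cell can disturb a domino local bit line before the write data arrives.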
An object of this invention is the provision of a domino SRAM circuit that allows both the read function and the write function to be optimized. For example, larger write transistors can be used without affecting the read performance.
Another object of the invention is the provision of a domino SRAM circuit that prevents the cell from being in a state in which it cannot be written to because of an immediately preceding read.
Briefly, this invention contemplates the provision of a domino SRAM in which active pull-up PFET devices overwhelm “slow to write but very fast to read” cells and allow the cells to recover from the timing mismatch situations described above. This approach allows the traditional “bit select” clamp to actively control the “local select” through “wired-or” PFET pull-up transistors. Separate read and write global “bit line” pairs allow the read and write performance to be optimized independently. For example, larger write transistors will not affect the read performance, as they do in the traditional “local bit select” approach where a single bit line pair is used both for reading and for delivering the write data to the SRAM cells. As a result, this solution does not slow down the read/write operation, and in fact it improves the performance over the traditional “local bit select” approach. This global dual bit line pair approach also prevents a fast reading cell from corrupting the “write data”.
The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the drawings in which:
Referring to
Referring now to
Consider a specific case of a fast read before write where a six-transistor cell is fast to read and slow to write due to mismatched pass gates. Assume the cell currently holds the value ‘1’ and a ‘0’ is to be written. The “row” and “column” select lines activate the write function. When the RowSelect line (also known as the word line) is activated, with a ‘1’ stored in the memory cell (not shown), the cell begins to pull down the ‘blc’ line in the local bit select. At the same time, the Local-Write-Control line is turned ON. For writing a ‘0’ into the cell, the global bit line glc is activated (i.e., pulled high), driving the NFETs N2 and N4 to pull down the ‘blt’ line. In this conventional design, both local bit lines blc and blt are falling to ground due to a fast read before write situation. The fast read and slow write situation could result in both bit lines falling to ground, and cause fighting between the cross coupled PFETs Pc1 and Pc2. Another drawback of this approach is that transistors P1 and P2 tend to amplify any dip or glitch on the local bit lines, which tends to aggravate the problem. As a result, a malfunction in write-thru (write data is passing out to the read port) will occur. Looking at this design under the situation discussed above, i.e. writing a ‘0’ into a 6T cell holding a ‘1’, where the cell is fast to read and slow to write, as ‘blc’ is pulled low, ‘glt’ is pulled high through P1. Since we are writing a ‘0’, P1 is fighting to pull ‘glt’ high while the write data is trying to pull it low. When glt is pulled high by P1, it turns on N1, further pulling blc down to enforce a “1” in the memory cell, therefore preventing a “0” from being written. A write malfunction thus occurs.
This same situation results when the “row” select signal arrives before the “column” select signal. Assuming the same parameters as above, the cell will begin to pull down on ‘blc’ because it is in read mode due to the arrival of the “row” select signal. The cell will continue to pull down ‘blc’ until the “column” select signal arrives to activate glc, and N2/N4 are allowed to pull down the ‘blt’ line. If the delta between the “row” and “column” signals is too great, we see the same result as above. There is no way for this circuit to pull up the correct side to perform a correct write operation.
The PFETs P4 and P5 also prevent the fast read before write situation due to a mismatch in the “row” and “column” select signals. Assume we wish to write a ‘0’ into a cell that holds a ‘1’, as in the above example. If the “row” select signal arrives first, the cell begins to pull down on the ‘blc’ line in the local bit select. This line will continue to fall until the “column” select signal arrives and the write data can be written. However, without P4 and P5, if the “row” select is active for too long before the “column” select signal arrives, the cell may pull down ‘blc’ to a point where it cannot be recovered (pulled high). With the addition of P4 and P5, when the “column” select signal arrives, the data on the ‘glt’ line allows ‘blc’ to be pulled high through P5, while it writes a ‘0’ to ‘blt’ through N0, allowing a successful write to occur.
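The difference between a local bit select with and without the pull-up PFETs can be summarized in a simple timing model of the row-before-column race. The Python sketch below is a deliberately coarse illustration: the function name, the linear discharge of ‘blc’, the fall time, and the point of no return are all assumptions introduced for the example, and the model treats the pull-up through P5 as always sufficient, which a real circuit would have to be sized to guarantee.

```python
def write_succeeds(row_to_column_delay_ps, has_pullup_pfets,
                   blc_fall_time_ps=100.0,       # assumed full discharge time
                   unrecoverable_fraction=0.5):  # assumed point of no return
    """Return True if a '0' can be written into a cell holding '1'.

    The cell starts discharging blc when the row select arrives; the write
    data only reaches the local bit select row_to_column_delay_ps later.
    """
    delay = max(0.0, row_to_column_delay_ps)
    # Fraction of its precharge that blc has lost when the write data arrives.
    discharged = min(1.0, delay / blc_fall_time_ps)
    if has_pullup_pfets:
        # Simplification: the pull-up actively restores blc once the column
        # select arrives, so the early read no longer blocks the write.
        return True
    # Without pull-ups nothing can restore blc past the point of no return.
    return discharged < unrecoverable_fraction


for delay in (10, 80):
    print(delay,
          write_succeeds(delay, has_pullup_pfets=False),
          write_succeeds(delay, has_pullup_pfets=True))
```

In this model a short skew is tolerated by both variants, while a long skew defeats only the variant without pull-ups, mirroring the behavior described above.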
There are other advantages to this new design. Referring to
Referring now to
Another advantage of splitting the two global bit lines of a conventional design into four bit lines (two for read, two for write) is that there is a performance gain. The bit lines now have less loading on them, because the devices needed to control reading from and writing to the bit lines are now divided between two separate groups of bit lines, making them faster. For example, if larger write transistors are needed, the read performance is not burdened by the additional capacitance in the new circuit described here.
The global bit line circuit shown in
This application contains subject matter that is related to the subject matter of the following co-pending applications, each of which is assigned to the assignee of this application, International Business Machines Corporation of Armonk, N.Y. Each of the below listed applications is hereby incorporated herein by reference in its entirety. High Speed Domino Bit Line Interface Early Read and Noise Suppression, Attorney Docket POU9 2004 0217; Global Bit Select Circuit With Dual Read and Write Bit Line Pairs, Attorney Docket POU9 2004 0214; Local Bit Select Circuit With Slow Read Recovery Scheme, Attorney Docket POU9 2004 0224; Global Bit Line Restore Timing Scheme and Circuit, Attorney Docket POU9 2004 1234.