The disclosure herein relates to memory systems, and more specifically to in-system memory repair apparatus and methods.
Error codes are used in a variety of signaling systems to detect and, in some cases, correct errors relating to data transmission and storage. The codes generally provide redundancy to the original data so that, when the data is encoded via a particular error code algorithm, a limited number of data errors may be identified and possibly corrected upon decoding. The redundant portions of the encoded data may take the form of checksums, parity bits, or the like, depending on the type of error code employed.
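As a minimal illustration of the redundancy described above, the following Python sketch appends a single even-parity bit to a data word and checks it on decoding. The function names are hypothetical and are not drawn from the disclosure. A single parity bit can detect an odd number of bit errors but cannot locate or correct them.

```python
def add_even_parity(bits):
    """Return the data bits followed by one parity bit that makes the count of 1s even."""
    return list(bits) + [sum(bits) % 2]


def parity_ok(codeword):
    """True when the codeword (data plus parity bit) holds an even number of 1s."""
    return sum(codeword) % 2 == 0


# Illustrative use: encode, corrupt one bit, and re-check.
encoded = add_even_parity([1, 0, 1, 1])
corrupted = list(encoded)
corrupted[2] ^= 1  # flip a single bit to model a storage or transmission error
```

Here `parity_ok(encoded)` holds while `parity_ok(corrupted)` does not, which is the detection-only behavior of the simplest error code.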
For memory systems that employ error codes, the overhead associated with the encoding generally limits the effectiveness of the code to correcting a single-bit error in a given word. As a result, only a certain number of hard errors, such as those caused by storage cell failures, may be acceptable for a given memory component before the component fails as a reliable device. The failures become even more costly when memory devices are assembled onto memory modules, and the modules are discarded for failing to pass final testing.
Embodiments of the disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Embodiments of memory systems, modules, buffer devices and memory devices, and associated methods are disclosed herein. One embodiment of a memory module includes a substrate, a memory device that outputs read data, and a buffer. The buffer has a primary interface for transferring the read data to a memory controller and a secondary interface coupled to the memory device to receive the read data. The buffer includes error logic to identify an error in the received read data and to identify a storage cell location in the memory device associated with the error. Repair logic maps a replacement storage element as a substitute storage element for the storage cell location associated with the error. This in-module repair capability prevents hard errors from recurring, thereby preserving the error correction capability for detecting and correcting one or more other errors. Moreover, by incorporating the repair capability in a buffer circuit on the memory module, the memory device and memory controller designs may remain unchanged, while the memory system benefits from the additional layer of error correction capability.
In a further embodiment, a method of operation in a memory module is disclosed. The method includes accessing a read data word in a group of storage cells in response to a read operation request from a memory controller. The read data word is then transferred from the group of storage cells along a secondary data bus and buffered. The buffering includes determining whether an error exists in the read data word and repairing a failure associated with the error. Information regarding the repairing is stored for a subsequent read operation. The buffered read data word is transferred to a memory controller along a primary data bus as the determining and repairing take place in the buffer.
In yet another embodiment, a method of operation in a memory module is disclosed. The method includes receiving data from a memory controller along primary data paths and buffering the received data. The buffering includes comparing addresses associated with the data to stored address information associated with known failures. If the comparing identifies a correlation between the data addresses and the stored address information, a data bit corresponding to the correlation is extracted from the data word and stored in a substitute storage location. The data is then transferred for storage in a memory device along secondary data paths.
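The write-path method above may be sketched in Python as follows, assuming for illustration that known failures are tracked as a mapping from a word address to a single faulty bit position. The names `buffer_write`, `known_failures`, and `substitute_store` are hypothetical, and the dictionaries merely stand in for the stored address information and the substitute storage locations.

```python
def buffer_write(address, word, known_failures, substitute_store):
    """Model of buffering one write.

    known_failures:   dict mapping word address -> index of the faulty bit (assumed layout)
    substitute_store: dict mapping word address -> bit value kept in a substitute location
    Returns the word that continues on to the memory device.
    """
    bit_index = known_failures.get(address)  # compare the address with stored failure addresses
    if bit_index is not None:
        # Correlation found: extract the bit destined for the faulty cell
        # and keep a copy in the substitute storage location.
        substitute_store[address] = (word >> bit_index) & 1
    return word  # the data is still transferred along the secondary data paths


# Illustrative use with one known failure at address 0x2A, bit 5.
failures = {0x2A: 5}
spares = {}
buffer_write(0x2A, 0b0010_0000, failures, spares)
buffer_write(0x2B, 0b1111_1111, failures, spares)
```

After these two writes, only the entry for address 0x2A exists in `spares`, since 0x2B does not match any stored failure address.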
Referring to
With continued reference to
Further referring to
Further referring to
With continued reference to
Referring now to
In one specific embodiment, and with continued reference to
For the error detection and repair circuitry descriptions below, the specific error coding algorithm employed to encode the data is a “Chipkill” error correction code having a (144, 128) format capable of correcting a single random error, or a four-bit burst error, in a 128-bit word. A total of 144 bits are generated from a coding of a 128-bit data word, with the result being the original 128-bit data interspersed with 16 parity bits that, when decoded, generate an error syndrome. Similar Hamming-type coding schemes may be extended to 512-bit data chunks (576, 512), or higher. In general, decoding the syndrome following receipt of a data word allows for the detection of an error, and provides a pointer to the location of the error in the word. By detecting errors in the buffer, defective storage cells that form the basis for “hard” errors may be repaired for subsequent data reads to the defective cells. This ensures that the error correction coding generated at the controller does not become overwhelmed by errors that may develop over time, thereby maintaining its error coding strength. Further, by handling the error detecting and repair in the buffer, minimal changes to the circuitry in the memory device and/or controller are needed to achieve the desired error tracking and repair functionality.
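To illustrate how a syndrome can act as a pointer to the location of an error, the following Python sketch uses a textbook single-error-correcting Hamming code as a simplified stand-in. It is not the (144, 128) Chipkill code described above, and the function names are hypothetical. In this stand-in, parity bits occupy the power-of-two positions (counting from 1), and the syndrome of a word with one flipped bit equals that bit's position.

```python
def hamming_encode(data_bits):
    """Encode data bits into a Hamming codeword (returned as a list, position 1 first)."""
    n_data = len(data_bits)
    r = 0
    while (1 << r) < n_data + r + 1:  # smallest r with 2**r >= n_data + r + 1
        r += 1
    n = n_data + r
    code = [0] * (n + 1)  # index 0 unused so that list indices match 1-based positions
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity  # makes the XOR over all covered positions zero
    return code[1:]


def syndrome(codeword):
    """XOR of the 1-based positions of all set bits; zero for a valid codeword."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


# Illustrative use: flip the bit at position 5 and let the syndrome locate it.
word = hamming_encode([1, 0, 1, 1])
damaged = list(word)
damaged[5 - 1] ^= 1
```

For the formats named above, the redundancy ratio is the same in both cases: 16 parity bits per 128 data bits and 64 parity bits per 512 data bits each amount to 12.5% overhead.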
Further referring to
Coupled to the data transfer path 308 are plural syndrome generation circuits 310A-310D. Each syndrome generation circuit includes a parity bit path 312 to route a portion of the overall parity bits (for this specific example, 4 bits) to a multiplier 314. A Kij polynomial coefficient register 316 provides a corresponding number (here 4) of coefficients to the multiplier 314 for multiplication with the extracted parity bits. The result from the multiplier 314 is then fed to a summer 318 which performs an exclusive-OR (XOR) operation. The summer 318 is disposed in the path of a 4-bit portion of the private bus 306 and receives the output of the multiplier 314 and a shifted 4-bit portion of a syndrome associated with a prior stage error decoder. The summer 318 acts as a shift register by outputting the accumulated 4 bits along the private bus to the next adjacent error decoder. For this specific example, employing four syndrome generation circuits in parallel for each error decoder enables the generation of a 16-bit syndrome for each 128-bit read data word. The accumulating and shifting functionality carried out by the summers allows for a relatively low-cost pipelining of the error syndromes associated with various read data words from different devices to propagate to the repair circuitry with little impact on performance of the memory system as a whole.
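The multiply-and-accumulate behavior of one 4-bit syndrome lane may be sketched in Python as follows. The sketch assumes, purely for illustration, that the multiplier operates over the finite field GF(2^4) with the primitive polynomial x^4 + x + 1; the disclosure does not specify the field, the polynomial, or the coefficient values, and the function names and sample coefficients here are hypothetical.

```python
def gf16_mul(a, b, poly=0b10011):
    """Multiply two 4-bit values in GF(2^4), reducing by x^4 + x + 1 (assumed polynomial)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a  # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # degree reached 4, so reduce modulo the polynomial
            a ^= poly
    return result & 0xF


def accumulate_syndrome(nibbles, coeffs):
    """Model a chain of stages: each stage XORs its product into the value passed from the prior stage."""
    acc = 0  # the first stage receives an all-zero partial syndrome
    for nibble, coeff in zip(nibbles, coeffs):
        acc ^= gf16_mul(coeff, nibble)  # summer: XOR of multiplier output and prior-stage value
    return acc


# Illustrative use with made-up coefficients for four chained stages.
example_coeffs = [1, 2, 4, 8]
partial = accumulate_syndrome([1, 8, 0, 0], example_coeffs)
```

Because every step is an XOR of field products, the accumulated value is linear in the inputs, which is what allows each stage to add its contribution independently as the partial syndrome moves along the chain.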
As noted above, and still referring to
Further referring to
For subsequent write operations to the faulty address, the address logic 318 uses a tag comparison circuit 324 to compare incoming addresses to known defective addresses stored in the address memory 320. When a “hit” is detected, indicating a matching address to a known faulty location, the bit designated for writing to the faulty cell is extracted via an extraction circuit 326 (disposed on each error decoder), and directed to the assigned substitute cell in the redundant memory 322. For data reads, an insertion circuit 328 accesses the bit in the redundant memory 322 and inserts it into the proper read data word location prior to the read data word being transferred across the DQ data path 308. For some embodiments, compare circuitry (not shown) may be employed to compare the previously determined defective bit with the repair bit to more accurately determine the presence of a “hard” or “soft” error. In this manner, if a “soft” error was involved, and did not repeat, the spare bit location may be used elsewhere, thereby freeing redundant resources.
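The read-side insertion and the optional hard/soft comparison described above may be sketched in Python as follows. The function names are hypothetical, and the sketch assumes one repaired bit per data word. A disagreement between the device's bit and the spare copy suggests that the cell is still failing, whereas agreement on a single read does not by itself establish that the original error was “soft”, because a stuck cell can happen to hold the written value.

```python
def insert_repair_bit(raw_word, bit_index, repair_bit):
    """Overwrite one bit of the word read from the device with the spare copy."""
    cleared = raw_word & ~(1 << bit_index)
    return cleared | (repair_bit << bit_index)


def device_bit_disagrees(raw_word, bit_index, repair_bit):
    """True when the device's own bit differs from the bit held in the redundant memory."""
    return ((raw_word >> bit_index) & 1) != repair_bit


# Illustrative use: bit 3 of the cell is stuck at 0, the controller wrote 0xFF,
# so the device returns 0xF7 while the spare location holds the intended 1.
raw = 0xF7
repaired = insert_repair_bit(raw, 3, 1)
still_failing = device_bit_disagrees(raw, 3, 1)
```

In this example the repaired word matches what the controller originally wrote, and the disagreement flag indicates that the spare bit should remain assigned to this location.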
The memory architecture above lends itself well to carrying out repairs at the manufacturing stage, such as when a memory module undergoes final assembly and test, or during normal operation as a main memory system for computing resources. Failures identified during manufacture, such as in final module assembly and test, may be repaired, and the repair information stored in the nonvolatile memory 228, until retrieved upon usage in a memory system operating environment.
One embodiment of the error tracking and repair step 416 from
Further referring to
Further referring to
The memory module 200 described above is shown as a specific implementation having two shared buffer circuits 212 and 214. In some embodiments, a single buffer circuit may be shared across all of the memory devices. An alternative embodiment of a buffered memory module 800 that employs dedicated buffer circuits 802 for each memory device, often referred to as “micro-buffers”, is shown in
For other embodiments that employ a large number of memory modules, the error tracking and repair functionality may be shared across two or more modules. In such scenarios, the private syndrome bus that interfaces the DQ data paths within the buffer circuits may be extended from one module to another via an appropriate routing scheme.
Those skilled in the art will appreciate that the various embodiments described herein improve error correction abilities for memory systems that employ error correction schemes. For embodiments that allow for corrections of an additional bit for a given data word, error coverage may be extended by several orders of magnitude. Further, for some of the embodiments described herein, changes to the memory devices or the memory controller may be minimized, and instead incorporated into a buffer circuit that lends itself well to logic process technologies.
When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the above described circuits may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, net-list generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits. Such representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.
In the foregoing description and in the accompanying drawings, specific terminology and drawing symbols have been set forth to provide a thorough understanding of the present invention. In some instances, the terminology and symbols may imply specific details that are not required to practice the invention. For example, any of the specific numbers of bits, signal path widths, signaling or operating frequencies, component circuits or devices and the like may be different from those described above in alternative embodiments. Also, the interconnection between circuit elements or circuit blocks shown or described as multi-conductor signal links may alternatively be single-conductor signal links, and single conductor signal links may alternatively be multi-conductor signal links. Signals and signaling paths shown or described as being single-ended may also be differential, and vice-versa. Similarly, signals described or depicted as having active-high or active-low logic levels may have opposite logic levels in alternative embodiments. Component circuitry within integrated circuit devices may be implemented using metal oxide semiconductor (MOS) technology, bipolar technology or any other technology in which logical and analog circuits may be implemented. With respect to terminology, a signal is said to be “asserted” when the signal is driven to a low or high logic state (or charged to a high logic state or discharged to a low logic state) to indicate a particular condition. Conversely, a signal is said to be “deasserted” to indicate that the signal is driven (or charged or discharged) to a state other than the asserted state (including a high or low logic state, or the floating state that may occur when the signal driving circuit is transitioned to a high impedance condition, such as an open drain or open collector condition). 
A signal driving circuit is said to “output” a signal to a signal receiving circuit when the signal driving circuit asserts (or deasserts, if explicitly stated or indicated by context) the signal on a signal line coupled between the signal driving and signal receiving circuits. A signal line is said to be “activated” when a signal is asserted on the signal line, and “deactivated” when the signal is deasserted. Additionally, the prefix symbol “/” attached to signal names indicates that the signal is an active low signal (i.e., the asserted state is a logic low state). A line over a signal name (e.g.,
While the invention has been described with reference to specific embodiments thereof, it will be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, features or aspects of any of the embodiments may be applied, at least where practicable, in combination with any other of the embodiments or in place of counterpart features or aspects thereof. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
This application is a Continuation of U.S. patent application Ser. No. 16/872,929, filed May 12, 2020, entitled MEMORY REPAIR METHOD AND APPARATUS BASED ON ERROR CODE TRACKING, now U.S. Pat. No. 11,385,959, which is a Continuation of U.S. patent application Ser. No. 15/829,682, filed Dec. 1, 2017, entitled MEMORY REPAIR METHOD AND APPARATUS BASED ON ERROR CODE TRACKING, now U.S. Pat. No. 10,664,344, which is a Continuation of U.S. patent application Ser. No. 15/250,677, filed Aug. 29, 2016, entitled MEMORY REPAIR METHOD AND APPARATUS BASED ON ERROR CODE TRACKING, now U.S. Pat. No. 9,836,349, which is a Non-Provisional that claims priority to U.S. patent application Ser. No. 14/285,481, filed May 22, 2014, entitled MEMORY REPAIR METHOD AND APPARATUS BASED ON ERROR CODE TRACKING, now U.S. Pat. No. 9,430,324, which is a Non-Provisional that claims priority to U.S. Provisional Application No. 61/827,383, filed May 24, 2013, entitled MEMORY REPAIR METHOD AND APPARATUS BASED ON ERROR CODE TRACKING, all of which are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
20230028438 A1 | Jan 2023 | US |
Number | Date | Country | |
---|---|---|---|
61827383 | May 2013 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16872929 | May 2020 | US |
Child | 17852272 | | US |
Parent | 15829682 | Dec 2017 | US |
Child | 16872929 | | US |
Parent | 15250677 | Aug 2016 | US |
Child | 15829682 | | US |
Parent | 14285481 | May 2014 | US |
Child | 15250677 | | US |