Flash and/or Phase Change Memories tend to fail after a limited number of write cycles. Embodiments of the present invention may pertain to minimizing and correcting read errors in such non-volatile storage. More specifically, such embodiments may pertain to a combination of techniques, such as modified wear leveling, varying orthogonal error correction with defect levels, and/or logical to physical address mapping utilizing a serial Content Addressable Memory (CAM).
Non-volatile (NV) memories, such as electrically erasable programmable read only memories (EEPROMs) or NOR and NAND Flash memories typically have limited write cycles before failing, and may exhibit adjacent bit failures after too many read cycles. In NV memory, reads tend to be more robust than writes, so most uses of these memories have been in read-intensive modes such as streaming audio and video, but recent developments have seen an increasing use of Flash Memories in Solid State Disks (SSDs), which require a higher frequency of writes and much higher data integrity. As such, there are numerous patents, such as U.S. Pat. No. 6,601,211 by Norman, granted Jul. 29, 2003, and U.S. Pat. No. 8,010,876 by Hsieh et al., granted Aug. 30, 2011, which describe including Error Correction Codes (ECC) in each page of memory to detect and correct flash read errors. In addition, Keeler suggests a two-dimensional application of ECC to blocks of data storage in U.S. Pat. No. 6,910,174, granted Jun. 21, 2005. The present inventor has been granted a number of patents, including U.S. Pat. No. 7,412,636, granted Aug. 12, 2008, U.S. Pat. No. 7,421,563 granted Sep. 2, 2008, and their continuations, all of which may be applicable to serial ECC generation and error correction.
Still, block erasures and page or word writes, where blocks may be much larger than the pages or words, may fail after a number of cycles. To extend this limited life, numerous patents, such as U.S. Pat. No. 6,732,221 by Ban, granted May 4, 2004, and U.S. Pat. No. 8,001,318 by Iyer et al., granted Aug. 16, 2011, describe techniques called wear leveling. Wear leveling may be used to reduce the maximum number of write cycles to any specific block of memory by writing to unused memory before erasing and reusing previously used memory.
As the non-volatile memory wears out, the errors increase. Individual pages may fail on write or read cycles, and whole blocks may fail on erasure. These failures may constitute an accumulation of individual bit failures after any given operation. As the number of these bit failures increases, ECC may no longer correct them. In some systems these pages or blocks may be marked bad and removed from the available storage. In such systems, the memory capacity appears to reduce as the NV memory begins to wear out. Some NV memory systems begin with failures, which may be marked as bad blocks or pages before being used.
Non-volatile memories, such as phase-change memories or flash memories, are typically slower than volatile memory such as Static RAM (SRAM) or Dynamic RAM (DRAM). In order to improve the read-write performance of the NV memory 10, the RAM 11 is employed to temporarily hold the recently accessed data, for transmission either to the NV memory 12 or to the external host 13.
One patent, U.S. Pat. No. 5,479,638 by Assar et al. granted Dec. 26, 1995, suggests applying traditional CAMs to a wear leveling technique to reduce the read latency. The inventor has previously patented a serial CAM structure in U.S. Pat. No. 7,369,422 granted May 6, 2008, which has the advantage of using a regular two-port memory.
Such techniques have resulted in a multi-chip solution, which tracks the use of NV memory, not the actual errors due to its use. It is known that these errors increase gradually, and eventually make the blocks and pages unusable, and it is also known that while errors increase with use, they may otherwise be very random. And yet none of the current methods measure the actual errors or attempt to continue to use defective storage, beyond simple error correction.
As such, various embodiments in this disclosure may address ways to improve the error measurement and storage selection to minimize data errors, while selectively increasing the error correction capability to correct increasingly defective NV memory, which may help to improve the life and usage of the NV memory.
An organization to facilitate such improved life and usage of the NV memory may contain various record types including Data, Erased, Fix and Deleted.
The next available record, selected by minimum error count, may employ serial CAM logic to perform the minimum selection.
The next block to erase may be chosen by selecting the block with a minimum combination of total error count and number of used records.
Serial CAM logic may also be used to perform the logical to physical address translation. The serial CAM logic may select one or many records with the same logical address, and the number of selected records may be used to detect errors in the logical address.
Pages of user data may be stored in Data records, and each page's ECC segments may be stored in Fix records, where the number of Fix records may be increased to correspondingly increase the error correction capability as needed to cover the known errors in the records. The increased number of ECC segments may be applied to successively smaller slices of data. The ECCs may correct single or multiple bits, and may be correcting vertical or horizontal slices of the data.
An initial number of Fix records for a page of data may be determined by the total error count of the data records, which may be calculated using serial CAM logic.
A process of repeatedly checking and correcting vertical and horizontal slices of data may be used to correct excessive vertical and/or horizontal defects, which may not be correctable by individual ECC segments. When such errors are found, more Fix records may be added to correct the defective bits.
Finally, ECC may be generated and/or used by means of a serial shift technique.
The invention will now be described in connection with the attached drawings, in which:
a and 10b are detailed diagrams of the logic in
a and 12b are detailed diagrams of the logic that may be used in some embodiments of the logic shown in
a and b are parts of a flowchart of a process to write a page of data into potentially defective records; and
Embodiments of the present invention are now described with reference to diagrams in
NV memories may contain blocks of memory, where each block may contain multiple pages. While in some NV memories, individual words may externally appear to be read or written, internally, an entire page may be read or written; and if rewritten, the original page may be marked for deletion, and a new, modified page may be written elsewhere in the memory. To reuse a deleted page, the entire block in which the page resides may generally need to be erased.
As such, reference is now made to
Typically, flash memories may have a range of page and block sizes, with, e.g., between 32 and 128 pages per block. Using this structure, one or more Data records with the same logical address may form a page 29, and one or more Data records with one or more Fix records may form a page 28, but in all cases, all records contained in a page may have the same logical address. The size of a block may be hardwired into the NV memory, but page size and corresponding number of pages per block may be set when the memory is formatted.
An embodiment of the present invention may include a non-volatile (NV) memory augmented with a serial content addressable memory (CAM) containing comparison logic for each record to detect one or more matches of logical addresses and minimum logic to find the least used available records.
Reference is now made to
Reference is now made to
The NV memory 41 may contain the header information for each record such that the bit line 42 for each record may be serially compared to a serially inserted logical address 43, thus selecting with a multiplexor 44 either the first or all matching logical records.
In another embodiment, to write a page in a modified “wear leveling” manner, by successively selecting the bits in the error count, from the most significant bit (MSB) to the least significant bit (LSB), the multiplexor 45 may be used to find and select the Erased records with the fewest errors.
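The MSB-to-LSB minimum selection described above can be sketched in software. This is only an illustrative model under stated assumptions, not the serial CAM hardware itself: the function name `select_minimum`, the list of per-record error counts, and the fixed bit `width` are hypothetical. At each bit position, records holding a 0 stay selected if any exist; otherwise every currently selected record is kept.

```python
def select_minimum(error_counts, width):
    """Return the indices of the records holding the minimum error count.

    Hypothetical software model of a bit-serial minimum search:
    error_counts is a list of non-negative integers, width is the number
    of bits in each error count field.
    """
    candidates = set(range(len(error_counts)))
    # Scan from the most significant bit down to the least significant bit.
    for bit in reversed(range(width)):
        zeros = {i for i in candidates if not (error_counts[i] >> bit) & 1}
        # If any candidate has a 0 in this bit, candidates with a 1 cannot
        # be the minimum and are deselected; otherwise keep all candidates.
        if zeros:
            candidates = zeros
    return sorted(candidates)
```

For example, `select_minimum([5, 3, 7, 3], 3)` keeps records 1 and 3, which share the smallest count. The search takes one step per bit regardless of the number of records, which is the property a serial comparison relies on.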
Also, if insufficient Erased records exist to write the page, a block of records may be erased. The selection of the next block to erase may be a minimum function of the number of errors in the block and the number of Used records in the block.
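As a rough illustration of this block selection, the sketch below assumes that the "minimum function" is a simple sum of a block's error count and its Used record count. The text does not specify the combination, so the sum, the function name `next_block_to_erase`, and the dictionary layout are all assumptions made for the example.

```python
def next_block_to_erase(blocks):
    """Pick the block with the smallest errors + Used records score.

    blocks maps a block address to a (error_count, used_record_count)
    tuple. Using a plain sum as the combining function is an assumption.
    """
    return min(blocks, key=lambda addr: blocks[addr][0] + blocks[addr][1])
```

With `{0: (10, 3), 1: (2, 4), 2: (5, 0)}` the scores are 13, 6 and 5, so block 2 would be chosen. A weighted combination could be substituted if erasing lightly worn blocks or moving fewer Used records should be favored.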
In another embodiment, the number of Used records may be determined for each block by selecting the Used records, selecting the block 46, selecting the first match through multiplexor 44, and using the clear multiplexor 47 on successive clock cycles to clear each match while incrementing a counter (not shown) until the match line 48 transitions low. The counter may then contain the count of the Used records for that block.
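The clear-and-count loop can be modeled as follows. This is a minimal sketch, assuming the per-record match flags for one block are available as a list of booleans; the name `count_matches` is hypothetical, and each loop iteration stands in for one clock cycle.

```python
def count_matches(match_bits):
    """Count set match flags by repeatedly clearing the first match.

    match_bits is a list of booleans, one per record in the selected block.
    The loop ends when no flag remains set, which models the match line
    transitioning low.
    """
    bits = list(match_bits)          # work on a copy of the match flags
    count = 0
    while any(bits):                 # match line is still high
        first = bits.index(True)     # first-match selection
        bits[first] = False          # clear that match
        count += 1                   # increment the counter
    return count
```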
In yet another embodiment, slices of count logic 50 may be employed to speed up the process. After selecting the records with the fewest writes, the number of Used records may be determined for each block by selecting the Used records and incrementing through the block addresses 46 using the slices of count logic 50 to obtain the Used record count, from which the block with the fewest Used records may be chosen. The block address lines 46 may be all 1s when not addressing a block, or either the positive or negative decode of the block address when used.
Reference is now made to
The number of errors contained within a block is equivalent to the sum of the errors residing in each of the records within the block. This may be determined by clearing a block count register in the control logic (35 in
In another embodiment, the error count logic 50 may be used to determine a block's error count. More specifically, this may be done by: a) clearing a block count register in the control logic; b) selecting a block (46 in
In another embodiment, the process of selecting the next block to erase may be to find the block with the smallest combination of errors and Used records, by performing the process according to the flowchart in
The selected block may be erased, each of the erased records may be read, their errors, the bits that failed to erase, may be counted, and the results, for each record, may be written back into the record's error count (22 in
In another embodiment, the Error Count Logic (60 in
Again referring to
In one embodiment of the present invention, the number of Data records per page may be used to check the correctness of the Data records' headers.
Whenever a Used logical address is selected, if the number of Data records is greater than the number of records in a page, then one may presume that one or more Data records incorrectly have this used logical address. In this case each Data record's header may be corrected with its header ECC, and if the Data record's logical address was corrected, the Data record may be deleted and rewritten into a new record. This procedure may be called a “corrected read”.
On the other hand, if the number of Data records is less than the number of records in a page, one or more of the Used logical addresses' Data records may have single bit errors in their logical addresses. In this case, the missing Data records may be found by selecting and counting the Data records for all logical addresses that differ from the specified logical address by a single bit. A “corrected read” may then be performed on all the Data records of those logical address(es) that contain extra Data records, thereby retrieving the “lost” data record(s). For example, given a four bit logical address of 5 [0101], one may start by inverting the lowest order bit, yielding an address of 4 [0100]. If there is an extra data record after selecting and counting the data records of logical address 4, a “corrected read” may be applied to all the data records of logical address 4. This process may continue with address 7 [0111], address 1 [0001] and address 13 [1101], by inverting the second, third and highest order bit of the initial logical address 5, or until all the “lost” data record(s) have been found. Note that the same result may be obtained by starting with any bit, providing that all bits are eventually inverted, as is shown in the flowchart depicted in
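The search over single-bit neighbors of a logical address can be illustrated with a short sketch. The function names, the `record_counts` dictionary (logical address to number of Data records found), and the `records_per_page` parameter are hypothetical and serve only to make the example self-contained.

```python
def single_bit_neighbors(address, width):
    """List every address that differs from `address` in exactly one bit,
    starting with the lowest order bit."""
    return [address ^ (1 << bit) for bit in range(width)]


def addresses_with_extra_records(address, width, record_counts, records_per_page):
    """Return the neighboring addresses that hold more Data records than a
    page should contain; these are candidates for a "corrected read"."""
    return [
        neighbor
        for neighbor in single_bit_neighbors(address, width)
        if record_counts.get(neighbor, 0) > records_per_page
    ]
```

For the four bit address 5 used in the example above, `single_bit_neighbors(5, 4)` yields 4, 7, 1 and 13, in the same order as the text.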
Reference is now made to
In another embodiment, if the need exists for greater error correction of the Data records, additional Fix records may be generated.
Reference is now made to
In another embodiment, the ECC may be serially generated using a shift register (SR). Reference is now made to
Reference is now made to
In another embodiment, 2^N bits of data may be corrected by inputting N bits of corrected ECC data into a Circular Shift Register (CSR). Reference is now made to
Reference is now made to
It should be understood that such SR and CSR structures may be any power of 2 in size, where N (i.e., the power of 2) may be any integer. It is further contemplated that such CSR structures may be used for transmitting and receiving serial data externally out of and into the NV memory system, or other devices, for example, but not limited to, integrated circuits, thereby providing error correcting capability during the transfer of data to and from the system. In another embodiment, a logical address may be concatenated with a record address such that each record has a unique address. In this case, the order of the record addresses may be used to determine which Fix records apply to which section of a page's data.
In another embodiment, upon reading a page of records, errors may be corrected by iteratively using the Vertical and Horizontal ECC segments to correct all the correctable portions of data. For this discussion, a “portion” of data is the amount of data that is common between a horizontal ECC segment and a vertical ECC segment. Often it is either a bit or a byte, in that each ECC segment detects and/or corrects bits or bytes of the data, but the ECC segments may address other sizes of data. The original ECC segment may be exclusive-ORed with an ECC segment generated using the defective data, the results of which may then address portions of data to be corrected, or may indicate that an error exists in the data. Single- or multi-bit error correcting ECC with a checksum may reliably detect an error in at least one more portion of data than it may address to correct. Unfortunately, when more errors exist in the slice of data than may be corrected or detected by the ECC, the results of error correction and/or detection may be unreliable. Excessive errors may cause the ECC to either address the wrong data or fail to detect an error. As shown in
In this process, each portion of data is examined, and if both the horizontal and vertical ECC segments are marked, and if either ECC error address matches the opposite coordinate of the portion of data being examined, then the portion may be corrected, and the error addresses and error detections may be regenerated. The “Page Error Correction” process may continue until all portions of data have been examined without correction.
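The iteration described above can be modeled with a small simulation. This is a sketch under several assumptions, not the disclosed logic: the set of defective portions is tracked directly as (x, y) coordinates rather than as real data, each ECC segment is modeled as a single error correcting, double error detecting code whose syndrome is the exclusive-OR of the (position + 1) values of its defective portions plus an overall parity, and the names `line_status`, `page_error_correction` and `max_passes` are hypothetical.

```python
def line_status(positions):
    """Model one ECC segment over a row or column.

    positions lists the coordinates of the defective portions in the slice.
    Returns (detected, address), where address is the claimed error
    position, or None when the segment cannot point at a single portion.
    """
    syndrome = 0
    for p in positions:
        syndrome ^= p + 1                    # Hamming-style position syndrome
    odd = len(positions) % 2 == 1            # overall parity (checksum) bit
    detected = odd or syndrome != 0
    address = syndrome - 1 if odd and syndrome != 0 else None
    return detected, address


def page_error_correction(errors, size=8, max_passes=16):
    """Repeatedly scan all portions until a full pass makes no correction.

    errors is an iterable of (x, y) defective portions; the returned set
    holds the portions still defective when the procedure stops.
    """
    errors = set(errors)
    for _ in range(max_passes):              # safety cap, an assumption
        corrected = False
        for y in range(size):
            for x in range(size):
                h_det, h_addr = line_status([ex for ex, ey in errors if ey == y])
                v_det, v_addr = line_status([ey for ex, ey in errors if ex == x])
                # Both segments must be marked, and at least one error
                # address must match the opposite coordinate.
                if h_det and v_det and (h_addr == x or v_addr == y):
                    errors ^= {(x, y)}       # flip (correct) this portion
                    corrected = True
        if not corrected:
            break
    return errors
```

In this model, the pattern `{(3, 0), (3, 1), (2, 2)}` is fully corrected even though column 3 initially holds two errors, because each horizontal segment points at its single error. A square of four errors such as `{(0, 0), (1, 0), (0, 1), (1, 1)}` is detected by every affected segment but none of them yields an address, so the errors remain.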
To better illustrate this process, Table 2 contains an example, to which the invention is not limited, of a defective page of 8 data records, each with 8 portions of data in each record. The example has 8 horizontal ECC segments and 8 vertical ECC segments, one for each column and row of the two dimensional matrix of data portions, each with double error detection and single error correction. Table 2 below contains the 8 by 8 matrix of data errors marked with Xs, the ECC segments error detection marked as Ys, with the ECC error addresses shown in the bottom and right-most columns.
The “Page Error Correction” may proceed from the address of a first portion [0,0] until portion [3,0] without changes. At portion [3,0], both the Horizontal and Vertical ECC segments show an error, and the Horizontal ECC segment [0] shows an error address equal to the Vertical address [3], thereby branching to block 151 (of
The “O” marks the location [3,0] of the previous error. By regenerating the error addresses and error detection, the error address in the Horizontal ECC segment[0] may be cleared, and the Vertical ECC segment[3] may be changed to an erroneous error address 0. The process may continue correcting the portion of data at [3,1] because the Horizontal ECC segment[1] has a correct error address [3], and [2,2] because the Horizontal ECC segment[2] has a correct error address [2], as shown in Table 4 below:
The Vertical ECC segment [2] may then have an error address [3] causing the portion of data at [2,3] to be corrected. The next error at [4,3] may not be corrected because neither the Horizontal ECC segment[3] error address nor the Vertical ECC segment[4] error address, which are shown as [0] and [1] respectively, match [4] and [3], the address of the portion of data. Thereafter, the [6,3] portion of data may be corrected because the Vertical ECC segment [6] has an error address [3] which matches the horizontal address of the portion being corrected, as can be seen in Table 5 below:
Next, [4,3] may be corrected, clearing the Horizontal ECC segment [3] error, and then [3,4] and [4,4] may be skipped over because neither of the Horizontal and Vertical directions have correct error addresses. The portion of data at [3,5] may then be corrected because the Horizontal ECC segment [5] has the correct error address [3]. Regenerating the error addresses may result in Table 6 below:
Finally, as a result of the regenerated, corrected error addresses, the portion at [3,4] may be corrected because the Vertical ECC segment [3] error address is [4], and the last two errors may be similarly corrected because their Horizontal error addresses are correct.
It should be noted that unlike the example shown above, some combinations of four or more portions with errors in the vertical or horizontal direction may be hidden from both detection and correction, as shown in Table 7 below:
In this case the errors in Vertical Segment[3] may be masked, which may result in the culmination of the “Page Error Correction” procedure with the state shown in Table 8 below:
To rectify this condition, further steps may be added to the “Page Error Correction” procedure as follows.
Hloc at 150 (again, referring to
Similarly, if the flag at 150 was set and not cleared prior to completing the “Page Error Correction” procedure, then errors may remain. These errors may be corrected by adding more Fix records. Therefore, in another embodiment, at the End 154 of the “Page Error Correction” procedure, if the flag was set at node 152 and not cleared by block 151, then an additional Fix record may be added, and the “Page Error Correction” procedure may be repeated.
In yet another embodiment, a process to write a page of data into potentially defective records using the “Page Error Correction” process may be seen in the flowchart in
The above techniques, separately and/or together, may gracefully degrade the available storage in a NV memory as the write errors increase. As the number of Fix records increases, the available records for data decreases, but the written pages may still be correctable.
In yet another embodiment, a Bad record type (0000 in Table 1) may be included, and error count limits may be defined for the records. If so, then if Fix record limits are exceeded during a page read or write, the Used records may be marked as Bad, and if error count limits are exceeded after an erase, the record may be marked as Bad. It should be noted that such processes may result in “retrieving” Bad records because their error count was reduced below the maximum in subsequent erases. As such, blocks and pages may also have maximum error counts that, if exceeded, may result in all the records in that block or page being marked Bad.
Finally, in another embodiment, given the amount of used storage, which may be defined as the count of the Data records, and the amount of unused storage, which may be defined as the count of the Deleted and Erased records, the amount of available storage may be defined as the sum of the Used and unused (i.e., not Used) storage, and the amount of bad storage may be defined as the count of Bad records; then, if the amount of available storage falls below some limit, or if the amount of bad storage rises above some limit, the control logic may send its host a bad memory warning.
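The storage accounting above can be expressed as a short sketch. It follows the definitions in the preceding paragraph literally (used storage counts only Data records); the function name `bad_memory_warning`, the list of record type strings, and the two limit parameters are hypothetical.

```python
from collections import Counter


def bad_memory_warning(record_types, min_available, max_bad):
    """Decide whether the control logic would warn its host.

    record_types is a list of record type names such as "Data", "Fix",
    "Deleted", "Erased" and "Bad". The limits are illustrative parameters.
    """
    counts = Counter(record_types)
    used = counts["Data"]                         # used storage
    unused = counts["Deleted"] + counts["Erased"] # unused storage
    available = used + unused                     # available storage
    bad = counts["Bad"]                           # bad storage
    return available < min_available or bad > max_bad
```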
It should be noted that the above procedures do not necessarily require that all records within a logically addressed page reside within the same block. As a result, erasing a block may result in displacement of part, but not necessarily all, of a logically addressed page's records. It is expected that, over time, a page's records may be physically scattered over a wide physical area.
It will be appreciated by persons skilled in the art that the processes and procedures presented hereinabove may be implemented in hardware, in software, or in some combination of both hardware and software. Furthermore, the hardware may be composed of one or more chips, where the control logic 35, serial compare logic 33, associated registers 36, 37 and 38, and interface logic 34 as shown in
It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the present invention includes both combinations and sub-combinations of various features described hereinabove as well as modifications and variations which would occur to persons skilled in the art upon reading the foregoing description and which are not in the prior art.
This application is a continuation of U.S. patent application Ser. No. 13/659,368, filed on Oct. 24, 2012, which is incorporated by reference herein. This application is also related to U.S. patent application Ser. No. 13/667,352, filed on Nov. 2, 2012, having the same title as the present application, and also incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
3685014 | Hsiao et al. | Aug 1972 | A |
4316285 | Bobilin et al. | Feb 1982 | A |
5479638 | Assar et al. | Dec 1995 | A |
5553082 | Connor et al. | Sep 1996 | A |
5729559 | Bright et al. | Mar 1998 | A |
5732066 | Moriya et al. | Mar 1998 | A |
5930359 | Kempke et al. | Jul 1999 | A |
6601211 | Norman | Jul 2003 | B1 |
6640327 | Hallberg | Oct 2003 | B1 |
6732221 | Ban | May 2004 | B2 |
6910174 | Keeler | Jun 2005 | B2 |
7369422 | Cooke | May 2008 | B2 |
7412636 | Cooke | Aug 2008 | B2 |
7421563 | Cooke | Sep 2008 | B2 |
8001318 | Iyer et al. | Aug 2011 | B1 |
8010876 | Hsieh et al. | Aug 2011 | B2 |
20010048613 | Al-Shamma et al. | Dec 2001 | A1 |
20040123199 | Tan | Jun 2004 | A1 |
20040123223 | Halford | Jun 2004 | A1 |
20070050596 | Cooke | Mar 2007 | A1 |
20070195570 | Hager Cooke | Aug 2007 | A1 |
20080016427 | Namekawa et al. | Jan 2008 | A1 |
20080209121 | Cooke | Aug 2008 | A1 |
20090106523 | Steiss | Apr 2009 | A1 |
20100122025 | Fusella et al. | May 2010 | A1 |
20110022931 | Eleftheriou et al. | Jan 2011 | A1 |
20110138103 | Iliadis et al. | Jun 2011 | A1 |
Entry |
---|
International Search Report and Written Opinion issued in corresponding PCT Patent Application No. PCT/US2013/066540, date of mailing May 2, 2014. |
Office Action issued Jun. 25, 2014 in U.S. Appl. No. 13/659,368. |
Number | Date | Country | |
---|---|---|---|
20140115423 A1 | Apr 2014 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13659368 | Oct 2012 | US |
Child | 13667298 | US |