Data reliability schemes for data storage systems

Information

  • Patent Grant
  • Patent Number
    9,021,339
  • Date Filed
    Thursday, November 29, 2012
  • Date Issued
    Tuesday, April 28, 2015
Abstract
A data storage system configured to implement a data reliability scheme is disclosed. In one embodiment, a data storage system controller detects uncorrectable errors using intra page parity when data units are read from a set of pages. When an uncorrectable error is detected, the data storage system controller attempts to recover user data using inter page parity without using all data from each page of the set of pages. Recovery of user data can thereby be performed without reading all data from each page. As a result, the amount of time needed to read data can be reduced in some cases and overall data storage system performance can be increased.
Description
BACKGROUND

1. Technical Field


This disclosure relates to data storage systems, such as solid state drives, for computer systems. More particularly, the disclosure relates to data reliability schemes for data storage systems.


2. Description of the Related Art


Many data storage components such as hard disks and solid state drives have certain advertised reliability guarantees that the manufacturers provide to customers. For example, certain solid state drive manufacturers guarantee a drive failure rate of 10^−16 or 10^−17. To increase data reliability, a data redundancy scheme such as RAID (Redundant Array of Independent Disks) may be used. The redundancy may be provided by combining multiple storage elements within the storage device into groups and providing mirroring and/or error checking mechanisms. For example, various memory blocks of a solid state storage device may be combined into stripe groups in which user data is stored.





BRIEF DESCRIPTION OF THE DRAWINGS

Systems and methods that embody the various features of the invention will now be described with reference to the following drawings, in which:



FIG. 1 illustrates a combination of a host system and a storage system that implements data reliability schemes according to one embodiment of the invention.



FIG. 2 is a diagram illustrating a super-page (S-page) including flash pages (F-pages) of multiple dies according to one embodiment of the invention.



FIG. 3 is a diagram illustrating an S-page including F-pages of multiple dies according to another embodiment of the invention.



FIG. 4 is a flow diagram illustrating a process of implementing a data reliability scheme according to one embodiment of the invention.



FIG. 5 is a flow diagram illustrating a process of data retrieval and recovery according to one embodiment of the invention.



FIG. 6 is a flow diagram illustrating a process of data retrieval and recovery according to another embodiment of the invention.





DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

While certain embodiments are described, these embodiments are presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the methods and systems described herein may be made without departing from the scope of protection.


In some embodiments, “coding” or “to code” data, as used in this disclosure, refers to the process of encoding data and/or the process of decoding data. For example, encoding and/or decoding can be performed using error correcting codes. In some embodiments, a non-volatile solid state memory array, such as flash memory, can be divided into physical data units, such as blocks, pages, etc. In some embodiments, a flash memory page (F-page) can correspond to a smallest unit of flash memory that can be programmed in a single operation (e.g., atomically) or as a unit. In some embodiments, a block can include multiple pages, and a block can correspond to a smallest unit of flash memory that can be erased in a single operation (e.g., atomically) or as a unit. In some embodiments, a flash memory page can comprise multiple groupings of memory locations (e.g., E-pages).
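
For illustration only, these units can be modeled roughly as nested containers. The following sketch uses hypothetical class and field names that are not defined in this disclosure (the S-page construct is described below with reference to FIG. 2):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EPage:
        # A grouping of memory locations protected by its own intra page (ECC) parity.
        data: bytes
        intra_parity: bytes

    @dataclass
    class FPage:
        # Smallest unit of flash that can be programmed in a single operation.
        e_pages: List[EPage] = field(default_factory=list)

    @dataclass
    class Block:
        # Smallest unit of flash that can be erased in a single operation.
        f_pages: List[FPage] = field(default_factory=list)

    @dataclass
    class SPage:
        # Super-page spanning multiple dies: one F-page per die (see FIG. 2).
        f_pages_by_die: List[FPage] = field(default_factory=list)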


Overview


Disclosed data reliability schemes can reduce an amount of time needed to process (e.g., read) data stored in a data storage system and perform data recovery when necessary. In some embodiments of the present invention, a controller is configured to implement an inter page parity (e.g., Reed-Solomon code in a RAID configuration) for data stored in a data storage system. The inter page parity includes parity data determined or generated for multiple F-pages (Flash pages) of user data. In one embodiment, among a set of F-pages, F-pages that are deemed more reliable (e.g., having a highest quality) for storing data can be selected for storing the inter page parity data while other F-pages can be used for storing user data. In addition, the controller can pad user data before processing so that coded data units have the same size for processing. The controller can further manage the inter page parity using a granularity matching a size of F-pages or a finer granularity, such as the granularity of E-pages (Error Correcting Code pages), where an F-Page includes multiple such smaller E-Pages. In some embodiments, when the inter page parity is managed using a finer granularity than F-page size, data can be recovered in the event of a memory failure without using all user and parity data from each F-page of a redundancy sequence.


In some embodiments of the present invention, a controller is configured to implement both intra and inter page parity for data stored in a data storage system. The intra and inter page parity can enable two levels of protection for stored data. For example, intra F-page parity data (e.g., a low-density parity-check code) provides an initial redundancy in the event of a detected error correcting code (ECC) error associated with reading or decoding data stored in an E-page. The intra F-page parity data can initially be used to attempt to correct the detected ECC error for the E-page, and to recover user data stored in the E-page. If the controller is unable to correct the detected ECC error, the controller may use inter F-page parity data to attempt to perform data recovery for the E-page.


In some embodiments of the present invention, a controller is configured to exhaust multiple options for attempting to correct ECC errors before returning a data read error. For instance, the controller may perform rereads of the E-page using adjusted voltage threshold levels and re-decode the data from the E-page using adjusted decoding parameters. In addition, the controller may reread or re-decode other E-pages to attempt to successfully recover the data of the E-page when performing an inter page parity recovery for the E-page.


System Overview



FIG. 1 illustrates a combination 100 of a host system 110 and a storage system 120 that implements data reliability schemes according to one embodiment of the invention. As shown, the storage system 120 (e.g., hybrid hard drive, solid state drive, etc.) includes a controller 130 and one or more non-volatile memory (NVM) arrays 140. The NVM arrays 140 can be included on a single die or multiple dies. The controller 130 includes an error correction module 132, which can implement and maintain one or more data redundancy schemes for the storage system 120. For example, the error correction module 132 can implement an inter page parity and an intra page parity for a set of F-pages that include E-pages. The inter and intra page parities can be maintained at an F-page and/or E-page granularity level. Further, the error correction module 132 can assign pages of NVM dies for use in storing user data or inter page parity data depending on a quality (e.g., reliability) of the dies for storing data. In addition, the error correction module 132 can perform a two-phase data recovery approach when uncorrectable intra page errors are detected. In a first phase, multiple intra page reread or re-decode attempts of a page of memory can be performed. If the first phase is unsuccessful in recovering stored data, in a second phase, an inter page recovery can be performed that may include intra page reread or re-decode attempts directed to other pages of memory.


The controller 130 can receive data and/or storage access commands from a storage interface module 112 (e.g., a device driver) of the host system 110. Storage access commands communicated by the storage interface module 112 can include write and read commands issued by the host system 110. The commands can specify a logical block address in the storage system 120, and the controller 130 can execute the received commands in the NVM arrays 140. In one embodiment, data may also be stored in one or more magnetic media storage modules (not shown in FIG. 1). In one embodiment, other types of storage modules can be included instead of or in addition to NVM arrays 140 and/or magnetic media storage modules.


The storage system 120 can store data received from the host system 110 so that the storage system 120 can act as memory storage for the host system 110. To facilitate this function, the controller 130 can implement a logical interface. The logical interface can present to the host system 110 storage system memory as a set of logical addresses (e.g., contiguous addresses) where data can be stored. Internally, the controller 130 can map logical addresses to various physical memory addresses in the one or more non-volatile memory arrays 140 and/or other memory module(s).
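
As a toy sketch of such a mapping, assuming a hypothetical physical location tuple (the layout is illustrative, not part of this disclosure):

    # Maps a logical block address (LBA) to a hypothetical physical location:
    # (die index, block index, F-page index, E-page index).
    l2p: dict = {}

    def map_write(lba: int, location: tuple) -> None:
        l2p[lba] = location

    def map_read(lba: int) -> tuple:
        return l2p[lba]  # a KeyError signals an unmapped logical address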


The one or more non-volatile memory arrays 140 can be implemented using NAND flash memory devices. Other types of solid-state memory devices can alternatively be used, such as an array of flash integrated circuits, Chalcogenide RAM (C-RAM), Phase Change Memory (PC-RAM or PRAM), Programmable Metallization Cell RAM (PMC-RAM or PMCm), Ovonic Unified Memory (OUM), Resistance RAM (RRAM), NOR memory, EEPROM, Ferroelectric Memory (FeRAM), Magnetoresistive RAM (MRAM), other discrete NVM (non-volatile memory) chips, or any combination thereof. In one embodiment, the non-volatile memory arrays 140 preferably include multi-level cell (MLC) devices having multi-level cells capable of storing more than a single bit of information, although single-level cell (SLC) memory devices or a combination of SLC and MLC devices may be used.


Inter Page Parity Schemes



FIG. 2 is a diagram illustrating a super-page (S-page) 200 including F-pages from multiple NVM array dies according to one embodiment of the invention. The S-page 200 can advantageously provide a RAID scheme for supporting inter page parity and storing data across F-pages of multiple dies. In some embodiments in which one or more dies include multiple planes (e.g., dual-plane dies), the F-pages may be spread across the multiple planes of those dies as well.


In one embodiment, the S-page 200 includes one F-page from each of 128 NVM array dies identified as Die 0 through Die 127. Each die includes multiple blocks (not shown) of storage, and each block further includes multiple F-pages. The S-page 200 can comprise F-pages selected from one or more dies and/or blocks within one or more dies. In one embodiment, this selection can be performed based on a physical location of F-pages within the dies or on a selection stored in firmware. Further, each F-page includes multiple codeword stripes, such as codeword stripe 212, for managing a RAID sequence of the F-pages of the S-page 200.


Although dies of a non-volatile memory array may have the same physical size, some dies can have a different memory quality (e.g., reliability) for storing data than other dies. For instance, particular dies may be better suited to store data without read or write errors than other dies due to differences in the memory manufacturing process and/or wear due to use. As a result, F-pages of different dies can advantageously store different amounts of user and parity data depending on the quality of the dies of which the F-pages are a part.


Although not shown in FIG. 2, each F-Page can be protected by a default minimum amount of intra page parity data. When a die is deemed to have a high quality, little or no additional intra page parity data beyond that default minimum may be used to protect data stored on the F-Pages of that die. However, F-Pages on dies with less than optimal quality may use some additional amount of intra page parity data. In the example shown in FIG. 2, each of the F-Pages is illustrated with a plain portion 214 and, for some F-Pages, an additional dotted portion 216. The dotted portion 216 can denote the portion used to store that additional intra page parity data beyond the default minimum.


In some embodiments, one or more dies having a highest quality for storing data can be selected to store inter F-page parity data for the S-page 200 while other F-pages can be used to store user data. For example, F-page 220 of Die 2 and F-page 230 of Die 4 have been determined to have the highest quality of the 128 F-pages of the S-page 200. As such, they do not need any additional intra page parity data beyond the default minimum (hence they have no dotted portions 216). In one embodiment, because they can accommodate the highest amount of data as compared to the other F-pages, the F-pages 220 and 230 are used to store inter page parity data (e.g., RAID parity data). The other F-pages, such as F-pages 210 and 240, which have less storage area due to their additional intra page parity data, are used to store user data. In some embodiments, any suitable method for determining and tracking the quality of dies and pages can be used. For example, the number of data errors can be tracked for dies and/or pages of dies. As another example, a wear level indicator for blocks of a die can be determined, tracked, and used to indicate the quality. As yet another example, the number of data errors, wear level indicators, and one or more other indicators of memory quality can be combined into a single metric.
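
A minimal sketch of how such a combined quality metric might drive this assignment; the weights, the tuple layout, and the function names are illustrative assumptions rather than features of this disclosure:

    def die_quality(error_count: int, wear_level: int,
                    w_err: float = 0.7, w_wear: float = 0.3) -> float:
        # Hypothetical composite metric: lower scores indicate higher quality.
        return w_err * error_count + w_wear * wear_level

    def assign_dies(dies: list, num_parity_dies: int):
        # dies: list of (die_id, error_count, wear_level) tuples.
        ranked = sorted(dies, key=lambda d: die_quality(d[1], d[2]))
        parity_dies = [d[0] for d in ranked[:num_parity_dies]]  # highest quality
        user_dies = [d[0] for d in ranked[num_parity_dies:]]    # remainder
        return parity_dies, user_dies

For the S-page 200, assign_dies(dies, 2) would select Dies 2 and 4 if they scored best on the chosen metric.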


Coders used for determining inter F-page parity data may support F-pages having a set amount of data. To enable such coders to support the multiple F-page sizes of the S-page 200, data from F-pages may be padded with padding data up to a maximum F-page size before coding. For instance, a coder can support F-pages having the same number of octets of data as the F-pages 220 and 230, since the F-pages 220 and 230 may have the maximum F-page size among the F-pages of the S-page 200. The other F-pages of the S-page 200, such as F-pages 210 and 240, can then be padded with padding data 216 to include the same number of octets of data as the F-pages 220 and 230. The padding data 216 can include a data set of entirely zeros, entirely ones, etc., or any known or pre-defined data pattern. It can be noted that the padding data 216 can be characterized as “virtual padding” since the padding data itself may not be written to the F-pages with user data. In addition, the padding data may not be used to generate the intra page parity data.
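
One way to realize this virtual padding, assuming a hypothetical fixed coder input size equal to the maximum F-page payload:

    MAX_CODER_INPUT = 4096  # hypothetical maximum F-page payload, in octets

    def virtually_pad(user_data: bytes, pad_byte: int = 0x00) -> bytes:
        # Extend the payload with a pre-defined pattern so that every coder
        # input has the same size. The padding feeds only the inter page
        # parity coder; it is never written to flash and is excluded from
        # the intra page parity computation.
        assert len(user_data) <= MAX_CODER_INPUT
        return user_data + bytes([pad_byte]) * (MAX_CODER_INPUT - len(user_data))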



FIG. 3 is a diagram illustrating an S-page 300 including F-pages of multiple dies according to another embodiment of the invention. The S-page 300 of FIG. 3 is similar to the S-page 200 of FIG. 2 except that the illustrated padding data, such as padding data 312, is allocated to individual E-pages of F-pages. Advantageously, in some embodiments, because the padding data is allocated to individual E-pages, RAID recovery across F-pages can be performed on an E-page level without using all data from the F-pages of the S-page 300. As a result, the amount of time needed to process each F-page and perform an inter F-page parity recovery for an E-page may be reduced in cases where there are only one or two E-page errors. Also, a uniform buffer can be used regardless of the number of E-pages in an F-page since the buffer can be configured to process one set of E-pages at a time.


The S-page 300 includes one F-page from each of 128 dies identified as Die 0 through Die 127. Each die includes multiple blocks (not shown) of storage, and each block further includes multiple F-pages. Each F-page of the S-page 300 includes four E-pages although, in other embodiments, the F-pages may include greater or fewer than four E-pages. In some embodiments, E-pages of a die can have the same size as other E-pages of the same die, but sizes of E-pages may differ from one die to another based on the memory quality of the dies. Each E-page may be protected by intra page parity data (not shown), such as LDPC-based parity data. The intra page parity data may be stored at the end of each E-Page or at the end of each F-Page. Again, as with FIG. 2, FIG. 3 does not show the intra page parity data. The intra page parity data for one E-page, such as E-page 310, can be determined based on the data of the E-page, excluding the padding data, such as padding data 312. By extension, the intra page parity data for an F-page may be determined based on the data of all the E-pages within that F-page, excluding any padding data. In this manner, the E-pages are aligned in E-page stripes, such as E-page stripe 380 shown in FIG. 3. In one embodiment, an E-page stripe comprises multiple codeword stripes (previously shown in FIG. 2).


The E-pages of the S-page 300 are illustrated with corresponding padding data, such as the padding data 312. The padding data includes data sets of entirely zeros in the example of FIG. 3. The amount of padding data for each E-page depends in part on the user data capacity of the E-page, which in turn reflects the quality of the E-page. The amount of padding data can further be selected so that the user data capacity of the E-page plus the corresponding padding data matches the size of the E-pages 330 and 350, which are used to store inter F-page parity data.


The user data and corresponding padding data of the E-pages of the S-page 300 can be used to determine corresponding inter F-page parity data for storage to E-pages of Dies 2 and 4. For example, the user data and corresponding padding data of each E-page 0, excluding the E-pages 330 and 350, can be processed by a coder and used to determine corresponding parity data for storage in the E-pages 330 and 350. If an uncorrectable error is detected during a read of an E-page of the S-page 300, corresponding E-pages of the S-page 300 can then be used to attempt to recover the data. For example, if a read of E-page 360 encounters an error that is not correctable by using its accompanying intra page parity data, then the corresponding E-Pages 310, 320, 330, etc. across the S-page (including the E-Pages with inter page parity data) may be used to recover the data in E-page 360, per a RAID data recovery operation.
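
To make the E-page-stripe recovery concrete, here is a simplified sketch using a single XOR parity page per stripe. The disclosure's example uses a Reed-Solomon code with two parity pages (and notes under "Other Variations" that XOR-based codes are an alternative); the point illustrated is that only the E-pages of the affected stripe are processed, not entire F-pages:

    def stripe_parity(epages: list) -> bytes:
        # epages: equal-length payloads (user data plus virtual padding),
        # one per user die, all taken from the same E-page stripe.
        parity = bytearray(len(epages[0]))
        for ep in epages:
            for i, b in enumerate(ep):
                parity[i] ^= b
        return bytes(parity)

    def recover_missing_epage(surviving_epages: list, parity: bytes) -> bytes:
        # With one XOR parity page, a single unreadable E-page in the stripe
        # is rebuilt by XOR-ing the parity with every surviving E-page.
        rebuilt = bytearray(parity)
        for ep in surviving_epages:
            for i, b in enumerate(ep):
                rebuilt[i] ^= b
        return bytes(rebuilt)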


Data Reliability Schemes



FIG. 4 is a flow diagram illustrating a process 400 of implementing a data reliability scheme according to one embodiment of the invention. The process 400 advantageously includes both an intra and inter page parity data reliability scheme. In some embodiments, the controller 130 and/or error correction module 132 of FIG. 1 is configured to perform the process 400.


At block 405, the process 400 determines dies to reserve for inter page parity data. One or more dies having a highest quality for storing data can be reserved, for instance, based on quality metrics provided by a manufacturer and/or memory quality information about the dies determined by the controller 130. The number of dies reserved for inter page parity data can correspond to the number of page errors that can be corrected using inter page parity data. For instance, in the example of FIG. 2, since one F-page of each of Dies 2 and 4 is reserved for inter F-page parity data, up to two page errors can be corrected for the S-page 200. The F-pages of the remaining unreserved dies can be used for storage of user data.


At block 410, the process 400 writes user data and intra page parity data to F-pages of unreserved dies. In the example of FIG. 3, for instance, user data and corresponding low-density parity-check (LDPC) data can be written to E-pages of the F-pages of the unreserved dies, which include all dies except Dies 2 and 4. In some embodiments, one or more suitable ECC schemes can be used for generating intra page parity data. In some embodiments, the total number of octets of user data plus corresponding intra page parity data can be the same for each E-page so that a coder may support one processing unit size with multiple different code rates (e.g., amounts of parity data per unit of user data). The user data and intra page parity data can be written to the pages of the unreserved dies in the order of the data in some embodiments or out of order in other embodiments.
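
With a fixed coded-unit size, allocating more intra page parity simply lowers the code rate. A one-line illustration, assuming a hypothetical E-page size:

    EPAGE_SIZE = 2048  # hypothetical fixed coded-unit size, in octets

    def code_rate(user_octets: int) -> float:
        # Fixed unit size, variable parity: rate = user data / total data.
        return user_octets / EPAGE_SIZE

    # e.g., code_rate(1843) is about 0.90; code_rate(1946) is about 0.95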


At block 415, the written user data can be padded with padding data as described above, and the process 400 generates inter page parity data using the padded user data. After the inter page parity data is generated, at block 420, the process 400 writes inter page parity data with its own intra page parity data to pages of the reserved dies. The inter page parity data can correspond to parity for F-pages as discussed with respect to FIG. 2 or parity for E-pages of F-pages as discussed with respect to FIG. 3. For instance, in the example of FIG. 3, inter F-page Reed-Solomon (RS) parity data and corresponding intra F-page LDPC data can be written to the E-pages of the reserved Dies 2 and 4. In some embodiments, one or more suitable ECC schemes can be used for generating inter page parity.
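
Putting blocks 405 through 420 together, a hypothetical outline of the write path, reusing the assign_dies and virtually_pad sketches above; encode_inter_parity and program_fpage are assumed callbacks standing in for a real coder and flash interface:

    def write_s_page(dies, user_payloads, encode_inter_parity, program_fpage,
                     num_parity_dies=2):
        # Block 405: reserve the highest-quality dies for inter page parity.
        parity_dies, user_dies = assign_dies(dies, num_parity_dies)
        # Block 410: write user data; intra page parity is stored alongside.
        for die, payload in zip(user_dies, user_payloads):
            program_fpage(die, payload)
        # Block 415: pad the written user data and generate inter page parity.
        padded = [virtually_pad(p) for p in user_payloads]
        parity_pages = encode_inter_parity(padded)
        # Block 420: write inter page parity, also with intra page parity.
        for die, parity in zip(parity_dies, parity_pages):
            program_fpage(die, parity)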


Data Recovery Schemes



FIG. 5 is a flow diagram illustrating a process 500 of data retrieval and recovery according to one embodiment of the invention. Advantageously, the process 500 provides multiple opportunities for attempting to recover data after encountering an uncorrectable error, such as an uncorrectable read data error. In addition, recovery of the data can be performed without using all data units from each page, thereby reducing an amount of time needed to process each page and perform the recovery. In some embodiments, the controller 130 and/or error correction module 132 of FIG. 1 is configured to perform the process 500.


At block 505, the process 500 reads data from a page of memory. For example, the process 500 can perform a read of E-page 310 in response to a read command from the host system 110.


At block 510, the process 500 determines whether an uncorrectable intra page ECC error is detected during the page read. If an uncorrectable error is not detected, the process 500 moves to block 515. At block 515, the process 500 returns the data from the page.


On the other hand, if an uncorrectable error is detected, the process 500 moves to block 520. At block 520, the process 500 determines whether reread options are exhausted for attempting to read the page to retrieve the stored data. For example, the process 500 can determine whether one or more single-reads or multiple-reads of the page have been performed or whether one or more reads of the page using different voltage threshold levels have been performed. If the process 500 determines that the reread options are not exhausted for attempting to read the page, the process 500 moves to block 525. At block 525, the process 500 adjusts the read parameters for reading data from the page. The process 500 then returns to block 505, and the data is read from the page using the adjusted read parameters.


If the process 500 determines that the reread options are exhausted for attempting to read the page, the process 500 instead moves to block 530. At block 530, the process 500 performs inter page RAID recovery. The inter page RAID recovery can include decoding user data and inter page parity data from corresponding pages to attempt to recover the stored data for the page. For example, if an uncorrectable intra page ECC error is detected during the page read of E-page 310 of FIG. 3, the process 500 can read data from each E-page 0 of Dies 0 through 127 of the S-page 300, and a decoder can then be used to attempt to recover the data stored in the page using all of the read E-pages. In some embodiments, the pages can be read at block 530 in an order corresponding to the order in which user data is written to an S-page. In other embodiments, the pages can be read out of order relative to the order in which user data is written to an S-page.


At block 535, the process 500 determines whether the inter page RAID recovery is successful. The inter page RAID recovery can be deemed successful if, for instance, the recovery for the page resulted in successfully determining the data contents of the page. If the inter page RAID recovery is successful, the process 500 moves to block 540, and the process 500 returns the data from the page.


On the other hand, if the inter page RAID recovery is not successful, the process 500 moves to block 545. Using the examples of FIGS. 2 and 3, there may be instances when more than two pages have uncorrectable ECC errors, which exceeds the correction capacity of a RAID recovery in which two pages of inter page parity are used. For instance, there may be a total of three pages with uncorrectable intra page ECC errors: a page “A” that triggers the process 500 and two other pages “B” and “C” that are read as part of the RAID recovery mechanism in block 530. Note that, in the examples of FIGS. 2 and 3, a typical RAID recovery may involve reading 128 E-pages across the S-Page.


At block 545, the process 500 determines whether reread options are exhausted for attempting to read other pages (e.g., pages “B” and “C”) to retrieve the stored data in the page. For example, when the triggering page (e.g., page “A”) has a detected uncorrectable intra page ECC error, the process 500 can determine whether one or more single-reads or multiple-reads of the other failing pages (e.g., pages “B” and “C”) have been performed or whether reads of those pages using different voltage threshold levels have been performed. If the process 500 determines that reread options are not exhausted for attempting to read other pages to retrieve the stored data in the page, the process moves to block 550.


At block 550, the process 500 rereads data from other pages (e.g., pages “B” and “C”) with detected intra page ECC errors using adjusted read parameters, such as by rereading the other pages using a single-read or multiple-read or an adjusted voltage threshold. As one example, if a detected uncorrectable intra page ECC error is triggered by the page read of E-page 310 of the S-page 300 and other uncorrectable intra page ECC errors are detected during page reads of E-pages 320 and 360 while attempting an inter page RAID recovery, the process 500 can reread data from E-page 320 or 360 to attempt to recover the data stored in E-page 320 or 360, respectively. The process 500 then returns to block 530, and the process 500 performs inter page RAID recovery. If the reread of data from the other pages with detected intra page ECC errors (e.g., pages “B” and “C”) results in a successful recovery of data for one or more of the other pages such that the total number of uncorrectable pages no longer exceeds the RAID recovery limit (i.e., the number of inter page parity pages used), the inter page RAID recovery at block 530 may now be successfully performed. For instance, continuing the example of this paragraph, if the reread of E-page 320 results in a successful data recovery, the inter page RAID recovery at block 530 may now be successful since two parity pages can be used to correct the two detected intra page ECC errors of E-pages 310 and 360. This is because the locations of the errors are known in this example implementation of RAID.


On the other hand, if the process 500 determines that reread options are exhausted for attempting to read other pages to retrieve the stored data in the page, the process 500 moves to block 555, where the process 500 returns a read error for the page.
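
A condensed sketch of FIG. 5's control flow; read_page, read_params, and raid_recover are assumed callbacks (raid_recover is taken to reread the other failing pages with the supplied parameters before retrying the inter page recovery), and the retry budget is an illustrative assumption:

    def read_with_recovery(page, stripe, read_page, raid_recover, read_params,
                           max_attempts=3):
        # Blocks 505-525: read, then reread with adjusted read parameters.
        for attempt in range(max_attempts):
            data, ok = read_page(page, read_params(attempt))
            if ok:
                return data                            # block 515
        # Blocks 530-550: inter page RAID recovery, rereading other failing
        # pages with adjusted parameters until reread options are exhausted.
        for attempt in range(max_attempts):
            data, ok = raid_recover(page, stripe, read_params(attempt))
            if ok:
                return data                            # block 540
        raise IOError("uncorrectable read error")      # block 555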



FIG. 6 is a flow diagram illustrating another process 600 of data retrieval and recovery according to one embodiment of the invention. The process 600 is generally the same as the process 500 of FIG. 5 except for the substitution of blocks 520, 525, 545, and 550 with blocks 620, 625, 645, and 650. Although the processes 500 and 600 are illustrated as separate approaches, the rereading options of process 500 and the re-decoding options of process 600 can be attempted together or selectively, depending on system conditions. For instance, when a memory array is substantially new, either rereading or re-decoding may be performed; however, as the quality of the memory diminishes due to wear, both rereading and re-decoding may be performed. In some embodiments, the controller 130 and/or error correction module 132 of FIG. 1 is configured to perform the process 600.


At block 620, the process 600 determines whether re-decode options are exhausted for attempting to determine the stored data for the page. For example, the process 600 can determine whether the data from the page has already been decoded using one or more different decoding parameters. If the process 600 determines that re-decode options are not exhausted for attempting to determine the stored data, the process 600 moves to block 625. At block 625, the process 600 decodes the data from the page using adjusted decoding parameters. The process 600 then returns to block 510, and the process 600 determines whether an uncorrectable intra page ECC error is detected when decoding the data from the page.


At block 645, the process 600 determines whether re-decode options are exhausted for attempting to decode other pages to determine the stored data in the page. As in the example of FIG. 5, there may be a total of three pages with uncorrectable intra page ECC errors: a page “A” that triggers the process 600 and two other pages “B” and “C” that are read as part of the RAID recovery mechanism in block 530. For example, when the triggering page (e.g., page “A”) has a detected uncorrectable intra page ECC error, the process 600 can determine whether data from the other failing pages (e.g., pages “B” and “C”) has already been decoded using one or more different decoding parameters. If the process 600 determines that re-decode options are not exhausted for attempting to decode other pages, the process moves to block 650.


At block 650, the process 600 decodes data from other pages (e.g., pages “B” and “C”) with detected intra page ECC errors using adjusted decoding parameters. The process 600 then returns to block 530, and the process 600 performs inter page RAID recovery. If the re-decode of data from the other pages with detected intra page ECC errors (e.g., pages “B” and “C”) results in a successful recovery of data for one or more of the other pages such that the total number of uncorrectable pages no longer exceeds the RAID recovery limit (i.e., the number of inter page parity pages used), the inter page RAID recovery at block 530 may now be successfully performed.
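
Process 600 follows the same skeleton as the FIG. 5 sketch above, with a schedule of decoding parameters substituted for the read parameters. A hypothetical schedule (the parameter name and values are illustrative, not specified by this disclosure):

    def decode_params(attempt: int) -> dict:
        # Each retry decodes with a larger iteration budget or otherwise
        # adjusted decoder settings.
        return {"max_iterations": 10 * (attempt + 1)}

In a combined scheme, each retry could adjust both read and decoding parameters, consistent with the observation above that rereading and re-decoding may both be performed as the memory wears.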


Other Variations


Although this disclosure uses RAID as an example, the systems and methods described herein are not limited to RAID redundancy schemes and can be used in any data redundancy configuration that utilizes striping and/or grouping of storage elements for mirroring or error checking purposes. In addition, although RAID is an acronym for Redundant Array of Independent Disks, RAID is not limited to storage devices with physical disks and is applicable to a wide variety of storage devices including the non-volatile solid state devices described herein.


In addition, those skilled in the art will appreciate that in some embodiments, other approaches and methods can be used. For example, the coding techniques disclosed herein can apply to codes other than or in addition to Reed-Solomon and LDPC codes: a multi-dimensional XOR code can be used as the inter page parity code, and other codes, such as turbo codes or Bose-Chaudhuri-Hocquenghem (BCH) codes, can be used as the intra page parity code. Further, although E-pages and F-pages are discussed in this disclosure, E-pages and F-pages are illustrative working units for the data redundancy scheme and are included herein as examples. The data redundancy scheme and its aspects can apply to other working units, where F-pages may correspond to RAID stripes and E-pages may correspond to sub-units of RAID stripes. Accordingly, the scope of the disclosure is intended to be defined only by reference to the appended claims.


While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the protection. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the protection. For example, the systems and methods disclosed herein can be applied to hard disk drives, hybrid hard drives, and the like. In addition, other forms of storage (e.g., DRAM or SRAM, battery backed-up volatile DRAM or SRAM devices, EPROM, EEPROM memory, etc.) may additionally or alternatively be used. As another example, the various components illustrated in the figures may be implemented as software and/or firmware on a processor, ASIC/FPGA, or dedicated hardware. Also, the features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure. Although the present disclosure provides certain preferred embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.

Claims
  • 1. A data storage system, comprising: a non-volatile memory array comprising a plurality of data pages; and a controller configured to: store a plurality of data units and intra page parity units in each data page of a first set of data pages of the plurality of data pages and a plurality of inter page parity units in each data page of a second set of data pages of the plurality of data pages; in response to detecting an error using the intra page parity units of a first data page of the first set when a first data unit is read from the first data page, perform a recovery for data of the first data unit using corresponding data units from the data pages of the first set and corresponding inter page parity units from the data pages of the second set; and in response to determining that the recovery did not result in successfully determining the data of the first data unit because a number of data pages from the first set for which data read errors are detected exceeds an error correction capability for inter data page error correction: read one or more data units of the first set other than the first data unit using a modified read parameter or decode the one or more data units of the first set other than the first data unit using a modified decoding parameter, and in response to determining that the number of data pages from the first set for which the data read errors are detected no longer exceeds the error correction capability for inter data page error correction, successfully determine the data of the first data unit using the corresponding data units from the data pages of the first set and the corresponding inter page parity units from the data pages of the second set, wherein the controller is configured to perform the recovery without using all data units from the data pages of the first set and all inter page parity units from the data pages of the second set.
  • 2. The data storage system of claim 1, wherein each data unit of the first set comprises a common number of octets of data after padding with a pre-defined data set, and each inter page parity unit of the second set comprises the common number of octets of data and corresponds to a parity based at least in part on the corresponding data units from the data pages of the first set.
  • 3. The data storage system of claim 1, wherein each data page of the plurality of data pages comprises a flash page (F-page), and each data unit comprises an error-correcting code page (E-page).
  • 4. The data storage system of claim 3, wherein the first set and the second set together form a super-page (S-page) comprising one F-page from each array die of a plurality of array dies.
  • 5. The data storage system of claim 1, wherein the plurality of intra page parity units comprise low-density parity-check (LDPC) parity units, and the plurality of inter page parity units comprise Reed-Solomon (RS) parity units.
  • 6. The data storage system of claim 1, wherein the controller is further configured to read the first data unit using the modified read parameter in response to detecting the error using the intra page parity units of the first data page when the first data unit is read from the first data page.
  • 7. The data storage system of claim 6, wherein the modified read parameter comprises at least one of: a single-read/multiple-read and a voltage threshold adjustment.
  • 8. The data storage system of claim 1, wherein the controller is configured to read the one or more data units of the first set other than the first data unit using the modified read parameter in response to determining that the recovery did not result in successfully determining the data of the first data unit.
  • 9. The data storage system of claim 1, wherein the controller is further configured to decode the first data unit using the modified decoding parameter in response to detecting the error using the intra page parity units of the first data page when the first data unit is read from the first data page.
  • 10. The data storage system of claim 1, wherein the controller is configured to decode the one or more data units of the first set other than the first data unit using the modified decoding parameter in response to determining that the recovery did not result in successfully determining the data of the first data unit.
  • 11. The data storage system of claim 1, wherein the controller is further configured to: assign individual data pages of the plurality of data pages to either the first set or the second set based on quality data about the individual data pages, anddetermine each inter page parity unit of the second set based at least in part on the corresponding data units from the data pages of the first set.
  • 12. In a data storage system comprising a controller and a non-volatile memory array including a plurality of data pages, a method of performing data recovery, the method comprising: storing a plurality of data units and intra page parity units in each data page of a first set of data pages of the plurality of data pages and a plurality of inter page parity units in each data page of a second set of data pages of the plurality of data pages; in response to detecting an error using the intra page parity units of a first data page of the first set when a first data unit is read from the first data page, performing a recovery for data of the first data unit using corresponding data units from the data pages of the first set and corresponding inter page parity units from the data pages of the second set; and in response to determining that the recovery did not result in successfully determining the data of the first data unit because a number of data pages from the first set for which data read errors are detected exceeds an error correction capability for inter data page error correction: reading one or more data units of the first set other than the first data unit using a modified read parameter or decoding the one or more data units of the first set other than the first data unit using a modified decoding parameter, and in response to determining that the number of data pages from the first set for which the data read errors are detected no longer exceeds the error correction capability for inter data page error correction, successfully determining the data of the first data unit using the corresponding data units from the data pages of the first set and the corresponding inter page parity units from the data pages of the second set, wherein the recovery is performed using less than all of the data units from the data pages of the first set and less than all inter page parity units from the data pages of the second set.
  • 13. The method of claim 12, wherein each data unit of the first set comprises a common number of octets of data after padding with a pre-defined data set, and each inter page parity unit of the second set comprises the common number of octets of data and corresponds to a parity based at least in part on the corresponding data units from the data pages of the first set.
  • 14. The method of claim 12, wherein each data page of the plurality of data pages comprises a flash page (F-page), and each data unit comprises an error-correcting code page (E-page).
  • 15. The method of claim 14, wherein the first set and the second set together form a super-page (S-page) comprising one F-page from each array die of a plurality of array dies.
  • 16. The method of claim 12, wherein the plurality of intra page parity units comprise low-density parity-check (LDPC) parity units, and the plurality of inter page parity units comprise Reed-Solomon (RS) parity units.
  • 17. The method of claim 12, further comprising reading the first data unit using the modified read parameter in response to detecting the error using the intra page parity units of the first data page when the first data unit is read from the first data page.
  • 18. The method of claim 17, wherein the modified read parameter comprises at least one of: a single-read/multiple-read and a voltage threshold adjustment.
  • 19. The method of claim 12, further comprising reading the one or more data units of the first set other than the first data unit using the modified read parameter in response to determining that the recovery did not result in successfully determining the data of the first data unit.
  • 20. The method of claim 12, further comprising decoding the first data unit using the modified decoding parameter in response to detecting the error using the intra page parity units of the first data page when the first data unit is read from the first data page.
  • 21. The method of claim 12, further comprising decoding the one or more data units of the first set other than the first data unit using the modified decoding parameter in response to determining that the recovery did not result in successfully determining the data of the first data unit.
  • 22. The method of claim 12, further comprising: assigning individual data pages of the plurality of data pages to either the first set or the second set based on quality data about the individual data pages, anddetermining each inter page parity unit of the second set based at least in part on the corresponding data units from the data pages of the first set.
US Referenced Citations (177)
Number Name Date Kind
5621660 Chaddha et al. Apr 1997 A
5768535 Chaddha et al. Jun 1998 A
6011868 van den Branden et al. Jan 2000 A
6289471 Gordon Sep 2001 B1
6856556 Hajeck Feb 2005 B1
6895547 Eleftheriou et al. May 2005 B2
6934904 Talagala et al. Aug 2005 B2
7072417 Burd et al. Jul 2006 B1
7126857 Hajeck Oct 2006 B2
7129862 Shirdhonkar et al. Oct 2006 B1
7149846 Hetrick Dec 2006 B2
7263651 Xia et al. Aug 2007 B2
7346832 Richardson et al. Mar 2008 B2
7395490 Richardson et al. Jul 2008 B2
7409492 Tanaka et al. Aug 2008 B2
7430136 Merry, Jr. et al. Sep 2008 B2
7447807 Merry et al. Nov 2008 B1
7500172 Shen et al. Mar 2009 B2
7502256 Merry, Jr. et al. Mar 2009 B2
7509441 Merry et al. Mar 2009 B1
7596643 Merry, Jr. et al. Sep 2009 B2
7653778 Merry, Jr. et al. Jan 2010 B2
7657816 Cohen et al. Feb 2010 B2
7685337 Merry, Jr. et al. Mar 2010 B2
7685338 Merry, Jr. et al. Mar 2010 B2
7685374 Diggs et al. Mar 2010 B2
7733712 Walston et al. Jun 2010 B1
7739576 Radke Jun 2010 B2
7765373 Merry et al. Jul 2010 B1
7797611 Dholakia et al. Sep 2010 B2
7809994 Gorobets Oct 2010 B2
7814393 Kyung et al. Oct 2010 B2
7898855 Merry, Jr. et al. Mar 2011 B2
7912991 Merry et al. Mar 2011 B1
7913149 Gribok et al. Mar 2011 B2
7936603 Merry, Jr. et al. May 2011 B2
7962792 Diggs et al. Jun 2011 B2
8078918 Diggs et al. Dec 2011 B2
8090899 Syu Jan 2012 B1
8095851 Diggs et al. Jan 2012 B2
8108692 Merry et al. Jan 2012 B1
8122185 Merry, Jr. et al. Feb 2012 B2
8127048 Merry et al. Feb 2012 B1
8135903 Kan Mar 2012 B1
8151020 Merry, Jr. et al. Apr 2012 B2
8161227 Diggs et al. Apr 2012 B1
8161345 Graef Apr 2012 B2
8166245 Diggs et al. Apr 2012 B2
8176284 Frost et al. May 2012 B2
8176360 Frost et al. May 2012 B2
8179292 Nakagawa May 2012 B2
8181089 Fernandes et al. May 2012 B1
8243525 Kan Aug 2012 B1
8254172 Kan Aug 2012 B1
8261012 Kan Sep 2012 B2
8296625 Diggs et al. Oct 2012 B2
8312207 Merry, Jr. et al. Nov 2012 B2
8316176 Phan et al. Nov 2012 B1
8339919 Lee Dec 2012 B1
8341339 Boyle et al. Dec 2012 B1
8375151 Kan Feb 2013 B1
8392635 Booth et al. Mar 2013 B2
8397107 Syu et al. Mar 2013 B1
8407449 Colon et al. Mar 2013 B1
8423722 Deforest et al. Apr 2013 B1
8433858 Diggs et al. Apr 2013 B1
8443167 Fallone et al. May 2013 B1
8447920 Syu May 2013 B1
8458435 Rainey, III et al. Jun 2013 B1
8478930 Syu Jul 2013 B1
8489854 Colon et al. Jul 2013 B1
8503237 Horn Aug 2013 B1
8521972 Boyle et al. Aug 2013 B1
8549236 Diggs et al. Oct 2013 B2
8583835 Kan Nov 2013 B1
8601311 Horn Dec 2013 B2
8601313 Horn Dec 2013 B1
8612669 Syu et al. Dec 2013 B1
8612804 Kang et al. Dec 2013 B1
8615681 Horn Dec 2013 B2
8638602 Horn Jan 2014 B1
8639872 Boyle et al. Jan 2014 B1
8683113 Abasto et al. Mar 2014 B2
8700834 Horn et al. Apr 2014 B2
8700950 Syu Apr 2014 B1
8700951 Call et al. Apr 2014 B1
8706985 Boyle et al. Apr 2014 B1
8707104 Jean Apr 2014 B1
8713066 Lo et al. Apr 2014 B1
8713357 Jean et al. Apr 2014 B1
8719531 Strange et al. May 2014 B2
8724422 Agness et al. May 2014 B1
8725931 Kang May 2014 B1
8745277 Kan Jun 2014 B2
8751728 Syu et al. Jun 2014 B1
8769190 Syu et al. Jul 2014 B1
8769232 Suryabudi et al. Jul 2014 B2
8775720 Meyer et al. Jul 2014 B1
8782327 Kang et al. Jul 2014 B1
8788778 Boyle Jul 2014 B1
8788779 Horn Jul 2014 B1
8788880 Gosla et al. Jul 2014 B1
8793429 Call et al. Jul 2014 B1
20030037298 Eleftheriou et al. Feb 2003 A1
20040098659 Bjerke et al. May 2004 A1
20050204253 Sukhobok et al. Sep 2005 A1
20050216821 Harada Sep 2005 A1
20050246617 Kyung et al. Nov 2005 A1
20060036925 Kyung et al. Feb 2006 A1
20060036933 Blankenship et al. Feb 2006 A1
20060085593 Lubbers et al. Apr 2006 A1
20070124648 Dholakia et al. May 2007 A1
20080141054 Danilak Jun 2008 A1
20080155160 McDaniel Jun 2008 A1
20080168304 Flynn et al. Jul 2008 A1
20080195900 Chang et al. Aug 2008 A1
20080244353 Dholakia et al. Oct 2008 A1
20080282128 Lee et al. Nov 2008 A1
20080301521 Gunnam et al. Dec 2008 A1
20080316819 Lee Dec 2008 A1
20090070652 Myung et al. Mar 2009 A1
20090193184 Yu et al. Jul 2009 A1
20090240873 Yu et al. Sep 2009 A1
20090241008 Kim et al. Sep 2009 A1
20090241009 Kong et al. Sep 2009 A1
20090249159 Lee et al. Oct 2009 A1
20090259805 Kilzer et al. Oct 2009 A1
20100017650 Chin et al. Jan 2010 A1
20100020611 Park Jan 2010 A1
20100049914 Goodwin Feb 2010 A1
20100083071 Shen et al. Apr 2010 A1
20100100788 Yang et al. Apr 2010 A1
20100107030 Graef Apr 2010 A1
20100125695 Wu et al. May 2010 A1
20100131819 Graef May 2010 A1
20100174849 Walston et al. Jul 2010 A1
20100250793 Syu Sep 2010 A1
20100268985 Larsen et al. Oct 2010 A1
20100275088 Graef Oct 2010 A1
20100315874 Ghodsi Dec 2010 A1
20110066793 Burd Mar 2011 A1
20110099323 Syu Apr 2011 A1
20110126078 Ueng et al. May 2011 A1
20110179333 Wesel et al. Jul 2011 A1
20110191649 Lim et al. Aug 2011 A1
20110213919 Frost et al. Sep 2011 A1
20110214037 Okamura et al. Sep 2011 A1
20110231737 Dachiku Sep 2011 A1
20110231739 Kim Sep 2011 A1
20110239088 Post Sep 2011 A1
20110246862 Graef Oct 2011 A1
20110252294 Ng et al. Oct 2011 A1
20110283049 Kang et al. Nov 2011 A1
20110296273 Rub Dec 2011 A1
20110302477 Goss et al. Dec 2011 A1
20120072654 Olbrich et al. Mar 2012 A1
20120079189 Colgrove et al. Mar 2012 A1
20120084506 Colgrove et al. Apr 2012 A1
20120084507 Colgrove et al. Apr 2012 A1
20120260020 Suryabudi et al. Oct 2012 A1
20120272000 Shalvi Oct 2012 A1
20120278531 Horn Nov 2012 A1
20120284460 Guda Nov 2012 A1
20120324191 Strange et al. Dec 2012 A1
20130054980 Frost et al. Feb 2013 A1
20130132638 Horn et al. May 2013 A1
20130145106 Kan Jun 2013 A1
20130290793 Booth et al. Oct 2013 A1
20140059405 Syu et al. Feb 2014 A1
20140101369 Tomlin et al. Apr 2014 A1
20140115427 Lu Apr 2014 A1
20140133220 Danilak et al. May 2014 A1
20140136753 Tomlin et al. May 2014 A1
20140149826 Lu et al. May 2014 A1
20140157078 Danilak et al. Jun 2014 A1
20140181432 Horn Jun 2014 A1
20140223255 Lu et al. Aug 2014 A1
Foreign Referenced Citations (7)
Number Date Country
2008102819 Oct 2006 JP
1020100076447 Aug 2011 KR
100929371 Nov 2011 KR
2012058328 May 2012 WO
2014065967 May 2014 WO
2014084960 Jun 2014 WO
2014088684 Jun 2014 WO
Non-Patent Literature Citations (7)
Entry
International Search Report and Written Opinion dated Jan. 23, 2014 from PCT/US2013/062760, International Filing Date: Sep. 30, 2013, Applicant: Western Digital Technologies, Inc., 10 pages.
Shayan S. Garani, U.S. Appl. No. 13/417,057, filed Mar. 9, 2012, 30 pages.
Guangming Lu, et al., U.S. Appl. No. 13/718,289, filed Dec. 18, 2012, 27 pages.
Guangming Lu, et al., U.S. Appl. No. 13/742,243, filed Jan. 15, 2013 (claims priority from U.S. Appl. No. 61/738,764, filed Dec. 18, 2012), 22 pages.
Shayan S. Garani, et al., U.S. Appl. No. 13/725,965, filed Dec. 21, 2012, 31 pages.
Guangming Lu, et al., U.S. Appl. No. 13/742,248, filed Jan. 15, 2013 (claims priority from U.S. Appl. No. 61/738,732, filed Dec. 18, 2012), 32 pages.
Related Publications (1)
Number Date Country
20140149826 A1 May 2014 US