Many data storage components, such as hard disks and solid state drives, come with advertised reliability guarantees that manufacturers provide to customers. For example, certain solid state drive manufacturers guarantee a drive failure rate of 10⁻¹⁶ or 10⁻¹⁷. To increase data reliability, a data redundancy scheme such as RAID (Redundant Array of Independent Disks) is used. The redundancy may be provided by combining multiple storage elements within the storage device into groups providing mirroring and/or error checking mechanisms. For example, various memory pages/blocks of a solid state storage device may be combined into data stripes in which user data is stored.
Systems and methods which embody the various features of the invention will now be described with reference to the following drawings, in which:
Overview
A common approach to overcome storage element failure is to use redundant RAID (mirroring, parity, etc.) to allow data recovery should one or more failures (e.g., a read failure) occur. Typically, a target number of storage elements (e.g., pages, blocks, etc.) per stripe is chosen to achieve a desired reliability at a given cost in storage overhead. In one embodiment of the present invention, a flash-based/solid-state storage system is configured to hold parity data in a temporary volatile memory such as RAM (Random Access Memory) and write such parity data to the non-volatile flash media when the full stripe's worth of data has been written to the non-volatile flash media.
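The parity buffering described above can be sketched as follows. This is a minimal illustration, assuming simple XOR parity and a Python list standing in for the non-volatile flash media; the names `StripeWriter`, `write_page`, `close_stripe`, and `PAGE_SIZE` are hypothetical and do not come from the disclosure.

```python
PAGE_SIZE = 8  # hypothetical page size in bytes, kept small for illustration


def xor_pages(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length pages."""
    return bytes(x ^ y for x, y in zip(a, b))


class StripeWriter:
    """Keeps running parity in volatile memory and writes it once the stripe is full."""

    def __init__(self, data_pages_per_stripe: int):
        self.data_pages_per_stripe = data_pages_per_stripe
        self.flash = []                 # assumption: stands in for the flash media
        self.parity = bytes(PAGE_SIZE)  # running parity held in RAM
        self.pages_in_open_stripe = 0

    def write_page(self, page: bytes) -> None:
        """Write one data page and fold it into the buffered parity."""
        self.flash.append(("data", page))
        self.parity = xor_pages(self.parity, page)
        self.pages_in_open_stripe += 1
        if self.pages_in_open_stripe == self.data_pages_per_stripe:
            self.close_stripe()

    def close_stripe(self) -> None:
        """Write the buffered parity to flash, which closes the stripe."""
        self.flash.append(("parity", self.parity))
        self.parity = bytes(PAGE_SIZE)
        self.pages_in_open_stripe = 0
```

In this sketch the parity page reaches the flash list only after the last data page of the stripe, so the parity costs a single program operation per full stripe.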
However, there are situations when the parity for an open stripe is written to the flash media before the stripe is fully written. (Since the writing of the parity data “closes” a stripe, a stripe prior to the writing of its parity data is said to be an “open” stripe.) For example, in various embodiments of the invention, it may be appropriate to force a write of the parity data associated with an open stripe upon the detection of an uncorrectable data access error (e.g., a read or program error) on data in the stripe. In another example, the parity data may be written to the flash media when there is a detected power loss event and an open stripe present. Whatever the original cause that may have triggered the early write of parity data, embodiments of the invention are directed to writing additional data with the parity data to the flash media and then later using the additional data in data recovery operations. This approach allows the storage subsystem to easily detect the presence of a partial stripe and handle such a stripe accordingly. In one embodiment, the additional data is written in a spare area that is typically reserved for various system metadata and is used to indicate validity of pages in the partial stripe.
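One way to picture the additional data written with an early parity page is a per-slot validity bit list, sketched below. The layout is an assumption made for illustration: the `ParityPage` structure, the `valid_bits` field, and the function names are hypothetical, and the disclosure does not specify this exact encoding of the spare area.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParityPage:
    parity: bytes          # XOR of the data pages written so far
    valid_bits: List[int]  # hypothetical spare-area field: 1 = slot holds valid data


def close_open_stripe(written_pages: List[bytes], data_slots: int) -> ParityPage:
    """Build the parity page for a stripe that is closed before it is full."""
    if not written_pages:
        raise ValueError("an open stripe needs at least one written page")
    parity = bytes(len(written_pages[0]))
    for page in written_pages:
        parity = bytes(x ^ y for x, y in zip(parity, page))
    unwritten = data_slots - len(written_pages)
    valid_bits = [1] * len(written_pages) + [0] * unwritten
    return ParityPage(parity, valid_bits)


def is_partial_stripe(parity_page: ParityPage) -> bool:
    """A stripe is partial if any data slot is marked as not holding valid data."""
    return 0 in parity_page.valid_bits
```

For example, closing a six-slot stripe after three pages yields validity bits of `[1, 1, 1, 0, 0, 0]`, which lets a later reader recognize the stripe as partial without scanning its pages.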
System Overview
The controller 150 in one embodiment in turn includes a RAID module 158 and a volatile memory 164, which may be implemented in, for example, RAM such as DRAM or SRAM. The controller may alternatively be implemented in whole or in part as an ASIC, FPGA, or other device, which may but need not execute firmware. In another embodiment, the volatile memory 164 is outside of the controller 150 in the storage subsystem 140. In one embodiment, the RAID module 158 is configured to execute data access commands to maintain a data redundancy scheme in the storage subsystem. For example, the RAID module 158 may maintain data on which storage elements are assigned to which RAID stripes and determine how data are arranged in the data redundancy scheme (e.g., grouped into stripes with parity). In another embodiment, the various processes described herein may be executed by the RAID module 158 within the controller 150, by one or more components within the controller 150, or by a combination of both.
In one embodiment, the controller 150 of the storage subsystem 140 is configured to receive and execute commands from a storage interface 132 in a host system 130. The memory commands from the storage interface 132 may include write and read commands issued by the host system 130. As further shown in
Although this disclosure uses RAID as an example, the systems and methods described herein are not limited to RAID redundancy schemes and can be used in any data redundancy configuration that utilizes striping and/or grouping of storage elements for mirroring or error checking purposes. In addition, although RAID is an acronym for Redundant Array of Independent Disks, those skilled in the art will appreciate that RAID is not limited to storage devices with physical disks and is applicable to a wide variety of storage devices including the non-volatile solid state devices described herein.
Parity Handling Process
In block 206, the controller may also determine whether there is a power failure event occurring with an open stripe present. If so, the parity data for the open stripe is written to the non-volatile memory as part of the power failure handling process (block 208). In either case, the controller in block 210 may execute a recovery process using the newly written parity data. In one embodiment where RAID is implemented, the process is a standard RAID data recovery procedure. Although not shown, there are other situations besides those illustrated in blocks 204 and 206 where the controller may write the parity for an open stripe. For example, certain data transfer interface standards support a Force Unit Access (FUA) command, which may require an immediate write to the non-volatile memory and trigger a corresponding writing of parity data.
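The trigger conditions described for the parity handling process can be summarized as a small decision function. This is a hedged sketch only: the `Event` enumeration and `should_write_parity_early` are hypothetical names, and treating a Force Unit Access command as always triggering a parity write is an assumption, since the text says only that it may.

```python
from enum import Enum, auto


class Event(Enum):
    UNCORRECTABLE_ERROR = auto()  # e.g., a read or program error on data in the stripe
    POWER_LOSS = auto()           # detected power failure event
    FORCE_UNIT_ACCESS = auto()    # assumption: treated here as always forcing parity
    NORMAL_WRITE = auto()         # ordinary write, parity stays buffered


# Events that, in this sketch, force parity for an open stripe to be written early.
EARLY_PARITY_TRIGGERS = {
    Event.UNCORRECTABLE_ERROR,
    Event.POWER_LOSS,
    Event.FORCE_UNIT_ACCESS,
}


def should_write_parity_early(event: Event, open_stripe_present: bool) -> bool:
    """Return True when parity for an open stripe should be written immediately."""
    return open_stripe_present and event in EARLY_PARITY_TRIGGERS
```

With no open stripe present, none of the events forces a parity write, which matches the condition that the power failure check applies only when an open stripe exists.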
Partial Stripe with Forced Parity Write Example
In this example, an uncorrectable ECC error occurred while a read was being performed on one of the six pages in Stripe 4, prompting the writing of the parity data to the non-volatile memory ahead of its scheduled write time. The result, as shown in
Parity Marking
While
Use of Metadata
In one embodiment, with the metadata in place, once a partial stripe is written with parity it does not need to be moved to a new location until the controller needs to execute a data relocation operation such as garbage collection. This is because the metadata allows the controller to detect the presence of a partial stripe and provides information that helps the controller determine which of the pages within the partial stripe are valid. Eliminating the need to further process the partial stripe once written (e.g., by re-writing the partial stripe into a full stripe) saves time and avoids additional write amplification (additional write operations to accommodate write commands issued by a host system). Because the metadata is stored in the spare data area of a page, most of the time the controller can execute read/write operations on regular stripes normally without incurring the additional penalty of looking up and checking whether the stripe is truncated. In one embodiment, as shown in
In one embodiment, with the aforementioned truncated stripe marking, the hardware can detect the marking and report back to firmware for special handling during a garbage collection read operation. Therefore, normal read operation performance is not affected. The marking will indicate to the garbage collection process which pages are valid in a truncated stripe should an error arise. In one embodiment, the parity page is the last page in a stripe by default, which simplifies both normal read and write operations as well as defect management. The truncated page will be detected and handled accordingly.
Stripe 654 illustrates the result of a data recovery caused by a data access error on page D2 of stripe 652. The controller, upon encountering such an error, used the parity and data from pages D1 and D3 to recover D2. As shown, the recovered D2 is written after the parity page, as indicated by the label “D2 REC.” Accordingly, metadata 612 is updated (shown by metadata 614) to reflect the fact that the old D2 page is no longer valid (shown by a “0” bit in the second position) and that the valid D2 page is now after the old parity page (shown by a “1” bit in the fifth position). Note that metadata 614 also reflects the location of the new parity page for the stripe.
The updated metadata enables the controller to track the locations of the valid pages. Thus, for example, if a data access error occurs on D3, the controller will need to read the parity page and pages D1 and D2 to recover D3. The updated metadata 614 may serve as a bit mask to point the controller to the right location of the valid D2 page (which is in the fifth position rather than in the second position as originally written) as well as the new parity page. This enables the controller to quickly locate the needed data to recover D3.
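The D3 recovery described above can be illustrated with a short sketch that treats the metadata as a bit mask over page positions. Several details are assumptions made for illustration: XOR parity, a separate `parity_index` giving the location of the current parity page, a new parity page in the sixth position, and the `recover_page` name. The disclosure does not specify how the parity location is encoded.

```python
from typing import List


def xor_pages(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length pages."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_page(pages: List[bytes], valid_bits: List[int],
                 parity_index: int, lost_index: int) -> bytes:
    """Rebuild the page at lost_index from the parity and the other valid data pages."""
    result = pages[parity_index]
    for i, bit in enumerate(valid_bits):
        # Skip invalid slots, the lost page itself, and the parity page.
        if bit and i != lost_index and i != parity_index:
            result = xor_pages(result, pages[i])
    return result


# Hypothetical page contents for the layout D1, old D2, D3, old parity, D2 REC, new parity.
d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_pages(xor_pages(d1, d2), d3)
pages = [d1, b"\xff\xff", d3, parity, d2, parity]
valid_bits = [1, 0, 1, 0, 1, 0]  # old D2 invalid; recovered D2 valid in the fifth position
recovered_d3 = recover_page(pages, valid_bits, parity_index=5, lost_index=2)
```

Because the mask skips the stale D2 in the second position and selects the recovered copy in the fifth, the XOR of the new parity, D1, and the recovered D2 reproduces D3.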
While certain embodiments of the inventions have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions. For example, those skilled in the art will appreciate that in various embodiments, the actual steps taken in the processes shown in