1. Technical Field
This disclosure relates to storage devices, which can include disk drives and solid state memory subsystems, for example. More particularly, this disclosure relates to techniques for managing metadata in a storage device to improve drive performance.
2. Description of the Related Art
Storage subsystems such as disk drives, solid state memories, and the like, often utilize logical-to-physical mappings to store data. Data is accessed using logical addresses from the mapping which correspond to physical locations on the memory device.
The storage subsystem may access the logical-to-physical mapping relatively frequently in order to locate data. Thus, the subsystem often stores a version of the logical-to-physical address mapping in a relatively fast memory (e.g., a volatile memory such as a DRAM). A copy of the mapping is also typically kept in non-volatile memory. This allows the subsystem to retrieve the mapping on power up, for example. Maintaining the logical-to-physical address mapping and restoring the mapping can be complex and resource intensive tasks.
Embodiments described herein include systems and methods for maintaining and/or recovering a logical-to-physical address mapping of a storage subsystem. Certain of these embodiments improve system performance by reducing resource and time consumption involved in reconstructing a logical-to-physical mapping, at power-up, for instance. And, in some cases, the techniques described herein reduce the amount of resources (e.g., non-volatile memory accesses) utilized in maintaining the mapping. Specific embodiments of systems and processes will now be described with reference to the drawings. This description is intended to illustrate specific embodiments of the inventions, and is not intended to be limiting. Thus, nothing in this description is intended to imply that any particular component, step or characteristic is essential. The inventions are defined only by the claims.
System Overview
The non-volatile memory 108 can include at least one non-volatile memory device, which can be a hard-disk, a solid-state memory, some other type of addressable storage subsystem, or any combination thereof. The non-volatile memory 108 is arranged in a plurality of addressable memory locations which can be organized in a variety of manners. In one embodiment, the non-volatile memory 108 is arranged in a plurality of zones each corresponding to a plurality of memory locations. As one example, where the subsystem includes a hard-disk, the zones may correspond to sectors. The data may additionally be organized in units of further granularity. For instance, the sectors may each include a plurality of tracks, which in some cases can overlap one another in a shingled fashion.
The subsystem 100 maintains a first copy 110 of a logical-to-physical address mapping in the non-volatile memory 108. As shown, the logical-to-physical address mapping may be referred to as a translation table. The first copy 110 of the translation table may reside in a dedicated portion of the non-volatile memory 108 in certain cases (e.g., a physically or logically contiguous set of addresses). In alternative configurations, the first copy 110 is distributed across disparate physical and/or logical portions of the memory 108.
The controller 106 is in communication with the non-volatile memory 108 and with the interface 104, and generally controls the operation of the subsystem 100. The controller 106 may include one or more microprocessors executing firmware code, field-programmable gate arrays (FPGAs), application-specific circuitry, or a combination thereof. Firmware may be stored in any appropriate type of non-transitory computer readable medium, such as a solid state memory device.
The controller 106 can further include or otherwise be associated with a second copy 112 of the translation table. The storage subsystem 100 can include a second memory 113 that is different from the non-volatile memory 108 and that is a volatile memory (e.g., DRAM) in certain embodiments. The second copy 112 of the mapping is stored in the second memory 113. In some implementations, the second memory 113 is instead a non-volatile memory, or some other memory that is separate from or different than the non-volatile memory 108. In general, the second memory 113 can have significantly faster memory access times as compared to the non-volatile memory 108.
The controller 106 receives commands via the interface 104 from the host system 102. The commands can include write commands, read commands, erase commands, etc. Changes to the translation table can occur during system operation and are tracked as change data 114 in the second memory 113. For instance, one or more commands from the host system 102 may direct the controller 106 to update the translation table, or the controller 106 may itself initiate changes to the translation table.
In some embodiments, the change data 114 is stored separately from the initial version of the second copy 112 of the translation table. In one example scenario, the controller 106 accesses the first copy 110 of the translation table upon power up, and generates the second copy 112 based on the first copy 110. At this point, the first and second copies 110, 112 are identical or substantially identical. During system operation, changes to the translation table are tracked as change data 114 separately from the second copy 112 of the translation table. In the example case, the change data 114 is stored in the second memory 113 along with the second copy 112, although another memory could be used in other configurations. In other embodiments, the second copy 112 of the translation table is updated as the changes occur, and outdated entries in the second copy 112 are overwritten. In such cases, a flag or other appropriate mechanism may be associated with the data to indicate that the entries in the second copy 112 have been changed with respect to the first copy 110 that is stored in the non-volatile memory 108.
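By way of illustration only, the first arrangement described above, in which the change data 114 is kept apart from the second copy 112, might be sketched as follows. The class and method names (`OverlayTable`, `remap`, `lookup`) are hypothetical, and Python dictionaries stand in for the contents of the second memory 113; this is a sketch under those assumptions rather than a description of any particular embodiment.

```python
class OverlayTable:
    """Second copy of the mapping plus separately tracked change data (hypothetical)."""

    def __init__(self, first_copy):
        # At power up the second copy is generated from the first copy,
        # so the two start out identical.
        self.second_copy = dict(first_copy)
        # Change data is tracked apart from the second copy:
        # logical address -> most recent physical address.
        self.change_data = {}

    def remap(self, logical, physical):
        # Record the change without overwriting the second copy.
        self.change_data[logical] = physical

    def lookup(self, logical):
        # Prefer the most recent change, falling back to the second copy.
        if logical in self.change_data:
            return self.change_data[logical]
        return self.second_copy[logical]
```

In this arrangement the second copy continues to mirror the first copy 110, so the change data alone identifies what differs from the mapping held in the non-volatile memory 108.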
Where the change data 114 is stored in volatile memory, the change data 114 will be lost when the subsystem 100 is powered down. Thus, in order to be able to reconstruct the translation table on power up to reflect changes to the table, the controller 106 copies the change data 114 to the non-volatile memory 108. However, while the changes to the translation table may be tracked in the relatively fast memory 113 generally as they occur, the subsystem 100 may copy the change data to the non-volatile memory 108 at relatively less frequent intervals. Because the non-volatile memory 108 has relatively slower access speeds than the memory 113, this approach can improve system performance. In other cases, the change data 114 is copied to the non-volatile memory 108 as the changes occur.
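The trade-off described above, tracking changes in fast memory as they occur while copying them to slower non-volatile memory less often, can be illustrated with a minimal sketch. The names `ChangeLogBuffer`, `batch_size`, and `nvm_writes` are assumptions introduced for illustration, and a Python list stands in for the non-volatile memory 108.

```python
class ChangeLogBuffer:
    """Accumulates change data in fast memory and persists it in batches (hypothetical)."""

    def __init__(self, nvm_log, batch_size=4):
        self.nvm_log = nvm_log        # list standing in for non-volatile memory
        self.batch_size = batch_size  # assumed number of changes per persist
        self.pending = []             # changes held only in fast memory
        self.nvm_writes = 0           # count of (slow) non-volatile write operations

    def record(self, logical, physical):
        # Changes are tracked immediately in fast memory.
        self.pending.append((logical, physical))
        if len(self.pending) >= self.batch_size:
            self.persist()

    def persist(self):
        # One non-volatile write covers every pending change.
        if not self.pending:
            return
        self.nvm_log.extend(self.pending)
        self.pending.clear()
        self.nvm_writes += 1
```

Changes still held in `pending` at an unexpected power loss would not survive, which is the cost of persisting less frequently; setting `batch_size` to 1 corresponds to copying each change as it occurs.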
On power up, the controller 106 uses the first copy 110 of the translation table along with the translation table change data that was copied to the non-volatile memory 108 to reconstruct the translation table. The controller 106 updates the first copy 110 to reflect the reconstructed table, and similarly updates the second copy 112. As will be described in greater detail, proper selection of the scheme used to manage the translation table changes in the non-volatile memory 108 can advantageously reduce the time required to reconstruct the table on power up.
As shown, the metadata units 204 can also include a “sequence number”. The sequence number generally provides an indication of whether the data written to the particular metadata unit 204 and corresponding user data segment 206 is current. For example, the sequence number may correspond to the currently active zone. The sequence number for a particular metadata unit 204 may be updated with the current sequence number when data is written to the metadata unit 204 (and corresponding user data segment 206). Thus, metadata units 204 having outdated sequence numbers are from previous, now inactive zones. To determine whether a metadata unit includes current data, the controller 106 can additionally maintain a global sequence number corresponding to the currently active zone, and compare a retrieved sequence number from a particular metadata unit 204 to the global sequence number. If the global sequence number and retrieved sequence number match, the particular metadata unit 204 and user data segment 206 include current data. If not, they were written as part of a previous, inactive zone. The sequence number can be incremented when a new write zone is opened up in some cases, upon a translation table flush, or both.
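As a hedged illustration of the comparison just described, the following sketch checks a retrieved sequence number against a global sequence number. The `MetadataUnit` type and `is_current` function are hypothetical names; the values 4358 and 4150 are used simply as example current and outdated sequence numbers.

```python
from dataclasses import dataclass


@dataclass
class MetadataUnit:
    sequence_number: int       # sequence number stored with the unit when written
    log_entries: tuple = ()    # change data carried by the unit


def is_current(unit, global_sequence_number):
    # A unit holds current data only if its stored sequence number
    # matches the sequence number of the currently active zone.
    return unit.sequence_number == global_sequence_number
```

For example, with a global sequence number of 4358, a unit stamped 4358 is treated as current, while a unit stamped 4150 is treated as left over from a previous, inactive zone.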
The first copy 110 of the translation table in certain embodiments is stored logically and/or physically separate from the change data. For instance, the first copy 110 in some embodiments is stored in a set of logically or physically contiguous or substantially contiguous locations in the non-volatile memory 108.
As illustrated by the arrow 208, the controller 106 updates the first copy 110 of the translation table at particular intervals to reflect the accumulated change data. Updating the first copy 110 is also referred to herein as “flushing” the translation table. Although other schemes are possible, in the scenario shown in
As indicated by the arrow 210, a power down event occurs after writing “Log Entry 7” to the metadata unit 204D. On power up, the controller 106 begins the process of re-building the translation table from the non-volatile memory 108. The change data written prior to the most recent translation table flush is irrelevant in reconstructing the translation table because this change data would already be reflected in the first copy 110 of the translation table as a result of the flush. Thus, the controller 106 locates the metadata unit 204A written immediately after the last translation table flush 208 by accessing a pointer or other appropriate metadata. This pointer may have been stored at the time the flush operation occurred, for example. To locate the last metadata unit 204F written before power down, the controller 106 executes a sequential search through the non-volatile memory 108, starting at metadata unit 204A. Because metadata unit 204G is the first metadata unit containing an outdated sequence number (“4150”), the controller 106 determines that the previous metadata unit 204F was the last metadata unit written in the active zone before power down. The controller 106 then creates the current version of the translation table by updating the first copy 110 of the translation table with the change data corresponding to log entries 1-7 found in metadata units 204A-204F during the sequential search.
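The sequential recovery described above might be sketched as follows. This is an illustrative simplification: each metadata unit is modeled as a hypothetical `(sequence_number, entries)` pair, each entry is a `(logical, physical)` remapping, and `start_index` plays the role of the pointer stored at the last flush.

```python
def recover_sequential(first_copy, units, start_index, current_seq):
    """Rebuild the table by reading every unit written since the last flush."""
    table = dict(first_copy)
    last_index = None
    for index in range(start_index, len(units)):
        seq, entries = units[index]
        if seq != current_seq:
            # First outdated sequence number: the previous unit was
            # the last one written before power down.
            break
        for logical, physical in entries:
            table[logical] = physical   # later entries override earlier ones
        last_index = index
    return table, last_index
```

Every unit between the stored pointer and the last written unit is read, so the cost of this approach grows with the amount of change data written since the last flush.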
For the purposes of illustration, only seven metadata units and corresponding user data segments are searched through in the scenario depicted in
Upon power up, the controller 106 locates the first metadata unit 304A written after the last translation table flush. Because the translation table is flushed upon reaching the capacity of the metadata container, the controller 106 can assume that the last metadata unit written before the power down event 310 includes all or substantially all of the change data (Log Entries 1-3) needed to reconstruct the translation table. Thus, the controller 106 can execute a binary or other non-sequential search to efficiently locate the last metadata unit 304D written before the power down 310 without having to read all of the intermediate metadata units.
In the illustrated case, the controller 106 executes a binary search by accessing the metadata unit 304E mid-way between the metadata unit 304A and the end of the current zone. Because the sequence number corresponding to the metadata unit 304E is an outdated sequence number (“4150”), the controller 106 determines that the last metadata unit written before the power down is located between the metadata unit 304A and the metadata unit 304E. Continuing with the binary search, the controller 106 accesses the metadata unit 304C midway between the last accessed metadata unit 304E and the metadata unit 304A. The controller 106 determines that the sequence number corresponding to the metadata unit 304C is the current sequence number (“4358”), and therefore determines that metadata unit 304D is the last metadata unit 304 written before the power down event 310. The controller 106 then uses the change data from the metadata unit 304D (“Log Entries 1-3”) and the first copy 110 of the translation table to construct an up-to-date version of the translation table.
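A minimal sketch of such a binary search is given below. It assumes, for illustration, that units written in the active zone form a contiguous run of current sequence numbers followed by outdated ones, and that the last written unit carries the accumulated change data as a list of `(logical, physical)` pairs. The function names are hypothetical, and the exact probe order may differ from the one in the illustrated case.

```python
def find_last_current(sequence_numbers, start, current_seq):
    """Index of the last unit at or after `start` with the current sequence number."""
    lo, hi = start, len(sequence_numbers)   # search window [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if sequence_numbers[mid] == current_seq:
            lo = mid + 1    # unit is current: the last one is at mid or later
        else:
            hi = mid        # unit is outdated: the last one is before mid
    return lo - 1 if lo > start else None


def rebuild_from_last_unit(first_copy, units, start, current_seq):
    """Apply only the accumulated change data held in the last written unit."""
    seqs = [seq for seq, _ in units]
    last = find_last_current(seqs, start, current_seq)
    table = dict(first_copy)
    if last is not None:
        for logical, physical in units[last][1]:
            table[logical] = physical
    return table
```

Only on the order of log2(N) units are probed for a zone of N units, and the change data of a single unit is read, which is the source of the time savings relative to a sequential search.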
In reconstructing the translation table as shown in the scenario of
At block 406, the controller 106 updates metadata in the non-volatile memory 108 with the translation change data. For instance, as described with respect to
At block 408, the controller 106 determines whether the change data stored in the non-volatile memory 108 exceeds a permissible size or threshold amount. For instance, a threshold amount of change data may correspond to a certain number of log entries in a metadata container. As shown in
If the amount of change data exceeds the threshold, at block 410 the controller 106 flushes the change data to update the first copy 110 of the translation table in non-volatile memory 108. For instance, the controller 106 combines the accumulated change data with the first copy 110 to generate an updated copy reflective of the changes to the logical-to-physical address mapping since the first copy 110 was last updated. In certain embodiments, only portions of the first copy 110 of the translation table that have changed are updated. For example, the first copy 110 includes a plurality of pages each corresponding to one or more locations in the non-volatile memory 108. Only the pages corresponding to locations with changed logical-to-physical mappings are updated by the controller 106 during the translation table flushing process. In another embodiment, each changed entry is updated. In yet another configuration, the entire first copy 110 of the translation table is updated during a flush operation.
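The threshold check of block 408 and the page-granular flush of block 410 might be sketched as follows. The constants `ENTRIES_PER_PAGE` and `FLUSH_THRESHOLD`, the function name `maybe_flush`, and the layout of the first copy as a list of fixed-size pages are all assumptions made for illustration.

```python
ENTRIES_PER_PAGE = 4   # hypothetical number of mapping entries per page
FLUSH_THRESHOLD = 3    # hypothetical limit on accumulated log entries


def maybe_flush(first_copy_pages, change_log):
    """Flush when the log exceeds the threshold; return the set of pages rewritten."""
    if len(change_log) <= FLUSH_THRESHOLD:
        return set()            # below the threshold: keep accumulating
    dirty_pages = set()
    for logical, physical in change_log:
        page, offset = divmod(logical, ENTRIES_PER_PAGE)
        first_copy_pages[page][offset] = physical
        dirty_pages.add(page)   # only pages with changed entries are touched
    change_log.clear()          # accumulated change data is now reflected
    return dirty_pages
```

In this sketch, pages whose mappings did not change are never written, which corresponds to the embodiment in which only changed portions of the first copy 110 are updated.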
At block 502, a power up condition occurs. The power up may follow either a planned or unintended power down event. At block 504, the controller 106 loads the first copy 110 of the translation table from non-volatile memory 108. Because the first copy 110 is not flushed or otherwise updated in real-time with translation table change data during system operation, the controller 106 cannot assume that the first copy 110 is an up-to-date version. However, the translation table is flushed upon accumulation of a threshold amount of change data and/or before overwriting previously accumulated change data. As such, the controller 106 can assume that the last metadata unit 204 written before the power down operation includes all or substantially all of the change data for reconstructing the translation table.
To locate the last metadata unit 204 written before the power down event, at block 506 the controller 106 identifies the first metadata unit 204 written after the last translation table flush. For instance, the controller 106 may store a pointer or other mechanism in response to a flush operation, and the controller may later access the pointer at block 506. To locate the last metadata unit written before the power down, the controller 106 can advantageously execute a binary or other non-sequential search at block 508.
At block 510, the controller 106 reads the change data in the identified last metadata unit 204. At block 512, the controller 106 updates the first copy 110 of the translation table with the change data to construct an updated version of the translation table. As described above with respect to the flush operation of block 410 of
The features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure. Although certain embodiments have been disclosed, other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of protection is defined only by the claims.