The speed at which a system can write data to, and read data from, persistent storage is often a critical factor in the overall performance of the system. The traditional approach to reading and writing persistent storage requires processing by multiple layers in the system kernel and by multiple entities in the hardware. As a result, this approach introduces significant latency and, consequently, reduces the overall performance of the system.
In general, in one aspect, the invention relates to a method for rebuilding an in-memory data structure. The method includes selecting a first table of contents (TOC) entry of a TOC page of a block in persistent storage, wherein the first TOC entry comprises a first object identifier (ID) of a first object, a first offset ID, and a first birth time; determining, based on the first object ID, that the in-memory data structure comprises a first object metadata for the first object comprising a first mod time and a first object map pointer to a first object map tree; determining, based on the first birth time, that the first birth time in the first TOC entry is greater than the first mod time of the first object; updating, after determining the first object metadata exists and after determining the first birth time is greater than the first mod time, a first stored physical address in the first object map tree based on the first offset ID to a first physical address derived from the first TOC entry; and updating, after determining the first birth time is greater than the first mod time, the first mod time stored in the first object metadata to the first birth time.
In general, in one aspect, the invention relates to a non-transitory computer readable medium comprising instructions which, when executed by a processor, perform a method comprising selecting a first table of contents (TOC) entry of a TOC page of a block in persistent storage, wherein the first TOC entry comprises a first object identifier (ID) of a first object, a first offset ID, and a first birth time; determining, based on the first object ID, that an in-memory data structure comprises a first object metadata for the first object comprising a first mod time and a first object map pointer to a first object map tree; determining, based on the first birth time, that the first birth time in the first TOC entry is greater than the first mod time of the first object; updating, after determining the first object metadata exists and after determining the first birth time is greater than the first mod time, a first stored physical address in the first object map tree based on the first offset ID to a first physical address derived from the first TOC entry; and updating, after determining the first birth time is greater than the first mod time, the first mod time stored in the first object metadata to the first birth time.
In general, in one aspect, the invention relates to a storage appliance comprising persistent storage, a non-transitory computer readable medium comprising instructions, and a processor configured to execute the instructions, wherein the instructions, when executed by the processor, perform a method comprising selecting a first table of contents (TOC) entry of a TOC page of a block in persistent storage, wherein the first TOC entry comprises a first object identifier (ID) of a first object, a first offset ID, and a first birth time; determining, based on the first object ID, that an in-memory data structure comprises a first object metadata for the first object comprising a first mod time and a first object map pointer to a first object map tree; determining, based on the first birth time, that the first birth time in the first TOC entry is greater than the first mod time of the first object; updating, after determining the first object metadata exists and after determining the first birth time is greater than the first mod time, a first stored physical address in the first object map tree based on the first offset ID to a first physical address derived from the first TOC entry; and updating, after determining the first birth time is greater than the first mod time, the first mod time stored in the first object metadata to the first birth time.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In general, embodiments of the invention relate to a method and system for tracking modification times of data in a storage appliance. Specifically, embodiments of the invention relate to creating an in-memory data structure for storing object metadata for each object (e.g., file) in the storage appliance. Each object metadata may include a mod time of the object that describes the time the object was last modified (e.g., updated, cropped, trimmed). Further, embodiments of the invention relate to using the in-memory data structure to directly ascertain the physical address(es) of data for each object in the storage appliance.
In one embodiment of the invention, a client (100) is any system or process executing on a system that includes functionality to issue a read request to the storage appliance (102) and/or issue a write request to the storage appliance. In one embodiment of the invention, the clients (100) may each include a processor (not shown), memory (not shown), and persistent storage (not shown).
In one embodiment of the invention, a client (100) is operatively connected to the storage appliance (102). In one embodiment of the invention, the storage appliance (102) is a system that includes volatile and persistent storage and is configured to service read requests and/or write requests from one or more clients (100). The storage appliance (102) is further configured to create an in-memory data structure during reboot of the system in a manner consistent with the method described below (see e.g.,
In one embodiment of the invention, the storage appliance (102) includes a processor (104), memory (106), and one or more solid state memory modules (e.g., solid state memory module A (110A), solid state memory module B (110B), solid state memory module N (110N)).
In one embodiment of the invention, memory (106) may be any volatile memory including, but not limited to, Dynamic Random-Access Memory (DRAM), Synchronous DRAM, SDR SDRAM, and DDR SDRAM. In one embodiment of the invention, memory (106) is configured to temporarily store various data (including data for table of contents (TOC) entries and frags) prior to such data being stored in a solid state memory module (e.g., 110A, 110B, 110N). Memory (106) is operatively connected to the processor (104).
In one embodiment of the invention, the processor (104) is a group of electronic circuits with a single core or multiple cores that are configured to execute instructions. The processor (104) is configured to execute instructions to implement one or more embodiments of the invention, where the instructions are stored on a non-transitory computer readable medium (not shown) that is located within or that is operatively connected to the storage appliance (102). Alternatively, the storage appliance (102) may be implemented using hardware. The storage appliance (102) may be implemented using any combination of software and/or hardware without departing from the invention.
In one embodiment of the invention, the processor (104) is configured to create and update an in-memory data structure (108), where the in-memory data structure is stored in memory (106). In one embodiment of the invention, the in-memory data structure includes mappings (direct or indirect) between logical addresses and physical addresses. In one embodiment of the invention, the logical address is an address at which the data appears to reside from the perspective of the client (100). In one embodiment of the invention, the logical address is (or includes) a hash value generated by applying a hash function (e.g., SHA-1, MD5, etc.) to an n-tuple. In one embodiment of the invention, the n-tuple is <object ID, offset ID>, where the object ID defines an object (e.g., file) and the offset ID defines a location relative to the starting address of the object. In another embodiment of the invention, the n-tuple is <object ID, offset ID, birth time>, where the birth time corresponds to the time when the file (identified using the object ID) was created. Alternatively, the logical address may include a logical object ID and a logical byte address, or a logical object ID and a logical address offset. In another embodiment of the invention, the logical address includes an object ID and an offset ID. Those skilled in the art will appreciate that multiple logical addresses may be mapped to a single physical address and that the logical address is not limited to the above embodiments. Further details on the in-memory data structure are discussed below (see
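As an illustration of a hashed logical address, the following minimal Python sketch applies SHA-1 to the <object ID, offset ID> n-tuple. The function name `logical_address` and the fixed-width, big-endian encoding of the tuple are assumptions made for this example; the document does not specify how the n-tuple is serialized before hashing, and any deterministic encoding would serve.

```python
import hashlib
import struct


def logical_address(object_id: int, offset_id: int) -> bytes:
    """Hash the n-tuple <object ID, offset ID> into a logical address.

    Assumption (hypothetical encoding): both fields are packed as
    big-endian unsigned 64-bit integers before hashing.
    """
    packed = struct.pack(">QQ", object_id, offset_id)
    return hashlib.sha1(packed).digest()  # 20-byte SHA-1 digest
```

The same object ID with a different offset ID yields a different logical address, so each fragment-sized region of an object gets its own key.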
In one embodiment of the invention, the solid state memory modules (e.g., 110A, 110B, 110N) correspond to any data storage device that uses solid-state memory to store persistent data (i.e., persistent storage). In one embodiment of the invention, solid-state memory may include, but is not limited to, NAND Flash memory, NOR Flash memory, Magnetic RAM Memory (M-RAM), Spin Torque Magnetic RAM Memory (ST-MRAM), Phase Change Memory (PCM), or any other memory defined as a non-volatile Storage Class Memory (SCM).
Those skilled in the art will appreciate that the invention is not limited to the configuration shown in
The following discussion describes embodiments of the invention implemented using solid-state memory modules. Turning to
This process is repeated until there is only one page remaining in the block (208) to fill. At this point, a TOC page (210) is created and stored in the last page of the block (208). Those skilled in the art will appreciate that the total cumulative size of the TOC entries in the TOC page (210) may be less than the size of the page. In such cases, the TOC page may include padding to address the difference between the cumulative size of the TOC entries and the page size. Finally, because there are other TOC pages in the block (208), TOC page (210) includes a reference to one other TOC page (212).
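The padding described above can be illustrated with a small Python sketch that packs serialized TOC entries into a single page. Everything about the layout here is hypothetical: the 4096-byte `PAGE_SIZE`, the header holding a 4-byte reference to another TOC page and a 2-byte entry count, and the `NO_TOC_REF` marker are assumptions chosen for the example, not the on-media format.

```python
import struct
from typing import List, Optional

PAGE_SIZE = 4096          # hypothetical page size in bytes
NO_TOC_REF = 0xFFFFFFFF   # hypothetical marker: no other TOC page referenced


def build_toc_page(entries: List[bytes], other_toc_page: Optional[int]) -> bytes:
    """Pack serialized TOC entries into one page and pad the remainder.

    Assumed layout: header (reference to another TOC page, entry count),
    then the entries back to back, then zero padding up to PAGE_SIZE.
    """
    ref = NO_TOC_REF if other_toc_page is None else other_toc_page
    header = struct.pack(">IH", ref, len(entries))
    body = header + b"".join(entries)
    if len(body) > PAGE_SIZE:
        raise ValueError("TOC entries do not fit in one page")
    # Padding covers the gap between the cumulative entry size and the page size.
    return body + b"\x00" * (PAGE_SIZE - len(body))
```

Because the page is always exactly `PAGE_SIZE` bytes, a reader can treat every TOC page uniformly and ignore the zero-filled tail.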
Those skilled in the art will appreciate that while block (208) only includes frag pages and TOC pages, block (208) may include pages (e.g., a page that includes parity data) other than frag pages and TOC pages without departing from the invention. Such other pages may be located within the block and, depending on the implementation, interleaved between the TOC pages and the frag pages.
Those skilled in the art will appreciate that the TOC entry may include additional or fewer fields than shown in
In one embodiment of the invention, the object map tree (318) is a data structure, such as a tree, a hash table, a multi-level table, or any other data structure, that stores the physical addresses of data associated with the object corresponding to the object metadata (304) within the storage appliance (hereinafter “stored physical addresses”). In one embodiment of the invention, the object map tree (318) includes one or more pointer levels (320) and a physical address level (322). The object map tree pointer references the first pointer level. The one or more pointer levels (320) include pointers that reference either the next pointer level or, for the last pointer level, the physical address level (322). The physical address level (322) includes the stored physical addresses of the most recently modified data associated with the object within the storage appliance. For example, suppose frag 1 is written at time stamp 1 and frag 2, which holds data for the same offset of the same object, is written at time stamp 2, where time stamp 2 is later than time stamp 1. Because time stamp 2 occurs after time stamp 1, the entry for that offset in the physical address level of the object map tree (318) maps to the physical address associated with frag 2. However, frag 1 still exists in the storage appliance until an operation, such as garbage collection, frees the location corresponding to the physical address associated with frag 1. In one embodiment of the invention, the stored physical addresses are organized in the physical address level such that the offset ID and fragment size may be used to determine the location of a stored physical address in the physical address level. The stored physical addresses in the physical address level may be organized in other ways without departing from the invention.
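To make the arrangement of pointer levels and a physical address level concrete, here is a minimal Python sketch with a single pointer level. The class name `ObjectMapTree`, the `LEAF_SLOTS` fan-out, and the use of plain integers as physical addresses are assumptions for illustration. The index is derived from the offset ID and fragment size as described above; the birth-time check that decides whether a newer frag should overwrite an older one is left to the caller.

```python
from typing import Dict, List, Optional, Tuple

LEAF_SLOTS = 4  # hypothetical number of stored physical addresses per leaf


class ObjectMapTree:
    """One pointer level over a physical address level (illustrative only)."""

    def __init__(self) -> None:
        # Pointer level: leaf number -> leaf of the physical address level.
        self.pointer_level: Dict[int, List[Optional[int]]] = {}

    def _slot(self, offset_id: int, fragment_size: int) -> Tuple[int, int]:
        # Index into the physical address level, split into (leaf, position).
        index = offset_id // fragment_size
        return divmod(index, LEAF_SLOTS)

    def lookup(self, offset_id: int, fragment_size: int) -> Optional[int]:
        leaf_no, pos = self._slot(offset_id, fragment_size)
        leaf = self.pointer_level.get(leaf_no)
        return None if leaf is None else leaf[pos]

    def store(self, offset_id: int, fragment_size: int, physical_address: int) -> None:
        leaf_no, pos = self._slot(offset_id, fragment_size)
        leaf = self.pointer_level.setdefault(leaf_no, [None] * LEAF_SLOTS)
        # Any previously stored address is overwritten; the old frag's data
        # stays on the medium until something like garbage collection frees it.
        leaf[pos] = physical_address
```

Storing frag 2's address at the same offset as frag 1 replaces frag 1's entry in the physical address level, mirroring the frag 1 / frag 2 example above.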
Turning to the flowcharts, while the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, one or more steps shown in
In Step 504, a determination is made about whether corresponding object metadata for the object exists in an in-memory data structure based on the object ID. In one or more embodiments of the invention, the object ID is a key or index used to look up the object metadata for the object corresponding to the object ID in the in-memory data structure. The object metadata exists in the in-memory data structure if the object ID maps to the object metadata rather than, for example, to a null or empty value. If a determination is made that object metadata for the corresponding object exists in the in-memory data structure, the method may proceed to Step 506. In Step 506, the birth time is obtained from the TOC entry. In one or more embodiments of the invention, the birth time is the time when the frag associated with the TOC entry was written to the storage appliance.
In Step 508, a determination is made about whether the birth time is greater than the mod time in the object metadata. In one or more embodiments of the invention, the birth time is greater than the mod time if the birth time occurs after the mod time. If a determination is made that the birth time is greater than the mod time, the method may proceed to Step 510. In Step 510, the mod time in the object metadata is updated to the birth time. Said another way, the mod time in the object metadata is set to the birth time from the TOC entry.
In Step 512, the offset ID is obtained from the TOC entry. In one or more embodiments of the invention, the offset ID identifies the location, relative to the start of the object, at which the frag associated with the TOC entry starts.
In Step 514, the stored physical address in the object map tree is updated based on the offset ID. In one or more embodiments of the invention, each object metadata includes an object map pointer that references the object map tree for the object. The offset ID may serve directly as the index into the physical address level of the object map tree, or may be used to derive that index; the index locates where the physical address for the frag associated with the TOC entry is stored. Said another way, the offset ID is used to look up the physical address of a frag in an object. In one or more embodiments of the invention, the fragment size in the TOC entry may be required to convert the offset ID to an index in the physical address level. For example, suppose a TOC entry specifies a fragment size of 8 k and an offset ID of 32 k. Dividing the offset ID of 32 k by the fragment size of 8 k yields an index of 4. A lookup using an index of 4 in the physical address level may then locate the physical address of the frag associated with the TOC entry.
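The offset-to-index conversion in the example above is integer division by the fragment size. The following Python sketch shows it; the function name `physical_address_index` is hypothetical, and rejecting offsets that are not multiples of the fragment size is an assumption (that frags start on fragment-size boundaries within the object), not something the document states.

```python
def physical_address_index(offset_id: int, fragment_size: int) -> int:
    """Convert an offset ID into an index in the physical address level."""
    if fragment_size <= 0:
        raise ValueError("fragment size must be positive")
    if offset_id % fragment_size != 0:
        # Assumption: frags start on fragment-size boundaries within the object.
        raise ValueError("offset ID is not fragment-aligned")
    return offset_id // fragment_size


KB = 1024
print(physical_address_index(32 * KB, 8 * KB))  # prints 4
```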
In one or more embodiments of the invention, the physical address for a frag is defined as the following n-tuple: <storage module, channel, chip enable, LUN, plane, block, page ID, byte>. In one or more embodiments of the invention, the page ID and byte are stored in the TOC entry associated with the frag and may be used to determine the remaining information in the n-tuple to derive a physical address (hereinafter “derived physical address”). Once the derived physical address is obtained, the stored physical address in the physical address level that the offset ID maps to is updated to the derived physical address. In one embodiment of the invention, although the stored physical address is no longer mapped to by the offset ID, the data at the stored physical address still exists in the storage appliance until an operation (e.g., a garbage collection operation) frees the physical address. In one or more embodiments of the invention, a stored physical address in the physical address level that the offset ID maps to may not exist and, in such cases, the derived physical address replaces a null or empty value.
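One plausible way to form the derived physical address is sketched below in Python. It assumes, as an illustration only, that the storage module, channel, chip enable, LUN, plane, and block come from the location of the block whose TOC page is being scanned, while the page ID and byte come from the TOC entry. The type names `BlockLocation` and `PhysicalAddress` and the function `derive_physical_address` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    """Location of the block being scanned (assumed known before its TOC pages are read)."""
    storage_module: int
    channel: int
    chip_enable: int
    lun: int
    plane: int
    block: int


@dataclass(frozen=True)
class PhysicalAddress:
    """<storage module, channel, chip enable, LUN, plane, block, page ID, byte>"""
    storage_module: int
    channel: int
    chip_enable: int
    lun: int
    plane: int
    block: int
    page_id: int
    byte: int


def derive_physical_address(loc: BlockLocation, page_id: int, byte: int) -> PhysicalAddress:
    """Combine the block's location with the page ID and byte from a TOC entry."""
    return PhysicalAddress(loc.storage_module, loc.channel, loc.chip_enable,
                           loc.lun, loc.plane, loc.block, page_id, byte)
```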
Returning to Step 504, if a determination is made that object metadata for a corresponding object does not exist in the in-memory data structure based on the object ID, the method may proceed to Step 516. In Step 516, object metadata for an object corresponding to the object ID is created in the in-memory data structure. In one or more embodiments of the invention, the object metadata is created such that the object ID may be used in the future to locate the object metadata for the object corresponding to the object ID within the in-memory data structure.
In Step 518, the object metadata is populated. In one or more embodiments of the invention, the object metadata is populated with the following information: object ID, size, mod time, fragment size, and object map pointer. In one or more embodiments of the invention, the mod time is set to null. Alternatively, the mod time is automatically set to the birth time in the TOC entry.
In Step 520, the object map tree is created. In one or more embodiments of the invention, the object map tree is created such that the object map pointer in the object metadata references the object map tree. Further, the physical address level referenced by the one or more pointer levels is organized such that the offset ID may be used to locate physical addresses of frags in the object. The method may then proceed to Step 510 discussed above.
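The per-entry processing of Steps 504 to 520 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: timestamps are integers where a larger value means later, the object map tree is reduced to a dictionary from index to physical address, the derived physical address is already present on the hypothetical `TocEntry` record, and an entry whose birth time is not greater than the stored mod time is simply skipped, because the text does not state what happens in that case. `TocEntry`, `ObjectMetadata`, and `process_toc_entry` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TocEntry:
    object_id: int
    offset_id: int
    birth_time: int        # assumption: integer timestamp, larger means later
    fragment_size: int
    object_size: int
    physical_address: int  # stands in for the derived physical address


@dataclass
class ObjectMetadata:
    object_id: int
    size: int
    fragment_size: int
    mod_time: Optional[int] = None  # null until Step 510 sets it
    # Stand-in for the object map tree: index -> stored physical address.
    object_map: Dict[int, int] = field(default_factory=dict)


def process_toc_entry(in_memory: Dict[int, ObjectMetadata], entry: TocEntry) -> None:
    meta = in_memory.get(entry.object_id)                      # Step 504
    if meta is None:
        # Steps 516-520: create and populate metadata with an empty map.
        meta = ObjectMetadata(entry.object_id, entry.object_size, entry.fragment_size)
        in_memory[entry.object_id] = meta
    elif meta.mod_time is not None and entry.birth_time <= meta.mod_time:
        return                                                 # Step 508: not newer
    meta.mod_time = entry.birth_time                           # Step 510
    index = entry.offset_id // entry.fragment_size             # Steps 512-514
    meta.object_map[index] = entry.physical_address
```

Processing a newer entry for the same offset replaces the stored address and advances the mod time, while an older entry leaves both unchanged.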
In Step 410, a determination is made about whether there are remaining TOC pages in the block. In one or more embodiments of the invention, a block includes one or more TOC pages. If a determination is made that there are remaining TOC pages, the method may return to Step 404 (discussed above). If a determination is made that there are no remaining TOC pages, the method may proceed to Step 412.
In Step 412, a determination is made about whether there are remaining blocks in the solid state memory module. In one or more embodiments of the invention, the solid state memory module includes one or more blocks. If a determination is made that there are remaining blocks, the method may return to Step 402.
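The outer iteration over blocks and TOC pages (Steps 410 and 412) amounts to a nested loop. In this hypothetical Python sketch, each block is modeled as an iterable of TOC pages and each TOC page as an iterable of TOC entries. How TOC pages are located within a block on the medium (for example, by following the reference from one TOC page to another) is abstracted away, and the `process_entry` callback stands in for the per-entry handling.

```python
from typing import Callable, Iterable, TypeVar

Entry = TypeVar("Entry")


def rebuild(blocks: Iterable[Iterable[Iterable[Entry]]],
            process_entry: Callable[[Entry], None]) -> int:
    """Visit every TOC entry of every TOC page of every block.

    Returns the number of TOC entries processed.
    """
    count = 0
    for block in blocks:            # repeat while blocks remain (Step 412)
        for toc_page in block:      # repeat while TOC pages remain (Step 410)
            for entry in toc_page:
                process_entry(entry)
                count += 1
    return count
```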
In Step 602, a request to obtain the modification time of an object is received from a client. In Step 604, the object ID of the object is obtained from the request. In one or more embodiments of the invention, the request from the client includes the following n-tuple: <object ID, offset ID>, from which the object ID may be retrieved. Other information from which the object ID can be derived, such as a logical address, may be included in the request from the client without departing from the invention.
In Step 606, a determination is made about whether object metadata corresponding to the object exists in an in-memory data structure based on the object ID. In one or more embodiments of the invention, the object ID is a key or index used to look up the object metadata for the object corresponding to the object ID in the in-memory data structure. The object metadata exists in the in-memory data structure if the object ID maps to the object metadata of the object rather than, for example, to a null or empty value. If a determination is made that object metadata for the object exists in the in-memory data structure, the method may proceed to Step 608.
In Step 608, a mod time in the object metadata is obtained. In one or more embodiments of the invention, each object metadata includes a mod time describing the last time the object was modified (e.g., written, updated, trimmed, cropped).
In Step 610, the request is responded to with the mod time. Said another way, the storage appliance communicates the mod time to the client that submitted the request.
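Steps 602 to 610 reduce to a dictionary lookup keyed by object ID, sketched below in Python. The names `ObjectMetadata` and `handle_mod_time_request`, the clock-style string mod times, and returning `None` when no metadata exists are assumptions; the document does not say how a missing object is reported to the client.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ObjectMetadata:
    object_id: int
    mod_time: Optional[str]  # e.g. "12:39"; assumed clock-style representation


def handle_mod_time_request(in_memory: Dict[int, ObjectMetadata],
                            request: Tuple[int, int]) -> Optional[str]:
    object_id, _offset_id = request        # Step 604: request is <object ID, offset ID>
    meta = in_memory.get(object_id)        # Step 606
    if meta is None:
        return None  # assumption: behavior for a missing object is not specified
    return meta.mod_time                   # Steps 608-610: respond with the mod time
```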
The in-memory data structure may be indexed by object ID. A lookup of object ID 37 determines that object metadata for object 37 does not exist in the in-memory data structure. Object metadata for object 37 is then created in the in-memory data structure and populated, using metadata in TOC entry A, with an object ID of 37, an object size of 64 k, a mod time of 12:39, a fragment size of 8 k, and an object map pointer. The object map pointer references the object map tree. Specifically, the object map tree pointer references a pointer level that may include eight pointers to eight physical addresses associated with object 37 in the storage appliance. There are eight physical addresses because the size of object 37 is 64 k and the fragment size is 8 k; 64 k divided by 8 k gives eight frags, referenced by eight physical addresses.
Continuing with the example, a request to obtain the mod time for object 37 is then received by the storage appliance. The request includes the object ID of 37 and the offset ID of 24 k. Object metadata for object 37 is found to exist in the in-memory data structure based on the object ID as described in
The in-memory data structure may be indexed by object ID. A lookup of object ID 37 determines that object metadata for object 37 does not exist in the in-memory data structure. Object metadata for object 37 is then created in the in-memory data structure and populated, using metadata in TOC entry B, with an object ID of 37, an object size of 64 k, a mod time of 13:14, a fragment size of 8 k, and an object map pointer. The object map pointer references the object map tree. Specifically, the object map tree pointer references a pointer level that may include up to eight pointers to eight physical addresses associated with object 37 in the storage appliance. There are eight physical addresses because the size of object 37 is 64 k and the fragment size is 8 k; 64 k divided by 8 k gives eight frags, referenced by eight physical addresses.
Continuing with the example, a request to obtain the mod time for object 37 is then received by the storage appliance. The request includes the object ID of 37 and the offset ID of 24 k. Object metadata for object 37 is found to exist in the in-memory data structure based on the object ID as described in
One or more embodiments of the invention enable the creation of an in-memory data structure, which allows the storage appliance to determine the last modification time (i.e., mod time) for each object in the storage appliance. Further, embodiments of the invention enable access to data within an object in a single lookup step. Said another way, the storage appliance may use the in-memory data structure to directly ascertain the physical address(es) of the most recently modified data for each object in the storage appliance. Using this information, the storage appliance is able to directly access the data and does not need to traverse any intermediate metadata hierarchy in order to obtain the data.
Further, by storing a birth time in a TOC entry at the time a frag associated with the TOC entry is written, modification times may be tracked on a per frag basis rather than a per object basis. Additionally, storing the birth time at the time the frag is written allows the storage system to write the modification time for an object once by, for example, taking the maximum of the birth times for any frags associated with the object.
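Taking the maximum of per-frag birth times, as suggested above, is a one-line reduction. The Python sketch below assumes integer timestamps where a larger value means later and returns `None` for an object with no frags; the function name `object_mod_time` is hypothetical.

```python
from typing import Iterable, Optional


def object_mod_time(frag_birth_times: Iterable[int]) -> Optional[int]:
    """Object modification time as the latest birth time among its frags."""
    return max(frag_birth_times, default=None)
```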
One or more embodiments of the invention may be implemented using instructions executed by one or more processors in the system. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable media.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.