This patent application is related to: the co-pending commonly-assigned U.S. patent application Ser. No. 15/058,538, titled “METHOD OF PREVENTING METADATA CORRUPTION BY USING A NAMESPACE AND A METHOD OF VERIFYING CHANGES TO THE NAMESPACE,” filed on Mar. 2, 2016; and the co-pending commonly-assigned U.S. patent application Ser. No. 15/140,241, titled “GENERALIZED VERIFICATION SCHEME FOR SAFE METADATA MODIFICATION,” filed on Apr. 27, 2016. The foregoing applications are herein incorporated by reference in entirety for all purposes.
Embodiments of the present invention generally relate to the field of data storage systems. More specifically, embodiments of the present invention relate to systems and methods for preserving the consistency of a file system, including preserving the consistency of data storage metadata.
Metadata is a set of data that describes the organization of user data or other metadata on a data storage partition or a file system volume. Preserving file system metadata is critical to the operation of modern file systems and helps ensure that user data written to the file system volume will be accessible when requested. However, end user workloads, end user mistakes, malicious operations, or bugs in a file system driver may occasionally cause improper behavior that results in metadata corruption, especially in unstable file systems. Metadata corruption can occur even in mature file systems.
When a file system's metadata becomes corrupted, special tools such as the “fsck” (file system consistency check) system utility (in Unix-like operating systems) or backup/restore software are often used to attempt to recover the file system data. However, using the fsck utility is a very time-consuming operation and cannot guarantee that a corrupted file system's data will be recovered. Furthermore, such tools can be used only when the file system is in an unmounted (e.g., offline) state. As such, preventing metadata corruption is a more beneficial approach than detecting corruption and attempting to recover corrupted data.
Methods and systems for preventing metadata corruption on a storage device are described herein. Embodiments of the present invention utilize a verification architecture to validate changes made to metadata and may comprise one or more subsystems and phases. A file system volume is created using a file system creation utility (e.g., “mkfs” in Unix-like operating systems) through reservation and initialization of space for metadata structures inside the device's partition. The space for metadata structures inside the device's partition is reserved for the specific file system volume. Every reserved metadata area should be described by an area legend. Verified area legends are used by a device driver (host side) or storage device controller (e.g., an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA)) when checking metadata modifications after the volume has been created. The verified area legends may be stored in a dedicated partition, inside the master boot record (MBR) or Globally Unique Identifier (GUID) partition table (GPT), or on a special memory chip. Write requests that overlap with any reserved metadata area on a file system volume must be verified to prevent metadata corruption.
According to one embodiment, a method of validating a write request to a storage device to prevent corruption of metadata is disclosed. The write request includes a logical block address, a magic signature, and a data type flag. The method includes determining that the logical block address of the write request overlaps an existing extent of a verified area of the storage device; responsive to the magic signature matching an expected magic signature of a legend of the verified area, determining that the magic signature is valid; responsive to the data type flag comprising a metadata type, determining that a number of blocks of the write request is valid; responsive to a size of the write request being equal to a multiple of a node size of the legend of the verified area, determining that the size of the write request is valid; and responsive to the write request comprising a valid magic signature, a valid number of blocks, and a valid size, determining that the write request is valid.
According to another embodiment, an apparatus for validating a write request to prevent corruption of metadata is disclosed. The apparatus includes a storage device and a processor communicatively coupled to the storage device that is configured to analyze the write request, where the write request includes a logical block address, a magic signature, and a data type flag; determine that the logical block address of the write request overlaps an existing extent of a verified area of the storage device; responsive to the magic signature matching an expected magic signature of a legend of the verified area, determine that the magic signature is valid; responsive to the data type flag comprising a metadata type, determine that a number of blocks of the write request is valid; responsive to a size of the write request being equal to a multiple of a node size of the legend of the verified area, determine that the size of the write request is valid; and responsive to the write request comprising a valid magic signature, a valid number of blocks, and a valid size, determine that the write request is valid.
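By way of illustration only, the overlap determination recited in the embodiments above may be sketched as an interval intersection test. The following is a minimal sketch, not a definitive implementation; the `Extent` type, its field names, and the use of half-open block ranges are assumptions introduced here and do not appear in the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous run of logical blocks (hypothetical helper type)."""
    start_lba: int   # first logical block address of the run
    num_blocks: int  # length of the run in blocks

    @property
    def end_lba(self) -> int:
        # Exclusive end: the first block after the run.
        return self.start_lba + self.num_blocks


def overlaps(request: Extent, area: Extent) -> bool:
    """Return True if the two half-open block ranges share at least one block."""
    return request.start_lba < area.end_lba and area.start_lba < request.end_lba
```

A write request whose range touches a verified area in this sense would be subject to the remaining checks, while a request that merely abuts the area would not.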
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:
Reference will now be made in detail to several embodiments. While the subject matter will be described in conjunction with the alternative embodiments, it will be understood that they are not intended to limit the claimed subject matter to these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the claimed subject matter as defined by the appended claims.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. However, it will be recognized by one skilled in the art that embodiments may be practiced without these specific details or with equivalents thereof. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects and features of the subject matter.
Portions of the detailed description that follows are presented and discussed in terms of a method. Although steps and sequencing thereof are disclosed in a figure herein describing the operations of this method, such steps and sequencing are exemplary. Embodiments are well suited to performing various other steps or variations of the steps recited in the flowchart (e.g.,
Some portions of the detailed description are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout, discussions utilizing terms such as “accessing,” “writing,” “including,” “storing,” “transmitting,” “traversing,” “associating,” “identifying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
In the example of
A communication or network interface 108 allows the computing device 112 to communicate with other computer systems via an electronic communications network, including wired and/or wireless communication and including an Intranet or the Internet. The components of the computer system 112, including the CPU 101, memory 103/102, data storage device 104, user input devices 106, and the display device 110 may be coupled via one or more data buses 100.
In the embodiment of
Some embodiments may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
The following description is presented to enable a person skilled in the art to make and use the embodiments of this invention. It is presented in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The verification architecture described according to embodiments of the present invention validates write requests and may comprise one or more subsystems and phases. According to some embodiments, the file system creation utility (e.g., “mkfs” in Unix-like operating systems) creates a file system volume by reserving and initializing space for metadata structures inside the device's partition that is reserved for the specific file system volume, and creates verified area legends for the reserved areas. The storage device controller (e.g., ASIC or FPGA), device driver, or special verification subsystem (host side) uses the verified area legends when checking write requests before applying metadata or user data changes after the file system volume has been created. The verified area legends may be stored, for example, in a dedicated partition, inside the master boot record (MBR) or Globally Unique Identifier (GUID) partition table (GPT), or on a dedicated memory chip (e.g., NAND flash memory).
According to some embodiments, a write request received by the verification subsystem may include a flag indicating whether the write request includes user data or metadata. If the flag of the write request indicates that the write request relates to user data, the verification subsystem prevents the write request from writing to any reserved metadata areas on the file system volume. If the flag of the write request indicates that the write request relates to metadata, the verification subsystem prevents the write request from writing the metadata to a location outside of any reserved metadata areas on the file system volume.
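The flag-based policy described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the `DataType` enumeration, the function name `flag_policy_allows`, and the representation of reserved areas as `(start_lba, num_blocks)` pairs are hypothetical, and the sketch further assumes that a metadata write must fit entirely within a single reserved area.

```python
from enum import Enum


class DataType(Enum):
    """Hypothetical encoding of the data type flag carried by a write request."""
    USER_DATA = "user_data"
    METADATA = "metadata"


def flag_policy_allows(data_type, req_start, req_blocks, reserved_areas):
    """Apply the flag policy to a write of req_blocks blocks at req_start.

    reserved_areas is a list of (start_lba, num_blocks) pairs, one per
    reserved metadata area (an assumed representation).
    """
    req_end = req_start + req_blocks  # exclusive end of the request
    # True if the request shares at least one block with any reserved area.
    touches = any(req_start < s + n and s < req_end for s, n in reserved_areas)
    # True if the request lies entirely inside one reserved area.
    inside = any(s <= req_start and req_end <= s + n for s, n in reserved_areas)
    if data_type is DataType.USER_DATA:
        # User data must never land in a reserved metadata area.
        return not touches
    # Metadata must never land outside the reserved metadata areas.
    return inside
```

For example, with a single reserved area covering blocks 100 through 149, a user data write at block 120 would be refused, and a metadata write at block 145 spanning ten blocks would also be refused because it runs past the end of the area.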
With regard to
The reserved metadata areas may be conceptualized as a series of nodes (e.g., tree of nodes) or an array of metadata items distributed between blocks of a certain size. As a whole, the metadata structure may represent a sequence of nodes of identical size. Alternatively, the metadata structure may represent a simple table or array distributed between several physical sectors. According to some embodiments of the present invention, the metadata node begins with a header. The header may be used to identify a specific metadata structure. The metadata structure may be identified using magic signature 202, for example. The header may also comprise an identification number 203 that enables the system to analyze an order of nodes in the metadata structure's sequence.
The metadata structure of a file system may be characterized by several elements. First, the metadata structure's magic signature 202 is a special pre-defined binary value that identifies the type of metadata structure. A node size associated with the metadata structure determines the granularity of a portion of the metadata items. Min, default, and max clump size values define the minimum, default, and maximum possible size of a contiguous metadata area for future reservations. A sequence identification number 203 is used to check the order of nodes in the metadata structure's sequence.
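The characteristics listed above may be gathered into a single record, sketched below for illustration. The class name `AreaLegend`, its field names, the byte units, and the sanity checks are assumptions made for this sketch. The helper `nodes_in_sequence` additionally assumes that identification numbers increase by one from node to node, which the description does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaLegend:
    """Hypothetical record describing one type of metadata structure."""
    magic_signature: bytes   # pre-defined binary value identifying the structure
    node_size: int           # granularity of the metadata items, in bytes (assumed unit)
    min_clump_size: int      # smallest contiguous area for future reservations
    default_clump_size: int  # default contiguous area for future reservations
    max_clump_size: int      # largest contiguous area for future reservations

    def __post_init__(self):
        # Sanity checks chosen for this sketch; they are not recited in the text.
        if not self.magic_signature:
            raise ValueError("magic signature must not be empty")
        if self.node_size <= 0:
            raise ValueError("node size must be positive")
        if not (self.min_clump_size <= self.default_clump_size <= self.max_clump_size):
            raise ValueError("clump sizes must satisfy min <= default <= max")


def nodes_in_sequence(sequence_ids):
    """Check node order, assuming consecutive identification numbers."""
    return all(b == a + 1 for a, b in zip(sequence_ids, sequence_ids[1:]))
```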
With regard to
With regard to
With regard to
At step 506, if the write request does contain metadata, it is determined if the write request comprises a valid magic signature (e.g., the expected magic signature), for example, at the beginning of the byte stream. The write request may also include the magic signature as part of service information. The magic signature of the write request is checked against an existing magic signature in the associated metadata area legend to determine if it is the expected magic signature. If a write request does not comprise a valid magic signature (e.g., the expected magic signature), an error is signaled and/or the write request terminates (step 504). If the write request does comprise a valid magic signature, the process continues to step 507.
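The comparison of step 506 may be sketched as shown below. The function name `signature_is_expected` and the optional `service_info_magic` parameter are hypothetical; the sketch assumes that a signature supplied as service information, when present, takes precedence over the one at the beginning of the byte stream.

```python
def signature_is_expected(payload, expected_magic, service_info_magic=None):
    """Return True if the write request carries the expected magic signature.

    payload            -- byte stream of the write request
    expected_magic     -- signature recorded in the metadata area legend
    service_info_magic -- signature passed as service information, if any
                          (assumed to take precedence when present)
    """
    if not expected_magic:
        # An empty expected signature would match anything; treat it as invalid.
        return False
    if service_info_magic is not None:
        return service_info_magic == expected_magic
    # Otherwise the signature is expected at the beginning of the byte stream.
    return payload.startswith(expected_magic)
```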
At step 507, it is determined if the requested number of blocks for the write request is valid. If the byte stream of the write request is located exclusively in the reserved metadata area, the requested number of blocks is valid. If the write request is attempting to store a part of the byte stream outside of a reserved metadata area, an error is signaled and/or the write request terminates (step 504). If the requested number of blocks was determined to be valid in step 507, the process continues to step 508, where it is determined if the size of the write request is valid. The write request size is determined based on the number of physical sectors in the write request. The size of the write request is valid if it is equal to the size of one or more metadata nodes, that is, a whole multiple of the node size recorded in the associated verified area legend. If the size is valid, the file system volume modification complies with all of the constraints of the associated verified area legend, and the metadata can be written to the file system volume (step 509). If any of the above constraints is not satisfied, an error is signaled and/or the write request terminates (step 504).
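Steps 506 through 509 may be summarized in a single routine, sketched below under stated assumptions. The `Legend` and `WriteRequest` types, the 512-byte sector size, and the returned reason strings are hypothetical and serve only to make the sketch self-contained; a rejected request corresponds to step 504.

```python
from dataclasses import dataclass
from typing import Tuple

SECTOR_SIZE = 512  # bytes per physical sector (assumed for this sketch)


@dataclass(frozen=True)
class Legend:
    """Hypothetical verified area legend for one reserved metadata area."""
    start_lba: int    # first sector of the reserved area
    num_sectors: int  # length of the reserved area in sectors
    magic: bytes      # expected magic signature
    node_size: int    # metadata node size in bytes


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical metadata write request."""
    lba: int        # first sector to be written
    payload: bytes  # byte stream to be written


def validate_metadata_write(req: WriteRequest, legend: Legend) -> Tuple[bool, str]:
    # Step 506: the byte stream must begin with the expected magic signature.
    if not legend.magic or not req.payload.startswith(legend.magic):
        return False, "invalid magic signature"
    # Step 507: the byte stream must lie exclusively in the reserved area.
    size = len(req.payload)
    sectors = -(-size // SECTOR_SIZE)  # round up to whole sectors
    area_end = legend.start_lba + legend.num_sectors
    if req.lba < legend.start_lba or req.lba + sectors > area_end:
        return False, "write extends outside the reserved metadata area"
    # Step 508: the size must equal one or more whole metadata nodes.
    if size % legend.node_size != 0:
        return False, "size is not a multiple of the node size"
    # Step 509: all constraints of the legend are satisfied.
    return True, "ok"
```

The checks are ordered as in the description, so the first failed constraint determines the reported reason.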
Embodiments of the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the following claims.