Data can be stored in various types of storage devices, including magnetic storage devices (such as magnetic disk drives), optical storage devices, integrated circuit storage devices, and so forth. Data stored in storage devices includes user data and metadata. The term “user data” refers to user-created data, program instructions, data associated with applications or other software, and the like. “Metadata” is information that describes the stored user data. Examples of metadata include file names, ownership and access rights, last modified date, file size, and other information relating to the structure, content, and attributes of files containing user data. Metadata stored by a file system is referred to as file system metadata. A file system is a mechanism for storing and organizing user data to allow software in a computer to easily find and access the user data.
In response to detecting a problem occurring in a system, or as part of preventative maintenance, file system metadata can be checked for errors, such as metadata inconsistencies. Usually, a system administrator runs a file system metadata checking tool to perform metadata consistency checking. Performing consistency checking of file system metadata associated with a large number of files can be time-consuming. The amount of time for performing consistency checking of file system metadata grows linearly with the number of files in the file system.
Usually, a file system has to be first unmounted (or otherwise taken offline) before a file system metadata checking tool can be run against the file system metadata. During the period of time that the file system is offline for the purpose of performing consistency checking, the file system and consequently user data managed by the file system is unavailable for access by system software.
Other types of file system metadata checking tools are able to perform metadata checking while a file system remains online (available for access by software). However, since the metadata can be changing while the file system is online, the results can often be unreliable. Also, other conventional file system metadata checking tools that perform metadata checking while a file system remains online typically implement certain restrictions, such as preventing all writes at some point during the metadata checking process. Such restrictions may slow down the file system metadata checking process.
As depicted in
In
To manage access of and to organize the user data 126, the system including the host system 100 and storage subsystem 118 has a file system. A file system is usually part of an operating system. The file system includes file system logic 102 that is executable in the host system 100 and file system metadata 124 that describes the user data 126. The file system allows software (e.g., application software 103) in the host system 100 to easily find and access user data 126. Examples of file system metadata include file names, ownership and access rights, last modified date, file size, and other information relating to the structure, content, and attributes of files containing the user data 126. A file system thus includes the file system logic 102, file system metadata, and user data. A change to either the user data or the file system metadata is considered a change to the file system.
In
The original file system metadata 124 is subject to corruption or inconsistency as a result of various causes, including malfunction of the storage subsystem 118 (e.g., the storage subsystem writing to a particular block on the storage medium 120 when the storage subsystem 118 should have written to another block on the storage medium); mistakes made by a system administrator (e.g., the system administrator powering off a storage subsystem cache or other component by mistake); and file system programming errors (e.g., bugs in the file system). Other causes of file system metadata corruption or inconsistency also exist. Metadata corruption or inconsistency may cause errors during access of user data by the file system. Corruption of file system metadata refers to any damage to the metadata caused by errors or failures in software, hardware, or both. Inconsistency of file system metadata refers to different parts or pieces of the metadata that are inconsistent with one another.
The host system 100 includes a metadata checker utility 106 that performs a check for errors in file system metadata. Checking for errors in file system metadata includes checking for metadata inconsistency or corruption, or for any other problem of the metadata that would prevent proper access of the user data 126 by the file system logic 102.
Examples of metadata consistency checking include performing cross-checks between different pieces of the metadata to ensure that the different pieces are synchronized (consistent with each other). In one exemplary embodiment, the file system includes a metadata file that maps segments of the physical storage medium 120 to files containing user data. This metadata file is usually referred to as a storage map or the like. With respect to the storage map, a consistency check involves examining all the files in the file system and building a copy of what the storage map should look like. The copy of the storage map is then compared with the actual storage map to determine if the actual storage map accurately maps segments of the storage medium 120 to files containing the user data 126.
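The storage map cross-check described above can be illustrated with a minimal sketch. The dictionary layouts and the names `rebuild_storage_map` and `check_storage_map` are assumptions made for illustration only; they are not taken from any particular file system or from the metadata checker utility 106 itself.

```python
from typing import Dict, List

def rebuild_storage_map(files: Dict[str, List[int]]) -> Dict[int, str]:
    """Build the storage map implied by each file's own list of segments.

    Hypothetical layout: `files` maps a file name to the segment numbers
    that the file claims to occupy. A segment claimed by two files is not
    detected in this sketch; the later claim simply wins.
    """
    expected: Dict[int, str] = {}
    for name, segments in files.items():
        for seg in segments:
            expected[seg] = name
    return expected

def check_storage_map(files: Dict[str, List[int]],
                      actual_map: Dict[int, str]) -> List[str]:
    """Compare the rebuilt map with the actual map and list every mismatch."""
    expected = rebuild_storage_map(files)
    problems = []
    for seg in sorted(set(expected) | set(actual_map)):
        want = expected.get(seg)    # owner implied by the files
        have = actual_map.get(seg)  # owner recorded in the storage map
        if want != have:
            problems.append(
                f"segment {seg}: map says {have!r}, files imply {want!r}")
    return problems
```

An empty result means the actual storage map agrees with the map rebuilt from the files; each returned string names one segment on which the two disagree.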
Another type of consistency checking involves performing sanity checking with respect to individual information fields of file system metadata, where the individual information fields of the file system metadata are examined to ensure that the values contained in the information fields are “sane” values (in other words, the values of the information fields are within ranges of expected values). For example, if a file system is not supposed to span more than 128 disks making up the storage medium 120, and a “number of disks” information field in the file system metadata is 532, then the metadata checker utility 106 will report this “number of disks” information field as being inconsistent.
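A sanity check of this kind can be sketched as a range test per information field. The field name `number_of_disks`, the lower bound of 1, and the function name `sanity_check` are assumptions for illustration; only the upper limit of 128 comes from the example above.

```python
from typing import Dict, List, Tuple

# Hypothetical table of expected ranges (inclusive) per metadata field.
EXPECTED_RANGES: Dict[str, Tuple[int, int]] = {
    "number_of_disks": (1, 128),
}

def sanity_check(fields: Dict[str, int],
                 ranges: Dict[str, Tuple[int, int]] = EXPECTED_RANGES) -> List[str]:
    """Return the names of fields that are missing or outside their range."""
    insane = []
    for name, (low, high) in ranges.items():
        value = fields.get(name)
        if value is None or not (low <= value <= high):
            insane.append(name)
    return insane
```

With these assumed ranges, a value of 532 for `number_of_disks` is reported, while a value of 128 is accepted.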
Another consistency check that can be performed involves checking the relationships between directories and files. If a file “X” has file system metadata that indicates that the file “X” is in a directory “Y,” but the directory “Y” does not actually have an entry for file “X,” then the metadata checker utility 106 will report this as an inconsistency.
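The directory and file relationship check can be sketched as follows. The two dictionaries are a hypothetical in-memory view of the metadata (one recording each file's parent directory, the other recording each directory's entries), and `check_directory_links` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

def check_directory_links(file_parent: Dict[str, str],
                          dir_entries: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (file, directory) pairs where the file's metadata names a
    directory that has no entry for that file."""
    missing = []
    for fname, dname in sorted(file_parent.items()):
        if fname not in dir_entries.get(dname, set()):
            missing.append((fname, dname))
    return missing
```

For the example above, a file "X" whose metadata names directory "Y" is reported when the entries of "Y" do not include "X".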
There are numerous other types of consistency checks that can be performed by the metadata checker utility 106. Also, in addition to consistency checks, other types of errors are detectable by the metadata checker utility 106, including corruption of the file system metadata or other problems associated with the metadata.
If a file system is large, then the error checking performed by the metadata checker utility 106 of the file system metadata can take a relatively long time. Thus, if the file system has to be unmounted (or otherwise taken offline) to perform the error checking, then the file system becomes unavailable for access by software in the host system 100 or by external devices (external to the host system 100) during this offline period.
To avoid having to take the file system offline to perform error checking by the metadata checker utility 106, the snapshot 122 of the file system metadata is first created. The metadata checker utility 106 then performs error checking on the snapshot 122 of file system metadata, rather than on the original file system metadata 124. In one embodiment, the snapshot 122 is taken based on cooperation between a snapshot application 104 in the host system 100 and snapshot logic 108 in the file system logic 102. Note that although two separate snapshot blocks are depicted (the snapshot application 104 and snapshot logic 108), it is contemplated that the tasks performed by the snapshot application 104 and snapshot logic 108 can be combined into a single module. Alternatively, the snapshot application 104 can be omitted. The snapshot application 104 is created by a user, such as a user at a user station 114 that is coupled to the host system 100 (over a network). The user station 114 has a user interface 116 that can contain various elements, such as a command line interface, a programming interface, or a graphical user interface (GUI). The programming interface can be used to create the snapshot application 104, which issues commands to the snapshot logic 108 in the file system logic 102 to create the snapshot 122 of file system metadata. Alternatively, instead of creating a snapshot application 104 to issue commands to the snapshot logic 108, a user can issue commands to the snapshot logic 108 through the command line interface of the user interface 116. Commands can also be issued through the GUI of the user interface 116 in alternative implementations.
In response to commands (from the snapshot application 104, from the command line interface or GUI on the user station 114, or from some other source), the snapshot logic 108 creates the snapshot 122 of the original file system metadata 124. Note that the created snapshot 122 contains a copy of the file system metadata, but not a copy of the user data. Copying just the file system metadata into the snapshot 122 uses much less storage space than copying the entire file system into the snapshot 122. The commands can be issued by a user action or, alternatively, can be triggered by a set time or other event in the host system 100 (as detected by the snapshot application 104). As an example, the snapshot 122 of file system metadata can be taken periodically, such as every hour, every day, every week, every month, and so forth. Other events that can cause the snapshot 122 of file system metadata to be taken include detection of certain types of errors in the host system 100 that may be indications of corruption, inconsistency, or some other problem in the original file system metadata 124. Because the metadata checker utility 106 runs against the snapshot 122 of file system metadata, rather than against the original file system metadata 124, the file system does not have to be unmounted (or otherwise taken offline). Software in the host system 100 (such as application software 103) or an external device can therefore continue to access the user data 126 through the file system based on the original file system metadata 124. A file system is said to be online or available if software is able to access the file system for the purpose of accessing user data. Concurrently with normal file system operations, the metadata checker utility 106 is able to run error checking against the snapshot 122 of file system metadata.
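The separation between a metadata-only snapshot and the live file system can be shown with a toy in-memory model. The class `TinyFileSystem`, its attributes, and the use of a full deep copy are assumptions for illustration; an actual implementation of the snapshot logic 108 may create the snapshot quite differently.

```python
import copy

class TinyFileSystem:
    """Toy in-memory file system with separate metadata and user data."""

    def __init__(self):
        self.metadata = {}   # file name -> attributes (size, owner)
        self.user_data = {}  # file name -> file contents

    def write_file(self, name, data, owner="root"):
        self.user_data[name] = data
        self.metadata[name] = {"size": len(data), "owner": owner}

    def snapshot_metadata(self):
        # Copy only the metadata, never the user data. A deep copy keeps
        # later changes to the live metadata from leaking into the snapshot.
        return copy.deepcopy(self.metadata)


fs = TinyFileSystem()
fs.write_file("report.txt", b"hello")
snap = fs.snapshot_metadata()              # point-in-time metadata copy
fs.write_file("report.txt", b"hello, world")  # file system stays online
```

After the second write, `snap` still records the size at snapshot time, while `fs.metadata` reflects the new size, so a checker can examine `snap` without blocking further access.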
The various software modules in the host system 100, including the metadata checker utility 106, the snapshot application 104, application software 103, and file system logic 102, are executable on a central processing unit (CPU) 110, or plural CPUs. The CPU 110 is coupled to memory 112 in the host system 100.
A non-extending write changes user data, but does not change any file system metadata that a file system standard (e.g., the POSIX file system standard) requires to be updated synchronously with the update of the user data. A non-extending write changes a file system (which includes the user data). Also, a non-extending write changes a “last update time” field of the corresponding file system metadata. However, the “last update time” field can be updated at a later time, rather than synchronously with the update of the user data. A file system metadata change occurs “synchronously” with a user data change if the file system metadata change occurs at substantially the same time as the user data change. By allowing reads and non-extending writes (at 205) during creation of the snapshot, system throughput is enhanced since such operations are allowed to proceed even during snapshot creation. Techniques according to some embodiments that allow non-extending writes to occur during snapshot creation are more efficient than techniques that would block or prohibit any operations that would change the file system.
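One way to picture which operations may proceed during snapshot creation is a simple admission test. This sketch assumes that a write is non-extending when it stays within the current file size, which is a simplifying interpretation and not a definition taken from the description; the `Op` type and the function name are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Op:
    kind: str        # "read", "write", or any other operation such as "create"
    offset: int = 0  # byte offset of the access within the file
    length: int = 0  # number of bytes read or written

def allowed_during_snapshot(op: Op, file_size: int) -> bool:
    """Decide whether an operation may proceed while the snapshot is created."""
    if op.kind == "read":
        return True
    if op.kind == "write":
        # Assumed rule: the write is non-extending if it does not grow the file.
        return op.offset + op.length <= file_size
    # Any other operation is treated as changing metadata and must wait.
    return False
```

Under this assumed rule, a 10-byte write at offset 0 of a 100-byte file proceeds immediately, whereas a 10-byte write at offset 95 waits until the snapshot has been created.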
Also, according to some embodiments, creation of the snapshot of the file system metadata can proceed even if dirty data (dirty metadata or dirty user data) resides in a cache, such as in a cache in the memory 112 or elsewhere. In other words, according to these embodiments, creation of the snapshot does not have to wait for flushing or synchronization of dirty data from a cache to persistent storage such as the storage medium 120.
Once the snapshot 122 has been created, then any file system operation can proceed, even file system operations that involve metadata changes.
In one implementation, the snapshot 122 is created using copy-on-write logic. Copy-on-write refers to taking a snapshot before a write is executed. In the metadata context, copy-on-write refers to taking the snapshot of the original file system metadata 124 before a write is performed on the original file system metadata 124.
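A common way to realize copy-on-write is to preserve the old value of a metadata entry the first time that entry is overwritten after the snapshot is taken. The following is a minimal sketch under that assumption; the class `CowMetadata` and its methods are hypothetical and are not presented as the copy-on-write logic of any specific file system.

```python
_MISSING = object()  # marks an entry that did not exist at snapshot time

class CowMetadata:
    """Toy key/value metadata store with a single copy-on-write snapshot."""

    def __init__(self, entries):
        self._live = dict(entries)
        self._saved = None  # None means no snapshot is active

    def take_snapshot(self):
        # Nothing is copied yet; entries are preserved lazily on first write.
        self._saved = {}

    def write(self, key, value):
        if self._saved is not None and key not in self._saved:
            # First write to this entry since the snapshot: save the old value.
            self._saved[key] = self._live.get(key, _MISSING)
        self._live[key] = value

    def read_live(self, key):
        return self._live[key]

    def read_snapshot(self, key):
        if self._saved is None:
            raise RuntimeError("no snapshot has been taken")
        if key in self._saved:
            old = self._saved[key]
            if old is _MISSING:
                raise KeyError(key)
            return old
        # Entry unchanged since the snapshot, so the live value is still valid.
        return self._live[key]
```

Only entries that are actually modified after the snapshot consume extra space, which is why this approach is usually cheaper than copying all of the metadata up front.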
In some embodiments, the snapshot 122 contains the entirety of the original file system metadata 124 (at a particular point in time). In other embodiments, the snapshot 122 can contain a subset (less than all) of the original file system metadata 124.
After the snapshot 122 is created, a command is received (at 206) to run the metadata checker utility 106. In response to this command, the metadata checker utility 106 is run (at 208) against the snapshot 122 of file system metadata. Results of the metadata check are then presented (at 210). For example, the results can be presented through the user interface 116 of the user station 114, in the form of a report, graphical output, text output, and so forth. The results can also be stored in the host system 100, or in the user station 114, for later access by a user. Any errors detected as a result of this metadata check are addressed by a user by modifying the original file system metadata 124 to fix any inconsistencies or other errors.
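The run-and-report steps (at 208 and 210) can be sketched as running a set of check functions against the snapshot and formatting the findings as text. The function names, the `Check` type, and the report layout are assumptions for illustration only.

```python
from typing import Callable, Dict, List

# A check takes the snapshot and returns a list of error descriptions.
Check = Callable[[dict], List[str]]

def run_metadata_checker(snapshot: dict,
                         checks: Dict[str, Check]) -> Dict[str, List[str]]:
    """Run every check against the snapshot, never against live metadata."""
    return {name: check(snapshot) for name, check in checks.items()}

def format_report(results: Dict[str, List[str]]) -> str:
    """Render the results as plain text, one section per check."""
    lines = []
    for name, errors in results.items():
        status = "OK" if not errors else f"{len(errors)} error(s)"
        lines.append(f"{name}: {status}")
        lines.extend(f"  - {e}" for e in errors)
    return "\n".join(lines)
```

The returned dictionary can also be stored for later access, while the formatted text suits a command line interface.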
While the metadata checker utility runs (at 208) the metadata error checking against the snapshot 122 of file system metadata, the file system remains online (available) so that the file system logic 102 continues to be able to access the original file system metadata 124 for normal access of the user data while the metadata checking proceeds.
In this manner, metadata checking and normal file system service can both occur in parallel, which eliminates the often lengthy downtime associated with metadata checking in conventional systems.
The flow diagram of
Instructions of software routines described herein (including the metadata checker utility 106, the snapshot application 104, application software 103, and file system logic 102 in
Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs).
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.