1. Technical Field
The invention relates to the autonomic recovery of filesystem operations. More specifically, the present invention provides an improved method, apparatus and program for recovering a filesystem in an inconsistent state and returning the filesystem to a consistent state.
2. Description of Related Art
A filesystem is a file management system that an Operating System (OS) or other program can use to organize and monitor files. Currently, when a filesystem operation fails during the course of the operation, the OS (or other program) performing the filesystem operation typically aborts the operation, marks the filesystem as “dirty,” notifies the user of the failed operation, and utilizes another program or process to correct the error. For example, the OS can use a filesystem error correction program, such as a filesystem checker (fsck), to repair the “dirty” filesystem.
Essentially, when a conventional filesystem operation needs to change a series of metadata resources, the filesystem typically acquires an exclusive “lock” on a resource, changes the data for that resource, and then drops the “lock” on that resource. Under certain conditions, the filesystem can “lock” multiple resources at once, but these operations are coded carefully to avoid a “deadlock”. An example of the flow of such an occurrence in a conventional, single-threaded filesystem operation is shown in
As depicted in
Next, the OS (or other program) updates a directory associated with that file (step 104). The directory contains information about the files that lie beneath the directory in a hierarchical structure. For example, the hierarchical structure can be in the form of an inverted tree. An assumption is made that an error in the filesystem operation has occurred (step 106). Notably, this error occurred in the filesystem operation after the pertinent inode was updated. Because this is an error that the OS (or other program) cannot correct immediately, the filesystem operation is aborted or terminated (step 108). The OS marks this filesystem as “dirty” and notifies a user with an alert message that an error has occurred (step 110). If so desired, the user can then initiate an error correction program (e.g., fsck) to determine the problem and correct the error (step 112).
A major drawback of this conventional solution is that since an inode was updated before an error occurred, aborting the filesystem operation at the point shown in
Thus, it would be advantageous to have a method by which a filesystem's state is not left inconsistent as a result of an aborted or otherwise incomplete filesystem operation.
The present invention provides a method, apparatus, and computer instructions to bind “undo” information to given filesystem resources, in order to reverse or roll back certain changes and thereby return a filesystem affected by a failed or incomplete operation from an inconsistent state to a previous, consistent state. The present invention also provides a method, apparatus, and computer instructions to bind “undo” information to given filesystem resources so that later changes to the metadata in the filesystem can be “undone,” by ensuring that no filesystem operation is successful until all preceding operations that changed the same metadata are also successful.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
With reference now to the figures,
In the depicted example, server 204 is connected to network 202 along with storage unit 206. In addition, clients 208, 210, and 212 are connected to network 202. These clients 208, 210, and 212 may be, for example, personal computers or network computers. In the depicted example, server 204 provides data, such as boot files, operating system images, and applications, to clients 208-212. Clients 208, 210, and 212 are clients to server 204. Network data processing system 200 may include additional servers, clients, and other devices not shown. In the depicted example, network data processing system 200 is the Internet, with network 202 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational and other computer systems that route data and messages. Of course, network data processing system 200 also may be implemented as a number of different types of networks, such as, for example, an intranet, a local area network (LAN), or a wide area network (WAN).
Referring to
Peripheral component interconnect (PCI) bus bridge 314 connected to I/O bus 312 provides an interface to PCI local bus 316. A number of modems may be connected to PCI local bus 316. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to clients 208-212 in
Additional PCI bus bridges 322 and 324 provide interfaces for additional PCI local buses 326 and 328, from which additional modems or network adapters may be supported. In this manner, data processing system 300 allows connections to multiple network computers. A memory-mapped graphics adapter 330 and hard disk 332 may also be connected to I/O bus 312 as depicted, either directly or indirectly.
Those of ordinary skill in the art will appreciate that the hardware depicted in
The data processing system depicted in
Essentially, in accordance with an exemplary embodiment of the present invention, as each resource (e.g., data file) is acquired by a filesystem operation and the resource's data is modified, the filesystem operation stores “undo” information for that resource that can be used to reverse the changes. Before the operation adds its own “undo” information, it determines whether other “undo” information is already present for that resource and, if so, which threads created it. The filesystem operation treats any such “undo” information as “uncommitted updates,” meaning that the other threads' operations are not yet complete.
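The binding of “undo” information to a resource can be illustrated with a minimal sketch. The names `UndoRecord`, `Resource`, and `modify` are hypothetical and are introduced here only for illustration; the sketch assumes a resource can be modeled as a dictionary of fields with an ordered chain of undo records, which is a simplification and not necessarily how an actual embodiment stores this information.

```python
from dataclasses import dataclass, field


@dataclass
class UndoRecord:
    thread_id: int      # thread that made the change
    key: str            # which field of the resource was changed
    old_value: object   # value to restore if the change is rolled back


@dataclass
class Resource:
    name: str
    data: dict = field(default_factory=dict)
    undo_chain: list = field(default_factory=list)  # oldest record first


def modify(resource, thread_id, key, new_value):
    """Change one field and bind undo information to the resource.

    Returns the ids of other threads whose undo records were already
    present, i.e. threads with uncommitted updates on this resource.
    """
    pending = {r.thread_id for r in resource.undo_chain
               if r.thread_id != thread_id}
    # Record the old value first, then apply the change.
    resource.undo_chain.append(
        UndoRecord(thread_id, key, resource.data.get(key)))
    resource.data[key] = new_value
    return pending


# Two threads change the same (hypothetical) directory page in turn.
directory = Resource("dir_page", {"entries": ("a", "b")})
first = modify(directory, 1, "entries", ("a",))   # no earlier updates
second = modify(directory, 2, "entries", ())      # thread 1 is still pending
```

In this sketch, the second call reports thread 1 as having an uncommitted update, which is the condition under which the later operation would have to wait.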
In a “normal” or “non-error” path (e.g., no filesystem operation error has occurred), the filesystem operation modifies or changes all of the pertinent resources (data files), completes the entire operation, and then remains in a wait state. At this point, the filesystem operation waits for all other threads that had uncommitted updates on the resources involved. The filesystem operation allows all of the other threads to complete their operations successfully, before the filesystem operation can commit to the use of its undo information (thereby removing the changes that were made by the filesystem operation).
After the other threads have committed and used their undo information successfully, the filesystem operation, for the thread being run, can remove all of the undo block information for its resources, and then “wake up” any of the other threads that are waiting for the filesystem operation to be completed. Notably, in accordance with the present invention, if a deadlock situation occurs whereby two resources are modified in different orders, but both modifications are successful, both sets of undo blocks can be removed.
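The commit rule for the non-error path, including the mutual-wait case, can be sketched as a single decision function. This is an illustrative, single-threaded model rather than a blocking implementation: the function `committable` and its three arguments are assumptions made for this example, and real threads would wait and be woken up instead of being evaluated by a coordinator.

```python
def committable(finished, committed, depends_on):
    """Return the operations that may remove their undo blocks now.

    finished:   ids of operations whose work completed without error
    committed:  ids of operations that already removed their undo blocks
    depends_on: id -> set of ids that had earlier uncommitted updates
                on the same resources
    """
    committed = set(committed)
    group = set(finished) - committed
    changed = True
    while changed:
        changed = False
        for op in list(group):
            # An operation keeps waiting while any predecessor is neither
            # committed nor able to commit together with it.
            if not depends_on.get(op, set()) <= group | committed:
                group.discard(op)
                changed = True
    return group


# Thread 2 modified a resource after thread 1 and must wait for it.
waits = committable({2}, set(), {2: {1}})
# Once both have finished successfully, both may commit.
both = committable({1, 2}, set(), {2: {1}})
# Two resources modified in opposite orders: each waits on the other,
# yet both succeeded, so both sets of undo blocks can be removed.
cycle = committable({1, 2}, set(), {1: {2}, 2: {1}})
```

Evaluating the group as a whole is what lets the mutual-wait case resolve: neither operation has to commit strictly before the other.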
If an error occurs during the filesystem operation, the filesystem can review each resource that it has modified and determine if other threads have also modified resources in addition to the filesystem's initial modifications. If such other modifications are found, the threads that performed these modifications are considered to be in a wait state and waiting for the particular thread's operation that failed (due to the error involved). The failed thread then notifies the later (in time) threads that an operation has failed and all modifications that the other threads made are to be “undone”. Each thread is then run and all metadata changes are “undone”. The failed thread can wait for a repair process or an input/output command to complete its operation. Thus, the failed thread and the other threads have returned the filesystem to a previous, consistent state.
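The error path described above can be sketched as follows. The sketch assumes each resource is a dictionary holding its current `data` and an `undo` chain of `(thread_id, key, old_value)` tuples, oldest first; these names and the helper functions are hypothetical. It first determines which later threads must also abort, then reverses the chained changes, newest first.

```python
def threads_to_abort(resources, failed):
    """Failed thread plus every thread that changed a resource after it."""
    doomed = {failed}
    grew = True
    while grew:
        grew = False
        for res in resources:
            tainted = False
            for thread_id, _key, _old in res["undo"]:
                if thread_id in doomed:
                    tainted = True
                elif tainted:
                    # A later change sits on top of a change being undone.
                    doomed.add(thread_id)
                    grew = True
    return doomed


def roll_back(resources, failed):
    """Undo all changes of the aborted threads, newest change first."""
    doomed = threads_to_abort(resources, failed)
    for res in resources:
        while res["undo"] and res["undo"][-1][0] in doomed:
            _thread_id, key, old_value = res["undo"].pop()
            res["data"][key] = old_value
    return doomed


# Hypothetical state: threads 1 and 2 share a directory page,
# thread 3 touched an unrelated inode.
directory = {"data": {"entries": ()},
             "undo": [(1, "entries", ("a", "b")), (2, "entries", ("a",))]}
inode_b = {"data": {"links": 0}, "undo": [(2, "links", 1)]}
inode_c = {"data": {"links": 0}, "undo": [(3, "links", 1)]}
aborted = roll_back([directory, inode_b, inode_c], failed=1)
```

Here thread 2 is aborted along with thread 1 because its directory change was chained after thread 1's, while thread 3's unrelated update is left in place.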
Specifically,
Next, the filesystem updates a directory associated with the file of interest (step 404). An exemplary directory can be part of an inverted tree structure. For example, at step 404, the filesystem changes certain data in the directory page for the thread described above with respect to step 402, and records the changes made. For a file removal operation, the directory change may be the deletion of the previous entry. The filesystem can store the recorded changes, for example, on hard disk 332 of
After the update and record change occurs, it is assumed that an error has occurred in the filesystem operation shown (step 406). In accordance with the present invention, the filesystem retrieves (e.g., from hard disk 332 of
Essentially, the exemplary embodiment of
Specifically, referring to
At T=2, the filesystem changes certain data in the directory page for thread 1 for the file of interest, and records or stores the changes made (step 506). At T=3, the filesystem also changes the data in the directory page for thread 2 for the file of interest, and records or stores the changes made (step 508). Since there is already a changed record from thread 1, the changes to the directory page for thread 2 are chained to the end of those from thread 1. For example,
At T=4, because of the interdependency of the files associated with the operations being performed for both threads 1 and 2, the filesystem delays the timing of the operations for thread 2 until the operations for thread 1 are appropriately synchronized with those of thread 2 (step 510). Specifically, thread 2 reviews its changes that were made, and also determines that thread 1 had made at least one change prior to those of thread 2. Consequently, thread 2 is required to wait for thread 1 to complete its operations before thread 2 can continue its operations, because thread 1 may want to request thread 2 to abort its operations.
After the update and record changes occur, at T=5, it is assumed that an error has occurred with respect to thread 1 in the filesystem operations shown (step 512). In accordance with the present invention, at T=6, the filesystem retrieves (e.g., from hard disk 332 of
Similarly, at T=7, the filesystem retrieves the stored changes made to the data in the updated inode page and directory page for thread 2, and reverses those changes using, for example, an “undo” command (step 516). Specifically, thread 2 aborts both changes, because now both of the thread 2 changes are “outer level” changes. Also, at T=8, the filesystem retrieves the stored changes made to the data in the updated directory page for thread 1, and reverses those changes again using, for example, an “undo” command (step 518).
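The ordering of the undo steps can be replayed in a short sketch. Several details are assumptions made only for this illustration: that thread 1 and thread 2 each changed their own inode page at T=0 and T=1, that the step at T=6 reverses thread 1's inode change, and that an “outer level” change is one that is last in its resource's chain. The helper names `change` and `undo_outer_level` are hypothetical.

```python
def change(chains, state, thread_id, name, new_value):
    """Apply a change and chain its undo record after any earlier ones."""
    chains[name].append((thread_id, state[name]))
    state[name] = new_value


def undo_outer_level(chains, state, thread_id):
    """Reverse only this thread's changes that are last in their chain."""
    undone = []
    for name, chain in chains.items():
        while chain and chain[-1][0] == thread_id:
            _tid, old_value = chain.pop()
            state[name] = old_value
            undone.append(name)
    return undone


state = {"inode_1": "i1 v0", "inode_2": "i2 v0", "directory": "dir v0"}
initial = dict(state)
chains = {name: [] for name in state}

change(chains, state, 1, "inode_1", "i1 v1")     # T=0 (assumed)
change(chains, state, 2, "inode_2", "i2 v1")     # T=1 (assumed)
change(chains, state, 1, "directory", "dir v1")  # T=2
change(chains, state, 2, "directory", "dir v2")  # T=3, chained after thread 1

# T=5: thread 1 fails. Its directory change is buried under thread 2's,
# so only its inode change is outer level at this point.
step_t6 = undo_outer_level(chains, state, 1)     # T=6 (assumed)
step_t7 = undo_outer_level(chains, state, 2)     # T=7: both thread 2 changes
step_t8 = undo_outer_level(chains, state, 1)     # T=8: directory now exposed
```

In this model, thread 1 cannot reverse its directory change until thread 2 has reversed the change chained after it, which is why thread 1's rollback is split across two steps.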
Notably, at this point, the filesystem depicted in
It is important to note that although an “undo” command is described above as being used to roll back or reverse changes that have been made during the filesystem operations, the present invention is not intended to be so limited. Other appropriate commands, instructions or processes may be used to roll back or reverse such changes, in order to return a filesystem to a consistent state, and still be covered by the present invention.
It is also important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The present application is related to commonly assigned and co-pending U.S. patent application Ser. No. ______ (Attorney Docket No. AUS920030646US1) entitled “AUTONOMIC FILESYSTEM RECOVERY”, filed on Oct. 30, 2003, and hereby incorporated by reference.