1. Field of the Invention
This invention relates to multiple concurrent writeable file systems.
2. Description of the Related Art
A file system provides a structure for storing information, for example application programs, file system information, other data, etc. (hereinafter collectively referred to as simply data) on storage devices such as disk drives, CD-ROM drives, etc. One problem with many file systems is that if the file system is damaged somehow, a large quantity of data can be lost.
In order to prevent such loss of data, backups are often created of a file system. One very efficient method for creating a backup of a file system is to create a snapshot of the file system. A snapshot is an image of the file system at a consistency point, a point at which the file system is self-consistent. A file system is self-consistent if the data stored therein constitutes a valid file system image.
In some file systems, for example Write Anywhere File system Layout (WAFL) file systems, a snapshot of a file system can be created by copying information regarding the organization of data in the file system. Then, as long as the data itself is preserved on the storage device, the data can be accessed through the snapshot. A mechanism is provided in these file systems for preserving this data, for example through a block map.
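As an illustration only, the preservation mechanism can be pictured as a map from block numbers to the file system images that still reference each block. The following Python sketch is a simplified model and not the actual WAFL block map format; the class `BlockMap`, its methods, and the image names are hypothetical.

```python
class BlockMap:
    """Tracks, per block number, which file system images still reference it."""

    def __init__(self):
        self._owners = {}  # block number -> set of image names

    def mark(self, block, image):
        """Record that `image` references `block`."""
        self._owners.setdefault(block, set()).add(image)

    def release(self, block, image):
        """Drop one image's reference; the block is forgotten once unreferenced."""
        owners = self._owners.get(block, set())
        owners.discard(image)
        if not owners:
            self._owners.pop(block, None)

    def is_free(self, block):
        """A block may be reused only when no image references it."""
        return block not in self._owners


block_map = BlockMap()
block_map.mark(7, "active")
block_map.mark(7, "snapshot")          # taking a snapshot adds a second reference
block_map.release(7, "active")         # the active file system stops using block 7
still_held = not block_map.is_free(7)  # the snapshot still references block 7
block_map.release(7, "snapshot")       # removing the snapshot drops the last reference
now_free = block_map.is_free(7)
```

In this model a block becomes reusable only after every image that referenced it has released it, which is the property that keeps data accessible through a snapshot.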
Conventionally, snapshots are read-only. A read-only snapshot can be used to recall previous versions of data and to repair damage to a file system. These capabilities can be extremely useful. However, these types of snapshots do not provide certain other capabilities that might be useful.
It would be advantageous if snapshots could be written to as well, so that a user desiring to modify a snapshot could do so. This would have several advantages:
A writable snapshot is actually another active file system. Because this active file system is based on data from another active file system, experimental modifications and changes for the active file system can be made to the writable snapshot without risking harm to the original active file system. In addition, because a snapshot can be created by simply copying organizational information and preserving existing data, writable snapshots (i.e., new active file systems) can be created easily and with utilization of few system resources.
These advantages and others are provided in an embodiment of the invention, described herein, in which plural active file systems are maintained, wherein each of the active file systems initially accesses data shared with another of the active file systems, and wherein changes made to each of the active file systems are not reflected in other active file systems.
In the preferred embodiment, when a second active file system is created based on a first active file system, the first active file system and the second active file system initially share data. When changes are made to the first active file system, modified data is recorded in the first active file system in a location that is not shared with the second active file system. When changes are made to the second active file system, modified data is recorded in the second active file system in a location that is not shared with the first active file system.
Further snapshots preferably are made of ones of the plural active file systems, each snapshot forming an image of its respective active file system at a past consistency point. Each snapshot includes a complete hierarchy for file system data, separate and apart from active file system data for the plural active file systems. One of these snapshots in turn can be converted into a new active file system by making the snapshot writable and by severing snapshot pointers from any of the active file systems to the new active file system.
The invention also encompasses memories that include instructions for performing the foregoing operations and storage systems that implement those operations.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention may be obtained by reference to the following description of the preferred embodiments thereof in connection with the attached drawings.
Related Applications
Inventions described herein can be used in conjunction with inventions described in the following documents:
These documents are hereby incorporated by reference as if fully set forth herein. These documents are referred to as the “incorporated disclosures”.
Lexicography
The following terms refer or relate to aspects of the invention as described below. The descriptions of general meanings of these terms are not intended to be limiting, only illustrative.
As noted above, these descriptions of general meanings of these terms are not intended to be limiting, only illustrative. Other and further applications of the invention, including extensions of these terms and concepts, would be clear to those of ordinary skill in the art after perusing this application. These other and further applications are part of the scope and spirit of the invention, and would be clear to those of ordinary skill in the art, without further invention or undue experimentation.
Snapshots and Active File Systems
File system 100 in
File system 100 includes root inode 110 and data 120, as well as other data. All of the inodes and data in file system 100 preferably are stored in blocks, although this also does not have to be the case.
Root inode 110 stores parts of the organizational data for file system 100. In particular, root inode 110 points to data and to other inodes and data that in turn point to data for all information stored in file system 100. Thus, any information stored in a file system 100 can be reached by starting at root inode 110.
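The reachability property can be sketched as a traversal that starts at the root inode and follows pointers. This Python sketch assumes a simple in-memory tree; the `Node` type and the `reachable` function are hypothetical names introduced for illustration and are not taken from any actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Stand-in for an inode or data block: a name, a payload, and pointers."""
    name: str
    payload: bytes = b""
    children: List["Node"] = field(default_factory=list)


def reachable(root: Node) -> List[str]:
    """Return the names of everything reachable from `root`, depth first."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        seen.append(node.name)
        # Push children in reverse so they are visited in their stored order.
        stack.extend(reversed(node.children))
    return seen


file_a = Node("file_a", b"hello")
file_b = Node("file_b", b"world")
directory = Node("dir", children=[file_a, file_b])
root_inode = Node("root", children=[directory])

names = reachable(root_inode)
```

Because every item is found by following pointers from the root, copying the root is enough to capture a view of the entire hierarchy.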
Snapshot 130 has been formed from file system 100. In
After snapshot root inode 140 has been created, snapshot 130 and file system 100 actually share data on the storage device or devices. Thus, snapshot 130 preferably includes the same physical data 120 on the storage device or devices as file system 100, as indicated by the dual solid and dashed borders around data 120 in
File system 100 preferably includes snapshot data 150 that points to snapshots of file system 100. In particular, pointers 160 in the snapshot data preferably point to root inodes of those snapshots.
Snapshot 130 also preferably includes snapshot data 170 that points to other snapshots. However, snapshot data 170 of snapshot 130 can be different from snapshot data 150 of file system 100 because snapshot 130 preferably does not point to itself. This difference is shown in
Preferably, a snapshot of a file system according to the invention includes a complete hierarchy for file system data, separate and apart from active file system data for the active file systems. This hierarchy is included in the root inode for the snapshot and possibly in other inodes and data copied for the snapshot (not shown).
There is no particular requirement for the file system hierarchies for a snapshot to duplicate the name space originally used for the associated active file system. In one preferred embodiment, file names in a snapshot's root inode (and other organizational data) can be compressed using a hash code or other technique, so as to minimize the organizational data that must be stored for each snapshot. In an alternative embodiment, which may be preferable in some circumstances, the original name space and other organizational data are maintained for each snapshot in a form that is relatively easy for a human user to read. This can aid human users with backup and restore operations based on such snapshots.
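One way to picture hash-based compression of names is sketched below in Python. The choice of BLAKE2b, the 8-byte key length, and the function names `compress_name`, `compress_root`, and `lookup` are assumptions made for illustration; the description above does not specify a particular hash code.

```python
import hashlib


def compress_name(name: str, digest_bytes: int = 8) -> bytes:
    """Map a file name of any length to a fixed-length key (assumed: BLAKE2b)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=digest_bytes).digest()


def compress_root(root: dict) -> dict:
    """Build snapshot organizational data keyed by hashed names."""
    return {compress_name(name): block for name, block in root.items()}


def lookup(compressed_root: dict, name: str):
    """Find a block by hashing the requested name; returns None if absent."""
    return compressed_root.get(compress_name(name))


active_root = {"reports/2002/quarterly-summary.txt": 41, "notes.txt": 42}
snapshot_root = compress_root(active_root)
found = lookup(snapshot_root, "notes.txt")
```

The trade-off matches the one noted above: fixed-length keys keep the stored organizational data small, but the original names can no longer be listed in readable form, and a real design would also need to handle hash collisions.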
Because file system 100 is active, a mechanism must be provided for changing data in the file system. However, in order to maintain the integrity of snapshot 130, data pointed to by snapshot root inode 140 must be preserved. Thus, for example, when data 120 is changed in file system 100, modified data 120′ is stored in the storage device or devices. Root inode 110 of file system 100 and any intervening inodes and organizational data are updated to point to modified data 120′. In addition, the unmodified data 120 is preserved on the storage device or devices. Snapshot root inode 140 continues to point to this unmodified data, thereby preserving the integrity of snapshot 130.
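The update path described above can be sketched in Python under the simplifying assumption that each inode is a dictionary mapping names to either further inodes or data. The functions `take_snapshot` and `write` and the sample paths are hypothetical; the sketch illustrates the copy-on-write idea rather than any actual on-disk procedure.

```python
def take_snapshot(root):
    """Copy only the root inode; everything it points to remains shared."""
    return dict(root)


def write(root, path, data):
    """Return a new root in which `path` holds `data`.

    Each inode along the path is copied rather than modified in place,
    so any snapshot that points at the old inodes is left untouched.
    """
    name, rest = path[0], path[1:]
    new_root = dict(root)
    if rest:
        new_root[name] = write(root.get(name, {}), rest, data)
    else:
        new_root[name] = data
    return new_root


active = {"etc": {"motd": b"data 120"}, "home": {"notes": b"unchanged"}}
snapshot = take_snapshot(active)
active = write(active, ["etc", "motd"], b"data 120 prime")
```

After the write, the snapshot still reads the unmodified data, the active file system reads the modified data, and the untouched `home` subtree is the same object in both, corresponding to data that remains shared on the storage device.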
Likewise, when data is deleted from active file system 100, pointers to that data are removed from the file system. However, the data itself is preserved if it is included in snapshot 130. (This data can actually be deleted when the snapshot itself is removed.)
In actual practice, the changes to root inode 110, other inodes, and data that result from many modifications to file system 100 are accumulated before being written to the storage device or devices. After such changes have been written, file system 100 is self-consistent (i.e., at a consistency point). Preferably, snapshots are made only at such consistency points.
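The accumulation of changes between consistency points can be sketched as follows. This Python model is an assumption-laden illustration: the class `BufferedFileSystem`, its attributes, and the decision to refuse a snapshot while changes are pending (rather than, say, flushing first) are all hypothetical.

```python
class BufferedFileSystem:
    """Accumulates changes in memory and writes them out at consistency points."""

    def __init__(self):
        self.on_disk = {}    # last self-consistent image
        self.pending = {}    # changes accumulated since the last consistency point
        self.snapshots = []

    def write(self, name, data):
        self.pending[name] = data

    def consistency_point(self):
        """Write all accumulated changes at once, producing a new consistent image."""
        # A new dictionary is built, so earlier snapshot images are never altered.
        self.on_disk = {**self.on_disk, **self.pending}
        self.pending = {}

    def snapshot(self):
        """Snapshots are taken only when no changes are pending."""
        if self.pending:
            raise RuntimeError("not at a consistency point")
        self.snapshots.append(self.on_disk)
        return self.on_disk


fs = BufferedFileSystem()
fs.write("a", b"1")
fs.write("b", b"2")
try:
    fs.snapshot()
    refused = False
except RuntimeError:
    refused = True
fs.consistency_point()
snap = fs.snapshot()
fs.write("a", b"3")
fs.consistency_point()
```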
According to the invention, snapshot 130 can be converted into a new active file system by making the snapshot writable. In order to modify data in a writable snapshot 130, modified data is written to the storage device or devices. Root inode 140 and any intervening inodes and organizational data pointing to the modified data are updated. Furthermore, an unmodified copy of the data is preserved if it is still included in file system 100. This process is substantially identical to the process that occurs when modifications are made to file system 100, except that the unmodified data that is preserved is data pointed to by root inode 110.
In other words, when changes are made to the first active file system (e.g., file system 100), modified data is recorded in the first active file system in a location that is not shared with the second active file system (e.g., writable snapshot 130). Likewise, when changes are made to the second active file system, modified data is recorded in the second active file system in a location that is not shared with the first active file system. As a result, changes made to the first active file system are not reflected in the second active file system, and changes made to the second active file system are not reflected in the first active file system.
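The independence of two active file systems that start from shared data can be sketched in Python. The model assumes a single shared block store in which every write allocates a new block; the classes `Store` and `FileSystem` and their methods are hypothetical names used only for this illustration.

```python
class Store:
    """Shared storage: blocks are appended and never overwritten in place."""

    def __init__(self):
        self.blocks = []

    def allocate(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1


class FileSystem:
    """An active file system: a private root mapping names to block numbers."""

    def __init__(self, store, root=None):
        self.store = store
        self.root = dict(root or {})

    def write(self, name, data):
        # Modified data goes to a newly allocated block, not a shared one.
        self.root[name] = self.store.allocate(data)

    def read(self, name):
        return self.store.blocks[self.root[name]]

    def clone(self):
        """Create a second active file system by copying organizational data only."""
        return FileSystem(self.store, self.root)

    def shared_blocks(self, other):
        return set(self.root.values()) & set(other.root.values())


store = Store()
first = FileSystem(store)
first.write("a", b"A0")
first.write("b", b"B0")
first.write("c", b"C0")
second = first.clone()
shared_at_creation = first.shared_blocks(second)
first.write("a", b"A1")    # visible only in the first file system
second.write("b", b"B1")   # visible only in the second file system
shared_after_writes = first.shared_blocks(second)
```

In this sketch the two file systems share all three blocks at creation and only the block for the unmodified file `c` after each has made one change.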
When created, snapshot 130 substantially overlaps file system 100. If the snapshot is made writable shortly after its creation, the new active file system formed by the writable snapshot will initially share almost all of its data with the existing active file system. As a result, the invention allows for creation of an entire new active file system with efficient utilization of resources such as processing time and storage space.
The process of storing modified data and preserving unmodified data causes file system 100 and snapshot 130 (whether read-only or writable) to diverge from one another. This divergence is representationally shown in
Either or both of snapshots 130 and 180 can be turned into active file systems by making those snapshots writable. As data is written to any of the active file systems (i.e., file system 100, writable snapshot 130, or writable snapshot 180), the file systems will diverge from one another.
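The conversion of a snapshot into an active file system, including the severing of snapshot pointers from active file systems mentioned in the summary, can be sketched as follows. The class `Image` and the functions `take_snapshot` and `make_writable` are hypothetical, and the in-memory lists stand in for the snapshot data and pointers described earlier.

```python
class Image:
    """A file system image: either an active file system or a snapshot."""

    def __init__(self, name, root, writable):
        self.name = name
        self.root = dict(root)          # copy of the organizational data
        self.writable = writable
        self.snapshot_pointers = []     # pointers to snapshots of this image


def take_snapshot(fs, name):
    """Create a read-only snapshot of `fs` and record a pointer to it in `fs`."""
    snap = Image(name, fs.root, writable=False)
    # The snapshot inherits pointers to earlier snapshots but not to itself.
    snap.snapshot_pointers = list(fs.snapshot_pointers)
    fs.snapshot_pointers.append(snap)
    return snap


def make_writable(snap, active_file_systems):
    """Convert `snap` into a new active file system."""
    # Sever pointers from every active file system to the former snapshot.
    for fs in active_file_systems:
        fs.snapshot_pointers = [s for s in fs.snapshot_pointers if s is not snap]
    snap.writable = True
    active_file_systems.append(snap)
    return snap


fs100 = Image("file system 100", {"data": 0}, writable=True)
active = [fs100]
snap130 = take_snapshot(fs100, "snapshot 130")
snap180 = take_snapshot(fs100, "snapshot 180")
make_writable(snap180, active)
```

After the conversion in this sketch, both active file systems retain a pointer to the older, still read-only snapshot 130, while file system 100 no longer treats the converted image as one of its snapshots.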
In
The top portion of
Both of active file systems 800 and 830′ can trace back to a common snapshot 820. However, when that snapshot is deleted, the active file systems will no longer share a common snapshot. This situation has occurred with respect to file system 1020 and snapshots 1030 to 1050. This arrangement illustrates that it is possible to have a “forest” (i.e., a collection of unconnected trees) formed by the links between active file systems and their associated snapshots, all on one storage device or set of storage devices. Despite the fact that the file systems and their snapshots no longer point to a common snapshot, these snapshots and even the active file systems could still share some data (i.e., overlap), thereby preserving the efficiency of the invention.
In the foregoing discussion, new active file systems are created from snapshots. However, the invention does not require the actual creation of a snapshot in order to create a new active file system. Rather, all that is required is creation of structures along the lines of those found in a snapshot, namely organizational data along the lines of that found in a snapshot's root inode, along with preservation of the data pointed to by that organizational data.
Furthermore, the invention is not limited to the particular arrangements discussed above. Rather, those arrangements illustrate some possible types of relationships between active file systems, snapshots, and new active file systems. Other arrangements are possible and are within the scope of the invention.
System Elements
A system 1100 includes at least one file system processor 1110 (i.e., controller) and at least one storage device 1120 such as a hard disk or CD-ROM drive. The system also preferably includes interface 1130 to at least one computing device or network for receiving and sending information. In an alternative embodiment, processor 1110 is the processor for a computing device connected to the storage system via interface 1130.
Processor 1110 performs the tasks associated with the file system, as described herein, under control of program and data memory, the program and data memory including appropriate software for controlling processor 1110 to perform operations on storage device 1120 (and possibly for controlling storage device 1120 to cooperate with processor 1110).
In a preferred embodiment, at least one such storage device 1120 includes one or more boot records 1140. Each boot record 1140 includes two or more (preferably two) entries designating a root data block (i.e., inode) in a file system hierarchy for an active file system. Where there is a single active file system, there preferably is a single such boot record; where there is more than one such active file system, there preferably is more than one such boot record.
As noted above, more than one active file system might be present in storage device 1120. In such cases, the file system maintainer (i.e., processor 1110 operating under program control) preferably will designate and orderly maintain more than one boot record 1140, one for each such active file system.
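The relationship between active file systems and boot records can be sketched as a simple data structure. In this Python sketch the `BootRecord` type, the `build_boot_records` function, and the block numbers are hypothetical, and treating the two or more entries of a record as redundant copies of the same root block number is an assumption, since the description above does not state how the entries differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BootRecord:
    """One boot record per active file system."""
    file_system: str
    root_entries: List[int]  # two or more entries designating the root block


def build_boot_records(root_blocks: Dict[str, int],
                       entries_per_record: int = 2) -> List[BootRecord]:
    """Create one boot record for each active file system."""
    if entries_per_record < 2:
        raise ValueError("each boot record holds two or more entries")
    # Assumption: every entry in a record designates the same root block.
    return [BootRecord(name, [block] * entries_per_record)
            for name, block in root_blocks.items()]


records = build_boot_records({"first active file system": 12,
                              "second active file system": 57})
```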
Read-only snapshots also can be present in storage device 1120. In this case, pointers from active file systems to snapshots and from snapshots to other snapshots are stored in the storage device, as discussed above.
High Availability
A file system cluster includes a plurality of file system processors 1200 and one or more file system disks 1210. In a preferred embodiment, each such processor 1200 is disposed for operating as a file server, capable of receiving file server requests and making file server responses, such as using a known file server protocol. In a preferred embodiment, the one or more file system disks 1210 include a plurality of such disks, so that no individual disk 1210 presents a single point of failure for the entire highly-available cluster. The Write Anywhere File system Layout (WAFL), which preferably is used with the invention, incorporates such an arrangement.
As discussed above, the plurality of processors 1200 can maintain multiple parallel writeable active file systems 1210, along with all associated snapshots for those parallel writeable active file systems. The active file systems and snapshots can be maintained on the same set of disks 1220. Thus, the set of processors 1200 and the set of disks 1220 can provide a highly available cluster without need for substantial duplication of resources.
The invention can be embodied in methods for creating and maintaining plural active file systems, as well as in software and/or hardware such as a storage device or devices that implement the methods, and in various other embodiments.
In the preceding description, a preferred embodiment of the invention is described with regard to preferred process steps and data structures. However, those skilled in the art would recognize, after perusal of this application, that embodiments of the invention may be implemented using one or more general purpose processors or special purpose processors adapted to particular process steps and data structures operating under program control, that such process steps and data structures can be embodied as information stored in or transmitted to and from memories (e.g., fixed memories such as DRAMs, SRAMs, hard disks, caches, etc., and removable memories such as floppy disks, CD-ROMs, data tapes, etc.) including instructions executable by such processors (e.g., object code that is directly executable, source code that is executable after compilation, code that is executable through interpretation, etc.), and that implementation of the preferred process steps and data structures described herein using such equipment would not require undue experimentation or further invention.
Furthermore, although preferred embodiments of the invention are disclosed herein, many variations are possible which remain within the content, scope and spirit of the invention, and these variations would become clear to those skilled in the art after perusal of this application.
This application is a continuation and claims the benefit of U.S. application Ser. No. 10/165,188, filed on Jun. 7, 2002, now U.S. Pat. No. 6,857,001, which is hereby incorporated by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
4742450 | Duvall et al. | May 1988 | A |
4814971 | Thatte | Mar 1989 | A |
4825354 | Agrawal et al. | Apr 1989 | A |
4875159 | Cary et al. | Oct 1989 | A |
5014192 | Mansfield et al. | May 1991 | A |
5043871 | Nishigaki | Aug 1991 | A |
5043876 | Terry | Aug 1991 | A |
5088026 | Bozman et al. | Feb 1992 | A |
5129085 | Yamasaki | Jul 1992 | A |
5144659 | Jones | Sep 1992 | A |
5163148 | Walls | Nov 1992 | A |
5182805 | Campbell | Jan 1993 | A |
5195100 | Katz et al. | Mar 1993 | A |
5201044 | Frey et al. | Apr 1993 | A |
5208813 | Stallmo | May 1993 | A |
5210866 | Milligan et al. | May 1993 | A |
5218695 | Noveck et al. | Jun 1993 | A |
5255270 | Yanai et al. | Oct 1993 | A |
5278838 | Ng et al. | Jan 1994 | A |
5287496 | Chen et al. | Feb 1994 | A |
5313646 | Hendricks et al. | May 1994 | A |
5315602 | Noya et al. | May 1994 | A |
5357509 | Ohizumi | Oct 1994 | A |
5367698 | Webber et al. | Nov 1994 | A |
5369757 | Spiro et al. | Nov 1994 | A |
5379391 | Belsan et al. | Jan 1995 | A |
5379417 | Lui et al. | Jan 1995 | A |
5390187 | Stallmo | Feb 1995 | A |
5392446 | Tower et al. | Feb 1995 | A |
5398253 | Gordon | Mar 1995 | A |
5403639 | Belsan et al. | Apr 1995 | A |
5410667 | Belsan et al. | Apr 1995 | A |
5452444 | Solomon | Sep 1995 | A |
5455946 | Mohan et al. | Oct 1995 | A |
5457796 | Thompson | Oct 1995 | A |
5481699 | Saether | Jan 1996 | A |
5490248 | Dan et al. | Feb 1996 | A |
5504857 | Baird et al. | Apr 1996 | A |
5566297 | Devarakonda et al. | Oct 1996 | A |
5604862 | Midgely et al. | Feb 1997 | A |
5633999 | Clowes et al. | May 1997 | A |
5649152 | Ohram et al. | Jul 1997 | A |
5675802 | Allen et al. | Oct 1997 | A |
5819292 | Hitz et al. | Oct 1998 | A |
5828876 | Fish et al. | Oct 1998 | A |
5835953 | Ohran | Nov 1998 | A |
5838964 | Gubser | Nov 1998 | A |
5870764 | Lo et al. | Feb 1999 | A |
5884328 | Mosher, Jr. | Mar 1999 | A |
5963962 | Hitz et al. | Oct 1999 | A |
6006227 | Freeman et al. | Dec 1999 | A |
6006232 | Lyons | Dec 1999 | A |
6067541 | Raju et al. | May 2000 | A |
6101507 | Cane et al. | Aug 2000 | A |
6101585 | Brown | Aug 2000 | A |
6205450 | Kanome | Mar 2001 | B1 |
6289356 | Hitz et al. | Sep 2001 | B1 |
6317844 | Kleiman | Nov 2001 | B1 |
6341341 | Grummon et al. | Jan 2002 | B1 |
6397307 | Ohran | May 2002 | B2 |
6516357 | Borr | Feb 2003 | B1 |
6604118 | Kleiman et al. | Aug 2003 | B2 |
6636878 | Rudoff | Oct 2003 | B1 |
6665689 | Muhlestein | Dec 2003 | B2 |
6721764 | Hitz et al. | Apr 2004 | B2 |
6795966 | Lim et al. | Sep 2004 | B1 |
6829617 | Sawdon et al. | Dec 2004 | B2 |
6857001 | Hitz et al. | Feb 2005 | B2 |
6915447 | Kleiman | Jul 2005 | B2 |
7174352 | Kleiman et al. | Feb 2007 | B2 |
7714352 | Klieman et al. | Feb 2007 | |
20010022792 | Maeno et al. | Sep 2001 | A1 |
20010044807 | Kleiman et al. | Nov 2001 | A1 |
20020007470 | Kleiman | Jan 2002 | A1 |
20020019874 | Borr | Feb 2002 | A1 |
20020019936 | Hitz et al. | Feb 2002 | A1 |
20020049718 | Kleiman et al. | Apr 2002 | A1 |
20020059172 | Muhlestein | May 2002 | A1 |
20020083037 | Lewis et al. | Jun 2002 | A1 |
20020091670 | Hitz et al. | Jul 2002 | A1 |
20030229656 | Hitz et al. | Dec 2003 | A1 |
Number | Date | Country |
---|---|---|
1316707 | Oct 2001 | CN |
0 359 384 | Mar 1990 | EP |
0 359 384 | Jul 1991 | EP |
0 453 193 | Oct 1991 | EP |
0 453 193 | Jul 1993 | EP |
0 359 384 | Jan 1998 | EP |
94 92 1242 | Jun 1998 | EP |
1 003 103 | May 2000 | EP |
1003103 | May 2000 | EP |
1197836 | Apr 2002 | EP |
WO 8903086 | Apr 1989 | WO |
WO 9113404 | Sep 1991 | WO |
WO 9429795 | Dec 1994 | WO |
WO 9429796 | Dec 1994 | WO |
WO 9429807 | Dec 1994 | WO |
WO 9945456 | Sep 1999 | WO |
WO 9946680 | Sep 1999 | WO |
WO 9966401 | Dec 1999 | WO |
WO9966401 | Dec 1999 | WO |
WO0007104 | Feb 2000 | WO |
WO0011553 | Mar 2000 | WO |
WO0114991 | Mar 2001 | WO |
WO 0131446 | May 2001 | WO |
WO0131446 | May 2001 | WO |
WO 0143368 | Jun 2001 | WO |
WO0143368 | Jun 2001 | WO |
WO 0217057 | Feb 2002 | WO |
WO0217057 | Feb 2002 | WO |
WO 0229572 | Apr 2002 | WO |
WO 0229573 | Apr 2002 | WO |
WO0229572 | Apr 2002 | WO |
WO0229573 | Apr 2002 | WO |
WO 03105026 | Dec 2003 | WO |
Number | Date | Country |
---|---|---|
20050182799 A1 | Aug 2005 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10165188 | Jun 2002 | US |
Child | 11057409 | | US |