Method and system for providing a file system overlay

Information

  • Patent Grant
  • Patent Number
    7,437,387
  • Date Filed
    Friday, August 30, 2002
  • Date Issued
    Tuesday, October 14, 2008
Abstract
A method and system wherein a plurality of different file system views may be provided for the same data. Data copied in a sequential format to a disk based repository using a data protection application is decoded so that the data may be presented and accessed using a standard file system view. The standard file system may be used to randomly access the data as desired.
Description
BACKGROUND

The present invention relates to the presentation of data and more particularly to providing a plurality of different file system views for the same data.


Data protection (which includes backing up computer data, restoring computer data, securing computer data, and managing computer data storage) and disaster recovery procedures are essential processes in organizations that use computers. In fact, data protection is the single most expensive storage administrative task. Most large organizations perform data backups to tape media and use a robotically controlled tape library (or tape jukebox) to assist with backup automation. Performing and managing backups and restores involves many functions including, for example, media management (including tape tracking, rotation and off-site storage), tape jukebox management, file tracking, backup scheduling, assisted or automated data restore, and data archival.


In order to effectively perform the above functions, a sophisticated data protection application (DPA) is required. Examples of such DPAs include Legato NetWorker, Veritas BackupExec and CA ArcServe. DPAs automate and assist with the essential functions of data protection. DPAs are designed specifically to work with physical tape media, tape drives and tape libraries. Most of the complexity in DPAs relates to their interaction with physical tape.


Most DPAs implement sophisticated tape packing when performing backup of data. The function of a DPA is to efficiently collect data from the system that is being backed up and then to effectively store this data on tape. DPAs, therefore, implement their own proprietary tape formats to best suit their functionality.


Restoring data backed-up on tape is an operation that is also performed via the DPA. The DPA typically presents an interface that allows a user to select the file(s) required to be retrieved and facilitates the process of restoration. Physical tapes can only be sequentially accessed and are relatively slow compared to magnetic disks. This means that there is usually a significant time penalty (several minutes) when a file is restored. The restore process is cumbersome and requires that a user learn the operation of the DPA. Restore operations can typically only be performed by a small number of system administrators at a site who have been trained on the DPA's operation.


Furthermore, the data that is stored on physical tapes is considered off-line storage. In order to access the data, it is necessary for the DPA to read the files from the tape and then create appropriate files in a disk-based file system and write the contents of the files to the disk. This indirect restore process is necessary since the seek times for tape are extremely slow compared to disk (minutes instead of milliseconds). Although it would be easier for a user to access data on tape in the same way as data on disk, this would require random access patterns to tape. Tapes, however, are sequential devices, which makes their performance extremely limited when they are randomly accessed.


It would therefore be desirable for data written in a sequential format by a particular DPA to be randomly accessible with standard file system semantics at disk-like speed.


SUMMARY

The current invention is a method and system wherein a plurality of different file system views may be provided for the same data regardless of the manner in which the data was stored. Data stored in sequential format may be accessed randomly at disk-like speed with standard file system semantics.





BRIEF DESCRIPTION OF THE DRAWING(S)


FIG. 1 is a method for providing a file system overlay for data stored in sequential format so that the data may be accessed randomly at disk-like speed with standard file system semantics in accordance with the preferred embodiment of the present invention.



FIG. 2 is a method for decoding data copied in sequential format to a disk based repository in accordance with the preferred embodiment of the present invention.



FIG. 3 is a system for providing a file system overlay for data stored in sequential format so that the data may be accessed randomly at disk-like speed with standard file system semantics in accordance with the preferred embodiment of the present invention.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

Referring initially to FIG. 1, there is shown a method for providing a file system overlay. The overlay enables data stored in sequential format to be randomly accessed at disk-like speed using a standard file system. For example, when data is copied using the interface of one file system, that same data may be accessed using the interface of a different file system. A preferred embodiment of the invention is to construct a standard file system view of data copied in a sequential format by a DPA. Typical standard file system views include Unix File System (UFS), Windows NT File System (NTFS), Veritas File System (VxFS) as well as the network versions thereof (CIFS, NFS, etc.). The data copied by the DPA may be copied to a physical tape, but is preferably copied to a disk based repository such as a virtual tape library (VTL). The DPA does not distinguish between copying to a physical tape and copying to a VTL, and data copied to the VTL by the DPA is copied in the same format as if the DPA were copying to a physical tape.


As mentioned, the VTL is a disk based repository for backup data. The VTL is a logical representation of a physical tape library (PTL). It exhibits all the behaviors of a PTL—it appears to have a robotic arm, one or more tape drives, tape cartridge slots, mailbox (entry/exit) slots and a bar code reader. It can respond on the bus (Small Computer System Interface or Fibre Channel, for example) in exactly the same way as a PTL. Furthermore, the characteristics of a VTL are defined by virtual library types. A VTL type defines how many tape drives and tape slots the library should have, as well as how the library should identify itself when probed on the bus.
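The notion of a virtual library type can be pictured as a small configuration record. The following is only a sketch: the class name `VirtualLibraryType`, its field names, and the example vendor and product strings are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualLibraryType:
    """Hypothetical record describing one virtual library type."""
    vendor_id: str          # assumed identification strings reported on the bus
    product_id: str
    tape_drives: int        # number of virtual tape drives
    tape_slots: int         # number of virtual cartridge slots
    mailbox_slots: int = 0  # entry/exit slots

    def inquiry_response(self) -> str:
        # How the library identifies itself when probed (illustrative only).
        return f"{self.vendor_id} {self.product_id}"


# Example values are invented for illustration.
EXAMPLE_TYPE = VirtualLibraryType(
    vendor_id="ACME", product_id="VLIB-100",
    tape_drives=4, tape_slots=60, mailbox_slots=2,
)
```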


Similar to the VTL, a virtual tape is a logical representation of a physical tape. An unlimited number of virtual tapes may be used inside a VTL and are written to by virtual tape drives in the same way as physical tapes. When a virtual tape is created, a tape label is associated with it. This label is used to identify a particular virtual tape which in turn comprises particular virtual tape files. Tape labels in a VTL are reported to the DPA in exactly the same way as tape barcode labels are reported by a PTL. Regardless of whether a VTL, PTL or both are used, it is still essential to have an offsite copy of the backup data. However, copying the contents of a VTL does not require a DPA and is much easier and more flexible. Furthermore, the disk based VTL is more reliable than tape media and a PTL.


When copying the data to the VTL, the DPA writes the data to the VTL in exactly the same format as if the DPA was writing the data to tape. While the DPA is writing data to the VTL, a log is kept of all the write operations made by the DPA to the VTL. Maintaining the log of write operations allows the particular sequence in which the data was copied to the onsite VTL to be played back in the same way it was received from the DPA. This allows additional physical tapes that are equivalent to physical tapes created using a DPA to be conveniently created, if necessary, without using the DPA. This also means, however, that the data copied to the VTL is still in a sequential format and cannot be randomly accessed using the DPA.
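The write log described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the DPA issues just two kinds of operations (data blocks and filemarks); the names `WriteRecord` and `LoggedVirtualTape` are hypothetical, and a real tape command set has more operations than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteRecord:
    kind: str             # "data" or "filemark" (assumed record kinds)
    payload: bytes = b""


class LoggedVirtualTape:
    """Hypothetical virtual tape that logs every write in arrival order."""

    def __init__(self) -> None:
        self.log: List[WriteRecord] = []

    def write_block(self, block: bytes) -> None:
        self.log.append(WriteRecord("data", bytes(block)))

    def write_filemark(self) -> None:
        self.log.append(WriteRecord("filemark"))

    def replay(self, target: "LoggedVirtualTape") -> None:
        # Play the operations back in exactly the order they were received,
        # so the target ends up equivalent to what the DPA originally wrote.
        for record in self.log:
            if record.kind == "data":
                target.write_block(record.payload)
            else:
                target.write_filemark()
```

In this sketch the replay target is another virtual tape; the same loop could drive any object exposing `write_block` and `write_filemark`, such as a wrapper around a physical drive.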


To randomly access the data which was copied to the VTL in sequential format, the data is decoded and presented in a standard file system. This allows users using a standard file system to directly access the individual files which were collectively copied to the VTL as part of the sequential copy written by the DPA.


As shown in FIG. 1, the method 10 begins with the step of copying data to a VTL (step 12). The data is copied to the VTL using a DPA as explained above. The data may be a backup data set which is typically copied to the VTL in the form of a small number of large files. The DPA copies the files to the VTL in sequential format. DPAs typically vary in the exact manner in which they copy sequential data, so the exact format in which the data is written will vary as a function of the DPA being used. In step 14, however, decoders are used to decode the sequential data so that the data may be randomly accessed using a standard file system. The sequential data is decoded while taking into account the particular manner in which the DPA copied the data to the VTL as well as the particular file system chosen for presenting that data.


Once the data contained in the VTL is decoded, that same data is presented in a standard file system view where the data may be accessed randomly at disk-like speed (step 16). As mentioned above, the file system may be any standard file system such as, for example, Windows NT. This eliminates the requirement that data copied by a DPA be copied as a large file comprising a plurality of smaller files wherein the entire large file must be probed by the DPA when attempting to restore even a single file contained within the large file. In contrast, the present invention allows each individual file to be individually presented and accessed using a standard file system and without using the DPA.


The preferred method for decoding data (step 14, FIG. 1) so that data copied in sequential format to a VTL may be randomly accessed is shown in FIG. 2. To begin, the copy of the data copied to the VTL is read (step 52) in order to identify the DPA that was used to copy the data to the VTL (step 54). Based on the type of DPA, meta-data is extracted from the copy of the data in the VTL (step 56). The meta-data, by way of example, may include file names, directory names, hierarchical structure, file permissions, file size, creation, modification and access dates, and locations of virtual tape library blocks containing the file data.
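Steps 52 through 56 can be sketched as follows. Real DPA tape formats are proprietary, so this example invents a toy format (a `TOYDPA1` signature followed by one JSON header line per file and then the file bytes) purely to show the flow: read the image, identify the format, select a decoder, and extract the meta-data. Every name here (`FileMeta`, `decode_toy`, `DECODERS`, `extract_metadata`, `build_toy_image`) is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileMeta:
    path: str
    size: int
    mode: int
    mtime: int
    offset: int  # byte offset of the file data inside the tape image


MAGIC = b"TOYDPA1\n"  # invented signature for the toy format


def decode_toy(image: bytes) -> List[FileMeta]:
    """Walk the toy image once and record where each file's data lives."""
    entries: List[FileMeta] = []
    pos = len(MAGIC)
    while pos < len(image):
        end = image.index(b"\n", pos)          # header is one JSON line
        header = json.loads(image[pos:end])
        data_start = end + 1
        entries.append(FileMeta(header["path"], header["size"],
                                header["mode"], header["mtime"], data_start))
        pos = data_start + header["size"]      # skip over the file data
    return entries


# One decoder per recognized format, keyed by its leading signature.
DECODERS: Dict[bytes, Callable[[bytes], List[FileMeta]]] = {MAGIC: decode_toy}


def extract_metadata(image: bytes) -> List[FileMeta]:
    for magic, decoder in DECODERS.items():
        if image.startswith(magic):            # identify the writing DPA
            return decoder(image)
    raise ValueError("unrecognized backup format")


def build_toy_image(files: Dict[str, bytes]) -> bytes:
    """Helper that writes the toy format, standing in for a DPA."""
    out = bytearray(MAGIC)
    for path, data in files.items():
        header = {"path": path, "size": len(data), "mode": 0o644, "mtime": 0}
        out += json.dumps(header).encode() + b"\n" + data
    return bytes(out)
```

Supporting another DPA in this sketch would mean adding one more signature and decoder function to `DECODERS`; the rest of the flow is unchanged.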


Two alternatives exist for the meta-data extracted in step 56: it may be provided in real time as needed, or it may be organized into a database. In step 58, the method determines whether to provide the meta-data in real time. If so, the decoders will work in real time to produce the meta-data information only when needed (step 62). If not, the meta-data is organized into a database that is stored in the VTL (step 60). This database can store the meta-data for an unlimited number of backup instances and is similar in structure to file catalogs used by DPAs.
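The choice made in step 58 can be sketched as two lookup strategies behind one interface. This is an illustration under stated assumptions: the function name `metadata_provider` and the `(path, size, offset)` entry shape are hypothetical, and an in-memory SQLite database stands in for the catalog that the text describes as being stored in the VTL.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple

Entry = Tuple[str, int, int]  # (path, size, offset) - assumed shape


def metadata_provider(decode: Callable[[], Iterable[Entry]],
                      real_time: bool) -> Callable[[str], Optional[Entry]]:
    """Return a lookup function backed by either strategy."""
    if real_time:
        # Step 62: run the decoder only when a lookup actually happens.
        def lookup_real_time(path: str) -> Optional[Entry]:
            for entry in decode():
                if entry[0] == path:
                    return entry
            return None
        return lookup_real_time

    # Step 60: decode once up front and organize the result into a catalog.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE catalog "
               "(path TEXT PRIMARY KEY, size INTEGER, offset INTEGER)")
    db.executemany("INSERT INTO catalog VALUES (?, ?, ?)", decode())
    db.commit()

    def lookup_catalog(path: str) -> Optional[Entry]:
        return db.execute(
            "SELECT path, size, offset FROM catalog WHERE path = ?", (path,)
        ).fetchone()
    return lookup_catalog
```

The trade-off in this sketch is the usual one: the real-time path pays the decoding cost on every lookup but stores nothing, while the catalog path pays once and answers later lookups from the index.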


Regardless of whether the meta-data is provided in real time or organized in a database, the meta-data is used in step 64 to present the data which was originally copied sequentially to the VTL in a standard file system view. Specifically, a file system layer is implemented and the data is presented as a standard read-only file system.
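A read-only file system layer of the kind described in step 64 might look like the sketch below. It assumes the meta-data has already been reduced to a mapping from path to `(offset, size)` within the disk-resident tape image; the class name `ReadOnlyOverlay` and its methods are hypothetical, and a real implementation would sit behind an operating system or network file system interface rather than plain Python calls.

```python
from typing import BinaryIO, Dict, List, Tuple


class ReadOnlyOverlay:
    """Hypothetical read-only view over a sequentially written image."""

    def __init__(self, image: BinaryIO, index: Dict[str, Tuple[int, int]]):
        self._image = image        # seekable, disk-backed tape image
        self._index = dict(index)  # path -> (offset, size)

    def listdir(self, directory: str) -> List[str]:
        # Derive directory contents from the stored full paths.
        prefix = directory.rstrip("/") + "/"
        names = set()
        for path in self._index:
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)

    def read(self, path: str, offset: int = 0, length: int = -1) -> bytes:
        start, size = self._index[path]
        offset = min(max(offset, 0), size)
        remaining = size - offset
        length = remaining if length < 0 else min(length, remaining)
        # Random access: seek straight to the file's data on disk.
        self._image.seek(start + offset)
        return self._image.read(length)

    def write(self, path: str, data: bytes) -> None:
        raise PermissionError("overlay is read-only")
```

Because the image lives on disk, `read` can seek directly to any file without scanning the preceding data, which is the property that a physical tape cannot offer.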


So as to provide an example of how the method may be implemented, assume a user performs a backup of the following files using a DPA:

/home/data/file1
/home/data/file2
/home/data/file3
/home/data/dir1/file1
/home/data/dir1/file2
/home/data/dir1/dir2/file1
/home/data/dir2/file1
/home/data/dir2/file2.

All of those files are copied as one big file, say file “home,” by a DPA to a VTL. By way of background, if a user wants to restore a file, without the file system overlay of the present invention, he needs to start the DPA and, using the DPA's graphical user interface (GUI), probe the entire “home” file until the file that is required to perform the restore is located. Once located, the user must perform the restore via the GUI's restore interface to a specified location.


In contrast, providing a file system overlay in accordance with the present invention allows a user to simply mount or view the entire “home” file as a file system wherein all of the individual files which make up the “home” file may be presented and accessed directly without using the DPA. As such, all of the individual files that make up the “home” file may be presented as though they are on disk, or possibly at a new location. Therefore, the copy may be represented as individual files such as:

/restore/home/data/file1
/restore/home/data/file2
/restore/home/data/file3
/restore/home/data/dir1/file1
/restore/home/data/dir1/file2
/restore/home/data/dir1/dir2/file1
/restore/home/data/dir2/file1
/restore/home/data/dir2/file2.

This allows a user to directly choose and open a particular file that previously would have had to be located within a larger file. The particular file or files that are required for a restore may simply be copied to the required destination location.
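The mapping from the original paths to the paths shown under `/restore` amounts to prefixing each backed-up path with the mount point. A minimal sketch, with the hypothetical helper name `overlay_path`:

```python
import posixpath


def overlay_path(mount_point: str, original_path: str) -> str:
    """Map a backed-up path to its location under the overlay mount point."""
    # Strip the leading slash so join() appends instead of replacing.
    return posixpath.join(mount_point, original_path.lstrip("/"))
```

For example, `overlay_path("/restore", "/home/data/dir1/dir2/file1")` yields `/restore/home/data/dir1/dir2/file1`, matching the listing above. A restore then reduces to an ordinary file copy from that path to the destination.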


Additional embodiments of the invention may be used to simplify the process for searching for a particular instance of a file. For example, a user may specify a data range to the VTL and the VTL in turn may present a file system with all versions of the files required for a restore that cover the designated data range. The files may also be differentiated by means of a unique version extension. Any means of conveniently designating particular files within the VTL so that particular files may be searched more efficiently is well within the scope of the present invention.
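One way to picture the version-extension idea is sketched below. It rests on assumptions that the text does not state: that the range is expressed as backup dates, and that the unique extension takes the form `.vYYYYMMDD`. The function name `versions_in_range` is likewise hypothetical.

```python
from datetime import date
from typing import List, Tuple


def versions_in_range(backups: List[Tuple[date, str]],
                      start: date, end: date) -> List[str]:
    """List one entry per backed-up version that falls inside the range.

    Each entry is the original path plus an assumed ".vYYYYMMDD" extension,
    so that several versions of the same file can sit side by side.
    """
    names = []
    for taken, path in sorted(backups):
        if start <= taken <= end:
            names.append(f"{path}.v{taken:%Y%m%d}")
    return names
```

With backups of `/home/data/file1` taken on 1 August, 15 August and 1 September 2002, a range of 10 August to 1 September would present `file1.v20020815` and `file1.v20020901` in this sketch.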


Referring now to FIG. 3, there is shown a system 100 for providing a file system overlay so that data may be presented and accessed in more than one file system. In the system 100 shown, data generated by a computer network 102 is copied to a disk based repository which preferably is, as shown, a VTL 106 so as to provide a backup copy of the data. A DPA 104 is used to copy the data to the VTL 106 in the same format as if the DPA 104 were copying the data to a traditional PTL. A tape emulator (not shown) may be used to trick the DPA into thinking it is writing to a PTL.


Decoders 108 are also included to decode the data that was copied in sequential format by the DPA to the VTL, so that the data may be presented in a standard file system view. As noted above, the decoders may take into account both the particular format used by the DPA when copying data to a VTL and the file system in which the data will be presented for random access. Once the data has been decoded, it may be presented on the computer network 102 as a plurality of standard read-only files which are individually viewable and accessible, to those with permission, using whatever standard file system is used by the computer network 102.


Although the present invention has been described in detail, it is to be understood that the invention is not limited thereto, and that various changes can be made therein without departing from the spirit and scope of the invention, which is defined by the attached claims.

Claims
  • 1. A computer implemented method for providing a plurality of different file system views for same data, the method comprising the steps of: identifying a format used by a data protection application to copy data to a virtual tape library; selecting a decoder based on the identified format; processing the data using the selected decoder to extract meta-data from the virtual tape library; and implementing a file system overlay for the data using the extracted meta-data; presenting and randomly accessing the data stored in a sequential format using the file system overlay with a standard file system.
  • 2. The method of claim 1 wherein the virtual tape library is a disk based repository for storing backup data in the same manner as a physical tape library.
  • 3. The method of claim 1 wherein the step of processing the data further comprises: reading the data copied to the virtual tape library; identifying which type of data protection application was used to write the data; extracting meta-data based on the type of data protection application used to write the data; and reading the data from the virtual tape library in random access format.
  • 4. A computer implemented method for providing a plurality of different file system views for same data, the method comprising the steps of: copying data in sequential format to a disk based repository using a data protection application; identifying which type of data protection application was used to copy the data; extracting meta-data based on the type of data protection application used to copy the data; decoding the data copied in sequential format to the disk based repository; presenting and accessing the data using a standard file system view, wherein the decoding is based on the type of data protection application used to copy the data to the disk based repository; reading the data copied to the disk based repository in random access format; and presenting the data in the standard file system view using a file system overlay wherein the data copied in a sequential format to the disk based repository is randomly accessed using a standard file system.
  • 5. The method of claim 4 wherein the data copied to the disk based repository using the data protection application is copied as if the data protection application was copying the data to a physical tape library.
  • 6. The method of claim 4 wherein the disk based repository is a virtual tape library for storing backup data in the same manner as a physical tape library.
  • 7. A computer system for providing a plurality of different file system views for same data, the system comprising: a disk based repository; a data protection application configured to copy data in sequential format to the disk based repository; a decoder configured to decode the data copied in sequential format to the disk based repository, the decoder is selected from a plurality of decoders based on the type of data protection application used to copy the data to the disk based repository; and wherein data is presented using a file system overlay with a standard file system view to randomly access data copied in a sequential format to the disk based repository with a standard file system.
  • 8. The system of claim 7 wherein the disk based repository is a virtual tape library configured to store backup data in the same manner as a physical tape library.
Related Publications (1)
Number Date Country
20040044706 A1 Mar 2004 US