Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
1. Overview
As previously mentioned, embodiments of the invention can operate as part of a distributed data backup system for a networked computer system. In such a distributed data backup system, a backup server may be accessed by one or more client backup applications, each operating on a local computer system, to create data backups on the distributed backup system. When a data backup is created, a client backup application stores backup restore information as part of the backup data which can be interpreted by the backup application and/or backup server to direct how the remainder of the backup data needs to be restored.
Importantly, embodiments of the present invention allow the backup restore information to be used directly from a staging directory, which may exist on the local computer system, where it is cached. During a backup restore process, the backup application first determines whether the backup restore information exists in the staging directory before requesting it from the backup server. The backup restore information may be stored in a unique location within the staging directory, e.g. a timestamp-labeled subdirectory. The backup application can reconcile the staging directory to eliminate backup restore information for backup data which no longer exists on the backup server.
2. Caching Backup Restore Information on a Local System
In addition, as part of the ordinary backup and restore processes, some or all of the data objects 106A-106C to be backed up may be stored in a local backup 108 as coordinated between the backup client application 102 and the backup server 106. Embodiments of the present invention recognize that to properly restore some data objects 106A-106C, particular backup restore information 112A-112F (e.g. backup metadata such as in XML), which is added to the backup data when the backup is made, may be required first to provide instructions for the proper structure of the restored data objects 106A-106C. Accordingly, embodiments of the invention first inspect the local backup 110 to determine whether the backup restore information 112A-112F for the particular backup object 106A-106C exists there. If the required backup restore information 112A-112F is found in the local backup 108, there is no need to retrieve the same information from the remote storage resources 110A-110E through the backup server 108, which would tax the system and delay the overall restore operation.
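The local-first lookup described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular backup client: the function name load_restore_info, the fetch_from_server callback, and the flat file layout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Tuple


def load_restore_info(
    local_dir: Path,
    name: str,
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (content, source), where source is "local" or "server".

    local_dir stands in for the local backup; fetch_from_server is a
    hypothetical callback that retrieves the same item through the
    backup server.
    """
    candidate = local_dir / name
    if candidate.is_file():
        # Found locally: no request is made to the backup server.
        return candidate.read_bytes(), "local"
    # Not cached locally: fall back to the (slower) server retrieval.
    return fetch_from_server(name), "server"
```

The server callback is invoked only when the local lookup fails, which is the source of the saving described in the paragraph above.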
Operation of an embodiment of the invention may be enhanced by identifying and organizing the backup restore information 112A-112F when a backup is made. First, for each backup of a particular data object 106A-106C, the corresponding backup restore information 112A-112F is uniquely identified within the local backup 108. For example, a timestamp-labeled subdirectory may be created to store the particular backup restore information 112A-112F on the local storage, although any other known technique for generating a unique identifier may also be employed. It is important to note that the unique identifier can distinguish between separate backup objects 106A-106C, which may each have different restore requirements although together they are part of a single backup. For example, backup restore information 112A, 112D, 112E stored in the local backup 110 is used to restore data objects 106A, 106B, 106C, respectively. In addition, the unique identifiers can also serve to distinguish between different backup versions of the same backup object. For example, three different backup versions of data object 106A correspond to backup restore information 112A-112C stored in the local backup 110. Similarly, two backup versions of data object 106C correspond to backup restore information 112E, 112F, although only one backup version of data object 106B is described in backup restore information 112D. The unique identifier may be stored in a database of the backup server 108 so that it is quickly available when a backup restore is requested.
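One way to picture the unique identification described above is a directory layout keyed by backup object and backup timestamp. The sketch below is illustrative only: the per-object directory level, the function names, and the file names are assumptions, and any other unique identifier could replace the timestamp.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def backup_timestamp(moment: datetime) -> str:
    """Format a backup time as a 14-digit label, e.g. 20050825153030."""
    return moment.strftime("%Y%m%d%H%M%S")


def stage_restore_info(
    staging_root: Path,
    object_name: str,
    moment: datetime,
    documents: Dict[str, str],
) -> Path:
    """Write XML documents into a subdirectory unique to this object and version.

    The object_name level is an assumption used here to keep separate
    backup objects apart; the timestamp level separates versions.
    """
    target = staging_root / object_name / backup_timestamp(moment)
    # exist_ok=False: a second backup with the same label is treated as an error.
    target.mkdir(parents=True, exist_ok=False)
    for filename, xml_text in documents.items():
        (target / filename).write_text(xml_text, encoding="utf-8")
    return target
```

Under this layout, three backup versions of one data object would occupy three sibling timestamp subdirectories, so a restore of any one version can locate its own metadata without ambiguity.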
In addition to providing a unique identifier for the backup restore information 112A-112F stored locally, a digital signature may also be applied to (or determined from) each piece of backup restore information 112A-112F when a backup of a data object 106A-106C is made. The digital signature may then be stored in a database on the backup server 108 (or locally) and used to check the backup restore information 112A-112F during a restore. This allows corruption of the backup restore information 112A-112F to be detected before it is used.
During the usual process of performing data backups and restoring, it is important to delete any backup restore information 112A-112F which may still exist in the local backup when a corresponding backup no longer exists on the backup server. Accordingly, backup restore information 112A-112F in the local backup 110 may be periodically reconciled with the existing backups for the local system shown on the backup server 108. For example, this reconciliation may occur at each subsequent backup request and any extraneous backup restore information 112A-112F deleted. Without this process, the contents of the local backup 110 would continue to increase indefinitely over time.
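The reconciliation step can be sketched as a comparison between the subdirectories present locally and the backups the server still reports. This is a hedged example under stated assumptions: the function name reconcile_staging is hypothetical, the list of server timestamps is assumed to have been obtained already from the backup server, and the staging directory is assumed to hold one 14-digit timestamp-named subdirectory per backup.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def reconcile_staging(staging_root: Path, server_timestamps: Iterable[str]) -> List[str]:
    """Delete cached restore information whose backup is gone from the server.

    Returns the names of the subdirectories that were removed.
    """
    keep = set(server_timestamps)
    removed: List[str] = []
    for entry in sorted(staging_root.iterdir()):
        # Only touch directories that look like 14-digit timestamp labels,
        # so unrelated content in the staging area is left alone.
        looks_like_label = entry.name.isdigit() and len(entry.name) == 14
        if entry.is_dir() and looks_like_label and entry.name not in keep:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Running this at each subsequent backup request bounds the size of the local cache by the number of backups still held on the server.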
In one specific example, an embodiment of the invention may be applied to the Microsoft Volume Shadow Copy Service (VSS) backup method, previously mentioned. In this case, the local backup client application for a Tivoli Storage Manager (TSM) backup server stores the XML files (the backup restore information such as backup metadata information) in a known location, a staging directory, and backs up these files along with the remainder of the backup data. Upon a backup restore process, these XML files are restored first and then a second pass is made to restore the rest of the data. It is often the case that at restore time the XML information is still in the local staging directory and could be used directly instead of being restored from the backup server. Embodiments of the invention allow the backup application to determine whether the files exist on the local system before requesting them from the backup server. If the XML information does exist in the local staging directory and can be retrieved from there, the overall speed of the restore process is improved. Embodiments of the invention may incorporate one or more of a variety of techniques to operate effectively.
For example, several backup versions can be made for the same file or application data. Instead of writing the files to a common location, each backup version stores its XML files in a unique section of a known location, e.g., a subdirectory within the staging directory which is named with a backup timestamp. The backup timestamp may be recorded as part of the backup operation. The backup application should, however, remove these XML files (e.g. through a regularly performed reconciliation) if there is no longer a corresponding entry on the backup server. If this operation is not performed, the local cache of XML files in the staging directory will grow indefinitely. In addition, the local cache of XML files should be protected by a digital signature so that any change to, or deletion of, the contents can be detected.
For example, a digital signature (e.g. a checksum) can be derived from one or more metadata information files (e.g. one or more XML files) and then stored in the backup server database. In order to verify viability of the metadata information in the local cache at any later time (e.g. during a regular reconciliation with the backup server), the current checksum value of the applicable metadata information file(s) can be compared to the corresponding digital signature from the backup server database.
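A checksum of this kind might be computed as in the sketch below. The choice of SHA-256, the function names, and the decision to fold file names and lengths into the digest are assumptions for illustration; the description above does not prescribe a particular algorithm.

```python
import hashlib
from pathlib import Path


def metadata_checksum(directory: Path) -> str:
    """Derive one checksum from all XML files in a cache subdirectory.

    File names and lengths are mixed in so that renaming, adding, or
    deleting a file changes the result, not only editing its content.
    """
    digest = hashlib.sha256()
    for path in sorted(directory.glob("*.xml")):
        data = path.read_bytes()
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def cache_is_valid(directory: Path, stored_checksum: str) -> bool:
    """Compare the current checksum with the value stored at backup time."""
    return metadata_checksum(directory) == stored_checksum
```

The value returned by metadata_checksum at backup time plays the role of the signature stored in the backup server database; cache_is_valid is the later comparison.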
Embodiments of the invention provide several advantages over applicable prior art distributed backup systems. Ordinarily, if the backup server is to be used to restore data stored on local media such as a local FlashCopy, some metadata information (e.g. XML files) needs to be stored on the TSM backup server. However, with embodiments of the invention, all of the backup metadata information can be restored locally from the cache, including the metadata that is also stored on the TSM server.
In general, with a FlashCopy, just a copy of physical media is taken, i.e., only the data bits without any context. If the FlashCopy is only a local copy, the local physical copy may be all that is required. However, in a backup to a TSM server, the backup occurs at the file system level (i.e. images of file systems). Thus, there are two typical types of restores: a local FlashCopy restore, where additional file information metadata defines the logical volumes and file systems, or a typical restore from a TSM server, where the metadata information is read and defines the logical volumes and file systems and the file system data is then restored. This describes a distinction between conventional TSM server backups and conventional hardware backups (like a FlashCopy). However, more recently hardware backups (like FlashCopy) may also be managed by the TSM server. Embodiments of the invention are applicable to backup servers managing all types of backup processes, e.g. hardware and file system level.
A backup request may store several disparate pieces of metadata, which can complicate the restore request. For example, the tape layout of the data on the backup server (e.g. a TSM server) could comprise metadata for a first backup object, the real data for the first backup object, metadata for a second backup object, the real data for the second backup object, and so on. Embodiments of the invention can greatly reduce the need to position the tape several times in systems where all the backup information metadata must be restored before the actual backup data is restored. In addition, operation of the invention can be independent of where files are ultimately stored on the backup server (TSM server); operation is independent of media type, tape placement, and other similar factors.
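The effect on tape positioning can be illustrated with a toy model. Everything here is an assumption made for the sake of arithmetic: segments are numbered in tape order, the head starts at segment 0, and any read that does not begin where the previous read ended counts as one repositioning. Real tape behavior is more complicated.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, str]  # (kind, object_name); kind is "meta" or "data"


def two_pass_order(layout: Sequence[Segment], metadata_cached: bool) -> List[int]:
    """Segment indices in the order a metadata-first restore reads them."""
    meta = [i for i, (kind, _) in enumerate(layout) if kind == "meta"]
    data = [i for i, (kind, _) in enumerate(layout) if kind == "data"]
    # With a valid local cache, the metadata pass over the tape is skipped.
    return data if metadata_cached else meta + data


def count_repositions(read_order: Sequence[int]) -> int:
    """Count reads that do not start at the current head position."""
    head = 0
    repositions = 0
    for index in read_order:
        if index != head:
            repositions += 1
        head = index + 1  # the head rests just past the segment it read
    return repositions


# Interleaved layout from the text: meta A, data A, meta B, data B, meta C, data C.
LAYOUT: List[Segment] = [
    ("meta", "A"), ("data", "A"),
    ("meta", "B"), ("data", "B"),
    ("meta", "C"), ("data", "C"),
]
```

In this model, count_repositions(two_pass_order(LAYOUT, False)) gives 5 for the interleaved three-object layout, while the cached case count_repositions(two_pass_order(LAYOUT, True)) gives 3, since the whole metadata pass and the rewind that follows it are avoided.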
3. Hardware Environment
Generally, the computer 202 operates under control of an operating system 208 (e.g. z/OS, OS/2, LINUX, UNIX, WINDOWS, MAC OS) stored in the memory 206, and interfaces with the user to accept inputs and commands and to present results, for example through a graphical user interface (GUI) module 232. Although the GUI module 232 is depicted as a separate module, the instructions performing the GUI functions can be resident or distributed in the operating system 208, a computer program 210, or implemented with special purpose memory and processors.
The computer 202 also implements a compiler 212 which allows one or more application programs 210 written in a programming language such as COBOL, PL/1, C, C++, JAVA, ADA, BASIC, VISUAL BASIC or any other programming language to be translated into code that is readable by the processor 204. After completion, the computer program 210 accesses and manipulates data stored in the memory 206 of the computer 202 using the relationships and logic that were generated using the compiler 212. The computer 202 also optionally comprises an external data communication device 230 such as a modem, satellite link, Ethernet card, wireless link or other device for communicating with other computers, e.g. via the Internet or other network.
Instructions implementing the operating system 208, the computer program 210, and the compiler 212 may be tangibly embodied in a computer-readable medium, e.g., data storage device 220, which may include one or more fixed or removable data storage devices, such as a zip drive, floppy disc 224, hard drive, DVD/CD-rom, digital tape, etc., which are generically represented as the floppy disc 224. Further, the operating system 208 and the computer program 210 comprise instructions which, when read and executed by the computer 202, cause the computer 202 to perform the steps necessary to implement and/or use the present invention. Computer program 210 and/or operating system 208 instructions may also be tangibly embodied in the memory 206 and/or transmitted through or accessed by the data communication device 230. As such, the terms “article of manufacture,” “program storage device” and “computer program product” as may be used herein are intended to encompass a computer program accessible and/or operable from any computer readable device or media.
Embodiments of the present invention are generally directed to any software application program 210 that manages backup storage and restore processes over a network. The program 210 may operate within a single computer 202 or as part of a distributed computer system comprising a network of computing devices. The network may encompass one or more computers connected via a local area network and/or Internet connection (which may be public or secure, e.g. through a VPN connection).
Those skilled in the art will recognize many modifications may be made to this hardware environment without departing from the scope of the present invention. For example, those skilled in the art will recognize that any combination of the above components, or any number of different components, peripherals, and other devices, may be used with the present invention meeting the functional requirements to support and implement various embodiments of the invention described herein.
4. Example Process of Caching Backup Restore Information
In an example process illustrating an embodiment of the invention, a user first requests a backup of an application, e.g., a backup of a Microsoft Exchange storage group or groups, with a backup client application running on a local system. The backup client application determines that the backup can be accomplished with a system such as VSS, which requires the backup of metadata information in XML format (the backup restore information). The backup client application then creates a timestamp, e.g., 20050825153030, for the backup and stores it on the backup server. This information is also stored in the backup server database for fast retrieval.
The backup client application generates the XML documents; instead of writing them to a common file or subdirectory, e.g., c:\adsm.sys, it writes them to a unique staging subdirectory on the local system identified by the timestamp, e.g., c:\adsm.sys\20050825153030. A digital signature may also be created by taking information such as file size and number of files into account, or by some other mechanism which guards against files being deleted or changed. During a subsequent backup operation, a reconciliation process with the backup server can determine (from the backup server database) whether the backup server still has a backup with a timestamp of 20050825153030. If so, the backup client application leaves the unique staging subdirectory (c:\adsm.sys\20050825153030) in place. If the backup server no longer includes a backup with the timestamp, the backup client application deletes the unique staging subdirectory within the staging area on the local system.
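A signature that takes file size and number of files into account could look like the following sketch. The function name and the "count:total_bytes" string format are assumptions for illustration. Note that such a signature detects added, deleted, or resized files but not an edit that leaves every size unchanged, which is why a content checksum is a stronger alternative.

```python
from pathlib import Path


def size_count_signature(directory: Path) -> str:
    """Summarize a staging subdirectory as "<file count>:<total bytes>"."""
    files = [p for p in directory.iterdir() if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{len(files)}:{total_bytes}"
```

For a subdirectory holding two files of 5 and 3 bytes, the function returns "2:8"; deleting either file or changing its length alters the result.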
When a backup restore is requested, the backup client application retrieves the backup timestamp from the backup server; if the staging subdirectory (c:\adsm.sys\20050825153030) is in place and the digital signature is correct, the backup application skips restoring these files from the backup server as they are readily available from the unique staging subdirectory on the local system.
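The restore-time decision can be summarized in a few lines. This is a sketch under assumptions: the timestamp and expected signature are taken to have been retrieved from the backup server already, and compute_signature is a hypothetical callable standing in for whichever signature mechanism was used at backup time.

```python
from pathlib import Path
from typing import Callable


def can_skip_metadata_restore(
    staging_root: Path,
    server_timestamp: str,
    expected_signature: str,
    compute_signature: Callable[[Path], str],
) -> bool:
    """Decide whether the XML files may be used from the local staging area.

    Returns True only if the timestamp-named subdirectory exists and its
    current signature matches the one recorded when the backup was made.
    """
    subdirectory = staging_root / server_timestamp
    if not subdirectory.is_dir():
        return False  # nothing cached for this backup; restore from the server
    return compute_signature(subdirectory) == expected_signature
```

When the function returns False for either reason, the client falls back to the ordinary first pass that restores the XML files from the backup server.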
In addition, the method 300 may further include optional operations for reconciling the local backup staging directory with the backup server by determining whether the data backup no longer exists on the backup server in operation 308 and deleting the backup recovery information in the local backup staging directory in response to determining that the data backup no longer exists on the backup server in operation 310. Reconciling of the local backup staging directory with the backup server is typically performed upon a subsequent data backup. As previously mentioned, method embodiments of the invention can be further modified consistent with the computer program and/or system embodiments of the invention described herein.
This concludes the description including the preferred embodiments of the present invention. The foregoing description including the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible within the scope of the foregoing teachings. Additional variations of the present invention may be devised without departing from the inventive concept as set forth in the following claims.