1. Field of the Invention
This invention relates to computer systems and, more particularly, to restoring selected portions of a datastore from a backup datastore.
2. Description of the Related Art
It has become common for organizations to employ large-scale, complex applications to manage a wide variety of information that may be critical to their operations. For example, Microsoft® Exchange Server provides an organization with a messaging (e-mail) and collaboration environment. Another example, Microsoft® SharePoint Portal Server, provides a unified view (on a website) of information from various applications. Additional examples of large-scale applications, available from various vendors, are well known to those having ordinary skill in the art.
In order for an application, such as those noted above, to provide its desired functionality, it will generally be configured to store a large quantity of data in a datastore. For example, a large-scale application may store data in one or more database repositories, file systems, or other storage media, either local or remote. In order to avoid the loss of data associated with an application, a data protection application is commonly employed to manage backup and restore operations on the datastore.
To back up the data from a datastore, a data protection application may copy the entire contents of the datastore and store the data on backup media. To restore the data to a datastore, a data protection application may execute an operation to restore all of the data to the datastore. However, in some cases it may be desirable to restore only a portion of the data associated with an application to the datastore. For example, in the event that a single mailbox, message, calendar event, or other data item is inadvertently deleted, restoration of only the deleted item may be desired. Unfortunately, given the nature of backed-up datasets, selectively restoring particular items may be relatively difficult and inefficient.
Generally speaking, data is backed up to a medium that does not support random access. For example, tape is often used as a backup medium. Consequently, in order to gain access to a particular item within the backup dataset, it is generally necessary to first restore the entire backup dataset to a temporary location and then search the restored data for the particular items of interest. However, when selecting a particular backup dataset for restoration, a user generally is not provided with any detailed information concerning the contents of that backup dataset. Rather, a catalog of backups is generally maintained which provides only basic information concerning the backup dataset (e.g., date, size, name, etc.). Therefore, a user must generally make a “best guess” as to which backup dataset contains the item of interest and restore that backup dataset. The user may then search the restored dataset for the item(s) of interest. However, as may be appreciated, the user may not always be correct in guessing which backup dataset includes the item of interest. Consequently, the user may restore and search an entire backup dataset only to find it does not contain the item(s) of interest. Therefore, time and effort may be wasted.
Accordingly, an efficient method and mechanism for restoring data is desired.
Various embodiments of a computer system and method are disclosed. In one embodiment, a computer system comprises a plurality of data entries and a backup server coupled to a data storage device. The backup server may be configured to harvest a data set indicative of a logical relationship among the plurality of data entries, associate the harvested data set with the corresponding backup data set, and store the harvested data set and the backup data set on the data storage device. Entries are created in a backup catalog that correspond to both the backup data set and the harvested data set. During a restoration procedure, a user may browse the catalog in order to identify a backup data set, or portion of a backup data set, for restoration. Utilizing the catalog entries corresponding to the harvested data set, the user may view the structure and relationships between data items in the corresponding backup data set. The user may select for restoration particular items of data from the backup data set. In response, the backup data set(s) and harvested data set(s) corresponding to the selected items are restored to a temporary location. Data within the restored harvested data which corresponds to the user's selections is then identified. The identified data is then used to locate in the backup data set the selected items for restoration.
In a further embodiment, the logical relationship comprises a hierarchy of groups and each data entry is a member of one or more groups. Each data entry comprises one or more values, each value corresponding to a selection parameter of one or more pre-determined selection parameters. The harvested data set includes a plurality of references, each reference corresponding to one of the plurality of data entries, and each reference comprising data corresponding to the one or more values of a corresponding data entry.
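By way of illustration only, the harvested data set described above might be modeled as a hierarchy of groups holding references, each reference carrying the selection-parameter values of one data entry. The following is a minimal sketch under stated assumptions: the class names `Reference` and `Group`, the field names, and the helper `count_references` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    """Refers to one data entry and carries its selection-parameter values."""
    entry_id: str            # hypothetical locator of the entry in the backup data set
    values: Dict[str, str]   # selection parameter name -> value for this entry


@dataclass
class Group:
    """A node in the hierarchy of groups; may hold subgroups and references."""
    name: str
    subgroups: List["Group"] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)


def count_references(group: Group) -> int:
    """Count the references in a group and in all of its descendant groups."""
    return len(group.references) + sum(count_references(g) for g in group.subgroups)


# Example: a mailbox group containing one folder group with two references.
inbox = Group("Inbox", references=[
    Reference("msg-1", {"From": "Sender 1", "Subject": "Status"}),
    Reference("msg-2", {"From": "Sender 2", "Subject": "Agenda"}),
])
mailbox = Group("Mailbox A", subgroups=[inbox])
total = count_references(mailbox)
```

Because each reference holds only the selected parameter values and a locator, such a structure may remain small relative to the backup data set it describes.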
These and other embodiments will become apparent upon reference to the following description and accompanying figures.
a illustrates one embodiment of data flow during the creation of a backup data set.
b illustrates one embodiment of data flow during the harvesting of a data set.
c illustrates one embodiment of data flow during the storage of a data set and a backup data set.
a illustrates one embodiment of data flow during a full restoration operation.
b illustrates one embodiment of data flow during a selective restoration operation.
While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that drawings and detailed descriptions thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
In one embodiment, a user may request either to back up data from a live datastore associated with an application whose data is being protected or to restore previously backed-up data to the live datastore (decision block 110). As used herein, a datastore refers to any combination of suitable data storage media such as RAM, disk storage, flash memory or other computer-readable storage media that an application may use to store its data. If a backup operation is selected, a data set may be harvested from a backup data set that corresponds to contents of the live datastore (block 120). Generally, the process of harvesting a data set comprises determining the logical relationships that exist between data entries in the backup data set. Once the backup server has harvested a data set from the backup data set, the backup server may store both the harvested and backup data sets on a backup medium (block 130) and a catalog corresponding to the backup data set may be created or updated (block 132) to reflect the new backup. In addition, data corresponding to the harvested data set which provides a logical view, or representation, of the backup data set is also cataloged (block 132). Generally speaking, the harvested data is relatively small compared to the backup data set itself.
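The backup path described above (blocks 120, 130 and 132) may be sketched as follows. This is an illustrative sketch only: the function names `harvest` and `run_backup`, the use of in-memory dictionaries to stand in for the backup medium and the catalog, and the `group` field on each entry are all assumptions made for the example.

```python
from typing import Dict, List


def harvest(backup_data_set: List[dict]) -> Dict[str, List[str]]:
    """Block 120: derive a logical view (group -> entry ids) of the backup data set."""
    harvested: Dict[str, List[str]] = {}
    for entry in backup_data_set:
        harvested.setdefault(entry["group"], []).append(entry["id"])
    return harvested


def run_backup(backup_id: str, backup_data_set: List[dict],
               medium: dict, catalog: dict) -> None:
    harvested = harvest(backup_data_set)
    # Block 130: store both the backup data set and the harvested data set.
    medium[backup_id] = {"backup": list(backup_data_set), "harvested": harvested}
    # Block 132: catalog the backup together with its logical view.
    catalog[backup_id] = {"entry_count": len(backup_data_set),
                          "logical_view": harvested}


# Hypothetical stand-ins for the backup medium and the backup catalog.
medium, catalog = {}, {}
run_backup("backup-001", [
    {"id": "m1", "group": "Inbox", "body": "first"},
    {"id": "m2", "group": "Inbox", "body": "second"},
    {"id": "m3", "group": "Sent", "body": "third"},
], medium, catalog)
```

In this sketch a user browsing `catalog` can see which groups a backup contains without first restoring the backup data set itself.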
Continuing to refer to the embodiment shown in
Generally speaking, the type of data to be harvested may depend on the logical relationships among data entries in the datastore. For example, in one embodiment, data may be stored in a hierarchical database. In such an embodiment, harvesting may include creating data that corresponds to a map or a tree-structure representing the hierarchy. Alternatively, data entries may be represented as rows in a table, each column of which corresponds to a parameter having a value associated with each data entry. In that case, harvesting may include creating a table that contains a pointer to each data entry and a selected subset of the columns containing the values of a selected subset of the parameters for each data entry. In a still further embodiment, each data entry may include one or more group associations. For example, a folder containing files may be considered a group. In such a case, harvesting may include creating a data set that corresponds to the group associations of each data entry. These and other embodiments may be part of the harvesting process.
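For the tabular case described above, harvesting may be pictured as projecting each row onto a pointer plus a selected subset of columns. The sketch below is illustrative only; the function name `harvest_table`, the `pointer` key, and the sample column names are assumptions and are not drawn from the disclosure.

```python
from typing import Dict, List


def harvest_table(rows: List[Dict[str, str]], key_column: str,
                  selected_columns: List[str]) -> List[Dict[str, str]]:
    """Build a harvested table: one pointer per entry plus the selected columns."""
    harvested = []
    for row in rows:
        reference = {"pointer": row[key_column]}   # pointer to the full data entry
        for column in selected_columns:
            reference[column] = row[column]        # keep only the selected parameters
        harvested.append(reference)
    return harvested


# Example rows; the large "body" column is deliberately left out of the harvest.
entries = [
    {"id": "m1", "From": "Sender 1", "Subject": "Status", "body": "x" * 1000},
    {"id": "m2", "From": "Sender 2", "Subject": "Agenda", "body": "y" * 1000},
]
harvested_table = harvest_table(entries, "id", ["From", "Subject"])
```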
The harvesting process may depend, as well, on application programming interfaces (APIs) or utilities that may be provided by the application for accessing the data, metadata related to the data, or data describing the logical organization of the data stored in the datastore. For example, harvesting may include execution of a sequence of parametric queries against the backup data set to determine the group associations of each data entry. Alternatively, the application may provide specific APIs through which to request data representing the logical relationships among the data entries in the datastore. In various embodiments, the harvesting process may include these and other methods of obtaining a data set from the backup data set.
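As one hedged illustration of harvesting by a sequence of parametric queries, the sketch below uses an in-memory SQLite database to stand in for a backup data set. The table name `entries`, its columns, and the function `group_associations` are hypothetical; an actual application would expose its own query interface.

```python
import sqlite3
from typing import Dict, List

# An in-memory database standing in for the backup data set (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT, folder TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("m1", "Inbox"), ("m2", "Inbox"), ("m3", "Sent")],
)


def group_associations(connection: sqlite3.Connection,
                       folders: List[str]) -> Dict[str, List[str]]:
    """Run one parameterized query per group to find the entries it contains."""
    associations = {}
    for folder in folders:
        rows = connection.execute(
            "SELECT id FROM entries WHERE folder = ? ORDER BY id", (folder,)
        ).fetchall()
        associations[folder] = [row[0] for row in rows]
    return associations


associations = group_associations(conn, ["Inbox", "Sent"])
```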
Once the backup server has harvested a data set from the backup data set, in the embodiment illustrated in
Turning now to
As shown in
In one embodiment, a backup server may take a snapshot of data entries 315 from live application datastore 310 (process 320) and store a resulting backup data set 325 in a working datastore 330. In various embodiments, process 320 may include one or more of: locking live application datastore 310, creating a copy 315 of the data entries 305 on the same host as live application datastore 310, or creating a copy 315 of the data entries 305 on a different host. In one embodiment, working datastore 330 contains data corresponding to all of data entries 305 from live application datastore 310. In other embodiments, the working datastore 330 contains data entries corresponding to data entries 305 from live application datastore 310 that have changed since the last backup operation (e.g., as would be found in an incremental backup). To simplify the example, it is assumed that the working datastore contains data corresponding to all of data entries 305 from live application datastore 310.
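The difference between a full copy and an incremental copy of the data entries may be sketched as follows. The function name `snapshot`, the `modified` timestamp field, and the integer timestamps are assumptions made for illustration; the disclosure does not specify how changed entries are detected.

```python
import copy
from typing import List, Optional


def snapshot(live_entries: List[dict],
             last_backup_time: Optional[int] = None) -> List[dict]:
    """Copy all entries (full) or only entries changed since the last backup."""
    if last_backup_time is None:
        selected = live_entries                      # full backup
    else:
        selected = [e for e in live_entries
                    if e["modified"] > last_backup_time]   # incremental backup
    # Deep copy so later changes to the live datastore do not alter the backup.
    return copy.deepcopy(selected)


live = [
    {"id": "m1", "modified": 100},
    {"id": "m2", "modified": 250},
]
full = snapshot(live)
incremental = snapshot(live, last_backup_time=200)
```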
As shown in
In the example illustrated, working datastore 330 may include data entries 335A-335E, etc. that correspond to data entries 315A-315E, etc. respectively from live application datastore 310. As shown, working datastore 330 may also contain harvested data set 345. Data set 345, in one embodiment, is representative of a set of hierarchical groups 400, 410, 420, 430, 440, and 450. For example, if backup data set 325 corresponds to an e-mail server, the groups depicted in
It is noted that, in the illustrated embodiment, groups may contain other groups. In this and other embodiments, groups may contain both other groups and references. Furthermore, groups may contain any number of references with various numbers of values, depending on the structure and content of backup data set 325. It is further noted that each group may contain the same number of parameters in each reference or a different number of parameters in each reference, or different parameters in each column, depending on the structure and content of backup data set 325. In addition, as noted, groups may be representative of a number of different container objects such as folders, mailboxes, and directories, etc. As may be seen in
If, while browsing a backup catalog, the user chooses only a selected portion of a backup for restoration, a selective restore is indicated (block 510). In one embodiment, the backup server may provide a catalog interface to a user through which the user may browse the data set and optionally select criteria that may be used to locate or retrieve selected data entries for restoration. In response to detecting that a selective restore is desired, the backup data set and corresponding harvested data set are restored from the backup medium to a temporary location (block 540). Based upon the user's selection, data within the restored harvested data which corresponds to the user's selection is identified (block 550). Based upon the identified information, queries are automatically generated which locate and retrieve the corresponding data within the backup data set (block 560). The retrieved data is then transferred for restoration to the live datastore (block 570).
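The selective restore steps above (blocks 540 through 570) may be sketched as follows. This is a simplified illustration under stated assumptions: dictionaries stand in for the backup medium and the live datastore, each generated "query" is reduced to a lookup by pointer, and the names `selective_restore`, `pointer`, and the sample data are hypothetical.

```python
from typing import Dict, List


def selective_restore(backup_id: str, selection: Dict[str, str],
                      medium: dict, live_datastore: dict) -> List[str]:
    # Block 540: restore backup and harvested data sets to a temporary location.
    temp = {
        "backup": {e["id"]: dict(e) for e in medium[backup_id]["backup"]},
        "harvested": list(medium[backup_id]["harvested"]),
    }
    # Block 550: identify harvested data corresponding to the user's selection.
    matches = [ref for ref in temp["harvested"]
               if all(ref.get(p) == v for p, v in selection.items())]
    # Block 560: generate queries (here, pointer lookups) and retrieve the data.
    queries = [ref["pointer"] for ref in matches]
    retrieved = [temp["backup"][q] for q in queries]
    # Block 570: transfer the retrieved data to the live datastore.
    for entry in retrieved:
        live_datastore[entry["id"]] = entry
    return queries


medium = {"backup-001": {
    "backup": [
        {"id": "m1", "From": "Sender 1", "body": "first"},
        {"id": "m2", "From": "Sender 2", "body": "second"},
        {"id": "m3", "From": "Sender 2", "body": "third"},
    ],
    "harvested": [
        {"pointer": "m1", "From": "Sender 1"},
        {"pointer": "m2", "From": "Sender 2"},
        {"pointer": "m3", "From": "Sender 2"},
    ],
}}
live = {"m1": {"id": "m1", "From": "Sender 1", "body": "first"}}
restored_ids = selective_restore("backup-001", {"From": "Sender 2"}, medium, live)
```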
In some embodiments, an application may include utilities through which selected data may be restored to a live datastore. Accordingly, in one embodiment, the application utilized may itself be used during restoration of the selected data entries to the live data store. In other embodiments, the backup server may be operable to restore selected data entries to the live data store without intervention from the application.
Turning now to
In the example shown, folders may generally correspond to the groups as previously described in the discussion of
During a selective restore operation, a user may highlight a folder or other group represented on the left pane of user interface 600. Once a folder is highlighted, the contents of the right pane may be browsed. The user may then select specific values of specific parameters (i.e., specific row-column locations). For example, rows in which the “From” parameter has a value of “Sender 2” are highlighted, indicating that the user has chosen to restore all messages in Folder 4.2 that were sent from Sender 2. Upon completing the selection, such as by hitting the “ENTER” key, the backup server may proceed to execute a selective restoration as described above. It is noted that there are numerous ways of selecting data and items for restoration. In addition, searches on the data may be performed as well. All such alternatives are contemplated. In addition, other user interfaces, including either graphical or command-line interfaces, may be implemented to provide a method for selecting criteria for a selective restoration. The arrangement of information on the screen, selection procedures, and functionality such as the ability to make multiple selections from one screen may vary from embodiment to embodiment.
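The selection in the example above, in which rows whose “From” parameter equals “Sender 2” are highlighted, may be sketched as a simple filter over the rows shown for the highlighted folder. The function name `highlight_rows` and the sample subjects are hypothetical and are provided for illustration only.

```python
from typing import Dict, List


def highlight_rows(rows: List[Dict[str, str]], parameter: str,
                   value: str) -> List[int]:
    """Return the indices of rows whose given parameter has the given value."""
    return [i for i, row in enumerate(rows) if row.get(parameter) == value]


# Rows displayed in the right pane for the highlighted folder (sample data).
folder_4_2 = [
    {"From": "Sender 1", "Subject": "Subject A"},
    {"From": "Sender 2", "Subject": "Subject B"},
    {"From": "Sender 3", "Subject": "Subject C"},
    {"From": "Sender 2", "Subject": "Subject D"},
]
highlighted = highlight_rows(folder_4_2, "From", "Sender 2")
```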
a and 7b illustrate the flow of data during a restore operation.
As shown in
As shown in
Based upon the retrieved harvested data 745, queries 745 may be generated 730 for a query engine 300. As discussed above, the application to which the data is being restored may itself include utilities for selectively restoring items of data to a live data store. In such an embodiment, the query engine 300 may correspond to APIs of the application itself. In other embodiments, the backup/restore software may include the query engine 300. For example, in one embodiment, query engine 300 may be a Messaging Application Programming Interface (MAPI) compliant e-mail application and query 730 may comprise executing a MAPI request to restore selected data entries to an email message store. Alternatively, in some embodiments, the backup server may restore selected data entries 345 to live application datastore 310 without intervention by application 300, as described above in the example of a full restoration. In response to the queries, corresponding data 755 is retrieved and restored 750 to the live data store 310. It is noted that restoration of a selected item may generally include restoration of other items which are not explicitly selected for restoration. For example, if a user selects a particular folder for restoration, it may be necessary to also restore folders or directories which contain the selected folder. In other cases, logical relationships between data entries may indicate dependencies between the entries. In such a case, restoration of the other data entries in a dependent relationship may be automatically performed as well.
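The observation that restoring a selected folder may require restoring the folders that contain it may be illustrated with the following sketch, which expands a selection to include ancestors and orders the result so that containers precede their contents. The function name `with_ancestors` and the `parent_of` mapping are assumptions; the disclosure does not prescribe how containment is recorded.

```python
from typing import Dict, List, Optional


def with_ancestors(selected: List[str],
                   parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Expand a selection to include containing groups, parents listed first."""
    to_restore: List[str] = []
    seen = set()
    for item in selected:
        chain = []
        node: Optional[str] = item
        # Walk up the hierarchy until the root or an already-scheduled group.
        while node is not None and node not in seen:
            chain.append(node)
            seen.add(node)
            node = parent_of.get(node)
        to_restore.extend(reversed(chain))   # outermost container first
    return to_restore


parent_of = {
    "Mailbox": None,
    "Folder 4": "Mailbox",
    "Folder 4.1": "Folder 4",
    "Folder 4.2": "Folder 4",
}
order = with_ancestors(["Folder 4.2", "Folder 4.1"], parent_of)
```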
Turning now to
Each of the storage devices 804 may include any of one or more types of storage devices including, but not limited to, storage systems such as RAID (Redundant Array of Independent Disks) systems, disk arrays, JBODs (Just a Bunch Of Disks, used to refer to disks that are not configured according to RAID), tape devices, and optical storage devices. These devices may be products of any of a number of vendors including, but not limited to, Compaq® Inc., EMC® Corp., and Hitachi® Ltd. Servers 802 may run any of a variety of operating systems such as a Unix® operating system, Solaris® operating system, or a Windows® operating system. Each server 802 may be connected to the fabric 806 via one or more Host Bus Adapters (HBAs).
Fabric 806 includes hardware that connects servers 802 to storage devices 804. The fabric 806 may enable server-to-storage device connectivity through Fibre Channel switching technology. The fabric 806 hardware may include one or more switches (also referred to as fabric switches), bridges, hubs, or other devices such as routers, as well as the interconnecting cables (e.g., for Fibre Channel SANs, fibre optic or copper cables), as desired.
In one embodiment, the SAN may use the Network File System (NFS) protocol to provide access to shared files on the SAN. Using NFS, each server 802 may include a logical hierarchy of files (e.g., a directory tree) physically stored on one or more of storage devices 804 and accessible by the client systems 806 through the server 802. These hierarchies of files, or portions or sub-trees of the hierarchies of files, are referred to herein as “file systems.” In one embodiment, the SAN components may be organized into one or more clusters to provide high availability, load balancing, and/or parallel processing. For example, in
It is noted that the above-described embodiments may comprise software. In such an embodiment, the program instructions which implement the methods and/or mechanisms may be conveyed or stored on a computer accessible medium. Numerous types of media which are configured to store program instructions are available and include hard disks, floppy disks, CD-ROM, DVD, flash memory, Programmable ROMs (PROM), random access memory (RAM), and various other forms of volatile or non-volatile storage. Still other forms of media configured to convey program instructions for access by a computing device include terrestrial and non-terrestrial communication links such as network, wireless, and satellite links on which electrical, electromagnetic, optical, or digital signals may be conveyed. Thus, various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer accessible medium.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.