The present invention relates to data backup. More particularly, the present invention is a method and system for generating a snapshot in a consistent state and recovering electronic mail, hereinafter “email”, data from a snapshot in a consistent state.
Many schemes have been developed to protect data from loss or damage. One such scheme is hardware redundancy, such as redundant arrays of independent disks (RAID). Unfortunately, hardware redundancy schemes are ineffective in dealing with logical data loss or corruption. For example, an accidental file deletion or virus infection is automatically replicated to all of the redundant hardware components and can neither be prevented nor recovered from when using such technologies.
To overcome this problem, backup technologies have been developed to retain multiple versions of a production system over time. These technologies allow administrators to restore previous versions of data and to recover from data corruption.
One type of data protection system involves making point in time (PIT) copies of data. A first type of PIT copy is a hardware-based PIT copy, which is a mirror of a primary volume onto a secondary volume. The main drawbacks of the hardware-based PIT copy are that the data ages quickly and that each copy takes up as much disk space as the primary volume. A software-based PIT, or so called “snapshot,” is a “picture” of a volume at the block level or a file system at the operating system level.
It is desirable to generate a snapshot when an application or a file system is in a consistent state because doing so alleviates the need to replay a log of write streams and allows applications to be restarted rapidly. To achieve this, prior art systems suspend an application in order to update source data and flush the source data to primary storage before generating a snapshot. However, this method is inefficient because the system must be suspended for a period of time in order to generate a snapshot. PIT systems also inefficiently require that the entire snapshot be restored in order to recover specific data. However, it is sometimes desirable to recover a specific file, email data, or the like. This may require recovering a parsed version of a snapshot. For email data, the user may also have to manually set up an email application on top of the recovered snapshot in order to read the recovered email data.
Therefore, there is a need for a method and system for generating a snapshot in a consistent state without suspending an application or a system and for restoring email data from a snapshot in a consistent state.
The present invention is a method and system for generating a snapshot in a consistent state and recovering email data using a remote client. The system comprises a host computer, primary data storage, a data protection unit, and secondary data storage. The data protection unit monitors a state of an application which is running on the host computer. The data protection unit generates a snapshot of data stored in primary data storage when the application is in a consistent state, and stores the snapshot on secondary storage. In the event of a system failure, the data is recovered using the last snapshot. Snapshot generation may be triggered either by storing data on a secondary storage or marking data that already exists on the secondary storage.
Alternatively, the system may identify a consistent snapshot by analyzing previous write streams. Snapshots are generated in accordance with a snapshot generation policy. In the event of a system failure, the data protection unit identifies a snapshot which is generated in a consistent state among a plurality of snapshots. The data is recovered from the identified snapshot.
The present system may also provide means for recovering email data from a snapshot in a consistent state. The data protection unit may provide interface means for a remote client to access snapshots in a consistent state having email data.
A more detailed understanding of the invention may be had from the following description of a preferred embodiment, given by way of example, and to be understood in conjunction with the accompanying drawings, wherein:
The present invention will be described with reference to the drawing figures wherein like numerals represent like elements throughout. The present invention may be implemented, purely by way of example, in a Chronospan system, such as is described in U.S. patent application Ser. No. 10/771,613, which is incorporated by reference as if fully set forth.
A volume manager is a software module that runs on the host computer 102 or an intelligent storage switch 142 (see
The data protection unit 106 controls generation of snapshots. A plurality of snapshots are generated, stored and expired in accordance with a snapshot generation policy. The host computer 102 runs an application. Hereinafter, the terminology “application” means any software running on a computer or a file management system for managing and storing data including, but not limited to, a database system, an email system or a file system. The application running on the host computer 102 generates an output and the output is preferably stored in a memory (not shown) in the host computer. The output in the memory is flushed into the primary data volume 104 when the memory is full, when a predetermined time expires, or when instructed by the application. Alternatively, the output may be directly stored in the primary volume.
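By way of illustration only, the three flush triggers described above (memory full, predetermined time expired, and instruction by the application) may be sketched as follows. The class name `WriteBuffer` and the parameters `capacity`, `max_age` and `clock` are hypothetical and do not appear in the described system; the primary data volume is modeled as a plain list.

```python
import time


class WriteBuffer:
    """Hypothetical host-memory buffer that flushes output to a primary volume."""

    def __init__(self, primary_volume, capacity=4, max_age=5.0, clock=time.monotonic):
        self.primary_volume = primary_volume  # stand-in for the primary data volume
        self.capacity = capacity              # assumed: flush when this many records are held
        self.max_age = max_age                # assumed: flush when the oldest record is this old (s)
        self.clock = clock
        self.pending = []
        self.oldest = None

    def write(self, record):
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(record)
        if len(self.pending) >= self.capacity:
            self.flush()  # trigger 1: memory is full

    def tick(self):
        # trigger 2: the predetermined time has expired
        if self.pending and self.clock() - self.oldest >= self.max_age:
            self.flush()

    def flush(self):
        # trigger 3: may also be called directly by the application
        self.primary_volume.extend(self.pending)
        self.pending.clear()
        self.oldest = None

    @property
    def is_flushed(self):
        # True when no output remains in memory
        return not self.pending
```

In this sketch, `is_flushed` corresponds to the condition that all output held in memory has reached the primary volume.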
The application running on the host computer 102 generates information which may be used in determining whether the application is in a consistent state or not. Various schemes may be used for this purpose. For example, a file system may be configured to generate an indicator that the system is in a consistent state. More particularly, the file system may set specific bits to indicate that the file system is in a clean state. The system reads the specific bits to determine whether the system is in a consistent state, and generates a snapshot when the bits are set. Alternatively, it is possible to analyze the log of a journaling system and to identify a consistent state when the log is empty. A journaling system may be a file system that logs changes to a journal, i.e. a collection of logs, before actually writing them to a main file system. In the event of a system failure, a journaling system ensures that the data on the disk may be restored to its pre-crash configuration.
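The two indicators described above, a clean-state bit and an empty journal, may be sketched as simple predicates. This is a minimal sketch under stated assumptions: the `FileSystemState` structure and the `CLEAN_FLAG` bit value are hypothetical, and no particular file system's on-disk layout is implied.

```python
from dataclasses import dataclass, field

CLEAN_FLAG = 0x1  # hypothetical bit meaning "file system is in a clean state"


@dataclass
class FileSystemState:
    flags: int = 0                               # status bits set by the file system
    journal: list = field(default_factory=list)  # changes logged but not yet applied


def is_consistent_by_flag(state):
    """Consistent when the file system has set its clean bit."""
    return bool(state.flags & CLEAN_FLAG)


def is_consistent_by_journal(state):
    """Consistent when the journal holds no pending changes."""
    return len(state.journal) == 0
```

Either predicate could serve as the condition under which a snapshot is generated.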
The data protection unit 106 monitors state information in real time and detects when the application is in a consistent state. The data protection unit 106 generates a snapshot when the application is in a consistent state. With this scheme, in the case of a system failure, the need to replay a log of write streams to recover data is substantially alleviated, and the application may be restarted more rapidly. The snapshots do not have to be absolutely consistent. The snapshots may be generated slightly before or after the consistent point. The snapshots may be generated at any point that is a good time in practice (i.e., any time that requires only a short time for replaying the log is a good candidate).
The consistent point may vary from application to application. A snapshot that may be consistent for one application may not be consistent for another application. Therefore, after generating one snapshot which is consistent for one application, if a consistent point is detected for another application, another snapshot is generated. In this case, the second snapshot probably does not have many changes.
It is noted that the primary data volume 104 and the secondary data volume 108 can be any type of data storage, including, but not limited to, a single disk, a disk array (such as a RAID), or a storage area network (SAN). The main difference between the primary data volume 104 and the secondary data volume 108 lies in the structure of the data stored at each location. The primary volume 104 is typically an expensive, fast, and highly available storage subsystem, whereas the secondary volume 108 is typically cost-effective, high capacity, and comparatively slow (for example, ATA/SATA disks).
It is noted that the data protection unit 106 operates in the same manner, regardless of the particular construction of the protected computer system 100, 120, 140. The major difference between these deployment options is the manner and place in which a copy of each write is obtained. To those skilled in the art it is evident that other embodiments, such as the cooperation between a switch platform and an external server, are also feasible.
If the data protection unit 106 determines that the application is not in a consistent state, the process 200 returns to step 204 to monitor the state of the application (step 206). If the data protection unit 106 determines that the application is in a consistent state, which means the output temporarily stored in the memory is flushed into the primary data volume 104, the data protection unit 106 generates a snapshot and stores it in the secondary data volume 108 (step 208). If a system failure or other problem is detected at step 210, the data is restored using the last snapshot (step 212).
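The monitor, snapshot and restore flow described above may be illustrated with a small simulation. This is a sketch only: the event names (`"write"`, `"flush"`, `"failure"`) and the function `run_process` are assumptions introduced for illustration, and volumes are modeled as in-memory lists.

```python
import copy


def run_process(events):
    """Simulate monitoring, snapshot generation and restore over hypothetical events.

    Each event is ("write", value), ("flush", None) or ("failure", None).
    Returns (primary, snapshots) after all events are processed.
    """
    memory, primary, snapshots = [], [], []
    for kind, value in events:
        if kind == "write":
            memory.append(value)        # output held in host memory
        elif kind == "flush":
            primary.extend(memory)      # output flushed to the primary volume
            memory.clear()
        elif kind == "failure":
            memory.clear()              # unflushed output is lost
            if snapshots:
                # restore the data using the last snapshot (step 212)
                primary = copy.deepcopy(snapshots[-1])
            continue
        # consistent state: nothing remains unflushed in memory
        consistent = not memory
        if consistent and (not snapshots or snapshots[-1] != primary):
            # generate a snapshot and store it on the secondary volume (step 208)
            snapshots.append(copy.deepcopy(primary))
    return primary, snapshots
```

For example, writing `a`, flushing, writing `b` and then failing leaves the primary volume restored to the snapshot containing only `a`, because `b` never reached a consistent point.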
In typical recovery scenarios, it is necessary to examine what the primary volume looked like at multiple points in time before deciding which point to recover to. For example, consider a system that was infected by a virus. In order to recover from the virus, it is necessary to examine the primary volume as it was at different points in time to find the latest recovery point where the system was not yet infected by the virus.
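The search for the latest uninfected recovery point may be sketched as a scan from the newest snapshot to the oldest. The snapshot representation (a dictionary with a `"time"` key) and the `is_infected` callback are hypothetical; an actual inspection of a snapshot would depend on the scanning tool used.

```python
def latest_clean_snapshot(snapshots, is_infected):
    """Return the newest snapshot for which is_infected() is False, or None."""
    for snap in sorted(snapshots, key=lambda s: s["time"], reverse=True):
        if not is_infected(snap):
            return snap
    return None
```

A usage example, assuming the infection first appears in the snapshot taken at time 3:

```python
snaps = [{"time": 1, "files": ["a"]},
         {"time": 2, "files": ["a", "b"]},
         {"time": 3, "files": ["a", "b", "virus"]}]
chosen = latest_clean_snapshot(snaps, lambda s: "virus" in s["files"])
# chosen is the snapshot taken at time 2
```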
A host computer 102 runs an application (step 302). The output generated by the host computer 102 is first stored in a memory and later flushed into a primary data volume 104. A data protection unit 106 generates a snapshot of the data and stores the snapshot in a secondary data volume 108 (step 304). The snapshots may be generated periodically or non-periodically depending on a snapshot generation policy. If a system failure or other problem is detected at step 306, the data protection unit 106 inspects a log of previous writes. The application updates a log of writes every time it produces output to be recorded in the memory and the primary data volume 104. The data protection unit 106 replays the log of write streams and determines an exact point in time when the application was in a consistent state. The data protection unit 106 identifies a snapshot in a consistent state among a plurality of snapshots (step 310) and restores the data based on the consistent state snapshot (step 312).
The snapshot from which the system is recovered does not have to be absolutely consistent. A snapshot which is generated slightly before or after the consistent point may be utilized. Basically, a snapshot which minimizes the replay of the log is the best snapshot for recovery. The best snapshot may be different from one application to another. Alternatively, when the consistency determination is made in real-time for an application, it is necessary to use a host resident agent that reads non-persistent state information from a memory rather than only analyzing the write data stream.
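Selection of the snapshot that minimizes log replay may be sketched as follows. The cost model is an assumption made for illustration: the log is taken to be a list of `(time, op)` entries in which a hypothetical `"commit"` entry marks a consistent point, and the replay cost of a snapshot is counted as the number of log entries between the snapshot and its nearest consistent point.

```python
def consistent_points(log):
    """Times at which the (hypothetical) log records a commit marker."""
    return [t for t, op in log if op == "commit"]


def replay_cost(snapshot_time, log, points):
    """Number of log entries between a snapshot and its nearest consistent point."""
    nearest = min(points, key=lambda p: abs(p - snapshot_time))
    lo, hi = sorted((snapshot_time, nearest))
    return sum(1 for t, _ in log if lo < t <= hi)


def best_snapshot(snapshot_times, log):
    """Return the snapshot time with the smallest replay cost, or None."""
    points = consistent_points(log)
    if not points or not snapshot_times:
        return None
    return min(snapshot_times, key=lambda s: replay_cost(s, log, points))
```

Under this model a snapshot taken slightly before or after a consistent point scores nearly as well as one taken exactly at that point, which reflects the tolerance described above.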
Still referring to
It is noted that the present invention may be implemented in a computer-readable storage medium containing a set of instructions for a processor or general purpose computer. For example, the set of instructions may include a snapshot code segment, an email recovery code segment, an email parsing code segment, and a communication code segment.
As explained above in the description of
As mentioned above, the present invention can be implemented in a computer program tangibly embodied in a computer-readable storage medium for execution by a processor or a general purpose computer; and method steps of the invention can be performed by a processor executing a program of instructions to perform functions of the invention by operating on input data and generating output data. Suitable processors include, by way of example, both general and special purpose processors. Typically, a processor will receive instructions and data from a read-only memory, a random access memory, and/or a storage device. Storage devices suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs). In addition, while the illustrative embodiments may be implemented in computer software, the functions within the illustrative embodiments may alternatively be embodied in part or in whole using hardware components such as Application Specific Integrated Circuits, Field Programmable Gate Arrays, or other hardware, or in some combination of hardware components and software components.
While specific embodiments of the present invention have been shown and described, many modifications and variations could be made by one skilled in the art without departing from the scope of the invention. The above description serves to illustrate and not limit the particular invention in any way.
This application is a continuation-in-part of U.S. patent application Ser. No. 11/051,793, filed Feb. 4, 2005, which claims the benefit of U.S. Provisional Application Nos. 60/542,011, filed Feb. 5, 2004, and 60/541,626, filed Feb. 4, 2004, which are incorporated by reference as if fully set forth herein.
Number | Date | Country | |
---|---|---|---|
20060195493 A1 | Aug 2006 | US |
Number | Date | Country | |
---|---|---|---|
60542011 | Feb 2004 | US | |
60541626 | Feb 2004 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11051793 | Feb 2005 | US |
Child | 11413327 | | US |