Method and apparatus for deleting data upon expiration

Information

  • Patent Grant
  • Patent Number
    7,559,088
  • Date Filed
    Friday, February 4, 2005
  • Date Issued
    Tuesday, July 7, 2009
Abstract
A method and apparatus for efficiently deleting data, including backups or snapshots, upon expiration are disclosed. The data can be deleted even without physical access to it. A data generation unit generates data. Each item of data has an expiration time and should be deleted upon expiration. An encryption unit encrypts the data, and the encrypted data is stored in data storage. A controller monitors whether any data has expired and, if so, deletes the key necessary for decrypting the expired data.
Description
FIELD OF INVENTION

The present invention relates to data generation and deletion. More particularly, the present invention is a method and apparatus for deleting data efficiently upon expiration. The present invention can be applied to any kind of data stored in any kind of system.


BACKGROUND

Backing up data is a process that generates a coherent copy of data. Backing up data has become more important as the amount of data has exploded in volume and the importance of electronic records has also greatly increased. Backups are performed for various reasons, such as to assure availability of data, to create data archives, or to transport data to a distant location.


Many schemes have been developed to generate backup data. One data backup scheme is to generate point-in-time (PIT) copies of data. PIT copies are either hardware-based or software-based. A hardware-based PIT copy is a mirror of a primary volume which has been saved onto a secondary volume. A software-based PIT copy, called a “snapshot,” is a “picture” of a volume at the block level or a file system at the operating system level. Another data backup scheme is one in which a backup application sends full or incremental copies of data to tape.


Backup data is generated in accordance with a data backup policy. Backup copies are generated, stored on storage media, and maintained for a certain period of time. Often, not just a single copy but multiple copies of the same original data are generated and maintained on separate media. Because of regulatory requirements, companies have to keep certain backup copies for several years. Accordingly, the data backup policy typically sets an expiration time for each backup. For example, a system may retain daily snapshots or backups for two months, weekly snapshots or backups for two years and monthly snapshots or backups for seven years. Once the expiration time has passed, the backup copies are deleted completely from record, and should not be available in the future.
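As a rough illustration of how such a policy might assign expiration times, the following sketch maps a backup class to a retention period. The names RETENTION and expiration_time are hypothetical, and approximating "two months", "two years" and "seven years" as fixed day counts is an assumption made for simplicity.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods mirroring the example policy above.
# Months and years are approximated as fixed day counts (an assumption).
RETENTION = {
    "daily": timedelta(days=60),         # about two months
    "weekly": timedelta(days=2 * 365),   # about two years
    "monthly": timedelta(days=7 * 365),  # about seven years
}

def expiration_time(kind: str, created: datetime) -> datetime:
    """Return the time after which a backup of the given kind has expired."""
    return created + RETENTION[kind]

created = datetime(2005, 2, 4)
for kind in RETENTION:
    print(kind, expiration_time(kind, created).date())
```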


Historically, magnetic tape has been used as the storage medium for backing up data because tape has been a much cheaper medium than disk. In order to completely delete backup data stored on a tape, a system operator typically needs to access each tape through the backup application, delete the backup data in question, and run another backup procedure on the tape. This is a labor-intensive and expensive process. It is even more complicated if only certain pieces of data on the tape need to be expired or if the tape cannot be easily located.


One problem with these prior art scenarios is that even after the expiration time, many copies across many tapes are not deleted completely and still exist. For example, if critical data that should have been deleted is obtained by an adversary in a lawsuit, it may cause tremendous damage to a company.


Therefore, there is a need to efficiently and completely delete all expired data from record in a way that it is no longer recoverable. This is the case even when it is not easy to obtain direct access to the data.


SUMMARY

A method and apparatus for deleting data, including backup data or snapshots, upon expiration are disclosed. The present invention can be applied even where it is not possible to gain physical access to the data. A data generation unit generates data (including backup data or snapshots). Each item of data has an expiration time which sets forth the desired time at which the data will be deleted. An encryption unit encrypts the data, and the encrypted data is stored in storage. A controller monitors whether an expiration time has passed and, if so, the controller deletes the key necessary for decrypting the expired data.





BRIEF DESCRIPTION OF THE DRAWINGS

A more detailed understanding of the invention may be had from the following description of a preferred embodiment, given by way of example, and to be understood in conjunction with the accompanying drawings, wherein:



FIGS. 1A-1C are block diagrams of a data backup system in accordance with the present invention;



FIG. 2 is a block diagram of a data protection unit in accordance with the present invention; and



FIG. 3 is a flow diagram of a process for deleting data upon expiration in accordance with the present invention.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described with reference to the drawing figures, wherein like numerals represent like elements throughout.



FIG. 1A shows an example of a data backup system 100 that can be implemented in accordance with the present invention. The system 100 comprises a host computer 102, a primary data volume 104 (the primary data volume may also be referred to as a protected volume), a data protection unit 106, and a secondary data volume 108. The host computer 102 is coupled to the primary data volume 104 and to the data protection unit 106. The data protection unit 106 manages the secondary data volume 108, and generates and maintains backup data for data stored in the primary data volume 104. The configuration of the system 100 minimizes the lag time by writing directly to the primary data volume 104 and permits the data protection unit 106 to focus exclusively on managing the secondary data volume 108.


It should be noted that the primary data volume 104 and the secondary data volume 108 can be any type of data storage, including, but not limited to, a single disk, a disk array (such as a RAID), a tape drive, a tape library or a storage area network (SAN). The main difference between the primary data volume 104 and the secondary data volume 108 lies in the type of data that is stored on the device. The primary volume 104 is typically an expensive, fast, and highly available storage subsystem that stores the primary copy of the data, whereas the secondary volume 108 is typically a cost-effective, high capacity, and comparatively slow (for example, tape, ATA/SATA disks) storage subsystem that stores backup copies of the data.



FIG. 1B shows an alternative example of a system 120 that can be implemented in accordance with the present invention. The host computer 102 is directly connected to the data protection unit 106, which manages both the primary data volume 104 and the secondary data volume 108. The system 120 may be slower than the system 100 described with reference to FIG. 1A since the data protection unit 106 must manage both the primary data volume 104 and the secondary data volume 108. Although slower operation results in a higher latency for writes to the primary volume 104 in the system 120 and lowers the available bandwidth for use, such a configuration as shown in FIG. 1B may be acceptable in certain applications.



FIG. 1C shows another example of a system 140 that can be implemented in accordance with the present invention. The host computer 102 is connected to an intelligent switch 142. The switch 142 is connected to the primary data volume 104 and the data protection unit 106 which, in turn, manages the secondary data volume 108. The switch 142 includes the ability to host applications and contains some of the functionality of the data protection unit 106 in hardware, to assist in reducing system latency and improving bandwidth.


It should be noted that the configurations of the system shown in FIGS. 1A-1C are provided as examples, not as limitations. The present invention can be applied not only to data backup systems but also to any kind of data generation system in which the generated data may need to be expired at some point. Hereinafter, including in FIGS. 2 and 3, the present invention will be explained with reference to a data backup system. However, it should be noted that it is obvious to those skilled in the art that the present invention can be equally applied to any data generation system and any kind of data, not limited to a data backup system or backup data. Any other configuration (e.g., a typical backup system, a virtual tape library or any other data generation system) may be implemented, and the data protection unit 106 operates in the same manner regardless of the particular configuration of the system 100, 120, 140. It should also be noted that the present invention may utilize only one data volume, instead of two, for storing original data and/or backup data. The primary difference between these examples is the manner and place in which a copy of each write is obtained. To those skilled in the art, it is evident that other embodiments, such as the cooperation between a switch platform and an external server, are also feasible. Accordingly, although two data volumes are shown, a single data volume may be used. Additionally, although two data volumes may be used, they may be stored on a single storage device.



FIG. 2 is a block diagram of the data protection unit 106 in accordance with the present invention. Backup data is generated, stored and deleted in accordance with a data backup policy. It should be noted that it is obvious to those skilled in the art that the present invention can be applied to any kind of data that needs to be retained and then expired, (e.g. archival data, financial data, email data, etc.). The data protection unit 106 controls generating, storing and deleting of backup data. The data protection unit 106 comprises a controller 112, a backup data generation unit 114, an encryption unit 116, and an encryption key storage 118.


The controller 112 provides overall control of generating, storing, and deleting backup data. The backup data generation unit 114 generates backup data, such as snapshots, under the control of the controller 112 as desired under the backup policy. The backup data is stored in a storage unit, such as a secondary volume 108 or tape. The backup data is input into the encryption unit 116. The encryption unit 116 performs both encryption of the backup data, and subsequent decryption of the encrypted backup data as necessary. Therefore, encrypted backup data is stored in storage, and the encrypted backup data may be restored from the storage in case of system failure. Encryption is performed by any means which is currently available or will be developed in the future.


The encryption unit 116 encrypts the backup data using an encryption key. The encryption key may be either a symmetric key or an asymmetric key. If a symmetric key is used, the same key is required for both encryption and decryption. If an asymmetric key is used, a pair of keys is used for encryption and decryption. The encryption unit 116 uses a different encryption/decryption key for each backup data block.
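The symmetric case with one key per backup can be sketched as follows. The cipher here is a toy stand-in built from SHA-256 (an assumption of this sketch, since the description leaves the encryption means open), and the names xor_with_keystream and new_key are hypothetical. It is meant only to show that the same key encrypts and decrypts, and that one backup's key does not recover another backup's data; a real system would use a vetted cipher.

```python
import hashlib
import secrets

def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream.

    Illustration only, not a vetted algorithm. Applying it twice with
    the same key returns the original data.
    """
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)

def new_key() -> bytes:
    """Generate a fresh random 256-bit key for one backup."""
    return secrets.token_bytes(32)

backup_a = b"snapshot taken at 01:00"
backup_b = b"snapshot taken at 02:00"
key_a, key_b = new_key(), new_key()

cipher_a = xor_with_keystream(key_a, backup_a)
cipher_b = xor_with_keystream(key_b, backup_b)

# Symmetric: the key that encrypted the data also decrypts it.
assert xor_with_keystream(key_a, cipher_a) == backup_a
# Another backup's key does not recover the plaintext.
assert xor_with_keystream(key_b, cipher_a) != backup_a
```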


As the backup data generation unit 114 generates a backup copy, the encryption unit 116 generates a new encryption key and performs encryption on the new backup data using the new encryption key.


Alternatively, the encryption unit 116 may use the same encryption/decryption keys for backup data having the same expiration date. For example, if the backup data generation unit 114 generates twenty four (24) hourly snapshots and only one (1) snapshot is kept as a weekly snapshot, the encryption unit 116 may use the same encryption/decryption keys for twenty three (23) snapshots and use a different key for the other one (1) snapshot, so that the twenty three (23) hourly snapshots may be deleted at the same time by just deleting the common decryption key for the twenty three (23) hourly snapshots, which will be explained in detail hereinafter.
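The shared-key alternative can be pictured as a table of keys indexed by expiration time. The sketch below is illustrative only: the names keys_by_expiration and key_for are hypothetical, and the one-day and one-week retention periods are assumed values chosen to reproduce the 23-plus-1 example.

```python
import secrets
from datetime import datetime, timedelta

# Maps an expiration time to the single key shared by all backups
# that expire at that time (hypothetical structure).
keys_by_expiration = {}

def key_for(expires_at: datetime) -> bytes:
    """Return the shared key for an expiration time, creating it on first use."""
    if expires_at not in keys_by_expiration:
        keys_by_expiration[expires_at] = secrets.token_bytes(32)
    return keys_by_expiration[expires_at]

day = datetime(2005, 2, 4)
hourly_expiry = day + timedelta(days=1)   # assumed retention, hourly snapshots
weekly_expiry = day + timedelta(weeks=1)  # assumed retention, kept snapshot

# 24 hourly snapshots; the one taken at hour 0 is kept as the weekly snapshot.
snapshot_expiry = {
    hour: (weekly_expiry if hour == 0 else hourly_expiry) for hour in range(24)
}
for expires_at in snapshot_expiry.values():
    key_for(expires_at)  # a key is obtained whenever a snapshot is encrypted

assert len(keys_by_expiration) == 2  # only two keys for 24 snapshots

# Deleting the one common key expires all 23 hourly snapshots at once.
del keys_by_expiration[hourly_expiry]
unreadable = [h for h, e in snapshot_expiry.items() if e not in keys_by_expiration]
print(len(unreadable))  # prints 23
```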


The encryption/decryption key is preferably separately stored and maintained in an encryption key storage 118. The encryption key storage 118 contains a list of the backup data blocks, related encryption and decryption keys necessary for encrypting and decrypting the backup data, and an expiration time for each backup data block.


Each backup data block is assigned an individual expiration time. The controller 112 monitors whether there is any expired backup data. If there is expired backup data, the controller 112 deletes the decryption key for the expired backup data from the encryption key storage 118, so that the encrypted backup data may not be decrypted in the future. The backup data itself may or may not be deleted. However, without the decryption key, the encrypted backup data is unreadable and therefore irretrievable.
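A minimal sketch of the key storage and the controller's expiration check might look like the following. The class and method names (KeyRecord, KeyStorage, delete_expired) and the backup identifiers are hypothetical, and an in-memory dictionary stands in for the encryption key storage 118.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class KeyRecord:
    """One key storage entry: a decryption key and its expiration time."""
    key: bytes
    expires_at: datetime

class KeyStorage:
    """In-memory stand-in for the encryption key storage (an assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, KeyRecord] = {}

    def add(self, backup_id: str, expires_at: datetime) -> bytes:
        """Create and store a new key for a backup, returning the key."""
        record = KeyRecord(secrets.token_bytes(32), expires_at)
        self._records[backup_id] = record
        return record.key

    def get(self, backup_id: str) -> Optional[bytes]:
        """Return the key for a backup, or None if it has been deleted."""
        record = self._records.get(backup_id)
        return record.key if record else None

    def delete_expired(self, now: datetime) -> List[str]:
        """Delete the keys of all expired backups and return their identifiers."""
        expired = [bid for bid, rec in self._records.items() if rec.expires_at <= now]
        for bid in expired:
            del self._records[bid]
        return expired

store = KeyStorage()
store.add("daily-2005-02-04", datetime(2005, 4, 5))
store.add("monthly-2005-02", datetime(2012, 2, 3))

expired = store.delete_expired(now=datetime(2005, 4, 6))
print(expired)
```

After the sweep, only the key is gone; any encrypted copies of the expired backup can remain where they are. The scheme does depend on no other copy of the key surviving outside the key storage.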


A process 300 for data backup will be explained with reference to the flow diagram of FIG. 3. The process 300 deletes backup data efficiently upon expiration in accordance with the present invention. The backup data generation unit 114 creates backup data in accordance with a data backup policy (step 302). The encryption unit 116 encrypts the backup data before the backup data is stored in storage (step 304). The controller 112 stores the encryption/decryption key for the encrypted backup data, along with an expiration time of the backup data, in the encryption key storage 118. The controller 112 monitors whether any backup data has expired (step 306). If the controller 112 identifies expired backup data, the controller 112 identifies the decryption key necessary for decrypting the expired backup data and deletes the key from the encryption key storage 118 (step 308). With this scheme, a data backup system can delete expired backup data efficiently, and the expired backup data is completely unrecoverable.


While specific embodiments of the present invention have been shown and described, many modifications and variations could be made by one skilled in the art without departing from the scope of the invention. The above description serves to illustrate and not limit the particular invention in any way. As stated hereinabove, the present invention may be applied to any data which needs to be deleted upon expiration, not just backup data, and any data generation system, not just backup systems, and it should be understood that such applications are obvious to those skilled in the art.

Claims
  • 1. A method for managing expiration operations of a backup management system to render expired backup data inaccessible, the method comprising: generating backup data from original data contained in a primary data volume as indicated by a data backup policy, wherein the data backup policy governs generation, storage, and expiration of the backup data; storing the backup data in a backup storage unit and encrypting the backup data, wherein the backup data but not the original data is encrypted; storing a decryption key and an associated expiration time in an encryption storage unit for each of a plurality of backup data, wherein: the decryption key is for decrypting the encrypted data; the expiration time is indicated by the data backup policy and indicates the time at which each of the plurality of backup data should be rendered inaccessible; a first backup data and a second backup data of the plurality of backup data are assigned an identical decryption key when the first backup data and the second backup data have an identical expiration time; the first backup data and the second backup data are assigned distinct decryption keys when the first backup data and the second backup data have different expiration times; and deleting the decryption key at a time indicated by the expiration time for each of the plurality of backup data, such that an expired backup data becomes inaccessible even if the expired backup data is not deleted from the backup storage unit.
  • 2. The method of claim 1 wherein the backup data is a snapshot of the original data.
  • 3. The method of claim 1 wherein the backup data is a backup copy stored on a tape or a virtual tape.
  • 4. The method of claim 1 wherein each backup data is encrypted using a different encryption key.
  • 5. An apparatus for managing expiration operations of a backup management system to render expired backup data inaccessible, the apparatus comprising: a data generation unit to generate backup data from original data contained in a primary data volume, wherein a data backup policy governs generation, storage, and expiration of the backup data; an encryption unit to encrypt the backup data, wherein the encryption unit encrypts the backup data but not the original data; a backup storage unit to store the encrypted backup data; an encryption storage unit to store a decryption key and an expiration time for each of a plurality of backup data, wherein: the decryption key is for decrypting the encrypted data; the expiration time is indicated by the data backup policy and indicates the time at which each of the plurality of backup data should be rendered inaccessible; a first backup data and a second backup data of the plurality of backup data are assigned an identical decryption key when the first backup data and the second backup data have an identical expiration time; the first backup data and the second backup data are assigned distinct decryption keys when the first backup data and the second backup data have different expiration times; and a controller to delete the decryption key at a time indicated by the expiration time for each of the plurality of backup data, such that an expired backup data becomes inaccessible even if the expired backup data is not deleted from the backup storage unit.
  • 6. The apparatus of claim 5 wherein the backup data is a snapshot of the original data.
  • 7. The apparatus of claim 5 wherein the backup data is a backup copy stored on a tape or a virtual tape.
  • 8. The apparatus of claim 5 wherein each backup data is encrypted using a different encryption key.
  • 9. The apparatus of claim 5 wherein the backup data targeted to expire during the same period are encrypted using the same encryption key.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. provisional application nos. 60/541,626 filed Feb. 4, 2004 and 60/542,011 filed Feb. 5, 2004, which are incorporated by reference as if fully set forth herein.

Related Publications (1)
Number Date Country
20060143443 A1 Jun 2006 US
Provisional Applications (2)
Number Date Country
60541626 Feb 2004 US
60542011 Feb 2004 US