MANAGING SNAPSHOT/BACKUP COLLECTIONS IN FINITE DATA STORAGE

Information

  • Patent Application
  • 20030220948
  • Publication Number
    20030220948
  • Date Filed
    January 21, 2003
  • Date Published
    November 27, 2003
Abstract
A method of managing a collection of members, wherein each member has data preserved in a finite data storage and each member belongs to a different data group of a temporal data store with which the finite data storage is associated, includes the step of deleting the oldest member of the collection upon the addition of a new member to the collection when the number of members of the collection exceeds a predetermined maximum number. The new member of the collection may be added to the collection after a member is deleted if the predetermined maximum number otherwise would be exceeded or, alternatively, the new member of the collection may be added to the collection, the predetermined maximum number of members then exceeded, and a member then deleted.
Description


BACKGROUND OF INVENTION

[0002] Systems that utilize a finite data storage for preserving data include: (1) an online data archive, in which a backup copy of data is physically created and maintained in a finite data storage; and (2) a snapshot system, in which an image of data as it existed at a particular point in time is maintained by preserving data in a finite data storage (the data typically is saved into the finite data storage in a “copy-on-write” operation).


[0003] Furthermore, in the online archive and snapshot system, the finite data storage typically is allocated a predetermined, fixed amount of storage capacity that it cannot exceed. Indeed, only recently have attempts been made at “growing” the finite data storage, as evidenced, for example, by U.S. Pat. No. 6,473,775, issued Oct. 29, 2002, which is incorporated herein by reference.


[0004] As with any system that preserves data utilizing a finite data storage, at some point, the finite data storage will be consumed. At this point, either (1) the online archive or snapshot system will fail, or (2) additional changes to the data that is subject to the online archive or to the snapshot will be denied. Furthermore, consumption of the finite data storage accelerates as additional backups are made and as additional snapshots are taken and maintained.


[0005] Accordingly, online archives are not preferred for preserving backups over extended periods of time. Indeed, the typical archive system is an offline archive system, which includes the copying of data subject to backup onto tape or optical disks and then the offsite storage of such backup media. Snapshots also are not preferred for maintaining images of the data over extended periods of time due to the data storage requirements (which are less than online archives, but nevertheless appreciable). Consequently, snapshot systems also are not considered an acceptable alternative to offline archives.


[0006] A need therefore exists for a method and system for managing a finite data storage for an online archive or snapshot system that avoids, over extended periods of time, system failure and/or denial of additional writes due to consumption of the finite data storage. A need exists especially for such a method and system that improves the feasibility of utilizing online archives and snapshot systems for data archive purposes. One or more embodiments of the present invention meet one or more of these needs.



SUMMARY OF INVENTION

[0007] In a first aspect of the present invention, a finite data storage of a temporal data store is managed in accordance with a method of the present invention. The temporal data store includes one or more data groups, and each data group itself includes a plurality of members, data of each of which is preserved in the finite data storage. Each data group further has associated therewith a time point, and each member of each data group has associated therewith a preservation weight. The method includes the step of, upon detecting that finite data storage consumption has reached a first level, then for each member in order of increasing preservation weight beginning with the one or more members having the lowest preservation weight, successively deleting each member in increasing chronological order beginning with the oldest member first, until the finite data storage consumption has reached a second lower level. Each member of the data group may include a snapshot or a backup.


[0008] In features of this aspect of the present invention, and with specific regard to this method, the second level may be less than the first level, the second level may be the same as the first level, and the second level may be a threshold capacity of the finite data storage. The first level also may be the effective capacity of the finite data storage, such as approximately 90% of the fixed capacity of the finite data storage.


[0009] Each data group also may include a snapshot of an object or a backup of a source.


[0010] Furthermore, a snapshot in each data group may be a snapshot of the same object, just at different points in time, and a backup in each data group may be a backup of the same source, just at different points in time. For each such snapshot or backup respectively, the preservation weight may be assigned.


[0011] Furthermore, the object or source may include a logical container, a computer-readable medium, a portion of a logical container, or a portion of a computer-readable medium.


[0012] In yet additional features, each preservation weight may be assigned from a predetermined range of preservation weights. Furthermore, a member having the highest preservation weight of the predetermined range may be excepted from the step of being deleted. Preferably, however, a member having the highest preservation weight of the predetermined range is deleted when all members of all data groups data of which is stored in the finite data storage have the highest preservation weight of the predetermined range and the finite data storage consumption has exceeded the first level. Furthermore, an error message preferably is returned if all members of all data groups data of which is stored in the finite data storage have the highest preservation weight of the predetermined range and the finite data storage consumption has exceeded the first level. The error message preferably includes a notification to a system administrator.


[0013] In yet another method of the present invention, a finite data storage used to store snapshots is managed wherein each snapshot has associated therewith a snapshot time and a preservation weight. This method includes the step of, upon detecting that the finite data storage consumption has reached a first level, then successively deleting snapshots as a function of the snapshot times and preservation weights until the finite data storage consumption has reached a second level. This method preferably is automatically performed by a computer without user intervention.


[0014] In another method of the present invention, a finite data storage used to store backups is managed wherein each backup has associated therewith a backup time and a preservation weight. This method includes the step of, upon detecting that the finite data storage consumption has reached a first level, then successively deleting backups as a function of the backup times and preservation weights until the finite data storage consumption has reached a second level. This method preferably is automatically performed by a computer without user intervention.


[0015] In a second aspect of the present invention, a method of managing a collection of members, wherein each member has data preserved in a finite data storage and each member belongs to a different data group of a temporal data store with which the finite data storage is associated, includes the step of deleting the oldest member of the collection upon the addition of a new member to the collection when the number of members of the collection exceeds a predetermined maximum number. The maximum number of members in the collection may be predetermined by a user or administrator of the computer system. The new member of the collection may be added to the collection after a member is deleted if the predetermined maximum number otherwise would be exceeded or, alternatively, the new member of the collection may be added to the collection, the predetermined maximum number of members then exceeded, and a member then deleted. Preferably, the method is automatically performed by a computer without user intervention.


[0016] The invention further includes computer-readable medium having computer-readable instructions for performing any of the methods of the present invention, as well as a computer configuration including computer-readable medium having computer-readable instructions for performing a method of the present invention.


[0017] Further and additional features of the present invention are disclosed and will become apparent from the following description of preferred embodiments thereof.


[0018] Moreover, while a preferred operating environment of the present invention is a conventional computer or server with access to a finite data storage, such as a hard disk drive, it should be understood that the present invention can likewise be implemented in any electronic device or computer configuration in which data is archived, copied, or stored for backup or later retrieval purposes.







BRIEF DESCRIPTION OF DRAWINGS

[0019] Further features and benefits of the present invention will be apparent from a detailed description of preferred embodiments thereof taken in conjunction with the following drawings, wherein similar elements are referred to with similar reference numbers, and wherein:


[0020] FIG. 1 is an overview of a preferred system of the present invention;


[0021] FIG. 2 is an overview of another preferred system of the present invention;


[0022] FIG. 3 is a flowchart of a first preferred method of the present invention;


[0023] FIG. 4 is a flowchart of a function of the preferred method of FIG. 3;


[0024] FIG. 5 is an alternative flowchart of the function described in FIG. 4; and


[0025] FIG. 6 is a flowchart of a second preferred method of the present invention.







DETAILED DESCRIPTION

[0026] As a preliminary matter, it will readily be understood by those persons skilled in the art that the present invention is susceptible of broad utility and application in view of the following detailed description of preferred embodiments of the present invention. Many devices, methods, embodiments, and adaptations of the present invention other than those herein described, as well as many variations, modifications, and equivalent arrangements thereof, will be apparent from or reasonably suggested by the present invention and the following detailed description thereof, without departing from the substance or scope of the present invention. Accordingly, while the present invention is described herein in detail in relation to preferred embodiments, it is to be understood that this disclosure is illustrative and exemplary and is made merely for purposes of providing a full and enabling disclosure of preferred embodiments of the invention. The disclosure herein is not intended nor is to be construed to limit the present invention or otherwise to exclude any such other embodiments, adaptations, variations, modifications and equivalent arrangements, the present invention being limited only by the claims appended hereto and the equivalents thereof.


[0027] Turning first to FIG. 1, a computer configuration 100 for implementing preferred methods of the present invention is illustrated. This computer configuration 100 comprises a storage medium 110. The storage medium 110 is a data storage area of the computer configuration 100 from which data is accessible. The storage medium 110 thus may include, for example, the entirety or any portion of a hard disk drive, computer RAM, a USB data storage device, a serial data storage device, a parallel data storage device, or a FireWire data storage device. As shown in FIG. 1, a finite data storage 112 comprising a fixed percentage or portion of the storage medium 110 is allocated to preserving data of backups and/or snapshots. The remaining storage area 114 of the storage medium 110 stores primary or current data. The computer configuration 100 further comprises a central processor 120, which contains the operating system of the computer configuration 100 and which interacts with other peripheral devices (not shown) of a conventional computer configuration, such as user or data interfaces (e.g., terminals, monitors, keyboards, mice, scanners), output devices (e.g., printers), and I/O devices (e.g., modems). The central processor 120 interacts with the storage medium 110 by means of a data manager 130. The data manager 130 comprises software and/or hardware for managing the transfer of data between the processor 120 and the storage medium 110 and for controlling the writing, reading, and deletion of data to and from the storage medium 110. A preferred data manager is disclosed in and utilized by the source code set forth in the provisional patent application, previously incorporated herein by reference.


[0028] Turning briefly to FIG. 2, an alternative computer configuration 200 for implementing preferred methods of the present invention is illustrated. This computer configuration 200, which also comprises a conventional processor 220 and conventional peripheral devices (not shown), is essentially identical to the computer configuration 100 of FIG. 1; however, it comprises two separate storage media 210, 240. The first storage medium 210 is dedicated to the storage of the primary or current data, while the other storage medium 240 comprises the finite data storage of the invention dedicated to the storage of the backup and/or snapshot data. The central processor 220 interacts with both storage media 210, 240 by means of data manager 230, which again comprises software and/or hardware for managing the transfer of data between the processor 220 and both storage media 210, 240 and for controlling the writing, reading, and deletion of data to and from both storage media 210, 240.


[0029] Regardless of whether data is being saved, read, or deleted to or from the finite data storage 112 of the storage medium 110, as shown in FIG. 1, or to the dedicated storage medium 240 as a whole, as shown in FIG. 2, for performance reasons and, in some cases, to prevent system failure, it is necessary to ensure that 100% storage capacity (i.e., “maximum capacity”) is never consumed. Preferably, no more than 90% of the capacity (“effective capacity”) of the available data storage space should be consumed during operations. To continue successfully saving data to a finite data storage in which the consumption level has reached or exceeded its effective capacity, a system and methodology should be in place to determine what data may be deleted or overwritten to make room for the new data.
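
By way of illustration only, the distinction between maximum capacity and effective capacity can be sketched in Python as follows. The class name `FiniteDataStorage`, its fields, and the byte-based accounting are assumptions made for this example; only the 90% figure is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class FiniteDataStorage:
    """Hypothetical model of a finite data storage (names are illustrative)."""
    capacity_bytes: int          # "maximum capacity" (100%)
    used_bytes: int = 0          # current consumption level
    effective_percent: int = 90  # "effective capacity" as a percentage

    @property
    def effective_capacity_bytes(self) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        return self.capacity_bytes * self.effective_percent // 100

    def reached_effective_capacity(self) -> bool:
        # True when consumption has reached or exceeded effective capacity.
        return self.used_bytes >= self.effective_capacity_bytes


storage = FiniteDataStorage(capacity_bytes=1000, used_bytes=899)
below_limit = storage.reached_effective_capacity()
storage.used_bytes = 900
at_limit = storage.reached_effective_capacity()
```

In this sketch the check uses "reached or exceeded" semantics, so a consumption level exactly equal to the effective capacity already counts as a trigger.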


[0030] Turning now to FIG. 3, a method 300 for use by a system in managing the preservation of data to the finite data storage is illustrated. First, the system captures (Step 310) or obtains (if merely provided to the system) new data of a snapshot or of a backup (hereinafter generically referred to as a “member”) for preserving the data for future reference. A time reference of the data is identified by identifying the member to which the new data pertains (Step 320), which comprises the point in time of the creation of the member (i.e., the time that the snapshot was taken or the backup made). If the member is new, i.e., no data for that member is currently preserved in the finite data storage, then a preservation weight of the member is identified (Step 330).


[0031] The value of the preservation weight is preferably within a predetermined range of possible preservation weights. Preferably, the lower the preservation weight, the less important it is that the member, and thus the data of the member, be preserved by the system for any extended period of time. Likewise, the higher the preservation weight, the greater the importance of preservation of the member's data. In one embodiment, the highest preservation weight indicates that the data should be kept in the finite data storage permanently (i.e., it should never be deleted, if possible). Obviously, a system that merely reverses the scale such that the lowest preservation weight indicates the greater importance of the data, and vice versa, is an identical priority system and within the scope of the present invention. The specific value of the preservation weight may be assigned by default, by a system administrator, or by a user with access to the underlying computer configuration and with rights to assign preservation weights to particular snapshots or backups. The preservation weight may also be automatically determined in accordance with a predetermined function, which may be based on the time of creation of the member, the type of information represented by the data of the member, or the type of user for or by which the member was created. Additionally, preferably a particular preservation weight assigned to a member may be changed, for example, by a user or administrator after data thereof is stored in the finite data storage. In this scenario, all members are sorted prior to performing any deletion each time the consumption level of the finite data storage reaches or exceeds a trigger level for initiating the deletion subroutine.
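
By way of illustration only, a predetermined function for assigning preservation weights might resemble the following Python sketch. The four-level range, the `Weight` names, the category strings, and the policy inside `default_weight` are all hypothetical; the description states only that the weight may depend on factors such as the type of information or the type of user.

```python
from enum import IntEnum


class Weight(IntEnum):
    """Hypothetical predetermined range of preservation weights."""
    LOW = 1
    NORMAL = 2
    HIGH = 3
    PERMANENT = 4  # highest weight: keep if at all possible


def default_weight(data_type: str, user_type: str) -> Weight:
    """Assumed policy: derive a weight from the data type and user type."""
    if data_type == "month-end-report":
        return Weight.PERMANENT
    if user_type == "administrator":
        return Weight.HIGH
    if data_type == "scratch":
        return Weight.LOW
    return Weight.NORMAL


# Weights assigned at creation, keyed by a made-up member name.
weights = {"snapshot-1": default_weight("document", "user")}
# A user or administrator may later change an assigned weight.
weights["snapshot-1"] = Weight.HIGH
```

Because an assigned weight can change after the member's data is stored, any ordering derived from the weights would need to be recomputed before each deletion pass.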


[0032] Next, the capacity of the finite data storage is checked (Step 340). A determination then is made whether the effective capacity of the finite data storage has been reached or exceeded (Step 350). If the consumption level of the finite data storage has not reached or exceeded the effective capacity, then the new data is saved (Step 360) in or to the finite data storage. If, on the other hand, the consumption level of the finite data storage has reached or exceeded the effective capacity, then the system proceeds (Step 400) to “create space” on the finite data storage to accommodate the new data, as described with reference to FIG. 4.
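
By way of illustration only, the branch at Step 350 can be sketched in Python. The sketch assumes the reading that new data is saved directly while consumption is below the effective capacity and that space is created first otherwise. The names `preserve_new_data` and `create_space` are hypothetical, and the callback merely stands in for the function of FIG. 4.

```python
from typing import Callable


def preserve_new_data(new_size: int, used: int, effective_capacity: int,
                      create_space: Callable[[int], int]) -> int:
    """Return the consumption level after handling one new write.

    `create_space` stands in for the FIG. 4 function: it receives the
    current consumption and returns the consumption after deletions.
    """
    # Steps 340/350: check consumption against the effective capacity.
    if used >= effective_capacity:
        # Step 400: free space before saving.
        used = create_space(used)
    # Step 360: save the new data.
    return used + new_size


# Below effective capacity: saved directly, callback result is unused.
direct = preserve_new_data(10, used=50, effective_capacity=90,
                           create_space=lambda u: u)
# At or above effective capacity: space is created first (stubbed here).
after_cleanup = preserve_new_data(10, used=95, effective_capacity=90,
                                  create_space=lambda u: 70)
```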


[0033] Turning now to FIG. 4, the steps of the “create space” function 400 are illustrated. First, all members, data of which is stored in the finite data storage, are sorted (Step 410) by their preservation weights. Then, any members having the same preservation weight are sorted (Step 420) by their associated creation times. For example, all members having a preservation weight of “1” are sorted by time, all members with a preservation weight of “2” are sorted by time, and so on up to all members with the highest preservation weights. Then, among the members having the lowest preservation weight, the member having the oldest time (earliest date of creation) is deleted (Step 430). Furthermore, inherent with deletion of the member, the data for that member that then is being preserved in the finite data storage (and that is not being preserved for another member) also is deleted. Accordingly, space is freed within the finite data storage and the level of consumption of the finite data storage decreases.


[0034] It should be understood that the step of “deleting” can mean actual “removal” or overwriting of the data from the finite data storage or, more likely, merely designating the address space of the data being deleted as “unoccupied” or “not in use” by the computer operating system. Indeed, typically data on a storage medium is not actually deleted or lost until new data is written over to the same location.
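
By way of illustration only, deletion by designation (rather than by erasure) and the retention of data still preserved for another member can be sketched in Python. The `BlockStore` class, the block identifiers, and the reference-counting scheme are assumptions of this example and are not taken from the disclosure.

```python
from collections import Counter


class BlockStore:
    """Hypothetical block map: deletion only marks blocks as not in use."""

    def __init__(self) -> None:
        self.refs: Counter = Counter()  # block id -> members using it
        self.members: dict[str, set[int]] = {}
        self.free: set[int] = set()     # blocks designated "unoccupied"

    def preserve(self, member: str, blocks: set[int]) -> None:
        # Assumes `member` is new; re-preserving would double count.
        self.members[member] = set(blocks)
        self.free -= blocks
        self.refs.update(blocks)

    def delete(self, member: str) -> None:
        for block in self.members.pop(member):
            self.refs[block] -= 1
            if self.refs[block] == 0:
                # No other member needs this block: mark it reusable.
                del self.refs[block]
                self.free.add(block)


store = BlockStore()
store.preserve("snap-mon", {1, 2})
store.preserve("snap-tue", {2, 3})
store.delete("snap-mon")  # block 2 survives because snap-tue still uses it
```

Nothing is overwritten in this sketch; block 1 is merely recorded as free, which mirrors the observation that data typically is not lost until new data is written to the same location.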


[0035] Once a member has been deleted and its associated data deleted from the finite data storage, the system determines (Step 440) whether the consumption level of the finite data storage now is below a “threshold” capacity. Preferably, the threshold capacity is established at or below the effective capacity of the finite data storage. Thus, if the deletion of the previous member and its associated preserved data in the finite data storage freed up sufficient capacity therein, then the create space function ends and the method returns (Step 360) to save the new data to the finite data storage. On the other hand, if the deletion of the member and data thereof in the finite data storage did not free up sufficient space in the finite data storage for the consumption level to fall below the “threshold” capacity, then the system returns to Step 430 to delete the next member having the then lowest preservation weight that is the oldest.
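
By way of illustration only, the loop of Steps 410 through 440 can be sketched in Python. The `Member` fields and the function signature are hypothetical, and the sketch makes the simplifying assumption that each member's data is independent, so that deleting a member frees exactly `size` units.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    weight: int   # preservation weight (lower is deleted sooner)
    created: int  # creation time, e.g. a Unix timestamp
    size: int     # storage consumed by this member's data


def create_space(members: list[Member], used: int,
                 threshold: int) -> tuple[list[Member], int]:
    """Delete lowest-weight, oldest members until `used` < `threshold`."""
    # Steps 410/420: sort by weight, then by creation time within a weight.
    queue = sorted(members, key=lambda m: (m.weight, m.created))
    # Steps 430/440: delete one member at a time, rechecking consumption.
    # The function is expected to be called only when used >= threshold.
    while queue and used >= threshold:
        victim = queue.pop(0)
        used -= victim.size
    return queue, used


members = [
    Member("a", weight=2, created=100, size=30),
    Member("b", weight=1, created=200, size=20),
    Member("c", weight=1, created=100, size=25),
    Member("d", weight=3, created=50, size=40),
]
remaining, used_after = create_space(members, used=115, threshold=80)
```

In this example members "c" and then "b" are deleted, both of weight 1 and oldest first, while the older but higher-weighted member "d" is kept.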


[0036] FIG. 5 illustrates an alternative flowchart to that of FIG. 4. In particular, for those embodiments in which members can be designated as “permanent” and thereby excepted from the aforedescribed deletion step, it is possible for a situation to arise in which consumption of the finite data storage has exceeded the effective capacity but the only members having data preserved in the finite data storage are those that have been designated as permanent. As discussed previously, in one embodiment, the highest available or allowed preservation weight may be used to designate those particular members that are to be considered permanent. In this situation, the “create space” function 400 proceeds with its sorting steps (Steps 410 and 420), as described previously, and then the system determines (Step 424) whether all of the members having data remaining in the finite data storage are designated as “permanent.” If not, then the system proceeds with the step of deleting (Step 430) the member with the lowest preservation weight that is the oldest, as discussed with reference to FIG. 4. However, in situations where the determination in Step 424 is positive, the system preferably returns (Step 428) an error message or other error indication. Such an error message or indication could be a notification to the administrator of the error, a notification that no members are being created and/or preserved, writing to an error log, or causing the system to crash. As indicated by the A in the circle jumper, if any of these errors occurs, the process returns to FIG. 3 and simply ends without recording the new data to the finite data storage.
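
By way of illustration only, the additional determination of Step 424 can be sketched in Python. The constant `PERMANENT`, the exception name, and the tuple layout are hypothetical choices for this example; raising an exception is merely one possible form of the error indication of Step 428.

```python
PERMANENT = 9  # hypothetical highest weight of the predetermined range


class NoDeletableMembers(Exception):
    """Raised when every remaining member is permanent (Step 428)."""


def create_space_with_permanent(members, used, threshold):
    """`members` is a list of (weight, created, size) tuples."""
    # Steps 410/420: tuples sort by weight, then by creation time.
    queue = sorted(members)
    while used >= threshold:
        # Step 424: are all remaining members designated permanent?
        if all(weight == PERMANENT for weight, _, _ in queue):
            raise NoDeletableMembers("only permanent members remain")
        # Step 430: delete the lowest-weight, oldest member.
        _, _, size = queue.pop(0)
        used -= size
    return queue, used


kept, used_after = create_space_with_permanent(
    [(PERMANENT, 10, 50), (1, 20, 10), (PERMANENT, 5, 30)],
    used=95, threshold=90)
```

A caller corresponding to FIG. 3 would catch the exception and end without recording the new data.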


[0037] In yet another alternative embodiment, not shown, if the determination in Step 424 is positive, the system may be preprogrammed by an administrator or user to override the “permanent” designation and treat the “permanent” members as any other members, subjecting them to the same deletion step. Ways of overriding the “permanent” designation include (i) re-designating all data sets with the highest allowed preservation weight from being “permanent” (i.e., not subject to deletion) to being “non-permanent” (i.e., having the highest priority but eligible for deletion) or (ii) re-designating the oldest member as “non-permanent,” deleting it, and then determining whether sufficient space in the finite data storage was freed. Furthermore, no data of new members is permitted to be written to the finite data storage during this time period or, if new data of new members is written, then after each deletion of a member all of the members are re-sorted in accordance with Steps 410 and 420.
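
By way of illustration only, override option (ii) can be sketched in Python for the case in which only permanent members remain. The dictionary keys and the function name are hypothetical, and the sketch assumes that no new members are written while the override runs.

```python
def override_and_free(members, used, threshold):
    """Delete the oldest permanent member, one at a time, until enough
    space is freed.

    `members` is a list of dicts with 'created', 'size', and 'permanent'.
    """
    remaining = sorted(members, key=lambda m: m["created"])
    while remaining and used >= threshold:
        oldest = remaining.pop(0)
        # Re-designate as non-permanent (highest priority, but deletable).
        oldest["permanent"] = False
        used -= oldest["size"]
    return remaining, used


permanent_only = [
    {"created": 3, "size": 20, "permanent": True},
    {"created": 1, "size": 15, "permanent": True},
    {"created": 2, "size": 10, "permanent": True},
]
remaining, used_after = override_and_free(permanent_only, used=95,
                                          threshold=75)
```

Option (i) would instead clear the permanent designation on every member at once and then reuse the ordinary deletion loop.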


[0038] In a second aspect of the present invention, it is desirable to manage members that have been organized in or associated with predetermined “collections.” For example, it may be desirable to maintain only the last five days of all work saved to a server, the last twelve months of month-end reports for the Accounting Department, the last ten “end of the week” accounts receivable reports, and so on. In each of these examples, the source of the backup or object of the snapshot is the same. The difference is that the backup is made or snapshot taken at different, periodic times (e.g., every five days, every month's end, every week's end). The members for the same source or object that differ only in time are referred to herein as a “collection.” The maximum number of members for a collection can be set by a system administrator or by a user, or by default, when a program is set up to take snapshots or to make backups on an ongoing basis. Thus, although the capacity of the finite data storage may be, but is not necessarily, the primary issue of concern, it is generally desirable to limit the number of members of a collection in order to efficiently manage the finite data storage.


[0039] With reference to FIG. 6, a method 600 of managing collections is illustrated. First, the system captures (Step 610) or obtains (if merely provided to the system) new data of a member for preserving the data in the finite data storage for future reference. A time reference of the data is identified by identifying the member to which the new data pertains (Step 620), which comprises the point in time of the creation of the member (i.e., the time that the snapshot was taken or the backup made). Whether the member is part of a collection also is identified (Step 630), and the maximum number of members of the collection is determined (Step 640), if applicable. The new data is saved to the finite data storage and the number of members of the collection is determined (Step 660). A determination is then made (Step 670) whether the number of members in the collection exceeds the maximum allowed. If the maximum allowed is exceeded, then the oldest member in the collection is deleted (Step 680) and then the method ends. If the maximum allowed is not exceeded, then the method ends without deleting the oldest member of the collection.
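
By way of illustration only, the add-then-delete ordering of method 600 can be sketched in Python. The function name and the use of a `deque` are assumptions of this example, which also assumes that members are appended in chronological order so that the oldest member is always leftmost.

```python
from collections import deque


def add_to_collection(collection: deque, new_member: str,
                      max_members: int) -> list[str]:
    """Add `new_member`, then delete oldest members beyond `max_members`.

    Returns the names of the members that were deleted.
    """
    collection.append(new_member)  # save the new member first
    deleted = []
    # Step 670: compare the member count with the maximum allowed.
    # A loop (rather than a single check) also covers a lowered maximum.
    while len(collection) > max_members:
        # Step 680: the oldest member is the leftmost entry.
        deleted.append(collection.popleft())
    return deleted


weekly = deque(["wk1", "wk2", "wk3"])
removed = add_to_collection(weekly, "wk4", max_members=3)
```

The alternative ordering, in which a member is deleted before the new member is added, would simply perform the count check against `max_members - 1` before appending.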


[0040] Of course, it is contemplated that a user or administrator also may change a member's association with a collection so as to preclude its deletion by the collection management method disclosed herein. Typically this may be done by simply renaming the member, changing an attribute maintained in association with the member, and/or saving the member apart from the other members of the collection. Thus, a member may exist outside of a collection even if it has the same source or object as other members of a collection. A user or administrator may also change a member's association with a data group so as to preclude its consideration with other members for deletion pursuant to the method described above and shown in FIGS. 3-5 for managing the finite data storage, as will be apparent to one having ordinary skill in the art. Moreover, a user or administrator may delete or remove a member without waiting for its deletion under the preferred embodiments described herein.


[0041] In view of the foregoing detailed description of preferred embodiments of the present invention, it readily will be understood by those persons skilled in the art that the present invention is susceptible of broad utility and application. While various aspects have been described in the context of HTML and web page uses, the aspects may be useful in other contexts as well. Many embodiments and adaptations of the present invention other than those herein described, as well as many variations, modifications, and equivalent arrangements, will be apparent from or reasonably suggested by the present invention and the foregoing description thereof, without departing from the substance or scope of the present invention. Furthermore, any sequence(s) and/or temporal order of steps of various processes described and claimed herein are those considered to be the best mode contemplated for carrying out the present invention. It should also be understood that, although steps of various processes may be shown and described as being in a preferred sequence or temporal order, the steps of any such processes are not limited to being carried out in any particular sequence or order, absent a specific indication of such to achieve a particular intended result. In most cases, the steps of such processes may be carried out in various different sequences and orders, while still falling within the scope of the present invention. In addition, some steps may be carried out simultaneously. Accordingly, while the present invention has been described herein in detail in relation to preferred embodiments, it is to be understood that this disclosure is only illustrative and exemplary of the present invention and is made merely for purposes of providing a full and enabling disclosure of the invention. The foregoing disclosure is not intended nor is to be construed to limit the present invention or otherwise to exclude any such other embodiments, adaptations, variations, modifications and equivalent arrangements, the present invention being limited only by the claims appended hereto and the equivalents thereof.


Claims
  • 1. A method of managing a collection of members, each member having data preserved in a finite data storage and each member belonging to a different data group of a temporal data store with which the finite data storage is associated, the method comprising the step of deleting the oldest member of the collection upon the addition of a new member to the collection when the number of members of the collection exceeds a predetermined maximum number.
  • 2. The invention of claim 1, wherein each member of the collection comprises a snapshot of an object.
  • 3. The invention of claim 2, wherein each member of the collection comprises a snapshot of the same object.
  • 4. The invention of claim 3, wherein the object comprises one of the group of a logical container, a computer-readable medium, a portion of a logical container, and a portion of a computer-readable medium.
  • 5. The invention of claim 1, wherein each member of the collection comprises a backup of a source.
  • 6. The invention of claim 5, wherein each backup of the collection comprises a backup copy of the same source.
  • 7. The invention of claim 6, wherein the object comprises one of the group of a logical container, a computer-readable medium, a portion of a logical container, and a portion of a computer-readable medium.
  • 8. The method of claim 1, wherein the step of deleting comprises removing the data in the finite data storage of the member deleted.
  • 9. The method of claim 1, wherein the step of deleting comprises permitting the data in the finite data storage of the member deleted to be overwritten without preservation.
  • 10. The method of claim 1, wherein the new member of the collection is added to the collection after a member is deleted if the predetermined maximum number otherwise would be exceeded.
  • 11. The method of claim 1, wherein the new member of the collection is added to the collection, the predetermined maximum number of members is then exceeded, and a member is then deleted.
  • 12. A computer-readable medium having computer-readable instructions for performing the method of claim 1.
  • 13. A computer configuration comprising computer-readable medium having computer-readable instructions for performing the method of claim 1.
  • 14. The computer configuration of claim 13, wherein the method is automatically performed by a computer without user intervention.
  • 15. The computer configuration of claim 13, wherein the maximum number of members in the collection is predetermined by a user of the computer system.
  • 16. The computer configuration of claim 13, wherein the maximum number of members in the collection is predetermined by an administrator of the computer configuration.
  • 17. A method of managing a collection of snapshots of the same object, each snapshot being taken at a different point in time and having data preserved in a finite data storage, the method comprising the step of deleting the oldest snapshot of the collection upon the addition of a new snapshot to the collection when the number of snapshots in the collection exceeds a predetermined maximum number.
  • 18. A computer-readable medium having computer-readable instructions for performing the method of claim 17.
  • 19. A method of managing a collection of backups of the same source, each backup being made at a different point in time and having data preserved in a finite data storage, the method comprising the step of deleting the oldest backup of the collection upon the addition of a new backup to the collection when the number of backups in the collection exceeds a predetermined maximum number.
  • 20. A computer-readable medium having computer-readable instructions for performing the method of claim 19.
CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional patent application No. 60/350,434, titled, “Persistent Snapshot Management System,” filed Jan. 22, 2002, which is incorporated herein by reference.

Provisional Applications (1)
Number    Date      Country
60350434  Jan 2002  US