1. Technical Field
The present invention relates generally to data storage and data processing. More specifically, the present invention relates to efficient data management within a hierarchical data storage system.
2. Description of the Related Art
In a hierarchical data storage system, fast-access storage devices are combined with arrays of relatively slower, less frequently accessed storage devices. As an example, frequently accessed data is generally stored on relatively expensive fast-access storage devices such as direct-access storage devices (DASD), while less frequently accessed data is generally stored on relatively less expensive, slower storage devices such as sequential-access storage media (e.g., tape media). The combination of storage devices in this way helps balance the costs of storing data with the speed at which the data must be accessed.
An example of a hierarchical storage system is a virtual tape storage system (VTS). Generally, a VTS is coupled to one or more host computers for the purpose of managing host data. A key function of the VTS is to provide long term storage of host data while, at the same time, providing relatively fast access to portions of that data. To accomplish this, a VTS typically includes a combination of slow access storage media such as tape cartridges for long term data storage, and storage media such as DASD, where portions of the data are “cached” for relatively fast access. Data which is to be stored long term is stored on tape cartridges, while data which may be frequently accessed is “cached” on the DASD.
In operation of a VTS, a host provides data to the VTS in the form of “volumes” (e.g., a volume may be a particular backup image of host data, archived data, data files, and the like). The VTS receives the volumes from the host and stores each volume on DASD for intermediate storage. A volume of data stored on DASD is referred to as a “virtual volume”. The VTS subsequently transfers the virtual volumes to tape cartridges. A volume of data stored on a tape cartridge is referred to as a “logical volume”. A number of logical volumes may be stored on a single tape cartridge. A cartridge that contains a number of logical volumes is referred to as a “stacked cartridge” since, conceptually, the multiple volumes are efficiently stacked end-to-end on the cartridge.
A typical VTS may contain thousands of stacked cartridges, many of which are of different formats so as to provide versatility within the VTS. As a method of managing the cartridges within a VTS, pooling may be used. As used herein, pools are logical groups of physical cartridges having common attributes. For example, one pool may logically group stacked cartridges of one specific tape format (e.g., 3590 media), another pool may be defined to logically group stacked cartridges of a different format (e.g., LTO media), and yet another pool may be defined to logically group unused or blank cartridges. By grouping the cartridges in this way, efficiencies can be gained by applications which depend on the properties of the cartridge. For example, examining the number of cartridges in a “blank pool” would indicate whether there are enough blank cartridges to accommodate the expected data storage needs of the VTS. Pools are typically embodied as data structures stored in memory of a VTS and include a list of the cartridges logically stored in each pool.
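As a rough illustration of the pool data structure described above, a pool can be modeled as a named list of cartridge identifiers sharing a common attribute. The following Python sketch is hypothetical: the names `Pool`, `media_format`, and `has_enough_blanks`, and the sample cartridge identifiers, are assumptions made for illustration and are not taken from any VTS implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pool:
    """A logical group of physical cartridges with a common attribute (hypothetical model)."""
    name: str
    media_format: str                                     # e.g. "3590", "LTO", or "blank"
    cartridges: List[str] = field(default_factory=list)  # cartridge identifiers


def has_enough_blanks(blank_pool: Pool, expected_need: int) -> bool:
    """Return True if the blank pool holds at least `expected_need` cartridges."""
    return len(blank_pool.cartridges) >= expected_need


# Hypothetical blank pool holding three unused cartridges.
blank_pool = Pool("blank", "blank", ["B00001", "B00002", "B00003"])
print(has_enough_blanks(blank_pool, 2))
print(has_enough_blanks(blank_pool, 5))
```

Counting the members of the blank pool is the kind of inexpensive check that grouping by attribute makes possible.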
In addition to pooling, a process called “reclamation” is used to manage storage space on tape cartridges in a VTS. Generally, reclamation involves copying active data from a source cartridge to a destination cartridge and occurs when the active storage space on the source cartridge has reached some minimal threshold. Active data refers to data on a cartridge which the host has not expired. Inactive data on a cartridge refers to data which the host has expired. Data may be expired by a host when it is no longer needed or when the data has been superseded by an updated version of the data. A volume containing expired data is referred to as an inactive data volume.
Over time, the amount of active data on a given cartridge may comprise only 10% of the total space on the cartridge, with the remaining 90% of the space comprising inactive data. The space consumed by the inactive data, however, is unusable and cannot be overwritten (because of the characteristics of tape media, once a tape is full of data, no additional data may be written to the tape). The inactive data space on a cartridge is typically spread throughout the cartridge, resulting in data space “holes” surrounded by active data. In order to reclaim the space consumed by the inactive data, the 10% of active data spread throughout the source cartridge is copied end-to-end to a destination cartridge, effectively squeezing out these “holes”. With the active data copied to another cartridge, the source cartridge is available for storing new data, and the source cartridge is said to have been “reclaimed”. As used herein, a “scratch cartridge” refers to a cartridge which has been reclaimed.
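The squeezing out of “holes” can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a cartridge is modeled as an ordered list of volumes, and the tuple layout, the function name `reclaim`, and the sample volume identifiers are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical volume layout: (volume id, size in GB, is_active)
Volume = Tuple[str, int, bool]


def reclaim(source: List[Volume], destination: List[Volume]) -> None:
    """Copy active volumes end-to-end onto the destination, then empty the source."""
    for volume in source:
        if volume[2]:                  # only active (unexpired) volumes are copied
            destination.append(volume)
    source.clear()                     # the source is now a scratch cartridge


# Active volumes V2 and V4 are separated by expired volumes ("holes").
source: List[Volume] = [("V1", 3, False), ("V2", 2, True), ("V3", 4, False), ("V4", 1, True)]
destination: List[Volume] = []
reclaim(source, destination)
print([v[0] for v in destination])     # active volumes, now contiguous
print(source)                          # emptied source cartridge
```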
While known techniques of reclamation are available to manage storage efficiency, limitations exist. One limitation with respect to reclamation is that the implementation of reclamation is dependent upon the percentage of active data on a source cartridge falling below a predefined threshold. Thus, the only way to trigger the copying of data on a group of source cartridges to a group of destination cartridges is to examine the percentage of active data on a given source cartridge, and if it falls below a predefined threshold, mount the cartridge and migrate the data. This presents an efficiency problem in that not all data is expired by a host at the same rate or using the same criteria. This may result in a particular cartridge never falling below the specified threshold while still having a relatively high percentage of inactive data. Since a VTS can contain thousands of tape cartridges, the percent of wasted space in a VTS can be significant.
Because of the amount of storage accessible within a VTS, as well as the different formats of storage, the efficient management of data and storage resources of a VTS is very challenging, even with the aid of pooling and reclamation. In addition to the limitations above, common difficulties associated with managing data in a VTS include efficient management of storage space on individual cartridges as well as accommodating different cartridge formats within the VTS.
For example, a VTS may include a number of tape drives, each of which may require the use of a unique cartridge format. A difficulty arises if a user of the VTS wishes to consolidate all tape drives of the VTS to a single tape drive format or to different formats. By consolidating to a single format, and/or switching to different formats, the user runs the risk of having a number of obsolete tape cartridges (e.g., not compatible with the new drive format). As a result, the data on the cartridges will be inaccessible, unless the data can be migrated to media compatible with the drives in the system. Unfortunately, there is no known way to efficiently migrate such data. A similar problem results for a user that desires to upgrade to a new drive format, which may require the use of new cartridges and migration of active data contained on incompatible cartridges.
These challenges and others are made more difficult for VTS systems which include thousands of tape cartridges. Unfortunately, known methods of migration require a user to identify, cartridge by cartridge, the source data to be migrated. This can be a time consuming and often error-prone process. The down-time and errors may translate into real economic loss for a business relying on the accessibility and accuracy of the data. Additionally, known migration methods are limited in their ability to efficiently transfer data to one or more destination cartridges. The process typically involves manually identifying individual source cartridges one at a time, reading the data from the source cartridge and then writing the data to a destination cartridge. From all of the preceding, it can be seen that there is a need for an efficient way to manage the data in a virtual tape server, including the management of data on cartridges, and the management of the cartridges themselves.
It has been discovered that by grouping tape cartridges into logical groups called pools, defining reclamation policies, and associating one or more of the reclamation policies with a particular pool, a process can be used to efficiently migrate data from one or more source cartridges to one or more destination cartridges, greatly improving the data management of a hierarchical storage system, such as a Virtual Tape Server (“VTS”). As used herein, migrating data can constitute copying data from a source to a destination if one or more conditions are satisfied. The present invention thus provides more storage space within the VTS, decreased cost associated with the management of the VTS and storage of data within the VTS, and improved efficiency in transferring data from one set of tape cartridges to another set of tape cartridges.
In one embodiment of the present invention, a method of migrating data from a first tape cartridge to a second tape cartridge is described. The method involves operations of obtaining a migration policy having a migration condition, determining whether at least one volume on the first tape cartridge satisfies the migration condition, and if so, copying the volume to a second tape cartridge. These operations are performed transparently to other applications. In another embodiment, the present invention may be implemented in a data storage system including a processor, a host interface coupled to the processor, and a memory unit coupled to the processor. The memory unit includes a storage management engine and a policy based migration engine. The policy based migration engine is configured to select a migration policy having a migration condition, and if data on a first removable storage media satisfies the migration condition, the data is migrated from the first removable storage media to a second removable storage media. In yet another embodiment, the invention may be implemented by a program of machine-readable instructions stored on a computer readable medium. The instructions are executable by a processor of a hierarchical data storage system to perform a method of migrating data from a first tape cartridge of the hierarchical data storage system to a second tape cartridge of the hierarchical data storage system as described herein.
For a better understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings.
Introduction
The management of tape cartridges in a Virtual Tape Server (“VTS”), and the data on such cartridges, is a challenging task. A VTS can contain thousands of tape cartridges, and the data on these tape cartridges must be efficiently spread across available resources. Within a VTS, it is often necessary to migrate data on the tape cartridges to other storage devices of the VTS to take advantage of the efficiencies provided by such other storage devices. Accordingly, the present invention groups tape cartridges into logical groups called pools and provides methods to efficiently transfer the data from one pool to another pool according to specific policies. This process is referred to herein as policy based migration. Depending on whether a given policy is satisfied, a reclamation process, for example, can be used to copy data from a source cartridge to a destination cartridge and reclaim the source cartridge. Using the reclamation process in this way provides a number of advantages, including the ability to operate on a group of cartridges via pools and the ability to execute the procedure with minimal impact to the VTS and/or any attached hosts (e.g., as a background process, at a time when no other resources need the system, transparent to the user and host applications, and the like). In so doing, more usable storage space within the VTS as well as decreased cost associated with the management of the VTS can be obtained. As used herein, “migration” describes the copying of data from a source cartridge to a destination cartridge for any number of reasons, for example, upgrading from an old tape format to a new tape format, transferring data to a format more tuned to the storage needs of the data, transferring data to lower cost media, and the like.
The following sets forth a detailed description of the best contemplated mode for carrying out the invention. The headings provided herein are intended to aid in the description of the present invention and are not intended to limit the scope of the present invention. The description herein is intended to be illustrative of the invention and should not be taken to be limiting.
An Exemplary Hierarchical Storage System
In operation, host 102 stores data to and requests data from VTS 100. In an exemplary implementation, host 102 may be embodied as a server, network attached storage device, personal computer, terminal, application program and the like. Control unit 104 exchanges data between host 102 and cache 106, and between host 102 and library 108. The exchanges are conducted in accordance with commands from host 102, such as tape commands. Control unit 104 exchanges data between cache 106 and tape drives 122 in accordance with commands from the control unit 104. Control unit 104 may be implemented by the execution of software on a microprocessor (e.g., a RISC based processor, INTEL-based processor, or other instruction based processor). Control unit 104 and cache 106 may be embodied, for example, in an IBM model 3494 model B20 Virtual Tape Server.
Control unit 104 directs operations of library manager 112. In one embodiment, control unit 104 receives commands from host 102 and, in turn, issues commands to library manager 112 to carry out the host commands. In response to such commands, data may be transferred between host 102 and cache 106, between host 102 and tape cartridges 120, and/or between cache 106 and tape cartridges 120. In the presently described embodiment, control unit 104 is implemented as computer 200 (shown and described in
Cache 106 may comprise DASD 110 configured in one or more storage forms, such as redundant arrays of inexpensive disks (i.e., RAID). Cache 106 provides a fast-access data storage location for data utilized by host 102. In operation, host-created volumes of data are received from host 102 and “stacked” (i.e., stored) in cache 106. These volumes are then copied to physical tape cartridges 120 of tape library 108, either immediately (e.g., within fractions of a second), or upon some predetermined criteria, such as access frequency. In one embodiment, host 102 views (e.g., uses tape related protocols to communicate with) the storage space provided by cache 106 as a number of tape devices, when in actuality, the storage space is comprised of DASD. Because host 102 sees cache 106 as tape drives, host 102 can operate on data stored in cache 106 (and library 108) via tape commands. The interaction between host 102 and tape drives 122 of VTS 100 occurs through control unit 104.
Control console 130 is coupled to control unit 104 via serial, TokenRing, Ethernet, USB or other known communication interface. In one embodiment, control console 130 provides a user interface for setting up policies and monitoring the activities of the control unit 104 and the exemplary hierarchical storage system 100.
Automated tape library 108 comprises hardware, software, and interconnections to manage the storage of data on removable media. In the presently described embodiment, removable media consists of tape cartridges 120. However, in other embodiments removable media may consist of optical media and/or other media adapted to be removable within library 108. Tape cartridges 120 are stored in storage area 114, having storage bins 116. An accessor 118, having a robotic arm 124, selectively transfers tapes 120 to/from bins 116 from/to tape drives 122 for reading and writing of data on tapes 120 by tape drives 122 (accessor 118 with robotic arm 124 may also be referred to as a gripper). One of ordinary skill in the art will recognize that accessor 118 and robotic arm 124 may be implemented in any number of ways to provide a mechanical (or robotic) device to transport cartridges. In one exemplary implementation, library 108 may be embodied as an IBM 3494 tape library including IBM 3590, 3592 and/or LTO tape drives to access data on associated tapes. As mentioned above, library 108 includes library manager 112 to manage operations of library 108. In the presently described embodiment, library manager 112 is embodied as executable code stored on memory (not shown) of library 108 and configured to execute on one or more processors (not shown) of library 108.
Turning now to a more detailed description of control unit 104,
Memory unit 204 may include a local cache or random access memory (not shown) and/or a nonvolatile memory (not shown). Memory unit 204 may be used to store programming instructions executed by processor 202. For example, memory unit 204 includes storage management engine 208 and policy based migration engine 210. In the presently described embodiment, each of storage management engine 208 and policy based migration engine 210 is implemented in software. Storage management engine 208 manages cache 106 and the volumes stored therein. In addition, storage management engine 208 controls the movement of data between cache 106 and tape cartridges 120. In one embodiment of the present invention, storage management engine 208 can be implemented by IBM's Tivoli Storage Manager.
Policy based migration engine 210, which embodies techniques of the present invention in software form, provides techniques to efficiently manage data storage cartridges 120. As described above, a hierarchical data storage system such as system 100 may comprise thousands of tape cartridges of various formats storing various types of data. In such an environment, it becomes critical to be able to efficiently manage the storage provided by the tape cartridges as well as provide an efficient migration process to migrate data from the existing tape cartridges to newer and/or different formats of tape cartridges, for example. To address these needs, the present invention provides techniques to efficiently manage data on tape cartridges 120. These techniques, described in detail below with reference to
In the presently described embodiment, policy based migration engine 210 may be embodied in machine-readable instructions executed by processor 202. The machine-readable instructions may reside on a programmed product comprising signal-bearing media tangibly embodying a program of machine-readable instructions executable by processor 202 to perform a method of computation, store or access data, and the like. The signal bearing media may comprise, for example, RAM of memory unit 204. Alternatively, the instructions may be stored in another signal-bearing medium, such as ROM 212, diskette, magnetic storage device, optical storage device, or other signal-bearing media including transmission signals such as physical and/or wireless communication links. In the presently described embodiment, the machine readable instructions comprise C language code. It will be recognized that while storage management engine 208 and policy based migration engine 210 are described as implemented in software, each may also be implemented in hardware, a combination of software and hardware, or other compatible media capable of executing the techniques described herein.
One of ordinary skill in the art will recognize that computer 200 may be implemented in a computer having fewer or more components than computer 200. For example, all or part of memory unit 204 may be included on processor 202.
Exemplary Policy Based Migration
Initially, in configuring a system for policy based migration, a source pool is selected on which the migration policy is to act (operation 302). The source pool is a logical group of cartridges that are to be reclaimed according to a defined migration policy. Next, a migration policy is selected (operation 304). The migration policy sets the criteria that trigger a reclamation process to initiate the copying of data from a source cartridge to a destination cartridge. In one embodiment of the present invention, the reclamation policies include one or more of a “percent of active data” policy, a “time since last access” policy, a “time since last data written” policy, and a “rate of expiration of data” policy.
The “percent of active data” policy is used to reclaim a cartridge when the amount of data on the active data volumes on a cartridge falls below a pre-defined percentage of the overall data on the cartridge when the cartridge was full. The “time since last access” policy is used to reclaim a cartridge when a pre-defined period of time has elapsed since data on the cartridge was accessed (data on a cartridge is accessed when a host requests the data associated with a volume, the cartridge containing the volume is loaded on a tape drive 122 and one or more data records are read from the cartridge). The “time since last data written” policy is used to reclaim a cartridge when a pre-defined period of time has elapsed since data was last written on the cartridge. The “rate of expiration of data” policy is used to reclaim cartridges when a pre-defined period of time has elapsed since a portion of the data on a cartridge became expired.
Following selection of one or more of the policies, parameters associated with the selected migration policy are defined (operation 306). For the “percent of active data” policy, a percentage is defined. For the “time since last access”, “time since last data written” and “rate of expiration of data” policies, a period of time is defined. That period of time can be in seconds, hours, days or another suitable measure of time. For the “rate of expiration of data” policy, a minimum percentage of active data on the volume can be defined as well.
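The four policies and their parameters can be captured in a small record type. The Python sketch below is illustrative only: the names `PolicyKind`, `MigrationPolicy`, `percent`, `period_seconds`, and `validate` are assumptions and do not come from the described embodiment, which states only that a percentage and/or a period of time is defined for each policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PolicyKind(Enum):
    PERCENT_ACTIVE = "percent of active data"
    TIME_SINCE_LAST_ACCESS = "time since last access"
    TIME_SINCE_LAST_WRITE = "time since last data written"
    RATE_OF_EXPIRATION = "rate of expiration of data"


@dataclass
class MigrationPolicy:
    kind: PolicyKind
    percent: Optional[float] = None          # threshold percentage, where applicable
    period_seconds: Optional[float] = None   # elapsed-time threshold, where applicable

    def validate(self) -> None:
        """Check that the parameter required by the policy kind is present."""
        if self.kind is PolicyKind.PERCENT_ACTIVE:
            if self.percent is None:
                raise ValueError("percent of active data policy needs a percentage")
        elif self.period_seconds is None:
            raise ValueError(f"{self.kind.value} policy needs a period of time")


# A rate-of-expiration policy with a 30-day period and an optional minimum percentage.
policy = MigrationPolicy(PolicyKind.RATE_OF_EXPIRATION, percent=20.0,
                         period_seconds=30 * 24 * 3600)
policy.validate()
```

Storing the period in seconds is one convenient choice; any suitable measure of time would serve equally well.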
Next, a target pool is defined (operation 308). The target pool consists of those cartridges which are to receive the active data volumes from the cartridges of the source pool when the migration policy is executed and necessary conditions are satisfied. If there are other source pools for which a migration policy is to be defined (decision block 310), the operations 302-308 are repeated. Otherwise, the definition of the reclamation policies is complete and reclamation evaluations may be performed by the policy based migration engine 210.
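The configuration loop of operations 302-308 amounts to building a table that maps each source pool to its policy, parameter, and target pool. A minimal Python sketch follows, assuming hypothetical names (`PoolPolicy`, `configure`) and hypothetical pool labels such as `pool_A` and `pool_B`.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class PoolPolicy(NamedTuple):
    policy_name: str    # e.g. "percent of active data"
    parameter: float    # percentage or period of time, depending on the policy
    target_pool: str    # pool that receives the active data volumes


def configure(definitions: Iterable[Tuple[str, str, float, str]]) -> Dict[str, PoolPolicy]:
    """Build a source-pool -> policy table, one entry per pass through operations 302-308."""
    table: Dict[str, PoolPolicy] = {}
    for source_pool, policy_name, parameter, target_pool in definitions:
        # 302: select source pool; 304: select policy;
        # 306: define parameters; 308: define target pool
        table[source_pool] = PoolPolicy(policy_name, parameter, target_pool)
    return table


table = configure([
    ("pool_502", "percent of active data", 10.0, "pool_504"),
    ("pool_A", "time since last access", 90 * 24 * 3600, "pool_B"),
])
print(table["pool_502"].target_pool)
```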
The evaluation of the reclamation policies may begin by many methods. It may be continuous once the policies have been established or be started based on other criteria. For example, the exemplary policy based migration engine 210 may perform evaluations for reclaimable cartridges periodically, such as on an hourly basis, when processing cycles are available for reclaim, when the number of available scratch cartridges falls below a threshold, or by other methods known to those skilled in the art. Using such a process, at periodic intervals, policy based migration engine 210 would evaluate each cartridge in a given pool to determine whether the migration conditions are satisfied. If the migration conditions were satisfied, policy based migration engine 210 would initiate the migration of data from that cartridge to a cartridge in the associated destination pool. The source cartridge would then be available as a scratch cartridge, and the process would continue for the remaining cartridges within the pool. This process is described in more detail below.
Reclamation involves evaluating cartridges in an automated tape library 108 to determine if one or more cartridges in the library are eligible for reclaim. If a cartridge within the library is eligible for reclaim, the active data volumes of that cartridge are eligible to be copied to a destination cartridge within a target pool. Accordingly, in operation 402, a first tape cartridge within the library is selected and the migration policy defined for the pool containing the cartridge is obtained (operation 404). Next, the policy based migration engine 210 determines whether or not the cartridge is eligible for reclaim according to the obtained migration policy (decision block 406).
If the cartridge is eligible for reclaim, the process continues to operation 408, where the cartridge is reclaimed (“Yes” branch of decision block 406 and operation 408). In being reclaimed, all active data volumes are migrated from the source cartridge to a destination cartridge with available space in the target pool. The active data volumes are placed end to end, efficiently using the storage space on the cartridge in the target pool. Until the cartridge in the target pool becomes full, data from other reclaimed cartridges can be placed on it as well. When the cartridge has been reclaimed, the process continues to the other cartridges in the library not yet evaluated for reclaim, if any (“Yes” branch of decision block 410 and operation 412). If, however, the cartridge is not eligible for reclaim, the process continues to check the other cartridges in the library, if any (“No” branch of decision block 406, “Yes” branch of decision block 410 and operation 412). Once all of the cartridges in the library have been checked for eligibility of reclamation (and reclaimed accordingly) (operations 404-412), the process ends. A more detailed description of each migration policy is now provided.
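Two of the triggers named above, a periodic interval and a low count of scratch cartridges, can be combined in a simple check. The Python sketch below is an illustration under assumptions: the function name `should_evaluate` and its parameters are hypothetical, and a real engine might also consider idle processing cycles or other criteria.

```python
def should_evaluate(seconds_since_last_run: float,
                    interval_seconds: float,
                    scratch_count: int,
                    scratch_threshold: int) -> bool:
    """Return True when the interval has elapsed or scratch cartridges are running low."""
    interval_elapsed = seconds_since_last_run >= interval_seconds
    scratch_low = scratch_count < scratch_threshold
    return interval_elapsed or scratch_low


# Hourly interval not yet elapsed, but only 5 scratch cartridges remain (threshold 10).
print(should_evaluate(60, 3600, 5, 10))
```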
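The evaluation loop of operations 402-412 can be sketched as follows. This is a hedged Python illustration, not the C code of the described embodiment: the function `evaluate_library`, the dictionary keys `volser`, `pool`, and `percent_active`, and the sample values are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def evaluate_library(cartridges: List[dict],
                     policies: Dict[str, Callable[[dict], bool]],
                     reclaim: Callable[[dict], None]) -> List[str]:
    """Visit every cartridge once and reclaim those eligible under their pool's policy."""
    reclaimed = []
    for cartridge in cartridges:                     # operations 402 and 412
        is_eligible = policies[cartridge["pool"]]    # operation 404
        if is_eligible(cartridge):                   # decision block 406
            reclaim(cartridge)                       # operation 408
            reclaimed.append(cartridge["volser"])
    return reclaimed                                 # no cartridges left: process ends


cartridges = [
    {"volser": "A00001", "pool": "pool_502", "percent_active": 5.0},
    {"volser": "A00002", "pool": "pool_502", "percent_active": 40.0},
]
# Hypothetical policy: eligible when less than 10 percent of the data is active.
policies = {"pool_502": lambda c: c["percent_active"] < 10.0}
moved: List[dict] = []
result = evaluate_library(cartridges, policies, moved.append)
print(result)
```

Passing the eligibility test in as a function keeps the loop identical for every policy; only the per-pool predicate changes.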
In one embodiment of the present invention, the cartridges selected for reclamation evaluation (operations 402 and 412) are selected alphanumerically by their volume serial number. Alternatively, all cartridge selection may occur on a pool by pool basis. Once cartridges in the first pool have been evaluated, cartridges from another pool can be selected for evaluation. Those skilled in the art will recognize that there are many possible criteria for selecting cartridges for evaluation without departing from the scope of the present invention.
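The two selection orders mentioned above differ only in the sort key. In the Python sketch below, cartridges are hypothetical `(volume serial number, pool)` pairs, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Cartridge = Tuple[str, str]  # (volume serial number, pool name): hypothetical layout


def order_by_volser(cartridges: List[Cartridge]) -> List[Cartridge]:
    """Order cartridges alphanumerically by volume serial number."""
    return sorted(cartridges, key=lambda c: c[0])


def order_pool_by_pool(cartridges: List[Cartridge]) -> List[Cartridge]:
    """Order cartridges pool by pool, and by serial number within each pool."""
    return sorted(cartridges, key=lambda c: (c[1], c[0]))


cartridges = [("A00005", "pool_504"), ("B00001", "pool_502"), ("A00002", "pool_502")]
print(order_by_volser(cartridges))
print(order_pool_by_pool(cartridges))
```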
Percent of Active Data Migration Policy
The “percent of active data” migration policy performed by policy based migration engine 210 is described with reference to
In operation, a “percent of active data” policy is defined for pool 502 and as part of that definition, pool 504 is defined as the target pool. In the presently described embodiment, source pool 502 and target pool 504 contain high capacity cartridges, for example cartridges capable of storing 60 GB of data. In one embodiment of the present invention, pools 502 and 504 are defined with storage management software (e.g., storage management engine 208). In accordance with the present invention, a cartridge 506 is selected (operations 402 or 412) and the policy assigned for the cartridge is the “percent of active data” policy. Following this assignment, the policy based migration software (e.g., policy based migration engine 210 of
In the present embodiment, a cartridge 506 is eligible to be reclaimed under the “percent of active data” policy if the amount of data on the active data volumes currently on the cartridge relative to the full capacity of the cartridge falls below a pre-defined value (“Yes” branch of the decision block 406). The pre-defined value, for example, may be anywhere in the range from 1 to 99 percent of the storage capacity of the cartridge. When a cartridge contains an amount of inactive data, it is likely to be intermixed with active data and the efficiency of the storage for the cartridge is reduced. Reclaiming the cartridge transfers only the active data volumes to a tape cartridge 512, placing the active data volumes end to end, efficiently using the storage space on the tape cartridge 512, and at the same time, reclamation will provide an empty cartridge 506 to store new data.
In determining whether data on cartridge 506 is in need of reclamation, the actual amount of data stored on each cartridge 506 at full capacity is maintained (e.g., in memory unit 204). A current percentage of active data is calculated from the amount of data on the current active data volumes and the actual amount of data stored when full, and this percentage is compared to the pre-defined percentage (decision block 406). If the current percentage of active data on cartridge 506 is less than the pre-defined percentage, the data on cartridge 506 is eligible for reclamation, resulting in the active data volumes being moved to archival cartridge 512 (“Yes” branch of decision block 406 and operation 408). If, however, the current percentage of active data on cartridge 506 is greater than or equal to the pre-defined percentage, then the data on cartridge 506 is not eligible to be reclaimed and the active data volumes remain on cartridge 506 (“No” branch of decision block 406). In one embodiment of the present invention, the actual amount of data stored on a cartridge when the cartridge is full is recorded by storage management engine 208 in memory unit 204 whenever the storage management engine 208 fills the cartridge to capacity. However, one of ordinary skill in the art will recognize that other methods of obtaining and storing the actual amount of data stored for a cartridge can be implemented. In addition, simply using the maximum capacity for the cartridge can provide a usable value.
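The comparison at decision block 406 for this policy reduces to one division and one strict inequality. The following Python sketch assumes hypothetical function names and expresses data amounts in GB; any consistent unit would work.

```python
def percent_active(active_amount: float, amount_when_full: float) -> float:
    """Current active data as a percentage of the data stored when the cartridge was full."""
    if amount_when_full <= 0:
        raise ValueError("amount stored when full must be positive")
    return 100.0 * active_amount / amount_when_full


def eligible_percent_active(active_amount: float,
                            amount_when_full: float,
                            threshold_percent: float) -> bool:
    """Eligible only when the active percentage is strictly below the threshold."""
    return percent_active(active_amount, amount_when_full) < threshold_percent


# A cartridge that held 60 GB when full, evaluated against a 10 percent threshold.
print(eligible_percent_active(5, 60, 10.0))   # about 8.3 percent active
print(eligible_percent_active(6, 60, 10.0))   # exactly 10 percent active
```

A cartridge sitting exactly at the threshold is not reclaimed, matching the “greater than or equal to” branch described above.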
Once it has been determined that data of a cartridge 506 is eligible for reclamation, the data is migrated to a cartridge having the desired characteristics to store the data (operation 408). In furtherance of this, each volume with active data is copied to available space on cartridges 512 of pool 504 (operation 408). Referring to
While the presently described embodiment of the “percent of active data” policy is described as above, one of ordinary skill in the art will recognize that the present invention can be extended. For example, the present embodiment does not limit the copying of active data volumes to only one pool 504 but may be to a number of cartridges contained in a number of pools.
Time Since Last Access Migration policy
The “time since last access” migration policy performed by policy based migration engine 210 is now described in accordance with the present invention. In the presently described example, it is desirable to manage the data in a hierarchical data storage system (e.g., system 100) to account for data needing to be accessed relatively quickly as well as data needed to be stored for a lengthy period of time. Some tape cartridge formats provide for relatively fast access of data on the cartridge, while others are designed more for long term storage of data. Generally, there are cost differences between these formats. Accordingly, data performance and cost savings can be gained by efficiently managing the data stored on the various cartridges. Accordingly, a “time since last access” migration policy is defined. In general, the “time since last access” policy addresses the management of data that, when created and for sometime thereafter, has a relatively high likelihood of being accessed by a host and for which access time is important. Accordingly, it is desirable that the data be stored initially on a cartridge having a relatively fast access time. However, at some point after the creation and writing of the data, access to the data may be less frequent. Consequently, the fast access to the data may not be desired, and the data may be transferred to a cartridge having a slower access time, and possibly lower cost. As such, the present invention allows for migration of the infrequently accessed data from cartridges 506 to cartridges 512. For clarity of explanation, the “time since last access” policy is explained in reference to
In operation, a “time since last access” policy is defined for pool 502 and as part of that definition, pool 504 is defined as the target pool. Source pool 502 contains fast access type storage cartridges, for example cartridges having a typical access time of 20 seconds or less. Target pool 504 includes archival type data cartridges, for example a cartridge capable of storing 300 GB of data or more for an extended period of time (e.g., decades). Typically, the archival type cartridges have relatively slower access times (e.g., 100 seconds). In one embodiment of the present invention, pools 502 and 504 are defined with storage management software (e.g., storage management engine 208). In accordance with the present invention, a cartridge 506 is selected (operations 402 or 412) and the policy obtained for the cartridge is the “time since last access” policy (operation 404). The policy based management software (e.g., policy based management engine 210 of
In determining whether data on cartridge 506 is in need of reclamation, an actual last access time to data on each cartridge 506 is maintained (e.g., in memory unit 204) and the difference between the current time and the actual last access time is compared to the pre-defined period of time (decision block 406). If the difference between the current time and the actual last access time for cartridge 506 is greater than or equal to the pre-defined period of time, the data on cartridge 506 is considered infrequently accessed and is eligible to be reclaimed, resulting in the active data volumes being moved to archival cartridge 512 (“Yes” branch of decision block 406 and operation 408). If, however, the difference between the current time and the actual last access time for cartridge 506 is less than the pre-defined period of time, then the data on cartridge 506 is considered frequently accessed and is not eligible to be reclaimed, and the active data volumes remain on the fast access cartridges (“No” branch of decision block 406). In one embodiment of the present invention, the actual last access time is recorded by storage management engine 208 in memory unit 204 whenever a host 102 accesses data on cartridges 506. However, one of ordinary skill in the art will recognize that other methods of obtaining and storing the last access time of a cartridge can be implemented. In addition, last access times for the individual volumes stored on the cartridge 506 could also be stored and used in determining if the cartridge is eligible for reclaim.
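By way of illustration only, the comparison performed at decision block 406 for the “time since last access” policy may be sketched in Python as follows. The function names are hypothetical and do not appear in the described embodiment. The per-volume helper assumes that a cartridge is treated as accessed as recently as its most recently accessed volume, which is only one possible way of using per-volume access times.

```python
from datetime import datetime, timedelta


def is_eligible_for_reclaim(last_access, now, threshold):
    """Sketch of decision block 406.

    Returns True when the time elapsed since the last access is greater
    than or equal to the pre-defined period ("Yes" branch), and False
    otherwise ("No" branch).
    """
    return (now - last_access) >= threshold


def cartridge_last_access(volume_access_times):
    """Assumed per-volume variant: the cartridge's last access time is the
    most recent access time among the volumes stored on it."""
    return max(volume_access_times)
```

For example, with a hypothetical pre-defined period of 90 days, a cartridge last accessed 152 days ago would take the “Yes” branch, while a cartridge last accessed 12 days ago would take the “No” branch and its active data volumes would remain on the fast access cartridge.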
Once it has been determined that data of a cartridge 506 is eligible for reclamation, the data is migrated to a cartridge having the desired characteristics to store the data (operation 408). In furtherance of this, each volume with active data is copied to available space on cartridges 512 of pool 504 (operation 408). Referring to
While the presently described embodiment of the “time since last access” policy is described as above, one of ordinary skill in the art will recognize that the present invention can be extended. For example, the present embodiment can be extended to cover the identification and copying of individual volumes from cartridges 506 to cartridges 512. Additionally, copying is not limited to targets of one pool 504 but may be to a number of cartridges contained in a number of pools.
Time Since Last Data Written Migration Policy
It is desirable to manage the long term archival of data in a hierarchical data storage system (e.g., system 100). Accordingly, a “time since last data written” migration policy performed by policy based migration engine 210 is described in accordance with the present invention. In general, the “time since last data written” policy addresses the management of data that was written to cartridge 506 for long term retention. However, cartridges having improved storage capacity, improved retention time, less cost, and the like may be introduced into the market. Consequently, it would be advantageous to migrate the data from the older technology cartridges to cartridges of newer technology. At the least, the migration would improve the reliability of the storage of data within the hierarchical data storage system, while possibly decreasing the total cost of ownership of the system at the same time. For clarity of explanation, the “time since last data written” policy is described with reference to
In operation, pools 502 and 504 are defined as the source and target pools, respectively, for the “time since last data written” policy. Source pool 502 contains cartridges designed for long term storage of data, for example IBM 3590 model E1A K media cartridges. Target pool 504 includes cartridges having improved long term storage characteristics as compared to cartridges 506, for example IBM 3592 model J1A JA media cartridges. In one embodiment of the present invention, pools 502 and 504 are defined with storage management software (e.g., storage management engine 208). In accordance with the present invention, a cartridge 506 of pool 502 is selected (operations 402 or 412) and the policy obtained for the cartridge is the “time since last data written” policy (operation 404). The policy based management software (e.g., policy based management engine 210 of
In determining whether data on cartridge 506 is in need of reclamation, the actual time at which data was last written to each cartridge 506 is maintained (e.g., in memory unit 204) and the difference between the current time and that last written time is compared to the pre-defined period of time (decision block 406). If the pre-defined period of time has elapsed since the last data was written to cartridge 506, it is assumed that long term storage of the volume is desired and, consequently, the active data volumes on cartridge 506 should be stored on cartridges having preferable long term storage characteristics (“Yes” branch of decision block 406 and operation 408). If, however, the pre-defined period of time has not elapsed since the last data was written to cartridge 506, then it is not necessary to transfer the active data volumes on cartridge 506 to another cartridge. In one embodiment of the present invention, the time at which data was last written is recorded by storage management engine 208 in memory unit 204 whenever a host 102 writes data on cartridge 506. However, one of ordinary skill in the art will recognize that other methods of obtaining and storing the time since last data written of a volume can be implemented. In addition, the time when data was last written for the individual volumes stored on the cartridge 506 could also be stored and used in determining if the cartridge is eligible for reclaim.
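As a non-limiting sketch, the recording of the last written time on each host write, and its later comparison against the pre-defined period, might be arranged as shown below. The class `LastWriteTracker` and its methods are hypothetical; the dictionary merely stands in for the information held in memory unit 204. Treating a cartridge with no recorded write as not due for migration is an assumption made for the sketch.

```python
from datetime import datetime, timedelta


class LastWriteTracker:
    """Hypothetical stand-in for the last written times kept in memory unit 204."""

    def __init__(self):
        # Maps a cartridge identifier to the time of its most recent write.
        self._last_write = {}

    def record_write(self, cartridge_id, when):
        """Called whenever a host writes data to the cartridge."""
        previous = self._last_write.get(cartridge_id)
        # Keep only the most recent write time for each cartridge.
        if previous is None or when > previous:
            self._last_write[cartridge_id] = when

    def due_for_migration(self, cartridge_id, now, period):
        """True when the pre-defined period has elapsed since the last write."""
        last = self._last_write.get(cartridge_id)
        if last is None:
            # Assumption: a cartridge with no recorded write is not migrated.
            return False
        return (now - last) >= period
```

Keeping a single timestamp per cartridge mirrors the cartridge-level embodiment; the per-volume alternative mentioned above would instead key the dictionary by volume.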
Referring to
When all active data volumes of cartridge 506(2) have been copied, the cartridge 506(2) will be eligible for use to store new data or can be removed from the library. In one embodiment of the present invention, the policy based migration software examines pool 502 at a time initiated by a user (e.g., upon the installation of tape drives and tape cartridges having improved storage characteristics the user will want to migrate the data from the older cartridges to the newer cartridges, and use the new cartridges for long term storage).
While the presently described embodiment of a “time since last data written” migration policy is described as above, one of ordinary skill in the art will recognize that the present invention can be extended. For example, the present embodiment can be extended to cover the identification and copying of a single active data volume from cartridges 506 to cartridges 512. Additionally, copying is not limited to targets of one pool 504 but may be to a number of cartridges contained in a number of pools.
Rate of Expiration of Data Migration Policy
The “rate of expiration of data” migration policy performed by policy based migration engine 210 is now described in accordance with the present invention. For aid in explanation of the policy, the following description refers to
In the presently described example, it is desirable to maximize the storage efficiency of data cartridges 506 of the system (e.g., system 100). Accordingly, a “rate of expiration of data” migration policy is defined. In general, the “rate of expiration of data” policy addresses the management of data intended for long term storage but initially written to a data cartridge that also has short term storage data written on it. For example, some of the data volumes 508 on cartridges 506 contain short term type data that generally expires a few weeks after being written. However, it is often the case that other data volumes which must be stored longer than a few weeks may also be written to cartridge 506. If most of the data volumes are of the short term storage type, the data generally expires within a few weeks, and the cartridge is reclaimed and used for additional short term storage. However, when using a migration policy such as “percent of active data”, the presence of the long term data volumes can prevent the reclamation of the cartridge until a portion of the long term data has expired as well. Consequently, it is advantageous to reclaim cartridges with long term storage data written on them after the short term storage data has expired, so the storage space of the cartridges can be reclaimed and the cartridges can be reused.
In operation, pools 502 and 504 are defined as the source and target pools, respectively with a “rate of expiration of data” policy. Source pool 502 contains cartridges designed for long term storage of data, for example IBM 3590 model E1A K media cartridges. Target pool 504 includes cartridges 512 having improved long term storage characteristics as compared to cartridges 506. In one embodiment of the present invention, pools 502 and 504 are defined with storage management software (e.g., storage management engine 208). In accordance with the present invention, a cartridge 506 of pool 502 is selected (operations 402 or 412) and the policy obtained for the cartridges is the “rate of expiration of data” policy. The policy based management software (e.g., policy based management engine 210 of
In determining whether data on cartridge 506 is in need of reclamation, an actual last time of expiration for each cartridge 506 is maintained (in memory unit 204 for example). If the pre-defined time set by the user has elapsed since the actual last expiration time, the cartridge 506 is eligible for reclaim (“Yes” branch of decision block 406). If, however, the pre-defined time has not elapsed since the actual last expiration time for the cartridge 506, then the cartridge 506 is not eligible for reclaim. In one embodiment of the present invention, the actual last time of expiration is recorded by storage management engine 208 in memory unit 204 whenever a host 102 expires the data associated with one of the volumes 510 on cartridge 506. However, one of ordinary skill in the art will recognize that other methods of obtaining and storing the last time data associated with the cartridge was expired can be implemented.
Once it has been determined that data of a cartridge 506 is eligible for reclamation, the data is migrated (operation 408). In furtherance of this, each volume having active data is copied to available space on cartridges 512 of pool 504 (operation 408). Referring to
When all active data volumes of a cartridge 506(2) have been copied, the cartridge 506(2) will be eligible for use to store new data. In another embodiment of the present invention, in addition to determining if the pre-defined time period has elapsed since the last expiration of data on the cartridge, the amount of active data remaining on the cartridge 506(2) can be considered in the determination if the cartridge is eligible for reclamation. It is preferable that the active data on a cartridge 506 fall below the pre-defined threshold and that the pre-defined time has elapsed since data on the cartridge was expired for the cartridge to be reclaimed. This is preferable to prevent a cartridge from being needlessly reclaimed repeatedly when it contains only long term data.
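The two forms of the “rate of expiration of data” test described above, namely the elapsed-time test alone and the preferred combination of the elapsed-time test with an active data threshold, may be sketched as follows. This is an illustrative sketch only; the function name, the parameter names, and the representation of active data as a fraction between 0 and 1 are assumptions rather than features of the described embodiment.

```python
from datetime import datetime, timedelta


def eligible_by_expiration(last_expiration, now, quiet_period,
                           active_fraction=None, active_threshold=None):
    """Sketch of the eligibility test for the rate-of-expiration policy.

    The cartridge qualifies when the pre-defined time has elapsed since
    data on it last expired. If both an active fraction and a threshold
    are supplied, the active data must also have fallen below that
    threshold, which avoids repeatedly reclaiming a cartridge that holds
    only long term data.
    """
    quiet = (now - last_expiration) >= quiet_period
    if active_fraction is None or active_threshold is None:
        return quiet
    return quiet and active_fraction < active_threshold
```

Under the combined test, a cartridge whose short term data stopped expiring some time ago but which is still mostly full of long term data is left in place, whereas the same cartridge would be reclaimed under the elapsed-time test alone.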
While the presently described embodiment of the “rate of expiration of data” policy is described as above, one of ordinary skill in the art will recognize that the present invention can be extended. For example, the present embodiment can be extended to include expiration of records or groups of records of data or the identification and copying of a single active data volume from cartridges 506 to cartridges 512. Additionally, copying is not limited to targets of one pool 504 but may be to a number of cartridges contained in a number of pools.
While the descriptions above have been provided in relation to the examination of cartridges and data on cartridges, other techniques of evaluating data for reclaim may be used. For example, the relevant data associated with the cartridges may be stored as records in a database. Such an exemplary database is described below with reference to
Exemplary Database
In operation (e.g., of the techniques described in
Combination of Policies
While the presently described embodiments of the policies “percent of active data”, “time since last access”, “time since last data written” and “rate of expiration of data” are each described individually, one of ordinary skill in the art will recognize that in examining a cartridge 506 to determine its eligibility for reclaim, a combination of the policies can be used. For example, a cartridge 506 could be evaluated for both the “percent of active data” and “time since last data written” policies and if either criterion for reclaim is satisfied, the cartridge 506 would be reclaimed. In addition, instead of examining each cartridge and reclaiming it if eligible, the examination could be done separate and apart from the actual reclamation, resulting in a list of cartridges to be reclaimed. The reclaim step could further determine the order in which the cartridges are reclaimed based on criteria such as reclaiming first those cartridges that have the smallest amount of active data on them to move, and/or first reclaiming cartridges of a type that are needed to store new data, and/or first reclaiming cartridges of a type which contain data having a high level of priority and importance. Furthermore, the migration of data is not limited to tape cartridges but may also include the migration of data from a tape cartridge to another storage device such as DASD, optical media, flash memory, combinations thereof, and the like. Moreover, while the present invention has been described with respect to a VTS system, one of ordinary skill in the art will recognize that the present invention can be implemented in other systems, including an automated tape library.
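As one possible illustration of combining policies and of separating examination from reclamation, the following Python sketch evaluates each cartridge against several policies, treats a cartridge as eligible if any one criterion is satisfied, and returns a reclaim list ordered so that cartridges with the least active data are reclaimed first. The `Cartridge` record, its fields, the policy helper functions, and the form of the “percent of active data” test (active data below a threshold fraction of capacity) are all hypothetical and are assumed here only for the purpose of the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cartridge:
    """Hypothetical per-cartridge record, e.g. one row of a database."""
    name: str
    active_bytes: int
    capacity_bytes: int
    days_since_last_write: int


Policy = Callable[[Cartridge], bool]


def percent_active_policy(threshold: float) -> Policy:
    # Assumed form: eligible when the active fraction is below the threshold.
    return lambda c: c.active_bytes / c.capacity_bytes < threshold


def last_written_policy(min_days: int) -> Policy:
    # Eligible when at least min_days have elapsed since the last write.
    return lambda c: c.days_since_last_write >= min_days


def build_reclaim_list(cartridges: List[Cartridge],
                       policies: List[Policy]) -> List[Cartridge]:
    """Examination step: collect cartridges satisfying any policy, ordered
    so that those with the least active data to move come first."""
    eligible = [c for c in cartridges if any(p(c) for p in policies)]
    return sorted(eligible, key=lambda c: c.active_bytes)
```

The other ordering criteria mentioned above, such as cartridge type or data priority, could be accommodated by substituting a different sort key.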