The present invention relates generally to systems and methods for reading data from, and writing data to, serially accessible storage media such as tape and, more particularly, to systems and methods for storing multiple copies of data on different locations either on the same serially accessible storage media or on a different serially accessible storage media.
There are various types of media used for the storage of data. Each media type has particular characteristics that typically dictate the environment/application that it is best suited for. For example, disk media is typically used for real-time data storage when fast access to a particular location on the media is required. Tape media, and in particular magnetic tape, is typically used for off-line data storage of large amounts of data such as a backup or archive copy of data. Disk media is relatively expensive when compared to other types of media such as tape. Tape has the disadvantage of a relatively slow access time, when compared to disk, as the tape is wound about a reel and must be accessed serially by either forwarding or rewinding the tape to the desired location for reading/writing data. It would be desirable to improve the access time for tape media such that the cost benefit of tape could be used in more types of environments/applications that traditionally use disk (with its associated faster access time and lower latency). In addition, with appropriate management of the data, the technique for improving serially accessible media's access time can provide the additional benefit of data redundancy.
As seen in
U.S. Pat. No. 6,061,194 describes a technique for writing duplicate data at a fixed azimuth angle from the original data on a disk platter, in order to reduce rotational latency when reading the data. This duplicate data is written on the same platter as the original data, and the media is relatively expensive when compared to tape.
It would be advantageous to provide a technique for improving access time for serially accessible storage media, and to improve data redundancy in a storage system having such media. Examples of serially accessible media include magnetic tape, optical tape, and charge coupled device (CCD) shift registers.
A system and method for reducing the access time in a storage system having serially accessible media. One or more duplicate copies of data are maintained at different offset locations on serial media, which in the preferred embodiment is tape (magnetic or optical). When a request is made to read the data, a determination is made as to which copy of the data—either the original data or one of the duplicate copies—will have the shortest access time for accessing the data. Generally, this would be the data copy that will be closest to the data transducer when the tape is positioned for access, such as a tape cartridge being loaded in a tape drive. Once the tape is ready to be accessed, the tape is positioned to access the copy of the data that is in closest linear proximity with the reading transducer. Thus, the copy of the data having the lowest access latency is chosen to satisfy the particular I/O request.
In one embodiment, the duplicate data is located at a different offset location than the original data on the same tape media.
In an alternate embodiment, the duplicate data is located at a different offset location than the original data on a different tape media.
In yet another embodiment, multiple duplicate copies of the original data are maintained at a plurality of differing offset locations, either on the same media, some on the same media and some on different media, or all on different media.
The invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
A dual reel cassette such as that shown in
In accordance with the present invention, one or more duplicate copies of data are maintained at different offset locations on serially accessible media. When a request is made to read the data, a determination is made as to which copy of the data will be closest to the data transducer when the tape is positioned for access, such as by being loaded in a tape drive. In other words, a determination is made as to which copy of the data will have the lowest access latency. This is the data copy that is used to satisfy the I/O request. Once the tape is ready to be accessed, the tape is positioned to access this data. In one embodiment, the duplicate data is located at a different offset location than the original data on the same tape media. In an alternate embodiment, the duplicate data is located at a different offset location than the original data on a different tape media. In yet another embodiment, multiple duplicate copies of the original data are maintained at a plurality of differing offset locations, either on the same media, some on the same media and some on different media, or all on different media.
In the preferred embodiment, if original data is written to a tape location in first zone 246, a duplicate copy of the data is written to a tape location in second zone 248, either on the same tape or on a tape in a different cartridge. This provides an overall reduction in average data access time for subsequent data access, as will now be illustrated with reference to
Assume that
By extension, any number of intermediate portions between the first end and the second end can be defined, thus partitioning the tape into any number of different zones to accommodate the situation where a plurality of duplicate copies of data are to be copied onto the tape. For example, a three zoned (254, 256, 258) tape for accommodating a system that maintains an original copy and two duplicate copies of data is shown in
The technique for determining which copy of data has the lowest latency will now be described. Referring to
Head access point (HAP) 302 is the tape location that will be adjacent to or in contact with the transducer when the tape is first loaded into a tape drive. For example,
Referring now to
In
As can be seen by the examples shown in
For example, in
|(head access point)−(zone offset+data offset w/in zone)|
For the example shown in
The access time for original data block D1 in
D1 access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(4)−(0+1)|
=3
The access time for duplicate data block DA in
DA access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(4)−(13+1)|
=10
In this case, original data block D1 would be chosen to satisfy the data I/O request for the scenario in
For the example shown in
D1 access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(11)−(0+1)|
=10
The access time for duplicate data block DA in
DA access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(11)−(13+1)|
=3
In this case, duplicate data block DA would be chosen to satisfy the data I/O request for the scenario in
For the example shown in
D1 access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(17)−(0+1)|
=16
The access time for duplicate data block DA in
DA access time=|(head access point)−(zone offset+data offset w/in zone)|
=|(17)−(13+1)|
=3
In this case, duplicate data block DA would be chosen to satisfy the data I/O request for the scenario in
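The copy selection worked through above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `access_time`, `select_copy`, and `COPIES` are hypothetical, and it assumes that access time is measured in the same units as tape offsets (one unit per block of tape travel), as the worked examples do.

```python
def access_time(head_access_point, zone_offset, data_offset):
    """|(head access point) - (zone offset + data offset w/in zone)|."""
    return abs(head_access_point - (zone_offset + data_offset))


def select_copy(head_access_point, copies):
    """Return (name, access_time) of the copy nearest the head.

    `copies` maps a copy name to (zone_offset, data_offset).
    Ties go to the copy listed first (an assumption of this sketch).
    """
    candidates = (
        (name, access_time(head_access_point, zone, data))
        for name, (zone, data) in copies.items()
    )
    return min(candidates, key=lambda pair: pair[1])


# Layout from the worked examples: D1 in the zone at offset 0,
# DA in the zone at offset 13, each at data offset 1 within its zone.
COPIES = {"D1": (0, 1), "DA": (13, 1)}

for hap in (4, 11, 17):
    print(hap, select_copy(hap, COPIES))
```

Because the selection is a plain minimum over the candidate copies, the same function handles any number of duplicate copies without modification.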
For
For
For
For
For
Typically, the determination of how many duplicate copies of data are to be maintained in a given tape system environment, and hence the number of zones that need to be established, is made as part of a system initialization or set-up process, and is not revisited on a regular or frequent basis. Thus, the zone size/offset parameters tend to be somewhat static in value. However, the head access point value would typically change each time a cartridge completes a drive load/unload sequence. Because of the dynamic changing of this value, it is preferable to maintain the tables shown in
As can be seen from the organization of the tables shown in
Tables 3-1 through 3-18 show the assumptions, data and calculated results used for generating the graph shown in
Tables 3-1 through 3-4 show various parameters of a two (2) zone layout with 1800 blocks/track. The D1 Zone Offset is 0, and the DA zone offset is in the middle of the tape at offset 900. The D1 Data offset is at 2 (within zone D1), and the DA data offset is at 2 (within zone DA). The tables show which copy of data is selected for various head access points (HAP) in this two zone layout. As can be seen in Table 3-1, which shows HAP 0 through HAP 46, the D1 copy of data is selected as it has the smallest access time. Table 3-2 shows HAP 407 through HAP 466, and also shows the transition point (which is circled) where the DA copy of data begins to be selected for HAP greater than 452. Table 3-3 shows that the DA copy of data continues to be selected, and also shows the instance where the HAP 902 coincides with the DA data copy (i.e. where the DA access time is zero, as shown by the table entry highlighted by arrows). Table 3-4 shows HAP 1787 through 1799, where the DA copy of data continues to be selected. It can be seen that the average access time for the selected data is 336.25 units of time, which is less than one half the average access time if only copy D1 were selected (i.e. not taking advantage of selecting the duplicate copy).
Tables 3-5 through 3-10 show various parameters of a three (3) zone layout with 1800 blocks/track. The D1 Zone Offset is 0, the DA zone offset is ⅓ of the way from the beginning of the tape at offset 600, and the DB zone offset is ⅔ of the way from the beginning of the tape at offset 1200. The D1 Data offset is at 2 (within zone D1), the DA data offset is at 2 (within zone DA), and the DB data offset is at 2 (within zone DB). These tables show which copy of data is selected for various head access points (HAP) in this three zone layout. As can be seen in Table 3-5, which shows HAP 0 through HAP 48, the D1 copy of data is selected as it has the smallest access time. Table 3-6 shows HAP 289 through HAP 348, and also shows the transition point (which is circled) where the DA copy of data begins to be selected for HAP greater than 302. Table 3-7 shows that the DA copy of data continues to be selected, and also shows the instance where the HAP 602 coincides with the DA data copy (i.e. where the DA access time is zero, as shown by the table entry highlighted by arrows). Table 3-8 shows HAP 889 through HAP 948, and also shows the transition point (which is circled) where the DB copy of data begins to be selected for HAP greater than 902. Table 3-9 shows that the DB copy of data continues to be selected, and also shows the instance where the HAP 1202 coincides with the DB data copy (i.e. where the DB access time is zero, as shown by the table entry highlighted by arrows). Table 3-10 shows HAP 1789 through 1799, where the DB copy of data continues to be selected. It can be seen that the average access time for the selected data is 199.17 units of time, which is less than one quarter the average access time if only copy D1 were selected (i.e. not taking advantage of selecting the duplicate copy).
Tables 3-11 through 3-18 show various parameters of a four (4) zone layout with 1800 blocks/track. The D1 Zone Offset is 0, the DA zone offset is ¼ of the way from the beginning of the tape at offset 450, the DB zone offset is ½ of the way from the beginning of the tape at offset 900, and the DC zone offset is ¾ of the way from the beginning of the tape at offset 1350. The D1 Data offset is at 2 (within zone D1), the DA data offset is at 2 (within zone DA), the DB data offset is at 2 (within zone DB), and the DC data offset is at 2 (within zone DC). These tables show which copy of data is selected for various head access points (HAP) in this four zone layout. As can be seen in Table 3-11, which shows HAP 0 through HAP 48, the D1 copy of data is selected as it has the smallest access time. Table 3-12 shows HAP 169 through HAP 228, and also shows the transition point (which is circled) where the DA copy of data begins to be selected for HAP greater than 227. Table 3-13 shows that the DA copy of data continues to be selected, and also shows the instance where the HAP 452 coincides with the DA data copy (i.e. where the DA access time is zero, as shown by the table entry highlighted by arrows). Table 3-14 shows HAP 649 through HAP 708, and also shows the transition point (which is circled) where the DB copy of data begins to be selected for HAP greater than 677. Table 3-15 shows that the DB copy of data continues to be selected, and also shows the instance where the HAP 902 coincides with the DB data copy (i.e. where the DB access time is zero, as shown by the table entry highlighted by arrows). Table 3-16 shows HAP 1069 through HAP 1128, and also shows the transition point (which is circled) where the DC copy of data begins to be selected for HAP greater than 1127. Table 3-17 shows that the DC copy of data continues to be selected, and also shows the instance where the HAP 1352 coincides with the DC data copy (i.e. where the DC access time is zero, as shown by the table entry highlighted by arrows). Table 3-18 shows HAP 1789 through 1799, where the DC copy of data continues to be selected. It can be seen that the average access time for the selected data is 140.00 units of time, which is less than one sixth the average access time if only copy D1 were selected (i.e. not taking advantage of selecting the duplicate copy).
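The average access times quoted for the two, three, and four zone layouts can be checked with a short calculation. The following is a sketch under stated assumptions rather than the method used to generate the tables: it assumes every head access point from 0 through 1799 is equally likely, that zones are of equal size, and that each copy sits at data offset 2 within its zone. The function names are hypothetical.

```python
def zone_offsets(blocks_per_track, zone_count):
    """Starting offset of each equal-sized zone, e.g. [0, 450, 900, 1350]."""
    return [i * blocks_per_track // zone_count for i in range(zone_count)]


def best_access_time(hap, offsets, data_offset):
    """Access time of the nearest copy for one head access point (HAP)."""
    return min(abs(hap - (zone + data_offset)) for zone in offsets)


def average_access_time(blocks_per_track, zone_count, data_offset=2):
    """Mean access time of the selected copy over every possible HAP."""
    offsets = zone_offsets(blocks_per_track, zone_count)
    total = sum(
        best_access_time(hap, offsets, data_offset)
        for hap in range(blocks_per_track)
    )
    return total / blocks_per_track


for zones in (1, 2, 3, 4):
    print(zones, round(average_access_time(1800, zones), 2))
```

Running the loop with a zone count of 1 gives the baseline in which only copy D1 is available, which is the figure against which the fractions above are stated.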
The previous analysis was based upon random HAPs, where the tape is left in its final position after completion of a tape access operation. It may be desirable to pre-bias to either a supply-reel biased state or a take-up reel biased state after completion of a previous tape access operation, such that the media is maintained in a known state. This would allow for further reductions in data access times. In such a system, the duplicate copy of data is stored on a different cartridge, but not necessarily in a different zone. Instead, the differing cartridges are maintained in different biased states after a previous I/O access, such as writing original and duplicate data. Then, upon receipt of a subsequent I/O request, the cartridge containing the data with the lowest latency is chosen. Again, refer to
As one example, assume that the supply reel biased state is defined to be that all tape is on the supply reel, i.e. it is fully rewound after completion of access by a tape drive. This reel will be used to store the duplicate copy of the data item. The take-up reel biased state is defined to be that the tape is positioned to be half on the take-up reel and half on the supply reel (as shown in
Maintaining cartridges with biased load points can be extended to more than two cartridges. For example, a three cartridge system such as that shown in
In an alternate embodiment, a race situation is created where at least some of the plurality of media having a copy of the selected data are loaded into respective media drives, and the drive that is first to access the copy of the data provides such data to the requester. In this embodiment, it is preferable to store the duplicate copies of data in different zones on the respective media. A request is received from a requester to read data. A determination is made as to which of a plurality of serially accessible media contain a copy of the requested data. Some or all of the media containing a copy of the requested data are loaded into respective media drives. The drives seek to the copy of the data on their respective media, and the data copy is read. The media drive that is first to access the requested data is used to provide the data to the requester.
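The race described above can be illustrated with a small simulation. This is a hedged sketch only: real drives are stood in for by threads that sleep in proportion to the seek distance, and `SECONDS_PER_BLOCK`, `seek_and_read`, and `race_read` are hypothetical names introduced for illustration, not part of any drive or library interface.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Assumed (hypothetical) tape speed used only to scale the simulation.
SECONDS_PER_BLOCK = 0.0002


def seek_and_read(drive_name, head_access_point, copy_location):
    """Simulate one drive seeking to its copy of the data, then reading it."""
    distance = abs(head_access_point - copy_location)
    time.sleep(distance * SECONDS_PER_BLOCK)
    return drive_name, distance


def race_read(loaded_media):
    """Start every drive at once and return the result of the first to finish.

    `loaded_media` is a list of (drive_name, head_access_point, copy_location).
    """
    pool = ThreadPoolExecutor(max_workers=len(loaded_media))
    futures = [pool.submit(seek_and_read, *media) for media in loaded_media]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # slower drives are simply left to finish
    return next(iter(done)).result()


# Three cartridges, each holding the copy in a different zone.
winner, distance = race_read([
    ("drive0", 4, 902),
    ("drive1", 4, 2),
    ("drive2", 4, 1202),
])
print(winner, distance)
```

Unlike the table-driven selection, the race needs no advance knowledge of each head access point, at the cost of occupying several drives for a single request.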
The invention described herein is particularly useful when used in a media library system comprising a plurality of tape drives and media cells. Such a system, as shown at 920 in
Finally, it should be noted that the one or more duplicate copies of data that are maintained to reduce data latency are also available as a redundant copy of data that can be used in lieu of the original data in the event of data loss in the original data, borrowing from techniques used in a traditional data restoration operation.
Number | Name | Date | Kind |
---|---|---|---|
4796110 | Glass et al. | Jan 1989 | A |
5463758 | Ottesen | Oct 1995 | A |
5623471 | Prigge | Apr 1997 | A |
5883864 | Saliba | Mar 1999 | A |
6061194 | Bailey | May 2000 | A |
6662281 | Ballard et al. | Dec 2003 | B2 |