The present invention relates to the reduction in time required for reading a file in a technique for reading and writing data on a tape medium via a tape drive file system. More specifically, the present invention relates to a method of improving the reading speed of a file fragmented by editing (changing) through a tape drive file system.
An LTFS (Linear Tape File System) can be used, as a tape drive file system, together with a fifth-generation LTO (Linear Tape-Open) tape drive (LTO-5) and a fourth-generation IBM enterprise tape drive TS1140. In general, when data written on a tape medium (tape, medium) are read in order, the reading performance is on the same level as that of an HDD (Hard Disk Drive). For example, in the case of LTO-5, it is possible to read data at a transfer rate of 140 MB/sec.
However, when the data to be read are scattered on the tape medium, positioning (seeking) to each piece of data requires 30 seconds on average and, at most, on the order of a minute or more, so the average read transfer rate drops markedly. Further, in order to position to data parts fragmented on the medium, the tape drive performs an operation called back hitch, which rewinds the tape medium and takes two to five seconds. Because back hitch is required to read fragmented file data, it is a factor that reduces performance.
Japanese Patent Application Publication No. 2006-164017 describes that, when data is appended and written, the data written before the append is rewritten (defragmented) to the side of the appended data to reduce the required reading time.
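By way of a non-limiting illustration, the effect of seeks on the average transfer rate can be estimated with the short Python sketch below. It uses the 140 MB/sec streaming rate and the 30-second average seek time quoted above; the function name `effective_rate_mb_s` and the simplification that one seek occurs per fragment boundary are assumptions made only for this example.

```python
def effective_rate_mb_s(file_mb, fragments, stream_mb_s=140.0, seek_s=30.0):
    """Estimate the average read rate of a file split into `fragments` pieces.

    Assumption: reading N fragments costs N - 1 seeks of `seek_s` seconds
    each, and data streams at `stream_mb_s` between seeks.
    """
    if fragments < 1:
        raise ValueError("a file has at least one fragment")
    transfer_s = file_mb / stream_mb_s       # time spent streaming data
    seek_total_s = (fragments - 1) * seek_s  # time spent positioning
    return file_mb / (transfer_s + seek_total_s)


if __name__ == "__main__":
    # A 1400 MB file read contiguously, then with a single fragment boundary.
    print(effective_rate_mb_s(1400, 1))
    print(effective_rate_mb_s(1400, 2))
```

Under these assumptions, a single seek in a 1400 MB file adds 30 seconds to a 10-second transfer, so the average rate falls to one quarter of the streaming rate.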
In one illustrative embodiment, a method is provided for updating a file written on a medium in a system including a tape drive connected to a host. The illustrative embodiment receives, from the host, a change data part that is changed in the file as an update target. The illustrative embodiment writes the change data part to a data end position of the file including a non-change data part that is not changed in the file sequentially stored on the medium. The illustrative embodiment calculates a seek time required for positioning of a head of the tape drive to a medium position of the change data part when data of the file that is not updated is aligned sequentially. The illustrative embodiment copies the change data part to an external storage device when the seek time is more than or equal to a predetermined value.
According to the above-mentioned method to which the present invention is applied, the reading speed of a file updated through the LTFS can be improved.
In other illustrative embodiments, a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.
An embodiment for writing and reading a file to change the file in a storage system including a tape drive connected to a host will be described below. In the present invention, a file system detects that the position of the file on a tape medium is fragmented. In the embodiment, a part of the fragmented file is stored in a cache to reduce the time required for reading the file written on the tape medium in a fragmented manner.
The present invention is directed to a case where an application writes only a part of a file to update the file written on a tape medium. In other words, when a certain file is written in divided form across multiple areas on the tape medium, a part written upon updating (change data part) is stored in a cache other than the tape. The file part written upon updating (change data part) is recorded in an IP (Index Partition), and upon mounting the tape medium, a tape drive file system (LTFS) reads, in advance, the file fragments recorded in the IP and copies the file fragments to a memory. Further, if the LTFS also holds the file fragments in an external cache such as a disk, the time required for reading the fragmented file can be reduced.
In a tape drive file system such as LTFS, data written on a medium can be shown as a file. When the tape medium is written by the user using the LTFS, meta-information called an index file (also simply called “index”) is written to the tape medium in addition to a file main body. The index includes, as meta-information, the file name and the creation date of the file, and position information and size information on the medium (extents). Principally, the latest index is written to the IP. The file main body and a history of indexes are written to the DP (Data Partition).
When a file on the tape medium is read and written using the LTFS, data are read and written in units of data structures called records. Records are numbered within each partition in which they are recorded; the number of a record indicates its ordinal position counted from the beginning of the partition. The correspondence between each file and its records (e.g., information indicating that file A consists of records #N to #N+α) is also stored in the index.
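By way of a non-limiting illustration, the correspondence between a file and its records can be modeled as in the Python sketch below. The class names `Extent` and `IndexEntry`, their fields, and the helper `records_of` are hypothetical and greatly simplified; they are not the actual on-tape index format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """One contiguous run of records (hypothetical, simplified)."""
    partition: str     # "IP" or "DP"
    start_record: int  # record number counted from the start of the partition
    record_count: int  # number of consecutive records in this run


@dataclass
class IndexEntry:
    """Meta-information held in the index for one file (simplified)."""
    name: str
    extents: List[Extent] = field(default_factory=list)


def records_of(entry: IndexEntry) -> List[Tuple[str, int]]:
    """List every (partition, record number) that makes up the file, in order."""
    return [
        (e.partition, n)
        for e in entry.extents
        for n in range(e.start_record, e.start_record + e.record_count)
    ]


if __name__ == "__main__":
    # File A consists of records #10 to #12 of the DP (N = 10, alpha = 2).
    file_a = IndexEntry("A", [Extent("DP", 10, 3)])
    print(records_of(file_a))
```

A file whose entry holds more than one extent with non-adjacent record numbers corresponds to a file that is fragmented on the medium.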
The interface 110 performs communication with the host 300 through a network. For example, the interface 110 receives, from the host 300, a write command for instructing data writing to a tape cartridge (medium, tape medium) 40. The interface 110 also receives, from the host 300, a read command for instructing data reading from the medium 40. The interface 110 has the function of compressing write data and decompressing read data, increasing the effective storage capacity of the medium to nearly twice its native capacity.
The tape drive 60 performs reading from and writing to the medium 40 in units of data sets (DSs), each of which is made up of multiple records sent from the application on the host 300. The size of a DS is typically 4 MB. In the host 300, the file system specifies a file and a SCSI command specifies records to send a write/read request (Write/Read) to the tape drive.
Each DS includes management information on the data set. User data is managed in units of records. The management information is included in a data set information table (DSIT). The DSIT includes the number of records and the number of FMs (file marks) included in the DS, and further the cumulative number of records and the cumulative number of FMs written from the beginning of the medium.
The buffer 120 is a memory for temporarily accumulating data to be written to the medium 40 or data read from the medium. For example, the buffer 120 is constituted by a DRAM (Dynamic Random Access Memory). The recording channel 130 is a communication channel used to write data accumulated in the buffer 120 to the medium 40 or to pass data read from the medium 40 to the buffer 120 for temporary accumulation.
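By way of a non-limiting illustration, the Python sketch below shows how cumulative record counts of the kind held in the DSIT could be used to find the data set that contains a given record. The class `DSIT`, its two fields, and the convention that the cumulative count covers only the records written before the data set begins are assumptions made for this example, not the actual DSIT layout.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DSIT:
    """Simplified, hypothetical view of a data set information table."""
    records_in_ds: int       # number of records in this data set
    cumulative_records: int  # records written before this data set (assumed)


def build_dsits(records_per_ds: List[int]) -> List[DSIT]:
    """Build the tables for consecutive data sets from their record counts."""
    dsits, total = [], 0
    for n in records_per_ds:
        dsits.append(DSIT(n, total))
        total += n
    return dsits


def data_set_for_record(dsits: List[DSIT], record_number: int) -> int:
    """Return the index of the data set holding `record_number`."""
    starts = [d.cumulative_records for d in dsits]
    i = bisect_right(starts, record_number) - 1
    if i < 0 or record_number >= starts[i] + dsits[i].records_in_ds:
        raise IndexError("record is not on the medium")
    return i


if __name__ == "__main__":
    tables = build_dsits([3, 2, 4])  # three data sets holding 9 records
    print(data_set_for_record(tables, 4))
```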
The read/write head 140 has a data read/write element to perform data writing to the medium 40 and data reading from the medium. The read/write head 140 according to the embodiment also has a servo reading element for reading a signal from a servo track provided on the medium 40. The positioning unit 160 instructs the travel of the read/write head 140 in the shorter side direction (width direction) of a cartridge 40. The motor driver 170 drives the motor 180.
The tape drive 60 writes data to the tape or reads data from the tape in accordance with the command received from the host 300. The tape drive 60 includes a buffer, a read/write channel, a head, a motor, reels on which a tape is wound, a read/write control, a head position control system, and a motor driver. The tape drive removably mounts the tape cartridge therein. The tape travels in the longitudinal direction along with the rotation of the reels. The head travels in the longitudinal direction of the tape to write data or read data from the tape. The tape cartridge 40 includes a non-contact nonvolatile memory called a cartridge memory (CM). The CM carried in the tape cartridge 40 is read and written by the tape drive 60 in a non-contact manner. The CM stores cartridge attributes. Upon reading and writing, the tape drive takes the cartridge attributes from the CM to enable the optimum reading and writing.
The control unit 150 controls the entire tape drive 60. In other words, the control unit 150 controls writing of data to the medium 40 and reading of data from the medium 40 in accordance with the command received at the interface. The control unit 150 also controls the positioning unit 160 according to a read servo track signal. Further, the control unit 150 controls the operation of the motor through the positioning unit 160 and the motor driver 170. Note that the motor driver 170 may be connected directly to the control unit 150.
The tape drive 60 loads the medium 40 on which a file to be updated is stored, reads the index from the medium, and copies the index to a cache 80. The cache 80 is an external storage device such as an HDD, an SSD, or a DRAM. The cache may also be a memory (DRAM) in the tape drive. The LTFS copies the index stored in the IP of the medium at the time when the tape drive loads the medium. The LTFS can refer to the index copied in advance to the cache 80 to determine whether the file is fragmented.
When File 1 having size L is recorded on the medium at first in
In order to quantify the deterioration of reading performance, the “time required for positioning” is calculated. The “time required for positioning” of the head of the tape drive to a data part on the medium is the reciprocating time between the position of a data part (non-change data part) that is not changed in the file by the update of the file and the position of a change data part in the file. Strictly speaking, the reciprocating time is the total of the seek time for positioning from the tail position of the non-change data part to the leading position of the change data part, and the seek time for positioning from the tail position of the change data part to the leading position of the next change data part. The non-change data parts of the file are continuously appended and written in the longitudinal direction of the medium, whereas the change data parts are fragmented on the medium. It takes a certain amount of time to position the head to a change data part because of the unnecessary travel distance and the back hitch caused by reversing the direction. When numerous parts are edited, the seek time becomes excessively long, which limits the improvement of reading performance. An embodiment to solve this problem will be described below.
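By way of a non-limiting illustration, the reciprocating time described above can be sketched in Python as the sum of two seek times. The seek model here is an assumption made only for this example: positions are expressed in meters along a single straight tape, the head locates at a constant hypothetical speed of 10 m/s, and a hypothetical back hitch penalty of 3 seconds is added whenever the seek moves backward. An actual drive would obtain these times differently.

```python
def seek_time_s(src_m, dst_m, locate_speed_m_s=10.0, back_hitch_s=3.0):
    """Seek time from `src_m` to `dst_m` under a simplified linear model.

    Assumptions: constant locate speed, and one back hitch whenever the
    destination lies before the source (direction reversal).
    """
    travel_s = abs(dst_m - src_m) / locate_speed_m_s
    penalty_s = back_hitch_s if dst_m < src_m else 0.0
    return travel_s + penalty_s


def reciprocating_time_s(non_change_tail_m, change_head_m,
                         change_tail_m, next_head_m, **model):
    """Time required for positioning for one change data part.

    Total of the seek from the tail of the non-change data part to the head
    of the change data part, and the seek from the tail of the change data
    part to the head of the part read next.
    """
    return (seek_time_s(non_change_tail_m, change_head_m, **model)
            + seek_time_s(change_tail_m, next_head_m, **model))


if __name__ == "__main__":
    # Non-change part ends at 100 m, change part occupies 500 m to 501 m,
    # and reading resumes at 120 m.
    print(reciprocating_time_s(100.0, 500.0, 501.0, 120.0))
```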
At step (620): It is checked whether the seek time for total positioning necessary to read the file is a certain time or longer. The “seek time for total positioning” necessary to read the edited file is a total value of respective “times required for positioning” of multiple change data parts. Here, it is assumed that the number of change data parts in the edited file is only one. In this case, it is determined whether the “seek time for positioning” from the position of a non-change data part to the position of the change data part in the file on the tape medium is more than or equal to a threshold value determined statically or dynamically. For example, when the seek time for positioning can be determined to be more than or equal to twice the time (threshold value) required for back hitch (two to five seconds), the change data part is selectively copied to the cache or the like (step 630).
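By way of a non-limiting illustration, the check of step 620 can be sketched in Python as follows. The function name `should_cache` and its parameters are hypothetical; the threshold of twice the back hitch time follows the example given above, and the back hitch time itself is passed in as an assumed value.

```python
def should_cache(positioning_times_s, back_hitch_s, factor=2.0):
    """Decide whether change data parts should be copied to the cache.

    `positioning_times_s` holds the time required for positioning for each
    change data part. Their total is compared with a threshold, taken here
    as `factor` times the back hitch time.
    """
    total_s = sum(positioning_times_s)
    threshold_s = factor * back_hitch_s
    return total_s >= threshold_s


if __name__ == "__main__":
    # One change data part, back hitch assumed to take 5 seconds.
    print(should_cache([12.0], back_hitch_s=5.0))
    print(should_cache([4.0], back_hitch_s=5.0))
```

A dynamically determined threshold could be modeled by passing a different `factor` or `back_hitch_s` for each medium or drive.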
At step (630): A part written at the update time is stored in the cache. The change data part stored in the IP is copied to the cache in advance. The position information on the cache to which the change data part is copied can be included in the index.
At step (640): The LTFS requests the tape drive to write the file to the tape medium. As in the conventional way, the change data part is appended and written to the end of data (EOD) in the DP of the medium. Further, in the present invention, the change data part is stored in the IP of the medium. Then, when the tape drive 60 mounts the medium 40 therein, the index and the change data part stored in the IP are copied to the cache 80 at the same time.
Note that the change data part may be stored in the cache 80 before being copied from the medium 40. The position information on the cache 80 to which the change data part is copied can be included as an extent in the index. Alternatively, the change data part can be stored in an HDD cache, rather than in the IP, at the file update time to omit the processing for reading file fragments from the IP in advance at the time of mounting the tape medium.
At step (720): It is checked whether a part of the file is in the cache. The LTFS can refer to the index to check whether the index and a change data part stored in the IP are stored in the cache 80. The check on the change data part in the cache may be made after the index and the change data part stored in the IP are copied to the cache at the time when the tape drive loads the medium. The index includes an extent that points to the position information on the cache to which the change data part is copied. Therefore, the index (the extent in the index) can be referred to in order to check whether the change data part exists in the cache. When the cache contains the change data part (YES), the procedure moves to processing in step 730. When the cache does not contain the change data part (NO), the procedure moves to processing in step 740.
At steps (730), (740): The part stored in the cache is read from the cache, rather than from the tape medium. The non-change data part is read sequentially through the buffer (step 740). The change data part is read from the external storage device (step 730). In step 740, the head is positioned to the non-change data part of the file on the medium, and the non-change data can be read sequentially. At the same time, in step 730, since the positioning and travel of the head on the medium to the change data part of the edited file are omitted, the seek time for the positioning of the change data part can be omitted. Specifically, the travel over a predetermined distance associated with the positioning from the non-change data part to the change data part of the edited file, the direction reversal accompanied by back hitch, and the deceleration/acceleration can all be eliminated. The LTFS refers to the index to eventually transfer, to the application, the non-change data part from the tape medium and the change data part from the cache sequentially as one edited file as a whole.
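By way of a non-limiting illustration, the reading procedure of steps 720 to 740 can be sketched in Python as follows. The tape is modeled as a byte string and the cache as a dictionary; the class `ReadExtent`, its fields, the key format, and the function `read_file` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReadExtent:
    """One part of a file as described by the index (hypothetical fields)."""
    file_offset: int                 # where the part belongs within the file
    length: int                      # size of the part in bytes
    tape_pos: int                    # position of the part on the modeled tape
    cache_key: Optional[str] = None  # set when the part was copied to the cache


def read_file(extents: List[ReadExtent], tape: bytes,
              cache: Dict[str, bytes]) -> bytes:
    """Assemble one edited file from tape and cache in file order."""
    out = bytearray()
    for e in sorted(extents, key=lambda x: x.file_offset):
        # Step 720: does the index point to a copy that is in the cache?
        if e.cache_key is not None and e.cache_key in cache:
            out += cache[e.cache_key][:e.length]            # step 730
        else:
            out += tape[e.tape_pos:e.tape_pos + e.length]   # step 740
    return bytes(out)


if __name__ == "__main__":
    # Original file "AAAABBBBCCCC"; the middle part was updated to "XXXX",
    # which was appended at tape position 16 and also copied to the cache.
    tape = b"AAAABBBBCCCC" + b"...." + b"XXXX"
    extents = [
        ReadExtent(0, 4, 0),
        ReadExtent(4, 4, 16, cache_key="file1:4"),
        ReadExtent(8, 4, 8),
    ]
    print(read_file(extents, tape, {"file1:4": b"XXXX"}))
```

In this sketch, the same file contents are returned when the cache is empty; the difference on an actual drive would be the seek to tape position 16 and back, which the cached copy avoids.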
As described above, the method of the aforementioned embodiment can store a change data part in an external cache to improve the reading performance of an updated file. While the present invention has been described with reference to the embodiment, the scope of the present invention is not limited to the aforementioned embodiment. It is obvious to those skilled in the art that various changes and adoption of alternative modes are possible without departing from the spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-201140 | Sep 2013 | JP | national |