The present application claims priority from Japanese application P2003-381610 filed on Nov. 11, 2003, the content of which is hereby incorporated by reference into this application.
The present invention relates to a file system which is excellent in providing redundancy and exhibits an excellent read performance.
For access to a disk drive mounted under an operating system (OS), a file system allows a data file to be divided into several blocks, written into a single volume, and read from the single volume on a block basis.
As a known technique for a file system ensuring redundancy of a data file, U.S. Pat. No. 5,724,500 discloses a method in which backup regions of volumes are each divided into a number of parts corresponding to a given number of cylinders provided in each multiwritten volume, pieces of data from the divided backup regions are inputted concurrently and in parallel, and the inputted pieces of data are outputted to a backup destination in the order of the cylinders through a buffer within a CPU, thereby remarkably reducing backup time for the multiwritten volumes.
Furthermore, there is known a technique in which, when a disk controller in a storage system receives a write request from a host computer, the disk controller issues the same write request to a mirror disk drive in the same pair (refer to JP 2003-157151 A).
In the above-mentioned related art, even when there are plural mirror volumes and backup volumes, only a single volume is mounted on a host side (or OS side). Therefore, there is no consideration for the amount of time necessary when reading a large-capacity file.
An embodiment of the present invention provides a file system capable of ensuring redundancy when writing and capable of reading in parallel from multiple volumes, thereby reading large-capacity data files in a short time.
An embodiment of the present invention is provided with a storage system including plural volumes, and a volume allocation table in which plural volumes are set for each directory. When a write command is received, a file is divided on a block basis, the same blocks are written to all the volumes that are set in the volume allocation table, and attributes of the file written into the volumes and block information about the divided blocks are stored in a file allocation table.
On the other hand, when a read command is received, a volume is determined from the volume allocation table based on a file directory, the number of blocks is obtained for each volume from a block reading table in which the numbers of blocks read out in a single reading operation from each volume are set, the number of blocks to be read is determined for each volume by referring to the volumes, the numbers of blocks, and the block information in the file allocation table, reading is performed from each volume for each determined number of blocks in parallel, and the plural read blocks are rearranged based on the block information in the file allocation table, and a file is assembled.
Therefore, the embodiment of the present invention enables high-speed file reading and writing processing.
Hereinafter, an embodiment of the present invention is explained based on the accompanying drawings.
The host computer 1 is provided with a CPU, a memory, a display device, a keyboard, a mouse, and the like, which are not shown in the diagram. An application 4 is executed on an operating system (OS) 5.
When the application 4 and the OS 5 access the storage system 3, the access is made to the storage system 3 via a file system 6. When a request to access the storage system 3 is received from the OS 5, the file system (“Fast Redundant File System” in the diagram) selects a volume and block to be accessed, as described below, and makes a request to the storage system 3.
The storage system 3 is provided with plural volumes #1 to #n. At each volume, a disk controller 31 controls the disk drive constituting the volume, to read from and write to the volume and the block requested by the file system 6 of the host computer 1 via the SAN 2. It should be noted that the storage system 3 saves the same data in plural volumes in order to give the data file redundancy.
The file system 6 is mainly constituted of a write control unit 7 that writes files divided into blocks into any of the independent volumes #1 to #n, a read control unit 8 for reading on a block basis in parallel from the plural volumes 30 (#1 to #n), and various tables 9 to 12 having set therein correspondences between a file and the volumes 30 and blocks. It should be noted that, in
First, the various tables 9 to 12 are explained.
For example, next to record #1 is information about volumes in a directory “/home” which the file system 6 mounts. Four volumes #1, #2, #3, #4 are mounted in the directory “/home”. When there are plural volumes under the mounted directory, a flag (YES or NO) is provided to distinguish which is the primary volume. The “YES” flag is set for the primary volume.
The primary volume corresponds to the independent volume 30 that the file system 6 accesses when writing the file, as described below. The mounted volumes other than the primary volume are secondary volumes, which replicate the content of the primary volume, as described below.
Furthermore, when the one volume 30 is mounted in the directory, it is not particularly necessary to set the primary volume flag. For example, only the one volume #10 is mounted under the mounted directory (Mounted directory) represented by “/tmp” in the diagram. In this case, the flag is maintained “NO”. Since the file system 6 includes one volume, this is treated as the primary volume.
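By way of illustration only, the volume allocation table described above can be pictured as a mapping from each mounted directory to its volumes and their primary flags. The following is a minimal sketch and not the layout shown in the diagram: the names `VOLUME_ALLOCATION_TABLE`, `primary_volume`, and `secondary_volumes` are hypothetical, and the volume #1 is assumed to be the primary volume of “/home”, in line with the writing example given later in this description.

```python
# Hypothetical in-memory form of the volume allocation table 9.
# Each mounted directory maps to a list of (volume number, primary flag).
VOLUME_ALLOCATION_TABLE = {
    "/home": [(1, "YES"), (2, "NO"), (3, "NO"), (4, "NO")],
    "/tmp": [(10, "NO")],  # single volume: the flag stays "NO"
}


def primary_volume(directory):
    """Return the primary volume number for a mounted directory."""
    entries = VOLUME_ALLOCATION_TABLE[directory]
    if len(entries) == 1:
        # A lone volume is treated as the primary regardless of its flag.
        return entries[0][0]
    for volume, flag in entries:
        if flag == "YES":
            return volume
    raise LookupError("no primary volume flagged for " + directory)


def secondary_volumes(directory):
    """Return the mounted volumes other than the primary volume."""
    primary = primary_volume(directory)
    return [v for v, _ in VOLUME_ALLOCATION_TABLE[directory] if v != primary]
```

In this sketch, `primary_volume("/home")` yields the volume #1 and `primary_volume("/tmp")` yields the volume #10, even though the flag of the latter is “NO”.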
In other words, the table holds, for each file (“Record #” in the diagram), the file's name (“Filename” in the diagram), updated date (“Date” in the diagram), access state, which indicates read or write (“Access state” in the diagram), access right (“Access right” in the diagram), list of block numbers of the volume used by the file, the directory to which the file belongs, etc. It should be noted.
The block reading table 11 is set with the number of blocks read from each volume in response to a single read command. For example, at volume #1 four blocks are read by a single read command, and at the volume #2 three blocks are read by a single read command.
It should be noted that the differences in the number of blocks between volumes are predetermined according to the reading speed and other performance of the disk drive constituting the volume.
Furthermore, a volume reading table 12 is created by the read control unit 8 when reading the file. This is explained below.
Next, explanation is given regarding the write control unit 7 and the read control unit 8.
First, when a write command is received from the application 4, the write control unit 7 refers to the above-mentioned volume allocation table of
When the disk controller 31 of the storage system 3 receives the write command from the write control unit 7 of the file system 6, the disk controller 31 writes the data into the designated primary volume, and also writes the same data into the secondary volume that corresponds to the primary volume, the primary volume and the secondary volume constituting a pair of volumes.
The designation of the secondary volume is notified to the storage system 3 by the write control unit 7 upon each write command, or the file system 6 notifies the storage system 3 of it at predetermined timing (e.g., when initialization is performed, or when the volume allocation table is updated).
Next, the write control unit 7 refers to the volume allocation table 9 shown in
Then, the write control unit 7 issues the write command to the storage system 3 to write data to the volume #1, and also transfers the data block (step S3 of
Accordingly, the disk controller 31 of the storage system 3 writes the data forwarded from the write control unit 7 into the volume #1 on a block basis, and writes the same data into the volumes #2, #3, and #4 which are the secondary volumes of the primary volume #1. For example, when a file in the directory “/home” is divided into blocks 1 to 10, the same data is written into the block numbers 1 to 10 of each of the volumes #1 to #4.
Then, when the writing ends, the disk controller 31 notifies the write control unit 7 that the writing has ended, and the write control unit 7 also notifies the application 4 that the writing has ended (step S4 of
In this way, writing is performed in block units to the primary volume obtained from the volume allocation table, based on the mounted directory, and the mirroring or replication function of the storage system 3 writes the replicas into the predetermined secondary volumes.
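The writing flow summarized above can be sketched as follows. This is a simplified model under stated assumptions rather than the actual implementation: `StorageSystemModel`, `split_into_blocks`, `write_file`, and the 4-byte `BLOCK_SIZE` are hypothetical names and values chosen only for illustration, and the replication performed by the storage system 3 is modeled as a plain loop over the secondary volumes.

```python
BLOCK_SIZE = 4  # bytes per block; an illustrative value only


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide file contents into fixed-length blocks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class StorageSystemModel:
    """Toy stand-in for the storage system 3 and its replication function."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        # volume number -> {block number: data}
        self.volumes = {v: {} for v in [primary, *self.secondaries]}

    def write_block(self, volume, block_number, data):
        """Write to the primary volume, then replicate to every secondary."""
        if volume != self.primary:
            raise ValueError("writes are issued to the primary volume only")
        self.volumes[self.primary][block_number] = data
        for secondary in self.secondaries:
            self.volumes[secondary][block_number] = data


def write_file(storage, data):
    """File-system side: divide the file and write it block by block."""
    blocks = split_into_blocks(data)
    for number, block in enumerate(blocks, start=1):
        storage.write_block(storage.primary, number, block)
    # Block numbers to be recorded in the file allocation table.
    return list(range(1, len(blocks) + 1))


storage = StorageSystemModel(primary=1, secondaries=[2, 3, 4])
block_numbers = write_file(storage, b"0123456789")
```

After the call, each of the four modeled volumes holds the same three blocks, and `block_numbers` corresponds to the list that would be stored for the file.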
Next, explanation is given regarding the read control unit 8.
In
When the host computer 1 has plural ports connected to the SAN 2, the block reading units 81 may be provided to each port. Preferably, the block reading units 81 and the ports are provided in association with the number of volumes that are read at a time. Alternatively, when there are a small number of physical ports connected to the SAN 2, logical ports may be provided in association with the number of volumes that are read at a time, and plural block reading units 81 may be provided to correspond to the logical ports.
As shown in the above-mentioned block reading table of
Therefore, the block assembling unit 80 rearranges the blocks that were read in parallel by the block reading units 81, and reassembles the original file, based on the volume numbers set in the above-mentioned volume allocation table 9 of
Next, referring to
Next, the volume number that was read out is added to the list under “Volume #” in the volume reading table shown in
Then, the list of block numbers where the file is stored is read from the record corresponding to the file name X for which the read command was issued from the file allocation table of
Here, “Volume #” in the volume reading table 12 is referred to in order to determine whether there are multiple volumes (step S15 of
Next, when there are plural blocks, each volume written under “Volume #” of
This assignment will be explained with respect to a case where, for example, a file “A” in “/home” is stored into the volumes #1 to #4, and the file allocation table 10 is notified that block numbers 1 to 40 are being used.
Regarding the volumes #1 to #4, the block reading table 11 of
In other words, since the volume #1 reads 4 blocks at a time, the block numbers 1 through 4 are written to update the list of block numbers in the volume reading table 12. Next, since the volume #2 reads 3 blocks at a time, the next 3 block numbers after those read by the volume #1, which are the block numbers 5 to 7, are written. Since the volume #3 reads 2 blocks at a time, the next 2 block numbers after those read by the volume #2, which are the block numbers 8 and 9, are written. Since the volume #4 reads 1 block at a time, the next block after those read by the volume #3, which is the block number 10, is written. Then, returning again to the volume #1 at the top of the volume reading table 12, the 4 blocks after the block number 10 read by the volume #4, which are the block numbers 11 to 14, are written to update the list of block numbers in the volume reading table 12. This processing continues to the last of the block numbers.
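The assignment just described amounts to a weighted round-robin over the volumes. A minimal sketch is given below, assuming that the block reading table is held as a mapping from volume number to blocks per read; the names `build_volume_reading_table` and `BLOCK_READING_TABLE` are hypothetical and do not appear in the diagrams.

```python
def build_volume_reading_table(block_numbers, blocks_per_read):
    """Assign block numbers to volumes in weighted round-robin order.

    block_numbers   -- the file's block list from the file allocation table
    blocks_per_read -- {volume: blocks per single read}, in table order
    """
    if sum(blocks_per_read.values()) <= 0:
        raise ValueError("at least one volume must read one or more blocks")
    table = {volume: [] for volume in blocks_per_read}
    position = 0
    while position < len(block_numbers):
        for volume, count in blocks_per_read.items():
            # Take the next `count` block numbers for this volume.
            chunk = block_numbers[position:position + count]
            table[volume].extend(chunk)
            position += len(chunk)
            if position >= len(block_numbers):
                break
    return table


# Values from the example: volumes #1 to #4 read 4, 3, 2, and 1 blocks at a time.
BLOCK_READING_TABLE = {1: 4, 2: 3, 3: 2, 4: 1}
reading_table = build_volume_reading_table(list(range(1, 41)), BLOCK_READING_TABLE)
```

For the block numbers 1 to 40, the volume #1 is assigned 1 to 4, 11 to 14, 21 to 24, and 31 to 34, while the volume #4 is assigned 10, 20, 30, and 40.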
Thus, as shown in
Next, based on the list of block number determined in the volume reading table 12 of
In
Then, the processing verifies whether the read command was actually executed (step S20 of
On the other hand, when the single reading was not complete at step S20 of
In the error trap processing, first, after waiting for a predetermined period for the reading to finish (steps S25, S26 of
The reading processing that was performed at steps S18 to S21 of
Therefore, when a single read command is executed, the block numbers 1 to 4 are read from the volume #1, the block numbers 5 to 7 are read from the volume #2, the block numbers 8 and 9 are read from the volume #3, and the block number 10 is read from the volume #4. In total, the block numbers 1 to 10 are read. Then, this reading is performed 4 times, thus completing the reading of the block numbers 1 to 40 that are set in the volume reading table 12 of
Then, when all the reading is complete, the block assembling unit 80 of
According to the reading processing described above, the requested files can be read from multiple volumes simultaneously (in parallel), each in multiple block units. This enables a large-capacity file to be read at high speed. For example, a large-capacity file such as a moving image file or a music file can be processed at high speed, and, in the rare event that a failure or the like occurs in one of the volumes of the storage system 3, the file can be read from the other normal volumes, thus providing a high-speed file system that is redundant.
Furthermore, since the number of blocks that each volume can read at a time can be set to a value reflecting the volume's reading performance (the disk drive's reading performance), the volumes do not all have to have the same performance. This produces an advantage that the existing storage system 3 can be utilized effectively, suppressing additional equipment investment while enabling adoption of a high-speed file system.
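For illustration, the parallel reading and the subsequent reassembly can be modeled with one worker per volume. The sketch below is an assumption-laden simplification: `read_blocks` stands in for a block reading unit 81, the final join stands in for the block assembling unit 80, the volumes are plain dictionaries holding mirrored copies, and only a single round of ten blocks is shown. All function and variable names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_blocks(volume_data, block_numbers):
    """Stand-in for one block reading unit: fetch the listed blocks."""
    return {number: volume_data[number] for number in block_numbers}


def read_file_parallel(volumes, reading_table, block_order):
    """Read each volume's assigned blocks in parallel, then reassemble.

    volumes       -- {volume: {block number: bytes}} (mirrored copies)
    reading_table -- {volume: [block numbers assigned to that volume]}
    block_order   -- the file's block list in its original order
    """
    collected = {}
    with ThreadPoolExecutor(max_workers=len(reading_table)) as pool:
        futures = [
            pool.submit(read_blocks, volumes[volume], numbers)
            for volume, numbers in reading_table.items()
        ]
        for future in futures:
            collected.update(future.result())
    # Block assembling step: restore the original block order.
    return b"".join(collected[number] for number in block_order)


blocks = {n: bytes([64 + n]) for n in range(1, 11)}  # blocks 1..10 hold "A".."J"
volumes = {v: dict(blocks) for v in (1, 2, 3, 4)}
reading_table = {1: [1, 2, 3, 4], 2: [5, 6, 7], 3: [8, 9], 4: [10]}
assembled = read_file_parallel(volumes, reading_table, list(range(1, 11)))
```

Because the reassembly is driven by the block order rather than by the order in which the workers finish, differences in completion time between the volumes do not affect the assembled result.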
As described above, the file system 6 of the present invention is provided with a volume allocation table 9 where all volumes used to write and read are mounted under each directory. As shown in
Next, when reading, any block can be read from any volume because all the volumes are mounted. The number of blocks that are read for a single read command is set for each volume beforehand (in the block reading table 11). From the file allocation table 10 and the block reading table 11, where the files' block numbers and the like are recorded, the volume reading table 12, which assigns the block numbers of the file that will be read in volume units, is created. Accordingly, as in Read 1 to 4 shown in
Furthermore, by providing the block assembling unit 80 which rearranges the data divided into blocks after the reading is finished, the differences between the read finish times of each of the volumes are absorbed, so that the file can be created from the divided blocks, maintaining both high speed and reliability.
Furthermore, when a failure occurs in any of the volumes that are being read, the assigned block numbers are reassigned to the other normal volumes, which ensures the redundancy of the data, and enables the realization of a file system possessing redundancy and high-speed reading performance.
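The reassignment on failure can be sketched as follows. How the failed volume's blocks are divided among the remaining volumes is not stated here, so handing them out one at a time in rotation is purely an assumption of this sketch; `reassign_failed_volume` is likewise a hypothetical name.

```python
def reassign_failed_volume(reading_table, failed_volume):
    """Move the failed volume's block numbers onto the remaining volumes.

    Blocks are handed out one at a time in rotation; this policy is an
    assumption made for the sketch, not something the description fixes.
    """
    survivors = [v for v in reading_table if v != failed_volume]
    if not survivors:
        raise RuntimeError("no normal volume left to read from")
    new_table = {v: list(reading_table[v]) for v in survivors}
    for index, block in enumerate(reading_table.get(failed_volume, [])):
        target = survivors[index % len(survivors)]
        new_table[target].append(block)
    return new_table


table = {1: [1, 2, 3, 4], 2: [5, 6, 7], 3: [8, 9], 4: [10]}
recovered = reassign_failed_volume(table, failed_volume=2)
```

Since every volume holds the same blocks, any surviving volume can serve the reassigned block numbers, and the full set of blocks 1 to 10 remains covered.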
It should be noted that when an application is reading a file and another application also tries to read the same file, the reading by the other application is allowed. On the other hand, when writing is being performed, another application can read the same file that is being written, but writing by another application is prohibited. For this purpose, whether an application's current access to the file is reading or writing can be judged based on whether the access state in the file allocation table 10 is “read” or “write”.
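The access rule stated above can be expressed as a small check against the access state. This is a sketch only: `may_access` is a hypothetical helper, an idle file is represented by `None`, and, because the description does not say whether a write may begin while the file is being read, the sketch allows it as an explicit assumption.

```python
def may_access(current_state, requested):
    """Decide whether a new access is allowed given the file's access state.

    current_state -- None (idle), "read", or "write", as held in the
                     "Access state" field of the file allocation table
    requested     -- "read" or "write"
    """
    if requested == "read":
        # Reading is allowed whether the file is idle, being read, or being written.
        return True
    if requested == "write":
        # A second writer is refused while a write is in progress.
        # Assumption: a write during a read is permitted (not stated in the text).
        return current_state != "write"
    raise ValueError("unknown access type: " + requested)
```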
When the write control unit 7 receives the write command for a file in “/home” from the application 4, the write control unit 7 first divides the file into plural blocks so that the blocks have given data lengths, and then by referring to the volume allocation table 9 of
Next, after the list of volume numbers (step S42 of
In this case, since it is not necessary to distinguish between primary and secondary volumes mounted under each directory, the processing load required for writing is reduced. Furthermore, since the writing is performed in volume units in parallel, plural ports connectable to each volume of the storage system 3 may be used similarly to the above-mentioned block reading units 81. It should be noted that writing is similar to the first embodiment.
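Writing the same blocks to every mounted volume in parallel, without a primary and secondary distinction, can be pictured as below. This is a minimal sketch assuming one worker per volume and dictionary-backed volumes; `write_volume` and `mirrored_write` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def write_volume(volume_store, blocks):
    """Write every block of the file into one volume."""
    for number, data in blocks.items():
        volume_store[number] = data
    return len(blocks)


def mirrored_write(volumes, blocks):
    """Write the same blocks to all mounted volumes, one worker per volume."""
    with ThreadPoolExecutor(max_workers=len(volumes)) as pool:
        futures = {
            volume: pool.submit(write_volume, store, blocks)
            for volume, store in volumes.items()
        }
        # Wait for every volume before reporting completion to the caller.
        return {volume: future.result() for volume, future in futures.items()}


volumes = {1: {}, 2: {}, 3: {}, 4: {}}
written = mirrored_write(volumes, {1: b"ab", 2: b"cd", 3: b"e"})
```

Completion is reported only after every volume has finished, so that all copies are identical before any subsequent parallel read.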
The foregoing descriptions illustrated the case where the write control unit 7 performs the mirroring by means of software. However, the storage system 3 may also perform the mirroring by means of hardware, although this is not shown in the diagrams.
It should be noted that, in the above-mentioned embodiment, the block reading table 11 of
When mounting the volumes under each directory, if the OS 5 has a GUI, the volume allocation table 9 can be operated from the GUI. Explanation is now given regarding a case where, for example, as shown in
First, the user inputs a desired directory name into the path input column 93. Then, the user clicks the mouse on the volume that the user wishes to select from the volume selection column 91. After that, a button 95 is clicked to move the selection to the determining column 92. Thus, the volume allocation table 9 can be set.
Furthermore, the number of blocks in the block reading table 11 may be set at the time the volume is being mounted. In this case, a pull-down list 94 shown in
After mounting of the volume and setting of the number of blocks have been finished, an OK button 96 can be clicked to reflect the settings in the volume allocation table 9 and the block reading table 11, as shown in
It should be noted that the embodiments described above illustrated an example in which the host computer 1 and the storage system 3 are connected by means of the SAN 2, but they may be connected by means of a LAN.
Furthermore, the embodiments described above illustrated an example in which the storage system 3 performs the replication of the primary volume written by the write control unit 7, and an example in which the write control unit 7 performs the mirroring by means of software. However, it is also possible for the storage system 3 to perform the mirroring of a RAID 1 or the like only when writing.
While the present invention has been described in detail and pictorially in the accompanying drawings, the present invention is not limited to such detail but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2003-381610 | Nov 2003 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5724500 | Shinmura et al. | Mar 1998 | A |
5745915 | Cooper et al. | Apr 1998 | A |
6073209 | Bergsten | Jun 2000 | A |
RE38410 | Hersch et al. | Jan 2004 | E |
6715054 | Yamamoto | Mar 2004 | B2 |
6728832 | Yamamoto et al. | Apr 2004 | B2 |
7024534 | Sasaki et al. | Apr 2006 | B2 |
20020010762 | Kodama | Jan 2002 | A1 |
Number | Date | Country |
---|---|---|
2003-157151 | May 2003 | JP |
Number | Date | Country |
---|---|---|
20050102484 A1 | May 2005 | US |