The present invention relates to an apparatus and method for managing a file in a distributed storage system (DSS), and more specifically, to an apparatus and method for managing a file in a distributed storage system, in which switching between an active file and an archive file is automatically performed by comprehensively considering a degree of aging, the number of connections, a modification state and the like of the file in the distributed storage system.
A distributed storage system or a parallel storage system is a storage system which virtualizes a plurality of storage devices as one storage device. Rather than storing one file in one storage device, such a distributed storage system duplicates the file and stores and uses it across a plurality of virtualized storage devices in a distributed manner.
Just as an existing Redundant Array of Inexpensive Disks (RAID) storage device integrates a plurality of hard disks into one storage device to construct a larger, faster and more stable storage device, the distributed storage system may provide the functions of a larger, faster and more stable storage system by configuring a plurality of storage devices into one storage device.
Such a distributed storage system technique is used as a core technique in cloud computing or the like. As the number of storage devices configuring the distributed storage system increases, the capacity and performance of the distributed storage system are proportionally enhanced, and cost-effectiveness in terms of the Total Cost of Ownership (TCO) is maximized. Therefore, the distributed storage system may provide high-level performance and expandability which cannot be provided by existing storage systems.
In relation to this,
Referring to
Meanwhile, in such a distributed storage system, a plurality of storage servers 110 is divided into active servers 111 and archive servers 112 in order to efficiently store files, and relatively aged files (data or contents) are stored in the archive servers 112 having a somewhat low performance, and thus limited storage media can be efficiently used.
However, since a method of managing a file according to a conventional technique divides files (data or contents) into active files and archive files simply based on age and backs up aged archive files into the archive servers 112 having relatively low performance, even the files consistently and frequently requested by clients, although an extended period of time has passed after being created, are stored in the archive servers, and thus system performance is degraded.
That is, in the conventional techniques, archive files are selected based only on a degree of aging, without any consideration of the number of current connections, a modification state or the like of the files, and thus even files that are consistently and frequently requested by the clients are stored in the archive servers. Furthermore, once a file is selected as an archive file and moved into an archive server, it is not automatically restored to an active file even if the file is later inquired frequently by the clients, and thus overall system performance and efficiency are degraded.
Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide an apparatus and method for managing a file, which is capable of efficiently managing files (data or contents) and economically managing disks in a distributed storage system.
Another object of the present invention is to provide an apparatus and method for managing a file, in which switching between an active file and an archive file is automatically performed by comprehensively considering the number of connections and a modification state, as well as a degree of aging, in a distributed storage system.
Still another object of the present invention is to provide an apparatus and method for managing a file, in which files are periodically relocated, and if the number of inquiries on a certain file increases and exceeds a predetermined level or the contents of the file are modified or changed, the file is automatically restored to an active file, thereby efficiently managing the file in a distributed storage system.
Still another object of the present invention is to provide an apparatus and method for managing a file, which is capable of efficiently implementing Information Lifecycle Management (ILM) of a Disk to Disk (D2D) level in a distributed storage system.
Still another object of the present invention is to provide a distributed storage system which efficiently uses the apparatus and method for managing a file described above.
To accomplish the above objects, according to one aspect of the present invention, there is provided a file management apparatus of a distributed storage system, the apparatus including: a retention time calculation unit for calculating a retention time of the file based on at least one of a current time, a file creation time, a file modification time and a recent file inquiry time; a file selection unit for selecting the file as an archive file if the retention time of the file is larger than a predetermined reference time; and a file management unit for relocating an original file and some or all of copy files of the file selected as an archive file from an active server to an archive server or from an active disk to an archive disk.
According to another aspect of the present invention, there is provided a distributed storage system including: a plurality of storage servers including an active server and an archive server for storing a file in a distributed manner; and a metadata server for managing metadata of the file, wherein the metadata server calculates a retention time of the file based on at least one of a current time, a file creation time, a file modification time and a recent file inquiry time, and relocates an original file and some or all of copy files of the corresponding file from the active server to the archive server if the retention time of the file is larger than a predetermined reference time.
According to still another aspect of the present invention, there is provided a distributed storage system including: at least a storage server including an active disk and an archive disk for storing a file in a distributed manner; and a metadata server for managing metadata of the file, wherein the metadata server calculates a retention time of the file based on at least one of a current time, a file creation time, a file modification time and a recent file inquiry time, and relocates an original file and some or all of copy files of the corresponding file from the active disk to the archive disk if the retention time of the file is larger than a predetermined reference time.
According to another aspect of the present invention, there is provided a file management method of a distributed storage system, the method including the steps of: calculating a retention time of the file based on at least one of a current time, a file creation time, a file modification time and a recent file inquiry time; selecting the file as an archive file if the retention time of the file is larger than a predetermined reference time; and relocating an original file and some or all of copy files of the file selected as an archive file from an active server to an archive server or from an active disk to an archive disk.
According to the present invention, since switching between an active file and an archive file is automatically performed by comprehensively considering the number of connections and a modification state, as well as a degree of aging, in a distributed storage system, efficient management of files and economic management of disks are enabled, and thus system performance and efficiency are improved.
In addition, according to the present invention, if the number of inquiries on a certain file relocated to an archive server increases and exceeds a predetermined level or the file is modified or changed in a distributed storage system, the file is automatically restored to an active server, and thus an efficient backup and restoration system can be constructed.
In addition, according to the present invention, since Information Lifecycle Management (ILM) of a Disk to Disk (D2D) level is efficiently implemented in a distributed storage system, old and less useful files are moved to a disk of a low cost, and thus overall cost of the entire system is reduced.
The preferred embodiments of the present invention will be hereafter described in detail, with reference to the accompanying drawings. Furthermore, in the drawings illustrating the embodiments of the present invention, elements having like functions will be denoted by like reference numerals and details thereon will not be repeated.
Before describing the present invention in detail, the Information Lifecycle Management (ILM) will be briefly described.
Generally, information (files, data and contents) has a lifecycle including creation, use, long-term storage, deletion and the like. The ILM manages the information according to a situation considering such an information lifecycle (i.e., considering the current stage of the information in the lifecycle). That is, the ILM efficiently manages gradually increasing data by using an optimum storage relevant to changes in the value of the information.
For example, recently created files are actively used in most cases, and tasks for modifying and inquiring the files are frequently generated. Therefore, it is preferable to broaden the bandwidth, increase the number of copy files, and store the files in a storage medium having good performance so that the files can be easily accessed. In comparison, the number of inquiries on aged information decreases, and modifications of the aged information rarely occur. Accordingly, such files do not need a broad bandwidth and are preferably stored in a storage medium having a large capacity with relatively low performance.
In this manner, when the utilization of certain information decreases, the cost of the storage system can be reduced by moving the information from an active disk to an archive disk, and such a method is referred to as a D2D backup. The present invention proposes a method of implementing a more efficient ILM at the D2D level, and particularly proposes a method of efficiently managing a file by comprehensively considering the number of connections and a modification state, so as to overcome the limitations of a conventional backup method which considers only a degree of aging of a file.
Referring to
Referring to
Describing additionally, the file management apparatus according to the present invention is configured as a separate apparatus or server in a distributed storage system (refer to
Although it is not shown in the figure, in the distributed storage system according to another embodiment of the present invention, the storage servers for storing files in a distributed manner may not be divided into active servers and archive servers, and each of the storage servers may be implemented to include an active disk and/or an archive disk.
In relation to this,
In addition,
Meanwhile,
Then,
Hereinafter, a file management apparatus and method in a distributed storage system according to the present invention will be described in detail with reference to
First, referring to
For example, the retention time calculation unit 241 and 321 may be implemented to calculate the first retention time by subtracting the file creation time or the file modification time from the current time in order to consider the time point when the file is created or modified, and to calculate the second retention time by subtracting the recent file inquiry time from the current time in order to consider the time point when the information was last inquired.
For reference, in the present invention, the file creation time, the file modification time and the recent file inquiry time, any of which is subtracted from the current time in order to calculate the file retention time, are referred to as a data time, and which of them is used can be set by a user or a manager. In this case, the file retention time can be defined as shown in mathematical expression 1.
File retention time=Current time−Data time [Mathematical expression 1]
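Mathematical expression 1 can be sketched, for illustration only, as follows. The function name `retention_time` and the timestamps are hypothetical and do not appear in the specification; the sketch merely shows the first retention time (based on the creation or modification time) and the second retention time (based on the recent inquiry time) being obtained from the same subtraction.

```python
from datetime import datetime, timedelta


def retention_time(current_time: datetime, data_time: datetime) -> timedelta:
    """File retention time = current time - data time (mathematical expression 1)."""
    return current_time - data_time


# Hypothetical timestamps, chosen only for illustration.
now = datetime(2010, 11, 4, 12, 0)
created = datetime(2010, 10, 1, 12, 0)       # file creation (or modification) time
last_inquiry = datetime(2010, 11, 1, 12, 0)  # recent file inquiry time

# First retention time: data time is the creation or modification time.
first_retention = retention_time(now, created)
# Second retention time: data time is the recent file inquiry time.
second_retention = retention_time(now, last_inquiry)

print(first_retention.days, second_retention.days)
```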
In addition, in the file management apparatus according to the present invention, the file selection unit 242 and 322 selects an active file and an archive file by comparing the file retention time calculated as described above with a predetermined reference time.
Specifically, the file selection unit 242 and 322 compares the first retention time obtained by subtracting the file creation time or the recent modification time from the current time with the reference time (refer to S720 of
In addition, the file selection unit 242 and 322 may compare the second retention time obtained by subtracting the recent file inquiry time from the current time with the reference time (refer to S740 of
Then, the file management unit 243 and 323 of the file management apparatus according to the present invention backs up the original file and some or all of copy files of the file selected as an archive file from an active server to an archive server or from an active disk to an archive disk depending on a result of the selection of the file selection unit 242 and 322.
In this case, the file management unit 243 and 323 backs up the original file and some of the copy files of the file selected as an archive file from an active server to an archive server or from an active disk to an archive disk if the first retention time is larger than the reference time and the second retention time is smaller than the reference time (a first stage backup) (refer to S750 of
Meanwhile, the multi stage backup described above may be performed by the setting of the user (manager) or automatically performed, and in this case, the number of backup files (N) may be set, for example, as shown in mathematical expression 2 in the first stage backup which backs up some of the files.
N=Ntotal*(offset_time_1/tmax) [Mathematical expression 2]
Here, Ntotal denotes the total number of the original and copy files, offset_time_1 denotes a value obtained by subtracting the reference time from the first retention time, and tmax denotes the value of offset_time_1 at the time when the value obtained by subtracting the reference time from the second retention time is 0.
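As a non-limiting sketch, the number of backup files of mathematical expression 2 may be computed as below. The function name `first_stage_backup_count` is hypothetical. The specification does not state how a fractional result is rounded or how values outside the range from 0 to tmax are handled, so rounding down and clamping the ratio to the range from 0 to 1 are assumptions made only for this example.

```python
import math


def first_stage_backup_count(n_total: int, offset_time_1: float, t_max: float) -> int:
    """N = Ntotal * (offset_time_1 / tmax), per mathematical expression 2.

    offset_time_1 and t_max must be given in the same time unit (e.g. days).
    """
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    ratio = offset_time_1 / t_max
    # Assumption (not in the specification): clamp the ratio to [0, 1] so the
    # result never exceeds the total number of original and copy files.
    ratio = min(max(ratio, 0.0), 1.0)
    # Assumption (not in the specification): round down to a whole file count.
    return math.floor(n_total * ratio)


# Hypothetical example: 4 files in total (original plus copies), offset of
# 10 days against a t_max of 20 days, so half of the files are backed up.
print(first_stage_backup_count(4, 10.0, 20.0))
```

Under these assumptions the number of files moved in the first stage backup grows in proportion to the offset time, from none when the first retention time has just reached the reference time up to all files when offset_time_1 reaches tmax.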
Then, if the present invention is implemented as described above, the retention time calculation unit 241 and 321 can be implemented to calculate an offset time offset_time in advance as shown in mathematical expression 3, and the file selection unit 242 and 322 can be implemented to select an active file and an archive file by determining whether the offset time is positive (+) or negative (−).
Offset time=(Current time−Data time)−Reference time [Mathematical expression 3]
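The sign-based selection of mathematical expression 3 can be illustrated by the following minimal sketch. The names `offset_time` and `select_file_type` and the 30-day reference time are hypothetical. Treating an offset time of exactly zero as active follows from the condition that a file is selected as an archive file when its retention time is larger than the reference time.

```python
from datetime import datetime, timedelta


def offset_time(current_time: datetime, data_time: datetime,
                reference_time: timedelta) -> timedelta:
    """Offset time = (current time - data time) - reference time (expression 3)."""
    return (current_time - data_time) - reference_time


def select_file_type(offset: timedelta) -> str:
    """A positive offset means the retention time is larger than the reference time."""
    return "archive" if offset > timedelta(0) else "active"


# Hypothetical values, chosen only for illustration.
now = datetime(2010, 11, 4)
reference = timedelta(days=30)

old_file = offset_time(now, datetime(2010, 9, 1), reference)    # positive offset
new_file = offset_time(now, datetime(2010, 10, 20), reference)  # negative offset

print(select_file_type(old_file), select_file_type(new_file))
```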
The reason why the backup is performed in two stages as described above in the present invention is as follows. The first case (refer to S750 of
In addition, according to a preferred embodiment of the present invention, the file management unit 243 and 323 can be implemented to back up files in units of files or chunks when the original file and some or all of the copy files of a file selected as an archive file are backed up.
Meanwhile, even after a file is selected as an archive file and the original file and some or all of the copy files of the corresponding file are backed up (relocated) to an archive server or an archive disk, management of these files is continued. If the number of inquiries on this file increases again, some or all of the backed-up files (the original and copy files) are restored to an active server or an active disk.
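The restoration condition may be sketched, under stated assumptions, as follows. The function name `should_restore`, the parameters `counting_period` and `inquiry_threshold`, and the concrete values are hypothetical. The sketch assumes that inquiries are counted within a single fixed counting period and that a file is restored when the count exceeds a predetermined level or when the file has been modified.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_restore(inquiry_times: Iterable[datetime],
                   period_start: datetime,
                   counting_period: timedelta,
                   inquiry_threshold: int,
                   modified: bool = False) -> bool:
    """Decide whether an archived file should be restored to an active server or disk."""
    period_end = period_start + counting_period
    # Count only the inquiries that fall inside the counting period.
    inquiries = sum(1 for t in inquiry_times if period_start <= t < period_end)
    # Restore if the file was modified or the inquiry count exceeds the level.
    return modified or inquiries > inquiry_threshold


# Hypothetical example: three inquiries within one day against a threshold of two.
start = datetime(2010, 11, 1)
hits = [start + timedelta(hours=h) for h in (1, 2, 3, 30)]
print(should_restore(hits, start, timedelta(days=1), inquiry_threshold=2))
```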
Specifically, the file selection unit 242 and 322 continuously observes the number of inquiries on this file selected as an archive file for a certain counting period (refer to S810 of
For reference,
That is, in the case of
Finally, the metadata management unit 324 and the storage device management unit 325 of
In short, the metadata management unit 324 creates and manages metadata of the files stored in a plurality of storage servers (active servers and archive servers) in a distributed manner, and the storage device management unit 325 manages information on performance and capacity of the plurality of storage servers. Accordingly, the file management unit 323 may manage the files more efficiently in association with the metadata management unit 324 and/or the storage device management unit 325.
Meanwhile, the method of managing a file in a distributed storage system according to the present invention may be embodied through a computer readable recording medium containing program commands for performing the operations implemented in a variety of computers. The computer readable medium may include program commands, data files, data structures and the like in a single or combined form. The recording medium may be a medium that is specially designed and configured for the present invention or a medium that is publicized and available for those skilled in the computer software art. Examples of the computer readable medium include magnetic media such as a hard disk, a floppy disk and a magnetic tape, optical media such as a CD-ROM and a DVD, magneto-optical media such as a floptical disk, and hardware devices specially configured to store and execute the program commands, such as ROM, RAM and flash memory. Examples of the program commands include high-level language codes that can be executed by a computer using an interpreter or the like, as well as machine codes such as those generated by a compiler.
While the present invention has been described with reference to the particular illustrative embodiments, it is not to be restricted by the embodiments but only by the appended claims. It is to be appreciated that those skilled in the art can change or modify the embodiments without departing from the scope and spirit of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2009-0106949 | Nov 2009 | KR | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/KR2010/007766 | 11/4/2010 | WO | 00 | 4/3/2012 |