This application is based upon and claims the benefit of priority from Japanese patent application No. 2007-137220, filed on May 23, 2007, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to file management systems of files stored in an external storage device such as a magnetic disc, in particular, to a synchronous management system of the files stored in a plurality of external storage devices.
2. Description of the Related Art
When clients such as personal computers share files through a network, a file management system that manages the disc space mounted on a file server of the network performs file management. In this case, the data is generally backed up by periodically making copies using an external storage device existing on the same network, in order to prevent loss of data on the external storage device and to prevent accesses from concentrating on a specific external storage device. Specifically, making copies so that the same data is held in a plurality of external storage devices distributes the access load on the same file across those devices and reduces the load on each. The risk of losing important data is reduced by copying the data onto an exchange medium such as tape, and storing the exchange medium in a safe place.
Japanese Laid-Open Patent Publication No. 2003-196136 (Patent document 1) discloses a backup system for realizing a backup operation in units of files or realizing difference backup of backing up only the updated files by using an external storage device mounted with the file management system for the backup of a network connected storage mounted with the file management system.
Japanese Laid-Open Patent Publication No. 2001-159997 (Patent document 2) discloses a method of suppressing the server access frequency, and reducing the file access or the load of the network, by holding update interval information of page data in a file management system that performs file input/output with HTTP (Hyper Text Transfer Protocol) of a web server and the like.
Japanese Laid-Open Patent Publication No. 2004-005092 (Patent document 3) discloses a storage system including a synchronization level management table for registering/managing synchronization levels for every information type, and a synchronization interval registration table for registering/managing synchronization time interval of the information on the synchronization level.
The related arts have the following problems.
In the file management system, the generation date and time, update date and time, owner, and other attributes of a logical collection called a file are managed, but the update frequency, usage mode, and time fluctuation of the update frequency of the data are not managed. Thus, in order to synchronize a file so that it constantly reflects the recent state, for example in order to take a backup, there is a need to frequently perform the backup operation itself, to constantly monitor the update state of the file, and the like. As a result, the load on the hardware of the external storage device etc. and on the network becomes large.
It is an exemplary object of the invention to provide a file management system etc. for reducing the load on the hardware due to file synchronization.
A file management system according to an exemplary aspect of the invention includes a time measurement unit for recording an update history of a file; an update interval calculation unit for calculating an update interval and a blank period of the file based on the update history and determining a synchronization time of the file based on the update interval and the blank period; and a file management unit for executing synchronization of the file stored in a plurality of storage media at the synchronization time.
The exemplary embodiments of the invention will now be described in detail with reference to the drawings.
The first exemplary embodiment includes a plurality of clients 4 which access the servers via the network 3. A case in which one master server and one slave server are arranged is shown in
The client 4 is an information processing device such as a personal computer (hereinafter written as “PC”) that has a function of connecting to the network 3, and a function of a client to use the file sharing service provided by the master server 1 and the slave server 2. Each client 4 requests input/output of a file to/from the master server 1 or the slave server 2, where in normal use, load distribution is achieved by arranging a plurality of slave servers 2. In such a case, the client 4 selects one of the plural slave servers 2, and then accesses the file on the relevant slave server 2. An IP (Internet Protocol) address resolution through a DNS (Domain Name System) server used on the Internet, for example, can be applied as a measure for the client 4 to select the slave server. Specifically, load distribution is achieved by having the DNS server, which has received the inquiry, return the IP address of the slave server 2 that is physically close to the client 4.
The method of realizing the network 3 is not limited herein. In addition to the IP base network used on the Internet, the present invention may also apply to an SAN (Storage Area Network) environment etc. using a fiber channel and the like. In such a case, in addition to the network 3 on the client 4 side, networks are inserted between the master server 1 and the external storage device 19, and between the slave server 2 and the external storage device 28, respectively. The external storage device is shared by each server. The present invention may also apply even when it is configured with a network dedicated to an independent external storage device. That is, the external storage devices 19, 28 are not limited to the ones being incorporated in the master server 1 or the slave server 2.
The configuration of the master server 1 will now be described using
The network interface 12 transmits and receives files and commands with the slave server 2 and the client 4.
The control unit 11 executes a process corresponding to the command the network interface 12 received from the slave server 2 or the client 4. Specifically, the control unit 11 interprets the command content of the input/output request received via the network interface 12. The control unit 11 then determines necessity of input/output of data according to the requested content, and sends a file input/output request to the file management unit 13 when determined that input/output of the actual data is necessary. The control unit 11 controls the update interval calculation unit 18 including a function of calculating the update interval of the file based on input/output history information of the file acquired from the file management unit 13. The control unit 11 records the update interval in the external storage device 19 via the file management unit 13, and sends the same to the slave server 2 in response to an information request from the network interface 12.
When the file input/output request is made from the client 4, the time measurement unit 14 records an update history indicating the relevant time.
The area management unit 15 manages the storage area of the external storage device 19.
The file management unit 13 performs arrangement management of the data on a disc. Specifically, the file management unit 13 calculates the recorded position etc. of the actual data using the area management unit 15. The time when the input/output request is made is also measured in the time measurement unit 14, and a history of input/output request for every file is created. The file management unit 13 records the created history in the update history storage unit 16, or records the created history as an update history list in the external storage device 19.
The input/output control unit 17 executes input/output of data with respect to the external storage device 19 based on instruction of the file management unit 13.
The update interval calculation unit 18 calculates, for every file, the update interval of a file and a period (hereinafter referred to as “blank period”) during which it can be assumed that write has not been made with respect to a certain file based on the history. The history on the files stored in the external storage device 28 of the slave server 2 is received from the slave server 2 via the network 3.
The external storage device 19 performs read and write of information with respect to a storage medium. A magnetic disc device, an optical disc device, a silicon disc device, and the like can be used as the external storage device. In the present invention, a case in which one part of a main storage device arranged in the master server 1 etc. is virtually used as the external storage device (e.g. RAM disc) is also encompassed within the concept of external storage device.
The configuration of the slave server 2 will now be described using
The network interface 22 transmits and receives files and commands with the master server 1 and the client 4.
The control unit 21 executes a process corresponding to the command the network interface 22 received from the master server 1 or the client 4. Specifically, the control unit 21 interprets the command content of the input/output request received via the network interface 22. The control unit 21 then determines necessity of input/output of data according to the requested content, and sends a file input/output request to the file management unit 23 when determined that input/output of the actual data is necessary. The control unit 21 acquires the update interval and the update time from the master server 1 via the network interface 22.
The time measurement unit 24 records the time when the file input/output request is made from the client 4. The area management unit 25 manages the usage state of the area of the external storage device 28.
The file management unit 23 performs arrangement management of data on a disc. Specifically, the file management unit 23 calculates the recorded position etc. of the actual data using the area management unit 25. The time when the input/output request is made is also measured in the time measurement unit 24, and a history of input/output request for every file is created. The file management unit 23 records the created history in the update history storage unit 26, or records the created history as an update history list in the external storage device 28. The input/output control unit 27 executes input/output of data with respect to the external storage device 28 based on instruction of the file management unit 23.
The file input/output operation in the present exemplary embodiment will be described using
The client 4 specifies a file on the distributed file management system connected to the network 3 and issues an input/output request. As the method of specifying the file, making an access based on identification information such as URL (Uniform Resource Locator) on the normal Internet is considered, but is not particularly limited to a specific form as long as the file can be specified. If a plurality of servers exists as in the present system, the identification information such as URL is provided by being converted to identification information of one specific server according to an appropriate rule when being converted to server identification information (hereinafter referred to as “host ID”) such as IP address of the host.
The client 4 selects the slave server 2 based on the identification information of the server, and issues the input/output request. Here, it is assumed that a plurality of slave servers 2 is usually prepared to distribute the load; the host ID of one slave server 2 is notified to the client 4, and the client 4 issues the input/output request to the relevant slave server 2.
After performing authentication regarding the necessity of access based on the user identification information (hereinafter referred to as “user ID”) or client identification information (hereinafter referred to as “client ID”) obtained from the client 4 via the network interface 22, the slave server 2 accepts the input/output request from the client 4. The input/output request from the client 4 is a request for input/output such as Read/Write in units of files.
The operation of the slave server 2 will now be described. In the slave server 2, the input/output request from the client 4 is received by the network interface 22, and such command is transmitted to the control unit 21. The control unit 21 performs synchronous management of the file according to the command content, and thereafter, executes input/output of the file on a local disc. The synchronous management algorithm of the file will be hereinafter described. In the control unit 21, the instruction of input/output of the file stored in the local disc is provided to the file management unit 23, and the input/output of data at the file position on the external storage device 28 is executed through the input/output control unit 27.
In the input/output operation, the update history information is generated including the time information measured in the time measurement unit 24 and the user ID or the client ID information for specifying the request issuing source of the client 4 in the file management unit 23, and managed in the update history storage unit 26 to leave the history of the requested content. The file management unit 23 records the history data in the external storage device 28 or the update history storage unit 26 as metadata information of the file management system along with an area management structure of the external storage device used by the file management system. The metadata of the file management system refers to a data structure that carries out area management of the files managed by the file management system.
“File ID” is information for the file management system to identify the file. “File name” is information for the user to identify the file. “Owner ID” is information indicating the user ID of the owner of the file. “File size” is information indicating the data amount of the file in units of bytes. “Dirty flag” is information indicating whether or not the relevant file is synchronized, where value “0” indicates being synchronized and value “1” indicates not being synchronized (Dirty). “Created date and time” is information indicating the date and the time the file is created. “File area list” is information indicating the storage area where the file is stored on the external storage device 28. “Recent update date and time” is information indicating the most recent date and time the update is performed on the relevant file. “Final synchronization date and time” is information indicating the most recent date and time the synchronization is performed on the relevant file. Each item described up to now is generally used in a file system, and not all of such items need to be included in the metadata in the implementation of the present invention. Further, it is also acceptable that items other than the above are included.
An update history pointer is information pointing to the position where the update history on the relevant file is stored. The value of “Addrl” and the like indicates the address of the memory, the block or the sector of the external storage device, or the like.
An update interval pointer is information pointing to the position where the update interval on the relevant file is stored. The value of “AddrA” and the like indicates the address of the memory, the block or the sector of the external storage device, or the like.
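The metadata items listed above can be pictured as a single record per file. The following is a minimal sketch only, assuming a Python dataclass; the field names, the (start block, length) form of the file area list, and the two helper methods are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class FileMetadata:
    """Hypothetical per-file metadata record mirroring the listed items."""
    file_id: int
    file_name: str
    owner_id: int
    file_size: int                      # data amount in bytes
    created: datetime                   # created date and time
    dirty: int = 0                      # 0 = synchronized, 1 = not synchronized (Dirty)
    # Assumed representation: (start block, length) pairs on the storage device.
    file_area_list: List[Tuple[int, int]] = field(default_factory=list)
    recent_update: Optional[datetime] = None
    final_sync: Optional[datetime] = None
    update_history_ptr: Optional[int] = None   # position of the update history
    update_interval_ptr: Optional[int] = None  # position of the update interval

    def mark_updated(self, when: datetime) -> None:
        """Record an update and flag the file as not synchronized."""
        self.recent_update = when
        self.dirty = 1

    def mark_synchronized(self, when: datetime) -> None:
        """Record a synchronization and clear the Dirty flag."""
        self.final_sync = when
        self.dirty = 0
```

In this sketch the Dirty flag is kept as the integer 0 or 1 to match the values described above, rather than as a boolean.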
The update history is to be sequentially added and becomes larger, but the size thereof merely needs to be held within a period necessary in processing of the update interval in the update interval calculation unit 18 in the master server 1. If the analysis of the data access cycle is set to a maximum of one week in the update interval calculation unit 18, the update history merely needs to be held within the relevant period. After the calculation process in the update interval calculation unit 18, implementation of appropriately deleting the update history and suppressing enlargement may be applied.
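Holding the update history only within the analysis period could look like the following sketch. It assumes that each history entry is a tuple whose first element is the request time, and that the retention period defaults to the one week mentioned above; the function name and the entry layout are hypothetical.

```python
from datetime import datetime, timedelta


def prune_history(history, now, retention=timedelta(weeks=1)):
    """Return only the entries that fall within the retention period.

    Each entry is assumed to be (request_time, client_id, operation).
    """
    cutoff = now - retention
    return [entry for entry in history if entry[0] >= cutoff]
```

Such pruning would be run after the update interval calculation, so that entries still needed for the analysis are not discarded.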
The file management unit 13 lists the update history as an access history as shown in an example of
The update history information is used in synchronous management in the file management system, and thus needs to be managed in a unified manner by the file management mechanism to guarantee consistency. Thus, data complying with the metadata such as update interval information obtained from the history management and the history thereof are also uniquely managed by the file management system. In the present exemplary embodiment, the update history information is uniquely readout using the update history pointer and the update interval pointer from the metadata managing the file as shown in
In the present exemplary embodiment, the properties of update on the file are managed as an update interval list as shown in
Other than the file management structure shown in
The update interval calculation unit 18 of the master server 1 makes an analysis on the update interval based on the access history on each slave server 2, and generates the resultant information as an update interval list shown in
In the master server 1, the update history list of the master server 1 can be corrected based on the history information in the slave server 2 by transmitting and receiving the metadata information of the file management system further including the update interval list and the update history list via the network interface 12. Consequently, with regards to the accesses made on the external storage device of the plurality of slave servers 2, the update history can be collected, and the properties thereof can be analyzed in the update interval calculation unit 18.
The update history list and the update interval list are recorded on a disc in the master server 1 and the slave server 2 as data referenced from the metadata of the file management system, as described above. The consistency of the data is ensured by once tallying the information measured in the slave server 2 in the master server 1 and distributing the calculation result to the slave server 2.
The synchronization between the master server 1 and the slave server 2 using the update interval list will now be described using
First, a flowchart of file input/output including synchronous management of the master server 1 is shown in
First, the type of event is determined, and whether the event is the one based on time interval is determined (S101). If it is the event at the data update time, mutual copying is executed with the slave server 2 regarding the file registered in the update directory as the data to be updated, and synchronization of data is executed (S102, YES in determination of S101). With respect to the event based on the time interval, the data to be updated is registered and managed in the update directory organized according to update time to manage the timing at which each file is to be updated based on the update interval list, and the synchronization operation is executed sequentially at the time of update time event. The update directory will be hereinafter described in detail.
If the event is the one based on a command, and which is other than the update time event (NO in determination of S101), the type of command is sequentially determined. First, whether or not the event is either the data update notification of a specific file or the Dirty flag set request is determined (S103). If the event is either of them, the Dirty flag is set in the metadata (S104), and the command processing content is recorded in the update notification list of the data (S106).
If the event is neither the data update notification nor the Dirty flag set request, whether the event is notification of access history such as reading of data is determined (S105). If so, only registration to the update history is executed (S106). If the event is not the notification of access history, whether the event is the request to clear the Dirty flag of the metadata is determined (S107). If not, the process is terminated, and if so, the synchronization process of the data content is performed with the slave server 2 only on the relevant file (S108), and then the Dirty flag of the metadata is cleared (S109).
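The branching of steps S101 to S109 can be summarized in a short dispatcher. This is only a sketch under stated assumptions: events are plain dictionaries, the update directory maps an update time to a list of file IDs, and `sync_file` is a hypothetical callback that stands in for the copy operation with the slave server 2.

```python
def handle_master_event(event, metadata, history, update_directory, sync_file):
    """Dispatch one event on the master server (sketch of S101 to S109)."""
    kind = event["kind"]
    if kind == "update_time":                        # S101: time-based event
        for file_id in update_directory.get(event["time"], []):
            sync_file(file_id)                       # S102: synchronize registered files
        return
    file_id = event.get("file_id")
    if kind in ("data_update", "set_dirty"):         # S103
        metadata[file_id]["dirty"] = 1               # S104: set the Dirty flag
        history.append(event)                        # S106: record the command
    elif kind == "access_history":                   # S105
        history.append(event)                        # S106: register history only
    elif kind == "clear_dirty":                      # S107
        sync_file(file_id)                           # S108: synchronize this file only
        metadata[file_id]["dirty"] = 0               # S109: clear the Dirty flag
    # Any other event terminates without further processing.
```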
A flowchart of file input/output including synchronous management of the slave server 2 is shown in
When determined that the command is the data write request command (YES in determination of S204), the content of the update flag of the relevant file is inquired of the master server 1 (S205), and the presence of the Dirty flag is checked (S206). If the flag is set, the request for clear is issued to the master server 1 (S207). If the Dirty flag clear in the master server 1 is not successful, an error process such as notifying an error to the client 4 is performed (S209), but if it is successful, the write request from the client 4 is executed on the file of the local disc (S210), and the setting of the Dirty flag is again requested to the master server 1 (S211). In cases of command processes other than the data readout request and the data write request (NO in determination of S204), the process is executed in accordance with each command (S212), and the file input/output process and the related process are completed.
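The write path of steps S205 to S211 might be sketched as follows. The `InMemoryMaster` class is a hypothetical stand-in for the requests sent to the master server 1 over the network, and the local disc is modeled as a dictionary. The sketch also assumes that the write proceeds directly when the Dirty flag is not set, which the description implies but does not state.

```python
class InMemoryMaster:
    """Hypothetical stand-in for Dirty flag requests to the master server."""

    def __init__(self, clear_succeeds=True):
        self.dirty = {}
        self.clear_succeeds = clear_succeeds

    def is_dirty(self, file_id):
        return self.dirty.get(file_id, 0) == 1

    def clear_dirty(self, file_id):
        # On the real master this would trigger synchronization first.
        if not self.clear_succeeds:
            return False
        self.dirty[file_id] = 0
        return True

    def set_dirty(self, file_id):
        self.dirty[file_id] = 1


def handle_write(master, local_disc, file_id, data):
    """Handle a client write request on the slave server (sketch)."""
    if master.is_dirty(file_id):                 # S205, S206: inquire and check flag
        if not master.clear_dirty(file_id):      # S207: request clear
            return "error"                       # S209: notify error to the client
    local_disc[file_id] = data                   # S210: write to the local disc
    master.set_dirty(file_id)                    # S211: request Dirty flag set again
    return "ok"
```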
A method of determining the update interval of the update interval calculation unit 18 will be described using
First, a simple example is shown in
In the case of file access in which update of data occurs periodically, the write frequency distribution of the next period can be estimated from the update interval obtained from the write frequency distributions of a plurality of times, and the synchronization time at the beginning of the blank period. The time after a lapse of the update interval from the synchronization time can be set as the next scheduled time for synchronization.
If the update process of the data is executed based on a determined processing routine, the processing content of the write access is configured by the write process of substantially a constant number of times and sizes. In this case, the time necessary for the individual data update process including a plurality of writes and readouts can be relatively easily estimated. That is, when the rewrite operation is increased, the duration period is estimated as follows based on the update interval and the blank period in the update frequency distribution:
(write duration period)=(update interval)−(blank period)
Thus, based on such period, the access converging time can be estimated at the time of write occurrence. For instance, at the time point when the rewrite access of the data starts to occur in the write frequency distribution, estimation can be made that the write access is to be converged after the above described write duration period, the next execution for synchronization can be scheduled at the relevant time. As a result, the synchronization operation can be effectively executed at the time point the rewrite access is settled.
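As an illustration of the relation above, the following sketch estimates the update interval, the blank period, and the write duration period from a list of write times, and then schedules the next synchronization. The burst-splitting rule (`gap_threshold`), the use of simple averages, and all function names are assumptions made for this example; the specification does not prescribe a particular estimation method.

```python
def split_bursts(times, gap_threshold):
    """Group write times into bursts separated by gaps longer than gap_threshold."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][1] <= gap_threshold:
            bursts[-1][1] = t          # extend the current burst
        else:
            bursts.append([t, t])      # start a new burst [start, end]
    return bursts


def estimate_schedule(times, gap_threshold):
    """Estimate update interval, blank period, write duration, and next sync time."""
    bursts = split_bursts(times, gap_threshold)
    if len(bursts) < 2:
        return None                    # not enough history to estimate a period
    starts = [b[0] for b in bursts]
    intervals = [b - a for a, b in zip(starts, starts[1:])]
    blanks = [nxt[0] - cur[1] for cur, nxt in zip(bursts, bursts[1:])]
    update_interval = sum(intervals) / len(intervals)
    blank_period = sum(blanks) / len(blanks)
    # (write duration period) = (update interval) - (blank period)
    write_duration = update_interval - blank_period
    last_sync = bursts[-1][1]          # beginning of the latest blank period
    return {
        "update_interval": update_interval,
        "blank_period": blank_period,
        "write_duration": write_duration,
        "next_sync": last_sync + update_interval,
    }
```

For example, for writes at times 0, 5, 10, 100, 105, 110, 200, 205, and 210 with a gap threshold of 20, the sketch yields an update interval of 100, a blank period of 90, a write duration period of 10, and a next synchronization time of 310.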
An update example of the file involving write from a plurality of clients is shown in
The update properties of each file obtained in the update interval calculation unit 18 described above are held in the external storage device 19 in a form of the update interval list shown in
The structure of the update directory will be described using
Specifically, the list of
The effects of the present exemplary embodiment will be described. If synchronization of data is performed through the network 3 every time the file stored in the storage device 28 of the slave server is updated, the load on the network 3 becomes larger. For instance, data is to be synchronized again even when the relevant file is updated immediately after synchronization is performed, and thus network communication for, in the worst case, the number of updates becomes necessary. Actually, however, it may be sufficient in many cases to update the data after the update of the data is completely finished (see
According to the present exemplary embodiment, the update interval calculation unit 18 calculates the blank period and the update interval based on the access history on the file, and based on such information, determines the time closest to the beginning of the blank period as the update time while avoiding the period in which update is frequently performed. The file management unit 13 instructs the execution of synchronization at such time to the input/output control unit. Thus, the load on the network 3 in synchronizing the data can be effectively reduced.
In the present exemplary embodiment, the synchronization time is determined based on the update history as described above. The update interval calculation unit 18 records, in the update directory, the files to be synchronized at each synchronization time, and thereby specifies them. The file management unit 13 references the update directory when receiving notification of arrival of the update time, acquires the file ID of the file to be synchronized, and executes the update. That is, the master server 1 does not need to check the update state of the normal directory etc. at a timing when synchronization is unnecessary.
Thus, the load and the power consumption on the external storage device 19 can be reduced as a result.
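The role of the update directory, namely listing the files to be synchronized at each synchronization time so that nothing else has to be scanned, could be sketched as below. The class and method names are hypothetical, and times are represented as plain comparable values for simplicity.

```python
from collections import defaultdict


class UpdateDirectory:
    """Hypothetical update directory: file IDs organized by synchronization time."""

    def __init__(self):
        self._by_time = defaultdict(list)

    def register(self, sync_time, file_id):
        """Register a file to be synchronized at the given time."""
        self._by_time[sync_time].append(file_id)

    def pop_due(self, now):
        """Remove and return the file IDs whose synchronization time has arrived."""
        due = []
        for t in sorted(k for k in self._by_time if k <= now):
            due.extend(self._by_time.pop(t))
        return due
```

Only the entries returned by `pop_due` are touched when an update time event arrives, which corresponds to the point that the normal directory need not be checked when synchronization is unnecessary.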
A system for performing management of synchronization based only on the update interval is effective in the web server and the like in which the files are periodically updated. However, it has been difficult to manage the synchronization timing based only on the update interval in accesses in block units in which one part of the file is sequentially updated as in the database file.
In the present exemplary embodiment, the update interval calculation unit 18 predicts the zone (blank period) in which access is not made based on the access history, and calculates the synchronization time. Thus, the update timing of the data can be effectively generated even for file access in which updates in block units frequently occur, and low-load distributed file management can be realized in a general file service other than the web service.
As an exemplary advantage according to the invention, the load on the hardware due to file synchronization can be reduced.
A second exemplary embodiment of the present invention will now be described. The second exemplary embodiment relates to a distributed file management system, similar to the first exemplary embodiment. The overall configuration and the configuration of the master server 1 and slave server 2 are respectively the same as shown in
The operation of the second exemplary embodiment will now be described with reference to the flowcharts of
The second exemplary embodiment differs from the first exemplary embodiment in that synchronization of data is executed by the slave server 2.
First,
Since the event of time does not occur, the process shown in the flowchart of
First, whether the event is the data update notification of a specific file or the Dirty flag set request is determined (S301), where if so, the Dirty flag is set in the metadata (S302), and the command processing content is recorded in the update notification list of the data (S304). In the determination of the command type, whether the event is notification of access history such as reading of data is determined (S303), where if so, registration to the update history is only executed (S304). Similarly, whether the event is the request to clear the Dirty flag of the metadata is determined (S305), where if not, the process is terminated, but if so, the synchronization process of the data content is performed with the slave server 2 only on the relevant file (S306), and then the Dirty flag of the metadata is cleared (S307).
A flowchart of file input/output including synchronous management of the slave server 2 is shown in
First, the update directory is acquired from the master server 1, and determination on the necessity of the synchronization operation is performed based upon the content (S400). Similar to the case of the master server 1 in the first exemplary embodiment, the necessity of synchronization process includes determining whether the process for performing the synchronization operation of each file at the event occurrence time is registered in the update directory, and synchronizing the data with the master server 1 if the process is registered. In this case, execution is made only on the files to which the slave server 2 pertains.
Determination is made on whether the command is a data readout request command (S403), and if the command is the data readout command, the access history of readout and occurrence of the readout event is notified to the master server (S404). Subsequently, readout is executed on the copied files of the external storage device 28 of the slave server 2.
When determined that the command is the data write request command (S406), the content of the update flag of the relevant file is inquired to the master server 1 (S407), and the presence of the Dirty flag is checked (S408). If the flag is set, the request for clear is issued to the master server 1 (S409). If the dirty flag clear in the master server 1 is not successful, error process such as notifying error to the client 4 is performed (S411), but if it is successful, the write request from the client 4 is executed on the file of the local disc (S412), and the setting of the Dirty flag is again requested to the master server 1. In cases of command processes other than the above, the process is executed for each command (S414), and the file input/output process and the related process are completed.
In the second exemplary embodiment, a case of performing file synchronization between the master server 1 and the slave server 2 has been described, but file synchronization may be performed between the master server 1 and the client 4. In this case, the client 4 includes components similar to the network interface 22, the control unit 21, the file management unit 23, the time measurement unit 24, the area management unit 25, the update history storage unit 26, and the external storage device 28 of
Effects similar to the first exemplary embodiment are also obtained with the second exemplary embodiment.
Managing the update time on the slave server 2 side prevents the processing load of managing the synchronization timing from concentrating on the master server 1 side, so the load can be distributed.
A third exemplary embodiment of the present invention will now be described. In the first and the second exemplary embodiments, file synchronization is performed between two or more devices through the network, but in the third exemplary embodiment, synchronization is performed between two storage media in one device. Assume a case where periodic processing of a file is necessary in a stand alone device such as a PC. In such device, the data on the disc is referenced and the updated data is moved or copied when checking for improper data such as computer virus in the data, or when backing up data in the PC. Here, an exemplary embodiment of taking backups of the file stored in the external storage device connected to the PC in an exchange storage medium will be described.
The PC includes a PC controller 30 for managing the entire file input/output control, a first external storage device 39 for storing the file, a second external storage device 40, and an instructing means 42. The PC controller 30 includes a control unit 31, a file management unit 32, an input/output control unit 33, an I/O interface 34, a time measurement unit 35, an area management unit 36, an update history storage unit 37, and an update interval calculation unit 38, and has a function similar to the master controller 10 of
The control unit 31 executes the process corresponding to the command input by the instructing means 42. Specifically, the control unit 31 interprets the command content of the input/output request made through the instructing means 42. The control unit 31 determines the necessity of input/output of data according to the request content, and makes a file input/output request to the file management unit 32 when determining that the input/output of the data is actually necessary.
The control unit 31 also controls the update interval calculation unit 38, which has a function of determining the update interval of a file based on the input/output history information of the file acquired from the file management unit 32. The control unit 31 records the update interval in the first external storage device 39 via the file management unit 32.
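The formula by which the update interval is derived from the history is not given in this passage. The following is a minimal sketch, assuming the update interval is taken as the mean gap between consecutive recorded update times of one file. The function name `update_interval` and the representation of the history as a list of timestamps are hypothetical and serve only as illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def update_interval(history: List[datetime]) -> Optional[timedelta]:
    """Return the mean gap between consecutive update times of one file.

    Assumption: the mean gap stands in for the "update interval"; the
    actual calculation of the update interval calculation unit may differ.
    """
    if len(history) < 2:
        return None  # fewer than two updates: no interval can be estimated
    times = sorted(history)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical history of one file: updates on May 1, May 2, and May 4.
example_history = [
    datetime(2007, 5, 1),
    datetime(2007, 5, 2),
    datetime(2007, 5, 4),
]
example_interval = update_interval(example_history)
```

With the hypothetical history above, the gaps are one day and two days, so the sketch yields an interval of one and a half days.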
Furthermore, when making a backup of the data recorded on the first external storage device 39, the control unit 31 determines the necessity of update of the data based on the update directory of
When the input/output request of the file is made from the instructing means 42, the time measurement unit 35 records the update history indicating the relevant time.
The area management unit 36 manages the storage area of the first external storage device 39.
The file management unit 32 performs arrangement management of the data of the first external storage device 39. Specifically, the file management unit 32 calculates the recorded position and the like of the actual data using the area management unit 36. Furthermore, the time at which each input/output request is made is measured by the time measurement unit 35, and a history of input/output requests is created for every file. The file management unit 32 records the created history in the update history storage unit 37 or records the created history as an update history list in the first external storage device 39.
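The per-file history described above can be sketched as follows. This is an illustration under stated assumptions, not the structure used by the file management unit 32: the class name `UpdateHistory`, its methods, and the flat list form of the update history list are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Tuple


class UpdateHistory:
    """Hypothetical in-memory history of input/output requests per file."""

    def __init__(self) -> None:
        # Maps a file path to the times at which requests were made for it.
        self._entries: DefaultDict[str, List[datetime]] = defaultdict(list)

    def record(self, path: str, when: datetime) -> None:
        """Record that an input/output request for `path` was made at `when`."""
        self._entries[path].append(when)

    def times_for(self, path: str) -> List[datetime]:
        """Return the recorded request times of one file in ascending order."""
        return sorted(self._entries.get(path, []))

    def as_list(self) -> List[Tuple[str, datetime]]:
        """Flatten the history into (path, time) pairs, ordered by path and time.

        Assumption: a flat list of this form stands in for the update
        history list recorded in the first external storage device.
        """
        return [
            (path, when)
            for path in sorted(self._entries)
            for when in sorted(self._entries[path])
        ]


history = UpdateHistory()
history.record("report.txt", datetime(2007, 5, 2, 9, 0))
history.record("notes.txt", datetime(2007, 5, 1, 8, 0))
history.record("report.txt", datetime(2007, 5, 1, 17, 30))
```

Keeping the times grouped by file makes it straightforward to derive a per-file update interval later, since each file's timestamps can be read out in order.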
The input/output control unit 33 executes input/output of data with respect to the first external storage device 39 based on the instruction of the file management unit 32.
The update interval calculation unit 38 calculates the update interval and the blank period of the file for every file stored in the first external storage device 39 based on the history. Similar to the synchronization interval calculation unit 18 of the first exemplary embodiment, the update interval calculation unit 38 creates the update directory (see
The first external storage device 39 is a magnetic disc device for example, and performs read and write of information with respect to the storage medium.
The second external storage device 40 is an optical disc device for example, and performs read and write with respect to the exchange storage medium 41.
The exchange storage medium 41 is a so-called removable medium, and is used by being set in the second external storage device 40 when performing input/output of data. A CD-RW (Compact Disc-Rewritable), a DVD-RW (Digital Versatile Disc-Rewritable), an MO (Magneto-Optical Disc), and the like can be used as the exchange storage medium 41.
The instructing means 42 is an input device such as a mouse or a keyboard, and the user operates the instructing means 42 to give instructions to the PC of the present exemplary embodiment.
The operation of the present exemplary embodiment will now be described using the flowchart of
The PC controls input/output of files based on external instructions, and also accepts backup process requests. Thus, the input/output operation includes determining whether or not the command is a backup command (S501), and executing a normal file input/output operation if the command is determined not to be a backup command (S504).
If the command is determined to be a backup command, the update directory is referenced to determine whether or not the relevant file is registered in the update directory (S502). If the file is registered, the backup is executed (S503). If the file is not registered, no processing is executed.
As described above, determining the necessity of backing up a file based on the update directory limits the number of writes to the exchange storage medium 41 to the requisite minimum, and thereby suppresses wear of the exchange storage medium 41.
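The branching of steps S501 to S504 can be sketched as follows. This is a minimal illustration, assuming the update directory can be modeled as a set of registered file paths and that the backup and normal input/output operations are supplied as callbacks. The function `handle_command`, its parameters, and the returned labels are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def handle_command(
    command: str,
    path: str,
    update_directory: Set[str],
    do_backup: Callable[[str], None],
    do_file_io: Callable[[str], None],
) -> str:
    """Dispatch one command following steps S501 to S504."""
    if command != "backup":           # S501: not a backup command
        do_file_io(path)              # S504: normal file input/output
        return "file_io"
    if path in update_directory:      # S502: is the file registered?
        do_backup(path)               # S503: execute the backup
        return "backup"
    return "skipped"                  # not registered: no processing


# Hypothetical usage: only "a.txt" is registered in the update directory.
performed: List[Tuple[str, str]] = []
registered = {"a.txt"}


def record_backup(p: str) -> None:
    performed.append(("backup", p))


def record_io(p: str) -> None:
    performed.append(("io", p))


results = [
    handle_command("backup", "a.txt", registered, record_backup, record_io),
    handle_command("backup", "b.txt", registered, record_backup, record_io),
    handle_command("read", "b.txt", registered, record_backup, record_io),
]
```

In this usage, the backup request for the unregistered file `b.txt` produces no write at all, which is the case that spares the exchange storage medium.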
While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
Number | Date | Country | Kind |
---|---|---|---|
2007-137220 | May 2007 | JP | national |