1. Technical Field
Embodiments of the present disclosure relate to file management systems and methods, and more particularly to a distributed storage system and a file synchronization method.
2. Description of Related Art
File synchronization is required by a distributed storage system. In one synchronization mechanism, a metadata server may be used to maintain all files stored within the distributed storage system. If a file stored in the distributed storage system is deleted or corrupted, the metadata server replaces or repairs the file using data stored in the metadata. This synchronization mechanism can repair destroyed files in a short time. However, as the number of files stored within the distributed storage system increases, the data stored in the metadata also increases, which may decrease the synchronization speed and increase the likelihood of errors concerning data in the metadata server.
The present disclosure, including the accompanying drawings, is illustrated by way of examples and not by way of limitation. It should be noted that references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean “at least one.”
In general, the word “module”, as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions written in a programming language. One or more software instructions in the modules may be embedded in firmware, such as in an erasable programmable read only memory (EPROM). The modules described herein may be implemented as software and/or hardware modules and may be stored in any type of non-transitory computer-readable medium or other storage device. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
The file synchronization system 50 designates different storage paths to the same file, and stores the same file into different storage units in the distributed storage system 100 according to the designated storage paths. For example, a file A may be stored into the storage units 20, 30, and 40 as files 21, 31, and 41 respectively. The file synchronization system 50 further creates a system log 11 in the access entry 10 and creates a unit log in each storage unit, such as a unit log 22 in the storage unit 20, a unit log 32 in the storage unit 30, and a unit log 42 in the storage unit 40. The system log 11 records information of all files stored in the distributed storage system 100, and the unit log in each storage unit records information of all files stored in the storage unit. For example, the unit log 22 in the storage unit 20 records information of all files stored in the storage unit 20.
When a file (such as the file 21) stored in a first storage unit (such as the storage unit 20) is lost, destroyed, or corrupted, the file synchronization system 50 determines the file to be repaired (such as the file 21) according to information stored within the system log 11 and the unit log of the storage unit, determines a second storage unit (such as the storage unit 30) that stores the same file (such as the file 31), and repairs the file to be repaired (such as the file 21) by copying the same file (such as the file 31) from the second storage unit to the first storage unit.
In step S301, the access entry 10 receives a file sent from the client 200. For example, the file with the name of “volume1” is received.
In step S303, the setting module 51 designates multiple storage paths to the file in the distributed storage system 100. For example, three storage paths “szunit01,” “szunit02,” and “szunit03” may be designated to the file “volume1.”
In step S305, the storing module 52 stores the file into one or more storage units corresponding to the multiple storage paths in the distributed storage system 100. For example, if the storage paths “szunit01,” “szunit02,” and “szunit03” respectively correspond to the storage units 20, 30, and 40, the file “volume1” is stored into the storage units 20, 30, and 40 as file 21, file 31, and file 41 respectively.
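Steps S303 and S305 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each storage path maps to a directory on a local file system; the function names `designate_paths` and `store_file` and the temporary root directory are hypothetical and do not appear in the disclosure.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def designate_paths(root: Path, unit_names: list[str]) -> list[Path]:
    """Map each storage path name to a directory under root (assumed layout)."""
    return [root / name for name in unit_names]


def store_file(name: str, data: bytes, unit_dirs: list[Path]) -> list[Path]:
    """Write one copy of the file into every designated storage unit."""
    stored = []
    for unit_dir in unit_dirs:
        unit_dir.mkdir(parents=True, exist_ok=True)
        target = unit_dir / name
        target.write_bytes(data)
        stored.append(target)
    return stored


# Example: the file "volume1" is stored under three designated paths.
root = Path(tempfile.mkdtemp())
unit_dirs = designate_paths(root, ["szunit01", "szunit02", "szunit03"])
copies = store_file("volume1", b"example payload", unit_dirs)
```

Each entry of `copies` plays the role of one of the files 21, 31, and 41 in the example above.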
In step S307, the logging module 53 creates a system log 11 in the access entry 10 and creates a unit log in each storage unit, such as a unit log 22 in the storage unit 20, a unit log 32 in the storage unit 30, and a unit log 42 in the storage unit 40. The system log 11 records information of all files stored in the distributed storage system 100, and the unit log records information of all files stored in the storage unit. For example, the unit log 22 in the storage unit 20 records information of all files stored in the storage unit 20. Information of each file includes a name of the file, a volume of the file, creation time of the file, time when the file was last accessed, time when the file was last backed up, and a storage path of the file. The system log 11 includes all the information recorded in all of the unit logs.
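The log contents described in step S307 can be sketched as a simple record type. The field names, the reading of "volume" as a size in bytes, the use of numeric timestamps, and the helper `build_system_log` are all assumptions made for illustration; the disclosure does not specify a log format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    name: str
    volume: int            # assumed here to mean the file size in bytes
    created: float         # creation time (assumed: seconds since the epoch)
    last_accessed: float   # time the file was last accessed
    last_backed_up: float  # time the file was last backed up
    storage_path: str      # designated storage path, e.g. "szunit01"


def build_system_log(unit_logs: dict[str, list[FileRecord]]) -> list[FileRecord]:
    """Combine all unit logs so the system log holds every unit log record."""
    return [rec for unit in sorted(unit_logs) for rec in unit_logs[unit]]


# Example: one record per storage unit for the file "volume1".
unit_logs = {
    path: [FileRecord("volume1", 15, 1000.0, 1000.0, 1000.0, path)]
    for path in ("szunit01", "szunit02", "szunit03")
}
system_log = build_system_log(unit_logs)
```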
In step S309, the collecting module 54 collects the unit logs stored in the storage units, and stores the collected unit logs in a preset storage location of the distributed storage system 100. Depending on the embodiment, the collecting operation may be performed periodically or aperiodically. In one embodiment, the preset storage location is storage space independent from the storage units, so that the collected unit logs are isolated and safe from damage to the storage units.
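The collecting operation of step S309 might look like the following sketch. It assumes each unit log is a JSON file named `unit_log.json` inside the unit's directory and that the preset storage location is a separate directory; both the file name and the function `collect_unit_logs` are hypothetical.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


def collect_unit_logs(unit_dirs: list[Path], collect_dir: Path,
                      log_name: str = "unit_log.json") -> dict[str, Path]:
    """Copy each unit log into a location independent of the storage units."""
    collect_dir.mkdir(parents=True, exist_ok=True)
    collected = {}
    for unit_dir in unit_dirs:
        source = unit_dir / log_name
        if source.is_file():  # skip units whose log is missing
            target = collect_dir / f"{unit_dir.name}_{log_name}"
            shutil.copy2(source, target)
            collected[unit_dir.name] = target
    return collected


# Example: three units, each holding a unit log that lists "volume1".
root = Path(tempfile.mkdtemp())
unit_dirs = []
for unit_name in ("szunit01", "szunit02", "szunit03"):
    unit_dir = root / unit_name
    unit_dir.mkdir()
    (unit_dir / "unit_log.json").write_text(json.dumps(["volume1"]))
    unit_dirs.append(unit_dir)

collected = collect_unit_logs(unit_dirs, root / "collected_logs")
```

A periodic variant could call `collect_unit_logs` from a scheduler; that choice is left open here, as in the description.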
In step S311, the reading module 55 tries to read a file from a first storage unit, such as the file 21 from the storage unit 20, and determines if the file can be successfully read. In one embodiment, the reading operation may be enabled in response to an access request sent from the client 200, or in response to a request to check data security initiated by the distributed storage system 100. If the file can be successfully read from the first storage unit, the file is indicated to be normal (e.g., not corrupted and not deleted), and the procedure ends. Otherwise, if the file cannot be read from the first storage unit, the file is indicated to be corrupted or deleted, and the procedure goes to step S313.
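A minimal sketch of the read check in step S311 follows. It assumes a local file system and treats any operating system error during reading as "cannot be read"; the function name `can_read` is hypothetical. Note that this check only detects missing or unreadable files, not silent changes to the content.

```python
import tempfile
from pathlib import Path


def can_read(path: Path) -> bool:
    """Return True if the whole file can be read, False if reading fails."""
    try:
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files are not loaded at once.
            while handle.read(1 << 20):
                pass
        return True
    except OSError:  # includes a missing (deleted) file
        return False


# Example: an intact file and a deleted file in the same storage unit.
unit_dir = Path(tempfile.mkdtemp())
intact = unit_dir / "volume1"
intact.write_bytes(b"example payload")
deleted = unit_dir / "volume2"

intact_ok = can_read(intact)
deleted_ok = can_read(deleted)
```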
In step S313, the repairing module 56 compares the collected unit logs and the system log 11, to determine a second storage unit that stores the same file. For example, if the file 21 is destroyed, by comparing the collected unit logs 22, 32, 42 and the system log 11, a determination may be made that the file 21 is a file having the name “volume1,” and that the files 31 and 41 are the same file as the file 21 because they have the same file name “volume1.”
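The comparison in step S313 can be sketched as a lookup by file name. The data shapes below (a system log as a list of name and unit pairs, unit logs as lists of file names) and the function `find_replica_units` are illustrative assumptions; the disclosure only states that files are matched by name.

```python
from __future__ import annotations


def find_replica_units(name: str, damaged_unit: str,
                       system_log: list[dict[str, str]],
                       unit_logs: dict[str, list[str]]) -> list[str]:
    """Return the other units that both logs agree hold a file with this name."""
    candidates = {
        rec["unit"] for rec in system_log
        if rec["name"] == name and rec["unit"] != damaged_unit
    }
    # Keep only units whose collected unit log also lists the file.
    return sorted(u for u in candidates if name in unit_logs.get(u, []))


# Example: "volume1" in szunit01 is damaged; two other units hold a copy.
system_log = [
    {"name": "volume1", "unit": "szunit01"},
    {"name": "volume1", "unit": "szunit02"},
    {"name": "volume1", "unit": "szunit03"},
]
unit_logs = {
    "szunit01": ["volume1"],
    "szunit02": ["volume1"],
    "szunit03": ["volume1"],
}
replicas = find_replica_units("volume1", "szunit01", system_log, unit_logs)
```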
In step S315, the repairing module 56 repairs the file in the first storage unit by copying the file from the second storage unit to the first storage unit. For example, the repairing module 56 repairs the file 21 by copying the file 31 from the storage unit 30 to the storage unit 20, or by copying the file 41 from the storage unit 40 to the storage unit 20.
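The repair in step S315 can be sketched as a file copy that tries each candidate replica in turn. The fallback order and the function name `repair_file` are assumptions for illustration; the description only requires copying from a second storage unit to the first.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Optional


def repair_file(damaged: Path, replicas: list[Path]) -> Optional[Path]:
    """Copy the first usable replica over the damaged file; return its path."""
    for replica in replicas:
        try:
            damaged.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(replica, damaged)
            return replica
        except OSError:
            continue  # this replica is unusable, try the next one
    return None


# Example: the copy in szunit01 is gone, szunit02 is also missing it,
# and szunit03 still holds an intact copy.
root = Path(tempfile.mkdtemp())
for unit_name in ("szunit01", "szunit02", "szunit03"):
    (root / unit_name).mkdir()
(root / "szunit03" / "volume1").write_bytes(b"example payload")

damaged = root / "szunit01" / "volume1"
used = repair_file(damaged, [root / "szunit02" / "volume1",
                             root / "szunit03" / "volume1"])
```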
The above embodiments store the same file in different storage paths of the distributed storage system and record file information in logs, so that a destroyed file can be quickly determined according to the logs and be repaired from the duplicate files.
Although certain disclosed embodiments of the present disclosure have been specifically described, the present disclosure is not to be construed as being limited thereto. Various changes or modifications may be made to the present disclosure without departing from the scope and spirit of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201210047314.8 | Feb 2012 | CN | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN12/78808 | 7/18/2012 | WO | 00 | 2/1/2013 |