1. Technical Field
The embodiments of the present disclosure relate to file management technology, and particularly to a file saving system and method.
2. Description of Related Art
A data center is a facility that houses a large number of computers and stores huge amounts of data. With cloud computing, files are uploaded into a data center. However, a file stored in the data center may contain one or more repeated portions of the same data, which wastes a lot of storage space. Therefore, there is room for improvement in the art.
The disclosure is illustrated by way of example and not by way of limitation in the figures of the accompanying drawings, in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean “at least one.”
In general, the word “module”, as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions written in a programming language such as Java, C, or assembly. One or more software instructions in the modules may be embedded in firmware, such as in an EPROM. The modules described herein may be implemented as software modules, hardware modules, or both, and may be stored in any type of non-transitory computer-readable medium or other storage device. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
In one embodiment, the client 1 divides each file into two or more data blocks and uploads the data blocks of the file into the assignment server 2, which sends the data blocks to the storage server 3. Before uploading the data blocks into the assignment server 2, the client 1 calculates a hash value of each data block and saves the hash value of each data block into a hash list. Each file corresponds to one hash list; in other words, the hash values of the data blocks of a file are saved into the hash list corresponding to that file. The client 1 also stores information about the file, which includes a name of the file and an attribute of the file. Each data block has a name; the names are generated in order and are also saved into the hash list. In detail, the names of the data blocks are generated in an alphabetical order (e.g., “a,” “b,” “c,” “d,” “e,” or “f”) or in a numerical order (e.g., “1,” “2,” “3,” or “4”). For example, if the file is divided into three data blocks, the blocks are named data block “a,” data block “b,” and data block “c.” Each data block may have a size predetermined by a user, such as 16 KB, 32 KB, 64 KB, 128 KB, or 256 KB. For example, if the size is predetermined as 32 KB, the file is divided into a plurality of data blocks of 32 KB each.
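The client-side splitting and hash-list construction described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not name a hash function, so SHA-256 is an assumption, and the function name `build_hash_list` and the dictionary layout of each hash-list entry are hypothetical. Numerical block names are used; in this sketch the final block holds whatever remains of the file and may be shorter than the predetermined size.

```python
import hashlib


def build_hash_list(data: bytes, block_size: int = 32 * 1024):
    """Split `data` into fixed-size blocks and record an ordered name and a hash per block.

    Hypothetical helper: SHA-256 and the {"name", "hash"} entry layout are assumptions.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Consecutive slices of block_size bytes; the last slice may be shorter.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hash_list = []
    for index, block in enumerate(blocks, start=1):
        hash_list.append({
            "name": str(index),                         # names generated in numerical order
            "hash": hashlib.sha256(block).hexdigest(),  # assumed hash function
        })
    return blocks, hash_list
```

For a 70 KB file and a 32 KB block size, this yields three blocks named “1,” “2,” and “3,” the last of which is 6 KB.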
The receiving module 2000 receives a hash list corresponding to a file and information of the file uploaded from the client 1, and saves the hash list corresponding to the file and information of the file into the database 4.
The calculation module 2002 calculates a transfer process usage ratio and a remaining storage capacity of each storage server 3. In one embodiment, each storage server 3 includes a transfer process for transferring data. If the transfer process is overloaded, the storage server 3 may stop transferring data. The transfer process usage ratio of each storage server 3 indicates the load on the transfer process and is expressed as a percentage. The greater the transfer process usage ratio, the heavier the load on the transfer process.
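The disclosure does not say how the transfer process usage ratio is measured. The sketch below assumes, purely for illustration, that it is the number of active transfers divided by the maximum number of transfers the process can handle before overloading, and that remaining capacity is total capacity minus used space. The class `StorageServerStats` and all of its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageServerStats:
    # All fields are hypothetical; the disclosure does not define how they are obtained.
    server_id: str
    active_transfers: int  # transfers the transfer process is currently handling
    max_transfers: int     # transfers it can handle before it is considered overloaded
    capacity_bytes: int    # total storage capacity of the storage server
    used_bytes: int        # storage already occupied


def transfer_usage_ratio(s: StorageServerStats) -> float:
    """Assumed load measure of the transfer process, as a percentage."""
    if s.max_transfers <= 0:
        raise ValueError("max_transfers must be positive")
    return 100.0 * s.active_transfers / s.max_transfers


def remaining_capacity(s: StorageServerStats) -> int:
    """Remaining storage capacity in bytes."""
    return s.capacity_bytes - s.used_bytes
```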
The determination module 2004 determines an available storage server according to the transfer process usage ratio and the remaining storage capacity of each storage server 3. A storage server 3 is determined to be the available storage server if its transfer process usage ratio is not more than a predetermined percentage (e.g., 80%) and its remaining storage capacity is sufficient to store two or more data blocks.
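The availability rule above reduces to a simple filter. In this sketch the 80% threshold and the two-block minimum come from the text, while the `ServerState` record and the function name `available_servers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    server_id: str
    usage_ratio: float    # transfer process usage ratio, in percent
    remaining_bytes: int  # remaining storage capacity, in bytes


def available_servers(servers, block_size, threshold=80.0, min_blocks=2):
    """Keep servers whose usage ratio is at most `threshold` percent
    and whose remaining capacity can hold at least `min_blocks` blocks."""
    return [
        s for s in servers
        if s.usage_ratio <= threshold and s.remaining_bytes >= min_blocks * block_size
    ]
```

Note that the comparison is inclusive, so a server at exactly 80% still qualifies, matching “not more than a predetermined percentage.”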
The removing module 2006 searches for repetitive data blocks according to the hash value of each data block, keeps one of the repetitive data blocks, and deletes the others. A data block is determined to be a repetitive data block if its hash value is the same as the hash value of another data block. For example, if the hash value of the data block “a” is the same as the hash value of the data block “b,” the data block “a” and the data block “b” are determined to be repetitive data blocks. The removing module 2006 may delete the data block “a” from the client 1 and keep the data block “b” in the client 1.
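A minimal sketch of the hash-based duplicate removal, assuming the hash-list entries produced earlier (`{"name": ..., "hash": ...}`, a hypothetical layout). This sketch keeps the first block seen for each hash, whereas the example above keeps the later block “b”; since the copies are identical, either choice works. The returned mapping from each deleted block to its retained twin is an illustrative addition that later lets the deleted block share a pointer.

```python
def remove_repetitive_blocks(hash_list):
    """Keep one entry per distinct hash value.

    Returns (unique_entries, duplicates), where `duplicates` maps the name of each
    deleted block to the name of the block kept in its place.
    """
    kept = {}        # hash value -> name of the block that is kept
    unique = []      # hash-list entries of the blocks that remain
    duplicates = {}  # deleted block name -> kept block name
    for entry in hash_list:
        if entry["hash"] in kept:
            duplicates[entry["name"]] = kept[entry["hash"]]
        else:
            kept[entry["hash"]] = entry["name"]
            unique.append(entry)
    return unique, duplicates
```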
The assignment module 2008 assigns a storage space in the available storage server for storing each data block and obtains a pointer corresponding to the data block that points to the storage space. In one embodiment, each data block corresponds to a pointer that points to its storage space; in other words, the pointer is used to locate the storage space. The storage space may store one or more data blocks in the assignment server 2. Furthermore, even though a repetitive data block is deleted from the client 1 because it is repetitive, it is still assigned a pointer, and that pointer is the same as the pointer corresponding to the identical data block in the assignment server 2.
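The pointer-sharing behavior can be sketched as below. The disclosure does not define the pointer format, so the `Pointer` type (a server identifier plus a byte offset) and the sequential, fixed-size slot allocation are assumptions. Identical blocks receive exactly the same pointer, so a deleted repetitive block still resolves to the retained copy.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    # Hypothetical pointer format: which server, and where in it.
    server_id: str
    offset: int


def assign_pointers(hash_list, server_id, block_size, start_offset=0):
    """Give each distinct block its own storage space; identical blocks share one pointer."""
    by_hash = {}   # hash value -> pointer already assigned for that content
    pointers = {}  # block name -> pointer
    next_offset = start_offset
    for entry in hash_list:
        h = entry["hash"]
        if h not in by_hash:
            by_hash[h] = Pointer(server_id, next_offset)
            next_offset += block_size  # reserve one block-sized slot
        pointers[entry["name"]] = by_hash[h]
    return pointers
```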
The notification module 2010 uploads each data block from the client 1 into the storage space corresponding to the data block according to the pointer corresponding to the data block, and sends the pointer of each data block to the client 1. The client 1 receives the pointer of each data block from the assignment server 2 and displays it on a display device of the client 1.
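Placing blocks by pointer can be illustrated with an in-memory model in which each storage server is a preallocated `bytearray`. This is only a sketch: the `Pointer` type, the in-memory storage, and the function name `store_blocks` are hypothetical. Only the retained (non-repetitive) blocks are written, but the pointers of every block, including deleted duplicates, are returned for the client.

```python
from typing import Dict, NamedTuple


class Pointer(NamedTuple):
    server_id: str  # hypothetical: identifies the storage server
    offset: int     # hypothetical: byte offset of the storage space


def store_blocks(blocks: Dict[str, bytes],
                 pointers: Dict[str, Pointer],
                 storage: Dict[str, bytearray]) -> Dict[str, Pointer]:
    """Write each retained block into the space its pointer names; return all pointers."""
    for name, block in blocks.items():
        ptr = pointers[name]
        space = storage[ptr.server_id]
        end = ptr.offset + len(block)
        if end > len(space):
            raise ValueError(f"block {name!r} does not fit at offset {ptr.offset}")
        space[ptr.offset:end] = block  # same-length slice assignment keeps the size fixed
    return dict(pointers)  # pointers sent back to the client
```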
In step S100, the client 1 divides a file into two or more data blocks and saves the name and the hash value of each data block into a hash list.
In step S102, the client 1 uploads information about the file into the assignment server 2 and uploads the hash list into the database 4. The receiving module 2000 receives the information about the file and the hash list from the client 1.
In step S104, the calculation module 2002 calculates a transfer process usage ratio of each storage server and a remaining storage capacity of each storage server.
In step S106, the determination module 2004 determines an available storage server according to the transfer process usage ratio and the remaining storage capacity of each storage server. A storage server is determined to be the available storage server if its transfer process usage ratio is less than or equal to a predetermined percentage (e.g., 80%) and the storage server 3 has enough remaining storage capacity to store two or more data blocks.
In step S108, the removing module 2006 searches for repetitive data blocks according to the hash value of each data block, keeps one of the repetitive data blocks, and deletes the others. For example, if the hash value of the data block “a” is the same as the hash value of the data block “b,” the data block “a” and the data block “b” are determined to be repetitive data blocks. The removing module 2006 may delete the data block “a” from the client 1 and keep the data block “b” in the client 1.
In step S110, the assignment module 2008 assigns a storage space in the available storage server for storing each data block and obtains a pointer corresponding to the data block that points to the storage space. In one embodiment, each data block corresponds to a pointer that points to its storage space; in other words, the pointer is used to locate the storage space. The storage space may store one or more data blocks in the assignment server 2. Furthermore, even though the repetitive data blocks are deleted from the client 1, each repetitive data block is still assigned a pointer, and that pointer is the same as the pointer corresponding to the identical data block in the assignment server 2.
In step S112, the available storage server 3 receives each data block from the assignment server 2.
In step S114, the available storage server 3 determines whether each data block is correct. In one embodiment, when the available storage server 3 receives the data blocks from the assignment server 2, it calculates the hash value of each data block and checks whether that hash value exists in the hash list. If the hash value of every data block exists in the hash list, the data blocks are determined to be correct, and the procedure goes to step S116, in which the available storage server 3 saves each data block into the storage space corresponding to the data block. If the hash value of a data block does not exist in the hash list, that data block is determined to be incorrect, and the procedure goes to step S118.
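The correctness check of step S114 amounts to recomputing each received block's hash and testing membership in the hash list. The sketch assumes SHA-256 (the disclosure names no hash function) and the hypothetical `{"name", "hash"}` entry layout; the function name is also hypothetical.

```python
import hashlib


def verify_received_blocks(blocks, hash_list) -> bool:
    """True means every block is correct (go to S116); False means go to S118."""
    expected = {entry["hash"] for entry in hash_list}  # set for O(1) membership tests
    return all(hashlib.sha256(block).hexdigest() in expected for block in blocks)
```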
In step S116, the available storage server saves each data block into the storage space corresponding to the data block according to the pointer corresponding to the data block.
In step S118, the available storage server 3 notifies the client 1 to upload the file again. In one embodiment, the available storage server 3 rejects the data blocks uploaded by the client 1 and notifies the client 1 that the upload of the data blocks has been rejected.
In step S200, the client 1 obtains a hash value of each data block of a file from a hash list stored in a database 4.
In step S202, the client 1 downloads each data block of the file from the available storage server according to the pointer of that data block.
In step S204, the download module 2012 calculates a hash value of each downloaded data block and determines whether the hash value of each downloaded data block exists in the hash list stored in the database 4. In one embodiment, if every calculated hash value exists in the hash list, the procedure goes to step S206. If any calculated hash value does not exist in the hash list, the procedure returns to step S200.
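Steps S202 and S204 form a download-and-verify loop. In this sketch, `fetch` is a hypothetical callable that returns a block's bytes for a pointer, SHA-256 is an assumed hash function, and `max_attempts` is a hypothetical bound added so the loop terminates; the described procedure simply returns to step S200 on failure, which the sketch models as another attempt with the same hash list.

```python
import hashlib
from typing import Callable, Dict


def download_verified(pointers: Dict[str, object],
                      fetch: Callable[[object], bytes],
                      hash_list,
                      max_attempts: int = 3) -> Dict[str, bytes]:
    """Download every block by pointer and accept the set only if all hashes are known."""
    expected = {entry["hash"] for entry in hash_list}
    for _ in range(max_attempts):
        blocks = {name: fetch(ptr) for name, ptr in pointers.items()}  # step S202
        if all(hashlib.sha256(b).hexdigest() in expected for b in blocks.values()):  # step S204
            return blocks  # proceed to step S206
        # A mismatch corresponds to returning to step S200; try again.
    raise RuntimeError("downloaded data blocks failed hash verification")
```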
In step S206, the client 1 saves all downloaded data blocks into a temporary storage space of the client 1.
In step S208, the client 1 combines all downloaded data blocks in the temporary storage space of the client 1 according to the name of each downloaded data block, thereby generating (or regenerating) the file. The temporary storage space of the client 1 may be, but is not limited to, a random access memory (RAM). In one embodiment, because the names of the data blocks are generated in order, the client 1 combines the downloaded data blocks in the order of their names to generate the file.
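Reassembly by name order can be sketched as a sort followed by concatenation. One detail the text leaves open is that plain string sorting would place “10” before “2”; this sketch therefore sorts purely numeric names by value and letter names alphabetically. The function name and the dictionary input (block name to bytes) are hypothetical.

```python
def combine_blocks(blocks):
    """Concatenate downloaded blocks in the order of their names."""
    def order_key(name):
        # Numeric names ("1", "2", ..., "10") sort by integer value;
        # other names ("a", "b", ...) sort alphabetically after them.
        return (0, int(name), "") if name.isdigit() else (1, 0, name)

    return b"".join(blocks[name] for name in sorted(blocks, key=order_key))
```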
In step S210, the client 1 calculates the hash value of the generated file and determines whether the calculated hash value exists in the hash list stored in the database 4. If the calculated hash value of the generated file exists in the hash list, the client 1 displays a success message (e.g., “SUCCESS”) on the display device of the client 1, and the procedure goes to step S212. If the calculated hash value of the generated file does not exist in the hash list, the client 1 displays a fail message (e.g., “FAIL”) on the display device of the client 1, and the procedure returns to step S200.
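The whole-file check of step S210 can be sketched as follows. Because the hash list is described elsewhere as holding per-block hash values, this sketch assumes that it also contains an entry for the hash of the complete file; that entry, its name `"file"`, the use of SHA-256, and the function name are all hypothetical.

```python
import hashlib


def check_generated_file(file_bytes: bytes, hash_list) -> str:
    """Return "SUCCESS" (go to S212) if the file's hash is in the hash list, else "FAIL" (back to S200)."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    known = {entry["hash"] for entry in hash_list}  # assumed to include a whole-file hash
    return "SUCCESS" if file_hash in known else "FAIL"
```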
In step S212, the client 1 sends the generated file to the display device of the client 1.
Although certain inventive embodiments of the present disclosure have been specifically described, the present disclosure is not to be construed as being limited thereto. Various changes or modifications may be made to the present disclosure without departing from the scope and spirit of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201210533979X | Dec 2012 | CN | national |