Technical Field
The disclosure relates to distributed storage, and more particularly to a system and a method for distributed storage based on regenerating codes.
Related Art
A centralized network storage system stores all data in a single storage server. The storage server itself therefore limits the performance of the network storage system and is the key factor for its reliability and security. As a result, a centralized network storage system sometimes cannot satisfy the needs of massive storage solutions.
A distributed network storage system is another storage solution, in which data are distributed and stored on plural independent storage servers (also referred to as storage-nodes). Such a storage solution is scalable, because the number of storage servers can be increased to share the storage load, and all stored data can be managed with location information provided by a location service device. Therefore, the distributed network storage system is not only scalable, but also has the benefits of reliability, availability and accessibility.
In order to further increase the reliability of the distributed network storage system, regenerating codes are introduced to rebuild lost encoded fragments. Regenerating codes are a class of erasure codes from error-correction coding theory. With erasure codes, a recipient is able to detect and correct errors that are encountered during data transmission in networks.
Upon failure of an individual node, the regenerating codes repair the failed node by means of a replacement node. The replacement node needs to connect to d of the remaining nodes in the network and download information of size P from each of these d nodes. Thus, the repair bandwidth for regenerating codes is d*P. The optimal storage/repair-bandwidth trade-off models for regenerating codes include Minimum-Storage Regenerating (MSR) codes and Minimum-Bandwidth Regenerating (MBR) codes.
However, the number of storage-nodes in the conventional distributed network storage system is fixed, and the redundancy of the conventional distributed network storage system cannot be adjusted based on the characteristics of the stored data. Therefore, data transmission delay may occur when the data is accessed frequently.
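As a small illustration of the repair cost described above, the following sketch computes the repair bandwidth d*P. The function name `repair_bandwidth` and the example values are hypothetical and are not taken from the disclosure.

```python
def repair_bandwidth(d: int, p: int) -> int:
    """Total amount downloaded by a replacement node: d helper nodes, p units each."""
    if d <= 0 or p < 0:
        raise ValueError("d must be positive and p must be non-negative")
    return d * p


# Hypothetical example: the replacement node contacts 3 nodes
# and downloads 2 units of information from each of them.
total = repair_bandwidth(3, 2)
```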
These and other needs are addressed by the exemplary embodiments, in which one approach provides systems and methods for regenerating codes for a distributed storage system that is able to additionally assign extension storage-nodes when the encoded data has been transmitted to each one of the nodes.
According to an embodiment of the present disclosure, a system for distributed storage based on regenerating codes, in which encoded data is distributed to a plurality of storage-nodes and then extended to at least one extension storage-node, comprises a data source and multiple storage-nodes. The data source comprises a control module and an encoder. The control module segments data into multiple fragments. The encoder generates multiple data stripes from the fragments, where each data stripe is generated according to a corresponding encoding vector, and the encoding vectors are linearly independent of one another. The data source transmits the data stripes to the corresponding storage-nodes according to the encoding vectors. The data source receives an extension command that is configured for extending a selected storage-node, and generates at least one extension storage-node with at least two other randomly selected storage-nodes, thereby constructing a linear combination from the data stripes and encoding vectors of the selected storage-nodes.
According to another embodiment of the present invention, a method for distributed storage based on regenerating codes comprises steps of segmenting data into multiple fragments; encoding the fragments into a data stripe according to an encoding vector; transmitting and storing the data stripe and the corresponding encoding vector to a storage-node; selecting one of the storage-nodes as a specified storage-node when an extension command is received; and selecting a set of other storage-nodes, and generating an extension storage-node according to the selected storage-nodes, the encoding vectors and the data stripe.
The extension storage-node is homogeneous to the existing storage-nodes, in the sense that the extension command can be applied repeatedly using a fixed number of arbitrary existing nodes, regardless of whether they were generated by the data source or previously extended from other nodes.
Compared with the regenerating codes system in the art, the present invention has at least the following advantages:
(1) Regenerating-code systems in the art use a fixed number of storage-nodes. The present invention has the advantages of a lower bandwidth, a higher encoding efficiency and a low computing cost, and is able to adapt to highly changing conditions of a dynamic network; and
(2) The present invention can be applied to block storage, distribution and encoding modules of a distributed storage system. The corresponding storage system is more suitable for the system in which the access frequency of data is highly dynamic.
Still other aspects, features, and advantages of the exemplary embodiments are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the exemplary embodiments. The exemplary embodiments are also capable of other and different embodiments, and their several details can be modified in various obvious respects, all without departing from the spirit and scope of the exemplary embodiments. Accordingly, the drawings and description are to be regarded as illustrative, and not as restrictive.
The present invention will become more fully understood from the detailed description given herein below for illustration only, and thus not limitative of the present invention, wherein:
Referring to
As shown in
The data source 110 comprises a control module 111 and an encoder 112. The control module 111 segments data into multiple fragments. The encoder 112 has a vector matrix, which contains multiple encoding vectors. The encoder 112 selects one of the encoding vectors from the vector matrix and generates a data stripe of the corresponding fragments according to the selected encoding vector. The encoding vectors are linearly independent of one another. Multiple data stripes form a main striping, and each data stripe has at least one fragment.
The data source 110 transmits the data stripes to the corresponding storage-nodes 120 according to the different encoding vectors. The storage-nodes 120 are configured for storing the data stripes, and each may be a hard disk, a Solid State Disk (SSD) or a flash storage device.
As shown in
In one embodiment, the size of input data is defined as “B”, “d” is the number of the storage-nodes 120 that is needed for configuring an extension storage-node, and “a” is defined as the number of fragments contained in one single stripe.
For example, let B=4, a=2 and d=3, with each storage-node 120 configured to store 1 data stripe. That is, the data is segmented into 4 fragments, each storage-node 120 is allowed to store 2 fragments, and 3 storage-nodes 120 are required for generating an extension storage-node.
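The parameters of this example can be sketched as follows. This is only an illustration: the function name `segment`, the zero-padding rule and the sample input are assumptions, and the disclosure does not specify how fragment boundaries are chosen.

```python
from typing import List


def segment(data: bytes, num_fragments: int) -> List[bytes]:
    """Split data into num_fragments equal-length fragments (zero-padded at the end)."""
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    frag_len = -(-len(data) // num_fragments)  # ceiling division
    padded = data.ljust(frag_len * num_fragments, b"\x00")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(num_fragments)]


B, a, d = 4, 2, 3  # data size in fragments, fragments per stripe, nodes per extension

fragments = segment(b"abcdefgh", B)
# Group the B fragments into B // a groups of a fragments each.
groups = [fragments[i:i + a] for i in range(0, B, a)]
```

With these values the data is split into 4 fragments, arranged as 2 groups of 2 fragments.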
With reference to
S210: segmenting data into multiple fragments;
S220: encoding the fragments into a data stripe according to an encoding vector;
S230: transmitting and storing the data stripe and the corresponding encoding vector to a storage-node;
S240: selecting one of the storage-nodes as a specified storage-node when an extension command is received; and
S250: selecting two of the other storage-nodes to generate an extension storage-node based on the selected storage-nodes, the encoding vectors and the data stripe.
Assume there are k storage-nodes 120, each labeled node_i, wherein i ≤ k. With B=4, a=2 and d=3 as mentioned above, for example, the data has 4 fragments (u11, u12, u13, and u14). In this embodiment, each storage-node is able to store 1 data stripe, and each data stripe has two fragments. As shown in
from two fragments.
Here $p_i^t$ is the encoding vector applied to the vector U1 for the i-th storage-node, $q_i^t$ is the encoding vector applied to the vector U2 for the i-th storage-node, and $r_i^t$ is the encoding vector for the compensating fragments of the i-th storage-node. In addition, any two encoding vectors within $\{p_i^t\}_{i=1}^{n}$, and any two within $\{q_i^t\}_{i=1}^{n}$, are linearly independent.
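The roles of the three encoding vectors can be illustrated with a minimal sketch. Several things here are assumptions made for illustration only: the arithmetic is done in the prime field of size 257, U1 and U2 are taken to be the fragment pairs (u11, u12) and (u13, u14), and each stripe is assumed to consist of the two values p·U1 and r·U1 + q·U2, which is one layout consistent with the description of p, q and r above but not necessarily the exact form used in the disclosure. All function names are hypothetical.

```python
from typing import List, Tuple

PRIME = 257  # assumed field size; the text does not specify one

Vec = Tuple[int, int]


def dot(a: Vec, b: Vec) -> int:
    """Inner product of two length-2 vectors modulo PRIME."""
    return (a[0] * b[0] + a[1] * b[1]) % PRIME


def independent(a: Vec, b: Vec) -> bool:
    """Two length-2 vectors are linearly independent iff their determinant is non-zero."""
    return (a[0] * b[1] - a[1] * b[0]) % PRIME != 0


def make_vectors(n: int) -> List[Vec]:
    """Vectors (1, i) for i = 1..n are pairwise independent as long as n < PRIME."""
    return [(1, i) for i in range(1, n + 1)]


def encode(u1: Vec, u2: Vec, p: Vec, q: Vec, r: Vec) -> Vec:
    """Assumed stripe layout: (p.U1, r.U1 + q.U2)."""
    return dot(p, u1), (dot(r, u1) + dot(q, u2)) % PRIME


vectors = make_vectors(4)
stripe = encode((10, 20), (30, 40), (1, 2), (1, 3), (2, 1))
```

Because r is not subject to an independence requirement in this layout, it can be any vector, including a randomly chosen one.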
The data source 110 then transmits the encoded data stripe and the encoding vector to the corresponding storage-node. The storage-node stores the data stripe and the encoding vector. When the data collector 130 detects that one of the storage-nodes is disabled (failed), the data collector 130 recovers the data of the disabled storage-node based on other existing storage-nodes and data stripes. With further reference to
According to the encoding vectors of node_i and node_j, a 4×4 matrix is determined from the two data stripes as follows:
When the 4×4 matrix is non-singular, the 4 fragments (u11, u12, u13, and u14) are determined by linear substitution. Since any two encoding vectors within $\{p_i^t\}_{i=1}^{n}$, and any two within $\{q_i^t\}_{i=1}^{n}$, are linearly independent, the two diagonal 2×2 blocks of the 4×4 matrix are non-singular. The vector $r_i^t$ is not subject to any linear-independence requirement for recovering the encoded data, and thus its value can be chosen randomly. Accordingly, the data collector is able to retrieve the information of the disabled storage-node based on the aforementioned calculations.
The present invention not only recovers the data of the disabled storage-node, but also extends a specified storage-node. The extension storage-node can be configured to clone the information of the specified storage-node through other storage-nodes. The data stripe of the extension storage-node is homogeneous to the data stripe of the selected storage-node.
Accordingly, since the extension storage-node is homogeneous to the existing storage-nodes, the extension command can be applied repeatedly using a fixed number of arbitrary existing nodes, regardless of whether they were generated by the data source or previously extended from other nodes.
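The recovery by substitution can be sketched as follows, under stated assumptions: arithmetic in the prime field of size 257, and a stripe for each node consisting of the two values p·U1 and r·U1 + q·U2 (an assumed layout in which the 4×4 matrix is block triangular with the p-block and the q-block on the diagonal). The helper names `encode`, `solve2` and `recover` are hypothetical.

```python
from typing import Tuple

PRIME = 257  # assumed field size; the text does not specify one

Vec = Tuple[int, int]


def dot(a: Vec, b: Vec) -> int:
    return (a[0] * b[0] + a[1] * b[1]) % PRIME


def encode(u1: Vec, u2: Vec, p: Vec, q: Vec, r: Vec) -> Vec:
    """Assumed stripe layout: (p.U1, r.U1 + q.U2)."""
    return dot(p, u1), (dot(r, u1) + dot(q, u2)) % PRIME


def solve2(row1: Vec, row2: Vec, rhs: Vec) -> Vec:
    """Solve a 2x2 linear system modulo PRIME with Cramer's rule."""
    det = (row1[0] * row2[1] - row1[1] * row2[0]) % PRIME
    if det == 0:
        raise ValueError("encoding vectors are linearly dependent")
    inv = pow(det, -1, PRIME)  # modular inverse (Python 3.8+)
    x = (rhs[0] * row2[1] - row1[1] * rhs[1]) * inv % PRIME
    y = (row1[0] * rhs[1] - row2[0] * rhs[0]) * inv % PRIME
    return x, y


def recover(stripe_i: Vec, vecs_i, stripe_j: Vec, vecs_j):
    """Recover (U1, U2) from the stripes and (p, q, r) vectors of two nodes."""
    (p_i, q_i, r_i), (p_j, q_j, r_j) = vecs_i, vecs_j
    # First diagonal block: the p vectors determine U1.
    u1 = solve2(p_i, p_j, (stripe_i[0], stripe_j[0]))
    # Substitute U1, then the second diagonal block (the q vectors) determines U2.
    rhs = ((stripe_i[1] - dot(r_i, u1)) % PRIME,
           (stripe_j[1] - dot(r_j, u1)) % PRIME)
    u2 = solve2(q_i, q_j, rhs)
    return u1, u2


u1, u2 = (10, 20), (30, 40)
node_i = ((1, 1), (1, 1), (5, 7))  # (p, q, r) of node i; r is arbitrary
node_j = ((1, 2), (1, 3), (2, 1))  # (p, q, r) of node j
s_i = encode(u1, u2, *node_i)
s_j = encode(u1, u2, *node_j)
recovered = recover(s_i, node_i, s_j, node_j)
```

Only the p and q vectors need to be independent for `solve2` to succeed; the r vectors enter the calculation solely through the substitution step.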
Referring to
Equations (3) and (4) can be derived from (1):
Since any two vectors of $\{q_i^t\}_{i=1}^{n}$ are linearly independent:
Substituting (5) into (4) gives:
which can be rewritten as:
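$$[P\Lambda + R]\,(-Q^{-1} k_3 q_3) = p_i - k_3(\lambda_3 p_3 + r_3) \quad (7)$$

Here $\Lambda$ is a 2×2 diagonal matrix, and $P=[p_1\ p_2]$, $Q=[q_1\ q_2]$ and $R=[r_1\ r_2]$. Equation (7) can be further simplified into:

$$P\Lambda Q^{-1} k_3 q_3 = k_3(\lambda_3 p_3 + r_3) - R Q^{-1} k_3 q_3 - p_i \quad (8)$$

$$\Lambda Q^{-1} k_3 q_3 = P^{-1}\left(k_3(\lambda_3 p_3 + r_3) - R Q^{-1} k_3 q_3 - p_i\right) \quad (9)$$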
The values k1, k2, k3 and λ1, λ2, λ3 can be determined by assigning arbitrary values to λ3 and k3, provided that k3 is not equal to zero. It is also noted that, when solving the equations, the vector $Q^{-1}q_3^t$ will not have a zero element; otherwise at least two vectors of $\{q_i^t\}_{i=1}^{n}$ would be linearly dependent.
Moreover, equation (11) can be solved given the known values of k1, k2, k3, λ1, λ2, λ3, l1, l2 and l3.
Accordingly, the extension storage-node D is able to store/clone the fragment and the corresponding vector which were previously stored in another storage-node.
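The way equation (9) yields the unknown diagonal entries can be sketched with exact rational arithmetic. This is an illustration under assumptions: Λ is taken to be diag(λ1, λ2), the vector multiplied by [PΛ+R] in equation (7) is returned under the hypothetical name `k12`, and all function names and numeric values are made up for the example.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def from_columns(c1: Sequence[int], c2: Sequence[int]) -> Matrix:
    """Build a 2x2 matrix [c1 c2] whose columns are the given vectors."""
    return [[Fraction(c1[0]), Fraction(c2[0])],
            [Fraction(c1[1]), Fraction(c2[1])]]


def inverse(m: Matrix) -> Matrix:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matvec(m: Matrix, v: Sequence) -> List[Fraction]:
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def solve_lambdas(P, Q, R, p3, q3, r3, p_target, lam3, k3):
    """Solve equation (9) for the diagonal of Lambda, given chosen lam3 and k3."""
    if k3 == 0:
        raise ValueError("k3 must be non-zero")
    w = matvec(inverse(Q), [k3 * x for x in q3])  # Q^{-1} k3 q3
    if any(x == 0 for x in w):
        raise ValueError("q3 is linearly dependent on q1 or q2")
    t = [k3 * (lam3 * p3[n] + r3[n]) for n in range(2)]  # k3 (lam3 p3 + r3)
    Rw = matvec(R, w)
    v = matvec(inverse(P), [t[n] - Rw[n] - p_target[n] for n in range(2)])
    # Lambda is diagonal, so equation (9) separates into one division per entry.
    lambdas = [v[n] / w[n] for n in range(2)]
    k12 = [-x for x in w]  # the factor (-Q^{-1} k3 q3) from equation (7)
    return lambdas, k12


P = from_columns((1, 0), (0, 1))
Q = from_columns((1, 0), (0, 1))
R = from_columns((1, 2), (3, 4))
lambdas, k12 = solve_lambdas(P, Q, R, (1, 1), (1, 1), (5, 6), (2, 3), 1, 1)
```

The division by each entry of `w` is the reason the vector must contain no zero element: a zero entry would leave the corresponding diagonal entry of Λ undetermined.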
While the exemplary embodiments have been described in connection with a number of embodiments and implementations, the exemplary embodiments are not so limited but cover various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims. Although features of the exemplary embodiments are expressed in certain combinations among the claims, it is contemplated that these features can be arranged in any combination and order.
Number | Date | Country | Kind |
---|---|---|---|
201610116376.8 | Mar 2016 | CN | national |