1. Field of the Invention
The present invention relates to a system for distributing data by dividing it into plural pieces of partial data.
2. Description of the Related Art
During data transfer from a transmission device to a reception device, a part of the data can be lost due to change in reception conditions, communication congestion, or the like. In such a case, the original data can be restored by retransmitting the lost data if it is possible to send some message from the reception device to the transmission device. However, for example, in a case of multicast, if it is not possible to send a message from a reception device to a transmission device, it is difficult to retransmit only lost data. In view of the above, several data transfer methods have been proposed.
For example, Japanese Unexamined Patent Application Publication No. 5-235979 has disclosed an asynchronous transfer mode communication method for increasing tolerance to subsequent lost data. Moreover, Japanese Unexamined Patent Application Publication No. 2005-223683 has disclosed a transmission/reception system for generating parity data using logic operation of exclusive OR (XOR).
Further, several data transfer methods have been proposed for the configuration in which it is not possible to send a message from a reception device to a transmission device.
For example, a data transfer method referred to as data carousel has been provided. The data transfer method is used in digital broadcasting. In the method, a transmission device divides data to be transmitted into a plurality of pieces and repeatedly transmits the divided data like a carousel. In the data carousel method, original data D is divided into a plurality of pieces (N pieces) of partial data arrays as shown below, and the partial data is transmitted repeatedly over several cycles.
According to the method, as in the above-described example, even if the reception data D2 is lost, the same data is transmitted again in the next cycle. Accordingly, it is possible to restore the original data. However, the data amount to be transmitted increases. Moreover, even if the next cycle is transmitted, the same portion may be lost. In such a case, it may be necessary to wait for another cycle.
Moreover, a data transfer method is provided in which redundant data referred to as parity is added in advance to the data to be transmitted, and the data is then transmitted. In the data transfer method, as shown below, if original data D is represented by n pieces of partial data arrays, an exclusive OR (XOR) operation is performed on each group of a certain number of pieces (m pieces) of partial data. The result of the XOR operation is transmitted together with the original data as the parity data. Hereinafter, “+” denotes an XOR operator.
Original data D={D1, D2, . . . , Dn}
Transmission data D′={D1, D2, . . . , Dm, P1, Dm+1, . . . , Dn, Pk}
Parity data Pk=Dk+Dk+1+ . . . +Dk+m−1
According to the method, even if one of the partial data is lost during the transmission, the lost data can be restored by performing the XOR operation on the remaining partial data and the parity data. An example of the restoration is shown below.
Original data D={D1, D2, D3}
Transmission data D′={D1, D2, D3, P1}
Parity data P1=D1+D2+D3
Reception data D″={D1, D2, P1} (D3 is lost)
Restoration of the lost data: D3=D1+D2+P1
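The restoration above can be illustrated with a minimal sketch. The helper names `xor_bytes` and `make_parity` and the sample byte values are assumptions made for illustration only; they do not appear in the cited publications.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks (the "+" operator in the text)."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks):
    """XOR all blocks together, e.g. P1 = D1 + D2 + D3."""
    return reduce(xor_bytes, blocks)

# Hypothetical two-byte partial data blocks.
d1, d2, d3 = b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"
p1 = make_parity([d1, d2, d3])

# D3 is lost in transit; XOR of the surviving blocks and the parity restores it.
restored = make_parity([d1, d2, p1])
print(restored == d3)
```

Because XOR is its own inverse, the same function serves for both parity generation and restoration; it cannot recover two lost blocks from one parity block.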
In the restoration, it is possible to restore the loss of only one piece of data per piece of parity data; it is not possible to restore the data if a plurality of pieces of data are lost. According to Japanese Unexamined Patent Application Publication No. 5-235979, the tolerance to the subsequent lost data is increased by performing the XOR operation on not subsequent partial data but at every n−1 pieces of partial data. However, the tolerance is decreased if the data is randomly lost.
In view of the above, a data transfer method capable of generating a plurality of pieces of parity data and restoring the loss of one or more pieces of data has been proposed. In the method, an error correction code such as a Reed-Solomon code is used for generating parity. According to the Reed-Solomon code, as shown below, original data D is divided into a plurality of pieces (n pieces) of partial data, and k pieces of linearly independent vectors α11, α21, α31, . . . αn1 are selected from a finite field referred to as a Galois field to generate parity data.
Original data D={D1, D2, D3, . . . , Dn}
Transmission data D′={C1, C2, C3, . . . , Cm}
Parity data Ck=α11·D1+α21·D2+ . . . +αn1·Dn
In restoration of the data, if n pieces of partial data have been acquired, the original data can be obtained by forming an n×n matrix A=(αij) (1≦i, j≦n) from the corresponding n vectors, and multiplying the acquired partial data by the inverse matrix of A. That is, the original data is restored as follows.
In the equation, a calculation of the order of O(n^3) is to be performed to obtain the inverse of the n×n matrix. Accordingly, as the value of n increases, a very long calculation time is required.
In consideration of the above, as discussed in Japanese Unexamined Patent Application Publication No. 2005-223683, a method of generating parity data using only XOR to reduce the calculation amount has been provided for the case of one-to-one communication.
So far, the case of one transmission device has been described. However, if there is only one transmission device, that transmission device is overloaded. Moreover, if the communication speed of the reception device is faster than that of the transmission device, the speed of the transmission device becomes a bottleneck, and the data may not be transferred at the speed expected at the reception device. In such a case, it is possible to distribute the load of the transmission device by providing a plurality of transmission devices and allowing the reception device to receive data from any of the transmission devices.
If the above-described data transfer method is extended to the case where the plurality of transmission devices is provided, the following two methods can be applied.
Data transfer method A: All transmission devices store original data and transmit the data using the above method
Data transfer method B: Original data is divided into a plurality of pieces of data to form separated data and each transmission device transmits the separated data respectively
According to the data transfer method A, as the number of the transmission devices increases, the possibility to restore the original data also increases. However, since each transmission device has the same data, the amount of wasteful data stored in the transmission devices and the amount of total encoded data increase.
According to the data transfer method B, it is possible to reduce the capacity of storage devices in the transmission devices. However, in a case where the partial data is not completely received from one of the transmission devices, the original data may not be restored.
For example, as shown in
Moreover, for hard disk drives (HDDs) or data communication, error correction methods using a low density parity check (LDPC) matrix have been proposed. However, because these error correction methods use repetitive operations, a huge amount of calculation is required.
The present invention has been made in view of the above, and an object of the present invention is to provide an apparatus, a system, and a computer readable medium capable of efficiently distributing data, transferring the data, and restoring the data.
According to one aspect of the present invention, there is provided a system for dividing original data into N pieces of partial data and generating L pieces of encoded data on the basis of the N pieces of partial data, wherein N is a natural number greater than 1 and L is a natural number equal to or greater than N, each of the L pieces of encoded data including a header part and an operation result part, the header part including selection data identifying a subset of the N pieces of partial data on which a predetermined operation is performed, the operation result part being generated by performing the predetermined operation on partial data included in the subset identified by the header data, wherein the L pieces of encoded data are divided into one or more transmission data groups and each of the one or more transmission data groups is transmitted to one of one or more first communication links.
According to one aspect of the present invention, there is provided a system for receiving one or more transmission data groups each transmitted from the transmission device through one of the one or more second communication links, acquiring a plurality of pieces of encoded data from the one or more transmission data groups received to generate a selection matrix and an operation result matrix by using the plurality of pieces of encoded data acquired, the selection matrix including as a row thereof a piece of selection data included in the header part of each of the plurality of pieces of encoded data, the operation result matrix including as a row thereof a piece of data included in the operation result part of each of the plurality of pieces of encoded data, and converting the selection matrix and the operation result matrix into an optimum selection matrix and an optimum operation result matrix, respectively, by performing a predetermined matrix operation on both rows at the same position of the selection matrix and the operation result matrix.
According to the present invention, it is possible to efficiently distribute, transfer, and restore data.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
In the present embodiment, a description will be made with respect to a multicast distribution system to which a data transfer system and a data transfer method according to the present invention are applied.
First, a configuration of the multicast distribution system is described.
Now, operation of the multicast distribution system is described.
In the multicast distribution system, the distribution device 1 divides original data into N (an integer number greater than 1) pieces of partial data, encodes the partial data to L (an integer number greater than 1) pieces of encoded data using an encoding matrix, and transmits the encoded data to the transmission devices 2a and 2b through first communication links. The transmission devices 2a and 2b transmit the received encoded data to the reception devices 3a, 3b, and 3c by multicast respectively through second communication links. The reception devices 3a, 3b, and 3c respectively decode the received encoded data and restore the original data.
Now, encoding matrix generation process performed by the distribution device 1 is described.
Then, the encoding matrix generation part 14 sets the number L to be L=N×S.
The encoding matrix generation part 14 sets h=(S−1)×N+1 and generates an encoding candidate matrix of L rows and N columns that is composed of 0 and 1, using a natural random number, such that the number of 1s in each column of the encoding candidate matrix is h (step S12). As a result, the number of 1s in the encoded rows becomes smaller than the number of 0s.
Then, the encoding matrix generation part 14 performs a candidate evaluation process (steps S13 to S18) on the encoding candidate matrix. First, the encoding matrix generation part 14 sets k=1 (step S13), selects a combination of k rows from the rows of the encoding candidate matrix, and generates a partial encoding candidate matrix, which is a matrix of (L−k) rows obtained by removing the k rows from the encoding candidate matrix (step S14). Then, the encoding matrix generation part 14 performs a decoding determination process, which will be described later, on the partial encoding candidate matrix, and records the number of executions of the decoding determination process and the number of times the decoding determination process has determined that the encoded data can be decoded by using the partial encoding candidate matrix (step S15). Then, the encoding matrix generation part 14 determines whether all combinations of k rows have been taken out. If not all of the combinations of k rows have been taken out (No in step S16), the process returns to step S14, and the encoding matrix generation part 14 takes out the next combination. If all of the combinations have been taken out (Yes in step S16), the encoding matrix generation part 14 determines whether the condition k>L−N is satisfied. If the condition is not satisfied (No in step S17), the encoding matrix generation part 14 increments k by 1 (step S20) and the process returns to step S14. If the condition is satisfied (Yes in step S17), the encoding matrix generation part 14 sets (decoding efficiency)=(the number of times determined to be decodable)/(the number of executions of the decoding determination process), and stores in the storage part 12 the encoding candidate matrix that has the highest decoding efficiency among the generated encoding candidate matrixes (step S18).
If k is increased, the number of combinations of k rows becomes huge. Accordingly, the encoding matrix generation part 14 finishes the candidate evaluation process (steps S13 to S18) at an appropriate condition (for example, in a case where k exceeds a predetermined value).
For this purpose, the encoding matrix generation part 14 determines whether the candidate generation process and the candidate evaluation process have been repeated a predetermined number of times. For example, this determination can be done by determining whether k exceeds a predetermined value or not. If it is determined that the processes have not been repeated the predetermined number of times (No in step S19), the process returns to step S12. If it is determined that the processes have been repeated the predetermined number of times (Yes in step S19), the flow is finished. The encoding matrix generation part 14 sets the encoding candidate matrix stored in the storage part 12, that is, the encoding candidate matrix that has the highest decoding efficiency, as the encoding matrix A.
According to the encoding matrix generation process, it is possible to generate the encoding matrix A that has a high decoding efficiency.
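The candidate generation and candidate evaluation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment itself: rows are stored as integer bitmasks, the decoding determination is assumed to amount to checking that the remaining rows have rank N over GF(2), k is capped at the smaller of `max_k` and L−N, and the function names, `attempts`, `max_k`, and `seed` are hypothetical.

```python
import itertools
import random

def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]  # clear the leading bit and keep reducing
    return len(basis)

def random_candidate(L, N, h, rng):
    """L x N matrix of 0/1 with exactly h ones in each column (step S12)."""
    rows = [0] * L
    for col in range(N):
        for r in rng.sample(range(L), h):
            rows[r] |= 1 << col
    return rows

def decoding_efficiency(rows, N, max_k):
    """Share of row-removal patterns that remain decodable (steps S13 to S18)."""
    L = len(rows)
    trials = decodable = 0
    for k in range(1, min(max_k, L - N) + 1):
        for dropped in itertools.combinations(range(L), k):
            kept = [r for i, r in enumerate(rows) if i not in dropped]
            trials += 1
            decodable += gf2_rank(kept) == N
    return decodable / trials if trials else 0.0

def best_encoding_matrix(L, N, h, attempts=20, max_k=2, seed=0):
    """Repeat generation and evaluation, keeping the best candidate (step S19)."""
    rng = random.Random(seed)
    best, best_eff = None, -1.0
    for _ in range(attempts):
        candidate = random_candidate(L, N, h, rng)
        eff = decoding_efficiency(candidate, N, max_k)
        if eff > best_eff:
            best, best_eff = candidate, eff
    return best, best_eff

matrix, efficiency = best_encoding_matrix(L=14, N=8, h=7, attempts=5, max_k=1)
print(len(matrix), efficiency)
```

Capping k keeps the number of row combinations manageable, which mirrors the early termination condition mentioned in the text.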
It is noted that the encoding matrix A can be generated outside the device in advance and stored in the distribution device 1. In such a case, the distribution device 1 does not need to have the encoding matrix generation part 14.
Now, distribution process performed in the distribution device 1 is described.
Then, the encoding part 13 generates L pieces of encoded data by performing the encoding process on the original data D stored in the storage part 12 (step S35). The encoded data includes a header part and a data part. The communication part 11 transmits L×S1/S pieces of encoded data out of the L pieces of encoded data to the transmission device 2a, and transmits (distributes) L×S2/S pieces of encoded data to the transmission device 2b (step S36). Then, the flow is finished.
In a case where N=8, M=2, S1=4/4=1, and S2=3/4=0.75, the values are set to S=1.75 and L=14. Accordingly, by the distribution process, 8 pieces of partial data are generated from the original data and 14 pieces of encoded data are generated. Moreover, L×S1/S=8 pieces of encoded data are transmitted from the distribution device 1 to the transmission device 2a, and L×S2/S=6 pieces of encoded data are transmitted from the distribution device 1 to the transmission device 2b.
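The arithmetic of this example can be checked with a short sketch. It assumes N = 8 partial data pieces (the value implied by L = N×S = 14 with S = 1.75) and that S is the sum of the per-device ratios S1 and S2; the function name `plan_distribution` is hypothetical.

```python
from fractions import Fraction

def plan_distribution(n, ratios):
    """Return (L, per-device counts) for n partial data pieces.

    Assumes S is the sum of the ratios, L = n * S, and device i
    receives L * Si / S pieces of encoded data.
    """
    ratios = [Fraction(r) for r in ratios]
    s = sum(ratios)
    total = n * s
    shares = [total * r / s for r in ratios]
    if total.denominator != 1 or any(x.denominator != 1 for x in shares):
        raise ValueError("ratios do not yield whole numbers of encoded pieces")
    return int(total), [int(x) for x in shares]

L, shares = plan_distribution(8, [Fraction(4, 4), Fraction(3, 4)])
print(L, shares)  # 14 [8, 6]
```

Exact fractions are used instead of floating-point numbers so that the whole-number check on the piece counts is reliable.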
According to the distribution process, depending on performance differences of the transmission devices, it is possible to freely set the distribution size ratio. Accordingly, each transmission device can efficiently transmit the encoded data.
Now, encoding process performed by the distribution device 1 is described.
The encoding part 13 acquires the numbers of the positions where bits in the selection data are 1, and acquires the partial data corresponding to the numbers. Then, the encoding part 13 performs an XOR operation at each bit position on the acquired partial data, and forms an operation result part of the encoded data based on a result of the operation (step S45).
The encoding part 13 determines whether L pieces of selection data have been acquired (whether L pieces of data parts have been generated) (step S46). If it is determined that the L pieces of selection data have not been acquired (No in step S46), the process returns to step S44 and the encoding part 13 acquires the next selection data. If it is determined that the L pieces of selection data have been acquired (Yes in step S46), the flow is finished.
According to the encoding process, it is possible to generate the L pieces of encoded data that include the header parts formed by the selection data and the operation result parts formed by the operation result of the partial data. Moreover, since the operation used for the encoding is the XOR operation, the load on the distribution device is small and the encoding can be performed at a high speed.
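The encoding process can be sketched as below. This is a simplified illustration: the zero-padding of the last partial data block, the representation of a header as a tuple of 0/1 values, the function names, and the small 4×3 encoding matrix are all assumptions chosen for the example.

```python
def split_into_parts(data: bytes, n: int):
    """Split data into n equal-length partial data blocks.

    Assumption: the tail is zero-padded so that all blocks have equal length.
    """
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n)]

def encode(parts, encoding_matrix):
    """Return one (header, operation result) pair per encoding matrix row."""
    encoded = []
    for selection in encoding_matrix:  # selection data: N values of 0 or 1
        result = bytearray(len(parts[0]))
        for bit, part in zip(selection, parts):
            if bit:  # this partial data block is selected
                for i, byte in enumerate(part):
                    result[i] ^= byte
        encoded.append((tuple(selection), bytes(result)))
    return encoded

# Hypothetical encoding matrix with L = 4 rows and N = 3 columns.
encoding_matrix = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
parts = split_into_parts(b"ABCDEF", 3)
encoded = encode(parts, encoding_matrix)
for header, result in encoded:
    print(header, result)
```

Carrying the selection data in the header lets a receiver rebuild the relevant rows of the encoding matrix from whatever pieces arrive, without needing the full matrix in advance.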
Now, transmission process performed by the transmission devices 2a and 2b is described.
The communication parts 21 of the transmission devices 2a and 2b store a plurality of pieces of encoded data (a transmission data group) received from the distribution device 1 in the storage part 22. Then, the communication parts 21 of the transmission devices 2a and 2b transmit the encoded data stored in the storage part 22 to the reception devices 3a, 3b, and 3c.
Decoding process performed by the reception devices 3a, 3b, and 3c is described.
In a stack of the storage part 32, it is possible to stack N pieces of stack data that have an encoded data length. Similarly to the encoded data, the stack data has a header part and an operation result part. The header parts of the N pieces of stack data can be considered as a matrix (called a selection matrix) that uses the header parts of each stack data as rows, and the operation result parts of the N pieces of stack data can be considered as a matrix (called an operation result matrix) that uses the operation result parts of each stack data as rows.
Then, the decoding part 33 determines whether the number of pieces of the stack data is N or more (step S57). If it is determined that the number of pieces of the stack data is less than N (No in step S57), the process returns to step S51, and the decoding part 33 receives the next encoded data. If it is determined that the number of pieces of the stack data is N or more (Yes in step S57), the decoding part 33 determines whether the selection matrix composed of the header parts of the stack data can be converted into a triangular matrix (an upper or lower triangular matrix) (step S58). If it is determined that the selection matrix cannot be converted into the triangular matrix (No in step S58), the decoding part 33 determines that it is not possible to decode the data, the process returns to step S51, and the decoding part 33 receives the next encoded data. If it is determined that the selection matrix can be converted into the triangular matrix (Yes in step S58), the decoding part 33 determines that it is possible to decode the data. Then, the decoding part 33 performs a matrix operation on the selection matrix to form an optimum selection matrix as a unit matrix (step S59), and performs the same matrix operation on the operation result matrix, that is, the matrix composed of the operation result parts of the stack data (step S60). By this matrix operation, the operation result matrix is converted into an optimum operation result matrix that has the original partial data as its rows, and the process ends.
In the decoding process, the decoding part 33 can form the unit matrix from the selection matrix using a Gauss elimination method. Accordingly, until the selection matrix is converted into the triangular matrix, the reception of the encoded data and the matrix operation with respect to the selection matrixes are continued, and as the result of the operation, if the selection matrix is converted into the triangular matrix, it is determined that the data can be decoded. Further, the decoding part 33 performs the matrix operation to form the unit matrix from the selection matrix. It is noted that in the matrix operation, any method other than the Gauss elimination method can be used as long as the unit matrix is formed from the selection matrixes.
In the decoding process, it is possible to decode the original data from the operation result parts of the N or more pieces of encoded data. The number of the encoded data necessary for the decoding varies depending on which encoding matrix A is used. As the arrangement of 1 in the encoding matrix A becomes more random, the decoding efficiency increases. Moreover, since the XOR operation is used for the decoding, the load to the reception device is reduced and the decoding can be performed at a high speed.
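The decoding described above can be illustrated with a Gauss-Jordan elimination over GF(2), in which every row operation applied to the selection matrix is applied identically to the operation result matrix. This sketch is a simplification made under assumptions: it processes a complete batch of received pieces rather than a stack updated per reception, and the names `decode` and `xor` and the sample data are hypothetical.

```python
def decode(encoded, n):
    """Recover the n partial data blocks, or return None if not yet decodable.

    encoded: list of (header, result) pairs; header holds n values of 0 or 1
    and result is the XOR of the selected partial data blocks.
    """
    rows = [(list(h), bytearray(r)) for h, r in encoded]
    for col in range(n):
        # Look for a row at or below position col with a 1 in this column.
        pivot = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if pivot is None:
            return None  # selection matrix cannot be reduced to a unit matrix
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p_head, p_data = rows[col]
        for i, (head, data) in enumerate(rows):
            if i != col and head[col]:
                # Same XOR row operation on the header and the result part.
                for j in range(n):
                    head[j] ^= p_head[j]
                for j in range(len(data)):
                    data[j] ^= p_data[j]
    return [bytes(rows[i][1]) for i in range(n)]

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

d1, d2, d3 = b"AB", b"CD", b"EF"
received = [
    ((1, 1, 0), xor(d1, d2)),
    ((0, 1, 1), xor(d2, d3)),
    ((1, 1, 1), xor(xor(d1, d2), d3)),
]
print(decode(received, 3))  # [b'AB', b'CD', b'EF']
```

When the function returns None, a receiver following the flow in the text would wait for further encoded data and try again.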
Now, decoding determination process performed by the distribution device 1 is described.
The encoding matrix generation part 14 executes steps S51 to 58 in the above-described decoding process as the decoding determination process (step S15 in
According to the decoding process and the decoding determination process, the reception devices 3a, 3b, and 3c can decode the original data if the reception devices fully receive N+α (1<α≦(L−N)) or more pieces of encoded data out of the L pieces of encoded data. Similarly to the above-described decoding efficiency, the value of α varies depending on the encoding matrix A, and an encoding matrix candidate having a minimum α (close to 1) can be selected as the encoding matrix A.
According to the present embodiment, the data amount to be transmitted by the transmission devices can be reduced and the reception devices can decode the data even if the whole of the encoded data is not received. As a result, it is possible to improve the temporal efficiency necessary for the data transfer.
The distribution devices and the reception devices according to the present embodiment can be readily applied to information processing devices, and it is possible to further increase performance of the information processing devices. The information processing devices include, for example, servers, personal computers (PCs), or the like.
Further, a program for implementing each of the above-described steps in the computers constituting the distribution device and the reception device can be provided. The above-described programs can be implemented in the computers constituting the distribution device and the reception device by storing the programs on a computer-readable recording medium. The computer-readable recording medium includes internal memory devices mounted in computers, such as a read-only memory (ROM) and a random access memory (RAM); transportable storage media such as a compact disc read only memory (CD-ROM), a flexible disk, a Digital Versatile Disc (DVD), a magneto-optical disk, and an integrated circuit card (IC card); databases for storing computer programs; other computers; and databases for those computers. Further, transmission media on lines can be included.
Number | Date | Country | Kind |
---|---|---|---|
2007-004858 | Jan 2007 | JP | national |