This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-012449, filed on Jan. 27, 2014, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a replication device, a replication method, and a replication system.
Data distribution methods in which data is distributed to and stored in a plurality of servers serving as storage nodes include a consistent hash method. In the consistent hash method, hash values are obtained for each of the servers and each of the data by a hash function that has been prepared beforehand. For example, numeric values calculated from an IP address of the server and from the name of the data are used as the hash values. The range of the hash function is represented as a hash space having a ring form.
Related arts have been discussed in Japanese Laid-open Patent Publication No. 2010-271798, Japanese Laid-open Patent Publication No. 2007-133503, or Japanese National Publication of International Patent Application No. 2012-524947.
According to an aspect of the embodiments, a replication device includes: a memory configured to store a replication program; and a CPU configured to execute the replication program, wherein the CPU, based on the replication program, performs operations: determining an arrangement, as a first arrangement of a first replica of first data to a storage node, so that the first data are distributed and stored to and in a plurality of first storage nodes; and determining an arrangement, as a second arrangement of a second replica of second data to the storage node, so that the second data is continuously stored in a second storage node that is different from the plurality of first storage nodes.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the arrangement of the data, the position on the ring is obtained from a hash value of the name of the data, a server that takes charge of the data is determined, and a first replication (1st replica) is arranged in the determined server. A second replication (2nd replica) is arranged, for example, in a server that takes charge of the next area in a clockwise direction on the ring (“server S11” in the case of “data 1”).
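The ring-based arrangement described above can be illustrated with a minimal Python sketch. The names `ring_position` and `HashRing`, the use of SHA-256, and the rule that a server takes charge of the positions at or before its own position are assumptions made only for illustration; the description does not fix a particular hash function or assignment direction.

```python
import bisect
import hashlib


def ring_position(key: str, ring_size: int = 2**32) -> int:
    """Map a string (server address or data name) to a position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


class HashRing:
    """Hypothetical ring: each server owns the positions up to its own position."""

    def __init__(self, servers):
        if len(set(servers)) < 2:
            raise ValueError("at least two distinct servers are needed for two replicas")
        self._points = sorted((ring_position(s), s) for s in set(servers))
        self._positions = [p for p, _ in self._points]

    def replicas(self, data_name: str):
        """Return (server of the 1st replica, server of the 2nd replica)."""
        # First server at or after the data position, wrapping around the ring.
        idx = bisect.bisect_left(self._positions, ring_position(data_name))
        idx %= len(self._points)
        first = self._points[idx][1]
        # The 2nd replica goes to the server in charge of the next area clockwise.
        second = self._points[(idx + 1) % len(self._points)][1]
        return first, second


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first, second = ring.replicas("data 1")
```

Because the 2nd replica is always taken from the neighboring area, the two replicas of the same data land on different servers as long as the ring holds at least two servers.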
The 2nd replica may be arranged in an area other than the next area in the clockwise direction on the ring. Because a plurality of replicas exist, access to the data is still allowed even when a certain server breaks down. Therefore, the arrangement of the 2nd replica is selected so that the two replicas are not arranged in an identical server. For example, a hash value of the 1st replica is obtained from the name of the data, and a hash value of the 2nd replica is obtained from a different value to determine arrangement of servers for the replicas on the same ring. In this case, when the two hash values fall in an identical assigned range, the replicas may be arranged in an identical server.
For example, when the server in which the two replicas are arranged breaks down, access to neither of the two replicas is allowed, so that the data may be lost. Ways to obtain a hash value include, for example, obtaining a hash value from the reversed name or from the value of the data, in addition to obtaining a hash value from the name of the data.
For example, in a distribution data management system in which a plurality of replicas is stored, the data is written to a node that is selected based on a value that has been obtained from the value of the data, and the data is written to a node that is selected based on a value that has been obtained from metadata.
In addition, the replica of data that is to be requested from a data request side is generated and accumulated beforehand in a data accumulation area of a specified range in a distribution system. When the request is received from the data request side, a response is performed using the replica. In addition, the arrangement of the replica is determined based on a feature such as the capacity, data storage cost, or the location of the storage node.
The consistent hash method has a characteristic by which data is distributed approximately evenly by hashing the names of the data using a hash function. The replica is arranged in a server that takes charge of the next area on the ring, so that the replicas are distributed approximately evenly.
On the other hand, depending on the usage of data, there is a case in which a series of data is arranged in a lot of servers, and a case in which the series of data is arranged in a few servers. For example, in a case in which a movie is divided into data A, data B, . . . , when the movie is reproduced, the data is read in order, so that it is desirable that the series of data is arranged collectively in a single server. On the other hand, in a case of analysis of the movie (such as a check of the distribution of colors), it is desirable that the series of data is arranged in a lot of servers so that the series of data is read in parallel and processed at high speed.
As described above, depending on the usage of the data, there are the case in which it is desirable that the data are distributed to a lot of servers and the case in which it is desirable that the data are put together in a few servers. The usage of the data changes depending on the processing, so that the data arrangement cannot be determined beforehand. In addition, replicas of identical data should not be stored in an identical server.
The replication device 2 is a device that accepts, through the Internet 6, a data access request from a client 5, which a user of the distribution storage 1 uses to access the distribution storage 1. When the replication device 2 accepts a data write request from the client 5, the replication device 2 transfers two replicas of the data to different servers 3.
When the replication device 2 accepts a data read request from the client 5, the replication device 2 reads the data from one of the two servers 3 that store the replicas of the data, and transmits the data to the client 5. The replication device 2 and the client 5 may correspond to each other so as to be one to one.
The server 3 is a storage node in which the replica of data is stored in a hard disk drive (HDD). The plurality of replication devices 2 are coupled to the plurality of servers 3 through a local area network (LAN) 4, but may be coupled to the plurality of servers 3 through a further network. The replication devices 2 may communicate with each other.
The replication device 2 arranges the 1st replica in a server 3 that has been selected based on a hash value of the data from among servers 3 the identifiers of which are stored in the 1st column of the data distribution table 22. The replication device 2 arranges the 2nd replica in a server 3 the identifier of which is stored in the last transmission destination storage unit 24, and stores the identifier of the server 3 of the transmission destination, in the 2nd column of the data distribution table 22.
When the server 3 the identifier of which is stored in the last transmission destination storage unit 24 is the same as the server 3 in which the 1st replica has been arranged, the replication device 2 arranges the 2nd replica in the server 3 that is next to the last transmission destination on the ring that indicates the hash space. The replication device 2 updates the last transmission destination storage unit 24 with the identifier of the server 3 in which the 2nd replica has been arranged, and stores the identifier of the server 3 of the transmission destination in the 2nd column of the data distribution table 22.
For example, when a hash value that has been calculated from data is “abc”, the replication device 2 arranges the 1st replica in “server C” the identifier “C” of which is stored in the 1st column of the data distribution table 22. The replication device 2 arranges the 2nd replica in “server D” the identifier “D” of which is stored in the last transmission destination storage unit 24, and stores the identifier “D” of the server 3 of the transmission destination in the 2nd column of the data distribution table 22.
As described above, the replication device 2 arranges the 1st replica in a server 3 that has been selected based on a hash value of the data. The replication device 2 distributes and arranges the series of data to and in a lot of servers 3 using the 1st replicas. Therefore, for example, in movie analysis such as check of colors, in the distribution storage 1, data may be read at high speed by reading the data in parallel.
The replication device 2 arranges the 2nd replica in the last transmission destination of the data. The replication device 2 arranges the series of data in an identical server 3 using the 2nd replicas. Therefore, for example, in a case such as reproduction of a movie that has been divided into a plurality of files and stored in the distribution storage 1, in the distribution storage 1, the series of files are read from the identical server 3 at high speed.
When the last transmission destination is the same as the arrangement destination of the 1st replica, the replication device 2 arranges the 2nd replica in the server 3 that is next to the last transmission destination on the ring that indicates the hash space. Therefore, the replication device 2 respectively arranges the 1st replica and the 2nd replica in the different servers 3.
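The effect of the two rules on a series of writes can be sketched as follows. This is a simplified illustration that leaves out the data distribution table 22; the function name `place_series` and the list `ring_order` (the servers in clockwise order on the ring) are hypothetical, and the servers of the 1st replicas are passed in directly instead of being derived from hash values.

```python
def place_series(first_servers, last_destination, ring_order):
    """Place the 2nd replica of each write, given the server of its 1st replica.

    Returns the list of (1st replica server, 2nd replica server) pairs and the
    final value of the last transmission destination.
    """
    placements = []
    for first in first_servers:
        if last_destination == first:
            # The two replicas would share a server, so move the last
            # transmission destination to the next server on the ring.
            i = ring_order.index(last_destination)
            last_destination = ring_order[(i + 1) % len(ring_order)]
        placements.append((first, last_destination))
    return placements, last_destination


ring_order = ["A", "B", "C", "D"]
placements, last = place_series(["C", "A", "D", "B"], "D", ring_order)
# placements == [("C", "D"), ("A", "D"), ("D", "A"), ("B", "A")], last == "A"
```

In this run the 1st replicas are spread over four servers, while the 2nd replicas stay on server D until the third write, whose 1st replica is also on server D; from that point the 2nd replicas continue on server A.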
The reception unit 21 receives an access request to the distribution storage 1, from the client 5. The reception unit 21 passes the access request to the first arrangement destination determination unit 23 and the second arrangement destination determination unit 25 when the access request is a data write request, and passes the access request to the read unit 27 when the access request is a data read request.
The data distribution table 22 is a table that is used to determine an arrangement destination of a replica, associates a hash range with the identifier of a server 3 that is an arrangement destination of the replica and stores the associated hash range and the identifier. For example, in
The data distribution table 22 stores information that is identical among the plurality of replication devices 2. Therefore, the information of the data distribution table 22 is synchronized among the replication devices 2. In the data distribution table 22, the number of lines is much larger than the number of servers, and may be, for example, around 10,000 times the number of servers.
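One possible in-memory form of such a table is sketched below in Python. The class names `Row` and `DataDistributionTable`, the use of integer hash values, and the convention that each line starts at `range_start` and extends to the start of the next line are assumptions made for illustration only.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    range_start: int               # first hash value covered by this line
    first: str                     # 1st column: server of the 1st replica
    second: Optional[str] = None   # 2nd column: server of the 2nd replica, or empty


class DataDistributionTable:
    """Lines sorted by range start, so a lookup is a binary search."""

    def __init__(self, rows: List[Row]):
        self.rows = sorted(rows, key=lambda r: r.range_start)
        self._starts = [r.range_start for r in self.rows]

    def lookup(self, hash_value: int) -> Row:
        """Return the line whose hash range includes hash_value."""
        idx = bisect.bisect_right(self._starts, hash_value) - 1
        if idx < 0:
            raise KeyError(f"hash value {hash_value} is below the first range")
        return self.rows[idx]


table = DataDistributionTable([Row(0, "A"), Row(100, "C", "D"), Row(200, "B")])
row = table.lookup(150)   # the line starting at 100: first == "C", second == "D"
```

A binary search keeps the lookup cost logarithmic even when the number of lines is many times the number of servers.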
The first arrangement destination determination unit 23 receives a data write request from the reception unit 21, and calculates a hash value from the write data. The first arrangement destination determination unit 23 determines an arrangement destination of the 1st replica based on the hash value and the data distribution table 22, and passes the determined arrangement destination to the transfer unit 26 with the write data.
The last transmission destination storage unit 24 stores the identifier of a server 3 that is a last transmission destination of the 2nd replica. The replication device 2 corresponds to the client 5 on a one-to-one basis, so that the last transmission destination storage unit 24 stores the identifier of the server 3 that is the last transmission destination for the corresponding client 5. An initial value of the last transmission destination storage unit 24 may be determined randomly.
The second arrangement destination determination unit 25 receives the data write request from the reception unit 21, and calculates a hash value from the write data. The second arrangement destination determination unit 25 determines an arrangement destination of the 2nd replica based on the hash value, the data distribution table 22, and the last transmission destination storage unit 24, and passes the determined arrangement destination to the transfer unit 26 with the write data.
The second arrangement destination determination unit 25 determines the arrangement destination so that the 2nd replicas of the series of data that are written from the client 5 are arranged in an identical server 3. For example, when the 2nd column in the data distribution table 22, which corresponds to the hash value of the data, is empty, the second arrangement destination determination unit 25 sets a value that is stored in the last transmission destination storage unit 24 to the 2nd column, and arranges the 2nd replica in the last transmission destination. The 2nd replica is arranged in a server 3 to which the previous 2nd replica has been written from the client.
When a value that is stored in the 1st column and a value that is stored in the last transmission destination storage unit 24 are identical to each other, the two replicas would undesirably be arranged in an identical server 3. In such a case, the second arrangement destination determination unit 25 does not write the 2nd replica to a server 3 to which the previous 2nd replica has been written, and changes the value that is stored in the last transmission destination storage unit 24 to a value that is moved by one in the clockwise direction on the ring that indicates the hash space.
When a value of the 2nd column in the data distribution table 22, which corresponds to the hash value of the data, is identical to a value that is stored in the last transmission destination storage unit 24, the second arrangement destination determination unit 25 arranges the 2nd replica in the server 3 that is indicated by the last transmission destination storage unit 24. When the value of the 2nd column is identical to the value that is stored in the last transmission destination storage unit 24, the values of the 1st column and the 2nd column are not equal due to processing that has been executed until that time.
When the 2nd column in the data distribution table 22, which corresponds to the hash value of the data, is not empty and is not identical to the value that is stored in the last transmission destination storage unit 24, the second arrangement destination determination unit 25 cannot write the 2nd replica to the server 3 to which the previous 2nd replica has been written. Therefore, the second arrangement destination determination unit 25 tries, up to a certain maximum number of times (for example, 10 times), to divide the assigned range of the line of the data distribution table 22 that corresponds to the hash value into a plurality of lines.
For example, the second arrangement destination determination unit 25 divides the line that corresponds to the hash value into n (for example, 100) lines. The second arrangement destination determination unit 25 asks the server 3 that has held the 2nd replica until that time (the server 3 that is indicated in the 2nd column) which of the increased lines corresponds to the data that the server 3 holds. The second arrangement destination determination unit 25 sets the value that was indicated in the 2nd column before the division in the 2nd column of the line given in the reply from the server 3 that has held the 2nd replica. As a result, the 2nd column in the data distribution table 22 that corresponds to the hash value of the data may become empty, and the 2nd replica may then be arranged in the server 3 that is the same as the last transmission destination.
In a case in which the 2nd column does not become empty even after division of the assigned range has been tried the certain maximum number of times, the second arrangement destination determination unit 25 may stop writing the 2nd replica to the server 3 to which the previous 2nd replica has been written. The second arrangement destination determination unit 25 arranges the data in the server 3 that is indicated by the 2nd column, and updates the last transmission destination storage unit 24 with the identifier of that server 3.
As described above, the second arrangement destination determination unit 25 does not determine an arrangement destination of the 2nd replica beforehand, and may determine the arrangement destination of the 2nd replica in the process of accumulating data. Therefore, by the distribution storage 1, arrangement of a new 2nd replica is performed appropriately depending on the arrangement status of the 2nd replica.
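The division of one line into several lines can be sketched as follows. The names `Line` and `split_line` are hypothetical, hash values are assumed to be integers, and `held_positions` stands in for the reply of the server 3 that has held the 2nd replica; how that server actually reports the lines is not specified in the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Line:
    start: int                     # inclusive start of the hash range
    end: int                       # exclusive end of the hash range
    first: str                     # 1st column
    second: Optional[str] = None   # 2nd column (None means empty)


def split_line(line: Line, n: int, held_positions: Set[int]) -> List[Line]:
    """Divide one line into up to n lines of roughly equal width.

    The 1st column is copied to every new line. The old 2nd column is kept
    only on new lines whose range contains a hash value the old server holds;
    the 2nd column of every other new line becomes empty.
    """
    width = line.end - line.start
    n = max(1, min(n, width))
    bounds = [line.start + width * i // n for i in range(n + 1)]
    parts = []
    for lo, hi in zip(bounds, bounds[1:]):
        holds = any(lo <= p < hi for p in held_positions)
        parts.append(Line(lo, hi, line.first, line.second if holds else None))
    return parts


parts = split_line(Line(0, 400, "C", "D"), n=4, held_positions={10})
# Only the new line covering [0, 100) keeps "D" in the 2nd column.
```

After the division, a write whose hash value falls in one of the lines with an empty 2nd column may be sent to the last transmission destination.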
At that time, when the replication device 2 in which the value of the last transmission destination storage unit 24 is “B” writes data X the hash value of which is “abc”, the second line from the hash range of the data distribution table 22 takes charge of the hash value “abc” of the data X. Thus, the 1st replica is arranged in a server C based on a value “C” of the 1st column of the second line of
The 2nd column of the second line of the data distribution table 22 corresponds to “D”, and is not empty and is not equal to “B” that is the value stored in the last transmission destination storage unit 24. Therefore, the second arrangement destination determination unit 25 increases the number of lines of the data distribution table 22. For example, the second arrangement destination determination unit 25 increases lines by four.
Because the identifier of the server 3 that has the data corresponding to the original 2nd replica of the second line is “D”, the second arrangement destination determination unit 25 asks the server D in which of the increased lines the identifier is to be set in the 2nd column. Here, “D” is set in the 2nd column of the line given in the reply from the server D.
The read unit 27 receives a data read request from the reception unit 21, and identifies a server 3 from which the data is read. The data read request specifies whether or not the data is to be read from an identical server 3. The read unit 27 identifies the server 3 from which the data is read, based on this specification and the data distribution table 22.
For example, the read unit 27 identifies a server 3 from which the data is read, by using the 2nd column of the data distribution table 22 when “the data is wanted to be read from an identical server 3” is specified. When “the data is wanted to be read from the identical server 3” is not specified, the read unit 27 identifies servers 3 from which the data is read, by using the 1st column in the data distribution table 22. The read unit 27 reads the data from the identified servers 3.
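The column selection performed for a read can be sketched in a few lines of Python. The function name `servers_for_read` and the representation of each table line as a pair are hypothetical, and falling back to the 1st column when the 2nd column is empty is an assumption that the description does not state.

```python
from typing import List, Optional, Tuple

# One entry per piece of data in the series: (1st column, 2nd column).
Row = Tuple[str, Optional[str]]


def servers_for_read(rows: List[Row], same_server: bool) -> List[str]:
    """Pick the server to read each piece of data from."""
    chosen = []
    for first, second in rows:
        if same_server and second is not None:
            chosen.append(second)   # read via the 2nd replicas
        else:
            chosen.append(first)    # read via the 1st replicas
    return chosen


rows = [("A", "D"), ("B", "D"), ("C", "D")]
sequential = servers_for_read(rows, same_server=True)    # ["D", "D", "D"]
parallel = servers_for_read(rows, same_server=False)     # ["A", "B", "C"]
```

The first call suits reproduction of a movie from a single server, and the second suits parallel analysis across many servers.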
The table synchronization unit 28 synchronizes information of the data distribution table 22 with that of a further replication device 2. The table synchronization unit 28 performs synchronization with the further replication device 2 when the number of lines is increased, the 2nd column is updated, or the like.
The second arrangement destination determination unit 25 obtains a hash value of data, and identifies “id” of a line in which a hash range of the data distribution table 22 includes the hash value of the data (Operation S2). Here, “id” is the number of the line. The second arrangement destination determination unit 25 determines the identifier of the id line of the 2nd column in the data distribution table 22 (Operation S3).
When the identifier of the id line of the 2nd column in the data distribution table 22 is equal to a value that is stored in the last transmission destination storage unit 24, the transfer unit 26 transfers the data to a server 3 that is indicated by the id line of the 2nd column in the data distribution table 22 (Operation S7).
When the id line of the 2nd column in the data distribution table 22 is empty, the second arrangement destination determination unit 25 determines whether or not the identifier of the 1st column of the id line in the data distribution table 22 is matched with the value that is stored in the last transmission destination storage unit 24 (Operation S4). When the identifier of the 1st column of the id line in the data distribution table 22 is matched with the value that is stored in the last transmission destination storage unit 24, the second arrangement destination determination unit 25 changes the value to the identifier of the next server 3 on the ring (Operation S5). When the identifier of the 1st column of the id line in the data distribution table 22 is not matched with the value that is stored in the last transmission destination storage unit 24, the processing proceeds to Operation S6.
The second arrangement destination determination unit 25 sets the value that is stored in the last transmission destination storage unit 24 in the id line of the 2nd column in the data distribution table 22 (Operation S6). The transfer unit 26 transfers the data to a server 3 that is indicated by the id line of the 2nd column in the data distribution table 22 (Operation S7).
When the identifier of the id line of the 2nd column in the data distribution table 22 is not equal to the value that is stored in the last transmission destination storage unit 24, and the id line of the 2nd column in the data distribution table 22 is not empty, the second arrangement destination determination unit 25 adds “1” to the count (Operation S8). The second arrangement destination determination unit 25 determines whether or not the value of the count is larger than a certain maximum number of times (Operation S9).
When the value of the count is not larger than the certain maximum number of times, the second arrangement destination determination unit 25 divides the id line of the data distribution table 22, into n lines (Operation S10). For example, “n” may be 100. The 1st column of each of the divided lines is the same as the original, and the 2nd column is empty. For example, in
The second arrangement destination determination unit 25 asks the server 3 that stores the data corresponding to the id line which of the increased lines the held data corresponds to, and sets the identifier of the server 3 in the 2nd column of the line given in the reply (Operation S11). For example, in
When the value of the count is larger than the certain maximum number of times, the transfer unit 26 transfers the data to the server 3 that is indicated by the id line of the 2nd column in the data distribution table 22 (Operation S12). The second arrangement destination determination unit 25 updates the value of the last transmission destination storage unit 24 with the identifier of the server 3 that is the transfer destination (Operation S13).
As described above, when the second arrangement destination determination unit 25 determines arrangement of the 2nd replica based on the hash value, the data distribution table 22, and the last transmission destination storage unit 24, the distribution storage 1 may store the series of data in an identical server 3 using the 2nd replicas.
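Operations S2 to S13 can be put together in the following Python sketch. All names are hypothetical, hash values are assumed to be integers, and the table lookup is assumed to be repeated after each division. The helper `split_in_two` is a stand-in for Operations S10 and S11: it simply assumes that the data already held by the old server lies in the lower half of the range, whereas the described device asks that server.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Line:
    lo: int                        # inclusive start of the hash range
    hi: int                        # exclusive end of the hash range
    first: str                     # 1st column
    second: Optional[str] = None   # 2nd column (None means empty)


def find_line(table: List[Line], h: int) -> int:
    for i, line in enumerate(table):
        if line.lo <= h < line.hi:
            return i
    raise KeyError(h)


def split_in_two(line: Line) -> List[Line]:
    """Stand-in for S10/S11: old data is assumed to lie in the lower half."""
    mid = (line.lo + line.hi) // 2
    return [Line(line.lo, mid, line.first, line.second),
            Line(mid, line.hi, line.first, None)]


def place_second_replica(table: List[Line], h: int, last_dest: str,
                         ring: List[str],
                         split: Callable[[Line], List[Line]] = split_in_two,
                         max_tries: int = 10) -> Tuple[str, str]:
    """Return (server of the 2nd replica, updated last transmission destination)."""
    count = 0
    while True:
        i = find_line(table, h)                          # S2
        line = table[i]
        if line.second == last_dest:                     # S3: equal
            return line.second, last_dest                # S7
        if line.second is None:                          # S3: empty
            if line.first == last_dest:                  # S4
                j = ring.index(last_dest)
                last_dest = ring[(j + 1) % len(ring)]    # S5
            line.second = last_dest                      # S6
            return line.second, last_dest                # S7
        count += 1                                       # S8
        if count > max_tries:                            # S9
            return line.second, line.second              # S12, S13
        table[i:i + 1] = split(line)                     # S10, S11


ring = ["A", "B", "C", "D"]
table = [Line(0, 100, "A"), Line(100, 200, "C", "D")]
server, last = place_second_replica(table, 150, "B", ring)
# server == "B": the line was divided, and the new line covering 150 was empty.
```

In this run the line covering the hash value 150 initially names server D in the 2nd column, so the line is divided once, after which the 2nd replica follows the last transmission destination B.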
The read unit 27 determines whether or not data is wanted to be read from an identical server 3, based on a read request (Operation S22). When “the data is wanted to be read from the identical server 3” is specified, the read unit 27 reads the data from a server 3 that is indicated by the id line of the 2nd column in the data distribution table 22 (Operation S23).
When “the data is wanted to be read from the identical server 3” is not specified, the read unit 27 reads the data from a server 3 that is indicated by the id line of the 1st column in the data distribution table 22 (Operation S24).
As described above, when the read unit 27 reads the 1st replica or the 2nd replica based on the read request, the distribution storage 1 may read the data at high speed using the replica that corresponds to the request from the client 5.
The first arrangement destination determination unit 23 determines an arrangement destination of the 1st replica so that data is distributed and arranged. The second arrangement destination determination unit 25 determines the arrangement of the 2nd replica so that data is continuously stored in an identical server 3, and stored in the server that is different from those of the 1st replicas. Therefore, the distribution storage 1 deals with both a case in which it is desirable that a series of data is arranged in a lot of servers and a case in which it is desirable that a series of data is arranged in a few servers, and replicas of identical data may not be stored in an identical server.
The data distribution table 22 stores the identifiers of servers 3 of arrangement destinations of the 1st replica and the 2nd replica, for each hash range. The last transmission destination storage unit 24 stores the identifier of a server 3 that is a last transmission destination of the 2nd replica. The second arrangement destination determination unit 25 determines an arrangement destination of the 2nd replica, based on a hash value of data, the data distribution table 22, and the last transmission destination storage unit 24. Therefore, the second arrangement destination determination unit 25 may determine the arrangement of the 2nd replica so that data is continuously stored in an identical server 3, and stored in the server that is different from those of the 1st replicas.
When the arrangement destination of the 2nd replica based on the hash value is different from the last transmission destination based on the last transmission destination storage unit 24, the second arrangement destination determination unit 25 tries to match the arrangement destination of the 2nd replica based on the hash value with the last transmission destination based on the last transmission destination storage unit 24 by dividing the hash range. Therefore, the second arrangement destination determination unit 25 may arrange the 2nd replica so that series of data is stored in an identical server 3.
In a case in which the arrangement destination based on the hash value and the last transmission destination cannot be matched with each other even after the hash range has been divided the certain number of times, the second arrangement destination determination unit 25 arranges the 2nd replica in the arrangement destination based on the hash value. Therefore, excessive concentration of data on an identical server 3 may be reduced.
The replication device 2 may be implemented by software. A replication program having a function that is similar to that of the replication device 2 is provided. The replication program may be executed by a computer. The computer may execute a plurality of replication programs.
The main memory 31 may be a memory that stores a program, a result in the middle of execution of the program, and the like. The CPU 32 may be a CPU that reads a program from the main memory 31 and executes the program. The CPU 32 may include a chip set including a memory controller.
The LAN interface 33 may be an interface that is used to couple the computer 30 to a further computer through a LAN. The HDD 34 may be a disk device that stores a program and data. The super IO 35 may be an interface that is used to perform connection of an input device such as a mouse and a keyboard. The DVI 36 may be an interface that is used to perform connection of a liquid crystal display device. The ODD 37 may be a device that performs read and write of a digital versatile disc (DVD).
The LAN interface 33 is coupled to the CPU 32 by PCI Express, and the HDD 34 and the ODD 37 are coupled to the CPU 32 by serial advanced technology attachment (SATA). The super IO 35 is coupled to the CPU 32 by low pin count (LPC).
The replication program that is executed in the computer 30 is stored in a DVD, read from the DVD by the ODD 37, and installed on the computer 30. Alternatively, the replication program is stored in a database or the like of a further computer system that is coupled through the LAN interface 33, and is read from the database and installed on the computer 30. The installed replication program is stored in the HDD 34, read to the main memory 31, and executed by the CPU 32.
The replication device may be included in the distribution storage. For example, even when the replication program is executed in a client or a computer that is in the vicinity of the client, the replication device may be applied in the similar manner.
The arrangement of a replica may be determined using a hash. For example, when arrangement servers are switched in order, the arrangement of a replica may be determined by a further method.
The replication device and a client may correspond to each other so as to be one to one. When “m” and “n” are set as given positive integers, the replication device and the client may correspond to each other so as to be “m” to “n”.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2014-012449 | Jan 2014 | JP | national |