The present invention relates to a storage system, and specifically, relates to a storage system that distributes and stores data into a plurality of storage devices.
In recent years, as computers have developed and become widespread, more and more kinds of information are being converted into digital data. Devices for storing such digital data include magnetic tapes and magnetic disks. The amount of data to be stored grows day by day and has become huge, so high-capacity storage systems are required. Such systems must also remain reliable while keeping the cost of storage devices low, and the stored data must be easy to retrieve later. Consequently, there is demand for a storage system that can automatically increase its storage capacity and performance, that eliminates duplicate storage to reduce storage cost, and that has high redundancy.
Under such circumstances, content address storage systems such as the one shown in Patent Document 1 have been developed in recent years. A content address storage system distributes data across a plurality of storage devices and specifies the storing position of the data by a unique content address determined from the content of the data.
To be specific, the content address storage system divides predetermined data into a plurality of fragments, adds fragments consisting of redundant data, and stores the fragments into the plurality of storage devices, respectively. Later, by designating a content address, it is possible to retrieve the data, that is, the fragments stored in the storing position specified by the content address, and to restore the original data from those fragments.
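The divide, add-redundancy and restore cycle can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the coding scheme of Patent Document 1: where the embodiment described later generates twelve fragments from nine division data plus redundant data, this sketch substitutes a single XOR parity fragment, so it tolerates the loss of only one fragment. The function names encode and restore are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 9) -> list:
    """Divide data into k equal-size division fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad so every fragment has the same length
    division = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, division)
    return division + [parity]


def restore(fragments: list, original_length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can recover only one lost fragment")
    fragments = list(fragments)
    if missing:
        # The XOR of all surviving fragments (division and parity) equals the lost one.
        present = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:-1])[:original_length]
```

A real system would use an erasure code with several redundant fragments so that more than one storage device can fail at once; the single parity here only keeps the example short.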
Further, the content address is generated so as to be unique to the content of the data. Therefore, for duplicated data, data having the same content can be obtained by referring to the data in the same storing position. Thus, duplicated data need not be stored separately, so duplicate recording is eliminated and the data capacity is reduced.
On the other hand, a storage system equipped with a plurality of storage devices requires a load balancing structure so that load is not concentrated on particular nodes. An example of such a load balancing system is described in Patent Document 2.
The load balancing storage system will now be described in detail. Because redundant data is added when data is stored, the system has a self-repairing function that can restore data by itself in case of a failure. The system also has a distributed resilient data function that, when deciding which node a component is located in, autonomously distributes components by taking the load of each node into account.
In such a storage system, data to be stored is first divided into fine data blocks. Each data block is divided more finely, plural pieces of redundant data are added, and the resulting data are stored across the plurality of nodes configuring the system. Each node belonging to the storage system has data storing regions called components, and the data blocks are stored in the components. Load balancing and exchange of data between the nodes are both performed in units of components, and the system locates the components on the respective nodes autonomously.
In such a system, when a node leaves the system because of a node failure, the components of that node are regenerated on other nodes.
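The duplicate elimination described above can be sketched as a store keyed by a hash of the content. The class name ContentAddressStore and the use of a full SHA-256 hex digest as the address are assumptions made for illustration only.

```python
import hashlib


class ContentAddressStore:
    """Toy content-addressed store in which identical content is kept only once."""

    def __init__(self) -> None:
        self._blocks = {}  # content address -> data

    def put(self, data: bytes) -> str:
        """Store data and return its content address (hex SHA-256 digest, an assumption)."""
        address = hashlib.sha256(data).hexdigest()
        # A duplicate write finds the existing entry, so nothing new is stored.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def __len__(self) -> int:
        return len(self._blocks)
```

Writing the same content twice returns the same address and leaves a single stored copy, which is the capacity saving the paragraph above describes.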
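The effect of placing data blocks in components, and of balancing load by moving whole components rather than individual blocks, can be shown with a small hypothetical sketch. How blocks are assigned to components is not specified here; hashing the block content into a fixed number of components, the round-robin initial placement, and the class name ComponentMap are all assumptions.

```python
import hashlib


class ComponentMap:
    """Two-level placement: data block -> component -> node (hypothetical sketch)."""

    def __init__(self, num_components: int, nodes: list) -> None:
        self.num_components = num_components
        # Component location table; round-robin initial placement is an assumption.
        self.location = {c: nodes[c % len(nodes)] for c in range(num_components)}

    def component_of(self, block: bytes) -> int:
        """Assign a block to a component by hashing its content (an assumption)."""
        digest = hashlib.sha256(block).digest()
        return int.from_bytes(digest[:4], "big") % self.num_components

    def node_of(self, block: bytes) -> str:
        return self.location[self.component_of(block)]

    def move_component(self, component: int, new_node: str) -> None:
        """Relocating one component relocates every block assigned to it."""
        self.location[component] = new_node
```

Because only the component location table changes when a component moves, the system can reason about placement per component instead of per block.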
[Patent Document 1] Japanese Unexamined Patent Application Publication No. JP-A 2005-235171
[Patent Document 2] Japanese Unexamined Patent Application Publication No. JP-A 2008-204206
However, when a storage system autonomously distributes data by taking the load of each node into account, as described above, data relocation can become inefficient during restoration from a node fault. An example shown in
When the nodes A and B rejoin the system after temporary faults as shown in
Accordingly, an object of the present invention is to provide a storage system that can increase efficiency of processing in data restoration and inhibit system load and processing delay.
In order to achieve the object, a storage system of an embodiment of the present invention includes a plurality of storing means and a data processing means configured to store data into the plurality of storing means and retrieve the data stored in the storing means.
The data processing means includes: a distribution storage processing means configured to distribute and store a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; a data location monitoring means configured to monitor a data location status of the fragment data in the respective storing means and store data location information representing the data location status; and a data restoring means configured to, when any of the storing means is down, regenerate the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and store into the other storing means. The data processing means also includes a data location returning means configured to, when the down storing means recovers, return a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information stored by the data location monitoring means.
Further, a computer program of another embodiment of the present invention is a computer program including instructions for causing an information processing device equipped with a plurality of storing means to realize a data processing means configured to store data into the plurality of storing means and retrieve the data stored in the storing means, and also realize: a distribution storage processing means configured to distribute and store a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; a data location monitoring means configured to monitor a data location status of the fragment data in the respective storing means and store data location information representing the data location status; a data restoring means configured to, when any of the storing means is down, regenerate the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and store into the other storing means; and a data location returning means configured to, when the down storing means recovers, return a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information stored by the data location monitoring means.
Further, a data processing method of another embodiment of the present invention includes, in an information processing device equipped with a plurality of storing means: storing data into the plurality of storing means and retrieving the data stored in the storing means; distributing and storing a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; monitoring a data location status of the fragment data in the respective storing means and storing data location information representing the data location status; when any of the storing means is down, regenerating the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and storing into the other storing means; and when the down storing means recovers, returning a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information having been stored.
With the configurations as described above, the present invention can realize efficient and quick data restoration.
A first exemplary embodiment of the present invention will be described with reference to
This exemplary embodiment shows a specific example of a storage system disclosed in a second exemplary embodiment described later. Below, a case of configuring the storage system by connecting a plurality of server computers will be described. However, the storage system of the present invention is not limited to being configured by a plurality of computers, and may be configured by one computer.
[Configuration]
As shown in
As shown in
Furthermore, the storage system 10 of this exemplary embodiment is a content address storage system that divides data and makes the data redundant, distributes and stores the data into a plurality of storage devices, and specifies a storing position in which the data is stored by a unique content address specified in accordance with the content of the data. This content address storage system will be described later in detail.
In
Further, the storage node 10B configuring the storage system 10 is equipped with a component moving unit 31 and a data-movement and data-regeneration unit 32, which are configured by installation of a program into a plurality of arithmetic devices like a CPU (Central Processing Unit) included therein. Moreover, the storage node 10B is equipped with a component 33 within a storage device included therein. Below, the respective configurations will be described in detail.
The abovementioned program is provided to the accelerator node 10A and the storage node 10B, for example, in a state stored in a storage medium such as a CD-ROM. Alternatively, the program may be stored in a storage device of another server computer on the network and provided from the other server computer to the accelerator node 10A and the storage node 10B via the network.
Further, the configurations included by the accelerator node 10A and the storage node 10B are not necessarily limited to the configurations shown in
Firstly, the data-division and redundant-data-provision unit 21 divides backup target data (storage target data) into a plurality of fragment data in order to distribute and store the backup target data. An example of this process is shown in
Further, the data-division and redundant-data-provision unit 21 divides the block data D into a plurality of fragment data having predetermined capacities. For example, the data-division and redundant-data-provision unit 21 divides the block data D into nine fragment data (division data 41) as shown by symbols D1 to D9 in
Then, the fragment data generated as described above are distributed and stored into the components 33 formed in the respective storage nodes 10B via a switch 10C, respectively, by the component moving units 31 of the respective storage nodes 10B described later (a distribution storage processing means). For example, in the case of generating the twelve fragment data D1 to D12 as shown in
When the fragment data are stored as described above, a content address CA representing the storing positions of the fragment data D1 to D12, namely, the storing position of the block data D restored from the fragment data D1 to D12 is generated in the storage node 10B. At this moment, the content address CA is generated, for example, by combining part of the hash value H calculated based on the stored block data D (a short hash: e.g., the beginning 8 B (bytes) of the hash value H) and information representing a logical storing position. Then, this content address CA is returned to the accelerator node 10A managing a file system within the storage system 10 (arrow Y6 in
Thus, upon acceptance of a request for retrieving a file, the storage system can specify a storing position designated by a content address CA corresponding to the requested file and retrieve each fragment data stored in this specified storing position as data requested to be retrieved. As described above, the storage system has a function of retrieving and writing data (a data processing means).
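The content address construction of this embodiment, a short hash formed from the beginning 8 bytes of the hash value H combined with information representing a logical storing position, might look like the following sketch. The hash function, the 8-byte big-endian encoding of the position, and the verification step on retrieval are assumptions; the embodiment does not fix them.

```python
import hashlib
import struct

SHORT_HASH_BYTES = 8  # "the beginning 8 B (bytes) of the hash value H"


def make_content_address(block: bytes, logical_position: int) -> bytes:
    """Concatenate a short hash of block data D with its logical storing position.

    SHA-256 and the 8-byte big-endian position field are assumptions.
    """
    short_hash = hashlib.sha256(block).digest()[:SHORT_HASH_BYTES]
    return short_hash + struct.pack(">Q", logical_position)


def retrieve(ca: bytes, blocks_by_position: dict) -> bytes:
    """Locate a block by the position part of its address and verify the short hash."""
    short_hash = ca[:SHORT_HASH_BYTES]
    (position,) = struct.unpack(">Q", ca[SHORT_HASH_BYTES:])
    block = blocks_by_position[position]
    if hashlib.sha256(block).digest()[:SHORT_HASH_BYTES] != short_hash:
        raise ValueError("stored block does not match its content address")
    return block
```

Here blocks_by_position stands in for the lookup of fragment data at a storing position; in the embodiment that lookup would gather and decode the fragments held by the storage nodes 10B.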
Further, the component and node information monitoring unit 22 (a data location monitoring means) manages the fragment data stored in the respective storage nodes 10B in units of the components in which the fragment data are stored. To be specific, as described later, the component and node information monitoring unit 22 monitors the movement of components executed autonomously by the storage nodes 10B, and acquires component location information representing the location of each component at predetermined time intervals (every x minutes). When the component location information remains in a steady state for a preset time or more (y minutes or more), the component and node information monitoring unit 22 stores the component location information, in which each storage node name is associated with a component name, into the mapping table 23. In other words, the component and node information monitoring unit 22 updates the mapping table 23.
Further, the component and node information monitoring unit 22 monitors which storage nodes 10B are operating normally and participating in the storage system, and stores node information representing a list of those nodes as a node list 24 (a storing means list). In other words, the component and node information monitoring unit 22 monitors whether each storage node 10B is down, for example, whether it has stopped or is not participating in the system, and stores a list of the storage nodes 10B that are not down. To be specific, the component and node information monitoring unit 22 monitors the storage nodes 10B together with the location of the components at predetermined time intervals (every x minutes). If, as a result of the monitoring, the location of the components and the list of the storage nodes remain unchanged for a predetermined time or more (y minutes or more), the component and node information monitoring unit 22 re-stores the component location information and the node information in that state into the mapping table 23 and the node list 24, respectively.
On the other hand, if the monitoring shows that the component location information has changed while the node information has not changed with respect to the node list, the component and node information monitoring unit 22 determines that the node fault was temporary and that the storage node 10B has recovered. In this case, the component and node information monitoring unit 22 instructs the respective storage nodes 10B to return the location of the components so that the actual location of the components in the storage nodes 10B agrees with the component location information stored in the mapping table 23. The component and node information monitoring unit 22 functions as a data location returning means in cooperation with the component moving unit 31 and the data-movement and data-regeneration unit 32 of the storage node 10B described later.
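The steady-state rule of the component and node information monitoring unit 22 can be sketched as follows, assuming that a sample arrives every x minutes with a timestamp in minutes. The class LocationMonitor and its method names are hypothetical; it re-stores the mapping table and the node list only after both have stayed unchanged for y minutes.

```python
class LocationMonitor:
    """Commit component locations and the node list once they stay unchanged."""

    def __init__(self, steady_minutes: float) -> None:
        self.steady_minutes = steady_minutes  # "y minutes" in the text
        self.mapping_table = {}               # component name -> storage node name
        self.node_list = frozenset()          # storage nodes that are not down
        self._last_sample = None
        self._steady_since = None

    def observe(self, now: float, locations: dict, nodes: set) -> bool:
        """Take one sample (every x minutes); return True if the tables were re-stored."""
        sample = (dict(locations), frozenset(nodes))
        if sample != self._last_sample:
            # Something moved or a node changed state: restart the steady-state timer.
            self._last_sample = sample
            self._steady_since = now
            return False
        if now - self._steady_since >= self.steady_minutes:
            self.mapping_table = dict(locations)
            self.node_list = frozenset(nodes)
            return True
        return False
```

Transient states, such as the period while components are being regenerated after a fault, never reach the tables unless they persist for y minutes.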
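The temporary-fault test and the resulting return instruction can be expressed as a comparison of the current state with the stored tables. The function plan_return and its move representation are hypothetical; the sketch lists each component whose actual node differs from the node recorded in the mapping table 23, and plans nothing while the set of operating nodes still differs from the node list 24.

```python
def plan_return(mapping_table: dict, node_list: frozenset,
                current_locations: dict, current_nodes: set) -> dict:
    """Return {component: (current_node, home_node)} moves, or {} if no return is needed.

    A return is planned only when component locations differ from the mapping
    table while the operating nodes match the node list, i.e. the fault is
    judged to have been temporary and the node has recovered.
    """
    if current_locations == mapping_table:
        return {}  # nothing has moved
    if frozenset(current_nodes) != node_list:
        return {}  # node set still differs: wait for a new steady state instead
    return {
        comp: (current_locations.get(comp), home)
        for comp, home in mapping_table.items()
        if current_locations.get(comp) != home
    }
```

For example, if node A held component c1, went down, and c1 was regenerated on node C, then once A is back in the node list the plan is to move c1 from C back to A.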
Next, the configuration of the storage node 10B will be described. First, the storage nodes 10B each form a component 33, which is the unit of a data storing region, and store the fragment data D1 to D12, respectively, as described later.
Further, the component moving unit 31 has a function of distributedly storing the respective fragment data transmitted via the switch 10C, as described above, in cooperation with the other storage nodes 10B, and also has a function of balancing load among the storage nodes 10B. To be specific, the load balancing function monitors the load of each storage node 10B and moves components 33 so as to balance load among the storage nodes 10B, for example, when fragment data is stored and when a storage node 10B is added or deleted. The load balancing function of the component moving unit 31 is executed autonomously by each of the storage nodes 10B. For example, when a storage node 10B goes down and is deleted because of a fault or the like, the components stored in the down storage node 10B are moved so as to be regenerated in another storage node 10B. Moreover, for example, when a storage node 10B is newly added, or recovers from a fault and is added again, components stored in the existing storage nodes 10B are moved to the added storage node 10B.
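The autonomous load balancing can be approximated by the hypothetical sketch below, which assumes that load is measured as the number of components per storage node. Components on nodes that have left are reassigned to the least-loaded nodes first, and components are then moved from the busiest to the idlest node until the counts differ by at most one. The function name rebalance is an assumption.

```python
from collections import Counter


def rebalance(locations: dict, nodes: set) -> dict:
    """Return a new component -> node map with component counts evened out.

    Assumption: load is approximated by the number of components per node.
    """
    new = {c: n for c, n in locations.items() if n in nodes}
    orphans = [c for c in locations if c not in new]  # components on departed nodes
    load = Counter({n: 0 for n in nodes})
    load.update(new.values())
    for comp in sorted(orphans):
        target = min(load, key=lambda n: (load[n], n))  # least-loaded node, ties by name
        new[comp] = target
        load[target] += 1
    # Move components from the busiest to the idlest node until counts differ by <= 1.
    while True:
        busiest = max(load, key=lambda n: (load[n], n))
        idlest = min(load, key=lambda n: (load[n], n))
        if load[busiest] - load[idlest] <= 1:
            return new
        comp = sorted(c for c, n in new.items() if n == busiest)[0]
        new[comp] = idlest
        load[busiest] -= 1
        load[idlest] += 1
```

A rule like this decides only how many components each node should hold; nothing in it ties a recovered node to the components it held before the fault, which is why the mapping table 23 is kept separately.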
Then, specifically, upon acceptance of an instruction to return the location of the component from the component and node information monitoring unit 22 described above, the component moving unit 31 moves the component 33 so that the actual location of the component agrees with component location information stored in the mapping table 23.
Further, the data-movement and data-regeneration unit 32 moves or regenerates data so that the data is stored in the component moved by the component moving unit 31 described above. To be specific, for each piece of data belonging to the component, the data-movement and data-regeneration unit 32 first checks whether the data exists in the storage node to which the component is to be moved (the destination storage node). If the data exists, the data-movement and data-regeneration unit 32 associates the data with the component moved by the component moving unit 31. If the data does not exist in the destination storage node, the data-movement and data-regeneration unit 32 next checks whether the data exists in the source storage node. If the data exists in the source storage node, the data-movement and data-regeneration unit 32 moves the data from the source storage node to the destination storage node. If the data exists in neither the destination storage node nor the source storage node, the data-movement and data-regeneration unit 32 regenerates the data from the redundant data.
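The three-way decision made for each piece of data in a moved component can be written as a small function. The names Action, place_data and plan_component_move, and the representation of each node's contents as a set of data identifiers, are hypothetical.

```python
from enum import Enum


class Action(Enum):
    RELATE = "relate existing data to the moved component"
    MOVE = "move data from the source storage node"
    REGENERATE = "regenerate data from redundant data"


def place_data(data_id: str, destination: set, source: set) -> Action:
    """Decide how one piece of a moved component's data reaches the destination node."""
    if data_id in destination:
        return Action.RELATE      # already there: no transfer needed
    if data_id in source:
        return Action.MOVE        # copy from the node the component came from
    return Action.REGENERATE      # neither node has it: rebuild from redundant data


def plan_component_move(data_ids, destination: set, source: set) -> dict:
    """Apply the decision to every piece of data belonging to the component."""
    return {d: place_data(d, destination, source) for d in data_ids}
```

When a storage node recovers from a temporary fault and its components are returned to it, most of its fragment data is still present, so most decisions are RELATE and require neither transfer nor regeneration.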
As described above, the component moving unit 31 and the data-movement and data-regeneration unit 32, in cooperation with the component and node information monitoring unit 22, function as a data restoring means for restoring data stored in a deleted storage node 10B into another storage node 10B and also function as a data location returning means for returning data location in the storage node 10B having recovered.
[Operation]
Next, an operation of the storage system configured as described above will be described with reference to the flowcharts of
First, the data-division and redundant-data-provision unit 21 of the accelerator node 10A divides storage target data into any number of pieces, and adds a plurality of redundant data thereto, thereby forming a plurality of fragment data (step S1 in
Subsequently, an operation of the component and node information monitoring unit 22 of the accelerator node 10A will be described with reference to
Assume that a storage node 10B goes down because of a fault or the like. In other words, assume that the component location information and node information being monitored change with respect to the mapping table 23 and the node list 24 (“Yes” at step S15 and “Yes” at step S16). As a specific example, assume that the storage nodes A and B are down as shown in
Then, if the storage nodes remain down and the component location information and node information being monitored, while still differing from the mapping table 23 and the node list 24 (“Yes” at step S15 and “Yes” at step S16), stay steady for y minutes or more (“Yes” at step S18), the component and node information monitoring unit 22 re-stores the component location information and node information in that state into the mapping table 23 and the node list 24 (step S13).
On the other hand, in a case that component location information changes because of a storage node fault, etc., as described above (“Yes” at step S15) and load balancing is autonomously executed as shown in
Movement of data stored in a component in accordance with movement of the component and regeneration of data are executed by the storage node 10B as shown in
On the other hand, in a case that the data corresponding to the moved component does not exist in the destination storage node (“No” at step S21), the storage node 10B next checks whether the data exists in a source storage node (step S23). Then, in a case that the data exists in the source storage node, the storage node 10B moves the data from the source storage node to the destination storage node (step S24).
Furthermore, if the data exists in neither the destination storage node nor the source storage node, the data is regenerated from the redundant data. This process is executed when a storage node goes down and a component stored in that storage node is moved to another storage node, as shown in
A second exemplary embodiment of the present invention will be described with reference to
As shown in
Then, the data processing means 2 includes: a distribution storage processing means 3 configured to distribute and store a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; a data location monitoring means 4 configured to monitor a data location status of the fragment data in the respective storing means and store data location information representing the data location status; and a data restoring means 5 configured to, when any of the storing means is down, regenerate the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and store into the other storing means.
Furthermore, the storage system 1 of this exemplary embodiment also includes a data location returning means 6 configured to, when the down storing means recovers, return a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information stored by the data location monitoring means.
According to the present invention, firstly, the storage system divides storage target data into a plurality of division data, generates redundant data for restoring the storage target data, and distributes and stores a plurality of fragment data including the division data and the redundant data into a plurality of storing means. After that, the storage system monitors a data location status of the respective fragment data, and stores data location information representing the data location status.
Further, when the storing means is down because of occurrence of a fault, the storage system regenerates the fragment data having been stored in the down storing means based on the other fragment data and stores into the other storing means. After that, when the down storing means recovers, the storage system uses the fragment data stored in the storing means having recovered and returns the data location so that the data location status becomes as represented by the data location information.
Consequently, in a case that the storing means is down temporarily and then recovers, it is possible to return data location by using the stored fragment data, and therefore, it is possible to inhibit regeneration and movement of unnecessary data. Accordingly, it is possible to realize efficient and quick data restoration in recovery of the storing means.
Further, in the storage system: the data location monitoring means is configured to monitor the data location status of the fragment data by component that is a unit of data storing within the storing means; the data restoring means is configured to regenerate the component of the down storing means in the other storing means; and the data location returning means is configured to return a data location of the component in the storing means based on the data location information and return the data location of the fragment data.
Further, in the storage system, the data location returning means is configured to return the component to the storing means having recovered and, by relating the fragment data stored in the storing means having recovered with the component, return the data location of the fragment data.
Further, in the storage system, the data location returning means is configured to, in a case that the fragment data to be stored in the component returned to the storing means having recovered based on the data location information does not exist in the storing means having recovered, return the data location of the fragment data by moving the fragment data regenerated by the data restoring means from the other storing means.
Further, in the storage system: the data location monitoring means is configured to, in a case that the data location status being monitored keeps steady for a predetermined time or more, store the data location information representing the data location status; and the data location returning means is configured to, when the data location status monitored by the data location monitoring means changes with respect to the data location information and the down storing means recovers, return the data location of the fragment data.
Further, in the storage system: the data location monitoring means is configured to monitor an operation status of the storing means, and store the data location information and also store a storing means list showing the operating storing means; and the data location returning means is configured to, when the data location status monitored by the data location monitoring means changes with respect to the data location information and the operating storing means agrees with the storing means list, return the data location of the fragment data.
Further, the abovementioned storage system can be realized by installing a program in an information processing device.
To be specific, a computer program of another exemplary embodiment of the present invention includes instructions for causing an information processing device equipped with a plurality of storing means to realize a data processing means configured to store data into the plurality of storing means and retrieve the data stored in the storing means, and also realize: a distribution storage processing means configured to distribute and store a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; a data location monitoring means configured to monitor a data location status of the fragment data in the respective storing means and store data location information representing the data location status; a data restoring means configured to, when any of the storing means is down, regenerate the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and store into the other storing means; and a data location returning means configured to, when the down storing means recovers, return a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information stored by the data location monitoring means.
Then, in the computer program, the data location monitoring means is configured to monitor the data location status of the fragment data by component that is a unit of data storing within the storing means; the data restoring means is configured to regenerate the component of the down storing means in the other storing means; and the data location returning means is configured to return a data location of the component in the storing means based on the data location information and return the data location of the fragment data.
The abovementioned program is provided to the information processing device, for example, in a state stored in a storage medium such as a CD-ROM. Alternatively, the program may be stored in a storage device of another server computer on the network and provided from the other server computer to the information processing device via the network.
Further, a data processing method executed in the storage system with the above configuration includes: storing data into the plurality of storing means and retrieving the data stored in the storing means; distributing and storing a plurality of fragment data composed of division data obtained by dividing storage target data into plural pieces and redundant data for restoring the storage target data, into the plurality of storing means; monitoring a data location status of the fragment data in the respective storing means and storing data location information representing the data location status; when any of the storing means is down, regenerating the fragment data having been stored in the down storing means based on the fragment data stored in the storing means other than the down storing means and storing into the other storing means; and when the down storing means recovers, returning a data location of the fragment data by using the fragment data stored in the storing means having recovered so that the data location status becomes as represented by the data location information having been stored.
Then, the data processing method includes: when monitoring the data location status, monitoring the data location status of the fragment data by component that is a unit of data storing within the storing means; when regenerating the fragment data, regenerating the component of the down storing means in the other storing means; and when returning the data location, returning a data location of the component in the storing means based on the data location information and returning the data location of the fragment data.
Inventions of a computer program and a data processing method having the abovementioned configurations have the same actions as the abovementioned storage system and can therefore achieve the object of the present invention mentioned above.
Although the present invention has been described with reference to the respective exemplary embodiments described above, the present invention is not limited to the abovementioned exemplary embodiments. The configuration and details of the present invention can be altered within the scope of the present invention in various manners that can be understood by those skilled in the art.
The present invention is based upon and claims the benefit of priority from Japanese patent application No. 2009-033438, filed on Feb. 17, 2009, the disclosure of which is incorporated herein in its entirety by reference.
The present invention can be utilized for a storage system configured by connecting a plurality of computers, and has industrial applicability.
Number | Date | Country | Kind |
---|---|---|---|
2009-033438 | Feb 2009 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2009/003964 | 8/20/2009 | WO | 00 | 8/3/2011 |