1. Field of the Invention
The present invention relates to a distributed data storage system and method for storing data, and more particularly, to a system and method for storing subsets of an original set of data on multiple data storage devices in one or more locations such that the individual data subsets on each digital data storage device are unrecognizable and unusable except when combined with data subsets from other digital data storage devices and in which the data subsets are selected by way of information dispersal algorithms so that even if there is a failure of one or more digital data storage devices, the original data can be reconstructed.
2. Description of the Prior Art
Various data storage systems are known for storing data. Normally such data storage systems store all of the data associated with a particular data set, for example, all the data of a particular user or all the data associated with a particular software application or all the data in a particular file, in a single data space (i.e., a single digital data storage device). Critical data is known to be initially stored on redundant digital data storage devices. Thus, if there is a failure of one digital data storage device, a complete copy of the data is available on the other digital data storage device. Examples of such systems with redundant digital data storage devices are disclosed in U.S. Pat. Nos. 5,890,156; 6,058,454; and 6,418,539, hereby incorporated by reference. Although such redundant digital data storage systems are relatively reliable, there are other problems with such systems. First, such systems essentially double the cost of digital data storage. Second, all of the data in such redundant digital data storage systems is in one place, making the data vulnerable to unauthorized access.
In order to improve the security and thus the reliability of the data storage system, or for reasons relating to performance improvements or capacity limitations, the data may be stored across more than one storage device, such as a hard drive, or removable media, such as a magnetic tape or a so-called “memory stick,” as set forth in U.S. Pat. No. 6,128,277, hereby incorporated by reference. For example, recent data in a database might be stored on a hard drive while older data that is less often used might be stored on a magnetic tape. Another example is storing data from a single file that would be too large to fit on a single hard drive on two hard drives. In each of these cases, the data subset stored on each data storage device does not contain all of the original data, but does contain a generally continuous portion of the data that can be used to provide some usable information. For example, if the original data to be stored was the string of characters in the following sentence:
In each case, the data stored on each device is not a complete copy of the original data, but each of the data subsets stored on each device provides some usable information.
Typically, the actual bit pattern of data storage on a device, such as a hard drive, is structured with additional values to represent file types, file systems and storage structures, such as hard drive sectors or memory segments. The techniques used to structure data in particular file types using particular file systems and particular storage structures are well known and allow individuals familiar with these techniques to identify the source data from the bit pattern on a physical media.
In order to make sure that stored data is only available to authorized users, data is often stored in an encrypted form using one of several known encryption techniques, such as DES, AES or several others. These encryption techniques store data in some coded form that requires a mathematical key that is ideally known only to authorized users or authorized processes. Although these encryption techniques are difficult to “break”, instances of encryption techniques being broken are known, making the data on such data storage systems vulnerable to unauthorized access.
In addition to securing data using encryption, several methods for improving the security of data storage using information dispersal algorithms have been developed, for example as disclosed in U.S. Pat. No. 6,826,711 and U.S. patent application Publication No. US 2005/0144382, hereby incorporated by reference. Such information dispersal algorithms are used to “slice” the original data into multiple data subsets and distribute these subsets to different storage nodes (i.e., different digital data storage devices). Individually, each data subset or slice does not contain enough information to recreate the original data; however, when a threshold number of subsets (i.e., fewer than the original number of subsets) is available, all the original data can be exactly recreated.
The use of such information dispersal algorithms in data storage systems is also described in various trade publications. For example, “How to Share a Secret”, by A. Shamir, Communications of the ACM, Vol. 22, No. 11, November, 1979, describes a scheme for sharing a secret, such as a cryptographic key, based on polynomial interpolation. Another trade publication, “Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance”, by M. Rabin, Journal of the Association for Computing Machinery, Vol. 36, No. 2, April 1989, pgs. 335-348, also describes a method for information dispersal using an information dispersal algorithm. Unfortunately, these methods and other known information dispersal methods are computationally intensive and are thus not applicable for general storage of large amounts of data using the kinds of computers in broad use by businesses, consumers and other organizations today. Thus there is a need for a data storage system that is able to reliably and securely protect data without requiring the use of computationally intensive algorithms.
Briefly, the present invention relates to a digital data storage system in which original data to be stored is separated into a number of data “slices” or subsets in such a manner that the data in each subset is less usable or less recognizable or completely unusable or completely unrecognizable by itself except when combined with some or all of the other data subsets. These data subsets are stored on separate digital data storage devices as a way of increasing privacy and security. After the system “slices” the original data into data subsets, a coding algorithm is used on the data subsets to create coded data subsets. Each data subset and its corresponding coded subset may be transmitted separately across a communications network and/or stored in separate storage nodes in an array of storage nodes. In order to recreate the original data, the data subsets and coded subsets are retrieved from some or all of the storage nodes or communication channels, depending on the availability and performance of each storage node and each communication channel. The original data is then recreated by applying a series of decoding algorithms to the retrieved data and coded data. In accordance with an important aspect of the invention, the system codes and decodes data subsets in a manner that is computationally efficient relative to known systems in order to enable broad use of this method using the types of computers generally used by businesses, consumers and other organizations currently.
These and other advantages of the present invention will be readily understood with reference to the following drawing and attached specification wherein:
The present invention relates to a data storage system. In order to protect the security of the original data, the original data is separated into a number of data “slices” or subsets. The data in each slice is less usable or less recognizable or completely unusable or completely unrecognizable by itself except when combined with some or all of the other data subsets. In particular, the system in accordance with the present invention “slices” the original data into data subsets and uses a coding algorithm on the data subsets to create coded data subsets. Each data subset and its corresponding coded subset may be transmitted separately across a communications network and stored in a separate storage node in an array of storage nodes. In order to recreate the original data, data subsets and coded subsets are retrieved from some or all of the storage nodes or communication channels, depending on the availability and performance of each storage node and each communication channel. The original data is recreated by applying a series of decoding algorithms to the retrieved data and coded data.
As with other known data storage systems based upon information dispersal methods, unauthorized access to one or more data subsets only provides reduced or unusable information about the source data. In accordance with an important aspect of the invention, the system codes and decodes data subsets in a manner that is computationally efficient relative to known systems in order to enable broad use of this method using the types of computers generally used by businesses, consumers and other organizations currently.
In order to understand the invention, consider a string of N characters d0, d1, . . . , dN−1, which could comprise a file or a system of files. A typical computer file system may contain gigabytes of data, which would mean N would be in the billions of characters. The following example considers a much smaller string where the data string length, N, equals the number of storage nodes, n. To store larger data strings, such as large computer files or entire file systems, these methods can be applied repeatedly.
For this example, assume that the string contains the characters, O L I V E R where the string contains ASCII character codes as follows:
d0=O=79
d1=L=76
d2=I=73
d3=V=86
d4=E=69
d5=R=82
The string is broken into segments that are n characters each, where n is chosen to provide the desired reliability and security characteristics while maintaining the desired level of computational efficiency. Typically n would be selected to be below 100. In one embodiment, n may be chosen to be greater than four (4) so that each subset of the data contains less than, for example, ¼ of the original data, thus decreasing the recognizability of each data subset.
In an alternate embodiment, n is selected to be six (6), so that the first original data set is separated into six (6) different data subsets as follows:
A=d0, B=d1, C=d2, D=d3, E=d4, F=d5
For example, where the original data is the starting string of ASCII values for the characters of the text O L I V E R, the values in the data subsets would be those listed below:
A=79
B=76
C=73
D=86
E=69
F=82
In this embodiment, the coded data values are created by adding data values from a subset of the other data values in the original data set. For example, the coded values can be created by adding the following data values:
c[x]=d[n_mod(x+1)]+d[n_mod(x+2)]+d[n_mod(x+4)]
where:
For example, where the original data is the starting string of ASCII values for the characters of the text O L I V E R, the values in the coded data subsets would be those listed below:
cA=218
cB=241
cC=234
cD=227
cE=234
cF=241
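The coding step above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that n_mod( ) denotes reduction modulo n, the number of storage nodes, an assumption that is consistent with the coded values listed above. The helper name encode is hypothetical.

```python
# Illustrative sketch only. Assumption: n_mod(x) means x modulo n,
# where n is the number of storage nodes (here n = 6).
def n_mod(x, n):
    return x % n

def encode(d):
    """Return coded values c[x] = d[x+1] + d[x+2] + d[x+4], indices wrapped modulo n."""
    n = len(d)
    return [d[n_mod(x + 1, n)] + d[n_mod(x + 2, n)] + d[n_mod(x + 4, n)]
            for x in range(n)]

data = [ord(ch) for ch in "OLIVER"]  # [79, 76, 73, 86, 69, 82]
coded = encode(data)                 # cA .. cF
print(coded)                         # [218, 241, 234, 227, 234, 241]
```

Each coded value costs two additions per data value, which matches the overhead the specification attributes to this coding equation.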
In accordance with the present invention, the original data set 20, consisting of the exemplary data ABCDEF is sliced into, for example, six (6) data subsets A, B, C, D, E and F. The data subsets A, B, C, D, E and F are also coded as discussed below forming coded data subsets cA, cB, cC, cD, cE and cF. The data subsets A, B, C, D, E and F and the coded data subsets cA, cB, cC, cD, cE and cF are formed into a plurality of slices 22, 24, 26, 28, 30 and 32 as shown, for example, in
In order to retrieve the original data (or receive it in the case where the data is just transmitted, not stored), the data can be reconstructed as shown in
For a variety of reasons, such as the outage or slow performance of a storage node 34, 36, 38, 40, 42 and 44 or a communications connection, not all data slices 22, 24, 26, 28, 30 and 32 will always be available each time data is recreated.
A=cC−D−E
where cC is a coded value and D and E are original data values, available from the slices 26, 28 and 30, which are assumed to be available from the nodes 38, 40 and 42, respectively. In this case, the missing data value is determined by reversing the coding equation: the coded value was created by summing a portion of the data values, so subtracting the known data values from the known coded value leaves the missing data value.
For example, where the original data is the starting string of ASCII values for the characters of the text O L I V E R, the data value of the A could be determined as follows:
A=234−86−69
Therefore A=79 which is the ASCII value for the character, O.
In other cases, determining the original data values requires a more detailed decoding equation. For example,
(1) B=(cD−F+cF−cC)/2
(2) E=cD−F−B
(3) A=cF−B−D
These equations are performed in the order listed in order for the data values required for each equation to be available when the specific equation is performed.
For example, where the original data is the starting string of ASCII values for the characters of the text O L I V E R, the data values of the B, E and A could be determined as follows:
(1) B=(227−82+241−234)/2
B=76
(2) E=227−82−76
E=69
(3) A=241−76−86
A=79
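The three ordered decoding equations can be traced with a short Python sketch. It assumes, as the equations imply, that only cC, cD, cF, D and F are available and that the values A, B and E must be recovered. The function name decode_three_missing is hypothetical.

```python
# Illustrative sketch only: recover A, B and E from cC, cD, cF, D and F,
# applying the three decoding equations in the order given in the text.
def decode_three_missing(cC, cD, cF, D, F):
    # Equation 1: cD - F + cF - cC equals 2*B, so integer division is exact.
    B = (cD - F + cF - cC) // 2
    # Equation 2: cD = E + F + B, so E follows once B is known.
    E = cD - F - B
    # Equation 3: cF = A + B + D, so A follows once B is known.
    A = cF - B - D
    return A, B, E

A, B, E = decode_three_missing(cC=234, cD=227, cF=241, D=86, F=82)
print(A, B, E)                 # 79 76 69
print(chr(A), chr(B), chr(E))  # O L E
```

The order matters because equations 2 and 3 both consume the value of B produced by equation 1.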
In order to generalize the method for the recreation of all original data ABCDEF when n=6 and up to three slices 22, 24, 26, 28, 30 and 32 are not available at the time of the recreation,
This table lists the 40 different outage scenarios where 1, 2, or 3 out of six storage nodes are not available or are performing slowly enough to be considered not available. In the table in
The data values can be represented by the array d[x], where x is the node number where that data value is stored. The coded values can be represented by the array c[x].
In order to reconstruct missing data in an outage scenario where one node is not available in a storage array where n=6, the following equation can be used:
d[0+offset]=c3d(2, 3, 4, offset)
where c3d( ) is a function in pseudo computer software code as follows:
where n_mod( ) is the function defined previously.
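A function of this kind might look like the following Python sketch. It is not the specification's own listing: the parameter names are hypothetical, the arrays c and d are passed explicitly as an added assumption, and n_mod( ) is assumed to be reduction modulo n. The sketch subtracts two known data values from one known coded value, with every position shifted by the offset of the missing node.

```python
# Illustrative sketch only; names and the explicit c/d parameters are assumptions.
N_NODES = 6

def n_mod(x, n=N_NODES):
    return x % n

def c3d(coded_pos, known_a_pos, known_b_pos, offset, c, d):
    """Missing data value = one coded value minus two known data values."""
    return (c[n_mod(coded_pos + offset)]
            - d[n_mod(known_a_pos + offset)]
            - d[n_mod(known_b_pos + offset)])

data = [79, 76, 73, 86, 69, 82]          # O L I V E R
coded = [218, 241, 234, 227, 234, 241]   # cA .. cF

recovered = []
for offset in range(N_NODES):
    d = list(data)
    c = list(coded)
    d[offset] = None  # simulate the outage of node `offset`:
    c[offset] = None  # both its data value and its coded value are lost
    recovered.append(c3d(2, 3, 4, offset, c, d))

print(recovered)  # [79, 76, 73, 86, 69, 82]
```

With offset 0 the call reduces to d[0]=c[2]−d[3]−d[4], which is the A=cC−D−E case worked earlier.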
In order to reconstruct missing data in an outage scenario where two nodes are not available in a storage array where n=6, the equations in the table in
In order to reconstruct missing data in an outage scenario where three nodes are not available in a storage array where n=6, the equations in the table in
The example equations listed above are typical of the type of coding and decoding equations that create efficient computing processes using this method, but they only represent one of many examples of how this method can be used to create efficient information distribution systems. In the example above of distributing original data on a storage array of 6 nodes where at least 3 are required to recreate all the data, the computational overhead of creating the coded data is only two addition operations and three modulo operations per byte. When data is decoded, no additional operations are required if all storage nodes and communications channels are available. If one or two of the storage nodes or communications channels are not available when n=6, then only two additional addition/subtraction operations are required to decode each missing data value. If three storage nodes or communications channels are missing when n=6, then just three addition/subtraction operations are required for each missing byte in 11 of 12 instances; in the twelfth instance, only 4 computational operations are required (3 addition/subtractions and one division by an integer). This method is more computationally efficient than known methods, such as those described by Rabin and Shamir.
This method of selecting a computationally efficient method for secure, distributed data storage by creating coded values to store at storage nodes that also store data subsets can be used to create data storage arrays generally for configurations where n=4 or greater. In each case decoding equations such as those detailed above can be used to recreate missing data in a computationally efficient manner.
Coding and decoding algorithms for varying grid sizes which tolerate varying numbers of storage node outages without original data loss can also be created using these methods. For example, to create a 9 node grid that can tolerate the loss of 2 nodes, a candidate coding algorithm is selected that uses a mathematical function that incorporates at least two other nodes, such as:
c[x]=d[n_mod(x+1)]+d[n_mod(x+2)]
where:
In this example embodiment, n=9, the first data segment is separated into different data subsets as follows:
A=d0, B=d1, C=d2, D=d3, E=d4, F=d5, G=d6, H=d7, I=d8
Using this candidate coding algorithm equation above, the following coded values are created:
cA, cB, cC, cD, cE, cF, cG, cH, cI
The candidate coding algorithm is then tested against all possible grid outage states of up to the desired number of storage node outages that can be tolerated with complete data restoration of all original data.
The validity of the candidate coding algorithm can be tested by determining if there is a decoding equation or set of decoding equations that can be used to recreate all the original data in each outage Type and thus each outage case. For example, in the first outage case in
cH=I+A
A=cH−I
The missing data value B can then be created from cI as follows:
cI=A+B
B=cI−A
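The two-step recovery of A and B can be traced numerically with the following Python sketch. The specification gives no sample data for the 9 node case, so the segment "DISPERSAL" is a hypothetical choice made only for illustration, and n_mod( ) is again assumed to be reduction modulo n.

```python
# Illustrative sketch only; the 9-character segment is hypothetical sample data.
def encode9(d):
    """Candidate coding equation c[x] = d[x+1] + d[x+2], indices wrapped modulo n."""
    n = len(d)
    return [d[(x + 1) % n] + d[(x + 2) % n] for x in range(n)]

data = [ord(ch) for ch in "DISPERSAL"]  # values A .. I
coded = encode9(data)                   # values cA .. cI

# Assume the nodes holding A and B (positions 0 and 1) are unavailable,
# so only values from positions 2 .. 8 may be used.
I = data[8]
cH, cI = coded[7], coded[8]

A = cH - I   # from cH = I + A
B = cI - A   # from cI = A + B

print(chr(A), chr(B))  # D I
```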
This type of validity testing can then be used to test if all original data can be obtained in all other instances where 2 storage nodes on a 9 node storage grid are not operating. Next, all instances where 1 storage node is not operating on a 9 node storage grid are tested to verify whether that candidate coding algorithm is valid. If the validity testing shows that all original data can be obtained in every instance of 2 storage nodes not operating on a 9 node storage grid and every instance of 1 storage node not operating on a 9 node storage grid, then that coding algorithm would be valid to store data on a 9 node storage grid and then to retrieve all original data from that grid if up to 2 storage nodes were not operating.
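One way to automate this validity testing is sketched below in Python. It is a sketch under stated assumptions rather than the procedure of the specification: it treats each available coded value as a linear equation in the missing data values and checks, by Gaussian elimination over the rationals, that the equations determine every missing value. The function names and the rank-based test are assumptions introduced for illustration, and n_mod( ) is assumed to be reduction modulo n.

```python
# Illustrative sketch only: exhaustive validity test of a candidate coding equation.
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a small matrix over the rationals, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def recoverable(n, offsets, missing):
    """True if all data values on the missing nodes can be solved for."""
    unknowns = sorted(missing)
    rows = []
    for x in range(n):
        if x in missing:
            continue  # the coded value c[x] is lost together with node x
        terms = {(x + k) % n for k in offsets}
        rows.append([1 if u in terms else 0 for u in unknowns])
    return rank(rows) == len(unknowns)

def is_valid(n, offsets, max_outages):
    """Test every outage case of 1 .. max_outages nodes."""
    return all(recoverable(n, offsets, set(combo))
               for k in range(1, max_outages + 1)
               for combo in combinations(range(n), k))

print(is_valid(9, (1, 2), 2))     # 9 node grid, c[x] = d[x+1] + d[x+2]
print(is_valid(6, (1, 2, 4), 3))  # 6 node grid, c[x] = d[x+1] + d[x+2] + d[x+4]
```

The rank test only establishes that decoding equations exist for each outage case; it does not derive them or bound their operation count. It also assumes coded values are stored as exact sums rather than reduced to a single byte.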
These types of coding and decoding algorithms can be used by those practiced in the art of software development to create storage grids with varying numbers of storage nodes with varying numbers of storage node outages that can be tolerated by the storage grid while perfectly restoring all original data.
Obviously, many modifications and variations of the present invention are possible in light of the above teachings. Thus, it is to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than is specifically described above.
Number | Name | Date | Kind |
---|---|---|---|
4092732 | Ouchi | May 1978 | A |
5454101 | Mackay et al. | Sep 1995 | A |
5485474 | Rabin | Jan 1996 | A |
5774643 | Lubbers et al. | Jun 1998 | A |
5802364 | Senator et al. | Sep 1998 | A |
5809285 | Hilland | Sep 1998 | A |
5890156 | Rekieta et al. | Mar 1999 | A |
5987622 | Lo Verso et al. | Nov 1999 | A |
5991414 | Garay et al. | Nov 1999 | A |
6012159 | Fischer et al. | Jan 2000 | A |
6058454 | Gerlach et al. | May 2000 | A |
6128277 | Bruck et al. | Oct 2000 | A |
6192472 | Garay et al. | Feb 2001 | B1 |
6256688 | Suetaka et al. | Jul 2001 | B1 |
6272658 | Steele et al. | Aug 2001 | B1 |
6301604 | Nojima | Oct 2001 | B1 |
6356949 | Katsandres et al. | Mar 2002 | B1 |
6366995 | Vilkov et al. | Apr 2002 | B1 |
6374336 | Peters et al. | Apr 2002 | B1 |
6415373 | Peters et al. | Jul 2002 | B1 |
6418539 | Walker | Jul 2002 | B1 |
6449688 | Peters et al. | Sep 2002 | B1 |
6567948 | Steele et al. | May 2003 | B2 |
6571282 | Bowman-Amuah | May 2003 | B1 |
6609223 | Wolfgang | Aug 2003 | B1 |
6718361 | Basani | Apr 2004 | B1 |
6760808 | Peters et al. | Jul 2004 | B2 |
6785768 | Peters et al. | Aug 2004 | B2 |
6785783 | Buckland | Aug 2004 | B2 |
6826711 | Moulton et al. | Nov 2004 | B2 |
6879596 | Dooply | Apr 2005 | B1 |
7003688 | Pittelkow et al. | Feb 2006 | B1 |
7024451 | Jorgenson | Apr 2006 | B2 |
7024609 | Wolfgang et al. | Apr 2006 | B2 |
7080101 | Watson et al. | Jul 2006 | B1 |
7103824 | Halford | Sep 2006 | B2 |
7103915 | Redlich et al. | Sep 2006 | B2 |
7111115 | Peters et al. | Sep 2006 | B2 |
7140044 | Redlich et al. | Nov 2006 | B2 |
7146644 | Redlich et al. | Dec 2006 | B2 |
7171493 | Shu et al. | Jan 2007 | B2 |
7222133 | Raipurkar et al. | May 2007 | B1 |
7240236 | Cutts et al. | Jul 2007 | B2 |
7272613 | Sim et al. | Sep 2007 | B2 |
7366299 | Cheung | Apr 2008 | B2 |
20020062422 | Butterworth et al. | May 2002 | A1 |
20020166079 | Ulrich et al. | Nov 2002 | A1 |
20030018927 | Gadir et al. | Jan 2003 | A1 |
20030037261 | Meffert et al. | Feb 2003 | A1 |
20030065617 | Watkins et al. | Apr 2003 | A1 |
20030084020 | Shu | May 2003 | A1 |
20040024963 | Talagala et al. | Feb 2004 | A1 |
20040122917 | Menon et al. | Jun 2004 | A1 |
20040215998 | Buxton et al. | Oct 2004 | A1 |
20040228493 | Ma et al. | Nov 2004 | A1 |
20050100022 | Ramprashad | May 2005 | A1 |
20050114594 | Corbett et al. | May 2005 | A1 |
20050125593 | Karpoff et al. | Jun 2005 | A1 |
20050131993 | Fatula, Jr. | Jun 2005 | A1 |
20050132070 | Redlich et al. | Jun 2005 | A1 |
20050144382 | Schmisseur | Jun 2005 | A1 |
20050229069 | Hassner et al. | Oct 2005 | A1 |
20060047907 | Shiga et al. | Mar 2006 | A1 |
20060136448 | Cialini et al. | Jun 2006 | A1 |
20060156059 | Kitamura | Jul 2006 | A1 |
20060224603 | Correll, Jr. | Oct 2006 | A1 |
20070079081 | Gladwin | Apr 2007 | A1 |
20070079082 | Gladwin et al. | Apr 2007 | A1 |
20070079083 | Gladwin et al. | Apr 2007 | A1 |
20070088970 | Buxton et al. | Apr 2007 | A1 |
20070174192 | Gladwin | Jul 2007 | A1 |
20070234110 | Soran | Oct 2007 | A1 |
20070283167 | Venters, III et al. | Dec 2007 | A1 |
Number | Date | Country | |
---|---|---|---|
20070079081 A1 | Apr 2007 | US |