Virtualized data storage vaults on a dispersed data storage network

Information

  • Patent Grant
  • Patent Number
    8,190,662
  • Date Filed
    Tuesday, April 26, 2011
  • Date Issued
    Tuesday, May 29, 2012
Abstract
A slice server includes a network port, a central processing unit, and memory. The central processing unit (CPU) is operable to receive, via the network port, a request to access a virtual digital data storage vault. The CPU then determines whether the slice server supports the virtual digital data storage vault. When the slice server supports the virtual digital data storage vault, the CPU determines whether the request is valid. When the request is valid, the CPU executes the request to generate a response.
Description
FIELD OF THE INVENTION

The present invention relates generally to systems, apparatus, and methods for distributed data storage, and more particularly to systems, apparatus, and methods for distributed data storage using an information dispersal algorithm so that no one location will store an entire copy of stored data, and more particularly still to systems, apparatus, and methods for using a fixed number of slice servers to implement a plurality of dispersed data storage networks.


DESCRIPTION OF THE PRIOR ART

Storing data in digital form is a well-known problem associated with all computer systems, and numerous solutions to this problem are known in the art. The simplest solution involves merely storing digital data in a single location, such as a punch card, hard drive, or FLASH memory device. However, storage of data in a single location is inherently unreliable. The device storing the data can malfunction or be destroyed through natural disasters, such as a flood, or through a malicious act, such as arson. In addition, digital data is generally stored in a usable file, such as a document that can be opened with the appropriate word processing software, or a financial ledger that can be opened with the appropriate spreadsheet software. Storing an entire usable file in a single location is also inherently insecure, as a malicious hacker need only compromise that one location to obtain access to the usable file.


To address reliability concerns, digital data is often “backed-up,” i.e., an additional copy of the digital data is made and maintained in a separate physical location. For example, a backup tape of all network drives may be made by a small office and maintained at the home of a trusted employee. When a backup of digital data exists, the destruction of either the original device holding the digital data or the backup will not compromise the digital data. However, the existence of the backup exacerbates the security problem, as a malicious hacker can choose between two locations from which to obtain the digital data. Further, the site where the backup is stored may be far less secure than the original location of the digital data, such as in the case when an employee stores the tape in her home.


Another method used to address reliability and performance concerns is the use of a Redundant Array of Independent Drives (“RAID”). RAID refers to a collection of data storage schemes that divide and replicate data among multiple storage units. Different configurations of RAID provide increased performance, improved reliability, or both increased performance and improved reliability. In certain configurations of RAID, when digital data is stored, it is split into multiple units, referred to as “stripes,” each of which is stored on a separate drive. Data striping is performed in an algorithmically certain way so that the data can be reconstructed. While certain RAID configurations can improve reliability, RAID does nothing to address security concerns associated with digital data storage.


One method by which prior art solutions have addressed security concerns is the use of encryption. Encrypted data is mathematically coded so that only users with access to a certain key can decrypt and use the data. Common forms of encryption include DES, AES, RSA, and others. While modern encryption methods are difficult to break, numerous instances of successful attacks are known, some of which have resulted in valuable data being compromised.


In 1979, two researchers independently developed a method for splitting data among multiple recipients called “secret sharing.” One of the characteristics of secret sharing is that a piece of data may be split among n recipients, but cannot be known unless at least t recipients share their data, where n≧t. For example, a trivial form of secret sharing can be implemented by assigning a single random byte to every recipient but one, who would receive the actual data byte after it had been bitwise exclusive-ORed with the random bytes. In other words, for a group of four recipients, three of the recipients would be given random bytes, and the fourth would be given a byte calculated by the following formula:

s′ = s ⊕ ra ⊕ rb ⊕ rc,

where s is the original source data, ra, rb, and rc are random bytes given to three of the four recipients, and s′ is the encoded byte given to the fourth recipient. The original byte s can be recovered by bitwise exclusive-ORing all four bytes together.
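
By way of illustration only, the following Python sketch implements this trivial variant of secret sharing, in which all n shares must be combined for recovery (a special case of the general t-of-n schemes); the function names are illustrative and form no part of the claimed invention:

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(secret: bytes, n: int) -> list:
    """Split `secret` into n shares; all n must be combined to recover it."""
    random_shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    # The final share is s' = s XOR ra XOR rb XOR rc from the formula above.
    encoded = reduce(xor_bytes, random_shares, secret)
    return random_shares + [encoded]

def recover_secret(shares: list) -> bytes:
    """Exclusive-OR all shares together to recover the original secret."""
    return reduce(xor_bytes, shares)

shares = split_secret(b"attack at dawn", 4)
assert recover_secret(shares) == b"attack at dawn"
```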


The problem of reconstructing data stored on a digital medium that is subject to damage has also been addressed in the prior art. In particular, Reed-Solomon and Cauchy Reed-Solomon coding are two well-known methods of dividing encoded information into multiple slices so that the original information can be reassembled even when some of the slices are unavailable. Reed-Solomon coding, Cauchy Reed-Solomon coding, and other data coding techniques are described in “Erasure Codes for Storage Applications,” by Dr. James S. Plank, which is hereby incorporated by reference.
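
The erasure property can be illustrated, without the finite-field arithmetic that full Reed-Solomon coding requires, by a single-parity XOR code covering the special case n = m + 1. The following sketch is a stand-in for illustration only, not the coding used by the invention:

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(segment: bytes, m: int) -> list:
    """Encode a segment into m data slices plus one XOR parity slice;
    any m of the m + 1 slices suffice to rebuild the (zero-padded) segment."""
    padded = segment.ljust(-(-len(segment) // m) * m, b"\0")
    size = len(padded) // m
    slices = [padded[i * size:(i + 1) * size] for i in range(m)]
    return slices + [reduce(xor_bytes, slices)]

def decode(slices: list, m: int) -> bytes:
    """Rebuild the segment when at most one slice is missing (given as None).
    The XOR of all m + 1 slices is zero, so any one slice equals the XOR
    of all the others."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if missing:
        slices[missing[0]] = reduce(xor_bytes,
                                    [s for s in slices if s is not None])
    return b"".join(slices[:m])

slices = encode(b"data segment", m=3)
slices[1] = None                      # lose any one slice
assert decode(slices, m=3) == b"data segment"
```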


Schemes for implementing dispersed data storage networks (“DDSN”), which are also known as dispersed data storage grids, are also known in the art. In particular, U.S. Pat. No. 5,485,474, issued to Michael O. Rabin, describes a system for splitting a segment of digital information into n data slices, which are stored in separate devices. When the data segment must be retrieved, only m of the original data slices are required to reconstruct the data segment, where n>m.


Prior art DDSN systems are only viable for extremely specialized applications, as implementing an effective DDSN requires that a user set up a network of slice servers in multiple physically disparate locations. Existing directory service software will not effectively manage access to a DDSN, particularly as a DDSN does not have physical resources in the sense of a disk drive or directory, but rather is a type of virtual drive, where information is spread across numerous slice servers. Therefore, software for managing access to a DDSN would make DDSN technology accessible to a wider variety of applications.


In addition, the management and administration of a DDSN presents other problems that are not associated with prior art systems. For example, different users of a DDSN may want to store their data in different ways, e.g., one user may want all of their data compressed to save on storage space, while another user may prefer uncompressed data to improve retrieval speed. Further, a network of slice servers can be used to implement numerous DDSNs, each having different characteristics, and using a subset or all of the available slice servers to store data.





BRIEF DESCRIPTION OF THE DRAWINGS

Although the characteristic features of this invention will be particularly pointed out in the claims, the invention itself, and the manner in which it may be made and used, may be better understood by referring to the following description taken in connection with the accompanying drawings forming a part hereof, wherein like reference numerals refer to like parts throughout the several views and in which:



FIG. 1 is a network diagram of a dispersed data storage network constructed in accordance with an embodiment of the disclosed invention;



FIG. 2 is a simplified network diagram of the operation of one aspect of the disclosed invention by which a plurality of dispersed data storage networks can be implemented from a set of slice servers;



FIG. 3 is a flowchart illustrating the process by which a slice server authenticates requests received from various computers accessing a dispersed data storage network;



FIG. 4 is a data relationship diagram illustrating the relationship between user accounts and virtualized data storage vaults, as well as the structure of account and vault constructs.





DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENT

Turning to the Figures and to FIG. 1 in particular, a distributed computer system implementing a dispersed data storage grid 100 is shown. An arbitrary number of slice servers 150-162 store data slices sent to them by networked client computers 102, 104, 106. As illustrated, some number of grid access computers 120, 122 allow access to the slice servers 150-162 by the client computers 102, 104, 106. Data segments are written to the grid by client computers 102, 104, 106. In accordance with an information dispersal algorithm, the data segments are sliced into multiple data slices that are then stored on slice servers 150-162.


As explained herein, the disclosed invention allows a network of slice servers to implement numerous dispersed data storage networks. In accordance with the disclosed invention, a subset of the available slice servers 150-162 is associated with a user account to form a dispersed data storage network. This information is stored in an accessible location, such as a grid access computer 120, 122, on each client computer 102, 104, 106, or elsewhere. This software construct, which is referred to herein as a “vault,” allows numerous DDSNs to be implemented from a single network of slice servers. Each vault makes use of some number of slice servers, and a particular slice server may be associated with any number of vaults. There is no fixed relation between the slice servers comprising a vault, except through the vault construct itself. For example, a first vault may comprise 16 slice servers. A second vault may utilize 4 slice servers in common with the first vault, and an additional 8 that are not used by the first vault.
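
The relationship just described can be sketched in Python as follows; the server addresses and vault layout are purely hypothetical, chosen to mirror the 16-server and 4-plus-8 example above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vault:
    name: str
    slice_servers: frozenset  # addresses of the slice servers backing this DDSN

# A network of 24 available slice servers (hypothetical addresses).
servers = [f"slice-{i:02d}.example.net" for i in range(24)]

vault_a = Vault("vault-a", frozenset(servers[0:16]))   # 16 slice servers
vault_b = Vault("vault-b", frozenset(servers[12:24]))  # 4 shared + 8 others

assert len(vault_a.slice_servers & vault_b.slice_servers) == 4
assert len(vault_b.slice_servers - vault_a.slice_servers) == 8
```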


In addition to storing information about which slice servers make up a particular DDSN, a vault will also store other information pertinent to the operation of a DDSN. This information includes what information dispersal algorithm (“IDA”) is used on the DDSN, as well as the information required to operate the particular IDA, such as the number of slices that each data segment is divided into, which is also referred to as the quantity n, and the minimum number of data slices required to reconstruct a stored data segment, which is also referred to as the quantity m.


The vault also aggregates other information relevant to the operation of a DDSN. The total storage that is available in a particular vault is stored, as well as the amount of storage that is presently occupied by data segments. In a fee-for-service system, this will prevent a particular user from using more storage than was paid for. In addition, a particular vault may require that data be encrypted, either before it is sliced, after it is sliced, or both before and after it is sliced. Accordingly, the vault structure can contain a field indicating that data segments and/or data slices are encrypted, as well as the particular algorithm that is used for encryption.


For certain applications, data stored on a DDSN may be compressed to increase the total amount of storage available. However, the use of compression can increase the time required to write and retrieve data. Accordingly, the vault can contain a field indicating whether compression is to be used, and what type of compression should be used. In addition, while almost every DDSN makes use of integrity checks, certain applications may be better served by different types of integrity checks. For this purpose, the vault may contain a field allowing a user to specify the type of integrity check to be used for stored data segments as well as for stored data slices.


In addition to storing information about the particular DDSN associated with a vault, a vault may also include an access control list specifying which accounts are allowed to access the vault, and what permissions are associated with each account. For example, one user may have full access to a vault, while another user may only be allowed to read data segments from the vault, and not write data segments to, or modify data segments stored on, the vault.
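
Gathering the fields described in the preceding paragraphs, a vault construct might be represented as in the following Python sketch; the field names and defaults are illustrative assumptions, not the patented data layout:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VaultDefinition:
    slice_servers: tuple            # servers that store this vault's slices
    ida: str                        # information dispersal algorithm in use
    n: int                          # slices per data segment
    m: int                          # minimum slices needed to reconstruct
    capacity_bytes: int             # total storage allotted to the vault
    used_bytes: int = 0             # storage presently occupied
    encrypt_segments: bool = False  # encrypt data before it is sliced
    encrypt_slices: bool = False    # encrypt data after it is sliced
    encryption_algorithm: Optional[str] = None
    compression: Optional[str] = None           # None means no compression
    segment_integrity_check: str = "CRC-32"
    slice_integrity_check: str = "CRC-32"
    acl: dict = field(default_factory=dict)     # account -> set of permissions
```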



FIG. 2 illustrates how access to a DDSN is handled through a vault. A user logs into a particular account at a client computer 202. As part of the login process, a grid access computer 212 assembles a vault definition, which may be resident on the grid access computer 212, stored on the slice servers 222, 224, 226 as distributed data, or stored elsewhere. The vault structure then moderates access by the client computer 202 to a DDSN comprised of slice servers 222, 224, 226.
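
The grid access computer's role in this step might look like the following sketch, which reuses the VaultDefinition construct from above; the cache-then-fetch strategy and all names are assumptions made for illustration:

```python
def assemble_vault_definition(account: str, local_cache: dict,
                              fetch_from_grid) -> "VaultDefinition":
    """Locate the vault definition for an account at login time: it may be
    resident locally, or retrievable from the slice servers themselves."""
    if account not in local_cache:
        # e.g. reassemble the definition from data stored on the DDSN
        local_cache[account] = fetch_from_grid(account)
    return local_cache[account]
```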



FIG. 3 illustrates the process by which a slice server authenticates a request from a client. After a client has logged into a vault, a client computer will originate one or more requests in step 302. Those requests will be directed to the appropriate slice server, and the slice server will validate that it can accept requests from the vault identified in the request in step 303. If the slice server cannot accept requests from the identified vault, an error is generated in step 304. The slice server also validates that the account identified in the request is allowed to make the specified request in step 305; if the account is not so allowed, an error is generated in step 306. If the slice server accepts requests from the identified vault and the identified account is allowed to make the specified request, the slice server will execute the request in step 307, and send a response back to the requesting client in step 308.
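
The validation sequence of FIG. 3 can be sketched as follows; the request fields and the ACL layout are assumptions made for illustration:

```python
from dataclasses import dataclass

@dataclass
class Request:
    vault_id: str
    account: str
    operation: str      # e.g. "read", "write", or "modify"

def handle_request(supported_vaults: dict, request: Request) -> str:
    """supported_vaults maps vault id -> ACL, where an ACL maps
    account -> set of permitted operations."""
    # Step 303: can this slice server accept requests for the named vault?
    if request.vault_id not in supported_vaults:
        return "error: vault not supported by this slice server"   # step 304
    acl = supported_vaults[request.vault_id]
    # Step 305: is the identified account allowed to make this request?
    if request.operation not in acl.get(request.account, set()):
        return "error: request not permitted for this account"     # step 306
    # Step 307: execute the request; step 308: return the response.
    return f"ok: {request.operation} executed"
```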



FIG. 4 illustrates the relationship between user accounts and vaults. Three vaults 402, 404, 406 are depicted, as well as nine users 410-418. Users 410, 411, and 412 have access to vault 402. User 412 also has access to vault 404, and as indicated, there is a many-to-many relationship between vaults and user accounts. Data structure 440 illustrates one way that vault information could be maintained. In particular, the illustrated structure shows the information dispersal algorithm used on the DDSN associated with the vault, i.e., Cauchy Reed-Solomon. In addition, the information dispersal parameters are identified, i.e., data segments are divided into 112 data slices, of which any 18 may be lost without compromising the integrity of the stored data. Further, the vault data structure shows that no data compression is used, and that CRC-32 is used as an integrity check for both stored data segments and stored data slices. As illustrated, the data structure 440 does not indicate whether stored data is encrypted, although alternative data structures could. Finally, data structure 440 lists three accounts that are allowed to access this particular vault. In addition to listing the associated accounts, the permissions granted to those accounts could also be listed here. As permissions are well-known in the art, they are not discussed further here.
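
Using the hypothetical VaultDefinition sketch from above, data structure 440 might be populated as follows; the account names, the capacity, and the derived threshold m = 112 − 18 = 94 are illustrative readings of the figure, not values recited in the patent:

```python
vault_440 = VaultDefinition(
    slice_servers=(),                  # not shown in the figure
    ida="Cauchy Reed-Solomon",
    n=112,                             # slices per data segment
    m=94,                              # any 18 of 112 slices may be lost
    capacity_bytes=2**40,              # hypothetical 1 TiB allotment
    compression=None,                  # no data compression
    segment_integrity_check="CRC-32",
    slice_integrity_check="CRC-32",
    acl={"account-1": {"read", "write", "modify"},
         "account-2": {"read"},
         "account-3": {"read", "write"}},
)
```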



FIG. 4 also shows data structure 430, which illustrates one way that a user account could be represented, namely by a username and a password. However, this particular representation of a user account is not a limitation of the invention; other methods well-known in the prior art would work just as well, for instance, biometric information.


The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the precise form disclosed. The description was selected to best explain the principles of the invention and practical application of these principles to enable others skilled in the art to best utilize the invention in various embodiments and various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention not be limited by the specification, but be defined by the claims set forth below.

Claims
  • 1. A slice server comprises: a network port operable for coupling to a network; a central processing unit operably coupled to the network port; and memory operably coupled to the central processing unit, wherein the central processing unit functions to: receive, via the network port, a request to access a virtual digital data storage vault; determine whether the virtual digital data storage vault is supported by the slice server; when the virtual digital data storage vault is supported by the slice server, determine whether the request is valid; and when the request is valid, execute the request to generate a response.
  • 2. The slice server of claim 1, wherein the central processing unit further functions to determine whether the request is valid by: accessing a first vault data structure when the request corresponds to accessing a first virtual digital data storage vault, wherein the first vault data structure includes first user account information; and accessing a second vault data structure when the request corresponds to accessing a second virtual digital data storage vault, wherein the second vault data structure includes second user account information.
  • 3. The slice server of claim 1, wherein the central processing unit further functions to determine whether the request is valid by: validating a user account identified in the request; and when the user account is valid, validating the request based on permissions associated with the user account.
  • 4. The slice server of claim 1, wherein the central processing unit further functions to determine whether the virtual digital data storage vault is supported by the slice server by: accessing a list of virtual digital data storage vaults; and determining whether the virtual digital data storage vault is on the list.
  • 5. The slice server of claim 1, wherein the central processing unit further functions to: generate an error when the virtual digital data storage vault is not supported by the slice server or when the request is invalid.
  • 6. The slice server of claim 1 further functions to: store a slice of a first data segment in a first section of a corresponding portion of a first supported virtual digital data storage vault; store a slice of a second data segment in a second section of the corresponding portion of the first supported virtual digital data storage vault; store a slice of a third data segment in a first section of a corresponding portion of a second supported virtual digital data storage vault; and store a slice of a fourth data segment in a second section of the corresponding portion of the second supported virtual digital data storage vault.
  • 7. The slice server of claim 1, wherein the central processing unit further functions to execute the request to generate a response by at least one of: executing a read request and outputting a slice corresponding to the read request as the response; executing a write request and outputting a corresponding write response; and executing a modify request and outputting a corresponding modify response.
Parent Case Info

This patent application is claiming priority under 35 USC §120 as a continuing patent application of co-pending patent application entitled VIRTUALIZED DATA STORAGE VAULTS ON A DISPERSED DATA STORAGE NETWORK, having a filing date of Dec. 8, 2009, and a Ser. No. 12/633,779, which claims priority under 35 USC §120 as a continuing patent application of now issued patent entitled VIRTUALIZED DATA STORAGE VAULTS ON A DISPERSED DATA STORAGE NETWORK, having a filing date of Oct. 9, 2007, and a Ser. No. 11/973,621 (U.S. Pat. No. 7,904,475), which are incorporated herein by reference in their entirety and made part of the present U.S. Utility Patent Application for all purposes.

US Referenced Citations (83)
Number Name Date Kind
4092732 Ouchi May 1978 A
5454101 Mackay et al. Sep 1995 A
5485474 Rabin Jan 1996 A
5774643 Lubbers et al. Jun 1998 A
5802364 Senator et al. Sep 1998 A
5809285 Hilland Sep 1998 A
5890156 Rekieta et al. Mar 1999 A
5987622 Lo Verso et al. Nov 1999 A
5991414 Garay et al. Nov 1999 A
6012159 Fischer et al. Jan 2000 A
6058454 Gerlach et al. May 2000 A
6128277 Bruck et al. Oct 2000 A
6175571 Haddock et al. Jan 2001 B1
6192472 Garay et al. Feb 2001 B1
6256688 Suetaka et al. Jul 2001 B1
6272658 Steele et al. Aug 2001 B1
6301604 Nojima Oct 2001 B1
6356949 Katsandres et al. Mar 2002 B1
6366995 Vilkov et al. Apr 2002 B1
6374336 Peters et al. Apr 2002 B1
6415373 Peters et al. Jul 2002 B1
6418539 Walker Jul 2002 B1
6449688 Peters et al. Sep 2002 B1
6567948 Steele et al. May 2003 B2
6571282 Bowman-Amuah May 2003 B1
6609223 Wolfgang Aug 2003 B1
6718361 Basani et al. Apr 2004 B1
6760808 Peters et al. Jul 2004 B2
6785768 Peters et al. Aug 2004 B2
6785783 Buckland Aug 2004 B2
6826711 Moulton et al. Nov 2004 B2
6879596 Dooply Apr 2005 B1
7003688 Pittelkow et al. Feb 2006 B1
7024451 Jorgenson Apr 2006 B2
7024609 Wolfgang et al. Apr 2006 B2
7080101 Watson et al. Jul 2006 B1
7103824 Halford Sep 2006 B2
7103915 Redlich et al. Sep 2006 B2
7111115 Peters et al. Sep 2006 B2
7140044 Redlich et al. Nov 2006 B2
7146644 Redlich et al. Dec 2006 B2
7171493 Shu et al. Jan 2007 B2
7222133 Raipurkar et al. May 2007 B1
7240236 Cutts et al. Jul 2007 B2
7272613 Sim et al. Sep 2007 B2
7953771 Gladwin et al. May 2011 B2
20020062422 Butterworth et al. May 2002 A1
20020166079 Ulrich et al. Nov 2002 A1
20030018927 Gadir et al. Jan 2003 A1
20030037261 Meffert et al. Feb 2003 A1
20030065617 Watkins et al. Apr 2003 A1
20030084020 Shu May 2003 A1
20040024963 Talagala et al. Feb 2004 A1
20040122917 Menon et al. Jun 2004 A1
20040215998 Buxton et al. Oct 2004 A1
20040228493 Ma et al. Nov 2004 A1
20050100022 Ramprashad May 2005 A1
20050114594 Corbett et al. May 2005 A1
20050125593 Karpoff et al. Jun 2005 A1
20050131993 Fatula, Jr. Jun 2005 A1
20050132070 Redlich et al. Jun 2005 A1
20050144382 Schmisseur Jun 2005 A1
20050229069 Hassner Oct 2005 A1
20060047907 Shiga et al. Mar 2006 A1
20060136448 Cialini et al. Jun 2006 A1
20060156059 Kitamura Jul 2006 A1
20060224603 Correll, Jr. Oct 2006 A1
20070079081 Gladwin et al. Apr 2007 A1
20070079082 Gladwin et al. Apr 2007 A1
20070079083 Gladwin et al. Apr 2007 A1
20070088970 Buxton et al. Apr 2007 A1
20070174192 Gladwin et al. Jul 2007 A1
20070214285 Au et al. Sep 2007 A1
20070234110 Soran et al. Oct 2007 A1
20070283167 Venters, III et al. Dec 2007 A1
20090094251 Gladwin et al. Apr 2009 A1
20090094318 Gladwin et al. Apr 2009 A1
20100023524 Gladwin et al. Jan 2010 A1
20100161916 Thornton et al. Jun 2010 A1
20100169415 Leggette et al. Jul 2010 A1
20100250497 Redlich et al. Sep 2010 A1
20100250751 Leggette et al. Sep 2010 A1
20100306578 Thornton et al. Dec 2010 A1
Related Publications (1)
Number Date Country
20110202568 A1 Aug 2011 US
Continuations (2)
Number Date Country
Parent 12633779 Dec 2009 US
Child 13094375 US
Parent 11973621 Oct 2007 US
Child 12633779 US