1. Field of the Invention
The present invention generally relates to protecting confidentiality of a file distributed and stored at a plurality of storage service providers. In particular, it concerns cloud storage. Cloud storage is a model of networked online storage where data is stored in virtualized pools of storage which are generally hosted by third parties. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them. The data center operators, in the background, virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span across multiple servers.
If no precaution is taken, all the stored data can be accessed by the cloud operator, who can potentially use it with malicious intent (e.g. reselling the information to the client's competitors). Furthermore, even if the cloud operator is honest, the confidentiality of stored data can be compromised by attackers, who have a greater incentive to attack data centers aggregating the data of many companies and users than to attack a single enterprise network. Therefore there is a need to protect the confidentiality of data at a storage service provider.
2. Description of the Prior Art
One known solution consists in encrypting the data before outsourcing its storage. The drawback of this solution is that it is resource consuming (encryption for storage and decryption for retrieval). Additionally, it requires a key management process to keep track of the keys used to encrypt each data packet. It further requires the keys to be stored securely, because a leaked key gives full access to the data.
Another known solution consists in segmenting data into several chunks and storing the chunks respectively at different storage service providers, so that none of them has access to the full data. This solution has the drawback that each storage service provider has access to the chunk that it stores, so it can still derive some confidential information from it. A countermeasure is to further encrypt each chunk, with the drawbacks of the previous solution.
Another known solution is described in the article of PAULO F. OLIVEIRA ET AL.: “Trusted Storage over Untrusted Networks”, GLOBECOM 2010, 2010 IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, IEEE, PISCATAWAY, N.J., USA, 6 Dec. 2010. It comprises the steps of:
The randomly chosen n coefficients aj, with j=1, . . . , n, are different and independent from each other; but most of the n² coefficients aij are not independent from each other, because they are obtained by generating a Vandermonde matrix, $a_{ij} = a_j^{\,i-1}$.
The aim of the present invention is to provide a more secure technical solution for protecting the confidentiality of data distributed and stored at a plurality of storage service providers.
This aim is achieved by the methods according to the invention.
A first object of the invention is a method for protecting confidentiality of a file distributed and stored at a plurality of storage service providers comprising the steps of:
characterized in that it further comprises the steps of:
Computing linear combinations Ci of all the chunks of the file to be distributed and stored, and then storing these combinations respectively at a plurality of storage service providers, protects the confidentiality of the file without requiring encryption or key management, because all the combinations are needed to retrieve any chunk of the original file. The providers therefore cannot extract any information from these combinations unless all n providers collude. Using n² randomly chosen coefficients and an over-combination further improves security.
According to a first particular embodiment of the method for protecting confidentiality of a file distributed and stored at a plurality of storage service providers, storing the set of coefficients ai1, . . . , ain, so that it can be re-associated with the combination Ci, for i=1, . . . , n, comprises the step of storing them at the storage service provider Oi, in association with the combination Ci and the file identifier ID′i, for i=1, . . . , n.
According to a second particular embodiment of the method for protecting confidentiality of a file distributed and stored at a plurality of storage service providers, storing the set of coefficients ai1, . . . , ain, so that it can be re-associated with the combination Ci, for i=1, . . . , n, comprises the steps of:
According to a third particular embodiment of the method for protecting confidentiality of a file distributed and stored at a plurality of storage service providers, storing the set of coefficients ai1, . . . , ain so that it can be re-associated with the combination Ci, for i=1, . . . , n, comprises the step of storing the set of coefficients ai1, . . . , ain, in association with a file identifier ID′i and a provider identifier Oi, for i=1, . . . , n, in said file descriptor corresponding to the file, which is stored in said local memory.
Another object of the invention is a method for retrieving a file that has been protected by the method according to the invention, that comprises the steps of:
characterized in that it further comprises the steps of:
According to a first particular embodiment of the method for retrieving a file, retrieving n sets of coefficients ai1, . . . , ain for i=1, . . . , n, that respectively correspond to the n combinations C1, . . . , Cn, comprises the step of receiving them from the providers designated by the provider identifiers O1, . . . , On.
And associating each combination Ci with a corresponding set of coefficients ai1, . . . , ain, for i=1, . . . , n, comprises the step of directly associating the combination Ci with the set of coefficients ai1, . . . , ain.
According to a second particular embodiment of the method for retrieving a file, retrieving n sets of coefficients ai1, . . . , ain for i=1, . . . , n, that respectively correspond to the n combinations C1, . . . , Cn, comprises the step of receiving them from the providers designated by the provider identifiers O1, . . . , On;
And associating each combination Ci with a corresponding set of coefficients ai1, . . . , ain, for i=1, . . . , n comprises the steps of:
According to a third particular embodiment of the method for retrieving a file, retrieving n sets of coefficients ai1, . . . , ain, for i=1, . . . , n, that respectively correspond to the n combinations C1, . . . , Cn, comprises the step of reading them in the file descriptor corresponding to said file,
and wherein associating each combination Ci with a corresponding set of coefficients ai1, . . . , ain, for i=1, . . . , n comprises the step of directly associating the set of coefficients ai1, . . . , ain to the combination Ci, for i=1, . . . , n.
Another object of the invention is a method for protecting confidentiality of a file distributed and stored at a plurality of storage service providers, particularly well suited to situations where the storage is performed once but retrieval is performed many times or by many entities. It comprises the steps of:
Another object of the invention is a method for retrieving a file that has been protected by this last method. It comprises the steps of:
Other features and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention, when taken in conjunction with the accompanying drawings.
In order to illustrate in detail features and advantages of embodiments of the present invention, the following description will be with reference to the accompanying drawings. If possible, like or similar reference numerals designate the same or similar components throughout the figures thereof and description, in which:
The proposed method requires the following means in the device:
1/ Means for associating a unique identifier ID to each file F to be distributed and stored at storage service providers. This process could potentially be complex but it can use the path of the file, or tags, or any other known method compatible with file system architectures.
2/ Input means enabling a user to choose a security parameter n. This parameter can be different for each file: a larger value of n involves more computation but also provides more security. So the value of n can be chosen according to the desired level of confidentiality, such as restricted, confidential, secret, . . . .
3/ Computing means executing a program comprising computer-executable instructions for performing the proposed method.
Step 1: The user chooses a security parameter n, lower than or equal to the number m of available storage service providers.
Step 2: The computing means segment the file F into n chunks S1, . . . , Sn. The chunks must all comprise the same number of bits. The number of bits per chunk is the smallest multiple of the size p (in bits) of the elements of the field in which the operations take place (the same field as the one from which the coefficients are chosen) that is greater than or equal to size(F)/n. This process implies padding the file in order to obtain n chunks of the same size. Any standard padding algorithm can be used to perform this task.
Step 3: The computing means randomly choose n² coefficients aij for i=1, . . . , n and j=1, . . . , n. Then the computing means verify that the chosen vectors (ai1, . . . , ain), for i=1, . . . , n, are linearly independent; otherwise they generate the coefficients again. (A set of vectors is linearly independent if and only if the only linear combination of its elements equal to the zero vector is the trivial one, in which all the coefficients are zero.)
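A minimal Python sketch of Step 2, assuming one-byte field symbols (so any whole number of bytes is a multiple of p) and ISO/IEC 7816-4 style padding (a 0x80 marker followed by zero bytes), which is only one of the standard padding choices the text allows. Because the marker always occupies one byte, the sketch sizes the chunks from size(F)+1 rather than size(F). The names `segment` and `reassemble` are hypothetical.

```python
import math


def segment(data: bytes, n: int) -> list[bytes]:
    """Pad `data` (0x80 marker, then zero bytes) and split it into n equal chunks."""
    # Reserve one byte for the 0x80 marker, then round up to n equal chunks.
    chunk_len = math.ceil((len(data) + 1) / n)
    padded = data + b"\x80" + b"\x00" * (n * chunk_len - len(data) - 1)
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n)]


def reassemble(chunks: list[bytes]) -> bytes:
    """Concatenate the chunks S1..Sn and remove the padding added by segment()."""
    stripped = b"".join(chunks).rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("padding marker not found")
    return stripped[:-1]


chunks = segment(b"hello world", 3)
print([len(c) for c in chunks])  # [4, 4, 4]
print(reassemble(chunks))        # b'hello world'
```

Since the marker is always present, padding removal stays unambiguous even when the file itself ends with zero bytes.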
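Step 3 can be sketched as follows, assuming the prime field GF(257), chosen here only because every byte value is a field element (a real implementation might prefer GF(2^8) or a larger field). Linear independence of the n vectors is checked by computing the rank of the matrix A=(aij) with Gaussian elimination over the field. `random_coefficients` and `rank_mod_p` are illustrative names, and Python's `secrets` module is assumed as the randomness source.

```python
import secrets

P = 257  # assumed prime field GF(257): every byte value 0..255 is a field element


def rank_mod_p(rows: list[list[int]], p: int = P) -> int:
    """Rank of a matrix over GF(p), computed by Gaussian elimination on a copy."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def random_coefficients(n: int, p: int = P) -> list[list[int]]:
    """Draw the n*n coefficients a_ij, redrawing until the n vectors are independent."""
    while True:
        a = [[secrets.randbelow(p) for _ in range(n)] for _ in range(n)]
        if rank_mod_p(a, p) == n:
            return a
```

Over a field of this size a random n x n matrix is singular only rarely, so the redraw loop almost always finishes on its first iteration.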
Step 4: The computing means compute n linear combinations as:
$$C_i = a_{i1} S_1 + \dots + a_{ij} S_j + \dots + a_{in} S_n = \sum_{j=1}^{n} a_{ij} S_j, \qquad i = 1, \dots, n$$
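Step 4 applied symbol by symbol, as a sketch under the assumption of a prime field GF(257) with one-byte symbols; `combine` is a hypothetical helper. Each chunk is treated as a vector of field elements, and the same coefficients ai1, . . . , ain are applied at every symbol position.

```python
P = 257  # assumed prime field GF(257)


def combine(a: list[list[int]], chunks: list[bytes], p: int = P) -> list[list[int]]:
    """Return C_1..C_n with C_i = a_i1*S_1 + ... + a_in*S_n at every symbol position.

    Results are field elements in 0..p-1 (up to 256 here), so they are returned
    as lists of ints rather than bytes.
    """
    n, length = len(chunks), len(chunks[0])
    return [
        [sum(row[j] * chunks[j][k] for j in range(n)) % p for k in range(length)]
        for row in a
    ]


print(combine([[1, 2], [3, 5]], [bytes([10, 20]), bytes([30, 40])]))
# [[70, 100], [180, 3]]
```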
Step 5: The computing means choose the n storage service providers where the combinations will be stored (n ≤ m). They choose them among the m available storage service providers, as a function of the identifier ID of the file to be stored and protected, and supply the respective identifiers O1, . . . , On designating the chosen storage service providers. There are many ways to perform the selection: a random selection is the easiest, but the selection criterion could also take performance (load balancing), cost (pricing at each provider) or security (reputation of providers) into account, for a better selection policy. Depending on the selection policy, one particular provider could be the local host itself, and a given provider could be chosen more than once (although this would decrease security).
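One possible selection policy for Step 5, given only as an illustration: a shuffle seeded by a SHA-256 hash of the identifier ID, so that the same file always maps to the same n providers without storing extra state. The policy, the function name `choose_providers` and the provider labels are assumptions; unlike the text, which allows repeats, this sketch always picks n distinct providers.

```python
import hashlib
import random


def choose_providers(file_id: str, providers: list[str], n: int) -> list[str]:
    """Pick n distinct providers out of the m available ones, as a function of ID.

    Hypothetical policy: a deterministic shuffle seeded from SHA-256(ID); load,
    cost or reputation criteria could replace it.
    """
    if n > len(providers):
        raise ValueError("n must not exceed the number m of available providers")
    seed = int.from_bytes(hashlib.sha256(file_id.encode()).digest(), "big")
    rng = random.Random(seed)  # not for secrets: only makes the choice repeatable
    return rng.sample(providers, n)


print(choose_providers("/home/alice/report.pdf", ["O_a", "O_b", "O_c", "O_d", "O_e"], 3))
```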
The next steps can be made according to three variants called step 6a, step 6b, step 6c that are respectively illustrated by
The computing means send, to a provider Oi, for i=1, . . . , n:
The provider Oi stores the combination Ci and the set of coefficients ai1, . . . , ain in association with the file identifier ID′.
The computing means locally store, in a local memory LM, a very simple file descriptor for the file F. This file descriptor comprises the file identifier ID′ designating the file F and the provider identifiers O1, . . . , On designating the chosen storage service providers.
According to an improvement of this variant a, it is also possible to replace the unique file identifier ID′ by different file identifiers ID′1, . . . , ID′n for the same file F. These file identifiers ID′1, . . . , ID′n are stored at the n chosen providers respectively, and in the file descriptor corresponding to the file F in the local memory LM. This improvement makes the combinations more difficult to link in case of collusion between providers.
The set of coefficients ak1, . . . , akn is determined by a permutation σ applied to the vectors (ai1, . . . , ain) for i=1, . . . , n:
$$(a_{k1}, \dots, a_{kn}) = (a_{\sigma(i)1}, \dots, a_{\sigma(i)j}, \dots, a_{\sigma(i)n}) = \text{coefficients corresponding to the combination } C_{\sigma(i)}$$
On the other hand, for i=1, . . . , n, the combination Cσ(i) is stored in association with the set of coefficients ai1, . . . , ain at the provider Oσ(i). The computing means define the permutation σ on the integers from 1 to n. This permutation can be represented as an array (σ(1), . . . , σ(i), . . . , σ(n)) containing each integer from 1 to n exactly once, in shuffled order.
The computing means store, at provider Oi, for i=1, . . . , n, in association:
The computing means locally store, in a local memory LM, a very simple file descriptor for the file F. This file descriptor comprises:
According to an improvement of this variant b, it is also possible to replace the unique file identifier ID′ by different file identifiers ID′1, . . . , ID′n for the same file F. These file identifiers ID′1, . . . , ID′n are stored at the n chosen providers respectively, and in the file descriptor corresponding to the file F in the local memory LM. This improvement makes the combinations more difficult to link in case of collusion between providers.
The computing means generate different file identifiers ID′1, . . . , ID′n for the same file F, as a function of the local identifier ID, by means of a one-way function, for example a hash function. The computing means send to each provider Oi the file identifier ID′i and the combination Ci, for i=1, . . . , n, but they do not send any set of coefficients.
According to this variant c, the computing means locally store a larger file descriptor in the local memory LM. This file descriptor comprises n triplets, the triplet of rank i comprising, for i=1, . . . , n:
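A sketch of the identifier derivation for variant c, assuming SHA-256 as the one-way function and the hypothetical input encoding "ID|i"; the text only requires some one-way function of the local identifier ID. `derive_ids` is an illustrative name.

```python
import hashlib


def derive_ids(local_id: str, n: int) -> list[str]:
    """Derive n distinct identifiers ID'_1..ID'_n from the local identifier ID.

    Assumption: ID'_i = SHA-256("ID|i"), hex-encoded; any one-way function would do.
    """
    return [hashlib.sha256(f"{local_id}|{i}".encode()).hexdigest() for i in range(1, n + 1)]


for i, ident in enumerate(derive_ids("/home/alice/report.pdf", 3), start=1):
    print(f"ID'_{i} = {ident[:16]}...")
```

If ID is easy to guess (a file path, for instance), a keyed one-way function such as HMAC-SHA-256 with a secret kept in the local memory LM would prevent colluding providers from recomputing the ID′i and linking them.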
To reconstruct the file F, the computing means of the device request a combination and the set of coefficients corresponding to this combination, from each of the n chosen storage service providers (in any order).
Step 51: The computing means read the file identifier ID′ and the provider identifiers O1, . . . , On registered in the file descriptor corresponding to the file F, in the local memory LM. They send the identifier ID′ to the storage service providers Oi for i=1, . . . , n.
Step 52: Then the computing means receive, from the provider Oi, for i=1, . . . , n:
The combination Ci is directly associated to the set of coefficients ai1, . . . , ain, for the process of retrieving the file F.
Step 53: Then the computing means compute the inverse of the matrix A=(aij), and obtain a matrix B=(bij). Then they compute the original chunks as:
$$S_i = b_{i1} C_1 + \dots + b_{ij} C_j + \dots + b_{in} C_n = \sum_{j=1}^{n} b_{ij} C_j, \qquad i = 1, \dots, n$$
Step 54: Then the computing means re-assemble the chunks S1, . . . , Sn and possibly remove the padding to reconstruct the file F.
Note that if different identifiers ID′1, . . . , ID′n have been stored at the respective providers for the same file F, the retrieval method is modified: step 51 then consists in reading the different identifiers ID′1, . . . , ID′n in the file descriptor stored in the local memory LM, and sending the identifiers ID′1, . . . , ID′n to the n chosen providers respectively, instead of sending the unique identifier ID′.
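Step 53 as a Python sketch, again assuming the prime field GF(257) with the combinations held as lists of field elements: B = A⁻¹ is obtained by Gauss-Jordan elimination over the field, then each chunk is recomputed symbol by symbol. `invert_mod_p` and `reconstruct` are hypothetical names; `pow(x, -1, p)` (Python 3.8+) returns the modular inverse.

```python
P = 257  # assumed prime field GF(257)


def invert_mod_p(a: list[list[int]], p: int = P) -> list[list[int]]:
    """Return B = A^-1 over GF(p), by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    m = [[x % p for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            raise ValueError("coefficient matrix A is not invertible")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [(x * inv) % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def reconstruct(a: list[list[int]], combos: list[list[int]], p: int = P) -> list[bytes]:
    """Compute S_i = b_i1*C_1 + ... + b_in*C_n for i = 1..n, with B = A^-1."""
    b = invert_mod_p(a, p)
    n, length = len(combos), len(combos[0])
    return [
        bytes(sum(row[j] * combos[j][k] for j in range(n)) % p for k in range(length))
        for row in b
    ]


chunks = reconstruct([[1, 2], [3, 5]], [[70, 100], [180, 3]])
print([list(s) for s in chunks])  # [[10, 20], [30, 40]]
```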
To reconstruct the file F, the computing means of the device request a combination and a set of coefficients from each of the n chosen storage service providers (in any order). But they must re-associate each combination with the corresponding set of coefficients that was used for computing this combination, since this combination and this set of coefficients were not stored at the same storage service provider. The permutation σ must be used again because the combination Cσ(i) (corresponding to the set of coefficients stored by Oi) is stored at the provider Oσ(i).
Step 61:
The computing means read the file identifier ID′, the provider identifiers O1, . . . , On, and the permutation σ, registered in the file descriptor corresponding to the file F, in the local memory LM.
They send the identifier ID′ to the n storage service providers designated by the provider identifiers O1, . . . , On.
They receive, from the provider Oi for i=1, . . . , n:
Step 62: Then they re-associate the combinations Ci and the corresponding set of coefficients ai1, . . . , ain, for i=1, . . . , n, by means of the permutation σ.
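A sketch of the variant b placement and of the re-association of Step 62. It follows the reading in which provider Oi holds Ci together with the coefficients of Cσ(i), which in turn is held by Oσ(i). It also assumes, beyond the text, that σ is drawn without fixed points, so that no provider holds a combination together with its own coefficients. Indices are 0-based, and the function names are hypothetical.

```python
import secrets


def permute_storage(combos: list, coeffs: list) -> tuple[list[int], list[tuple]]:
    """Variant b placement (0-based): O_i stores C_i with the coefficients of C_sigma(i)."""
    n = len(combos)
    if n < 2:
        raise ValueError("a permutation without fixed points needs n >= 2")
    rng = secrets.SystemRandom()
    sigma = list(range(n))
    # Assumption of this sketch: redraw sigma until it has no fixed point, so that
    # no provider stores a combination together with its own coefficients.
    while any(sigma[i] == i for i in range(n)):
        rng.shuffle(sigma)
    stored = [(combos[i], coeffs[sigma[i]]) for i in range(n)]
    return sigma, stored


def reassociate(sigma: list[int], stored: list[tuple]) -> list[tuple]:
    """Step 62: the coefficients returned by O_i belong to C_sigma(i), returned by O_sigma(i)."""
    pairs = [None] * len(stored)
    for i, k in enumerate(sigma):
        pairs[k] = (stored[k][0], stored[i][1])
    return pairs
```

Only the local file descriptor holds σ, so the providers' answers cannot be paired up without it.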
Step 63: Then they compute the inverse of the matrix A=(aij), and obtain a matrix B=(bij), and they compute the original chunks as:
$$S_i = b_{i1} C_1 + \dots + b_{ij} C_j + \dots + b_{in} C_n = \sum_{j=1}^{n} b_{ij} C_j, \qquad i = 1, \dots, n$$
Step 64: Then they reconstruct the file F by assembling the n chunks S1, . . . , Sn and possibly removing the padding.
To reconstruct the file F, the computing means of the device request a combination and a set of coefficients corresponding to this combination, from each of the n chosen storage service providers (in any order).
Step 71: The computing means read the file identifiers ID′1, . . . , ID′n, and the provider identifiers O1, . . . , On registered in the file descriptor corresponding to the file F, in the local memory LM.
They send the file identifier ID′i to the provider Oi, for i=1, . . . , n (the n identifiers ID′1, . . . , ID′n are all different from one another).
In response, the computing means receive the combination Ci from the provider Oi, for i=1, . . . , n.
Step 72: They read the coefficients ai1, . . . , ain in the local memory LM. The combination Ci is directly associated with the set of coefficients ai1, . . . , ain, for i=1, . . . , n.
Step 73: Then they compute the inverse of the matrix A=(aij), and obtain a matrix B=(bij). Then they compute the n original chunks as:
$$S_i = b_{i1} C_1 + \dots + b_{ij} C_j + \dots + b_{in} C_n = \sum_{j=1}^{n} b_{ij} C_j, \qquad i = 1, \dots, n$$
Step 74: Then they reconstruct the file F by assembling the n chunks S1, . . . , Sn and possibly removing the padding.
Security:
In every variant, the basic security of the scheme lies in the fact that no chunk Si can be recovered without all the combinations C1, . . . , Cn. Hence the only way for the operators to break the confidentiality of the file F is for all of them to collude.
The variants a, b and c provide different levels of security in the event that all n operators collude:
Resilience (Optional Improvement):
The only weakness of the method is that if one operator Oi loses (or corrupts) a combination Ci, then it becomes impossible to retrieve the file F.
A first improvement, in order to circumvent this drawback, consists in generating not n but n+k combinations and storing them at n+k operators. Then retrieving any n combinations out of the n+k combinations is possible and sufficient to reconstruct the file F. This means that it is possible to recover from the failure of up to k storage service providers.
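The n+k improvement can be sketched as follows, under the same GF(257) assumption. Random coefficients do not by themselves guarantee that every subset of n rows is invertible, so this sketch (an assumption, not a requirement stated in the text) redraws the n+k rows until all C(n+k, n) subsets pass, which is only practical for small n and k. All function names are hypothetical.

```python
import itertools
import secrets

P = 257  # assumed prime field GF(257)


def invert_mod_p(a, p=P):
    """Return A^-1 over GF(p) (Gauss-Jordan on [A | I]); ValueError if singular."""
    n = len(a)
    m = [[x % p for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [(x * inv) % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def resilient_coefficients(n, k, p=P):
    """Draw n+k coefficient rows; redraw until every subset of n rows is invertible."""
    while True:
        rows = [[secrets.randbelow(p) for _ in range(n)] for _ in range(n + k)]
        try:
            for subset in itertools.combinations(rows, n):
                invert_mod_p(list(subset), p)
            return rows
        except ValueError:
            continue


def combine(rows, chunks, p=P):
    """One combination per coefficient row (n+k of them), symbol by symbol."""
    return [[sum(r[j] * chunks[j][s] for j in range(len(chunks))) % p
             for s in range(len(chunks[0]))] for r in rows]


def recover(rows, combos, survivors, p=P):
    """Rebuild the n chunks from the combinations of any n surviving providers."""
    b = invert_mod_p([rows[i] for i in survivors], p)
    kept = [combos[i] for i in survivors]
    return [bytes(sum(bi[j] * kept[j][s] for j in range(len(kept))) % p
                  for s in range(len(kept[0]))) for bi in b]


chunks = [b"abc", b"def", b"ghi"]          # n = 3 chunks
rows = resilient_coefficients(3, 2)         # k = 2 spare combinations
combos = combine(rows, chunks)              # stored at n + k = 5 providers
print(recover(rows, combos, [0, 2, 4]))     # providers 1 and 3 (0-based) lost
# [b'abc', b'def', b'ghi']
```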
When storing at n+k operators, the security remains the same: At least n operators have to collude to break the scheme. This solution applies well to the variants a and c.
A different improvement for variant c, intended to circumvent this drawback, consists in generating at least one additional combination (called an over-combination) on the fly; this can be performed without reconstituting the original file F. An over-combination OC′ is a linear combination of the combinations Ci for i=1, . . . , n. This improvement comprises the steps of:
The n super-coefficients a′j and the over-combination OC′ can be generated and stored by any entity comprising computing means and a memory. We call it the “orchestrator” because it manages the retrieval of the combinations and over-combinations to ensure resiliency. The orchestrator can be located anywhere in the network:
Functionally, the orchestrator is an intermediary in charge of managing the availability of the data without necessarily being trusted by the source. This solution therefore suits variant c very well: the orchestrator is given the n combinations C1, . . . , Cj, . . . , Cn without the coefficients ai1, . . . , ain for i=1, . . . , n.
Retrieving the original file F is performed by a method similar to the method described above, because a linear over-combination OC′=a′1·C1+ . . . +a′j·Cj+ . . . +a′n·Cn is also a linear combination of the segments Si for i=1, . . . , n. Indeed Ci=ai1·S1+ . . . +aij·Sj+ . . . +ain·Sn, for i=1, . . . , n. The coefficients of OC′ in terms of the segments Si are thus obtained by multiplying the row vector (a′1, . . . , a′n) of the coefficients of OC′ in terms of the Ci by the matrix A=(aij) of the coefficients of the Ci in terms of the Si.
We can express the over-combination OC′ in terms of the segments Si as follows:
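$$OC' = \left(a'_1 a_{11} + \dots + a'_i a_{i1} + \dots + a'_n a_{n1}\right) S_1 + \dots + \left(a'_1 a_{1j} + \dots + a'_i a_{ij} + \dots + a'_n a_{nj}\right) S_j + \dots + \left(a'_1 a_{1n} + \dots + a'_i a_{in} + \dots + a'_n a_{nn}\right) S_n = \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} a'_i\, a_{ij} \Big) S_j$$
Retrieval can therefore still be done by inverting a matrix.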
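A small numerical check of the expression above, assuming GF(257): the orchestrator computes OC′ from the stored combinations alone, and the same vector is obtained by applying the coefficients a′·A directly to the chunks. `over_combination` and `effective_coefficients` are illustrative names, and the numbers are arbitrary.

```python
P = 257  # assumed prime field GF(257)


def over_combination(super_coeffs, combos, p=P):
    """OC' = a'_1*C_1 + ... + a'_n*C_n, computed from the stored combinations only."""
    return [sum(a * c[s] for a, c in zip(super_coeffs, combos)) % p
            for s in range(len(combos[0]))]


def effective_coefficients(super_coeffs, a, p=P):
    """Coefficient of S_j in OC': a'_1*a_1j + ... + a'_n*a_nj (row vector a' times A)."""
    n = len(a)
    return [sum(super_coeffs[i] * a[i][j] for i in range(n)) % p for j in range(n)]


a = [[1, 2], [3, 5]]                       # coefficients a_ij of C_1, C_2
chunks = [[10, 20], [30, 40]]              # S_1, S_2 as field elements
combos = [[70, 100], [180, 3]]             # C_1, C_2 computed from a and the chunks
sup = [7, 11]                              # super-coefficients a'_1, a'_2
eff = effective_coefficients(sup, a)
direct = [sum(eff[j] * chunks[j][s] for j in range(2)) % P for s in range(2)]
print(over_combination(sup, combos), direct)  # [157, 219] [157, 219]
```

The orchestrator only needs the first function; the second is computed by whoever holds the coefficients aij, which in variant c is the user's device.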
For instance, the orchestrator that manages resiliency of the data in the cloud comprises a memory for storing the combinations C1, . . . , Cj, . . . , Cn, and computing means for:
(Optionally, it can generate a plurality of sets of super-coefficients for computing a plurality of over-combinations: one over-combination makes it possible to tolerate the loss of only one storage service provider, while k over-combinations make it possible to tolerate the loss of k storage service providers.) The over-combination OC′ is then stored at a different storage provider. The orchestrator monitors the availability of the combinations stored at the storage providers. Whenever the orchestrator detects a faulty storage operator, it generates an additional over-combination and stores it at another storage provider.
To retrieve the original file F, the user device requests the file F (designated by an identifier ID′) from the orchestrator.
The orchestrator requests the n combinations respectively from the n storage operators.
The computing means of the user device retrieve all the segments S1, . . . , Sn by:
Then computing n chunks S1, . . . , Sn of said file F as
$$S_i = b_{i1} C_1 + \dots + b_{ij} C_j + \dots + b_{in}\, OC', \qquad i = 1, \dots, n$$
(This expression is written for the case where the lost combination is Cn, which the over-combination OC′ replaces.)
The computation presented above with one over-combination generalizes straightforwardly to a plurality of over-combinations. M over-combinations make it possible to cover the cases where up to M storage operators have failed. If k storage operators have failed (with k ≤ M), the n-k remaining combinations (with their n-k sets of coefficients) and k over-combinations (with their associated coefficients) are used for the matrix inversion. Since an over-combination can be expressed in terms of the segments S1, . . . , Sn exactly like an original combination, through the expression of OC′ given above, there is no difficulty in using over-combinations instead of lost combinations: both are processed in the same way.
The variant b, on the contrary, is strictly designed with n combinations and n² coefficients for n chunks. Resilience to the loss of a combination could be provided by another method, through redundancy or error coding, but not as an intrinsic feature of the proposed method.
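Reconstruction with one over-combination, sketched under the same GF(257) assumption: the row of A belonging to the lost combination is replaced by the coefficients a′·A of OC′, the lost combination is replaced by OC′, and the usual inversion applies. The modified matrix equals A with one row rewritten, and its determinant is the super-coefficient of the lost combination times det(A), so the sketch only works when that super-coefficient is non-zero. Function names are hypothetical.

```python
P = 257  # assumed prime field GF(257)


def invert_mod_p(a, p=P):
    """Return A^-1 over GF(p) (Gauss-Jordan on [A | I]); ValueError if singular."""
    n = len(a)
    m = [[x % p for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [(x * inv) % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def recover_with_over_combination(a, combos, super_coeffs, oc, lost, p=P):
    """Rebuild S_1..S_n when combination number `lost` (0-based) is unavailable."""
    n = len(a)
    # Coefficients of OC' in terms of the chunks: row vector a' times A.
    eff = [sum(super_coeffs[i] * a[i][j] for i in range(n)) % p for j in range(n)]
    rows = [eff if i == lost else a[i] for i in range(n)]
    data = [oc if i == lost else combos[i] for i in range(n)]
    b = invert_mod_p(rows, p)  # raises if super_coeffs[lost] == 0
    return [bytes(sum(bi[j] * data[j][s] for j in range(n)) % p
                  for s in range(len(oc))) for bi in b]


a = [[1, 2], [3, 5]]
oc = [157, 219]                   # OC' = 7*C_1 + 11*C_2
result = recover_with_over_combination(a, [[70, 100], None], [7, 11], oc, lost=1)
print([list(s) for s in result])  # [[10, 20], [30, 40]]
```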
Performance Trade-Off (Optional Improvement):
A key aspect of the proposed method is its efficiency and the fact that it reconciles security and performance. To stress this aspect further, we can offer an additional trade-off in the computation between the user who stores the data (called the source for simplicity) and the users who retrieve the data (called fetchers, as they are not necessarily the same as the one who stores the data). In the embodiments described above, the source computes the combinations and the fetchers perform the reconstruction, which involves a matrix inversion, so the computational effort is shared between the source and the fetchers.
However, we can also propose that the source computes the combinations and additionally pre-computes the inverse matrix and directly stores the coefficients bij of the inverse matrix instead of the direct coefficients aij, at the storage service providers or at the local memory LM. This will then enable fetchers to retrieve directly the inverse matrix and perform a simple matrix multiplication to reconstruct the file F. Hence this mode of operation increases the computation burden on the source but drastically decreases it for the fetchers, and is therefore well suited in situations where the storage is performed once but retrieval is performed many times or by many entities. This mode of operation is compatible with all three approaches (a), (b) and (c).
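The trade-off can be sketched as follows, under the GF(257) assumption: the source performs the O(n³) inversion once and stores B, and each fetcher only performs the matrix multiplication of the reconstruction step, without inverting anything. Function names are hypothetical.

```python
P = 257  # assumed prime field GF(257)


def invert_mod_p(a, p=P):
    """Return A^-1 over GF(p) (Gauss-Jordan on [A | I]); ValueError if singular."""
    n = len(a)
    m = [[x % p for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [(x * inv) % p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def source_prepare(a, p=P):
    """Source: pay for the inversion once and store B = A^-1 instead of A."""
    return invert_mod_p(a, p)


def fetcher_reconstruct(b, combos, p=P):
    """Fetcher: no inversion, only S_i = b_i1*C_1 + ... + b_in*C_n."""
    n = len(b)
    return [bytes(sum(bi[j] * combos[j][s] for j in range(n)) % p
                  for s in range(len(combos[0]))) for bi in b]


b = source_prepare([[1, 2], [3, 5]])
print(b)  # [[252, 2], [3, 256]]
print([list(s) for s in fetcher_reconstruct(b, [[70, 100], [180, 3]])])  # [[10, 20], [30, 40]]
```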
As concerns retrieving the file F, the reconstruction steps 53, 63, 73 are modified by suppressing the computation of the inverse of the matrix A=(aij), since the computing means directly receive a matrix B=(bij) either from the providers or from the local memory LM.
A final remark concerns the local memory LM to store file descriptors that the device has to store in all three variants (although they have different sizes and contain different information). There are multiple possibilities to store these descriptors, which meet different security levels. The descriptors could indeed be stored:
The first two options are particularly relevant for very secret documents, in combination with variant c, while the last one should be used only for less sensitive data, in combination with variant a, and with some guarantee that this last provider will not collude with the previous ones.
Number | Date | Country | Kind |
---|---|---|---|
12305544 | May 2012 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/058075 | 4/18/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/171017 | 11/21/2013 | WO | A |
Entry |
---|
Paulo F. Oliveira et al., “Trusted Storage over Untrusted Networks,” IEEE Globecom 2010 Proceedings, 2010. |
Paulo F. Oliveira et al., “Trusted Storage over Untrusted Networks,” IEEE Global Telecommunications Conference, pp. 1-5, XP031845991, Dec. 6, 2010. |
International Search Report for PCT/EP2013/058075 dated Jul. 22, 2013. |
Number | Date | Country | |
---|---|---|---|
20150161411 A1 | Jun 2015 | US |