DYNAMIC DATABASE UPDATE IN MULTI-SERVER PRIVATE INFORMATION RETRIEVAL SCHEME

Information

  • Patent Application
  • 20140344944
  • Publication Number
    20140344944
  • Date Filed
    April 30, 2014
  • Date Published
    November 20, 2014
Abstract
A system and methods to provide updates of an oblivious database that is based on an original database without compromising privacy guarantees, and without requiring a periodic downtime to re-initialize the database. According to embodiments of the present invention, update caches are provided at the random servers that are not emptied or sent to the oblivious database after every update in a predictable fashion. Instead, updates are made incrementally to the oblivious database in an order that is independent of how the original database is updated. Hence there is no way for the server to learn which record of the oblivious database corresponds to an updated block from the original database.
Description
FIELD OF THE INVENTION

The present invention relates to multi-server private information retrieval schemes, and in particular to enabling database updates to occur concurrently with user queries, without allowing the update process to compromise the privacy of user queries and the database content.


BACKGROUND OF THE INVENTION

A critical privacy protection that users crave is preventing information they consider sensitive from being inadvertently leaked as they query or access Internet services. In other words, users see the problem of preserving their access privacy to online services as an important concern that must be addressed. A cryptographically sound approach to protecting access privacy is to use the technique of private information retrieval (PIR). PIR schemes, as are known in the art, allow a user to access data from service providers without the service providers being able to learn any information about which particular data item was accessed or retrieved.


One such PIR scheme requires a database to be replicated to two or more servers that are assumed not to be colluding. A query received from a user is separated into different parts, and each part is sent to a different server. The result from each server, based on the portion of the query each received, is returned to the client, where the results are combined to provide a complete response to the full query. However, concerns remain about the practicality of having an organization replicate its database to the servers of multiple different cloud services that are assumed not to collude. Replicating a database to multiple independent cloud servers increases the chances of the data being broken into, used without consent, or used for illegitimate purposes. In short, it is inconceivable that an organization would ever want to give out a copy of its database, especially as it may represent the organization's intellectual property, trade secret, or asset.
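The replicated-database approach can be illustrated with a minimal sketch of a classic two-server XOR-based PIR query. The text above does not name a specific scheme, so this construction, the function names (`make_queries`, `answer`, `retrieve`), and the toy database are assumptions made purely for illustration.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def make_queries(r: int, i: int):
    """Split the request for block i into two bit vectors, one per server.

    Each vector on its own is uniformly random, so a single
    non-colluding server learns nothing about i.
    """
    q1 = [secrets.randbelow(2) for _ in range(r)]
    q2 = list(q1)
    q2[i] ^= 1  # the two vectors differ only at position i
    return q1, q2

def answer(db, query):
    """Server side: XOR together every block selected by the query vector."""
    acc = bytes(len(db[0]))
    for block, bit in zip(db, query):
        if bit:
            acc = xor_bytes(acc, block)
    return acc

def retrieve(db_copy_1, db_copy_2, i):
    """Client side: query both replicas and combine the two answers."""
    q1, q2 = make_queries(len(db_copy_1), i)
    return xor_bytes(answer(db_copy_1, q1), answer(db_copy_2, q2))
```

Every block selected by both vectors cancels under XOR, so only block i survives in the combined answer. Note that both servers need a full copy of the database, which is exactly the replication concern raised above.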


To address the database replication problem, a random server model of PIR was introduced by Gertner et al. (Yael Gertner, Shafi Goldwasser, and Tal Malkin. A Random Server Model for Private Information Retrieval or how to Achieve Information Theoretic PIR Avoiding Database Replication. In RANDOM '98, pages 200-217, 1998). This model attempts to separate the task of providing query privacy from that of information retrieval using auxiliary random servers running databases containing random data. The database server uses the service of two or more random servers to generate an encrypted and permuted version of its database and to help keep the user queries private. Of particular interest in this solution are universal random servers, which are a type of auxiliary servers holding random data that is completely independent of the content of the database. Gertner et al. proposed a scheme that achieves total independence, i.e., all random servers are of the universal type—they contain no information derived from content of the dataset, thereby addressing the database replication problem. In other words, the scheme provides user privacy according to the underlying PIR scheme used with the scheme, and database privacy (no single server or a coalition can learn any information about the content of the database).


Gertner et al.'s secure multi-party computation (SMC) protocol enables the server holding a database x and two auxiliary random servers, each holding a random database α and a pseudorandom permutator π, to compute an initial oblivious database y = π(x ⊕ α). However, their protocol must be rerun to re-compute y after a large (e.g., sublinear) number of queries have been run or whenever the database x is updated. But naively updating the oblivious database y with updates from x leaks information about π. In other words, an update to some record x_i would require an update to be made to some oblivious database block y_j, and the server maintaining the database is able to learn that j = π^{-1}(i). Hence, finding a periodic downtime to rerun the SMC is prescribed to update the oblivious database y. As a result, users would have to suspend making query requests during the SMC protocol rerun because the random servers would be preoccupied. Such a wait is undesirable in environments where database changes are frequent and query downtimes are unacceptable. The second problem with this scheme is that it expects the same random database to be used to mask multiple databases belonging to different organizations, which can lead to significant attacks in practice (i.e., the attacker learns r by running several queries across databases and uses this knowledge to learn the blocks of a target y much faster).
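The leak caused by a naive update can be seen in a small sketch. It assumes a toy model in which the permutation is a Python list `perm` with `perm[j] = i` meaning that oblivious slot j holds original block i; the helper names are hypothetical and the block size is arbitrary.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def build_oblivious(x, alpha, perm):
    """y = pi(x XOR alpha): slot j holds the masked block perm[j]."""
    return [xor_bytes(x[perm[j]], alpha[perm[j]]) for j in range(len(x))]

def changed_slots(y_old, y_new):
    """What the database server can observe: which slots of y changed."""
    return [j for j, (a, b) in enumerate(zip(y_old, y_new)) if a != b]

def demo_leak(r: int = 8, i: int = 3):
    x = [secrets.token_bytes(4) for _ in range(r)]
    alpha = [secrets.token_bytes(4) for _ in range(r)]
    perm = list(range(r))
    secrets.SystemRandom().shuffle(perm)

    y_old = build_oblivious(x, alpha, perm)
    x[i] = xor_bytes(x[i], b"\x01\x00\x00\x00")  # flip one bit: a real change
    y_new = build_oblivious(x, alpha, perm)      # naive immediate refresh

    # The server knows i (it made the update) and sees which slot moved.
    return changed_slots(y_old, y_new), perm.index(i)
```

Because exactly one slot changes right after the update to x_i, the server pairs i with that slot and learns one point of π per update.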


SUMMARY OF THE INVENTION

The present invention alleviates the problems described above by providing a system and methods to provide updates of the database x without compromising privacy guarantees, and without requiring a periodic downtime to re-initialize the database. According to embodiments of the present invention, update caches are provided at the random servers that are not emptied or sent to the oblivious database y after every update in a predictable fashion. Instead, updates are made incrementally to the oblivious database in an order that is independent of how the database x is updated. Hence there is no way for the server to learn which record y_j corresponds to an updated block x_i. Utilizing the present invention makes a multi-server PIR deployment more feasible in an environment where the database changes frequently.


Therefore, it should now be apparent that the invention substantially achieves all the above aspects and advantages. Additional aspects and advantages of the invention will be set forth in the description that follows, and in part will be obvious from the description, or may be learned by practice of the invention. Moreover, the aspects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.





DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate presently preferred embodiments of the invention, and together with the general description given above and the detailed description given below, serve to explain the principles of the invention. As shown throughout the drawings, like reference numerals designate like or corresponding parts.



FIG. 1 is a block diagram of a system that allows database updates to occur concurrently with query processing, without leaking any information about the correspondence between an updated record in the database and the respective oblivious database blocks according to embodiments of the present invention;



FIG. 2 is a flowchart illustrating a preprocessing setup to enable computation of the oblivious database according to an embodiment of the present invention;



FIG. 3 is a flowchart illustrating the processing of a user query according to an embodiment of the present invention; and



FIG. 4 is a flowchart illustrating the updating of the original database according to an embodiment of the present invention.





DETAILED DESCRIPTION OF THE PRESENT INVENTION

In describing the present invention, reference is made to the drawings, wherein there is seen in FIG. 1 in block diagram form a portion of a system that allows database updates to occur concurrently with query processing, without leaking any information about the correspondence between an updated record in the database and the respective oblivious database blocks according to embodiments of the present invention. A Server D 10 holds the original database x 12 consisting of r records or blocks each of b bits in length and an oblivious database y 18 that is based on the original database 12 (as described further below). Alternatively, the database y could be held by another server if desired. A plurality of l universal auxiliary random servers Server A1 20, Server A2 22, . . . Server Al 24 each hold a random database α 30, and a pseudorandom permutator processing device π 32, where π:[1 . . . r]→[1 . . . r]. A user can perform a query to retrieve data stored in the oblivious database 18 (and hence obtain information from the original database 12) using a client device 16, which may be, for example, any type of computing device such as a personal computer, smartphone, tablet, etc. that can access the network 14 and request a search. Because the data is obtained from the oblivious database 18, the server 10 (or other server that performs the search) does not know what content was actually returned to the client device 16, thereby maintaining the privacy of the user query. Each of the servers may be operated, for example, by a cloud service provider. Each of the servers is coupled to a network 14, such as, for example the Internet. The servers 10-24 may be a mainframe or the like that includes at least one processing device (not shown). Servers 10-24 may be specially constructed for the required purposes, or may comprise a general purpose computer selectively activated or reconfigured by a computer program (described further below) stored therein. 
Such a computer program may alternatively be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, which are executable by the processing device. One of ordinary skill in the art would be familiar with the general components of a computing system upon which the method of the present invention may be performed. Each of the servers 10-24 is adapted to communicate with other devices via the network 14 as is known in the art.


Referring now to FIG. 2, there is illustrated in flow diagram form the preprocessing setup to enable the server 10 to compute the oblivious database y 18 according to an embodiment of the present invention. In step 50, the server D 10 randomly chooses two servers, e.g., servers A1 20 and A2 22, out of the l available random servers. In step 52, the servers A1 20 and A2 22 establish a cryptographically secure pseudorandom function (PRF) f: {0,1}^m × K → {0,1}^b and a synchronized timestamp, where m, the length of the input, can be chosen arbitrarily, K is a symmetric key and b is the length of a block in bits. In step 54, the server D 10 chooses at random two datasets x^0_1 and x^0_2, such that x^0_1 ⊕ x^0_2 = x. Note that x^0 denotes the starting database x 12. In step 56, server A1 20 chooses at random α_1 and α_2, such that α_1 ⊕ α_2 = α. The scheme described by Gertner et al. also requires server A2 22 to choose uniformly at random π_1 and π_2, such that π_2(π_1(•)) = π(•). The present invention, in contrast, makes no such requirement for a few reasons. First, it does not result in any privacy gain since it is quite trivial for server A1 20 to compute π_2 from the expression π_2^{-1}(π) = π_1. Server A1 20 has knowledge of π and is required to receive π_1 from server A2 22. Second, removing the restrictions enables the server 10 to engage more than two random servers in the multi-party protocol. More servers give better security for the database since the chances of three or more random servers not colluding are better than for two servers. In step 58, server D 10 sends x^0_1 to server A1 20 and x^0_2 to server A2 22, and server A1 20 sends α_2 to server A2 22.


In step 60, server A1 20 creates a vector z of length n and initializes each of its elements to the current timestamp, computes a temporary dataset u = π(x^0_1 ⊕ α_1), and sends u to server D 10. In step 62, server A2 22 similarly creates a vector z and initializes each of its elements to the current timestamp. However, it computes a temporary dataset v = π(x^0_2 ⊕ α_2 ⊕ f(K,“AppName”∥α∥z)) and sends v to server D 10. In step 64, server D 10 computes the initial oblivious database y 18 as u ⊕ v = π(x ⊕ α ⊕ f(K,“AppName”∥α∥z)). Additionally, servers A1 20 and A2 22 respectively discard their shares x^0_1 and x^0_2, and reset u and v to ⊥ (empty value).


Referring now to FIG. 3, there is illustrated in flow diagram form the processing of a user query according to an embodiment of the present invention. In step 80, a user, using the client device 16, runs the underlying PIR scheme with A1 and A2 (or A1, A2 and one or more other random servers sharing the same state as A2). Suppose, for example, that a user query to the database 12 of server 10 requires retrieval of the ith block of the database x (i.e., x_i, where x is the current version of the database and i is the index or address of the block sought). The result will be in step 82 for the client device to obtain from the random servers 20-24 a block α_j ⊕ f(K,“AppName”∥α_j∥z_j), where j = π^{-1}(i). In step 84, the client device 16 asks the server D 10 for the jth block of the oblivious database, i.e., y_j. In step 86, the client device 16 computes the desired block x_i = α_j ⊕ f(K,“AppName”∥α_j∥z_j) ⊕ y_j.
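A minimal sketch of the query flow in steps 80 through 86 is given here. The PIR run against the random servers is replaced by a plain lookup (`mask_from_random_servers`), which is an assumption made only to keep the example short; a real deployment would fetch that value privately. HMAC-SHA256 again stands in for the PRF f, `alpha` and `z` are stored in original-database order, and all names are hypothetical.

```python
import hashlib
import hmac
import secrets

B = 8  # block length in bytes (assumed)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def prf(key: bytes, alpha_blk: bytes, ts: int) -> bytes:
    """Stand-in for f(K, "AppName" || alpha || z), evaluated per block."""
    msg = b"AppName" + alpha_blk + ts.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:B]

def build_oblivious(x, alpha, z, perm, key):
    """Oblivious database held by server D: slot j masks block perm[j]."""
    out = []
    for j in range(len(x)):
        p = perm[j]
        out.append(xor(xor(x[p], alpha[p]), prf(key, alpha[p], z[p])))
    return out

def mask_from_random_servers(alpha, z, perm, key, i):
    """Steps 80-82: what the client learns from the random servers,
    namely the slot j = pi^{-1}(i) and the mask covering that slot."""
    j = perm.index(i)
    return j, xor(alpha[i], prf(key, alpha[i], z[i]))

def client_query(y, alpha, z, perm, key, i):
    j, mask = mask_from_random_servers(alpha, z, perm, key, i)
    y_j = y[j]             # step 84: ask server D for slot j only
    return xor(mask, y_j)  # step 86: strip the mask to recover x_i
```

Server D only ever sees the slot number j, which is unlinkable to i without knowledge of π.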


Referring now to FIG. 4, there is illustrated in flow diagram form the processing to update the original database that allows the updates to occur concurrently with query processing, and without leaking any information about the correspondence between an updated record in the database and the respective oblivious database blocks according to embodiments of the present invention. Suppose, for example, the owner of the database 12 updates the ith record of database x (i.e., x_i), making it necessary to update the corresponding oblivious database block y_j, which is queried directly by users. However, it is desired to make changes in a way that does not allow server 10 to establish that j = π^{-1}(i). If changes to y_j are naively made shortly after x_i is updated, then the server 10 will be able to learn the permutation. The present invention is based on making changes to the oblivious database y in an order that is independent of how the database x changes, and in a way that keeps the changes in y unpredictable from the known changes in x. In other words, each incremental change made to y does not reveal to the server 10 which of the x_i's it was that triggered the change.


In step 100, prior to receiving the first update from server D 10, the random servers A1 20 and A2 22 jointly agree on a new pseudorandom permutation π0, which will be used to define the order to send bits of data that will be used to update the oblivious database y. Note that π0 is unrelated to π held by the random servers 20, 22. Additionally, they establish a coin toss that allows them to switch roles in the protocol either as a randomizer or a timer. A randomizer deals out a share of α_j to the other random server(s), such that the sum of their shares (mod 2) is the same as α_j. A timer adds the evaluation of f(K,“AppName”∥α_j∥z_j) (using the current timestamp for z_j) to its update piece for y_j (described below). The coin toss might be established from the bits output by a pseudorandom generator sharing a common key and state (e.g., AES on the string of the application name with the same key and initialization vector, where a ‘0’ and ‘1’ could indicate a randomizer or timer). Suppose that the original ith block is x_i and the updated ith block is x̂_i.
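The shared coin toss can be sketched as a deterministic bit stream that both random servers derive from common state. The text suggests AES; because the Python standard library has no AES, this sketch substitutes HMAC-SHA256 over the application name and a round counter. The counter, the function names, and the rule that the two servers always take complementary roles are assumptions for illustration.

```python
import hashlib
import hmac

def coin(key: bytes, app_name: bytes, counter: int) -> int:
    """Derive one pseudorandom bit from the shared key and state."""
    msg = app_name + counter.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1

def roles(key: bytes, app_name: bytes, counter: int):
    """Return (role of A1, role of A2) for the given round.

    Assumed mapping: bit 0 makes A1 the randomizer and A2 the timer,
    bit 1 swaps them.
    """
    if coin(key, app_name, counter) == 0:
        return ("randomizer", "timer")
    return ("timer", "randomizer")
```

Since both servers hold the same key and counter, each computes the same bit locally and no extra message is needed to agree on roles.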


In step 102, server D 10 computes x̃_i = x_i ⊕ x̂_i, x̃_{i,1}, and x̃_{i,2}, such that x̃_{i,1} ⊕ x̃_{i,2} = x̃_i. Then it sends {i, x̃_{i,1}} to server A1 20 and {i, x̃_{i,2}} to server A2 22. In step 104, on receipt of x̃_{i,1}, server A1 20 computes u_j = x̃_{i,1} ⊕ α_{j,1} ⊕ f(K,“AppName”∥α_j∥z_j), using the current timestamp z0 for z_j if its role is timer, or the saved timestamp value if its role is randomizer. In either case, it updates the value z_j ← z0 after the computation. It then saves u_j to a dataset u stored in an update cache in server A1 20. In step 106, on receipt of x̃_{i,2}, server A2 22 computes v_j = x̃_{i,2} ⊕ α_{j,2} ⊕ f(K,“AppName”∥α_j∥z_j), using the current timestamp z0 for z_j if its role is timer, or the saved time if its role is randomizer. Afterwards, it updates the timestamp z_j ← z0 and saves v_j to a dataset v stored in an update cache in server A2 22. 
In step 108, both server A1 20 and server A2 22 follow the order defined by π0 to respectively send a single block of u and v to the database server D 10. Once a block u_j or v_j is sent, each random server 20, 22 resets its slot to ⊥. In step 110, on receipt of {j, u_j} and {j, v_j}, server D 10 can compute the new block as y_j = y_j ⊕ u_j ⊕ v_j = x̂_i ⊕ α_j ⊕ f(K,“AppName”∥α_j∥z_j). Note that f(K,“AppName”∥α_j∥z_j) re-randomizes y_j irrespective of whether x_i changes or not. The database owner will be unable to predict whether x_i has changed or not. After all the blocks of u and v corresponding to the last index of π0 have been processed, the random servers 20, 22 pick a different π0 and a different π, and then repeat the above steps to help update the oblivious database y. Finally, server D 10 can compute a new block of oblivious data y[text missing or illegible when filed]12(u1)⊕v1⊕y0, which would give π1(x1⊕α1) for the updated block of x and π1(x0⊕α1) for the unchanged blocks.
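The update-cache idea in steps 102 through 110 can be sketched in simplified form. This is not the exact arithmetic of the description: α is kept fixed and its shares are left out of the update pieces, A1 is always the timer and A2 always the randomizer, both random servers live in one object, every slot is refreshed once per round, a random shuffle stands in for π0, and HMAC-SHA256 stands in for the PRF f. All class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

B = 8  # block length in bytes (assumed)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def prf(key: bytes, alpha_blk: bytes, ts: int) -> bytes:
    msg = b"AppName" + alpha_blk + ts.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:B]

def split(block: bytes):
    """Two random shares whose XOR equals the block."""
    s1 = secrets.token_bytes(len(block))
    return s1, xor(block, s1)

class RandomServers:
    """Update caches u (at A1) and v (at A2), modelled together."""

    def __init__(self, alpha, perm, key, z):
        self.alpha, self.perm, self.key = alpha, perm, key
        self.z = list(z)          # saved timestamp per original block
        self.u = self.v = None
        self.now = None

    def start_round(self, now: int):
        # Timer piece carries the fresh mask, randomizer piece the saved
        # one, so XORing both into y[j] swaps the old mask for a new one.
        r = len(self.perm)
        self.now = now
        self.u = [prf(self.key, self.alpha[self.perm[j]], now) for j in range(r)]
        self.v = [prf(self.key, self.alpha[self.perm[j]], self.z[self.perm[j]])
                  for j in range(r)]

    def receive(self, i: int, d1: bytes, d2: bytes):
        """Steps 104/106: fold the difference shares into the cached slot."""
        j = self.perm.index(i)
        self.u[j] = xor(self.u[j], d1)
        self.v[j] = xor(self.v[j], d2)

    def drain(self, order):
        """Step 108: release one slot at a time in the agreed order."""
        for j in order:
            u_j, v_j = self.u[j], self.v[j]
            self.u[j] = self.v[j] = None      # reset the slot to empty
            self.z[self.perm[j]] = self.now
            yield j, u_j, v_j

def run_round(x, y, servers, updates, now):
    servers.start_round(now)
    for i, new_block in updates.items():       # step 102, at server D
        d1, d2 = split(xor(x[i], new_block))
        servers.receive(i, d1, d2)
        x[i] = new_block
    order = list(range(len(y)))
    secrets.SystemRandom().shuffle(order)      # stands in for pi0
    for j, u_j, v_j in servers.drain(order):   # step 110, at server D
        y[j] = xor(y[j], xor(u_j, v_j))
```

In this sketch every slot of y changes in each round and the release order is drawn independently of the updates, so the server cannot tell from the timing or position of a change which x_i triggered it.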


While preferred embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, deletions, substitutions, and other modifications can be made without departing from the spirit or scope of the present invention. Accordingly, the invention is not to be considered as limited by the foregoing description but is only limited by the scope of the appended claims.

Claims
  • 1. A method for a first server to compute an oblivious database based on an original database such that a data block of the original database can be updated without the first server being able to determine the corresponding data block in the oblivious database, the method comprising: selecting, by the first server, a first random server and a second random server from a plurality of random servers, each of the random servers having a random database; establishing, by the first random server and the second random server, a cryptographically secure pseudorandom function and a synchronized timestamp; selecting, by the first server, a first random data set and a second random data set from the original database; sending, by the first server, the first random data set to the first random server and the second random data set to the second random server; selecting, by the first random server, a first random data set and a second random data set from the random database; sending, by the first random server, the second random data set from the random database to the second random server; computing, by the first random server, a first vector of length n and initializing each element of the first vector to a current timestamp; computing, by the first random server, a first temporary data set based on the first random data set from the original database and the first random data set from the random database and sending the first temporary data set to the first server; computing, by the second random server, a second vector of length n and initializing each element of the second vector to the current timestamp; computing, by the second random server, a second temporary data set based on the second random data set from the original database, the second random data set from the random database and the cryptographically secure pseudorandom function, and sending the second temporary data set to the first server; and computing, by the first server, the oblivious database based on the first temporary data set and the second temporary data set.
  • 2. A method for a first server to update a data block of an oblivious database with updated data, the oblivious database being based on an original database, such that the first server cannot determine a correspondence between the original database and the oblivious database, the method comprising: separating, by the first server, the updated data into a first random block of data and a second random block of data; sending, by the first server, the first random block of data and an index for the data block being updated to a first random server and the second random block of data and the index to a second random server; computing, by the first random server, a first temporary value based on the first random block of data, a corresponding block of data from a random database, and a pseudorandom function; storing, by the first random server, the first temporary value in a first update cache; computing, by the second random server, a second temporary value based on the second random block of data, a corresponding block of data from the random database, and the pseudorandom function; storing, by the second random server, the second temporary value in a second update cache; sending, by the first random server, a single block of the first update cache and an index of the single block from the first update cache to the first server; sending, by the second random server, a corresponding single block of the second update cache and an index of the corresponding single block of the second update cache to the first server; and computing, by the first server, the updated data block of the oblivious database based on a current value of the data block being updated, the single block of the first update cache and the single block of the second update cache.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 61/817,463, filed Apr. 30, 2013, the specification of which is hereby incorporated by reference.

Provisional Applications (1)
Number Date Country
61817463 Apr 2013 US