Implementing end-to-end encryption poses many challenges in the data management and database spaces. The goal of such encryption approaches is to provide a completely secure set of data for any client, irrespective of platform. Even when data is fully encrypted, there are opportunities for adversaries to exploit data leakage to learn about underlying encrypted data, where the opportunities for leakage depend on the underlying encrypted search design as well as on the adversarial model being considered.
According to some aspects, provided are systems and methods that implement end-to-end encryption and provide implementations configured to secure information during execution of queries on a data source. Various embodiments include multiple encrypted multi-map data structures and associated encryption schemes configured to securely read, write, and delete information while supporting any one or more of the following features: snapshot security, multiple client support, efficient execution under concurrent operation, and resilience to client failures.
According to various aspects, provided are descriptions of encryption schemes for implementing end-to-end encryption in document-oriented, semi-structured, and/or unstructured database systems. According to one embodiment, a database system can include an OST1 construction. According to one example, OST1 describes a (e.g., document) database encryption scheme that is configured to enable any one or more of the following features: (1) snapshot security; (2) support for multiple clients; (3) efficient support for concurrent operations; and (4) resilience to client failures. Further embodiments provide “lightweight clients,” in the sense that the implementation does not require or assume that the clients have large memory or access to non-conventional computational power. Still other embodiments enable resilience to “server crashes,” and also provide for “scalability.” For example, the system can support a scalable architecture and work in sharded clusters of the known MongoDB database (among other options). Some embodiments are configured to provide efficient search, updates, and deletes, low storage overhead, and expressive queries including, for example, support for more than point queries.
According to one aspect, a database system is provided. The system comprises at least one processor operatively connected to a memory, the at least one processor when executing configured to: enable end-to-end encryption of plaintext data via an emulation of a database implementation (e.g., distributed database, dynamic schema database, known MongoDB database, etc.); accept and process queries against the emulation of the database implementation, such that the queries operate on and retrieve encrypted data from the emulation; instantiate the emulation of the database implementation, the emulation including: at least a first encrypted data structure (e.g., multi-map, addressable multi-map, etc.) configured to: store encrypted representations of the plaintext data; link multi-dimension labels to respective encrypted representations in the first encrypted data structure; receive and execute database operations against the encrypted representations using the multi-dimension labels; and at least a second encrypted data structure (e.g., multi-map, addressable multi-map, etc.) configured to: store encrypted metadata associated with the first encrypted data structure; and prevent overwrite conditions from occurring on the first encrypted data structure using the encrypted metadata.
According to one embodiment, the at least one processor is further configured to receive and execute concurrent database operations against the first encrypted data structure. According to one embodiment, the at least one processor is further configured to receive and execute concurrent database operations against the second encrypted data structure. According to one embodiment, the at least one processor is further configured to receive and execute stateless database operations against the first encrypted data structure. According to one embodiment, the at least one processor is further configured to receive and execute stateless database operations against the second encrypted data structure. According to one embodiment, the emulation further comprises a third encrypted data structure configured to store gap information for the multi-dimension labels and respective encrypted representations.
According to one embodiment, the third encrypted data structure is configured to limit reads executed on the first encrypted data structure to occur on locations in the first encrypted data structure having existing data. According to one embodiment, the at least one processor is further configured to receive and execute concurrent and/or stateless database operations against the third encrypted data structure. According to one embodiment, the emulation further comprises an encrypted set structure configured to: store operation tokens generated for database operations on the second and third encrypted data structures; and enable compaction of the second and/or third encrypted data structures. According to one embodiment, the emulation further comprises an encrypted range data structure (e.g., multi-map, addressable multi-map, etc.) configured to: store encrypted representations of the plaintext data; and receive and execute range delimited database operations against the encrypted representations.
Still other aspects, examples, and advantages of these exemplary aspects and examples, are discussed in detail below. Moreover, it is to be understood that both the foregoing information and the following detailed description are merely illustrative examples of various aspects and examples and are intended to provide an overview or framework for understanding the nature and character of the claimed aspects and examples. Any example disclosed herein may be combined with any other example in any manner consistent with at least one of the objects, aims, and needs disclosed herein, and references to “an example,” “some examples,” “an alternate example,” “various examples,” “one example,” “at least one example,” “this and other examples” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the example may be included in at least one example. The appearances of such terms herein are not necessarily all referring to the same example.
Various aspects of at least one embodiment are discussed herein with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide illustration and a further understanding of the various aspects and embodiments, and are incorporated in and constitute a part of this specification, but are not intended as a definition of the limits of the invention. Where technical features in the figures, detailed description, or any claim are followed by reference signs, the reference signs have been included for the sole purpose of increasing the intelligibility of the figures, detailed description, and/or claims. Accordingly, neither the reference signs nor their absence are intended to have any limiting effect on the scope of any claim elements. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure. In the figures:
To facilitate understanding of elements of the end-to-end encrypted database and example encryption schemes, described are considerations for the construction of OST1 and the underlying development of two new multi-map encryption schemes, ΩP and ΩR, that achieve any one or more, or any combination, of the properties above (e.g., 1-4), in various examples. ΩR is an example range multi-map encryption scheme that can be used. ΩR is itself based on ΩP, and ΩP is based on multiple data structure encryption schemes that each achieve different characteristics and can be used for different purposes. Example considerations and implementations for the schemes are discussed in detail below.
Various embodiments enhance security over conventional approaches. For example, security can be enhanced over conventional implementations when considering a snapshot adversary. A (memory-level) snapshot adversary has access to the entire memory and disk of a server at a particular point in time. This means that at that instant, the adversary can access the entire database, any keys stored in memory, all the caches, and all the logs. Some existing approaches include snapshot-secure structured encryption, but they are very complex and do not support the properties above. As is described in further detail below, example schemes ΩP and ΩR are more efficient than known approaches and provide enhanced security guarantees.
According to various embodiments, the system supports databases that are accessed by multiple clients. Further, the implementation of the underlying structured encryption (“STE”) scheme can be configured to support a multi-writer multi-reader (“MWMR”) setting. In a multi-writer setting, clients can issue put operations (described in greater detail below) at the same time which can cause contention and reduce write throughput. Various embodiments resolve the complexity of multi-writer settings and improve over various known single writer approaches. To the inventors' awareness, the various embodiments described are the first multi-writer multi-reader structured encryption schemes.
Various conventional dynamic multi-map encryption schemes require the client to keep state. State becomes difficult to manage in a multi-client setting, for example, because clients need to maintain a consistent view of state. Another important consideration is that clients can crash at any time and cause state information to be lost. Various embodiments are configured to provide crash recovery protocols that are efficient. Some embodiments resolve the state issue by removing the consideration under a stateless architecture.
Construction Examples and Notation: The set of all binary strings of length n is denoted as {0, 1}^n, and the set of all finite binary strings as {0, 1}*. [n] is the set of integers {1, . . . , n}. The output y of a probabilistic algorithm A on input x is denoted by y←A(x). The output y of a deterministic algorithm A on input x is denoted by y:=A(x). If S is a set then x←$S denotes sampling from S uniformly at random. Given a sequence s of n elements, the description refers to its ith element as si. If S is a set then #S refers to its cardinality. Throughout, k will denote the security parameter.
Example Dictionaries & multi-maps. A dictionary DX with capacity n is a collection of n label/value pairs {(ℓi, vi)}i≤n and supports Get and Put operations. vi:=DX[ℓi] denotes getting the value associated with label ℓi, and DX[ℓi]:=vi denotes the operation of putting the value vi in DX with label ℓi.
A multi-map “MM” with capacity n is a collection of n label/tuple pairs {(ℓi, vi)}i≤n that supports Get and Put operations. vi=MM[ℓi] denotes getting the tuple associated with label ℓi, and MM[ℓi]=vi denotes the operation of associating the tuple vi to label ℓi. Multi-maps are an abstract data type instantiated by an inverted index. In a further example, the system can define a range multi-map “RMM” that supports, in addition to Get and Put operations, range queries: given a range [a, b]⊆Z2, return the set of values V=∪ℓ∈[a, b] RMM[ℓ]. V=RMM[[a, b]] denotes getting the values associated with the range [a, b].
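As a plaintext illustration of the three abstract data types just defined, the following minimal Python sketch models a multi-map and a range multi-map (a dictionary is simply Python's built-in dict). The class and method names are hypothetical and chosen for this example only; integer labels are assumed so that a range [a, b] is well defined.

```python
class MultiMap:
    """Plaintext multi-map: each label is associated with a tuple of values."""

    def __init__(self):
        self._pairs = {}

    def put(self, label, values):
        # MM[label] = v : associate the tuple v with the label.
        self._pairs[label] = tuple(values)

    def get(self, label):
        # v = MM[label] : return the tuple for the label (empty if absent).
        return self._pairs.get(label, ())


class RangeMultiMap(MultiMap):
    """Multi-map over integer labels that also answers range queries."""

    def get_range(self, a, b):
        # V = RMM[[a, b]] : union of the tuples of all labels in [a, b].
        values = []
        for label in sorted(self._pairs):
            if a <= label <= b:
                values.extend(self._pairs[label])
        return values


rmm = RangeMultiMap()
rmm.put(3, ["doc1", "doc2"])
rmm.put(5, ["doc3"])
rmm.put(9, ["doc4"])
in_range = rmm.get_range(3, 5)  # values for labels 3 and 5
```

The sketch is unencrypted on purpose; it only fixes the Get, Put, and range semantics that the encrypted structures are meant to preserve.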
Example source databases can include any structured or semi-structured database. Various embodiments are configured to manage document databases. A document database DDB of size n holds n documents D1, . . . , Dn each of which is a set of field/value pairs. Various examples described herein are discussed under the assumption of documents in a database that have the same number of field/value pairs. More precisely, for all 1≤i≤n, Di=(f1, v1), . . . , (fm, vm). The examples are provided to illustrate operations and facilitate understanding and are not limited to such cases, and in other embodiments are configured to manage databases and documents having varying numbers of field/value pairs.
Examples are discussed that include document databases with fields that support the following exact queries and range queries. For example, an exact search query takes as input a field/value pair (f, v) and returns the documents in DDB that include the field f with value v. A range search query takes as input a range [a, b] instead of a single value and returns the documents in DDB that include the field f with values between a and b.
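The two query types can be illustrated on a small unencrypted document database. This is a minimal sketch: the function names and the sample documents are hypothetical, and documents are modeled as Python dicts of field/value pairs.

```python
def exact_search(ddb, field, value):
    """Return the documents that contain `field` with exactly `value`."""
    return [doc for doc in ddb if field in doc and doc[field] == value]


def range_search(ddb, field, a, b):
    """Return the documents whose `field` value lies in the range [a, b]."""
    return [doc for doc in ddb if field in doc and a <= doc[field] <= b]


# Illustrative data only.
ddb = [
    {"name": "alice", "age": 31},
    {"name": "bob", "age": 45},
    {"name": "carol", "age": 38},
]
exact = exact_search(ddb, "name", "bob")
ranged = range_search(ddb, "age", 30, 40)
```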
Various embodiments and operations are discussed with respect to the known MongoDB database and its mongo shell query and update operations. Other embodiments can be employed with different databases and query/update operations.
Example cryptographic primitives are included in, for example, a symmetric-key encryption scheme. The symmetric-key encryption scheme is a set of three polynomial-time algorithms SKE=(Gen, Enc, Dec) where Gen is a probabilistic algorithm that takes a security parameter k and returns a secret key K; Enc is a probabilistic algorithm that takes a key K and a message m and returns a ciphertext c; Dec is a deterministic algorithm that takes a key K and a ciphertext c and returns m if K was the key under which c was produced.
Informally, a private-key encryption scheme is secure against chosen-plaintext attacks (CPA) if the ciphertexts it outputs do not reveal any partial information about the plaintext even to an adversary that can adaptively query an encryption oracle. A scheme is random-ciphertext-secure against chosen-plaintext attacks (RCPA) if the ciphertexts the scheme outputs are computationally indistinguishable from random even to an adversary that can adaptively query an encryption oracle. In some examples, RCPA-secure encryption can be instantiated practically using either the standard PRF-based private-key encryption scheme or, e.g., AES in counter mode. In addition to encryption schemes, the system can be configured to leverage pseudo-random functions (PRFs), which are polynomial-time computable functions that cannot be distinguished from random functions by any probabilistic polynomial-time adversary. In the following examples, the evaluation of a pseudo-random function F with a key K on an input x is written as FK(x), and sometimes as F(K, x) for visual clarity. The notation FK[s1, s2, . . . , sn] is also used to mean F( . . . F(F(K, s1), s2) . . . , sn). Various formal security definitions are known and include those described in Introduction to Modern Cryptography, by J. Katz and Y. Lindell, (2008).
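The nested PRF notation and the PRF-based encryption mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the scheme's mandated instantiation: HMAC-SHA256 stands in for the PRF F, the 16-byte nonce and the 4-byte block counter are arbitrary choices, and the toy cipher provides no integrity protection.

```python
import hashlib
import hmac
import os


def prf(key: bytes, x: bytes) -> bytes:
    """F(K, x), with HMAC-SHA256 as an assumed PRF stand-in."""
    return hmac.new(key, x, hashlib.sha256).digest()


def prf_chain(key: bytes, *inputs: bytes) -> bytes:
    """F_K[s1, ..., sn]: feed each output back in as the next key."""
    k = key
    for s in inputs:
        k = prf(k, s)
    return k


def _pad(key: bytes, nonce: bytes, n: int) -> bytes:
    # Expand the PRF into n pad bytes using a block counter.
    out, i = b"", 0
    while len(out) < n:
        out += prf(key, nonce + i.to_bytes(4, "big"))
        i += 1
    return out[:n]


def enc(key: bytes, msg: bytes) -> bytes:
    """Ciphertext = random nonce || (msg XOR PRF-derived pad)."""
    nonce = os.urandom(16)
    pad = _pad(key, nonce, len(msg))
    return nonce + bytes(m ^ p for m, p in zip(msg, pad))


def dec(key: bytes, ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    pad = _pad(key, nonce, len(body))
    return bytes(c ^ p for c, p in zip(body, pad))


K = os.urandom(32)
message = b"a value longer than one thirty-two byte PRF block"
roundtrip = dec(K, enc(K, message))
```

Because a fresh nonce is drawn on every call, encrypting the same message twice yields different ciphertexts, which is the behavior the RCPA discussion relies on.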
Various embodiments employ hypergraph data structures. A hypergraph H=(V, E) consists of a set of n vertices V=v1, . . . , vn and a collection of m non-empty edges E=e1, . . . , em such that, for all i∈[m], ei⊆V. The degree of a vertex v∈V is the number of edges in E that contain v and is denoted by deg(v). In the following, described is a range hypergraph, H=(V, E) such that V is a total order and such that for all ranges r∈R(V), there exists a subset Cr⊆E such that ∪e∈C
In various embodiments, the system includes two efficient algorithms: EdgesH and MincoverH. In some examples, EdgesH takes as input a vertex v and outputs the subset of edges Ev⊆E that include v, and MincoverH takes as input a range r∈R(V) and outputs its min-cover Cr. The two efficient algorithms permit use of a hypergraph H in various constructions.
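One well-known hypergraph with efficient Edges and Mincover algorithms is the family of dyadic intervals over an integer domain {0, . . . , 2^h − 1}. The text does not state that this is the hypergraph used, so the sketch below is an assumption made purely for illustration; the function names and the (level, index) encoding of an edge are hypothetical. Edge (level, index) stands for the interval [index·2^level, (index+1)·2^level − 1].

```python
def edges(v: int, height: int):
    """Edges(v): every dyadic interval that contains vertex v."""
    return [(level, v >> level) for level in range(height + 1)]


def mincover(a: int, b: int):
    """Mincover([a, b]): dyadic intervals whose union is exactly [a, b]."""
    cover = []
    level = 0
    while a <= b:
        if a & 1:            # a is a right child: it cannot merge upward
            cover.append((level, a))
            a += 1
        if not (b & 1):      # b is a left child: it cannot merge upward
            cover.append((level, b))
            b -= 1
        a >>= 1              # move both endpoints to the parent level
        b >>= 1
        level += 1
    return cover


# Domain {0, ..., 7} (height 3); the range [2, 5] is covered by [2,3] and [4,5].
cover = mincover(2, 5)
```

A vertex v falls in the range exactly when one of its edges appears in the cover, which is the property that lets a range query be answered by a small number of edge lookups.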
Various embodiments include a stateless multi-map encryption scheme ΩP. In various examples, ΩP evolved and improved over some known multi-map encryption schemes. In various embodiments, the underlying encryption schemes were adapted and improved, and each one modified to have different characteristics and ultimately used for different purposes. At a high level, the first scheme, ΣM, can be used to encrypt the input multi-map which results in the main encrypted multi-map EMMM. The second scheme, ΣC, can be used to encrypt metadata about the main encrypted multi-map (e.g., that can be used to avoid overwriting items in EMMM). The third scheme, ΣD, can be used to store information about items deleted in the main encrypted multi-map (e.g., in order to speed up queries on EMMM). The last scheme, ΣP, can be used to store information that can be needed to compact the auxiliary structures. Compacting the auxiliary structures reduces their space consumption. The following description describes examples of the underlying encryption schemes, optimizations, improvements, and purposes.
As mentioned above, various embodiments include scheme ΩP, which can employ a first scheme ΣM to encrypt an input multi-map MM, resulting in the “main” encrypted multi-map EMMM. In various examples, ΣM is a πdyn-style construction that has been adapted to improve security and operation over known πdyn-style constructions. For example, ΣM is part of a two-dimensional multi-map encryption scheme (described in greater detail below). Further ΣM is configured to be stateless. This architecture can be implemented at the cost of correctness, in the sense that the values associated to a label can be overwritten. To better understand this example scheme and this behavior, ΣM is described as supporting read, write, and erase operations instead of get, put, and delete operations. More precisely, these operations work as follows:
According to some embodiments, the system is configured to enable concurrency via two-dimensionality. According to one embodiment, the encrypted multi-map EMMM will be used by ΩP to store the tuple associated with a label ℓ. Typical operation results in contention when multiple clients are writing to the same label, which in turn slows down ΩP's write throughput under parallel put operations. Various embodiments are configured to resolve the contention and the throughput issue. For example, the system can be configured to employ EMMM as a two-dimensional (encrypted) multi-map, instead of using a standard multi-map. In this example, the multi-dimension multi-map is configured to hold label/tuple pairs with labels of the form ℓ=(ℓx, ℓy). Given a high contention label ℓ, ΩP is configured to process ℓ as a multi-dimensional label ℓ′=(ℓ, u), where u is a value sampled uniformly at random from {1, . . . , p}, and store the pair ((ℓ, u), v) in EMMM. Stated broadly, when n clients try to write to the same high-contention label ℓ, in expectation only n/p writes will be executed on the same two-dimensional label ℓ′=(ℓ, u) in EMMM. Further embodiments can be configured with additional optimization via a two-choice allocation instead of just sampling u at random.
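The load-spreading effect of the second dimension can be checked with a small simulation. This is a sketch under stated assumptions: the function name is hypothetical, labels are plain strings, and low-contention labels are given the fixed second coordinate 0, following the PutToken description later in this document.

```python
import random
from collections import Counter


def two_dim_label(label, high_contention, p, rng):
    """Map a label to (label, u): u is uniform in {1..p} under contention."""
    u = rng.randint(1, p) if high_contention else 0
    return (label, u)


rng = random.Random(0)   # seeded only to make the run repeatable
n, p = 10_000, 8         # n concurrent writers, p sub-labels
loads = Counter(two_dim_label("status", True, p, rng) for _ in range(n))
expected_per_label = n / p   # each two-dimensional label sees about n/p writes
```

Each of the p two-dimensional labels receives roughly n/p of the writes, so writers contend on p separate tuples rather than on a single one.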
Various embodiments enable this operation based on the two-dimensional encrypted multi-map supporting, in addition to read, write, and erase operations, read operations on a single dimension. To facilitate understanding, in various embodiments, n write (ℓ, v) operations for EMMM can be transformed to n writes of the form ((ℓ, u), v) for 1≤u≤p. In various examples, this architecture does not cause any issue during write operations, but potential issues can result for reads, since a hypothetical read needs to return the values associated with every two-dimensional label ℓ′=(ℓ, u). An example solution requires the client to compute and send p read tokens to the server, one for each u∈{1, . . . , p}. Other embodiments are configured to support two additional algorithms, ReadXToken and ReadXYToken, to improve operation. According to one example, the first algorithm, ReadXToken, can be used by the client to generate a read token for the x-component of a label ℓ=(ℓx, ℓy). The second algorithm, ReadXYToken, is used by the server to generate a read token for ℓ=(ℓx, ℓy) given a read token for ℓx and the y-component ℓy. When querying for a label ℓ, the system (e.g., client) can be configured to send to the server a read token for ℓ, and the server can use that to generate read tokens for the two-dimensional labels (ℓ, 1), . . . , (ℓ, p).
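The delegation from a single x-token to p full read tokens can be sketched with a nested PRF. Assumptions in this sketch: HMAC-SHA256 stands in for the PRF, the y-component u is encoded as its decimal string, the all-zero key is a demo value, and the function names are illustrative rather than taken from the Source Code Appendix.

```python
import hashlib
import hmac


def F(key: bytes, x: bytes) -> bytes:
    """PRF stand-in (assumption): HMAC-SHA256."""
    return hmac.new(key, x, hashlib.sha256).digest()


def read_token(K: bytes, lx: bytes, ly: bytes) -> bytes:
    """Client: full read token for the two-dimensional label (lx, ly)."""
    return F(F(K, lx), ly)


def read_x_token(K: bytes, lx: bytes) -> bytes:
    """Client: token for the x-component only."""
    return F(K, lx)


def read_xy_token(rxtk: bytes, ly: bytes) -> bytes:
    """Server: derive the full read token from an x-token and a y-component."""
    return F(rxtk, ly)


def expand(rxtk: bytes, p: int):
    """Server: read tokens for (l, 1), ..., (l, p) from one x-token."""
    return [read_xy_token(rxtk, str(u).encode()) for u in range(1, p + 1)]


K = b"\x00" * 32                      # demo key only
rxtk = read_x_token(K, b"status")     # the single token sent by the client
tokens = expand(rxtk, 4)              # computed by the server, no key needed
```

The client sends one token regardless of p, and the server never learns K; it can only derive tokens beneath the x-component it was handed.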
The following examples and embodiments describe the syntax of addressable two-dimensional multi-map encryption schemes. Various embodiments provide a response-hiding stateless addressable two-dimensional multi-map encryption scheme. The scheme can be a structured encryption scheme ΣM=(Init, WriteToken, Write, ReadToken, ReadXToken, ReadXYToken, Read, EraseToken, Erase, Resolve) that can include the preceding polynomial-time algorithms. Examples of the algorithms are shown in the Source Code Appendix, which forms an instant part of this specification. In further embodiments, ΣM provides a practical stateless encryption scheme for addressable two-dimensional multi-maps.
According to one example, the scheme is described in detail in [. . .]. A write token includes the key Kℓ:=F(FK(ℓx), ℓy) and encryptions of each value in v under the key Ke. The Write algorithm stores pairs of the form (ti, cti) in the dictionary DX, where ti:=F(Kℓ, ai) and cti is the encryption of vi. The ReadToken algorithm is configured to return the key Kℓ:=F(FK(ℓx), ℓy) as the read token rtk, and Read returns the ciphertexts in DX associated to the labels F(Kℓ, ai), for all ai∈a. The ReadXToken algorithm is configured to return Kx:=FK(ℓx) as its read-x token, and ReadXYToken is configured to return F(Kx, ℓy) as the read token. EraseToken is configured to output Kℓ:=F(FK(ℓx), ℓy) as the erase token etk, and Erase sets DX[F(Kℓ, a)] to ⊥. Resolve recovers v by decrypting the sequence of ciphertexts ct using Ke.
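The write, read, erase, and resolve flow described above can be sketched end to end. This is a toy model under stated assumptions, not the appendix code: HMAC-SHA256 stands in for F, values are at most 32 bytes and are encrypted with a simple PRF-based pad that offers no integrity, addresses are 8-byte big-endian integers, and ⊥ is modeled as None.

```python
import hashlib
import hmac
import os


def F(key: bytes, x: bytes) -> bytes:
    return hmac.new(key, x, hashlib.sha256).digest()


def enc(key: bytes, msg: bytes) -> bytes:
    assert len(msg) <= 32            # toy limit: one PRF block
    r = os.urandom(16)
    return r + bytes(m ^ p for m, p in zip(msg, F(key, r)))


def dec(key: bytes, ct: bytes) -> bytes:
    r, body = ct[:16], ct[16:]
    return bytes(c ^ p for c, p in zip(body, F(key, r)))


def addr(a: int) -> bytes:
    return a.to_bytes(8, "big")


def label_key(K, lx, ly):
    """K_l := F(F_K(lx), ly); serves as read token and erase token."""
    return F(F(K, lx), ly)


def write_token(K, Ke, lx, ly, values):
    """Client: label key plus the encryption of every value."""
    return label_key(K, lx, ly), [enc(Ke, v) for v in values]


def write(DX, wtk, addresses):
    """Server: store ct_i under the pseudo-random label F(K_l, a_i)."""
    K_l, cts = wtk
    for a, ct in zip(addresses, cts):
        DX[F(K_l, addr(a))] = ct


def read(DX, rtk, addresses):
    return [DX.get(F(rtk, addr(a))) for a in addresses]


def erase(DX, etk, a):
    DX[F(etk, addr(a))] = None       # None models the symbol for "empty"


def resolve(Ke, cts):
    return [dec(Ke, ct) for ct in cts if ct is not None]


K, Ke, DX = os.urandom(32), os.urandom(32), {}
write(DX, write_token(K, Ke, b"name", b"1", [b"alice", b"bob"]), [1, 2])
tk = label_key(K, b"name", b"1")
before = resolve(Ke, read(DX, tk, [1, 2]))
erase(DX, tk, 1)
after = resolve(Ke, read(DX, tk, [1, 2]))
```

Note that nothing in this sketch stops a second write to addresses 1 and 2 from replacing the stored ciphertexts, which is the correctness trade-off of the stateless design.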
According to one embodiment, ΣM is optimal with respect to communication complexity: write tokens are O(#v), read and erase tokens are O(1) and read responses are O(#a). In further example, the scheme is also optimal with respect to server-side computation since writes and reads are O(#a) and erase operations are O(1). Client-side operations are also optimal since computing write tokens is O(#a), computing read and erase tokens is O(1) and resolving is O(#ct).
Further embodiments include a second building block, ΣC, which can be a dictionary encryption scheme that achieves statelessness and correctness. Some examples provide these features at the cost of limited query functionality and (in some cases) a slight decrease in query efficiency. This scheme is configured to satisfy several non-standard properties described in greater detail below.
As discussed above, ΣM achieves statelessness by relaxing correctness and, specifically, by not guaranteeing that values cannot be overwritten. Various embodiments address this limitation via an auxiliary encrypted structure EDXC, produced with a dictionary encryption scheme ΣC, that stores information that limits overwrites in EMMM. Embodiments of ΣC have been designed so that they are both stateless and correct, in the sense that they do not allow overwrites.
An example approach that achieves these goals includes an option to associate a counter countℓ with every label ℓ in the main encrypted multi-map EMMM, store the pairs (ℓ, countℓ) in a dictionary DX, encrypt DX using a response-revealing dictionary encryption scheme, and store the resulting encrypted dictionary EDXC with the main encrypted multi-map EMMM. To add a label/tuple pair (ℓ, v) to EMMM, the system (e.g., the client) is configured to send encryptions ct of v and a ΣC get token gtkC for ℓ so that the server can query EDXC, recover countℓ, and store the ciphertexts ct in EMMM at addresses a=(countℓ+1, . . . , countℓ+#ct). The server then updates the pair (ℓ, countℓ) in EDXC to (ℓ, countℓ+#ct).
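The address arithmetic of this counter approach can be illustrated with an unencrypted model. This is a sketch for intuition only: the class name is hypothetical, both structures are plain Python dicts rather than encrypted structures, and the counter is updated in place, which is the naive variant.

```python
class CounterServer:
    """Plaintext model of the counter approach to overwrite protection."""

    def __init__(self):
        self.counters = {}   # stands in for EDX_C: label -> count
        self.emm = {}        # stands in for EMM_M: (label, address) -> ciphertext

    def put(self, label, cts):
        count = self.counters.get(label, 0)
        # Fresh addresses count+1, ..., count+#ct, so nothing is overwritten.
        addresses = list(range(count + 1, count + len(cts) + 1))
        for a, ct in zip(addresses, cts):
            assert (label, a) not in self.emm
            self.emm[(label, a)] = ct
        # In-place counter update (the naive step).
        self.counters[label] = count + len(cts)
        return addresses


server = CounterServer()
first = server.put("label", ["ct1", "ct2"])
second = server.put("label", ["ct3"])
```

Two successive puts for the same label land on disjoint address ranges, so the second put never replaces what the first one stored.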
Additional embodiments resolve potential security concerns of the above approach. While this approach may seem reasonable, it has a subtle security flaw if implemented naively. The problem is with the last step, where the server updates EDXC with the new counter value. If this is done in place, then a snapshot adversary will be able to correlate EDXC put operations, and therefore EMMM write operations, since every put for a label results in changes at a specific location of EDXC. It is realized that even if the locations of the pairs in EDXC's underlying structure are randomized, there may still be a consistent string associated to the pair that could be used to correlate. While some embodiments use randomization, further security improvements can be realized.
For example, various embodiments include ΣC configured in such a way that ΣC supports edits in an immutable manner so that correlations are not revealed. One example approach implements the encrypted dictionary using an encrypted multi-map and implements dictionary edit operations with multi-map append operations. For example, the system is configured to, when changing a pair (ℓ, v) in the encrypted dictionary to (ℓ, v′), append the new value v′ to ℓ's tuple in an encrypted multi-map. A dictionary get operation for ℓ can then be implemented by returning the last value of ℓ's tuple in the underlying multi-map. According to various embodiments, because an EDXC-level edit is implemented as an encrypted multi-map append, a snapshot adversary cannot correlate between edit operations.
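The edit-as-append idea can be shown with a plaintext model. This is a minimal sketch with hypothetical names; in the encrypted scheme each appended value would sit at its own pseudo-random location, which this unencrypted model does not attempt to show.

```python
class AppendOnlyDict:
    """Dictionary whose edits are multi-map appends; get returns the tail."""

    def __init__(self):
        self._mm = {}   # label -> list of every value ever written

    def edit(self, label, value):
        # Never modify an existing cell: only append a new one.
        self._mm.setdefault(label, []).append(value)

    def get(self, label):
        history = self._mm.get(label)
        return history[-1] if history else None

    def history(self, label):
        return tuple(self._mm.get(label, ()))


d = AppendOnlyDict()
d.edit("count", 2)
d.edit("count", 5)
current = d.get("count")
```

Earlier values stay in place after an edit, which is also why the auxiliary structures grow with the number of operations and later call for compaction.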
As discussed above, the STE schemes implemented by the system as building blocks for ΩP have been designed to be stateless. The system is configured to maintain the stateless property for the encrypted dictionary EDXC and its underlying encrypted multi-map. The properties may seem to be at cross purposes; however, various embodiments implement EDXC's underlying EMM to guarantee that the EMM has a special property which enables a stateless and correct scheme. For example, the underlying multi-map will always be complete, in the sense that for all labels ℓ, if ℓ's tuple v includes m values then there does not exist an index 1≤i≤m such that vi=⊥.
According to some embodiments, the above guarantee of completeness enables support of get tail operations on the underlying encrypted multi-map efficiently—where the tail of a label/tuple pair is the last element of the label's tuple. More precisely, in various examples the system provides this functionality using the following variant of binary search. According to one example, consider a sequence S=(v1, . . . , vn, ⊥n+1, . . . , ⊥N). Given S, we would like to find the address a such that va≠⊥ but va+1=⊥. This problem can be solved in O(N) time with linear scanning but also in O(log N) time as follows: given S, check if the element at address N/2 is ⊥; if so recur on the “left half” of S otherwise recur on the “right half” of S. The base case occurs when the set holds a single element. Embodiments of the algorithm are described in detail in
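The binary-search variant for locating the tail can be sketched as follows. Assumptions in this sketch: addresses are 1-based, ⊥ is modeled as None, the loop is an iterative equivalent of the recursive description, and the `read` callback is a hypothetical stand-in for one lookup in the encrypted structure.

```python
def tail_address(read, N):
    """Return the largest address a with read(a) != None, or 0 if none.

    Relies on completeness: cells 1..n are filled and n+1..N are empty.
    """
    lo, hi = 0, N                     # the tail address lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2      # probe the upper middle
        if read(mid) is None:
            hi = mid - 1              # tail is in the left half
        else:
            lo = mid                  # tail is at mid or in the right half
    return lo


# A tuple with 300 values stored in a structure of capacity N = 1024.
N = 1024
cells = {a: f"v{a}" for a in range(1, 301)}
probes = []


def read(a):
    probes.append(a)                  # count lookups against the structure
    return cells.get(a)


tail = tail_address(read, N)
```

Each probe halves the candidate interval, so the number of lookups grows with log N rather than N; the completeness guarantee is what makes a single probe informative about an entire half.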
According to some embodiments, another characteristic of ΣC is that, like ΣM, ΣC is two-dimensional in order to provide support for concurrent ΩP operations. The following examples and embodiments describe the syntax of addressable two-dimensional multi-map encryption schemes. Various embodiments provide a response-revealing stateless immutable two-dimensional multi-map encryption scheme. The scheme can be a structured encryption scheme ΣC=(Init, PutKey, PutToken, Put, GetToken, GetXToken, GetXYToken, Get, DeleteToken, Delete) that can include the preceding polynomial-time algorithms. Examples of the algorithms are shown in Source Code Appendix.
As discussed above, ΩP encrypts the input multi-map MM with a stateless addressable scheme ΣM to produce a main encrypted multi-map EMMM and then encrypts a dictionary to avoid overwrites with a stateless (two-dimensional) immutable dictionary encryption scheme ΣC. Embodiments that include these features achieve a stateless snapshot-secure semi-dynamic scheme. However, further embodiments expand functionality to support deletes. For example, augmenting the scheme to support deletes can be achieved with minor updates if correctness is all that is required, but handling deletes without affecting the scheme's query complexity involves additional considerations. The inventors have realized that the problem stems from deleting label/value pairs from the main encrypted multi-map EMMM. For example, if the multi-map originally stored a pair (ℓ, v), where #v=m, and then values (v1, . . . , vm-1) are deleted, querying the structure for ℓ would still be O(m). Some embodiments are configured to address this issue, so that ΩP includes, in addition to EMMM and EMMC, an encrypted multi-map EMMD that stores, for every label ℓ in EMMM, the gaps/holes in ℓ's tuple v. When the server executes a get for ℓ, it first queries EMMD to retrieve ℓ's gaps gℓ and uses them to read only from the existing locations in ℓ's tuple.
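How recorded gaps reduce the number of reads can be illustrated with a plaintext model. The sketch assumes gaps are stored as individual 1-based addresses; the function name and the `read_cell` callback (standing for one read against the main structure) are hypothetical.

```python
def get_with_gaps(read_cell, count, gaps):
    """Read only the live addresses among 1..count, skipping known gaps."""
    gap_set = set(gaps)
    values = []
    for a in range(1, count + 1):
        if a in gap_set:
            continue                  # deleted position: no read is issued
        values.append(read_cell(a))
    return values


# A tuple of m = 6 values where positions 1..5 were deleted.
store = {6: "v6"}
reads = []


def read_cell(a):
    reads.append(a)
    return store[a]


live = get_with_gaps(read_cell, count=6, gaps=[1, 2, 3, 4, 5])
```

Only one read reaches the main structure instead of six. The loop over addresses is local bookkeeping; a representation of gaps as intervals would shrink that as well.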
In further embodiments, other characteristics of ΣD include (like ΣC) two-dimensionality in order to provide support for concurrent ΩP operations. ΣD can also support two kinds of insert operations, append and put, which work as described in the Source Code Appendix.
Various embodiments enable ΣD to support multiple kinds of inserts to allow ΩP to make different kinds of insertions at different times. For example, ΩP can be configured to append gaps to ℓ's tuple in EMMD when deletes on ℓ are made; and ΩP can be configured to put entire label/tuple pairs in EMMD during compaction (discussed in greater detail below).
The following examples and embodiments describe the syntax of a response-revealing stateless two-dimensional multi-map encryption scheme. The scheme can be a structured encryption scheme ΣD=(Init, AppendKey, AppendToken, Append, PutKey, PutToken, Put, GetToken, GetXToken, GetXYToken, Get, DeleteToken, Delete) that can include the preceding polynomial-time algorithms. Examples of the algorithms are shown in Source Code Appendix.
According to some embodiments, the design of ΣD is shown with detailed examples in ]. The algorithms referenced above are also optimal, with the exception of Append, which is O(log #MMD).
As discussed above, ΩP encrypts the input multi-map with a stateless addressable multi-map encryption scheme ΣM, which results in a main encrypted multi-map EMMM. Overwrite protection can be achieved by encrypting a dictionary that stores counters with a stateless two-dimensional dictionary encryption scheme ΣC, which results in an auxiliary structure EDXC. Information about deletions is stored in encrypted multi-map EMMD using a two-dimensional scheme ΣD. This information can be used to speed up query operations. The embodiments and examples described achieve statelessness and correctness but can still be optimized further, as they are not necessarily space efficient. On review, the space complexity of the three structures described is O(Σℓ #MM[ℓ]+#puts+#deletes), where Σℓ #MM[ℓ] is the size of the input multi-map and #puts and #deletes are the total number of put and erase operations made on the input multi-map. Note that the bound depends on the total number of puts and deletes ever made and not only on the size of the input multi-map. To address these considerations, various embodiments of ΩP use a process called compaction to remove stale data from EMMC and EMMD.
According to one embodiment, the compaction process can be executed by the server which means it needs access to information stored in both EMMC and EMMD. More precisely, the server utilizes the ability to query these structures to delete certain pairs and to add new ones. To enable this operation, the client generates get, put and delete tokens for EMMC and EMMD whenever the client executes a put or erase for ΩP. According to one example, these tokens are stored in an auxiliary encrypted set structure ESTP and used at compaction time. According to some embodiments, the encrypted set structure supports the following operations:
Example implementation of the scheme ΣP is described in
Considerations for the high-level structure of ΩP have been described in the previous sub-sections to facilitate understanding and describe the design of example building blocks of the scheme. As discussed, embodiments of the scheme make use of an addressable multi-map encryption scheme ΣM, an immutable two-dimensional dictionary encryption scheme ΣC, a two-dimensional append multi-map encryption scheme ΣD, and an enumerable set encryption scheme ΣP. According to one embodiment ΩP includes functions Init, PutToken, Put, GetToken, Get, DeleteToken, CompactionToken, Compaction, EraseToken, Erase, Resolve, which are described in
According to some embodiments, PutToken for a label ℓ and tuple v first determines if ℓ is a high contention label. If so, the function creates a two-dimensional label ℓ′=(ℓ, u), where u is sampled uniformly at random from {1, . . . , p}. If not, the function creates a two-dimensional label ℓ′=(ℓ, 0). In either case, the function then creates a put token ptk which consists of: (1) an EMMM write token wtkM for (ℓ′, v); (2) an EDXC get token gtkC for ℓ′; (3) an EDXC put key pkC for ℓ′; (4) an ESTP insert token itkP; and (5) the size m of v. The ESTP insert token itkP is for an element that is the concatenation of EDXC get and delete tokens for ℓ′, a put key for ℓ′ and EMMD get and delete tokens for ℓ′. According to one embodiment, these elements are stored in ESTP and used later during compaction.
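The label-selection step of PutToken can be sketched as follows. This is a minimal illustration that covers only the choice of the two-dimensional label and omits all token generation; the function name `two_dimensional_label` and its parameters are hypothetical and do not appear in the scheme description.

```python
import secrets


def two_dimensional_label(label, high_contention, p):
    """Return the two-dimensional label used by a put.

    Hypothetical helper: a high contention label is spread over p
    sub-labels by sampling u uniformly from {1, ..., p}; any other
    label always maps to the second coordinate 0.
    """
    if high_contention:
        u = 1 + secrets.randbelow(p)  # uniform over {1, ..., p}
        return (label, u)
    return (label, 0)
```

Spreading a high contention label over p sub-labels means concurrent puts on the same label are likely to touch different counters.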
Given a put token ptk=(wtkM, gtkC, pkC, itkP, m), the Put algorithm uses gtkC to retrieve a counter count from EDXC that represents the number of previously used addresses in the tuple of ℓ′. For example, the server uses this counter, together with the write token wtkM, to write to EMMM without overwriting. Specifically, the server executes ΣM.Write with wtkM and addresses a=( , . . . , +m−1). The server can be configured to update the counter of EDXC by generating a put token ptkC with the put key pkC and value count+m and applying ptkC to EDXC. The server can be configured to update the encrypted set ESTP with itkP.
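The role of the counter in avoiding overwrites can be illustrated with an unencrypted model. The sketch below uses plain Python dictionaries in place of EMMM and EDXC, and the function name `put` is hypothetical. The 1-based address range count+1, . . . , count+m is an assumption, chosen only to be consistent with the {1, . . . , count} convention used in the compaction discussion.

```python
def put(emm, counters, label2d, values):
    """Append values to label2d's tuple at previously unused addresses.

    emm      -- plain dict standing in for EMMM: (label2d, address) -> value
    counters -- plain dict standing in for EDXC: label2d -> count
    Assumption: addresses are 1-based, so the next free one is count + 1.
    """
    count = counters.get(label2d, 0)
    m = len(values)
    addresses = list(range(count + 1, count + m + 1))
    for address, value in zip(addresses, values):
        emm[(label2d, address)] = value   # never collides with an earlier write
    counters[label2d] = count + m         # counter now covers the new addresses
    return addresses
```

Because each put reads the counter before writing and advances it by m afterwards, two successive puts on the same two-dimensional label receive disjoint address ranges.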
According to one embodiment, GetToken produces a get token gtk for a label ℓ that consists of: (1) a read x-token rxtkM for ℓ; (2) a get-x token gxtkC for ℓ; (3) a get-x token gxtkD for ℓ; and (4) a flag that describes whether the label is a high contention label or not. Given a get token gtk=(rxtkM, gxtkC, gxtkD, cont), the Get algorithm first uses the flag to determine if the label is a high contention label. If so, the server uses gxtkC with values {1, . . . , p} to generate p get tokens (gtkC,1, . . . , gtkC,p), where gtkC,i is for the two-dimensional label (ℓ, i). The server then queries EDXC with these tokens to retrieve p counters (count1, . . . , countp) for the two-dimensional labels ((ℓ, 1), . . . , (ℓ, p)). Similarly, for all 1≤i≤p, if counti>0, the server uses gxtkD with {i} to generate a get token gtkD,i, and uses it to recover the gaps gi for the two-dimensional label (ℓ, i). In addition, the server uses rxtkM to generate a read token rtkM,i for the two-dimensional label (ℓ, i). According to some examples, the server then uses the counters and gaps to generate the sequence of used addresses it needs to read from EMMM. If the label is not a high contention label, the server can execute the above with a single two-dimensional label (ℓ, 0).
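How counters and gaps determine the addresses to read can be sketched with the same kind of unencrypted model. Plain dictionaries stand in for EMMM, EDXC and EMMD, token derivation is omitted entirely, and the names `used_addresses` and `get` are hypothetical. Addresses are assumed to be 1-based.

```python
def used_addresses(count, gaps):
    """Addresses 1..count that have not been erased (assumed 1-based)."""
    erased = set(gaps)
    return [a for a in range(1, count + 1) if a not in erased]


def get(emm, counters, gap_store, label, high_contention, p):
    """Collect every live value stored under label.

    A high contention label is read through sub-labels (label, 1..p);
    any other label is read through the single sub-label (label, 0).
    """
    second_coordinates = range(1, p + 1) if high_contention else [0]
    values = []
    for u in second_coordinates:
        label2d = (label, u)
        count = counters.get(label2d, 0)
        if count > 0:  # gaps are only fetched for non-empty tuples
            for address in used_addresses(count, gap_store.get(label2d, [])):
                values.append(emm[(label2d, address)])
    return values
```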
According to one embodiment, EraseToken produces an erase token etk for a two-dimensional label (ℓ, u) and address a that consists of: (1) an erase token etkM for (ℓ∥u); (2) a get token gtkD for (ℓ, u); (3) an append token atkD for (ℓ, u); (4) the address a to erase; and (5) an insert token itkP for a set of compaction-time tokens, i.e., a set of EDXC and EMMD tokens configured for use during compaction. According to one example, the Erase algorithm uses itkP to insert the compaction tokens in the encrypted set ESTP and uses etkM to erase the element at address a from (ℓ, u)'s tuple in EMMM.
According to one embodiment, CompactionToken outputs the key KP as a compaction token. At a high level, for every label ℓ in EMMM, the compaction algorithm first retrieves ℓ's counter from EDXC and ℓ's gaps from EMMD. Once collected, the algorithm is configured to then delete everything related to ℓ from both EDXC and EMMD, which includes “stale” data, for example, old counter values in EDXC. The deletion enables reclamation of wasted space. Once removed, the algorithm then re-inserts ℓ's counter in EDXC, merges ℓ's gaps and re-inserts them in EMMD. Merging in this context includes operations where ℓ's gaps are re-encoded into a more compact representation. For example, if ℓ's gaps include four holes i, i+1, i+2, i+3 then the algorithm encodes them as a single gap [i, 3]. A detailed description of an example merge process is given in
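The gap-merging idea can be sketched as a run-length encoding over hole addresses. This is an illustration only and not the merge process of the scheme itself; the function names are hypothetical, and the reading of [i, 3] as "hole i plus the 3 holes that follow it" is an assumption inferred from the four-hole example above.

```python
def merge_gaps(holes):
    """Encode holes as [start, extra] runs, e.g. i, i+1, i+2, i+3 -> [i, 3]."""
    runs = []
    for hole in sorted(set(holes)):
        if runs and hole == runs[-1][0] + runs[-1][1] + 1:
            runs[-1][1] += 1          # hole extends the current run
        else:
            runs.append([hole, 0])    # hole starts a new run
    return runs


def expand_gaps(runs):
    """Inverse of merge_gaps: list every hole covered by the runs."""
    return [start + k for start, extra in runs for k in range(extra + 1)]
```

For instance, `merge_gaps([5, 6, 7, 8])` yields a single run starting at 5 with 3 further holes, so the stored representation grows with the number of runs and not with the number of erased addresses.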
According to further embodiments, the compaction algorithm enumerates ESTP, which returns a set P of elements of the form gtkC∥dtkC∥pkC∥gtkD∥dtkD. For example, these elements encode a set of tokens needed to compact EDXC and EMMD for some label ℓ. For each of the elements in P, the algorithm is configured to use gtkC to retrieve ℓ's counter from EDXC and gtkD to retrieve gaps g from EMMD. The algorithm then merges g into a new sequence g′. ℓ is then deleted from EDXC and EMMD using dtkC and dtkD, respectively. If g′={1, . . . , count} then every element of ℓ's tuple has been erased and nothing else needs to be done. If g′≠{1, . . . , count}, however, the algorithm is further configured to: (1) use pkC to generate a put token for ℓ's counter and insert the counter into EDXC; and (2) use pkD to generate a put token for g′ and insert g′ into EMMD.
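The per-label compaction decision can be sketched on an unencrypted model. The sketch below assumes, purely for illustration, that the counter structure and the gap structure are append-only logs kept in plain dictionaries, so that older counter values and repeated gap entries play the role of stale data. The name `compact_label` and both log parameters are hypothetical.

```python
def compact_label(counter_log, gap_log, label2d):
    """Compact one two-dimensional label; return True if data was re-inserted.

    counter_log -- dict: label2d -> list of counter values, newest last
    gap_log     -- dict: label2d -> list of erased addresses (1-based)
    """
    history = counter_log.pop(label2d, [])   # retrieve, then delete all counter values
    count = history[-1] if history else 0    # only the newest value is current
    holes = set(gap_log.pop(label2d, []))    # retrieve, then delete all recorded gaps
    if holes == set(range(1, count + 1)):
        return False                         # whole tuple erased: nothing to re-insert
    counter_log[label2d] = [count]           # re-insert only the current counter
    gap_log[label2d] = sorted(holes)         # re-insert the gaps in compact form
    return True
```

A label whose tuple has been erased completely disappears from both structures, while any other label keeps exactly one counter value and one gap list.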
According to some embodiments, during compaction, if the data related to a particular label ℓ is being compacted then get, put and delete operations can still occur simultaneously on any label ℓ′≠ℓ. According to one embodiment, the Resolve algorithm executes Σ's resolve algorithm and returns its output.
According to some embodiments, the system can include a range multi-map encryption scheme ΩR=(Init, PutToken, Put, RangeToken, Range, EraseToken, Erase, CompactionToken, Compaction, Resolve), for example, that is used by OST1. Various embodiments have been adapted from an ERX framework described in “Encrypted Range Search via Range Hypergraph”, by Kasemsan Kongsala, Seny Kamara, and Tarik Moataz. Example implementation makes use of a multi-map encryption scheme E and a range hypergraph H equipped with efficient algorithms EdgesH and MincoverH. According to some examples, the scheme is updated to instantiate E with the stateless multi-map encryption scheme ΩP and H with a new hypergraph referred to as a sparse partition hypergraph. Example details of the construction are provided in
A common belief in this space is that STE may be limited based on use of non-standard data structures and query algorithms which can limit applicability since STE requires re-architecting existing database systems. Various embodiments described herein resolve the legacy-friendly concern of STE. For example, one reason traditional STE schemes are believed to be not legacy-friendly is because they make two implicit assumptions about the server: (1) that it can store arbitrary data structures; and (2) that it can execute arbitrary algorithms. A legacy-friendly scheme does not make these assumptions and is designed to work with servers that can only store a fixed kind of data structure and execute a fixed set of operations. For example, a SQL-friendly STE scheme is a scheme that produces encrypted structures that can be stored as relational databases and that has query and update algorithms that can be executed as standard SQL operations. Similarly, a MongoDB-friendly STE scheme is a scheme that produces encrypted structures that can be stored as document databases and that have query and update algorithms that can be executed using standard MongoDB operations.
Stated broadly, various aspects provide emulation that is configured to take an encrypted data structure (e.g., an encrypted multi-map) and find a way to represent it as another data structure (e.g., a graph) without any additional storage or query overhead. Intuitively, emulation is a more sophisticated version of the classic data structure problem of simulating a stack with two queues. Designing storage- and query-efficient emulators can be challenging depending on the encrypted structure being emulated and the target structure (i.e., the structure used to emulate on top of). According to various embodiments, the benefits of emulation are twofold: (1) a low-overhead emulator essentially makes an STE scheme legacy-friendly; and (2) emulation preserves the STE scheme's security.
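The classic problem mentioned above can be sketched briefly to show what emulating one structure on top of another means. The class below exposes stack behavior while using only queue operations (enqueue at the back, dequeue from the front) on two `collections.deque` objects. The class name `QueueStack` is hypothetical, and the example is unrelated to any encrypted structure.

```python
from collections import deque


class QueueStack:
    """A stack emulated with two FIFO queues."""

    def __init__(self):
        self._main = deque()    # front of this queue is the top of the stack
        self._spare = deque()

    def push(self, item):
        self._spare.append(item)                        # enqueue the new item first
        while self._main:
            self._spare.append(self._main.popleft())    # then move older items behind it
        self._main, self._spare = self._spare, self._main

    def pop(self):
        return self._main.popleft()                     # dequeue yields the newest item

    def __len__(self):
        return len(self._main)
```

This particular emulator pays linear time per push, which illustrates why low-overhead emulators are the interesting case.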
The following examples and embodiments describe a storage-level emulation rather than a fully-emulated version of OST1. The difference between full and storage-level emulation is that the latter emulates the data structures of the scheme but not its query and update algorithms. In other words, various embodiments of the emulated OST1 scheme require no modifications to the server's storage system but implement new query algorithms. Other embodiments provide for fully emulating OST1 (no query algorithm changes), but the following examples of storage-level emulation result in a more communication-efficient scheme. Example implementation details for storage-level emulation of OST1 are described in
Example notation used: The set of all encrypted fields in the database is F, the set of encrypted fields that support exact queries is EF⊆F, the set of encrypted fields that support range queries is RF⊆F and the set of encrypted fields that are high contention is HC⊆F. Given some document D, denote by EFD, RFD and HCD the fields in D that support equality queries, that support range queries and that are high contention, respectively. A reference to a field f refers to the “absolute” path of the field, i.e., db.collection.f if the field f is not nested, or db.collection.field.f if it is nested. Various examples use this approach to guarantee that every field in the database is unique. To facilitate understanding, recall that when F is a pseudo-random function, it is sometimes written as FS[s1, s2, . . . , sn] to mean F(F(F(F(S, s1), s2), . . . ), sn).
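The chained pseudo-random function notation can be illustrated with a short sketch. HMAC-SHA256 is used here only as a stand-in for F; the source does not name a concrete function, so this choice and the names `prf` and `prf_chain` are assumptions.

```python
import hashlib
import hmac


def prf(key: bytes, message: bytes) -> bytes:
    """Stand-in for F(key, message); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, message, hashlib.sha256).digest()


def prf_chain(seed: bytes, *inputs: bytes) -> bytes:
    """Evaluate F_S[s1, ..., sn] = F(...F(F(S, s1), s2)..., sn)."""
    output = seed
    for s in inputs:
        output = prf(output, s)   # each output becomes the key for the next input
    return output
```

For example, `prf_chain(S, b"db.collection.f", b"7")` derives a value from the seed S, then the absolute field path, then a further input, in that order.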
According to some embodiments, the following description assumes that the server stores a schema that includes the following information for encrypted fields:
Example scrub function. Various embodiments can optionally use a function “scrub” which takes as input a query Q (e.g., a MongoDB Query Language (“MQL”) query) and outputs a clean query that is identical to Q except that its values are replaced with an “obfuscation” symbol ▪. Other embodiments can be implemented with other native databases and their respective query languages.
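A scrub function of this kind can be sketched as a recursive walk over a query. The sketch assumes the query is represented as nested Python dictionaries and lists, in the way MQL filters are commonly written in driver code. That representation, and the decision to keep field names and operators while replacing every leaf value, are assumptions made for illustration.

```python
OBFUSCATION = "\u25aa"  # the obfuscation symbol ▪


def scrub(query):
    """Return a copy of query with every leaf value replaced by the symbol.

    Field names and operators (dictionary keys) are kept, so the shape of
    the query is preserved while its values are hidden.
    """
    if isinstance(query, dict):
        return {key: scrub(value) for key, value in query.items()}
    if isinstance(query, list):
        return [scrub(value) for value in query]
    return OBFUSCATION
```

For example, scrubbing `{"age": {"$gt": 30}}` keeps the field `age` and the operator `$gt` but replaces 30 with the obfuscation symbol.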
Create collection. Described are operations for creating a collection. There are many ways to create a collection, each one supporting a different key generation mode. Example pseudo-code for these modes is given in
Example Insert Operation. An example insert operation is shown in
The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
Processor-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in one or more non-transitory computer-readable storage media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationships between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.
Also, various inventive concepts may be embodied as one or more processes, of which examples (e.g., the processes described herein) have been provided. The acts performed as part of each process may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
In other embodiments, various ones of the functions and/or portions of the flows discussed herein can be executed in a different order. In still other embodiments, various ones of the functions and/or portions of the flow can be omitted or consolidated. In yet other embodiments, various ones of the functions and/or portions of the flow can be combined, and used in various combinations of the disclosed flows, portions of flows, and/or individual functions. In various examples, various ones of the screens, functions and/or algorithms can be combined, and can be used in various combinations of the disclosed functions.
Having thus described several aspects of at least one example, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. For instance, examples disclosed herein may also be used in other contexts. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the scope of the examples discussed herein. Accordingly, the foregoing description and drawings are by way of example only.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, and/or ordinary meanings of the defined terms. As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term).
The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing”, “involving”, and variations thereof, is meant to encompass the items listed thereafter and additional items.
Having described several embodiments of the techniques described herein in detail, various modifications, and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the disclosure. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The techniques are limited only as defined by the following claims and the equivalents thereto.
This Application claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/349,208, entitled “SYSTEMS AND METHODS FOR END-TO END-ENCRYPTION WITH ENCRYPTED MULTI-MAPS” filed Jun. 6, 2022. This Application claims priority under 35 U.S.C. § 120 to and is a continuation in part of U.S. patent application Ser. No. 17/570,730, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Jan. 7, 2022, which claims priority under 35 U.S.C. § 120 to and is a continuation in part of U.S. patent application Ser. No. 17/563,425, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Dec. 28, 2021, which claims priority under 35 U.S.C. § 120 to and is a continuation in part of U.S. patent application Ser. No. 17/514,681, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Oct. 29, 2021, which claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/135,053, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Jan. 8, 2021. Application Ser. No. 17/514,681 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/132,063, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Dec. 30, 2020. Application Ser. No. 17/514,681 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/131,487, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Dec. 29, 2020. Application Ser. No. 17/563,425 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/135,053, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Jan. 8, 2021. Application Ser. No. 17/563,425 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/132,063, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Dec. 30, 2020. Application Ser. No. 
17/563,425 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/131,487, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Dec. 29, 2020. Application Ser. No. 17/570,730 claims priority under 35 U.S.C. § 120 to and is a continuation in part of U.S. patent application Ser. No. 17/514,681, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Oct. 29, 2021. Application Ser. No. 17/570,730 claims priority under 35 U.S.C. § 119 to U.S. Provisional Application Ser. No. 63/135,053, entitled “SYSTEMS AND METHODS USING EMULATION FOR END TO END ENCRYPTION”, filed Jan. 8, 2021, each of which is incorporated by reference in their entirety. At least a portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Number | Date | Country
---|---|---
63349208 | Jun 2022 | US
63135053 | Jan 2021 | US
63132063 | Dec 2020 | US
63131487 | Dec 2020 | US
63135053 | Jan 2021 | US
63132063 | Dec 2020 | US
63131487 | Dec 2020 | US
63135053 | Jan 2021 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17570730 | Jan 2022 | US
Child | 18328878 | | US
Parent | 17563425 | Dec 2021 | US
Child | 17570730 | | US
Parent | 17514681 | Oct 2021 | US
Child | 17563425 | | US
Parent | 17514681 | Oct 2021 | US
Child | 17570730 | | US