SYSTEM AND METHOD FOR OPTIMAL VERIFICATION OF OPERATIONS ON DYNAMIC SETS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present patent document relates generally to verification of data returned in a search query and more particularly to a method and system for verifying the results of a search query performed on a finite set of data stored on an untrusted server.

2. Background of the Related Art

Providing integrity guarantees in third-party data management settings is an active area of research, especially in view of the growth in usage of cloud computing. In such settings, verifying the correctness of outsourced computations performed over remotely stored data becomes a crucial property for the trustworthiness of cloud services. Such a verification process should incur minimal overhead for the clients; otherwise the benefits of computation outsourcing are lost. Ideally, computations should be verified without having to locally rerun them or to utilize too much extra cloud storage.

In this paper, we study the verification of outsourced operations on general sets and consider the following problem. Assuming that a dynamic collection of m sets S₁, S₂, . . . , S_m is remotely stored at an untrusted server, we wish to publicly verify basic operations on these sets, such as intersection, union and set difference. For example, for an intersection query of t sets specified by indices 1≦i₁, i₂, . . . , i_t≦m, we aim at designing techniques that allow any client to cryptographically check the correctness of the returned answer I=S_i₁∩S_i₂∩ . . . ∩S_i_t. Moreover, we wish the verification of any set operation to be operation-sensitive, meaning that the resulting complexity depends only on the (description and outcome of the) operation, and not on the sizes of the involved sets. That is, if δ=|I| is the answer size, then we would like the verification cost to be proportional to t+δ, and independent of m or Σ_i|S_i|; note that work at least proportional to t+δ is needed to verify any such query's answer. Applications of interest include keyword search and database queries, which boil down to set operations.

Relation to verifiable computing. Recent works on verifiable computing [1, 12, 16] achieve operation-sensitive verification of general functionalities, thus covering set operations as a special case. Although such approaches clearly meet our goal with respect to optimal verifiability, they are inherently inadequate to meet our other goals with respect to public verifiability and dynamic updates, both important properties in the context of outsourced data querying. Indeed, to outsource the computation as an encrypted circuit, the works in [1, 12, 16] make use of some secret information which is also used by the verification algorithm, thus effectively supporting only one verifier; instead, we seek schemes that allow any client (knowing only a public key) to query the set collection and verify the returned results. Also, the description of the circuit in these works is fixed at the initialization of the scheme, thus effectively supporting no updates in the outsourced data; instead, we seek schemes that are dynamic. In other scenarios, but still in the secret-key setting, protocols for general functionalities and polynomial evaluation have recently been proposed in [11] and [6] respectively.

Aiming at both publicly verifiable and dynamic solutions, we study set-operation verification in the model of authenticated data structures (ADSs). A typical setting in this model, usually referred to as the three-party model [36], involves protocols executed by three participating entities. A trusted party, called source, owns a data structure (here, a collection of sets) that is replicated along with some cryptographic information to one or more untrusted parties, called servers. Accordingly, clients issue data-structure queries to the servers and are able to verify the correctness of the returned answers, based only on knowledge of public information which includes a public key and a digest produced by the source (e.g., the root hash of a Merkle tree, see FIG. 10).¹ Updates on the data structure are performed by the source and appropriately propagated by the servers. Variations of this model include: (i) a two-party variant (e.g., [30]), where the source keeps only a small state (i.e., only a digest) and performs both the updates/queries and the verifications; this model is directly comparable to the model of verifiable computing; (ii) the memory checking model [7], where read/write operations on an array of memory cells are verified; however, the absence of the notion of proof computation in memory checking (the server is just a storage device) as well as the feature of public verifiability in authenticated data structures make the two models fundamentally different.²

¹ Conveying the trust clients have in the source, the authentic digest is assumed to be publicly available; in practice, a time-stamped and digitally signed digest is outsourced to the server.

² Indeed, memory checking might require secret memory, e.g., as in the PRF construction in [7].

Achieving operation-sensitive verification. In this work, we design authenticated data structures for the verification of set operations in an operation-sensitive manner, where the proof and verification complexity depends only on the description and outcome of the operation and not on the size of the involved sets. Conceptually, this property is similar to the property of super-efficient verification that has been studied in certifying algorithms [21] and certification data structures [19, 37], and that is also achieved in the context of verifiable computing [1, 12, 16], where an answer can be verified with complexity asymptotically less than the complexity required to produce it. Whether the above optimality property is achievable for set operations (while keeping storage linear) was posed as an open problem in [23]. We close this problem in the affirmative.

All existing schemes for set-operation verification fall into the following two rather straightforward and highly inefficient solutions. Either short proofs for the answer of every possible set-operation query are precomputed, allowing for optimal verification at the client at the cost of exponential storage and update overheads at the source and the server, which is an undesirable trade-off, as it is against storage outsourcing. Or integrity proofs for all the elements of the sets involved in the query are given to the client, who locally verifies the query result; in this case the verification complexity can be linear in the problem size, which is an undesirable feature, as it is against computation outsourcing.

SUMMARY OF THE INVENTION

We achieve optimal verification by departing from the above approaches as follows. We first reduce the problem of verifying set operations to the problem of verifying the validity of some more primitive relations on sets, namely subset containment and set disjointness. Then for each such primitive relation we employ a corresponding cryptographic primitive to optimally verify its validity. In particular, we extend the bilinear-map accumulator to optimally verify subset containment (Lemmas 1 and 4), inspired by [32]. We then employ the extended Euclidean algorithm over polynomials (Lemma 5) in combination with subset containment proofs to provide a novel optimal verification test for set disjointness. The intuition behind our technique is that disjoint sets can be represented by mutually indivisible polynomials; therefore there exist other polynomials such that the sum of their pairwise products equals one, and this is the test to be used in the proof. Still, transmitting (and processing) these polynomials is bandwidth (and time) prohibitive and does not lead to operation-sensitive verification. Bilinearity properties, however, allow us to compress their coefficients in the exponent and yet use them meaningfully, i.e., compute an inner product. This is why, although a conceptually simpler RSA accumulator [5] would yield a mathematically sound solution, a bilinear-map accumulator [28] is essential for achieving the desired complexity goal.
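The disjointness intuition can be sketched in plain polynomial arithmetic, leaving out the accumulator and the pairing entirely. The following is a minimal illustration, assuming a stand-in prime modulus `P` and hypothetical helper names (`char_poly`, `ext_gcd`, `bezout_for_disjoint`) that do not appear in the scheme itself. It represents each set by the polynomial whose factors are (x + s) for each element x and uses the extended Euclidean algorithm to find q1, q2 with q1·P1 + q2·P2 = 1, which exist exactly when the two sets are disjoint.

```python
# Illustrative sketch only: the modulus and all function names are assumptions.
P = 2**61 - 1  # a prime standing in for the group order p

def trim(a):
    """Drop trailing zero coefficients (lists are low-degree first)."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a

def add(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([(x + y) % P for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return trim(out)

def divmod_poly(a, b):
    """Return (quotient, remainder) of a divided by a nonzero b over Z_P."""
    a, q = trim(list(a)), [0] * max(len(a) - len(b) + 1, 0)
    inv_lead = pow(b[-1], -1, P)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv_lead % P
        q[shift] = c
        # Subtract c * s^shift * b, which cancels the leading term of a.
        a = add(a, [0] * shift + [(-c * y) % P for y in b])
    return trim(q), a

def ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g over Z_P."""
    r0, r1 = trim(list(a)), trim(list(b))
    u0, u1, v0, v1 = [1], [], [], [1]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [(-c) % P for c in q]
        r0, r1 = r1, r
        u0, u1 = u1, add(u0, mul(neg_q, u1))
        v0, v1 = v1, add(v0, mul(neg_q, v1))
    return r0, u0, v0

def char_poly(elems):
    """Coefficients of prod (x + s) over the elements x, as a polynomial in s."""
    poly = [1]
    for x in elems:
        poly = mul(poly, [x % P, 1])
    return poly

def bezout_for_disjoint(s1, s2):
    """Return (q1, q2) with q1*P1 + q2*P2 = 1, or None if the sets intersect."""
    g, u, v = ext_gcd(char_poly(s1), char_poly(s2))
    if len(g) != 1:
        return None  # a non-constant gcd means a shared element
    inv = pow(g[0], -1, P)
    return [c * inv % P for c in u], [c * inv % P for c in v]
```

In the scheme described here the coefficients of q1 and q2 would not be sent in the clear; the sketch only shows why such polynomials certify disjointness.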

We formally describe our protocols using an authenticated data structure scheme or ADS scheme (Definition 1). An ADS scheme consists of algorithms {genkey, setup, update, refresh, query, verify} such that: (i) genkey produces the secret and public key of the system; (ii) on input a plain data structure D, setup initializes the authenticated data structure auth(D); (iii) having access to the secret key, update computes the updated digest of auth(D); (iv) without having access to the secret key, refresh updates auth(D); (v) query computes cryptographic proofs Π(q) for answers α(q) to data structure queries q; (vi) verify processes a proof Π and an answer α and either accepts or rejects. Note that neither query nor verify has access to the secret key, thus modeling computation outsourcing and public verifiability. An ADS scheme must satisfy certain correctness and security properties (Definitions 2 and 3). We note that protocols in both the three-party and the two-party models can be realized via an ADS scheme.

Our main result, Theorem 1, presents the first ADS scheme to achieve optimal verification of the set operations intersection, union, subset and set difference, as well as optimal updates on the underlying collection of sets. Our scheme is proved secure under the bilinear extension of the q-strong Diffie-Hellman assumption (see, e.g., [8]).

Complexity model. To explicitly measure the complexity of various algorithms with respect to the number of primitive cryptographic operations, without considering the dependency on the security parameter, we adopt the complexity model used in memory checking [7, 14], which has been only implicitly used in ADS literature. The access complexity of an algorithm is defined as the number of memory accesses performed during its execution on the authenticated data structure that is stored in an indexed memory of n cells.³ For example, a Merkle tree [24] has O(log n) update access complexity since the update algorithm needs to read and write O(log n) memory cells of the authenticated data structure, each cell storing exactly one hash value. The group complexity of a data collection (e.g., proof or ADS group complexity) is defined as the number of elementary data objects (e.g., hash values or group elements) contained in this collection. Note that although the access and group complexities are respectively related to the time and space complexities, the former are in principle smaller than the latter. This is because time and space complexities count the number of bits and are always functions of the security parameter which, in turn, is always Ω(log n). Therefore time and space complexities are always Ω(log n), whereas access and group complexities can be O(1). Finally, whenever it is clear from the context, we omit the terms “access” and “group”.

³ We use the term “access complexity” instead of the “query complexity” used in memory checking [7, 14] to avoid ambiguity when referring to algorithm query of the ADS scheme. We also require that each memory cell can store up to O(poly(log n)) bits, a word size used in [7, 14].

Related work. The great majority of authenticated data structures involve the use of cryptographic hashing [2, 7, 18, 20, 39, 23, 27] or other primitives [17, 31, 32] to hierarchically compute one or more digests over the outsourced data. Most of these schemes incur verification costs that are proportional to the time spent to produce the query answer, thus they are not operation-sensitive. Some bandwidth-optimal and operation-sensitive solutions for verification of various (e.g., range search) queries appear in [2, 19].

TABLE 1

Asymptotic access and group complexities of various ADS schemes for intersection queries on t = O(1) sets in a collection of m sets with answer size δ.

scheme       setup      update, refresh    query                     verify, |π|    assumption
[23, 38]     m + M      log n + log m      n + log m                 n + log m      Generic CR
[26]         m + M      m + M              n                         n              Strong RSA
[29]         m^t + M    m^t                1                         δ              Discrete Log
this work    m + M      1                  n log³ n + m^ε log m      δ              Bilinear q-Strong DH

Here, M is the sum of the sizes of all the sets and 0 < ε < 1 is a constant. Also, all sizes of the intersected or updated sets are Θ(n), |π| denotes the size of the proof, and CR stands for “collision resistance”.

Despite the fact that privacy-related problems for set operations have been extensively studied in the cryptographic literature (e.g., [9, 15]), existing work on the integrity dimension of set operations appears mostly in the database literature. In [23], the importance of coming up with an operation-sensitive scheme is identified. In [26], possibly the work closest in context to ours, set intersection, union and difference are authenticated with linear costs. Similar bounds appear in [38]. In [29], a different approach is taken: in order to achieve operation-sensitivity, expensive pre-processing and exponential space are required (answers to all possible queries are signed). Finally, related to our work are non-membership proofs, both for the RSA [22] and the bilinear-map [3, 13] accumulators. A comparison of our work with existing schemes appears in Table 1.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of the present invention will become better understood with reference to the following description, appended claims, and accompanying drawings where:

FIG. 1 shows a flow chart of a two-party implementation of the method and system of the present invention where the client is also the trusted source;

FIG. 2 shows a flow chart of a three-party implementation of the method and system of the present invention where the trusted source is distinct from the client;

FIG. 3 shows a flow chart of a multi-party implementation of the method and system of the present invention where there are multiple trusted sources and multiple clients;

FIG. 4 shows a flow chart of an implementation where timestamps may be transformed into an intersection of keyword queries;

FIG. 5 shows a step where the trusted source owns sets collections;

FIG. 6 shows a step where the trusted source authenticates the sets collections and creates a digest and signature;

FIG. 7 shows a step where the trusted source performs setup and updates the authenticated sets collection, digest and signature on an untrusted server;

FIG. 8 shows a step where a client, Bob, queries the untrusted server and receives an answer and a proof;

FIG. 9 shows a step where a client, Alice, queries the untrusted server and receives an answer and a proof;

FIG. 10 shows a two level tree structure created to contain the digest and proof;

FIG. 11 shows a step in building a digest of the collection of sets by calculating an accumulation value for every set;

FIG. 12 shows a step in building a digest of the collection of sets by building an accumulation tree containing the accumulation value of every set in the collection of sets that was previously calculated;

FIG. 13 shows a first step in the proof of finding the answer in the digest;

FIG. 14 shows a second step in the proof of finding the answer in the digest;

FIG. 15 shows a third step in the proof of finding the answer in the digest;

FIG. 16 shows a fourth step in the proof of finding the answer in the digest; and

FIG. 17 shows a fifth step in the proof of finding the answer in the digest.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Preliminaries

We denote with k the security parameter and with neg(k) a negligible function.⁴

⁴ Function ƒ: ℕ → ℝ is neg(k) if and only if for any nonzero polynomial p(k) there exists N such that for all k>N it is ƒ(k)<1/p(k).

The bilinear-map accumulator. Let 𝔾 be a cyclic multiplicative group of prime order p, generated by element g ∈ 𝔾. Let also 𝔾_T be a cyclic multiplicative group of the same order p, such that there exists a pairing e: 𝔾 × 𝔾 → 𝔾_T with the following properties: (i) Bilinearity: e(P^a, Q^b) = e(P, Q)^ab for all P, Q ∈ 𝔾 and a, b ∈ ℤ_p; (ii) Non-degeneracy: e(g, g) ≠ 1; (iii) Computability: for all P, Q ∈ 𝔾, e(P, Q) is efficiently computable. We call (p, 𝔾, 𝔾_T, e, g) a tuple of bilinear pairing parameters, produced as the output of a probabilistic polynomial-time algorithm that runs on input 1^k.

In this setting, the bilinear-map accumulator [28] is an efficient way to provide short proofs of membership for elements that belong to a set. Let s ∈ ℤ_p^* be a randomly chosen value that constitutes the trapdoor in the scheme. The accumulator primitive accumulates elements in ℤ_p − {s}, outputting a value that is an element in 𝔾. For a set of elements 𝒳 in ℤ_p − {s}, the accumulation value acc(𝒳) of 𝒳 is defined as

$acc(\mathcal{X}) = g^{\prod_{χ \in \mathcal{X}} (χ + s)}.$⁵

⁵ The polynomial Π_{χ∈S_i}(χ+s) is called the characteristic polynomial of set S_i in the literature (e.g., see [25]).

Value acc(𝒳) can be constructed using the set 𝒳 and g, g^s, g^(s²), . . . , g^(s^q) (through polynomial interpolation), where q ≧ |𝒳|. Subject to acc(𝒳), each element in 𝒳 has a succinct membership proof. More generally, the proof of subset containment of a set S ⊆ 𝒳 (for |S|=1, this becomes a membership proof) is the witness (S, W_{S,𝒳}) where

$W_{S,\mathcal{X}} = g^{\prod_{χ \in \mathcal{X} - S} (χ + s)}. \quad (1)$

Subset containment of S in 𝒳 can be checked through the relation e(W_{S,𝒳}, g^(Π_{χ∈S}(χ+s))) ≟ e(acc(𝒳), g) by any verifier with access only to public information. The security property of the bilinear-map accumulator, namely that computing fake but verifiable subset containment proofs is hard, can be proved using the bilinear q-strong Diffie-Hellman assumption, which is slightly stronger than the q-strong Diffie-Hellman assumption [8].⁶

⁶ However, the plain q-strong Diffie-Hellman assumption [28] suffices to prove just the collision resistance of the bilinear-map accumulator.
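As a short worked check, not a security argument, the subset containment relation can be expanded using only bilinearity and the definitions of the accumulation value and of the witness in Equation (1). The symbol \mathcal{X} below denotes the accumulated set, and the middle step assumes that S really is contained in it, so that the two products combine into the full product over the set:

$e\left(W_{S,\mathcal{X}}, g^{\prod_{χ \in S} (χ + s)}\right) = e(g, g)^{\prod_{χ \in \mathcal{X} - S} (χ + s) \cdot \prod_{χ \in S} (χ + s)} = e(g, g)^{\prod_{χ \in \mathcal{X}} (χ + s)} = e(acc(\mathcal{X}), g).$

Note that the verifier can form g^{\prod_{χ \in S} (χ + s)} from S and the public values g, g^s, g^{s^2}, and so on, without knowing s.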

Assumption 1 (Bilinear q-strong Diffie-Hellman assumption) Let k be the security parameter and (p, 𝔾, 𝔾_T, e, g) be a tuple of bilinear pairing parameters. Given the elements g, g^s, . . . , g^(s^q) ∈ 𝔾 for some s chosen at random from ℤ_p^*, where q=poly(k), no probabilistic polynomial-time algorithm can output a pair (a, e(g, g)^(1/(a+s))) ∈ ℤ_p × 𝔾_T, except with negligible probability neg(k).

We next prove the security of subset witnesses by generalizing the proof in [28]. Subset witnesses also appeared (independent of our work but without a proof) in [10].

Lemma 1 (Subset containment) Let k be the security parameter and (p, 𝔾, 𝔾_T, e, g) be a tuple of bilinear pairing parameters. Given the elements g, g^s, . . . , g^(s^q) ∈ 𝔾 for some s chosen at random from ℤ_p^* and a set of elements 𝒳 in ℤ_p − {s} with q ≧ |𝒳|, suppose there is a probabilistic polynomial-time algorithm that finds S and W such that S ⊄ 𝒳 and e(W, g^(Π_{χ∈S}(χ+s))) = e(acc(𝒳), g). Then there is a probabilistic polynomial-time algorithm that breaks the bilinear q-strong Diffie-Hellman assumption.

Proof: Suppose there is a probabilistic polynomial-time algorithm that computes such a set S={y₁, y₂, . . . , y_l} and a fake witness W. Let 𝒳={χ₁, χ₂, . . . , χ_n} and y_j ∉ 𝒳 for some 1≦j≦l. This means that

$e(W, g)^{\prod_{y \in S} (y + s)} = e(g, g)^{(χ_{1} + s)(χ_{2} + s) \cdots (χ_{n} + s)}.$

Note that (y_j+s) does not divide (χ₁+s)(χ₂+s) . . . (χ_n+s). Therefore there exist a polynomial Q(s) (computable in polynomial time) of degree n−1 and a constant λ≠0, such that (χ₁+s)(χ₂+s) . . . (χ_n+s)=Q(s)(y_j+s)+λ. Thus we have

${e (W, g)}^{(y_{j} + s) Π_{1 \leq i \neq j \leq l} (y_{i} + s)} = {e (g, g)}^{Q (s) (y_{j} + s) + λ} \Rightarrow {e (g, g)}^{\frac{1}{y_{j} + s}} = {[{e (W, g)}^{Π_{1 \leq i \neq j \leq l} (y_{i} + s)} {e (g, g)}^{- Q (s)}]}^{λ^{- 1}} .$

Thus, this algorithm can break the bilinear q-strong Diffie-Hellman assumption. □

Tools for polynomial arithmetic. Our solutions use (modulo p) polynomial arithmetic. We next present two results that are extensively used in our techniques and that contribute to achieving the desired complexity goals. The first result, on polynomial interpolation, is derived using an FFT algorithm (see Preparata and Sarwate [34]) that computes the DFT in a finite field (e.g., ℤ_p) for arbitrary n, performing O(n log n) field operations. We note that an n-th root of unity is not required to exist in ℤ_p for this algorithm to work.

Lemma 2 (Polynomial interpolation with FFT [34]) Let Π_i=1ⁿ(χ_i+s)=Σ_i=0ⁿ b_i sⁱ be a degree-n polynomial. The coefficients b_n≠0, b_n-1, . . . , b₀ of the polynomial can be computed with O(n log n) complexity, given χ₁, χ₂, . . . , χ_n.

Lemma 2 refers to an efficient process for computing the coefficients of a polynomial, given its roots χ₁, χ₂, . . . , χ_n. In our construction, we make use of this process a number of times, in particular when, given some values χ₁, χ₂, . . . , χ_n to be accumulated, an untrusted party needs to compute g^((χ₁+s)(χ₂+s) . . . (χ_n+s)) without having access to s. However, access to g, g^s, . . . , g^(sⁿ) (part of the public key) is allowed, and therefore computing the accumulation value boils down to a polynomial interpolation.
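The following minimal sketch shows how an accumulation value can be formed from the public powers alone. It makes several simplifying assumptions that are not part of the scheme: a tiny, insecure toy group (the order-1019 subgroup of the integers modulo 2039) replaces the pairing group, the coefficients are obtained with a schoolbook O(n²) product rather than the FFT of Lemma 2, and all function names are hypothetical.

```python
# Toy parameters chosen only for illustration (far too small to be secure).
P_MOD = 2039   # safe prime, P_MOD = 2 * ORDER + 1
ORDER = 1019   # prime order p of the subgroup generated by G
G = 4          # generates the order-1019 subgroup of Z_2039^*

def coefficients_from_roots(xs):
    """Coefficients b_0..b_n of prod_i (x_i + s), low degree first, mod ORDER.

    Schoolbook O(n^2) product, used here in place of the FFT of Lemma 2.
    """
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] + c * x) % ORDER       # contributes c * x * s^i
            nxt[i + 1] = (nxt[i + 1] + c) % ORDER   # contributes c * s^(i+1)
        coeffs = nxt
    return coeffs

def public_powers(s, q):
    """Vector [g, g^s, ..., g^(s^q)], computed once by the key holder."""
    return [pow(G, pow(s, i, ORDER), P_MOD) for i in range(q + 1)]

def accumulate_public(xs, powers):
    """acc = prod_i (g^(s^i))^(b_i), using public values only (needs q >= n)."""
    acc = 1
    for b, g_si in zip(coefficients_from_roots(xs), powers):
        acc = acc * pow(g_si, b, P_MOD) % P_MOD
    return acc

def accumulate_secret(xs, s):
    """acc = g^(prod (x + s)), computed directly with the trapdoor s."""
    exponent = 1
    for x in xs:
        exponent = exponent * (x + s) % ORDER
    return pow(G, exponent, P_MOD)
```

Both routes give the same group element because the exponent of g is the same polynomial evaluated at s; only the party holding s can take the shortcut in `accumulate_secret`.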

We next present a second result that will be used in our verification algorithms. Related to certifying algorithms [21], this result states that if the vector of coefficients b=[b_n, b_n-1, . . . , b₀] is claimed to be correct, then, given the vector of roots x=[χ₁, χ₂, . . . , χ_n], with high probability, vector b can be certified to be correct with complexity asymptotically less than O(n log n), i.e., without an FFT computation from scratch. This is achieved with the following algorithm: Algorithm {accept, reject}←certify(b, x, pk): The algorithm picks a random κ ∈ ℤ_p^*. If Σ_i=0ⁿ b_i κⁱ = Π_i=1ⁿ(χ_i+κ), then the algorithm accepts, else it rejects.

Lemma 3 (Polynomial coefficients verification) Let b=[b_n, b_n-1, . . . , b₀] and x=[χ₁, χ₂, . . . , χ_n]. Algorithm certify(b, x, pk) has O(n) complexity. Also, if accept←certify(b, x, pk), then b_n, b_n-1, . . . , b₀ are the coefficients of the polynomial Π_i=1ⁿ(χ_i+s) with probability Ω(1−neg(k)).
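A minimal sketch of the certify check follows, assuming a stand-in prime modulus and a coefficient list ordered from b₀ up to b_n (the reverse of the vector b above). The public key argument is dropped because the toy version needs only the modulus. The claimed polynomial and the product of (χ_i + κ) are both evaluated at one random point κ, each in O(n) operations.

```python
import secrets

P = 2**61 - 1  # prime modulus standing in for p (an assumption of this sketch)

def certify(b, xs, p=P):
    """Accept iff sum_i b[i] * k^i == prod_i (x_i + k) at a random point k.

    b lists the claimed coefficients from b_0 up to b_n (low degree first).
    """
    k = secrets.randbelow(p - 1) + 1       # random k in [1, p-1]
    lhs = 0
    for coeff in reversed(b):              # Horner evaluation, O(n)
        lhs = (lhs * k + coeff) % p
    rhs = 1
    for x in xs:                           # direct product, O(n)
        rhs = rhs * (x + k) % p
    return lhs == rhs
```

If the claimed coefficients are wrong, the two sides differ as polynomials of degree at most n, so they can agree on at most n of the p−1 possible points; this is the usual random-evaluation argument behind the probability bound in Lemma 3.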

Authenticated data structure scheme. We now define our authenticated data structure scheme (ADS scheme), as well as the correctness and security properties it must satisfy.

Definition 1 (ADS scheme) Let D be any data structure that supports queries q and updates u. Let auth(D) denote the resulting authenticated data structure and d the digest of the authenticated data structure, i.e., a constant-size description of D. An ADS scheme A is a collection of the following six probabilistic polynomial-time algorithms:

- 1. {sk, pk}←genkey(1^k): On input the security parameter k, it outputs a secret key sk and a public key pk;
- 2. {auth(D₀), d₀}←setup(D₀, sk, pk): On input a (plain) data structure D₀ and the secret and public keys, it computes the authenticated data structure auth(D₀) and its digest d₀;
- 3. {D_h+1, auth(D_h+1), d_h+1, upd}←update(u, D_h, auth(D_h), d_h, sk, pk): On input an update u on data structure D_h, the authenticated data structure auth(D_h), the digest d_h, and the secret and public keys, it outputs the updated data structure D_h+1 along with the updated authenticated data structure auth(D_h+1), the updated digest d_h+1 and some relative information upd;
- 4. {D_h+1, auth(D_h+1), d_h+1}←refresh(u, D_h, auth(D_h), d_h, upd, pk): On input an update u on data structure D_h, the authenticated data structure auth(D_h), the digest d_h, relative information upd (output by update), and the public key, it outputs the updated data structure D_h+1 along with the updated authenticated data structure auth(D_h+1) and the updated digest d_h+1;
- 5. {Π(q), α(q)}←query(q, D_h, auth(D_h), pk): On input a query q on data structure D_h, the authenticated data structure auth(D_h) and the public key, it returns the answer α(q) to the query, along with a proof Π(q);
- 6. {accept, reject}←verify(q, α, Π, d_h, pk): On input a query q, an answer α, a proof Π, a digest d_h and the public key, it outputs either accept or reject.

Let {accept, reject}←check(q, α, D_h) be an algorithm that decides whether α is a correct answer for query q on data structure D_h (check is not part of the definition of an ADS scheme). There are two properties that an ADS scheme should satisfy, namely correctness and security (the intuition follows from the definitions of signature schemes).

Definition 2 (Correctness) Let A be an ADS scheme {genkey, setup, update, refresh, query, verify}. We say that the ADS scheme is correct if for all k ∈ ℕ, for all {sk, pk} output by algorithm genkey, for all D_h, auth(D_h), d_h output by one invocation of setup followed by polynomially many invocations of refresh, where h≧0, for all queries q and for all Π(q), α(q) output by query(q, D_h, auth(D_h), pk), with all but negligible probability, whenever algorithm check(q, α(q), D_h) outputs accept, so does algorithm verify(q, α(q), Π(q), d_h, pk).

Definition 3 (Security) Let A be an ADS scheme {genkey, setup, update, refresh, query, verify}, k be the security parameter, ν(k) be a negligible function and {sk, pk}←genkey(1^k). Let also Adv be a probabilistic polynomial-time adversary that is only given pk. The adversary has unlimited access to all algorithms of A, except for algorithms setup and update, to which he has only oracle access. The adversary picks an initial state of the data structure D₀ and computes D₀, auth(D₀), d₀ through oracle access to algorithm setup. Then, for i=0, . . . , h=poly(k), Adv issues an update u_i in the data structure D_i and computes D_i+1, auth(D_i+1) and d_i+1 through oracle access to algorithm update. Finally the adversary picks an index 0≦t≦h+1, and computes a query q, an answer α and a proof Π. We say that the ADS scheme A is secure if for all k ∈ ℕ, for all {sk, pk} output by algorithm genkey, and for any probabilistic polynomial-time adversary Adv it holds that

$\Pr\left[\begin{matrix} \{q, Π, α, t\} \leftarrow Adv(1^{k}, pk); & accept \leftarrow verify(q, α, Π, d_{t}, pk); \\ reject \leftarrow check(q, α, D_{t}). \end{matrix}\right] \leq ν(k). \quad (2)$

Construction and Algorithms

In this section we present an ADS scheme for set-operation verification. The underlying data structure for which we design our ADS scheme is called sets collection, and can be viewed as a generalization of the inverted index [4] data structure.

Sets collection. Referring now to FIGS. 5, 6, 11 and 12, the sets collection data structure consists of m sets, denoted with S₁, S₂, . . . , S_m, each containing elements from a universe 𝒰. Without loss of generality we assume that the universe 𝒰 is the set of nonnegative integers in the interval [m+1, p−1]−{s},⁷ where p is a k-bit prime, m is the number of the sets in our collection and has bit size O(log k), k is the security parameter and s is the trapdoor of the scheme (see algorithm genkey). A set S_i does not contain duplicate elements; however, an element χ ∈ 𝒰 can appear in more than one set. Each set is sorted and the total space needed is O(m+M), where M is the sum of the sizes of the sets.

⁷ This choice simplifies the exposition; however, by using some collision-resistant hash function, universe 𝒰 can be set to ℤ_p − {s}.

In order to get some intuition, we can view the sets collection as an inverted index. In this view, the elements are pointers to documents and each set S_i corresponds to a term w_i in the dictionary, containing the pointers to documents where term w_i appears. In this case, m is the number of terms being indexed, which is typically in the hundreds of thousands, while M, bounded from below by the number of documents being indexed, is typically in the billions. Thus, the more general terms “elements” and “sets” in a sets collection can be instantiated to the more specific “documents” and “terms”.

The operations supported by the sets collection data structure consist of updates and queries. An update is either an insertion of an element into a set or a deletion of an element from a set. An update on a set of size n takes O(log n) time. For simplicity, we assume that the number m of sets does not change after updates. A query is one of the following standard set operations: (i) Intersection: Given indices i₁, i₂, . . . , i_t, return set I=S_i₁∩S_i₂∩ . . . ∩S_i_t; (ii) Union: Given indices i₁, i₂, . . . , i_t, return set U=S_i₁∪S_i₂∪ . . . ∪S_i_t; (iii) Subset query: Given indices i and j, return true if S_i⊂S_j and false otherwise; (iv) Set difference: Given indices i and j, return set D=S_i−S_j. For the rest of the paper, we denote with δ the size of the answer to a query operation, i.e., δ is equal to the size of I, U, or D. For a subset query, δ is O(1).
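For orientation, the plain (unauthenticated) sets collection and its four queries can be sketched as below. The class and method names are hypothetical, and Python hash sets are used for brevity, so this sketch does not model the sorted-set representation or its O(log n) update cost; the subset query is treated as non-strict containment.

```python
class SetsCollection:
    """Plain sets collection with 1-based set indices, as in S_1, ..., S_m."""

    def __init__(self, sets):
        self.sets = {i: set(s) for i, s in enumerate(sets, start=1)}

    def insert(self, i, x):
        """Update: insert element x into set S_i."""
        self.sets[i].add(x)

    def delete(self, i, x):
        """Update: delete element x from set S_i."""
        self.sets[i].discard(x)

    def intersection(self, indices):
        """Query (i): elements common to all sets named by the indices."""
        return sorted(set.intersection(*(self.sets[i] for i in indices)))

    def union(self, indices):
        """Query (ii): elements appearing in any of the named sets."""
        return sorted(set.union(*(self.sets[i] for i in indices)))

    def subset(self, i, j):
        """Query (iii): True if S_i is contained in S_j."""
        return self.sets[i] <= self.sets[j]

    def difference(self, i, j):
        """Query (iv): elements of S_i that are not in S_j."""
        return sorted(self.sets[i] - self.sets[j])
```

In the inverted-index view, `intersection([i1, i2])` would correspond to a two-keyword search returning the documents that contain both terms.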

We next detail the design of an ADS scheme A for the sets collection data structure. This scheme provides protocols for verifying the integrity of the answers to set operations in a dynamic setting where sets evolve over time through updates. The goal is to achieve optimality in the communication and verification complexity: a query with t parameters and answer size δ should be verified with O(t+δ) complexity, and at the same time query and update algorithms should be efficient as well.

Setup and Updates

Referring to FIGS. 7-9, 14 and 15, we describe an ADS scheme A={genkey, setup, update, refresh, query, verify} for the sets collection data structure and we prove that its algorithms satisfy the complexities of Table 1. We begin with the algorithms that are related to the setup and the updates of the authenticated data structure.

Algorithm {sk, pk}←genkey(1^k): Bilinear pairing parameters (p, 𝔾, 𝔾_T, e, g) are picked and an element s ∈ ℤ_p^* is chosen at random. Subsequently, a one-to-one function h(•): 𝔾 → ℤ_p^* is used. This function simply outputs the bit description of the elements of 𝔾 according to some canonical representation of 𝔾. Finally the algorithm outputs sk=s and pk={h(•), p, 𝔾, 𝔾_T, e, g, **g**}, where vector **g** contains values

$\mathbf{g} = [g^{s}, g^{s^{2}}, \ldots, g^{s^{q}}],$

where q≧max{m, max_i=1, . . . , m{|S_i|}}. The algorithm has O(1) access complexity.

Algorithm {D₀, auth(D₀), d₀}←setup(D₀, sk, pk): Let D₀ be our initial data structure, i.e., the one representing sets S₁, S₂, . . . , S_m. The authenticated data structure auth(D₀) is built as follows. First, for each set S_i its accumulation value acc(S_i)=g^(Π_{χ∈S_i}(χ+s)) is computed (see Section 6.1). Subsequently, the algorithm picks a constant 0<ε<1. Let T be a tree that has l=⌈1/ε⌉ levels and m leaves, numbered 1, 2, . . . , m, where m is the number of the sets of our sets collection data structure. Since T is a constant-height tree, the degree of any internal node of it is O(m^ε). We call such a tree an accumulation tree, which was originally introduced (combined with different cryptography) in [32]. For each node ν of the tree, the algorithm recursively computes the digest d(ν) of ν as follows. If ν is a leaf corresponding to set S_i, where 1≦i≦m, the algorithm sets d(ν)=acc(S_i)^(i+s); here, raising value acc(S_i) to exponent i+s, under the constraint that i≦m, is done to also accumulate the index i of set S_i (and thus prove that acc(S_i) refers to S_i). If node ν is not a leaf, then

$d(ν) = g^{\prod_{w \in \mathcal{N}(ν)} (h(d(w)) + s)}, \quad (3)$

where 𝒩(ν) denotes the set of children of node ν. The algorithm outputs all the sets S_i as the data structure D₀, and all the accumulation values acc(S_i) for 1≦i≦m, the tree T and all the digests d(ν) for all ν ∈ T as the authenticated data structure auth(D₀). Finally, the algorithm sets d₀=d(r) where r is the root of T, i.e., d₀ is the digest of the authenticated data structure (defined similarly as in a Merkle tree).⁸ The access complexity of the algorithm is O(m+M) (for postorder traversal of T and computation of acc(S_i)), where M=Σ_i=1^m|S_i|. The group complexity of auth(D₀) is also O(m+M) since the algorithm stores one digest per node of T, T has O(m) nodes and there are M elements contained in the sets, as part of auth(D₀).

⁸ Digest d(r) is a “secure” succinct description of the set collection data structure. Namely, the accumulation tree protects the integrity of values acc(S_i), 1≦i≦m, and each accumulation value acc(S_i) protects the integrity of the elements contained in set S_i.

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}, upd}←update(u, D_h, auth(D_h), d_h, sk, pk): We consider the update “insert element x into set S_i” (note that the same algorithm could be used for element deletions). Let ν₀ be the leaf node of T corresponding to set S_i. Let ν₀, ν₁, . . . , ν_l be the path in T from node ν₀ to the root of the tree, where l=┌1/ε┐. The algorithm initially sets d′(ν₀)=acc(S_i)^{(x+s)}, i.e., it updates the accumulation value that corresponds to the updated set (note that in the case where x is deleted from S_i the algorithm sets d′(ν₀)=acc(S_i)^{(x+s)^{−1}}). Then the algorithm sets

$\begin{matrix} d' (\nu_{j}) = d (\nu_{j})^{(h (d' (\nu_{j - 1})) + s) (h (d (\nu_{j - 1})) + s)^{- 1}} \text{ for } j = 1, \dots, l, & (4) \end{matrix}$

where d(ν_{j−1}) is the current digest of ν_{j−1} and d′(ν_{j−1}) is the updated digest of ν_{j−1}.⁹ All these newly computed values (i.e., the new digests) are stored by the algorithm. The algorithm then outputs the new digests d′(ν_{j−1}), j=1, . . . , l, along the path from the updated set to the root of the tree, as part of information upd. Information upd also includes x and d′(ν_l). The algorithm also sets d_{h+1}=d′(ν_l), i.e., the updated digest is the newly computed digest of the root of T. Finally, the new authenticated data structure auth(D_{h+1}) is computed as follows: in the current authenticated data structure auth(D_h) that is input to the algorithm, the values d(ν_{j−1}) are overwritten with the new values d′(ν_{j−1}) (j=1, . . . , l), and the resulting structure is included in the output of the algorithm. The number of operations performed is proportional to 1/ε, which is a constant, so the complexity of the algorithm is O(1).

⁹ Note that these update computations are efficient because update has access to the secret key s.
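The setup and update steps can be sketched in a deliberately insecure toy model, offered here only as an illustration under stated assumptions: a group element g^a is represented by its exponent a modulo a small prime P, so that exponentiation becomes multiplication mod P. The names `h`, `acc`, `build`, `insert`, the prime `P`, the SHA-256 based hash and the fixed `fanout` are all hypothetical choices that do not come from the text. The leaf is updated by multiplying its current digest (which already carries the index factor i+s) by (x+s), and Equation 4 is then applied up the path.

```python
import hashlib

P = 2**61 - 1  # toy prime standing in for the group order p (assumption)

def h(a):
    """Hypothetical hash of a 'group element' (modelled by its exponent) into Z_P."""
    return int.from_bytes(hashlib.sha256(str(a).encode()).digest(), "big") % P

def acc(S, s):
    """Exponent of acc(S) = g^{prod_{x in S} (x + s)}."""
    out = 1
    for x in S:
        out = out * (x + s) % P
    return out

def build(sets, s, fanout):
    """setup: return the digest levels, leaves first, root last (fanout >= 2)."""
    level = [acc(S, s) * (i + s) % P for i, S in enumerate(sets, start=1)]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level), fanout):
            d = 1
            for child in level[k:k + fanout]:
                d = d * (h(child) + s) % P  # Equation 3, written in the exponent
            nxt.append(d)
        level = nxt
        levels.append(level)
    return levels

def insert(levels, i, x, s, fanout):
    """update: insert x into S_i, touching only the leaf-to-root path."""
    pos = i - 1
    old = levels[0][pos]
    new = old * (x + s) % P  # the leaf digest already carries the factor (i + s)
    levels[0][pos] = new
    for depth in range(1, len(levels)):
        pos //= fanout
        parent = levels[depth][pos]
        # Equation 4: replace the factor contributed by the changed child
        updated = parent * (h(new) + s) % P * pow(h(old) + s, -1, P) % P
        levels[depth][pos] = updated
        old, new = parent, updated
    return levels[-1][0]

s = 123456789  # toy trapdoor
sets = [{1, 2}, {2, 3}, {5}, {7, 8, 9}]
levels = build(sets, s, fanout=2)
root_after_update = insert(levels, 2, 11, s, fanout=2)
rebuilt = build([{1, 2}, {2, 3, 11}, {5}, {7, 8, 9}], s, fanout=2)
print(root_after_update == rebuilt[-1][0])
```

The comparison with a full rebuild is only a sanity check; the point of the path update is that it touches one node per level, which is why the cost depends on the constant number of levels and not on m.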

Algorithm {D_{h+1}, auth(D_{h+1}), d_{h+1}}←refresh(u, D_h, auth(D_h), d_h, upd, pk): We consider the update “insert element x into set S_i”. Let ν₀ be the node of T corresponding to set S_i. Let ν₀, ν₁, . . . , ν_l be the path in T from node ν₀ to the root of the tree. Using the information upd, the algorithm sets d(ν_j)=d′(ν_j) for j=0, . . . , l, i.e., it updates the digests that correspond to the updated path. Finally, it outputs the updated sets collection as D_{h+1}, the updated digests d(ν_j) (along with the ones that belong to the nodes that are not updated) as auth(D_{h+1}) and d′(ν_l) (contained in upd) as d_{h+1}.¹⁰ The algorithm has O(1) complexity, as the number of performed operations is O(1/ε).

¹⁰ Note that information upd is not required for the execution of refresh, but is rather used for efficiency. Without access to upd, algorithm refresh could compute the updated values d(ν_j) using polynomial interpolation, which would have O(m^ε log m) complexity (see Lemma 2).

Authenticity of Accumulation Values

Referring to FIG. 15, so far we have described the authenticated data structure auth(D_h) that our ADS scheme will use for set-operation verifications. Overall, auth(D_h) comprises a set of m accumulation values acc(S_i), one for each set S_i, i=1, . . . , m, and a set of O(m) digests d(ν), one for each internal node ν of the accumulation tree T. Our proof construction and verification protocols for set operations (described in Section 6.2.3) make use of the accumulation values acc(S_i) (subject to which subset-containment witnesses can be defined), and therefore it is required that the authenticity of each such value can be verified. Tree T serves this exact role by providing short correctness proofs for each value acc(S_i) stored at leaf i of T, this time subject to the (global) digest d_h stored at the root of T. We next provide the details related to proving the authenticity of acc(S_i).

The correctness proof Π_i of accumulation value acc(S_i), 1≦i≦m, is a collection of O(1) bilinear-map accumulator witnesses (as defined in Section 6.1). In particular, Π_i is set to be the ordered sequence Π_i=(π₁, π₂, . . . , π_l), where π_j is the pair of the digest of node ν_{j−1} and a witness that authenticates ν_{j−1}, subject to node ν_j, in the path ν₀, ν₁, . . . , ν_l defined by leaf ν₀ storing accumulation value acc(S_i) and the root ν_l of T. Conveniently, π_j is defined as π_j=(β_j, γ_j), where

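$\begin{matrix} \beta_{j} = d (\nu_{j - 1}) \text{ and } \gamma_{j} = W_{\nu_{j - 1} (\nu_{j})} = g^{\prod_{w \in \mathcal{N} (\nu_{j}) - \{ \nu_{j - 1} \}} (h (d (w)) + s)} . & (5) \end{matrix}$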

Note that π_j is the witness for a subset of one element, namely h(d(ν_{j−1})) (recall, d(ν₀)=acc(S_i)^{(i+s)}). Clearly, pair π_j has group complexity O(1) and can be constructed using polynomial interpolation with O(m^ε log m) complexity, by Lemma 2 and since ν_j has degree O(m^ε). Since Π_i consists of O(1) such pairs, we conclude that the proof Π_i for an accumulation value acc(S_i) can be constructed with O(m^ε log m) complexity and has O(1) group complexity. The following algorithms queryTree and verifyTree are used to formally describe the construction and, respectively, the verification of such correctness proofs. Similar methods have been described in [32].

Algorithm {Π_i, α_i}←queryTree(i, D_h, auth(D_h), pk): Let ν₀, ν₁, . . . , ν_l be the path of T from the node storing acc(S_i) to the root of T. The algorithm computes Π_i by setting Π_i=(π₁, π₂, . . . , π_l), where π_j=(d(ν_{j−1}), W_{ν_{j−1}(ν_j)}) and W_{ν_{j−1}(ν_j)} is given in Equation 5 and computed by Lemma 2. Finally, the algorithm sets α_i=acc(S_i).

Algorithm {accept, reject}←verifyTree(i, α_i, Π_i, d_h, pk): Let the proof be Π_i=(π₁, π₂, . . . , π_l), where π_j=(β_j, γ_j). The algorithm outputs reject if one of the following is true: (i) e(β₁, g)≠e(α_i, g^{i}g^{s}); or (ii) e(β_j, g)≠e(γ_{j−1}, g^{h(β_{j−1})}g^{s}) for some 2≦j≦l; or (iii) e(d_h, g)≠e(γ_l, g^{h(β_l)}g^{s}). Otherwise, it outputs accept.
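The three checks of verifyTree can be traced in the same kind of insecure toy model, given here as a sketch under loud assumptions: g^a is represented by the exponent a modulo a toy prime P, and the pairing e(g^a, g^b) by the product a·b mod P. In this model the "verifier" handles s directly, whereas a real verifier only ever sees g^s through the pairing, so the sketch illustrates the algebra of checks (i) to (iii) and nothing about security. All names (`h`, `acc`, `pair`, `build`, `query_tree`, `verify_tree`) and the fixed `fanout` are hypothetical, and at least two sets are assumed so that the proof is non-empty.

```python
import hashlib

P = 2**61 - 1  # toy prime standing in for the group order (assumption)

def h(a):
    """Hypothetical hash of a 'group element' (its exponent) into Z_P."""
    return int.from_bytes(hashlib.sha256(str(a).encode()).digest(), "big") % P

def acc(S, s):
    """Exponent of acc(S) = g^{prod_{x in S} (x + s)}."""
    out = 1
    for x in S:
        out = out * (x + s) % P
    return out

def pair(a, b):
    """Toy pairing: e(g^a, g^b) is modelled by the exponent a*b of e(g, g)."""
    return a * b % P

def build(sets, s, fanout):
    """Digest levels of the accumulation tree, leaves first, root last."""
    level = [acc(S, s) * (i + s) % P for i, S in enumerate(sets, start=1)]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level), fanout):
            d = 1
            for child in level[k:k + fanout]:
                d = d * (h(child) + s) % P
            nxt.append(d)
        level = nxt
        levels.append(level)
    return levels

def query_tree(levels, sets, i, s, fanout):
    """Return the proof (pi_1, ..., pi_l) and alpha_i = acc(S_i)."""
    pos, proof = i - 1, []
    for depth in range(len(levels) - 1):
        level = levels[depth]
        start = pos // fanout * fanout
        gamma = 1
        for k in range(start, min(start + fanout, len(level))):
            if k != pos:
                gamma = gamma * (h(level[k]) + s) % P  # Equation 5: siblings only
        proof.append((level[pos], gamma))
        pos //= fanout
    return proof, acc(sets[i - 1], s)

def verify_tree(i, alpha, proof, root, s):
    """Checks (i)-(iii); s stands in for the public element g^s."""
    if pair(proof[0][0], 1) != pair(alpha, (i + s) % P):                  # (i)
        return False
    for (beta_prev, gamma_prev), (beta, _) in zip(proof, proof[1:]):      # (ii)
        if pair(beta, 1) != pair(gamma_prev, (h(beta_prev) + s) % P):
            return False
    beta_l, gamma_l = proof[-1]
    return pair(root, 1) == pair(gamma_l, (h(beta_l) + s) % P)            # (iii)

s = 987654321  # toy trapdoor
sets = [{1, 2}, {2, 3}, {5}, {7, 8, 9}, {4}]
levels = build(sets, s, fanout=2)
proof, alpha = query_tree(levels, sets, 3, s, fanout=2)
honest = verify_tree(3, alpha, proof, levels[-1][0], s)
forged = verify_tree(3, (alpha + 1) % P, proof, levels[-1][0], s)
print(honest, forged)
```

Each proof entry pairs a node digest with the product over its siblings, so the verifier performs one comparison per level of the tree regardless of how many sets are stored.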

We finally provide some complexity and security properties that hold for the correctness proofs of the accumulated values. The following result is used as a building block to derive the complexity of our scheme and prove its security (Theorem 1).

Lemma 4 Algorithm queryTree runs with O(m^ε log m) access complexity and outputs a proof of O(1) group complexity. Moreover, algorithm verifyTree has O(1) access complexity. Finally, for any adversarially chosen proof Π_i (1≦i≦m), if accept←verifyTree(i, α_i, Π_i, d_h, pk), then α_i=acc(S_i) with probability Ω(1−neg(k)).

Queries and Verification

Referring to FIGS. 16 and 17, with the correctness proofs of accumulation values at hand, we complete the description of our ADS scheme by presenting the algorithms that are related to the construction and verification of proofs attesting the correctness of set operations. These proofs are efficiently constructed using the authenticated data structure presented earlier, and they have optimal size O(t+δ), where t and δ are the sizes of the query parameters and the answer. In the rest of the section, we focus on the detailed description of the algorithms for an intersection and a union query, but due to space limitations, we omit the details of the subset and the set difference query. We note, however, that the treatment of the subset and set difference queries is analogous to that of the intersection and union queries.

The parameters of an intersection or a union query are t indices i₁, i₂, . . . , i_t, with 1≦t≦m. To simplify the notation, we assume without loss of generality that these indices are 1, 2, . . . , t. Let n_i denote the size of set S_i (1≦i≦t) and let N=Σ_{i=1}^{t}n_i. Note that the size δ of the intersection or union is always O(N) and that the operations can be performed with O(N) complexity, by using a generalized merge.
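For the intersection case, one way to picture a merge-based computation is the following sketch. It assumes (this is not stated in the text) that each set is stored as a strictly increasing list, and the function names are hypothetical. Starting from the smallest list keeps the running result no larger than n_min, so the pairwise merges cost at most t·n_min+N≦2N comparisons in total, on top of ordering the t lists by length.

```python
def merge_intersect(a, b):
    """Intersect two strictly increasing lists in O(len(a) + len(b)) steps."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_all(sorted_sets):
    """Fold the two-way merge over all sets, starting from the smallest one."""
    ordered = sorted(sorted_sets, key=len)
    result = ordered[0]
    for nxt in ordered[1:]:
        result = merge_intersect(result, nxt)
    return result

common = intersect_all([[1, 2, 5, 9], [2, 3, 5, 9, 11], [0, 2, 9]])
print(common)  # [2, 9]
```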

Intersection query. Let I=S₁∩S₂∩ . . . ∩S_t={y₁, y₂, . . . , y_δ}. We express the correctness of the set intersection operation by means of the following two conditions:

Subset Condition:

I⊆S₁ ∧ I⊆S₂ ∧ . . . ∧ I⊆S_t; (6)

Completeness Condition:

(S₁−I)∩(S₂−I)∩ . . . ∩(S_t−I)=Ø. (7)

The completeness condition in Equation 7 is necessary since set I must contain all the common elements. Given an intersection I, and for every set S_j, 1≦j≦t, we define the polynomial of degree at most n_j

$\begin{matrix} P_{j} (s) = \prod_{x \in S_{j} - I} (x + s) . & (8) \end{matrix}$

The following result is based on the extended Euclidean algorithm over polynomials and provides our core verification test for checking the correctness of set intersection.

Lemma 5 Set I is the intersection of sets S₁, S₂, . . . , S_t if and only if there exist polynomials q₁(s), q₂(s), . . . , q_t(s) such that q₁(s)P₁(s)+q₂(s)P₂(s)+ . . . +q_t(s)P_t(s)=1, where P_j(s), j=1, . . . , t, are defined in Equation 8. Moreover, the polynomials q₁(s), q₂(s), . . . , q_t(s) can be computed with O(N log²N log log N) complexity.
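For t=2, the polynomials of Lemma 5 can be obtained with the textbook extended Euclidean algorithm, as in the sketch below. It is an illustration under assumptions rather than the method behind the stated bound: it uses schoolbook (quadratic) polynomial arithmetic over a toy prime field Z_P instead of the fast algorithms that O(N log²N log log N) would require, and every name in it (`P`, `char_poly`, `bezout`, `completeness_polys`, and the helpers) is hypothetical. Polynomials in s are stored as coefficient lists, lowest degree first.

```python
P = 2**61 - 1  # toy prime field Z_P (assumption)

def trim(a):
    """Drop trailing zero coefficients in place; the zero polynomial is []."""
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b, sign=1):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x + sign * y) % P for x, y in zip(a, b)])

def sub(a, b):
    return add(a, b, sign=-1)

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return trim(out)

def divmod_poly(a, b):
    """Long division of a by a nonzero polynomial b: return (quotient, remainder)."""
    rem, quo = a[:], [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], -1, P)
    while len(rem) >= len(b):
        shift, c = len(rem) - len(b), rem[-1] * inv % P
        quo[shift] = c
        for k, y in enumerate(b):
            rem[shift + k] = (rem[shift + k] - c * y) % P
        trim(rem)
    return trim(quo), rem

def char_poly(S):
    """Coefficients of prod_{x in S} (x + s), as in Equation 8."""
    out = [1]
    for x in S:
        out = mul(out, [x % P, 1])
    return out

def bezout(a, b):
    """Extended Euclid: return (g, u, v) with u*a + v*b = g, a gcd of a and b."""
    r0, r1, u0, u1, v0, v1 = a, b, [1], [], [], [1]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, sub(u0, mul(q, u1))
        v0, v1 = v1, sub(v0, mul(q, v1))
    return r0, u0, v0

def completeness_polys(S1, S2, I):
    """Return (q1, q2) with q1*P1 + q2*P2 = 1, or None if P1 and P2 share a factor."""
    g, u, v = bezout(char_poly(S1 - I), char_poly(S2 - I))
    if len(g) != 1:
        return None
    inv = pow(g[0], -1, P)
    return mul(u, [inv]), mul(v, [inv])

S1, S2 = {1, 2, 5}, {2, 3, 5}
good = completeness_polys(S1, S2, {2, 5})
bad = completeness_polys(S1, S2, {2})  # misses the common element 5
print(good is not None, bad is None)
```

With the correct answer {2, 5} the remaining polynomials (1+s) and (3+s) are coprime and a pair (q₁, q₂) exists; with the incomplete answer {2} both polynomials keep the factor (5+s), and no such pair can be found.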

Using Lemmas 2 and 5 we next construct efficient proofs for both conditions in Equations 6 and 7. In turn, the proofs are directly used to define the algorithms query and verify of our ADS scheme for intersection queries.

Proof of subset condition. For each set S_j, 1≦j≦t, the subset witnesses W_{I,j}=g^{P_j(s)}=g^{∏_{x∈S_j−I}(x+s)} are computed, each with O(n_j log n_j) complexity, by Lemma 2. (Recall, W_{I,j} serves as a proof that I is a subset of set S_j.) Thus, the total complexity for computing all t required subset witnesses is O(N log N), where N=Σ_{i=1}^{t}n_i.¹¹

¹¹ This is because Σn_j log n_j≦log N Σn_j=N log N.

Proof of completeness condition. For each q_j(s), 1≦j≦t, as in Lemma 5 satisfying q₁(s)P₁(s)+q₂(s)P₂(s)+ . . . +q_t(s)P_t(s)=1, the completeness witnesses F_{I,j}=g^{q_j(s)} are computed, by Lemma 5 with O(N log²N log log N) complexity.

Algorithm {Π(q), α(q)}←query(q, D_h, auth(D_h), pk) (Intersection): Query q consists of t indices {1, 2, . . . , t}, asking for the intersection I of S₁, S₂, . . . , S_t. Let I={y₁, y₂, . . . , y_δ}. Then α(q)=I, and the proof Π(q) consists of the following parts.

- 1. Coefficients b_δ, b_{δ−1}, . . . , b₀ of the polynomial (y₁+s)(y₂+s) . . . (y_δ+s) that is associated with the intersection I={y₁, y₂, . . . , y_δ}. These are computed with O(δ log δ) complexity (Lemma 2) and they have O(δ) group complexity.
- 2. Accumulation values acc(S_j), j=1, . . . , t, which are associated with sets S_j, along with their respective correctness proofs Π_j. These are computed by calling algorithm queryTree(j, D_h, auth(D_h), pk), for j=1, . . . , t, with O(tm^ε log m) total complexity and they have O(t) total group complexity (Lemma 4).
- 3. Subset witnesses W_{I,j}, j=1, . . . , t, which are associated with sets S_j and intersection I (see proof of subset condition). These are computed with O(N log N) complexity and have O(t) total group complexity (Lemma 2).
- 4. Completeness witnesses F_{I,j}, j=1, . . . , t, which are associated with polynomials q_j(s) of Lemma 5 (see proof of completeness condition). These are computed with O(N log²N log log N) complexity and have O(t) group complexity (Lemma 5).

Algorithm {accept, reject}←verify(q, α, Π, d_h, pk) (Intersection): Verifying the result of an intersection query includes the following steps.

- 1. First, the algorithm uses the coefficients b=[b_δ, b_{δ−1}, . . . , b₀] and the answer α(q)={y₁, y₂, . . . , y_δ} as an input to algorithm certify(b, α(q), pk), in order to certify the validity of b_δ, b_{δ−1}, . . . , b₀. If certify outputs reject, the algorithm also outputs reject. This step has O(δ) complexity (Lemma 3). (Algorithm certify is used to achieve optimal verification and avoid an O(δ log δ) FFT computation from scratch.)
- 2. Subsequently, the algorithm uses the proof Π_j to verify the correctness of acc(S_j), by running algorithm verifyTree(j, acc(S_j), Π_j, d_h, pk) for j=1, . . . , t. If, for some j, verifyTree running on acc(S_j) outputs reject, the algorithm also outputs reject. This step has O(t) complexity (Lemma 4).
- 3. Next, the algorithm checks the subset condition, where the group element ∏_{i=0}^{δ}(g^{s^i})^{b_i}=g^{(y₁+s)(y₂+s) . . . (y_δ+s)} is computed once with O(δ) complexity:

$\begin{matrix} e (\prod_{i = 0}^{δ} {(g^{s^{i}})}^{b_{i}}, W_{I, j}) \overset{?}{=} e (acc (S_{j}), g), for j = 1, \dots, t . & (9) \end{matrix}$

If, for some j, the above check on subset witness W_{I,j} fails, the algorithm outputs reject. This step has O(t+δ) complexity.

- 4. Finally, the algorithm checks the completeness condition:

$\begin{matrix} \prod_{j = 1}^{t} e (W_{I, j}, F_{I, j}) \overset{?}{=} e (g, g) . & (10) \end{matrix}$

If the above check on the completeness witnesses F_{I,j}, 1≦j≦t, fails, the algorithm outputs reject. Otherwise, if this relation holds, the algorithm outputs accept, i.e., it accepts α(q) as the correct intersection. This step has O(t) complexity.

Note that for Equation 10, it holds that ∏_{j=1}^{t}e(W_{I,j}, F_{I,j})=e(g, g)^{Σ_{j=1}^{t}q_j(s)P_j(s)}=e(g, g) when all the subset witnesses W_{I,j}, all the completeness witnesses F_{I,j} and all the sets' accumulation values acc(S_j) have been computed honestly, since q₁(s)P₁(s)+q₂(s)P₂(s)+ . . . +q_t(s)P_t(s)=1. This is a required condition for proving the correctness of our ADS scheme, as defined in Definition 2. We continue with the description of algorithms query and verify for the union query.
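As a small worked instance of Equations 9 and 10 (the sets below are chosen purely for illustration and do not come from the text), take t=2, S₁={1, 2}, S₂={2, 3} and I={2}, so that P₁(s)=1+s and P₂(s)=3+s. Since P₂(s)−P₁(s)=2, the constants q₁(s)=−2⁻¹ and q₂(s)=2⁻¹, with the inverse taken modulo the (odd prime) group order p, satisfy Lemma 5, and the checks become

$\begin{matrix} e (g^{(2 + s)}, g^{(1 + s)}) = e (g^{(1 + s) (2 + s)}, g) = e (acc (S_{1}), g), \\ e (g^{(2 + s)}, g^{(3 + s)}) = e (g^{(2 + s) (3 + s)}, g) = e (acc (S_{2}), g), \\ e (g^{(1 + s)}, g^{- 2^{- 1}}) \, e (g^{(3 + s)}, g^{2^{- 1}}) = {e (g, g)}^{2^{- 1} ((3 + s) - (1 + s))} = e (g, g) . \end{matrix}$

If instead the empty set were returned as the answer, P₁(s)=(1+s)(2+s) and P₂(s)=(2+s)(3+s) would share the factor (2+s), so by Lemma 5 no polynomials q₁(s), q₂(s) with q₁(s)P₁(s)+q₂(s)P₂(s)=1 exist.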

Union query. Let U=S₁∪S₂∪ . . . ∪S_t={y₁, y₂, . . . , y_δ}. We express the correctness of the set union operation by means of the following two conditions:

Membership Condition:

∀y_i∈U ∃j∈{1, 2, . . . , t}: y_i∈S_j; (11)

Superset Condition:

(U⊇S₁) ∧ (U⊇S₂) ∧ . . . ∧ (U⊇S_t). (12)

The superset condition in Equation 12 is necessary since set U must exclude none of the elements in sets S₁, S₂, . . . , S_t. We formally describe algorithms query and verify of our ADS scheme for union queries.

Algorithm {Π(q), α(q)}←query(q, D_h, auth(D_h), pk) (Union): Query q asks for the union U of t sets S₁, S₂, . . . , S_t. Let U={y₁, y₂, . . . , y_δ}. Then α(q)=U and the proof Π(q) consists of the following parts. (1) Coefficients b_δ, b_{δ−1}, . . . , b₀ of the polynomial (y₁+s)(y₂+s) . . . (y_δ+s) that is associated with the union U={y₁, y₂, . . . , y_δ}. (2) Accumulation values acc(S_j), j=1, . . . , t, which are associated with sets S_j, along with their respective correctness proofs Π_j, both output by algorithm queryTree(j, D_h, auth(D_h), pk). (3) Membership witnesses W_{y_i,S_k} of y_i, i=1, . . . , δ (see Equation 1), which prove that y_i belongs to some set S_k, 1≦k≦t, and which are computed with O(N log N) total complexity and have O(δ) total group complexity (Lemma 2). (4) Subset witnesses W_{S_j,U}, j=1, . . . , t, which are associated with sets S_j and union U and prove that U is a superset of S_j, 1≦j≦t, and which are computed with O(N log N) total complexity and have O(t) total group complexity (Lemma 2).

Algorithm {accept, reject}←verify(q, α, Π, d_h, pk) (Union): Verifying the result of a union query includes the following steps. (1) First, the algorithm uses b=[b_δ, b_{δ−1}, . . . , b₀] and the answer U=α(q)={y₁, y₂, . . . , y_δ} as an input to algorithm certify(b, α(q), pk), in order to certify the validity of b_δ, b_{δ−1}, . . . , b₀. (2) Subsequently, the algorithm uses the proofs Π_j to verify the correctness of acc(S_j), by using algorithm verifyTree(j, acc(S_j), Π_j, d_h, pk) for j=1, . . . , t. If the verification fails for at least one of acc(S_j), the algorithm outputs reject. (3) Next, the algorithm verifies that each element y_i, i=1, . . . , δ, of the reported union belongs to some set S_k, for some 1≦k≦t (O(δ) complexity). This is done by checking that relation e(W_{y_i,S_k}, g^{y_i}g^{s})=e(acc(S_k), g) holds for all i=1, . . . , δ; otherwise the algorithm outputs reject. (4) Finally, the algorithm verifies that all sets specified by the query are subsets of the union, by checking the following conditions:

$e (W_{S_{j}, U}, acc (S_{j})) \overset{?}{=} e (\prod_{i = 0}^{δ} {(g^{s^{i}})}^{b_{i}}, g), for j = 1, \dots, t .$

If any of the above checks fails, the algorithm outputs reject; otherwise, it outputs accept, i.e., U is accepted as the correct union.
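The membership and superset checks of the union verification can be traced in an insecure toy model, sketched here under explicit assumptions: g^a is represented by its exponent a modulo a toy prime P, the pairing e(g^a, g^b) by a·b mod P, and s is handled directly where a real verifier would only hold g^s. The certify and verifyTree steps are left out, so the accumulation values are simply taken as authentic and ∏(y_i+s) is evaluated directly instead of from certified coefficients. All names (`char_at`, `pair`, `prove_union`, `verify_union`) are hypothetical.

```python
P = 2**61 - 1  # toy prime standing in for the group order (assumption)

def char_at(S, s):
    """Exponent prod_{x in S} (x + s) mod P, i.e. the exponent of acc(S)."""
    out = 1
    for x in S:
        out = out * (x + s) % P
    return out

def pair(a, b):
    """Toy pairing: e(g^a, g^b) is modelled by the exponent a*b of e(g, g)."""
    return a * b % P

def prove_union(sets, s):
    """Return the union with its membership and superset witnesses."""
    U = set().union(*sets)
    membership = {}
    for y in U:
        k = next(k for k, S in enumerate(sets) if y in S)
        membership[y] = (k, char_at(sets[k] - {y}, s))   # W_{y, S_k}
    superset = [char_at(U - S, s) for S in sets]          # W_{S_j, U}
    return U, membership, superset

def verify_union(U, accs, membership, superset, s):
    """Steps (3) and (4) of the union verification, in the toy model."""
    if len(superset) != len(accs):
        return False
    for y in U:                                           # membership condition
        if y not in membership:
            return False
        k, w = membership[y]
        if pair(w, (y + s) % P) != pair(accs[k], 1):
            return False
    u_exp = char_at(U, s)   # stands for prod_i (g^{s^i})^{b_i}
    return all(pair(w, a) == pair(u_exp, 1)               # superset condition
               for w, a in zip(superset, accs))

s = 424242  # toy trapdoor
sets = [{1, 2}, {2, 3}, {7}]
accs = [char_at(S, s) for S in sets]
U, membership, superset = prove_union(sets, s)
ok = verify_union(U, accs, membership, superset, s)
too_small = verify_union(U - {7}, accs, membership, superset, s)
too_big = verify_union(U | {99}, accs, membership, superset, s)
print(ok, too_small, too_big)
```

Dropping an element from the claimed union breaks the superset check, while adding a foreign element leaves it without a membership witness, which mirrors the roles of Equations 12 and 11 respectively.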

Subset and set difference query. For a subset query (positive or negative), we use the property S_i⊆S_j ⇔ ∀y∈S_i, y∈S_j. For a set difference query we use the property

D=S_i−S_j ⇔ ∃F: F∪D=S_i ∧ F=S_i∩S_j.

The above conditions can both be checked in an operation-sensitive manner using the techniques we have presented before. We now give the main result in our work.

Theorem 1 Consider a collection of m sets S₁, . . . , S_m and let M=Σ_{i=1}^{m}|S_i| and 0<ε<1. For a query operation involving t sets, let N be the sum of the sizes of the involved sets, and δ be the answer size. Then there exists an ADS scheme {genkey, setup, update, refresh, query, verify} for a sets collection data structure D with the following properties: (1) The scheme is correct and secure according to Definitions 2 and 3 and based on the bilinear q-strong Diffie-Hellman assumption; (2) The access complexity of algorithm (i) genkey is O(1); (ii) setup is O(m+M); (iii) update is O(1), outputting information upd of O(1) group complexity; (iv) refresh is O(1); (3) For all queries q (intersection/union/subset/difference), constructing the proof with algorithm query has O(N log²N log log N+tm^ε log m) access complexity, algorithm verify has O(t+δ) access complexity and the proof Π(q) has O(t+δ) group complexity; (4) The group complexity of the authenticated data structure auth(D) is O(m+M).

Security, Protocols and Applications

In this section we give an overview of the security analysis of our ADS scheme, describe how it can be employed to provide verification protocols in the three-party [36] (FIG. 2) and two-party [30] (FIG. 1) authentication models, and finally discuss some concrete applications.

Security proof sketch. We provide some key elements of the security of our verification protocols, focusing on set intersection queries. The security proofs of the other set operations share similar ideas. Let D₀ be a sets collection data structure consisting of m sets S₁, S₂, . . . , S_m,¹⁴ and consider our ADS scheme {genkey, setup, update, refresh, query, verify}. Let k be the security parameter and let {sk, pk}←genkey(1^k). The adversary is given the public key pk, namely {h(•), p, e, g, g^{s}, . . . , g^{s^q}}, and unlimited access to all the algorithms of the scheme, except for setup and update, to which he only has oracle access. The adversary initially outputs the authenticated data structure auth(D₀) and the digest d₀, through an oracle call to algorithm setup. Then the adversary picks a polynomial number of updates u_i (e.g., insertion of an element x into a set S_r) and outputs the data structure D_i, the authenticated data structure auth(D_i) and the digest d_i through oracle access to update. Then he picks a set of indices q={1, 2, . . . , t} (wlog), all between 1 and m, and outputs a proof Π(q) and an answer α(q)≠S₁∩S₂∩ . . . ∩S_t, which is rejected by check as incorrect. Suppose the answer α(q) contains d elements. The proof Π(q) contains (i) some coefficients b₀, b₁, . . . , b_d; (ii) some accumulation values acc_j with some respective correctness proofs Π_j, for j=1, . . . , t; (iii) some subset witnesses W_j with some completeness witnesses F_j, for j=1, . . . , t (this is what algorithm verify expects as input).

¹⁴ Note here that since the sets are picked by the adversary, we have to make sure that no element in any set is equal to s, the trapdoor of the scheme (see the definition of the bilinear-map accumulator domain). However, this event occurs with negligible probability, since the sizes of the sets are polynomially bounded and s is chosen at random from a domain of exponential size.

Suppose verify accepts. Then: (i) By Lemma 3, b₀, b₁, . . . , b_d are indeed the coefficients of the polynomial ∏_{x∈α(q)}(x+s), except with negligible probability; (ii) By Lemma 4, values acc_j are indeed the accumulation values of sets S_j, except with negligible probability; (iii) By Lemma 1, values W_j are indeed the subset witnesses for set α(q) (with reference to S_j), i.e., W_j=g^{P_j(s)}, except with negligible probability; (iv) However, P₁(s), P₂(s), . . . , P_t(s) are not coprime, since α(q) is incorrect and therefore cannot contain all the elements of the intersection. Thus the polynomials P₁(s), P₂(s), . . . , P_t(s) (Equation 8) have at least one common factor, say (r+s), and it holds that P_j(s)=(r+s)Q_j(s) for some polynomials Q_j(s) (computable in polynomial time), for all j=1, . . . , t. By the verification of Equation 10 (completeness condition), we have

$\begin{matrix} e (g, g) = \prod_{j = 1}^{t} e (W_{j}, F_{j}) \\ = \prod_{j = 1}^{t} e (g^{P_{j} (s)}, F_{j}) \\ = \prod_{j = 1}^{t} e (g^{(r + s) Q_{j} (s)}, F_{j}) \\ = \prod_{j = 1}^{t} {e (g^{Q_{j} (s)}, F_{j})}^{(r + s)} \\ = {(\prod_{j = 1}^{t} e (g^{Q_{j} (s)}, F_{j}))}^{(r + s)} . \end{matrix}$

Therefore we can derive an (r+s)-th root of e(g, g) as

${e (g, g)}^{\frac{1}{r + s}} = \prod_{j = 1}^{t} e (g^{Q_{j} (s)}, F_{j}) .$

This means that if the intersection α(q) is incorrect and all the verification tests are satisfied, we can derive a polynomial-time algorithm that outputs a bilinear q-strong Diffie-Hellman challenge (r, e(g, g)^{1/(r+s)}) for an element r such that (r+s) is a common factor of the polynomials P₁(s), P₂(s), . . . , P_t(s), which by Assumption 1 happens with probability neg(k). This concludes an outline of the proof strategy for the case of intersection.

Protocols. As mentioned in the introduction, our ADS scheme can be used by a verification protocol in the three-party model [36] (see FIG. 2). Here, referring to FIG. 5, a trusted entity, called source, owns a sets collection data structure D_h, but desires to outsource query answering in a trustworthy (verifiable) way. As shown in FIG. 6, the source runs genkey and setup and outputs the authenticated data structure auth(D_h) along with the digest d_h. The source subsequently signs the digest d_h, and it outsources auth(D_h), D_h, the digest d_h and its signature to some untrusted entities, called servers, as shown in FIG. 7. On input a data structure query q (e.g., an intersection query) sent by clients, the servers use auth(D_h) and D_h to compute proofs Π(q), by running algorithm query, and they return to the clients Π(q) and the signature on d_h along with the answer α(q) to q (see FIGS. 8 and 9). Clients can verify these proofs Π(q) by running algorithm verify (since they have access to the signature of d_h, they can verify that d_h is authentic). When there is an update in the data structure (issued by the source), the source uses algorithm update to produce the new digest d′_h to be used in subsequent verifications, while the servers update the authenticated data structure through refresh.

Additionally, our ADS scheme can also be used by a non-interactive verification protocol in the two-party model [30], as shown in FIG. 1. In this case, the source and the client coincide, i.e., the client issues both the updates and the queries, and it is required to keep only constant state, i.e., the digest of the authenticated data structure. Whenever there is an update by the client, the client retrieves a verifiable, constant-size portion of the authenticated data structure that is used for locally performing the update and for computing the new local state, i.e., the new digest. A non-interactive two-party protocol that uses an ADS scheme for a data structure D is directly comparable with the recent protocols for verifiable computing [1, 12, 16] for the functionalities offered by the data structure D, e.g., computation of intersection, union, etc. Due to space limitations, we defer the detailed description of these protocols to the full version of the paper.

Furthermore, our ADS scheme can also be used by a non-interactive verification protocol in the multi-party model [30], as shown in FIG. 3, that is, where there is more than one trusted source. In this instance, the multiple sources must synchronize with one another whenever one of them issues an update to the server, so that they maintain a consistent collection of sets and can produce a digest that is verifiable across all sets.

Applications. First of all, our scheme can be used to verify keyword-search queries implemented by the inverted index data structure [4]: Each term in the dictionary corresponds to a set in our sets collection data structure, which contains all the documents that include this term. A usual text query for terms m₁ and m₂ returns those documents that are included in both the sets that are represented by m₁ and m₂, i.e., their intersection. Moreover, the derived authenticated inverted index can be efficiently updated as well. However, sometimes in keyword searches (e.g., keyword searches in the email inbox) it is desirable to introduce a “second” dimension: For example, a query could be “return emails that contain terms m₁ and m₂ and which were received between time t₁ and t₂”, where t₁<t₂. We call this variant a timestamped keyword-search, which is shown in FIG. 4. One solution for verifying such queries could be to embed a timestamp in the documents (e.g., each email message) and have the client do the filtering locally, after he has verified, using our scheme, the intersection of the sets that correspond to terms m₁ and m₂. However, this approach is not operation-sensitive: The intersection can be bigger than the set output after the local filtering, making this solution inefficient. To overcome this inefficiency, we can use a segment-tree data structure [35], verifying in this way timestamped keyword-search queries efficiently with O(t log r+δ) complexity, where r is the total number of timestamps we are supporting. This involves building a binary tree T on top of sets of messages sent at certain timestamps and requiring each internal node of T be the union of messages stored in its children. Finally, our method can be used for verifying equi-join queries over relational tables, which boil down to set intersections.
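The keyword-search application, and the reason local timestamp filtering is not operation-sensitive, can be illustrated with plain (unauthenticated) sets. The index contents, document identifiers, timestamps and function names below are all made up for the example; no verification is performed here, only the set operations whose results a client would have to check.

```python
# Hypothetical inverted index: term -> set of document ids containing it
inverted_index = {
    "set": {1, 2, 4, 7},
    "proof": {2, 4, 5, 7},
}
# Hypothetical arrival time of each document (e.g., an email message)
received_at = {1: 10, 2: 20, 4: 35, 5: 40, 7: 55}

def keyword_query(index, terms):
    """Documents containing every term: the intersection of the posting sets."""
    return set.intersection(*(index[t] for t in terms))

def timestamped_query(index, timestamps, terms, t1, t2):
    """Naive variant: intersect first, then filter by timestamp locally."""
    hits = keyword_query(index, terms)
    return hits, {d for d in hits if t1 <= timestamps[d] <= t2}

hits, filtered = timestamped_query(inverted_index, received_at, ["set", "proof"], 30, 40)
print(sorted(hits), sorted(filtered))  # [2, 4, 7] [4]
```

In this toy instance the client would have to verify an intersection of three documents in order to keep a single one, which is the gap the segment-tree approach is meant to close.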

Conclusion

In this paper, we presented an authenticated data structure for the optimal verification of set operations. The achieved efficiency is mainly due to new, extended security properties of accumulators based on pairing-based cryptography. Our solution provides two important properties, namely public verifiability and dynamic updates, as opposed to existing protocols in the verifiable computing model that provide generality and secrecy, but verifiability in a static, secret-key setting only.

A natural question to ask is whether outsourced verifiable computations with secrecy and efficient dynamic updates are feasible. Analogously, it is interesting to explore whether other specific functionalities (beyond set operations) can be optimally and publicly verified. Finally, according to a recently proposed definition of optimality [33], our construction is nearly optimal: verification and updates are optimal, but not queries. It is interesting to explore whether an optimal authenticated sets collection data structure exists, i.e., one that asymptotically matches the bounds of the plain sets collection data structure, reducing the query time from O(N log²N) to O(N).

It would be appreciated by those skilled in the art that various changes and modifications can be made to the illustrated embodiments without departing from the spirit of the present invention. All such modifications and changes are intended to be within the scope of the present invention except as limited by the scope of the appended claims.

REFERENCES

[1]B. Applebaum, Y. Ishai, and E. Kushilevitz. From secrecy to soundness: Efficient verification via secure computation. In Int. Colloquium on Automata, Languages and Programming (ICALP), pp. 152-163, 2010.

[2]M. J. Atallah, Y. Cho, and A. Kundu. Efficient data authentication in an environment of untrusted third-party distributors. In Int. Conference on Data Engineering (ICDE), pp. 696-704, 2008.

[3]M. H. Au, P. P. Tsang, W. Susilo, and Y. Mu. Dynamic universal accumulators for DDH groups and their application to attribute-based anonymous credential systems. In RSA, Cryptographers' Track (CT-RSA), pp. 295-308, 2009.

[4]R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, Reading, Mass., 1999.

[5]M. Bellare and D. Micciancio. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology (EUROCRYPT), pp. 163-192, 1997.

[6]S. Benabbas, R. Gennaro, and Y. Vahlis. Verifiable delegation of computation over large datasets. In Int. Cryptology Conference (CRYPTO), 2011.

[7]M. Blum, W. S. Evans, P. Gemmell, S. Kannan, and M. Naor. Checking the correctness of memories. Algorithmica, 12(2/3):225-244, 1994.

[8]D. Boneh and X. Boyen. Short signatures without random oracles and the SDH assumption in bilinear groups. J. Cryptology, 21(2):149-177, 2008.

[9]D. Boneh and B. Waters. Conjunctive, subset, and range queries on encrypted data. In Theoretical Cryptography Conference (TCC), pp. 535-554, 2007.

[10]S. Canard and A. Gouget. Multiple denominations in e-cash with compact transaction data. In Financial Cryptography (FC), pp. 82-97, 2010.

[11]K.-M. Chung, Y. Kalai, F.-H. Liu, and R. Raz. Memory delegation. In Int. Cryptology Conference (CRYPTO), 2011.

[12]K.-M. Chung, Y. Kalai, and S. Vadhan. Improved delegation of computation using fully homomorphic encryption. In Int. Cryptology Conference (CRYPTO), pp. 483-501, 2010.

[13]I. Damgard and N. Triandopoulos. Supporting non-membership proofs with bilinear-map accumulators. Cryptology ePrint Archive, Report 2008/538, 2008. http://eprint.iacr.org/.

[14]C. Dwork, M. Naor, G. N. Rothblum, and V. Vaikuntanathan. How efficient can memory checking be? In Theoretical Cryptography Conference (TCC), pp. 503-520, 2009.

[15]M. J. Freedman, K. Nissim, and B. Pinkas. Efficient private matching and set intersection. In Advances in Cryptology (EUROCRYPT), pp. 1-19, 2004.

[16]R. Gennaro, C. Gentry, and B. Parno. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Int. Cryptology Conference (CRYPTO), pp. 465-482, 2010.

[17]M. T. Goodrich, R. Tamassia, and J. Hasic. An efficient dynamic and distributed cryptographic accumulator. In Information Security Conference (ISC), pp. 372-388, 2002.

[18]M. T. Goodrich, R. Tamassia, and A. Schwerin. Implementation of an authenticated dictionary with skip lists and commutative hashing. In DARPA Information Survivability Conference and Exposition II (DISCEXII), pp. 68-82, 2001.

[19]M. T. Goodrich, R. Tamassia, and N. Triandopoulos. Super-efficient verification of dynamic outsourced databases. In RSA, Cryptographers' Track (CT-RSA), pp. 407-424, 2008.

[20]M. T. Goodrich, R. Tamassia, and N. Triandopoulos. Efficient authenticated data structures for graph connectivity and geometric search problems. Algorithmica, 60(3):505-552, 2011.

[21]D. Kratsch, R. M. McConnell, K. Mehlhorn, and J. P. Spinrad. Certifying algorithms for recognizing interval graphs and permutation graphs. In Symposium on Discrete Algorithms (SODA), pp. 158-167, 2003.

[22]J. Li, N. Li, and R. Xue. Universal accumulators with efficient nonmembership proofs. In Applied Cryptography and Network Security (ACNS), pp. 253-269, 2007.

[23]C. Martel, G. Nuckolls, P. Devanbu, M. Gertz, A. Kwong, and S. G. Stubblebine. A general model for authenticated data structures. Algorithmica, 39(1):21-41, 2004.

[24]R. C. Merkle. A certified digital signature. In Int. Cryptology Conference (CRYPTO), pp. 218-238, 1989.

[25]Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory, 49(9):2213-2218, 2003.

[26]R. Morselli, S. Bhattacharjee, J. Katz, and P. J. Keleher. Trust-preserving set operations. In Int. Conference on Computer Communications (INFOCOM), 2004.

[27]M. Naor and K. Nissim. Certificate revocation and certificate update. In USENIX Security Symposium, pp. 217-228, 1998.

[28]L. Nguyen. Accumulators from bilinear pairings and applications. In RSA, Cryptographers' Track (CT-RSA), pp. 275-292, 2005.

[29]H. Pang and K.-L. Tan. Authenticating query results in edge computing. In Int. Conference on Data Engineering (ICDE), pp. 560-571, 2004.

[30]C. Papamanthou and R. Tamassia. Time and space efficient algorithms for two-party authenticated data structures. In Int. Conference on Information and Communications Security (ICICS), pp. 1-15, 2007.

[31]C. Papamanthou and R. Tamassia. Cryptography for efficiency: Authenticated data structures based on lattices and parallel online memory checking. Cryptology ePrint Archive, Report 2011/102, 2011. http://eprint.iacr.org/.

[32]C. Papamanthou, R. Tamassia, and N. Triandopoulos. Authenticated hash tables. In Int. Conference on Computer and Communications Security (CCS), pp. 437-448, 2008.

[33]C. Papamanthou, R. Tamassia, and N. Triandopoulos. Optimal authenticated data structures with multilinear forms. In Int. Conference on Pairing-Based Cryptography (PAIRING), pp. 246-264, 2010.

[34]F. P. Preparata and D. V. Sarwate. Computational complexity of Fourier transforms over finite fields. Mathematics of Computation, 31(139):740-751, 1977.

[35]F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, N.Y., 1985.

[36]R. Tamassia. Authenticated data structures. In European Symp. on Algorithms (ESA), pp. 2-5, 2003.

[37]R. Tamassia and N. Triandopoulos. Certification and authentication of data structures. In Alberto Mendelzon Workshop on Foundations of Data Management, 2010.

[38]Y. Yang, D. Papadias, S. Papadopoulos, and P. Kalnis. Authenticated join processing in outsourced databases. In Int. Conf. on Management of Data (SIGMOD), pp. 5-18, 2009.

[39]M. L. Yiu, Y. Lin, and K. Mouratidis. Efficient verification of shortest path search via authenticated hints. In Int. Conference on Data Engineering (ICDE), pp. 237-248, 2010.
