The present disclosure relates to outsourced data stream processing and, specifically, to validating outsourced data stream processing using an interactive proof.
Data processing involved with streaming data warehouses may involve processing operations on an arriving data stream. The data processing may be associated with a heavy infrastructure burden to handle large-volume data streams. An owner of the data stream may accordingly choose to outsource the data processing to a data service provider. The data stream owner may desire validation that the data service provider has correctly performed processing operations on the outsourced data stream.
In a first aspect, disclosed methods, systems, devices, and software enable validating outsourced processing of a data stream arriving at a streaming data warehouse of a data service provider. In one embodiment, a verifier acting on behalf of a data owner of the data stream may, using a proof protocol, interact with a prover acting on behalf of the data service provider. The verifier may calculate a first root hash value of a binary tree during single-pass processing of the original data stream. A second root hash value may be calculated using the proof protocol between the verifier and the prover. The prover may be requested to provide certain queried values before receiving random numbers used to generate subsequent responses dependent on the provided values. The proof protocol may be used to validate the data processing performed by the data service provider.
In one aspect, a disclosed method for validating a data stream arriving at a data service provider includes sending a query to a prover associated with the data service provider for a first data block associated with the data stream, and receiving, from the prover, the first data block and a sibling data block. The sibling data block may be a sibling of the first data block in a lowest level of a binary tree of hash values having a number of levels. The method may further include calculating a current hash value using the first data block, the sibling data block, and a first random number. The first random number may be kept confidential with respect to the prover. After calculating the current hash value, the method may further include processing a next level in the binary tree of hash values. Processing the next level may include a) sending, to the prover, a current random number and a request for a sibling hash value, b) receiving the sibling hash value, c) determining a subsequent random number, and d) calculating a subsequent hash value using the current hash value, the sibling hash value, and the subsequent random number. The subsequent random number may be kept confidential with respect to the prover.
In particular embodiments, the method may further include recursively repeating operations a)-d) for successive levels in the binary tree, using a preceding parent hash value as the current hash value and a new random number as the current random number, until a root hash value for the binary tree is obtained. The number of levels in the binary tree may depend on a number of data sources associated with the data stream. The method operation of sending the query to the prover for the first data block may include sending a query selected from the group of data stream queries consisting of: index, predecessor, dictionary, range, range-sum, self join size, frequency moment, and inner product.
In certain embodiments, the method may further include comparing the obtained root hash value and a previously stored root hash value for a match. When the match is observed, the method may include validating the data stream and that the prover correctly provided the first data block. When the match is not observed, the method may include invalidating the data stream and determining that the prover did not correctly provide at least one value associated with the binary tree. Based on the number of levels in the binary tree, the method may include calculating a root hash value for the data stream arriving at the data service provider, and storing the calculated root hash value.
In another aspect, a disclosed computer system for validating a data stream includes a processor configured to access memory media. The memory media may include instructions executable by the processor to store a first root hash for the data stream, wherein the first root hash includes information for a number of levels in a binary tree, calculate a second root hash for the data stream using a proof protocol and a prover associated with a data service provider receiving the data stream, and determine whether the first root hash matches the second root hash. The proof protocol may represent instructions executable by the processor to perform the following operations: a) receive, from the prover, a first data block in the binary tree and a second data block that is a sibling of the first data block, b) calculate a first hash value using the first data block, the second data block, and a first random number that may be kept confidential with respect to the prover, and c) after calculating the first hash value, calculate a hash value for a next level in the binary tree. Calculating the hash value for the next level may further include processor executable instructions to send, to the prover, a random number used to calculate a previous hash value and a request for a sibling hash value, receive the sibling hash value from the prover, and calculate a parent hash value using the previous hash value, the sibling hash value, and a next random number that may be kept confidential with respect to the prover. The proof protocol may also represent instructions executable by the processor to further recursively execute operation c) above for successive levels in the binary tree, using a preceding parent hash value as the previous hash value and a new random number as the next random number, until the second root hash value for the binary tree is obtained.
In given embodiments, the memory media may further include instructions executable by the processor to validate that the data service provider is accurately processing the data stream when the first root hash matches the second root hash. When the first root hash does not match the second root hash, the processor executable instructions may be executable to return a fail result for the proof protocol. The number of levels in the binary tree may depend on a number of data sources associated with the data stream, while each data source may contribute a respective data block to the data stream. The proof protocol may further include instructions executable by the processor to send, prior to operation a), a query to the prover for the first data block. The query may include a query selected from the group of data stream queries consisting of: index, predecessor, dictionary, range, range-sum, self join size, frequency moment, and inner product. A hash value may be calculated using values for a sibling pair, along with a random variable R and a prime number P. A range for R and a value for P may be selected based on a desired degree of security for the proof protocol, while a larger range of R and a greater value of P may increase the degree of security.
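By way of illustration only, and not as a description of any particular embodiment, the following Python sketch counts, for small primes, how many values of R cause two different sibling pairs to produce the same hash value. The sketch assumes that the pair hash takes the form (X + R*Y) modulo P, as given in Equation 1 later in this description, and that R is drawn from 0 to P−1. The function names pair_hash, colliding_randoms, and max_collisions are hypothetical.

```python
def pair_hash(x, y, r, p):
    """Hash a sibling pair as (x + r*y) mod p (assumed form of the pair hash)."""
    return (x + r * y) % p


def colliding_randoms(pair_a, pair_b, p):
    """Return every r in 0..p-1 for which the two sibling pairs hash alike."""
    return [r for r in range(p)
            if pair_hash(*pair_a, r, p) == pair_hash(*pair_b, r, p)]


def max_collisions(p):
    """Largest number of colliding r values over all distinct pairs below p."""
    pairs = [(x, y) for x in range(p) for y in range(p)]
    worst = 0
    for i, a in enumerate(pairs):
        for b in pairs[i + 1:]:
            worst = max(worst, len(colliding_randoms(a, b, p)))
    return worst
```

Under these assumptions, at most one value of R out of P makes two distinct pairs collide, which is consistent with a larger range of R and a greater value of P reducing the chance that an incorrect value goes undetected.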
In yet another aspect, disclosed computer-readable memory media include instructions for validating a data stream arriving at a data service provider. The instructions may be executable to store a first root hash value for the data stream, and calculate a second root hash value for the data stream using a proof protocol and a prover associated with the data service provider. The first root hash value may be derived based on a number of levels in a binary tree. The proof protocol may include instructions executable to perform the following operations: a) receive, from the prover, a first data block in the binary tree and a second data block that is a sibling of the first data block, b) calculate a first hash value using the first data block, the second data block, and a first random number that is kept confidential with respect to the prover, and c) after calculating the first hash value, process a next level in the binary tree. Processing the next level may further include instructions executable to send, to the prover, a random number used to calculate a current hash value, receive a sibling hash value from the prover, and calculate a parent hash value using the current hash value, the sibling hash value, and a next random number that may be kept confidential with respect to the prover. The proof protocol may also include instructions executable to recursively execute operation c) above for successive levels in the binary tree, using a preceding parent hash value as the current hash value and a new random number as the next random number, until the second root hash value for the binary tree is obtained. The memory media may further include instructions executable to determine whether the first root hash value matches the second root hash value.
In some embodiments, the memory media may further include instructions executable to validate that the prover accurately reproduced data blocks associated with the data stream when the first root hash matches the second root hash, and return a fail result for the proof protocol when the first root hash value does not match the second root hash value. The proof protocol may further include instructions executable to send, prior to operation a), an indication of the proof protocol to the prover specifying instructions executable by the prover to respond to the proof protocol.
In particular embodiments, the memory media may still further include instructions executable to generate new random numbers for calculating the first root hash value for respective levels in the binary tree after executing the proof protocol, and record an indication of the random numbers associated with the first root hash value for use in subsequent iterations of the proof protocol.
In the following description, details are set forth by way of example to facilitate discussion of the disclosed subject matter. It should be apparent to a person of ordinary skill in the field, however, that the disclosed embodiments are exemplary and not exhaustive of all possible embodiments.
Throughout this disclosure, a hyphenated form of a reference numeral refers to a specific instance of an element and the un-hyphenated form of the reference numeral refers to the element generically or collectively. Thus, for example, widget 12-1 refers to an instance of a widget class, which may be referred to collectively as widgets 12 and any one of which may be referred to generically as a widget 12.
Turning now to the drawings,
Streaming data processing system 100 may represent data processing that can be performed using a streaming data warehouse (not explicitly shown in
In
In
Also shown in
As shown in
The queries sent by verifier 112 to prover 114 may be any one or more of a variety of data stream queries that involve obtaining information about data stream 130. Examples of data stream queries that may be requested by verifier 112 include: index queries, predecessor queries, dictionary queries, range queries, range-sum queries, self join size queries, frequency moment queries, and inner product queries, which are defined in more detail below. Such data stream queries, among other queries that may be sent, make proof protocol 110 suitable for a wide variety of possible applications. In the example data stream queries described below, the universe from which all data elements are drawn is defined as [u]={0, . . . , u−1}.
INDEX QUERY: Given a data stream of u elements b1, . . . , bu, and an index q, the correct response is bq.
PREDECESSOR QUERY: Given a data stream of n elements in universe [u] and a query q∈[u], the correct response is the largest p in the data stream such that p≤q. It is assumed that 0 appears in [u].
DICTIONARY QUERY: Given a data stream of n≤u elements of (key, value) pairs, where both the key and the value are drawn from universe [u] and all keys are distinct, and a query q∈[u]. When q is one of the keys, the correct response is the value paired with q. When q is not one of the keys, the correct response is “not found”.
RANGE QUERY: Given a data stream of n elements in universe [u] and a range query [qL, qR], the correct response is the set of all elements in the data stream between qL and qR inclusive.
RANGE-SUM QUERY: Given a data stream of n≤u elements of (key, value) pairs, where both the key and the value are drawn from universe [u] and all keys are distinct, and a range query [qL, qR], the correct response is the sum of all values in the data stream between qL and qR inclusive.
SELF JOIN SIZE QUERY: Given a data stream of n elements in universe [u], the correct response is the result of the computation
$$\sum_{i \in [u]} a_i^{2},$$
where ai is the number of occurrences of i in the data stream. This query may also be referred to as the second frequency moment.
k-th FREQUENCY MOMENT QUERY: Given a data stream of n elements in universe [u], and an integer k≥1, the correct response is the result of the computation
$$\sum_{i \in [u]} a_i^{k},$$
where ai is the number of occurrences of i in the data stream.
INNER PRODUCT (JOIN SIZE) QUERY: Given two data streams A and B in universe [u], having respective frequency vectors (a1, . . . , au) and (b1, . . . , bu), the correct response is the result of the computation
$$\sum_{i=1}^{u} a_i b_i.$$
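By way of illustration only, the correct responses to the data stream queries defined above may be computed directly over a fully stored stream, as in the following Python sketch. The sketch is a reference computation, not the proof protocol itself, and rests on several assumptions: the function names are hypothetical, the index query uses the 1-based numbering b1, . . . , bu, the range query returns a sorted list, and the range-sum query is interpreted as summing the values whose keys lie in [qL, qR].

```python
from collections import Counter


def index_query(stream, q):
    """Return b_q, using the 1-based numbering b_1..b_u."""
    return stream[q - 1]


def predecessor_query(stream, q):
    """Largest p in the stream with p <= q, or None if there is none."""
    candidates = [p for p in stream if p <= q]
    return max(candidates) if candidates else None


def dictionary_query(pairs, q):
    """Value paired with key q, or "not found"."""
    return dict(pairs).get(q, "not found")


def range_query(stream, q_lo, q_hi):
    """All stream elements between q_lo and q_hi inclusive, sorted."""
    return sorted(x for x in stream if q_lo <= x <= q_hi)


def range_sum_query(pairs, q_lo, q_hi):
    """Sum of values whose keys lie in [q_lo, q_hi] (assumed interpretation)."""
    return sum(v for k, v in pairs if q_lo <= k <= q_hi)


def frequency_moment(stream, k):
    """Sum over i of a_i**k, where a_i counts occurrences of i."""
    return sum(count ** k for count in Counter(stream).values())


def self_join_size(stream):
    """Second frequency moment."""
    return frequency_moment(stream, 2)


def inner_product(stream_a, stream_b):
    """Sum over i of a_i * b_i for the two frequency vectors."""
    freq_a, freq_b = Counter(stream_a), Counter(stream_b)
    return sum(freq_a[i] * freq_b[i] for i in freq_a)
```

For example, for the hypothetical stream 3, 1, 4, 1, 5, 9, 2, 6, the element 1 occurs twice and every other element once, so the self join size is 4 + 6 = 10.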
Referring to
It is noted that one example of data sources 110 contributing to data stream 130, as shown in
In
$$(X + R \cdot Y) \bmod P \qquad \text{Equation [1]},$$
where R is a random number and P is a prime number. It will be appreciated that Equation 1 will result in values in the range of 0 to P−1, inclusive. As the value of P and the range from which R is drawn are increased, a level of security associated with the resulting hash value given by Equation 1 will also increase, since the likelihood that a given hash value could be guessed or could result from a different value in Equation 1 will decrease. The random number R is the same for a given level of binary tree 200. In
$$([203\text{-}3] + \mathrm{RND1} \cdot [203\text{-}4]) \bmod P \qquad \text{Equation [2]},$$
where [203-3] represents a value (X) of data block 203-3, [203-4] represents a value (Y) of data block 203-4, RND1 is a random number for the second level, and P is a prime number.
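By way of illustration only, the following Python sketch applies the pair hash of Equation 1 level by level, with one random number per level, to reduce a set of data block values to a single root hash value. The leaf values, random numbers, and prime used in the example are hypothetical, as are the function names. The sketch also includes a single-pass variant, which is an assumption not spelled out in this passage: because the pair hash is linear, each leaf contributes its value multiplied by the random numbers of those levels at which it lies on the right-hand (Y) side.

```python
def pair_hash(x, y, r, p):
    """Equation 1: (X + R*Y) modulo P."""
    return (x + r * y) % p


def root_hash(leaves, randoms, p):
    """Hash sibling pairs level by level; randoms[j] is shared by level j."""
    if len(leaves) != 2 ** len(randoms):
        raise ValueError("need 2**len(randoms) leaves")
    level = list(leaves)
    for r in randoms:
        level = [pair_hash(level[i], level[i + 1], r, p)
                 for i in range(0, len(level), 2)]
    return level[0]


def streaming_root_hash(indexed_values, randoms, p):
    """Single-pass variant (assumed): accumulate one weighted term per leaf.

    indexed_values yields (leaf_index, value) pairs in any order.
    """
    total = 0
    for index, value in indexed_values:
        weight = 1
        for j, r in enumerate(randoms):
            if (index >> j) & 1:  # leaf sits on the Y side at level j
                weight = weight * r % p
        total = (total + value * weight) % p
    return total
```

With hypothetical leaf values 5, 3, 8, 6, random numbers 2 and 7, and P = 11, the third and fourth leaves hash to (8 + 2*6) modulo 11 = 9, in the manner of Equation 2, and both functions give a root hash value of 8.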
In
Turning now to
In
Then proof phase 332, as shown in
Turning now to
Method 400 may begin by configuring (operation 402) a verifier for single-pass processing of a data stream sent for processing to a data service provider, including calculating a current root hash value. It is noted that verifier 112 may compute the current root hash value according to binary tree 200 (see
Turning now to
Method 500 may begin by receiving (operation 502) a preceding hash value, and by receiving (operation 504) a preceding random number. The preceding hash value and the preceding random number may represent inputs to method 500. The preceding random number and a request for a sibling hash value for the preceding hash value may be sent (operation 506) to the prover. It is noted that, until operation 506 is performed, the preceding random number may be kept confidential from the prover. The sibling hash value may be received (operation 508) from the prover. A parent hash value may be calculated (operation 510) using the preceding hash value, the sibling hash value, and a next random number kept confidential with respect to the prover. Then, the parent hash value may be output (operation 512) and the next random number may be output (operation 514). The values output in one iteration of method 500 may be used as inputs in a successive iteration of method 500.
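By way of illustration only, one iteration of method 500 might be sketched in Python as follows. The sketch assumes the pair hash of Equation 1, models the prover as a callable that receives the revealed random number and returns the sibling hash value, and uses an is_right_child flag so that the preceding hash value is placed on the correct side of the pair. The names process_level, request_sibling, and is_right_child are hypothetical.

```python
def pair_hash(left, right, r, p):
    """Equation 1 applied to a (left, right) sibling pair."""
    return (left + r * right) % p


def process_level(prev_hash, prev_rnd, request_sibling, is_right_child,
                  next_rnd, p):
    """One pass through operations 502-514 (sketch under stated assumptions).

    prev_hash, prev_rnd -- inputs received in operations 502 and 504
    request_sibling     -- stand-in for the prover: takes the revealed
                           random number, returns the sibling hash value
    next_rnd            -- random number for the next level, not yet
                           revealed to the prover
    """
    # Operations 506 and 508: reveal the preceding random number and
    # obtain the sibling hash value in return.
    sibling_hash = request_sibling(prev_rnd)

    # Operation 510: combine the pair in left/right order.
    if is_right_child:
        parent_hash = pair_hash(sibling_hash, prev_hash, next_rnd, p)
    else:
        parent_hash = pair_hash(prev_hash, sibling_hash, next_rnd, p)

    # Operations 512 and 514: outputs feed the next iteration.
    return parent_hash, next_rnd
```

Because the prover sees each random number only after it has committed to the values hashed with it, the returned pair can be passed directly as the inputs of the next iteration until the root is reached.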
Turning now to
The first root hash value may be compared (operation 602) with the second root hash value for a match. Then, in method 600, a decision may be made whether the root hash values match (operation 604). When the result of operation 604 is YES, a proof pass indication may be recorded (operation 606). Then, method 600 may validate (operation 608) that the prover correctly provided the first data block (see operation 410 in
When the result of operation 604 is NO, then a proof fail indication may be recorded (operation 612). Method 600 may determine (operation 614) that the prover did not provide at least one accurate value in the binary tree. Method 600 may also determine (operation 616) which data blocks were not accurately represented by the prover.
After operation 610 or operation 616, new random numbers for the levels of the binary tree may be generated and associated (operation 618) with a subsequent first root hash. Operation 618 may be performed to keep random numbers used in the binary tree confidential from the prover.
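By way of illustration only, the comparison of root hash values and the subsequent refresh of per-level random numbers might be sketched in Python as follows. The sketch assumes that the random numbers are drawn uniformly from 0 to P−1 using the standard-library secrets module. The function name conclude_proof and the string results are hypothetical.

```python
import secrets


def conclude_proof(first_root, second_root, num_levels, p):
    """Compare the two root hash values, then draw fresh level randoms.

    Returns a (result, new_randoms) pair, where result is "pass" or "fail"
    and new_randoms holds one unrevealed random number per tree level for
    use with a subsequent first root hash value.
    """
    # Operations 602-612: record a pass or fail indication.
    result = "pass" if first_root == second_root else "fail"

    # Operation 618: the old randoms were revealed during the proof, so
    # new ones are generated and kept confidential from the prover.
    new_randoms = [secrets.randbelow(p) for _ in range(num_levels)]
    return result, new_randoms
```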
Referring now to
Device 700, as depicted in
Device 700 is shown in
Memory media 710 encompasses persistent and volatile media, fixed and removable media, and magnetic and semiconductor media. Memory media 710 is operable to store instructions, data, or both. Memory media 710 as shown includes sets or sequences of instructions 724-2, namely, an operating system 712 and proof protocol 110. Operating system 712 may be a UNIX or UNIX-like operating system, a Windows® family operating system, or another suitable operating system. Instructions 724 may also reside, completely or at least partially, within processor 701 during execution thereof. It is further noted that processor 701 may be configured to receive instructions 724-1 from instructions 724-2 via shared bus 702. In some embodiments, memory media 710 is configured to store and provide executable instructions for executing proof protocol 110, as mentioned previously. For example, proof protocol 110 may be configured to execute proof protocol 300, method 400, method 500 and/or method 600. In certain embodiments, computing device 700 may represent an implementation of verifier 112 and/or data stream owner 102 (see
To the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited to the specific embodiments described in the foregoing detailed description.
Number | Date | Country | |
---|---|---|---|
20120143830 A1 | Jun 2012 | US |