The present disclosure relates to validating data processing and, specifically, to validating priority queue processing.
Validation of data processing may involve processing operations on data streams, such as those associated with a priority queue. The data processing may be associated with an infrastructure for handling large-volume data streams. An owner of a data stream may choose to outsource the data processing to a data service provider. The data stream owner may desire validation that the outsourced data service provider has correctly performed processing operations on the data stream.
Disclosed are methods for validating outsourced processing of a priority queue. The methods may include configuring a verifier for independent, single-pass processing of priority queue operations that include insertion operations and extraction operations and priorities associated with each operation. The verifier may be configured to validate N operations using a memory space having a size that is proportional to the square root of N by buffering the operations as a series of R epochs. Extractions associated with each individual epoch may be monitored using arrays Y and Z. Insertions for the epoch k may be monitored using arrays X and Z. The processing of the priority queue operations may be verified based on the equality or inequality of the arrays X, Y, and Z. Hashed values for the arrays may be used to test their equality in order to reduce storage requirements.
In one aspect, a disclosed method for validating a priority queue includes assigning a plurality of priority queue operations into a plurality of epochs. The priority queue operations include priority queue insertions and priority queue extractions. Each insertion and each extraction is associated with a corresponding priority. A set of variables is maintained to record information indicative of insertions and extractions assigned to the corresponding epochs. The set of variables may include two or more variables for each of the plurality of epochs. Correct operation of the priority queue may be validated based on the set of variables.
Assigning the plurality of priority queue operations to epochs may include buffering consecutive operations and canceling operation pairs, where an operation pair includes an insertion of a given priority and a subsequently occurring extraction, within the buffer, of the same given priority. Assigning priority queue operations to the plurality of epochs may include processing each of the plurality of epochs sequentially and maintaining an array indicative of a maximum priority of extractions assigned to the current epoch and to each of the previously processed epochs.
The plurality of priority queue operations may include N operations and, in some embodiments, assigning the plurality of priority queue operations to epochs results in the formation of R epochs where R is proportional to the square root of N. Hashing functions may be employed to produce hashing function representations of the set of variables. These representations of the set of variables may be stored or otherwise maintained to reduce storage requirements. In some embodiments, for example, the memory space required to process the plurality of priority queue operations is proportional or roughly proportional to the number of epochs. The hashing function may be a linear function with respect to an input on which it operates.
The set of variables may include a set of three variable arrays including an array X, an array Y, and an array Z, where each of the arrays is initialized to zero. The arrays may each include a set of rows including a row corresponding to each of the epochs and a set of columns including a column corresponding to each of the set of operation priorities.
Informally, X tracks the number of insertions of u assigned to epoch k before the first extraction of u that is assigned to epoch k. Y tracks the number of extractions of u assigned to epoch k minus the number of insertions of u assigned to epoch k from the first extraction of u assigned to epoch k onwards. A necessary condition is that these two counts should agree. However, this counting alone fails to detect extractions of u that appear before the corresponding insertions. Therefore, Z is used to identify the maximum “balance” of u during epoch k. This should also match X if the sequence is correct.
After each epoch is processed, f[k] may indicate a maximum value of a priority corresponding to an extraction that occurs after the k-th epoch. For each epoch k, the method may include processing each extraction in the epoch and then processing each insertion in the epoch. For each extraction in epoch k, the method may include assigning the extraction to an earliest epoch L consistent with its priority and the values of f[k], incrementing an element [L, u] of array Y, assigning a value max(Y[L, u], Z[L, u]) to an element [L, u] of array Z, and assigning a value max(u, f[i]) to f[i] for each i in the range of {1 . . . (k−1)}. For each insertion in epoch k, the method may include assigning the insertion to an earliest epoch L consistent with its priority and the values of f[k], incrementing an element [L, u] of array X when f[L]<u, and decrementing an element [L, u] of array Y when f[L]=u. Priority queue validation may then be performed based on simple manipulations of arrays X, Y, and Z.
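By way of illustration only, the per-epoch steps summarized above may be sketched in Python as follows. The function names `validate_epochs` and `earliest_epoch`, the use of dictionaries in place of arrays X, Y, and Z, and the assumption that priority values are positive integers are all hypothetical choices made for this sketch and are not taken from the disclosure. The sample input reproduces the four-epoch example discussed later in this description, with each epoch already separated into extractions followed by insertions.

```python
from collections import defaultdict


def earliest_epoch(f, u, num_epochs):
    """Return the smallest L in 1..num_epochs with f[L] <= u (L = min[k : f[k] <= u])."""
    for k in range(1, num_epochs + 1):
        if f[k] <= u:
            return k
    raise ValueError("no epoch is consistent with priority %r" % (u,))


def nonzero(table):
    """Drop zero entries so that dictionaries compare like zero-initialized arrays."""
    return {key: value for key, value in table.items() if value != 0}


def validate_epochs(epochs):
    """epochs: list of (extractions, insertions) pairs, one pair per epoch.

    Returns (ok, f) where ok is True when X, Y, and Z agree, and f is the
    final list of f[1..R] values.
    """
    num_epochs = len(epochs)
    X, Y, Z = defaultdict(int), defaultdict(int), defaultdict(int)
    f = [0] * (num_epochs + 1)  # 1-indexed; f[0] is unused

    for k in range(1, num_epochs + 1):
        extractions, insertions = epochs[k - 1]
        for u in extractions:
            L = earliest_epoch(f, u, num_epochs)
            Y[L, u] += 1
            Z[L, u] = max(Y[L, u], Z[L, u])
            for i in range(1, k):  # i in {1 .. k-1}
                f[i] = max(u, f[i])
        for u in insertions:
            L = earliest_epoch(f, u, num_epochs)
            if f[L] < u:
                X[L, u] += 1
            elif f[L] == u:
                Y[L, u] -= 1

    ok = nonzero(X) == nonzero(Y) == nonzero(Z)
    return ok, f[1:]


# Operations of the four-epoch example: (extractions, insertions) per epoch.
example = [
    ([], [100, 70]),
    ([70, 100], [20, 30, 50]),
    ([20, 30], [20, 30, 20]),
    ([20, 20, 30, 50], []),
]
ok, final_f = validate_epochs(example)
print(ok, final_f)
```

In this sketch the arrays are held in full, so the memory saving described elsewhere herein is not realized; the sketch is intended only to make the order of the updates concrete.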
The first epoch may exclude extractions and the final epoch may exclude insertions. In some implementations, each of the R epochs may be equally-sized with respect to a number of operations. In other cases, each of the R epochs may be equally-sized with respect to a period of time.
In some embodiments, the priority queue operations are validated as correctly processed when array X=array Z and array X=array Y. Representations of the arrays X, Y, and Z may be stored using an array hashing function to reduce storage requirements. The array hashing function may be a linear function with respect to an input array. In some cases, a memory space required for processing N priority queue operations may be proportional to the square root of N.
In another aspect, a disclosed computer system for validating priority queue operations includes a processor configured to access memory media that include processor executable instructions. The instructions include instructions to perform the method operations described above.
In some embodiments, disclosed methods include, for each epoch k, determining a set of operations I that are inserted but not extracted during epoch k, determining a set of operations E that are extracted without a matching insertion operation during epoch k, and cancelling out all remaining operations in epoch k except for the set I and the set E.
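As a non-limiting sketch, the cancellation of operation pairs within one buffered epoch may be illustrated in Python as follows. The function name `cancel_within_epoch`, the `("ins", u)` and `("ext", u)` tuple encoding, and the decision to raise an error when an extraction is preceded in the buffer by an unresolved insertion of higher priority are assumptions made for this illustration. Consistent with the convention used herein, a lower value of u denotes a higher priority.

```python
import heapq


def cancel_within_epoch(ops):
    """Cancel insertion/extraction pairs inside one buffered epoch.

    ops: list of ("ins", u) or ("ext", u) tuples in arrival order.
    Returns (E, I): extractions without a matching insertion in the buffer,
    and insertions that were not extracted within the buffer.
    """
    pending = []    # min-heap of unresolved insertions (smallest u first)
    unmatched = []  # extractions whose element was inserted in an earlier epoch

    for kind, u in ops:
        if kind == "ins":
            heapq.heappush(pending, u)
        elif kind == "ext":
            if pending and pending[0] == u:
                # Insertion followed by an extraction of the same priority.
                heapq.heappop(pending)
            elif pending and pending[0] < u:
                # A higher-priority element is still waiting in the buffer,
                # so this extraction cannot be correct (assumed handling).
                raise ValueError("extraction %r skips pending insertion %r"
                                 % (u, pending[0]))
            else:
                unmatched.append(u)
        else:
            raise ValueError("unknown operation kind %r" % (kind,))

    return unmatched, sorted(pending)


E, I = cancel_within_epoch(
    [("ins", 5), ("ext", 5), ("ins", 7), ("ext", 3), ("ins", 9)]
)
print(E, I)
```

The sketch omits any bookkeeping beyond the sets I and E and is not intended to reproduce every check that a particular embodiment may perform.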
In the following description, details are set forth by way of example to facilitate discussion of the disclosed subject matter. It should be apparent to a person of ordinary skill in the field, however, that the disclosed embodiments are exemplary and not exhaustive of all possible embodiments.
Turning now to the drawings,
Streaming data processing system 100 may represent data processing that can be performed using a data repository, such as a streaming data warehouse (not explicitly shown in
In
In
Referring to
In
[(priority value),(data element)]
wherein the priority value is represented by u, and the data element may include one or more data fields, such as a data value, a timestamp, a data record, a data array, a data identifier, a data pointer, etc. Since the priority value and the data element are in a fixed association for the purposes related to priority queue 204, the discussion herein shall explicitly refer to u as the priority value with respect to a priority queue operation. It will be understood that the priority value u is associated with a corresponding implicit data element in priority queue operations and in elements stored within priority queue 204. In the exemplary embodiments described and discussed herein, it will also be understood that the magnitude or value of u increases with decreasing priority; in other words, operations with lower values for u have a higher priority. In various embodiments of priority queue 204, other conventions and scales (not shown or further discussed herein) may be implemented for the priority value u.
In operation of priority queue 204, at an initial state, priority queue 204 may be empty and may initially accept one or more insertions 230, before an extraction 232 is output. During a normal operational state, priority queue 204 may accept insertions 230 and may provide extractions 232. Finally, during a terminal state, priority queue 204 may no longer accept insertions 230, but may output extractions 232 until priority queue 204 is again empty.
An insertion 230 may arrive at priority queue 204 in the form of a defined insertion operator (i.e., a function call) that includes (or specifies) the priority value u and the data element. An extraction 232 may be obtained from priority queue 204 using a defined extraction operator (i.e., a function call or a request) that requests priority queue 204 to output the highest priority element from the elements currently stored in priority queue 204. When priority queue 204 outputs the highest priority element currently stored in response to an extraction request, priority queue 204 may be considered as operating correctly.
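For comparison purposes only, the notion of correct operation described above may be illustrated by a straightforward replay that keeps every stored element in memory. The following Python sketch is a baseline and is not the disclosed validation method; the function name `replay_check` and the tuple encoding of operations are hypothetical. Lower values of u are treated as higher priority, consistent with the convention stated above.

```python
import heapq


def replay_check(ops):
    """Replay ("ins", u) / ("ext", u) operations against a local min-heap.

    Returns True when every extraction outputs the highest-priority
    (lowest u) element stored at that moment, and False otherwise.
    This sketch does not require the queue to be empty at the end.
    """
    heap = []
    for kind, u in ops:
        if kind == "ins":
            heapq.heappush(heap, u)
        elif kind == "ext":
            if not heap or heapq.heappop(heap) != u:
                return False
        else:
            raise ValueError("unknown operation kind %r" % (kind,))
    return True


print(replay_check([("ins", 100), ("ins", 70), ("ext", 70), ("ext", 100)]))
```

Because this replay stores every pending element, its memory use grows with the size of the queue, which is the cost that a reduced-memory verifier is intended to avoid.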
Also shown in
Turning now to
In PQ implementation 300 of
The underlying priority values u depicted in PQ implementation 300 are represented in Table 1 below, as ordered according to epochs 306 (no specific indication of time axis 302) and divided into extractions 320 and insertions 310.
As will now be described in further detail, the representation of extractions 320 and insertions 310, as shown in Table 1, may be used in a method for validating the operation of priority queue 204 by PQ validator 212 (see
In certain instances, a size of R may be selected for efficiency of processing such that:
R=N/R=sqrt(N) Equation 1.
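As one hedged illustration of Equation 1, a list of N operations may be divided into approximately sqrt(N) epochs of approximately sqrt(N) operations each. The helper name `split_into_epochs` is hypothetical, and the sketch assumes that N is known in advance and that the epoch size is rounded up to an integer.

```python
import math


def split_into_epochs(ops):
    """Split ops into consecutive epochs of ceil(sqrt(N)) operations each."""
    n = len(ops)
    if n == 0:
        return []
    size = math.isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
    return [ops[i:i + size] for i in range(0, n, size)]


epochs = split_into_epochs(list(range(16)))
print(len(epochs), [len(e) for e in epochs])
```

When N is not a perfect square, the final epoch in this sketch may be shorter than the others.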
Furthermore, arrays X, Y, Z, which are used in Algorithm 1 for tracking insertions 310 and validating extractions 320, may be stored in various forms, such as compressed, hashed, encrypted, etc. In one example, an array hashing function may be used to create a homomorphic fingerprint for arrays X, Y, Z. The array hashing function may be a linear function whose output is a linear function of an array input (i.e., array X, Y, or Z), such that an incremental (or decremental) value for the array hashing function can be calculated. One example of such an array hashing function that generates a homomorphic fingerprint is a polynomial hashing function described with respect to the Rabin-Karp string matching algorithm. As a result of the foregoing, it is noted that Algorithm 1 may be performed using a memory space that is proportional to sqrt (N).
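The linearity property noted above may be illustrated, under stated assumptions, by a polynomial fingerprint of the kind used in Rabin-Karp style hashing. In the Python sketch below, the modulus `P`, the evaluation point `r`, and the function names `fingerprint` and `bump` are hypothetical choices for illustration; a two-dimensional array indexed by [L, u] is assumed to have been flattened to a single index. Equal fingerprints indicate equal arrays only with high probability, and only when r is chosen at random and kept from the party being verified.

```python
P = (1 << 61) - 1  # a Mersenne prime used as the modulus (assumed choice)


def fingerprint(values, r):
    """Return sum(values[i] * r**i) mod P, evaluated by Horner's rule."""
    acc = 0
    for v in reversed(values):
        acc = (acc * r + v) % P
    return acc


def bump(fp, index, delta, r):
    """Update a fingerprint after values[index] changes by delta."""
    return (fp + delta * pow(r, index, P)) % P


r = 123456789  # fixed here for repeatability; a random r would be used in practice
a = [1, 0, 2, 5]
fp = fingerprint(a, r)
fp = bump(fp, 2, 1, r)  # increment element 2 without storing the array
print(fp == fingerprint([1, 0, 3, 5], r))
```

Because the fingerprint is linear in the array contents, an increment or decrement of a single element changes the fingerprint by a term that depends only on the index and the amount of the change, so the array itself need not be retained.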
In Algorithm 1, line 01 represents operations for receiving insertions 310 and extractions 320, as well as successively buffering each of a set of R epochs. In line 02, arrays X, Y, Z, which are functions of [L, u], along with function f[k], may be initialized to zero for all values. Lines 03-15 represent operations that are repeated for each epoch k. Lines 04-09 represent operations that are repeated for each extraction 320 in a given epoch k, while lines 10-14 represent operations that are repeated for each insertion 310 in epoch k. Line 16 represents a validation operation. Subject to the assignment of f[k] in line 08, after each epoch k is processed, f[k] indicates a maximum priority for extractions that occur after the k-th epoch (i.e., no extraction occurring after epoch k has a priority value (u) greater than f[k]).
An example of Algorithm 1 using input values from Table 1 will now be described in detail for a hypothetical set of operations, having five priority values, divided or buffered into four epochs (R=4). In Table 1, the values for u are given by the set {20, 30, 50, 70, 100}; since R=4, the values for L and k are given by the set {1, 2, 3, 4}. In line 01, the values in Table 1 may begin arriving, starting with Epoch 1 (k=1) and continue for successive epochs, which may be buffered and respectively separated into a set of extractions 320 followed by a set of insertions 310 (see also
An initialized state for f[k] may be given by:
f[k]=[0,0,0,0] Equation 2.
In line 03, processing of Epoch 1 (k=1) of Table 1 may begin with the selection of extractions 320 in Epoch 1 in line 04. Since no extractions 320 are present in Epoch 1, which represents an initial state of priority queue 204 (see
L=min[k:f[k]≦u] Equation 3.
In Equation 3, L is assigned the minimum value of k for all f[k]≦u. Since f[k]=[0, 0, 0, 0], as given by Equation 2, L=1 for the first insertion in Epoch 1. In line 12, an element [1, 100] is incremented in array X since f[1]=0 and u=100, so f[1]<u. Accordingly, line 13 does not result in a change in array Y for Epoch 1, since f[1]≠u. Lines 10-14 may be repeated for the second insertion in Epoch 1, u=70, resulting in an increment of the element [1, 70] for array X. At the end of processing for Epoch 1, arrays Y and Z remain at zero, as in Table 3, while array X is given by Table 4. It is further noted that at the end of processing for Epoch 1, f[k] remains unchanged and equals [0, 0, 0, 0], as given by Equation 2.
Processing of Epoch 2 (k=2) may continue in Algorithm 1 by returning to line 03. In line 04, extractions 320 in Epoch 2 begin with a first extraction for which u=70. In line 05, L may be assigned in substantially the same manner as in line 11, that is, according to Equation 3, which yields L=1. In line 06, element [1, 70] in array Y is incremented. In line 07, element [1, 70] in array Z is incremented. In line 08, i is given by the set {1} and f[k] is updated to f[k]=[70, 0, 0, 0]. Processing of Epoch 2 may continue by returning to line 04 for the second extraction for which u=100. Then, in line 06, element [1, 100] in array Y is incremented. In line 07, element [1, 100] in array Z is incremented. In line 08, i is given by the set {1} and f[k] is updated to:
f[k]=[100,0,0,0] Equation 4.
Then, processing of Epoch 2 continues with the selection of insertions 310 in Epoch 2 in line 10. The first insertion corresponds to a value u=20. In line 11, L is assigned the value L=2, as given by Equation 3, since at this point f[k]=[100, 0, 0, 0], as given by Equation 4. Since f[2]<u for u=20, an element [2, 20] is incremented in array X. Accordingly, line 13 does not result in a change in array Y for Epoch 2, since f[2]≠u. Lines 10-14 may be repeated for the second insertion and third insertion in Epoch 2, for u=30 and u=50, resulting in an increment of the elements [2, 30] and [2, 50] for array X. At the end of processing for Epoch 2, arrays Y and Z remain as in Table 4, while array X is given by Table 5. It is further noted that at the end of processing for Epoch 2, f[k]=[100, 0, 0, 0], as given by Equation 4.
Processing of Epoch 3 (k=3) may continue in Algorithm 1 by returning to line 03. In line 04, extractions 320 in Epoch 3 begin with a first extraction for which u=20. In line 05, L may be assigned according to Equation 3, which yields L=2. In line 06, element [2, 20] in array Y is incremented. In line 07, element [2, 20] in array Z is incremented. In line 08, i is given by the set {1, 2} and f[k] is updated to f[k]=[100, 20, 0, 0]. Processing of Epoch 3 may continue by returning to line 04 for the second extraction for which u=30. Then, in line 06, element [2, 30] in array Y is incremented. In line 07, element [2, 30] in array Z is incremented. In line 08, i is given by the set {1, 2} and f[k] is updated to:
f[k]=[100,30,0,0] Equation 5.
Thus, at line 09 for Epoch 3, arrays Y and Z are given by Table 6.
Then, processing of Epoch 3 continues with the selection of insertions 310 for Epoch 3 in line 10. The first insertion corresponds to a value u=20. In line 11, L is assigned the value L=3, as given by Equation 3, since at this point f[k]=[100, 30, 0, 0], as given by Equation 5. Since f[3]<u for u=20, an element [3, 20] is incremented in array X. At this point, line 13 does not result in a change in array Y for u=20, since f[3]≠u. Line 10 in Epoch 3 is then repeated for the second insertion in Epoch 3, which corresponds to a value u=30. In line 11, L is assigned the value L=2, as given by Equation 3, since f[k]=[100, 30, 0, 0]. Since f[2]=u for u=30, no element is incremented in array X in line 12, but rather, element [2, 30] is decremented in array Y, since f[2]=u for u=30. Lines 10-14 may be repeated for the third insertion in Epoch 3, u=20, resulting in a second increment of the element [3, 20] for array X. At the end of processing for Epoch 3, array Y is given by Table 7, array Z remains as in Table 6, while array X is given by Table 8. It is further noted that f[k]=[100, 30, 0, 0], as given by Equation 5.
Processing of Epoch 4 (k=4) may continue in Algorithm 1 by returning to line 03. In line 04, extractions 320 in Epoch 4 begin with a first extraction for which u=20. In line 05, L may be assigned according to Equation 3, which yields L=3. In line 06, element [3, 20] in array Y is incremented. In line 07, element [3, 20] in array Z is incremented. In line 08, i is given by the set {1, 2, 3} and f[k] is updated to f[k]=[100, 30, 20, 0]. Processing of Epoch 4 may continue by returning to line 04 for the second extraction for which u=20. Then, in line 06, element [3, 20] in array Y is incremented. In line 07, element [3, 20] in array Z is incremented. In line 08, f[k] does not change. Processing of Epoch 4 may continue by returning to line 04 for the third extraction for which u=30. In line 05, L may be assigned according to Equation 3, which yields L=2. In line 06, element [2, 30] in array Y is incremented. In line 07, since element [2, 30] in array Z is equal to the corresponding element in array Y, no change occurs. In line 08, i is given by the set {1, 2, 3} and f[k] is updated to f[k]=[100, 30, 30, 0]. Then, processing of Epoch 4 may continue by returning to line 04 for the fourth extraction for which u=50. In line 05, L may be assigned according to Equation 3, which yields L=2. In line 06, element [2, 50] in array Y is incremented. In line 07, element [2, 50] in array Z is incremented. In line 08, i is given by the set {1, 2, 3} and f[k] is updated to
f[k]=[100,50,50,0] Equation 6.
Equation 6 represents a final value for f[k] at line 16, while arrays X=Y=Z, corresponding to the values in Table 8, at the end of Algorithm 1. The processing of priority queue 204 (see
Turning now to
In the depicted embodiment, method 400 includes configuring (operation 402) a verifier for independent, single-pass processing of N priority queue operations associated with a data service provider. The N priority queue operations may include insertions 230 to priority queue 204 and extractions 232 from priority queue 204 (see
Turning now to
The embodiment of method 500 depicted in
After the initialization blocks, method 500 as shown includes adding (block 508) a priority queue operation in the subset of operations to a set of unresolved insertions I if the operation is, in fact, an insertion. If the operation is an extraction, I is unaffected, but the value of the variable m is updated in block 510 to indicate the highest priority of the unresolved insertions, i.e., the highest priority insertion in I. Block 510 as shown in
The depicted embodiment of method 500 further includes removing (block 512) an insertion operation from the set of unresolved insertions I when the priority of an extraction is equal to the priority associated with the current value of m, i.e., the extraction priority equals the priority of the unresolved insertion with the highest priority. Block 512 as shown further includes updating the variable w to indicate the lowest priority of a resolved extraction. The embodiment of method 500 shown in
Turning now to
Method 600 may begin by setting (operation 602) function f[k] to zero. Operation 602 may result in a value for f[k] corresponding to Equation 2. Then, in method 600, the following operations may be performed (operation 604) for each extraction in epoch k having a priority value u:
Referring now to
Device 700, as depicted in
Device 700 is shown in
Memory media 710 encompasses persistent and volatile media, fixed and removable media, and magnetic and semiconductor media. Memory media 710 is operable to store instructions, data, or both. Memory media 710 as shown includes sets or sequences of instructions 724-2, namely, an operating system 712 and PQ validation 714. Operating system 712 may be a UNIX or UNIX-like operating system, a Windows® family operating system, or another suitable operating system. Instructions 724 may also reside, completely or at least partially, within processor 701 during execution thereof. It is further noted that processor 701 may be configured to receive instructions 724-1 from instructions 724-2 via shared bus 702. In some embodiments, memory media 710 is configured to store and provide executable instructions for executing a proof protocol as mentioned previously. For example, PQ validation 714 may be configured to execute PQ implementation 300, method 400, method 500 and/or method 600. In certain embodiments, computing device 700 may represent an implementation of verifier 112 and/or PQ validator 212 (see
To the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited to the specific embodiments described in the foregoing detailed description.
Number | Date | Country | |
---|---|---|---|
20120159500 A1 | Jun 2012 | US |