Multi-Query Optimization of Window-Based Stream Queries

BRIEF DESCRIPTION OF DRAWINGS

These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram of Query plans Q₁and Q₂to illustrate pior art sharing of continuous queries;

FIG. 2 is a block diagram of a known selection pull-up technique for sharing continuous query plans containing joins and selections applied to the sharing of joins with different window sizes;

FIG. 3 is a block diagram of known selection pull-up technique for sharing continuous query plans containing joins and selections applied to the sharing of joins with different window sizes;

FIG. 4 is a block diagram of a sliced one-way window join in accordance with the principles of the invention;

FIG. 5 is a chart of the execution steps to be followed for the sliced window join in accordance with the diagram of FIG. 4;

FIG. 6 is a block diagram of a chain of 1-way sliced window joins in accordance with the principles of the invention;

FIG. 7 is a block diagram of a chain of binary sliced window joins in accordance with the principles of the invention;

FIG. 8 is a chart of the execution steps to be followed for the binary sliced window join in accordance with the diagram of FIG. 7;

FIG. 9 is a block diagram of state-slice sharing in accordance with the principles of the invention;

FIG. 10 is a block diagram of memory-optimal state-slice sharing in accordance with the principles of the invention;

FIG. 11 is a block diagram depicting the merging of two sliced joins;

FIG. 12 is a diagram representing state-slice sharing in accordance with the principles of the invention; and

FIG. 13 is a block diagram of selection push-down for memory optimal state slice sharing in accordance with the principles of the invention.

DETAILED DESCRIPTION

To efficiently share computations of window-based join operators, the invention is a new method for sharing join queries with different window constraints and filters. The two key ideas of the invention are: state-slicing and pipelining. The window states of the shared join operator are sliced into fine-grained pieces based on the window constraints of individual queries. Multiple sliced window join operators, with each joining a distinct pair of sliced window states, can be formed. Selections now can be pushed down below any of the sliced window joins to avoid unnecessary computation and memory usage shown above. However, N²joins appear to be needed to provide a complete answer if each of the window states were to be sliced into N pieces. The number of distinct join operators needed would then be too large for a data stream management system DSMS to hold for a large N. We This hurdle is overcome by elegantly pipelining the slices. This enables building a chain of only N sliced window joins to compute the complete join result. This also enables to selectively share a subsequence of such a chain of sliced window join operators among queries with different window constraints.

Based on the inventive state-slice sharing, two algorithms are proposed for the chain buildup, one that minimizes the memory consumption and the other that minimizes the CPU usage. The algorithms are guaranteed to always find the optimal chain with respect to either memory or CPU cost, for a given query workload. Experimental results show that the invention provides the best performance over a diverse range of workload settings among alternate solutions in the literature.

State-Sliced One-Way Window Join

For purposes of the ensuing description, the following equivalent join operator notations are used: is equivalent to |x, is equivalent to x|, is

equivalent to

$ \overset{s}{\times},$

is equivalent to

$\overset{s}{\times} ,$

is equivalent to

$\overset{s}{\times}$

, and is equivalent to x.

A one-way sliding window join of streams A and B is denoted as A[W]|_xB

$(or B \overset{s}{\times}  A [W]),$

where stream A has a sliding window of size W. The output of the join consists of all pairs of tuples a ε A, b ε B, such that T_b−T_a<W, and tuple pair (a,b) satisfies the join condition.

- Definition 1. A sliced one-way window join on streams A and B is denoted as

$A [W^{start}, W^{end}]  \overset{s}{\times} B (or B \overset{s}{\times}  A [W^{start}, W^{end}]),$

where stream A has a sliding window of range: W^end−W^start. The start and end window are W^startand W^endrespectively. The output of the join consists of all pairs of tuples a ε A, b ε B, such that W^start≦T_b−T_a<W^end, and (a,b) satisfies the join condition.

We can consider the sliced one-way sliding window join as a generalized form of the regular one-way window join. That is

$A [W]  \overset{s}{\times} B = A [0, W]  \overset{s}{\times} B .$

The diagram 40FIG. 4 shows an example of a sliced one-way window join in accordance with the invention. This join has one output queue for the joined results, two output queues (optional) for purged A tuples and propagated B tuples. These purged tuples will be used by another sliced window join as input streams, which will be explained further below. The execution steps to be followed for the sliced window join

$A [W^{start}, W^{end}]  \overset{s}{\times} B$

are shown by the diagram 50 in FIG. 5.

The semantics of the state-sliced window join require the checking of both the upper and lower bounds of the time-stamps in every tuple probing step. In FIG. 5, the newly arriving tuple b will first purge the state of stream A with W^end, before probing is attempted. Then the probing can be conducted without checking of the upper bound of the window constraint W^end. The checking of the lower bound of the window W^endcan also be omitted in the probing since we use the sliced window join operators in a pipelining chain manner, as discussed below.

- Definition 2. A chain of sliced one-way window joins is a sequence of pipelined N sliced one-way window joins, denoted as

$A [0, W_{1}]  \overset{s}{\times} B, A [W_{1}, W_{2}]  \overset{s}{\times} B, \dots, A [W_{N - 1}, W_{N}]  \overset{s}{\times} B .$

The start window of the first join in a chain is 0. For any adjacent two joins, J_iand J_i+1, the start window of J_i+1equals the end window of prior J_i(0≦i<N) in the chain. J_iand J_i+1are connected by both the Purged-A-Tuple output queue of J_ias the input A stream of J_i+1, and the Propagated-B-Tuple output queue of J_ias the input B stream of J_i+1.

The diagram 60 of FIG. 6 shows a chain of state-sliced window joins having two one-way joins J₁and J₂. We assume the input stream tuples to J₂, no matter from stream A or from stream B, are processed strictly in the order of their global time-stamps. Thus we use one logical queue between J₁and J₂. This does not prevent us from using physical queues for individual input streams.

Table 2 below depicts an example execution of this chain. We assume that one single tuple (an a or a b ) will only arrive at the start of each second, w₁=2 sec, w₂=4 sec and every a tuple will match every b tuple (Cartesian Product semantics). During every second, an operator will be selected to run. Each running of the operator will process one input tuple. The content of the states in J₁and J₂, and the content in the queue between J₁and J₂after each running of the operator are shown in Table 2.

TABLE 2

Execution of the Chain: J₁, J₂.

T
AIT
OP
A × [0, 2]
Queue
A × [2, 4]
Output

1
a₁
J₁
[a₁]
[ ]
[ ]

2
a₂
J₁
[a₂, a₁]
[ ]
[ ]

3
a₃
J₁
[a₃, a₂, a₁]
[ ]
[ ]

4
b₁
J₁
[a₃, a₂]
[b₁, a₁]
[ ]
(a₂, b₁),

(a₃, b₁)

5
b₂
J₁
[a₃]
[b₂, a₂, b₁, a₁]
[ ]
(a₃, b₂)

6

J₂
[a₃]
[b₂, a₂, b₁]
[a₁]

7

J₂
[a₃]
[b₂, a₂]
[a₁]
(a₁, b₁)

8
a₄
J₁
[a₄, a₃]
[b₂, a₂]
[a₁]

9

J₂
[a₄]
[a₃, b₂]
[a₂, a₁]

10

J₂
[a₄]
[a₃]
[a₂, a₁]
(a₁, b₂),

(a₂, b₂)

Execution in Table 2 follows the steps in FIG. 5. For example at the 4th second, first a₁will be purged out of J₁and inserted into the queue by the arriving b₁, since T_b₁−T_a₁≧2 sec. Then b₁will purge the state of J₁and output the joined result. Lastly, b₁is inserted into the queue.

We observe that the union of the join results of J₁:

$A [0, w 1]  \overset{s}{\times} B and J_{2} : A [w 1, w 2]  \overset{s}{\times} B$

is equivalent to the results of a regular sliding window join:

$A [w 2]  \overset{s}{\times} B .$

The order among the joined results is restored by the merge union operator. To prove that the chain of sliced joins provides the complete join answer, we first introduce the following lemma.

- Lemma 1. For any sliced one-way sliding window join

$A [W_{i - 1}, W_{i}]  \overset{s}{\times} B$

in a chain, at the time that one b tuple finishes the cross-purge step, but not yet begins the probe step, we have: (1) ∀a ε A::[W_i−1,W_i]W_i−1≦T_b−Ta<W_i; and (2) ∀a tuple in the input steam A, W_i−1≦T_b−Ta<W_ia ε A::[W_i−1,W_i]. Here A::[W_i−1,W_i] denotes the state of stream A.

Proof: (1). In the cross-purge step (FIG. 6), the arriving b will purge any tuple a with T_b−T_a≧W_i. Thus ∀a_iε A::[W_i−1,W_i], T_b−Ta_i<W_i. For the first sliced window join in the chain, W_i−1=0. We have 0≦T_b−Ta. For other joins in the chain, there must exist a tuple a_mε A::[W_i−1,W_i] that has the maximum timestamp among all the a tuples in A::[W_i−1,W_i]. Tuple a_mmust have been purged by b′ of stream B from the state of the previous join operator in the chain. If b′=b, then we have T_b−T_a_m≧W_i−1, since W_i−1is the upper window bound of the previous join operator. If b′≠b, then T_b′−T_a_m>W_i−1, since T_b>T_b′. We still have T_b−T_a_m>W_i−1. Since T_a_m≧T_a_k, for ∀a_kε A::[W_i−1,W_i], we have W_i−1≦T_b−Ta_k, for ∀a_kε A::[W_i−1,W_i]).

(2We use a proof by contradiction. If a≠A::[W_i−1,W_i], then first we assume a ε A::[W_j−,W_j],j<i. Given W_i−1≦T_b−T_a, we know W_j≦T_b−T_a. Then a cannot be inside the state A::[W_j−1,W_j]since a would have been purged by b when it is processed by the join operator

$A [W_{j - 1}, W_{j}]  \overset{s}{\times} B .$

We got a contradiction. Similarly a cannot be inside any state A::[W_k−1,W_k], k>i. pt]0pt1.3expt]1.3ex0pt

- Theorem 1. The union of the join results of all the sliced one-way window joins in a chain

$A [0, W_{1}]  \overset{s}{\times} B, \dots, A [W_{N - 1}, W_{N}]  \overset{s}{\times} B$

is equivalent to the results of a regular one-way sliding window join A[W_N]|×B.

Proof: Lemma 1(1) shows that the sliced joins in a chain will not generate a result tuple (a,b) with T_a−T_b>W. That is, ∀(a,b) ε Å_1≦i≦NA[W_i−1,W_i]|^s×B(a,b) ε A[W]|×B. We need to show:

$\forall (a, b) \in A [W]  \times B \Rightarrow \exists i, s . t . (a, b) \in A [W_{i - 1}, W_{i}]  \overset{s}{\times} B .$

Without loss of generality, ∀(a,b) ε A[W]|×B, there exists unique i, such that W_i−1≦T_b−T_a<W_i, since W₀≦T_b−T_a<W_N. We want to show that

$(a, b) \in A [W_{i - 1}, W_{i}]  \overset{s}{\times} B .$

The execution steps in FIG. 5 guarantee that the tuple b will be processed by

$A [W_{i - 1}, W_{i}]  \overset{s}{\times} B$

at a certain time. Lemma 1(2) shows that tuple a would be inside the state of A[W_i−1,W_i] at that same time. Then

$(a, b) \in A [W_{i - 1}, W_{i}]  \overset{s}{\times} B .$

Since i is unique, there is no duplicated probing between tuples a and b .

From Lemma 1, we see that the state of the regular one-way sliding window join A[W]|×B is distributed among different sliced one-way joins in a chain. These sliced states are disjoint with each other in the chain, since the tuples in the state are purged from the state of the previous join. This property is independent from operator scheduling, be it synchronous or even asynchronous.

State-Sliced Binary Window Join

Similar to Definition 1, we can define the binary sliding window join. The definition of the chain of sliced binary joins is similar to Definition 2 and is thus omitted for space reasons. The diagram 70 of FIG. 7 shows an example of a chain of state-sliced binary window joins.

- Definition 3. A sliced binary window join of streams A and B is denoted as

$A [W_{A}^{start}, W_{A}^{end}] \overset{s}{\times} B [W_{B}^{start}, W_{B}^{end}],$

where stream A has a sliding window of range: W_A^end−W_A^startand stream B has a window of range W_B^end−W_B^start. The join result consists of all pairs of tuples a ε A, b ε B, such that either W_A^start≦T_b−T_a<W_A^endor W_B^start≦T_a−T_b<W_B^end, and (a,b) satisfies the join condition.

The execution steps for sliced binary window joins can be viewed as a combination of two one-way sliced window joins. Each input tuple from stream A or B will be captured as two reference copies, before the tuple is processed by the first binary sliced window join¹. The copies can be made by the first binary sliced join. One reference is annotated as the male tuple (denoted as a^m) and the other as the female tuple (denoted as a^f). The execution steps to be followed for the processing of a stream A tuple by

$A [W^{start}, W^{end}] \overset{s}{\times} B [W^{start}, W^{end}]$

are shown by the diagram 80 of FIG. 8. The execution procedure for the tuples arriving from stream B can be similarly defined. ¹The copies can be made by the first binary sliced join.

Intuitively the male tuples of stream B and female tuples of stream A are used to generate join tuples equivalent to a one-way join:

$A [W^{start}, W^{end}]  \overset{s}{\times} B .$

The male tuples of stream A and female tuples of stream B are used to generate join tuples equivalent to the other one-way join:

$A \overset{s}{\times}  B [W^{start}, W^{end}] .$

Note that using two copies of a tuple will not require doubled system resources since: (1) the combined workload (in FIG. 8) to process a pair of female and male tuples equals the processing of one tuple in a regular join operator, since one tuple takes care of purging/probing and the other filling up the states; (2) the state of the binary sliced window join will only hold the female tuple; and (3) assuming a simplified queue (M/M/1), doubled arrival rate (from the two copies) and doubled service rate (from above (1)) still would not change the average queue size, if the system is stable. In our implementation, we use a copy-of-reference instead of a copy-of-object, aiming to reduce the potential extra queue memory during bursts of arrivals. Discussion of scheduling strategies and their effects on queues is beyond the scope of this paper.

- Theorem 2. The union of the join results of the sliced binary window joins in a chain

$A [0, W_{1}] \overset{s}{\times} B [0, W_{1}], \dots, A [W_{N - 1}, W_{N}] \overset{s}{\times} B [W_{N - 1}, W_{N}]$

is equivalent to the results of a regular sliding window join A[W_N]×B[W_N].

Using Theorem 1, we can prove Theorem 2. Since we can treat a binary sliced window join as two parallel one-way sliced window joins, the proof is fairly straightforward.

We now show how the proposed state-slice sharing can be applied to the running example introduced above to share the computation between the two queries. The shared plan is depicted by the diagram 90 of FIG. 9. This shared query plan includes a chain of two sliced sliding window join operators and The purged tuples from the states of are sent to as input tuples. The selection operator σ₄filters the input stream A tuples for The selection operator σ_Afilters the joined results of for Q₂. The predicates in σ_Aand σ_Aare both A.value>Threshold.

Compared to alternative sharing approaches discussed in the background of the invention section, the inventive state-slice sharing method offers significant advantages. Selection can be pushed down into the middle of the join chain. Thus unnecessary probings in the join operators are avoided. The routing cost is saved. Instead a pre-determined route is embedded in the query plan. States of the sliced window joins in a chain are disjoint with each other. Thus no state memory is wasted.

Using the same settings as previously, we now calculate the state memory consumption C_mand the CPU cost C_pfor the state-slice sharing paradigm as follows:

$\begin{matrix} {\begin{matrix} C_{m} = & 2 λ W_{1} M_{t} + (1 - S_{σ}) λ (W_{2} - W_{1}) M_{t} \\ C_{p} = & 2 λ^{2} W_{1} + λ + 2 λ^{2} S_{σ} (W_{2} - W_{1}) + \\ 4 λ + 2 λ + 2 λ^{2} S_{W 1} \end{matrix} & (3) \end{matrix}$

The first item of C_mcorresponds to the state memory in; the second to the state memory in . The first item of C_pis the join probing cost of; the second the filter cost of σ_A; the third the join probing cost of ; the fourth the cross-purge cost; while the fifth the union cost; the sixth the filter cost of σ_A. The union cost in C_pis proportional to the input rates of streams A and B. The reason is that the male tuple of the last sliced join acts as punctuation for the union operator. For example, the male tuple a₁^fis sent to the union operator after it finishes probing the state of stream B in , indicating that no more joined tuples with timestamps smaller than a₁^fwill be generated in the future. Such punctuations are used by the union operator for the sorting of joined tuples from multiple join operators.

Comparing the memory and CPU costs for the different sharing solutions, namely naive sharing with selection pull-up (Eq. 1), stream partition with selection push-down (Eq. 2) and state-slice chain (Eq. 3), the savings of using the state slicing sharing are:

$\begin{matrix} {\begin{matrix} \frac{\frac{C_{m}^{(1)} + C_{m}^{(3)}}{C_{m}^{(1)}} = (1 - ρ) (1 - S_{σ})}{2} \\ \frac{C_{m}^{(2)} - C_{m}^{(3)}}{C_{m}^{(2)}} = ρ 1 + 2 ρ + (1 - ρ) S_{σ} \\ \frac{C_{p}^{(1)} - C_{p}^{(3)}}{C_{p}^{(1)}} = (1 - ρ) (1 - S_{σ}) + (2 - ρ) S 1 + 2 S \\ \frac{C_{p}^{(2)} - C_{p}^{(3)}}{C_{p}^{(2)}} = S_{σ} S ρ (1 - S_{σ}) + S_{σ} + S_{σ} S + ρ S \end{matrix} & (4) \end{matrix}$

with C_m⁽ⁱ⁾denoting C_m, C_p⁽¹⁾denoting C_pin Equation i (i=1,2,3); and window ratio

$ρ = \frac{W_{1}}{W_{2}}, 0 < ρ < 1.$

Compared to sharing alternatives discussed in the background section above, the inventive state-slice sharing achieves significant savings. As a base case, when there is no selection in the query plans (i.e., S₉₄=1), state-slice sharing will consume the same amount of memory as the selection pullup while the CPU saving is proportional to the join selectivity S. When selection exists, state-slice sharing can save about 20%-30% memory, 10%-40% CPU over the alternatives on average. For the extreme settings, the memory savings can reach about 50% and the CPU savings about 100%. The actual savings are sensitive to these parameters. Moreover, from Eq. 4 we can see that all the savings are positive. This means that the state-sliced sharing paradigm achieves the lowest memory and CPU costs under all these settings. Note that we omit λ in Eq. 4 for CPU cost comparison, since its effect is small when the number of queries is only 2. The CPU savings will increase with increasing λ, especially when the number of queries is large.

Turning now to the consideration of how to build an optimal shared query plan with a chain of sliced window joins. Consider a data stream management system DSMS with N registered continuous queries, where each query performs a sliding window join A[w_i]B[w_i] (1≦i≦N) over data streams A and B. The shared query plan is a DAG with multiple roots, one for each of the queries.

Given a set of continuous queries, the queries are first sorted by their window lengths in ascending order. Two processes are proosed for building the state-slicing chain in that order memory-optimal state-slicing and CPU-optimal state-slicing. The choice between them depends on the availability of the CPU and memory in the system. The chain can also first be built using one of the algorithms and migrated towards the other by merging or splitting the slices at runtime.

Memory-Optimal State-Slicing

Without loss of generality, we assume that w_i<w_i+1(1≦i<N). Let's consider a chain of the N sliced joins: J₁, J₂, . . . , J_N, with J_ias

$A [w_{i - 1}, w_{i}] \begin{matrix} \overset{s}{} \end{matrix} B [w_{i - 1}, w_{i}]$

(1≦i≦N, w₀=0). A union operator U_iis added to collect joined results from J₁, . . . , J_ifor query Q_i(1<i≦N), as shown by diagram 100 of FIG. 10. We call this chain the memory-optimal state-slice sharing (Mem-Opt). The correctness of Mem-Opt state-slice sharing is proven in Theorem 3 by using Theorem 2. We have the following equivalence for i (1≦i≦N):

$Q_{i} : A [w_{i}] B [w_{i}] = ⋃_{1 \leq j \leq i} A [W_{j - 1}, W_{j}] \begin{matrix} \overset{s}{} \end{matrix} B [W_{j - 1}, W_{j}]$

- Theorem 3. The total state memory used by a Mem-Opt chain of sliced joins J₁, J₂, . . . , J_N, with J_ias

$\begin{matrix} A [w_{i - 1}, w_{i}] \begin{matrix} \overset{s}{} \end{matrix} B [w_{i - 1}, w_{i}] & (1 \leq i \leq N, w_{0} = 0) \end{matrix}$

is equal to the state memory used by the regular sliding window join: A[w_N]B[w_N].

Proof: From Lemma 1, the maximum timestamp difference of tuples (e.g., A tuples) in the state of J_iis (w_i−w_i−1), when continuous tuples from the other stream (e.g., B tuples) are processed. Assume the arrival rate of streams A and B is denoted by λ_Aand λ_Brespectively. Then we have:

$\begin{matrix} 1 \leq i \leq N \sum {Mem}_{J_{i}} = (λ_{A} + λ_{B}) [(w_{1} - w_{0}) + (w_{2} - w_{1}) + \dots + (w_{N} - w_{N - 1})] \\ = (λ_{A} + λ_{B}) w_{N} \end{matrix}$

Where (λ_A+λ_B)w_Nis the minimal amount of state memory that is required to generate the full joined result for Q_N. Thus the Mem-Opt chain consumes the minimal state memory.

Let's again use the count of comparisons per time unit as the metric for estimated CPU costs. Comparing the execution (FIG. 8) of a sliced window join with the execution (table 1) of a regular window join, we notice that the probing cost of the chain of sliced joins: J₁, J₂, . . . , J_Nis equivalent to the probing cost of the regular window join: A[w_N]B[w_N].

Comparing to the alternative sharing methods noted above in the Background of the Invention, we notice that the Memory-Optimal chain may not always win since it requires CPU cost for: (1) (N−1) more times of purging for each tuple in the streams A and B; (2) extra system overhead for running more operators; and (3) CPU cost for (N−1) union operators. In the case that the selectivity of the join S is rather small, the routing cost in the selection pull-up sharing may be less than the extra cost of the Mem-Opt chain. In short, the Mem-Opt chain may not be the CPU-optimal solution for all settings.

CPU-Optimal State-Slicing

We hence now discuss how to find the CPU-Optimal state-slice sharing (CPU-Opt) which will yield minimal CPU costs. We notice that the Mem-Opt state-slice sharing may result in a large number of sliced joins with very small window ranges each. In such cases, the extra per tuple purge cost and the system overhead for holding more operators may not be capable of being neglected.

In FIG. 11(b), diagram 110, the state-sliced joins from J_ito J_jare merged into a larger sliced join with the window range being the summation of the window ranges of J_iand J_j. A routing operator then is added to split the joined results to the associated queries. Such merging of concatenated sliced joins can be done iteratively until all the sliced joins are merged together. In the extreme case, the totally merged join results in a shared query plan, which is equal to that formed by using the selection pull-up sharing method shown in Section 3. The CPU cost may decrease after the merging.

Both the shared query plans in FIG. 11 have the same join probing costs and union costs. Using the symbols defined in Section 3 and C_sysdenoting the system overhead factor, we can calculate the difference of partial CPU cost C_p^(a)in FIG. 5.2 and C_p^(b)in FIG. 11(b) as:

$C_{p}^{(a)} - C_{p}^{(b)} = (λ_{A} + λ_{B}) (j - i) - 2 λ_{A} λ_{B} (w_{j} - w_{i - 1}) σ (j - i) + C_{sys} (j - i + 1) (λ_{A} + λ_{B})$

The difference of CPU costs in these scenarios comes from the purge cost (the first item), the routing cost (the second item) and the system overhead (the third item). The system overhead mainly includes the cost for moving tuples in/out of the queues and the context change cost of operator scheduling. The system overhead is proportional to the data input rates and number of operators.

Considering a chain of N sliced joins, all possible merging of sliced joins can be represented by edges in a directed graph G={V,E}, where V is a set of N+1 nodes and E is a set of

$\frac{N (N + 1)}{2}$

edges. Let ∀v_iε V(0≦i≦N) represent the window w_iof Q_i(w₀=0). Let the edge e_i,jfrom node v_ito node v_j(i<j) represent a sliced join with start-window as w_iand end-window as w_j. Then each path from the node v₀to node v_Nrepresents a variation of the merged state-slice sharing, as shown by the diagram 120 in FIG. 12.

Similar to the above calculation of C_p^(a)and C_p^(b), we can calculate the CPU cost of the merged sliced window joins represented by every edge. We denote the CPU cost e_i,jof the sliced join as the length of the edge l_i,j. We have the following lemma.

- Lemma 2. The calculations of CPU costs l_i,jand l_m,nare independent if 0≦i<j≦m<n≦N.

Based on Lemma 2, we can apply the principle of optimality here and transform the optimal state-slice problem to the problem of finding the shortest path from v₀to v_Nin an acyclic directed graph. Using the well-known Dijkstra 's algorithm, we can find the CPU-Opt query plan in O(N²), with N being the number of the distinct window constraints in the system. Even when we incorporate the calculation of the CPU cost of the

$\frac{N (N + 1)}{2}$

edges, the total time for getting the CPU optimal state-sliced sharing is still O(N²).

In case the queries do not have selections, the CPU-Opt chain will consume the same amount of memory as the Mem-Opt chain. With selections, the CPU-Opt chain may consume more memory.

Online Migration of the State-Slicing Chain

Online migration of the shared query plan is important for efficient processing of stream queries. The state-slicing chain may need maintenance when: (1) queries enter or leave the system, (2) queries update predicates or window constraints, and (3) runtime statistic collection invokes plan adaptation.

The chain migration is achieved by two primitive operation: merging and splitting of the sliced join. For example when query Q_i(i<N) leaves the system, the corresponding sliced join

$A [w_{i - 1}, w_{i}] \begin{matrix} \overset{s}{} \end{matrix} B [w_{i - 1}, w_{i}]$

could be merged with the next sliced join in the chain. Or if the corresponding sliced join had been merged with others in the CPU-Opt chain, splitting of the merged join may be invoked first.

Online splitting of the sliced join J_ican be achieved by: (1) stopping the system execution for J_i; (2) updating the end window of J_ito w′_i; (3) inserting a new sliced join J′_iwith window [w′_i,w_i] to the right of J_iand connecting the query plan; and (4) resuming the system. The queue between J_iand J′_iis empty right after the insertion. The execution of J_iwill purge tuples, due to its new smaller window, into the queue between J_iand J′_iand eventually fill up the states of J′_icorrectly.

Online merging of two adjacent sliced joins J_iand J_i+1requires the queues between these two joins empty. This can be achieved by scheduling the execution of J_i+1after stopping the scheduling of J_i. Once the queue between J_iand J_i+1is empty, we can simply (1) concatenate the corresponding states of J_iand J_i+1; (2) update the end window of J_ito w_i+1; (3) remove J_i+1from the chain; and (4) resume the system.

The overhead for chain migration corresponds to constant system cost for operator insertion/deletion. The system suspending time during join splitting is neglectable, while during join merging it is bounded by the execution time needed to empty the queue in-between. No extra processing costs arise in either case.

Push Selections into Chain

When the N continuous queries each have selections on the input streams, we aim to push the selections down into the chain of sliced joins. For clarity of discussion, we focus on the selection push-down for predicates on one input stream. Predicates on multiple streams can be pushed down similarly. We denote the selection predicate on the input stream A of query Q_ias σ_iand the condition of σ_ias cond_i.

Mem-Opt Chain with Selection Push-Down

The selections can be pushed down into the chain of sliced joins as shown by the diagram 130 in FIG. 13. The predicate of the selection σ′_icorresponds to the disjunction of the selection predicates from σ_ito σ_N. That is:

cond′_i=cond_iv cond_i+1v . . . v cond_N

Logically each tuple may be evaluated against the same selection predicate for multiple times. In the actual execution, we can evaluate the predicates (cond₁, 1≦i≦N) in the decreasing order of i for each tuple. As soon as a predicate (e.g. cond_k) is satisfied, stop further evaluating and attach k to the tuple. Thus this tuple can survive until the k th slice join and no further. Similar to Theorem 3, we have the following theorem.

- Theorem 4. The Mem-Opt state-slice sharing with selection push-down consumes the minimal state memory for a given workload.

Intuitively the total state memory consumption is minimal since that: (1) each join probe performed by in FIG. 13 is required at least by one of the queries: Q_i, Q_i+1, . . . , Q_N; (2) any input tuple that won't contribute to the joined results will be filtered out immediately; and (3) the contents in the state memory of all sliced joins are pairwise disjoint with each other.

CPU-Opt Chain with Selection Push-Down

The merging of adjacent sliced joins with selection push-down can be achieved following the scheme shown in FIG. 11. Merging sliced joins having selection between them will cost extra state memory usage due to selection pull-up. The tuples, which would be filtered out by the selection before, will now stay unnecessarily long in the state memory. Also, the consequent join probing cost will increase accordingly. Continuous merging of the sliced joins will result in the selection pull-up sharing approach discussed in the background.

Similarly to the CPU optimization discussed above with respect to the CPU-optimal state-slicing, the Dijkstra's algorithm can be used to find the CPU-Opt sharing plan with minimized CPU cost in O(N²) Such CPU-Opt sharing plan may not be Memory-Optimal.

In summary, window-based joins are stateful operators that dominate the memory and CPU consumptions in a data stream management system DSMS. Efficient sharing of window-based joins is a key technique for achieving scalability of a DSMS with high query workloads. The invention is a new method for efficiently sharing of window-based continuous queries in a DSMS. By slicing a sliding window join into a chain of pipelining sliced joins, the inventive method results in a shared query plan supporting the selection push-down, without using an explosive number of operators. Based on the state-slice sharing, two algorithms are proposed for the chain buildup, which achieve either optimal memory consumption or optimal CPU usage.

The present invention has been shown and described in what are considered to be the most practical and preferred embodiments. The inventive state-slice method can be extended to distributed systems, because the properties of the pipelining sliced joins fit nicely in the asynchronous distributed system. Also, when the queries are too many to fit into memory, combining query indexing with state-slicing is a possibility. That departures may be made there from and that obvious modifications will be implemented by those skilled in the art. It will be appreciated that those skilled in the art will be able to devise numerous arrangements and variations which, although not explicitly shown or described herein, embody the principles of the invention and are within their spirit and scope.

Multi-Query Optimization of Window-Based Stream Queries

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

US Classifications

International Classifications

Abstract

Description

Claims

Parent Case Info

Provisional Applications (1)