Method for accelerating range queries using periodic monotonic properties of non-monotonic functions

Information

  • Patent Application
  • 20080059410
  • Publication Number
    20080059410
  • Date Filed
    August 24, 2006
    17 years ago
  • Date Published
    March 06, 2008
    16 years ago
Abstract
A method for accelerating range queries using periodic monotonic properties of non-monotonic functions including mapping a base column x to an existing index on a column y that is correlated with column x through a periodic piecewise monotonic function F(x), rewriting an index construction statement to force the existing index on column y to track a periodic piecewise monotonic property by assigning identical values of F(x) to different periods to different ranges to create an annotated index, and rewriting range queries over the annotated index on F(x) by modifying a derived predicate.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other exemplary purposes, aspects and advantages will be better understood from the following detailed description of an exemplary embodiment of the invention with reference to the drawings, in which:



FIG. 1 is an exemplary schematic diagram of multi-dimensional clustering;



FIG. 2 is a schematic diagram of an exemplary PPM function with a pre-defined number of periods;



FIG. 3 is a schematic diagram of an exemplary PPM function with an unlimited number of periods; and



FIG. 4 is a flowchart of a method 400 for accelerating range queries in accordance with an exemplary embodiment of the present invention.





DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION

Referring now to the drawings, and more particularly to FIGS. 1-4, there are shown exemplary embodiments of the system and method according to the present invention.


For the purpose of this application, a PPM function F(x) is a piecewise periodic monotonic function that has the property that, for some period p, when x is split into consecutive intervals Ii of the same size/period p, F(x) is monotonic with respect to x for each Ii. F(x) can be monotonic increasing for some interval Ii, and monotonic decreasing for another interval Ij. The idea is to force a composite index to assign values into blocks based on both the generated expression and its periodicity.


There are several known functions considered to be guaranteed to be PPM such as:
















PPM FUNCTION
PERIOD SIZE









COS
π/2



COT
π/2



DAYOFWEEK
7



DAYOFYEAR
year



HOUR
24



QUARTER
4



MOD
x



MONTH
year



SIN
π/2



TAN
π/2










For these functions, the size of the period is known. Another way to determine if a derivation function is PPM, is if the user explicitly provides this information. Finally, by tracking the values of a base and derived column, the PPM relationship can be verified, and the period sized can be calculated.


Unlike monotonic derivation functions where only distinct values on the base column have the same value on the derived column, in the case of PPM functions, two non-consecutive values of the base columns can map to the same value on the derived column.


For example:


. . . MONTH (date(‘99/12/08’))=12


MONTH (date(‘00/01/18’))=1


MONTH (date(‘00/02/24’))=2


MONTH (date(‘02/01/07’))=1


For different dates (‘00/01/18’ and ‘02/01/07’) the value on the derived column will be the same MONTH( )=1.


If only an index on the derived column (MONTH) is accessed, then a query whose predicate is based on the base column (date) will use the existing index on MONTH and may retrieve a superset of the correct answer tuples. Then, only by retrieving the tuples and examining them for the date values, the system can filter out the incorrect tuples and return the final answer. For correctness, an index over MONTH could be built that is annotated with the periodicity number. The periodicity number is obtained by dividing the value of the base column of a tuple by the size of the period p that is characteristic to the PPM derivation function.


PPM functions can be classified into the ones that have a pre-defined number of periods and the ones where this number is undefined. If the allowed range for trigonometric functions is between o and 2π, then the following have a pre-defined number of periods:


















COS
p = π/2



COT
p = π/2



SIN
p = π/2



TAN
p = π/2










The period is actually π, but for uniformity (since their periods do not overlap), π/2 is used.


For these functions, the period number is x/(π/2), where x is the value of the base column. In FIG. 2, an example of F(x)=tan(x) is shown. The index should maintain the values of F(x) together with the annotation of the period number x(π/2). The construction of the index will be on F(x)=tan(x), for example: ORGANIZE BY TAN (x)→(ORGANIZE BY TAN (x), x/(π/2))


The following have undefined and/or unlimited number of periods:


















MONTH (date)
p = year



QUARTER (date)
p = year



MOD (x)
p = x



DAYOFYEAR
p = year



HOUR
p = 24



DAYOFWEEK
p = 7










These functions are different than the previously mentioned functions, in that the number of periods is unbounded and depends on the base data. When the periodicity can be calculated by a function such as YEAR (date) (rather than being constant), it is possible to use these functions to enforce the mapping of the domain into monotonic ranges.


In FIG. 3, examples of F(x)=MONTH(date) and QUARTER (date) are shown. In the case MONTH, for example, the period will be one year and the period number will coincide with the YEAR (date). The construction of the index would use the following exemplary annotation:


ORGANIZE BY MONTH (date)→(ORGANIZE BY MONTH (date), YEAR (date))


The following is an overview of the index construction steps and the corresponding queries.


Index construction begins by recognizing PPM functions and their periodicity. This may be accomplished from explicit user input, from recognizing semantically that the function is known to be PPM (such as sin (x), month (date)), or from observation (verify the relationship between the values of the based and derived columns).


An index is built over the PPM function, with the annotation of the period number, and an index is maintained with updates to the base columns, which also generates updates to the derived columns. If the derived column is known to be PPM by observation, at each update the relationship should be verified to hold. If it does not hold, then either that current index is deleted and an index over both columns is created, or the existing index may still be used for the correct PPM intervals and additional data structures or retrieval would be necessary for the non-PPM intervals.


For a query rewrite, the range and point queries in terms of a base column can now be answered in terms of the derived columns, if the derivation function is PPM, and the value on the base column x to F(x) and the period number to x/(period size) needs to be transformed.


For generality, PPM functions can be described as F(f) instead of F(x), where f is a function over base data. Recall that when a PPM function is described as F(x), it is assumed that x is the base data. In order to maintain the properties of PPM, the requirement that f in F(f) is monotonic with respect to the base data must be imposed. For MDC, a subset of the monotonic functions can be checked, but the checking module should be reused. For simplicity, the notation of F(x) will be used herein, however that notation can be replaced with F(f) when f is monotonic, for example, SIN (x+y), where x and y are from base columns.


To enforce the splitting of the function domain such that only monotonic ranges can be accessed, simple indexes should be rewritten as composite indexes.


The base values x that are used for function F(x) in the generated expression are mapped to period numbers according to their assignment into periods. As noted, the periods may be constant or inferred, in which case the number of periods is data-dependent. For example, the number of periods for F(x)=tan(x) is 4π/(π/2), that returns 8. If F(x)=MONTH(x), then the period is “year” and the number of years included in the index depends on the data inserted.


In either case, the ORGANIZE BY statement is modified, to force the index on the derived column to separate the domain according to periods.


ORGANIZE BY (F(x))→ORGANIZE BY (F(x), x/p)


For inferred periods, there are functions that calculate x/p already such as YEAR( ), so those can be used as well.


ORGANIZE BY (F(x))→ORGANIZE BY (F(x), f(p))


Same as in derived predicates for monotonic functions, the derived predicates are stored as additional information about the columns. The derived predicates include equality and “in” predicates and inequality predicates.


Equality and “in” predicates include:

    • flag the new predicate as derived, and
    • let an encapsulator evaluate the expression for both dimensions of the composite index.


By way of example, where day=d→where MONTH=MONTH(d) and YEAR=YEAR(d) and where radians=π→where SIN(π) and RADIANS=RADIANS(π).


Inequality predicates include:

    • conditions on the references columns
    • C1 appears in the predicate with a constant value
    • the deriving function is PPM, and
    • all other columns appear in equality predicates with constant values.


Accordingly, FIG. 4 illustrates an exemplary method 400 for accelerating range queries using periodic monotonic properties of non-monotonic functions, in accordance with certain exemplary embodiments of the present invention. The method includes mapping 410 a base column x to an existing index on a column y that is correlated with column x through a periodic piecewise monotonic function F(x), rewriting 420 an index construction statement to force the existing index on column y to track a periodic piecewise monotonic property by assigning identical values of F(x) to different periods to different ranges to create an annotated index, and rewriting 430 range queries over the annotated index on F(x) by modifying a derived predicate.


While the invention has been described in terms of several exemplary embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.


Further, it is noted that, Applicants' intent is to encompass equivalents of all claim elements, even if amended later during prosecution.

Claims
  • 1. A method for accelerating range queries using periodic monotonic properties of non-monotonic functions, comprising: mapping a base column x to an existing index on a column y that is correlated with column x through a periodic piecewise monotonic function F(x);rewriting an index construction statement to force the existing index on column y to track a periodic piecewise monotonic property by assigning identical values of F(x) to different periods to different ranges to create an annotated index; andrewriting range queries over the annotated index on F(x) by modifying a derived predicate.
  • 2. The method for accelerating range queries using periodic monotonic properties of non-monotonic functions according to claim 1, further comprising constructing said existing index.
  • 3. The method for accelerating range queries using periodic monotonic properties of non-monotonic functions according to claim 2, wherein said constructing said existing index comprises: recognizing said periodic piecewise monotonic function and a periodicity of said periodic piecewise monotonic function; andbuilding an index over the periodic monotonic function.
  • 4. The method for accelerating range queries using periodic monotonic properties on non-monotonic functions according to claim 3, wherein said recognizing said periodic piecewise monotonic function is accomplished from explicit user input, recognizing semantically that a function is known to be a periodic piecewise monotonic function and verifying a relationship between values of the base column x and an a derived column.
  • 5. The method for accelerating range queries using periodic monotonic properties on non-monotonic functions according to claim 3, wherein during said building an index over the periodic monotonic function if a derived column is known to be a periodic monotonic function by observation, at each of a plurality of updates a relationship should be verified.