Method for processing multiple continuous top-K queries

Information

  • Patent Application
  • 20070288475
  • Publication Number
    20070288475
  • Date Filed
    October 11, 2006
  • Date Published
    December 13, 2007
Abstract
A method for processing multiple continuous Top-K queries, performed between a master server and multiple slave servers, including: a first step of arbitrarily selecting multiple slave servers, querying them, and accumulating the reported values to find the K largest; a second step of averaging every two adjacent values sent from the same server to obtain a threshold; and a third step of using the thresholds to track an upper bound and a lower bound for each value, with a report sent to the master server whenever a value exceeds its upper bound or falls below its lower bound.
Description

BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 shows two graphs illustrating the results of Query 1 as simulated by servers N1 and N2, respectively, after they receive the parameters in Table 3; and



FIG. 2 is a schematic drawing showing a change in the Top-2 result.





DETAILED DESCRIPTION OF INVENTION

Hereinafter, an embodiment in which multiple continuous queries need to be processed is described as an example. First, the overall system is outlined: four servers with similar web content are deployed in different regions, such as Asia, America, and Europe. For simplicity, these four servers are referred to as N1, N2, N3, and N4. In addition, a main server, referred to as N0, handles the continuous queries. Table 1 shows the contents of the four servers, ordered by the number of clicks on each web page.


In addition, different users or web service staff want the system to report which web pages make up the Top-K across specified servers. In this embodiment, k is assumed to be 2, so every query result consists of a first-ranked page and a second-ranked page.


Suppose that there are the following three continuous queries:

    • Query 1: Which two web pages are most frequently browsed on N1 and N2?
    • Query 2: Which two web pages are most frequently browsed on N3 and N4?
    • Query 3: Which two web pages are most frequently browsed on N2 and N3?


Here, Query 1 and Query 2 arrive at the same time, and Query 3 arrives after the system has reported the Top-2 results of both Query 1 and Query 2.


For Query 1, because no earlier query has been processed at the beginning, the server N0 first requests N1 and N2 to report web page IDs and the number of clicks for each ID. Once N0 finds that at least two IDs have been reported by both N1 and N2, it stops requesting further IDs and click counts from N1 and N2.


Further, for the reported IDs, a threshold is calculated between every two adjacent reported IDs of the same server as the arithmetic mean of the click counts of these two IDs. Query 2 is processed in the same way as Query 1. Table 2 shows the contents of servers N0, N1, N2, N3, and N4 after this step is completed.
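The threshold rule described above can be illustrated with a minimal Python sketch. The function name `adjacent_thresholds` and the variable names are hypothetical and chosen only for this illustration; the click counts are those listed for N2 and N3 in Table 1.

```python
def adjacent_thresholds(clicks):
    """Return the mean of each pair of adjacent values in a descending list."""
    return [(a + b) / 2 for a, b in zip(clicks, clicks[1:])]

# Click counts reported by N2 and N3, in descending order (from Table 1).
n2_clicks = [3020, 2780, 1800, 1000]
n3_clicks = [1400, 1320, 1100, 930]

n2_thresholds = adjacent_thresholds(n2_clicks)  # [2900.0, 2290.0, 1400.0]
n3_thresholds = adjacent_thresholds(n3_clicks)  # [1360.0, 1210.0, 1015.0]
```

These values agree with the thresholds listed for N2 and N3 in Table 2. Note that the same rule applied to the first two values of N1 (2000 and 1840) gives 1920, whereas Table 2 lists 1980 for T1,1.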


Here, an RLT (Ranked List Table) has been set up automatically. The server N0 can therefore calculate which IDs may be in the Top-K result. For Query 1:





number of clicks on ID1 ≦ 2000 + number of clicks on ID4 (at N2) = 3000

number of clicks on ID2 = 1700 + 1800 = 3500

number of clicks on ID3 = 1840 + 3020 = 4860

number of clicks on ID5 ≦ number of clicks on ID6 (at N1) + 2780 = 3980,


thus ID1 cannot be in the Top-2 and can be eliminated first. ID5 cannot be decided until N0 receives the number of clicks on ID5 from the server N1. Such an action is referred to as random access. Here, provided that the number of clicks on ID5 reported by N1 is 0, the number of clicks on ID5 becomes 2780. Thus, ID3 ranks first and ID2 ranks second.

When Query 3 is processed, N0 already holds the RLT information, so N0 checks the RLT first. N0 finds that two IDs are listed in the tables of both N2 and N3, namely ID3 and ID5. Then, the server N0 starts to calculate which IDs may be in the Top-K result.
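The bound calculation for Query 1 described above can be sketched in Python as follows. This is an illustration only: the names `reported`, `upper_bounds`, and `need_random_access` are hypothetical, and the sketch assumes that a page not yet reported by a server is bounded by the smallest count that server has reported so far, which is how the ID1 and ID5 bounds in the text are formed.

```python
# Entries reported to N0 for Query 1 (from Table 1), in descending click order.
reported = {
    "N1": [(1, 2000), (3, 1840), (2, 1700), (6, 1200)],
    "N2": [(3, 3020), (5, 2780), (2, 1800), (4, 1000)],
}

def upper_bounds(reported):
    """Upper bound on total clicks per ID: the reported count where known,
    otherwise the last (smallest) count reported by that server."""
    ids = {page for rows in reported.values() for page, _ in rows}
    bounds = {}
    for page in ids:
        total = 0
        for rows in reported.values():
            seen = dict(rows)
            total += seen.get(page, rows[-1][1])
        bounds[page] = total
    return bounds

K = 2
bounds = upper_bounds(reported)  # ID1: 3000, ID2: 3500, ID3: 4860, ID5: 3980

# IDs reported by every server have exact totals; the others only have bounds.
seen_sets = [{page for page, _ in rows} for rows in reported.values()]
fully_seen = set.intersection(*seen_sets)  # {2, 3}
kth_exact = sorted((bounds[p] for p in fully_seen), reverse=True)[K - 1]  # 3500

# An ID whose bound exceeds the K-th exact total needs a random access.
need_random_access = sorted(
    p for p in bounds if p not in fully_seen and bounds[p] > kth_exact
)  # [5]

# Random access: the text assumes N1 reports 0 clicks for ID5.
exact = {p: bounds[p] for p in fully_seen}
exact[5] = 0 + 2780
top2 = sorted(exact, key=exact.get, reverse=True)[:K]  # [3, 2]
```

ID1 is dropped because its bound of 3000 is below the exact total of ID2 (3500), while ID5 (bound 3980) must be resolved by random access before the ranking is final.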


The following are the results calculated by N0 using the thresholds:






T3,2 = 1400 ≦ number of clicks on ID2 ≦ T2,2 + T3,3 = 2290 + 1015 = 3305

T1,2 + T3,3 = 2900 + 1015 = 3915 ≦ number of clicks on ID3

T2,3 = 1210 ≦ number of clicks on ID4 ≦ T3,2 + T1,3 = 1400 + 1360 = 2760

T2,2 + T1,3 = 2290 + 1360 = 3650 ≦ number of clicks on ID5.


Because the upper bounds of ID2 and ID4 (3305 and 2760, respectively) are both smaller than the lower bounds of ID3 and ID5 (3915 and 3650, respectively), the server N0 only needs to acquire the current total number of clicks on ID3 and ID5. Thanks to the RLT information, 4 accesses are saved in this embodiment. With the conventional method, the codes of the first 4 web pages would have to be found, and calculating their total numbers of clicks would take 8 accesses.
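The threshold-based pruning for Query 3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the structure `rlt` and the helpers `local_bounds` and `total_bounds` are hypothetical, and the sketch assumes that a page at a given rank lies between the thresholds directly above and below it, while a page absent from a server's list lies between 0 and that server's last threshold.

```python
import math

# RLT entries at N0 for the servers named in Query 3 (from Table 2):
# page IDs in descending click order, and the thresholds between neighbours.
rlt = {
    "N2": {"ids": [3, 5, 2, 4], "thresholds": [2900, 2290, 1400]},
    "N3": {"ids": [5, 4, 3, 6], "thresholds": [1360, 1210, 1015]},
}

def local_bounds(entry, page):
    """Bound one server's click count for `page` using adjacent thresholds."""
    ids, th = entry["ids"], entry["thresholds"]
    if page not in ids:
        return 0, th[-1]
    pos = ids.index(page)
    lower = th[pos] if pos < len(th) else 0
    upper = th[pos - 1] if pos > 0 else math.inf
    return lower, upper

def total_bounds(rlt, page):
    """Sum the per-server lower and upper bounds for `page`."""
    pairs = [local_bounds(entry, page) for entry in rlt.values()]
    return sum(lo for lo, _ in pairs), sum(hi for _, hi in pairs)

K = 2
bounds = {page: total_bounds(rlt, page) for page in (2, 3, 4, 5)}
# ID2: (1400, 3305), ID3: (3915, inf), ID4: (1210, 2760), ID5: (3650, inf)

# Keep only pages whose upper bound reaches the K-th largest lower bound.
kth_lower = sorted((lo for lo, _ in bounds.values()), reverse=True)[K - 1]
candidates = sorted(p for p, (_, hi) in bounds.items() if hi >= kth_lower)  # [3, 5]
```

With only ID3 and ID5 left as candidates, N0 has to fetch two click counts from each of two servers, which corresponds to the 4 accesses mentioned in the text.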


Suppose that the initial Top-2 result has been obtained. The system should then instruct the servers N1, N2, N3, and N4 to judge by themselves whether the Top-2 result has changed. For example, for Query 1, after N0 obtains the Top-2 result, the corresponding parameters calculated from the current number of clicks on each ID are:





Parameter of ID1 of N1 = 3000/2 − 2000 = −500

Parameter of ID2 of N1 = 3500/2 − 1700 = 50

Parameter of ID3 of N1 = 4860/2 − 1840 = 590

Parameter of ID5 of N1 = 2780/2 − 0 = 1390

Parameter of ID1 of N2 = 3000/2 − 1000 = 500

Parameter of ID2 of N2 = 3500/2 − 1800 = −50

Parameter of ID3 of N2 = 4860/2 − 3020 = −590

Parameter of ID5 of N2 = 2780/2 − 2780 = −1390


The server N0 transmits the above parameters to the corresponding servers Ni (i.e., 1≦i≦4). Table 3 and FIG. 1 show the results of Query 1 as simulated by servers N1 and N2 after they receive the corresponding parameters.
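The parameter calculation above can be reproduced with a short Python sketch. The function name `parameters` is hypothetical, and dividing by the number of servers is an assumed generalization of the division by 2 used in the text for the two servers of Query 1.

```python
def parameters(totals, local_counts, num_servers):
    """Parameter per page ID: total / num_servers minus the local count."""
    return {
        page: totals[page] / num_servers - local_counts[page]
        for page in totals
    }

# Totals used by N0 for Query 1 and the per-server counts from the text.
totals = {1: 3000, 2: 3500, 3: 4860, 5: 2780}
n1_counts = {1: 2000, 2: 1700, 3: 1840, 5: 0}
n2_counts = {1: 1000, 2: 1800, 3: 3020, 5: 2780}

n1_params = parameters(totals, n1_counts, 2)
# {1: -500.0, 2: 50.0, 3: 590.0, 5: 1390.0}
n2_params = parameters(totals, n2_counts, 2)
# {1: 500.0, 2: -50.0, 3: -590.0, 5: -1390.0}
```

For each page, the parameters of N1 and N2 cancel out, because the two local counts add up to the total used by N0.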

Ni requests N0 to recalculate the Top-2 result if the simulated Top-2 result has changed at Ni. For example, suppose the number of clicks on ID1 for Query 1 increases by 500, as shown in FIG. 2. This changes the simulated Top-2 result, so the result must be recalculated.
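One plausible reading of this local check can be sketched in Python. The text does not spell out how a slave server simulates the result, so the sketch assumes that each server ranks pages by its local count plus the received parameter δ, and that the increase of 500 clicks on ID1 occurs at N1. The function name `simulated_top_k` is hypothetical; the counts and parameters are those listed for N1 in Table 3.

```python
def simulated_top_k(local_counts, params, k):
    """Rank pages by local count + delta and return the top-k page IDs."""
    adjusted = {p: local_counts[p] + params.get(p, 0) for p in local_counts}
    return sorted(adjusted, key=adjusted.get, reverse=True)[:k]

# Local counts and Query 1 parameters held by N1 (from Table 3).
n1_counts = {1: 2000, 2: 1700, 3: 1840, 5: 0, 6: 1200}
n1_params = {1: -500, 2: 50, 3: 590, 5: 1390, 6: 0}

before = simulated_top_k(n1_counts, n1_params, 2)  # [3, 2]

# Assumed event: ID1 receives 500 more clicks at N1.
n1_counts[1] += 500
after = simulated_top_k(n1_counts, n1_params, 2)   # [3, 1]

# N1 asks N0 to recalculate only when its simulated result changes.
must_report = after != before  # True
```

Under these assumptions, ID1 moves from a simulated value of 1500 to 2000 and overtakes ID2 (1750), so N1 would request a recalculation.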


In summary, the method of the present invention is cost-effective and responds faster when processing subsequent continuous Top-K queries. It also spreads the calculation across the servers, because each Ni can judge by itself whether the current Top-K result still holds during subsequent processing. In the conventional method, by contrast, N0 must request all related web page codes and click counts in order to calculate all of the corresponding parameters. Therefore, the method of the present invention is advantageous for processing multiple continuous queries.


Having thus described the embodiment of the invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art; for example, the present invention can be applied to the monitoring of wireless sensors or web pages. Therefore, such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention defined in the appended claims.

    • Table 1 shows the contents of 4 servers.
    • Table 2 shows the contents of N0, N1, N2, N3, and N4 after N0 processes Query 1 and Query 2.
    • Table 3 shows the results of Query 1 that are simulated by servers N1 and N2 after they received the corresponding parameters.









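TABLE 1

Contents of N1, N2, N3 and N4

| N1: Web page code | N1: Number of clicks | N2: Web page code | N2: Number of clicks |
|---|---|---|---|
| 1 | 2000 | 3 | 3020 |
| 3 | 1840 | 5 | 2780 |
| 2 | 1700 | 2 | 1800 |
| 6 | 1200 | 4 | 1000 |
| . . . | . . . | . . . | . . . |

| N3: Web page code | N3: Number of clicks | N4: Web page code | N4: Number of clicks |
|---|---|---|---|
| 5 | 1400 | 4 | 2020 |
| 4 | 1320 | 6 | 1780 |
| 3 | 1100 | 3 | 1400 |
| 6 | 930 | 7 | 1140 |
| . . . | . . . | . . . | . . . |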

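TABLE 2

N0

| N1: Web page code | N1: Number of clicks | N2: Web page code | N2: Number of clicks | N3: Web page code | N3: Number of clicks | N4: Web page code | N4: Number of clicks |
|---|---|---|---|---|---|---|---|
| 1 | 2000 | 3 | 3020 | 5 | 1400 | 4 | 2020 |
| T1,1 | 1980 | T1,2 | 2900 | T1,3 | 1360 | T1,4 | 1900 |
| 3 | 1840 | 5 | 2780 | 4 | 1320 | 6 | 1780 |
| T2,1 | 1770 | T2,2 | 2290 | T2,3 | 1210 | T2,4 | 1590 |
| 2 | 1700 | 2 | 1800 | 3 | 1100 | 3 | 1400 |
| T3,1 | 1450 | T3,2 | 1400 | T3,3 | 1015 | T3,4 | 1270 |
| 6 | 1200 | 4 | 1000 | 6 | 930 | 7 | 1140 |

N1 and N2

| N1: Web page code | N1: Number of clicks | N2: Web page code | N2: Number of clicks |
|---|---|---|---|
| 1 | 2000 | 3 | 3020 |
| T1,1 | 1980 | T1,2 | 2900 |
| 3 | 1840 | 5 | 2780 |
| T2,1 | 1770 | T2,2 | 2290 |
| 2 | 1700 | 2 | 1800 |
| T3,1 | 1450 | T3,2 | 1400 |
| 6 | 1200 | 4 | 1000 |
| . . . | . . . | . . . | . . . |

N3 and N4

| N3: Web page code | N3: Number of clicks | N4: Web page code | N4: Number of clicks |
|---|---|---|---|
| 5 | 1400 | 4 | 2020 |
| T1,3 | 1360 | T1,4 | 1900 |
| 4 | 1320 | 6 | 1780 |
| T2,3 | 1210 | T2,4 | 1590 |
| 3 | 1100 | 3 | 1400 |
| T3,3 | 1015 | T3,4 | 1270 |
| 6 | 930 | 7 | 1140 |
| . . . | . . . | . . . | . . . |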

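TABLE 3

| N1: Web page code | N1: Number of clicks | N1: Query 1: δ | N2: Web page code | N2: Number of clicks | N2: Query 1: δ |
|---|---|---|---|---|---|
| 1 | 2000 | −500 | 3 | 3020 | −590 |
| T1,1 | 1980 | | T1,2 | 2900 | |
| 3 | 1840 | 590 | 5 | 2780 | −1390 |
| T2,1 | 1770 | | T2,2 | 2290 | |
| 2 | 1700 | 50 | 2 | 1800 | −50 |
| T3,1 | 1450 | | T3,2 | 1400 | |
| 6 | 1200 | 0 | 4 | 1000 | 0 |
| . . . | . . . | 0 | 1 | . . . | 500 |
| 5 | 0 | 1390 | . . . | . . . | 0 |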

Claims
  • 1. A method for processing multiple continuous Top-K queries, which is performed between a master server and multiple of slave servers, comprising steps of: a first step, for arbitrarily selecting the multiple of slave servers, and querying and counting up K of accumulated values of which are recorded at most; a second step, for calculating every two adjacent values which have been sent from the same servers to obtain an average value as a threshold; and a third step, for measuring variations of an upper bound and a lower bound of each of said values by using said thresholds, and reporting to the master server at a time of said value being in excess of the upper bound or lower than the lower bound of each of said values.
Priority Claims (1)
Number Date Country Kind
095120360 Jun 2006 TW national