The present invention relates to spectrum or periodogram estimation in streaming data under conditions of limited resources.
Numerals presented herebelow in square brackets—[ ]—are keyed to the list of references found towards the close of the present disclosure.
Spectrum estimation, that is, analysis of the frequency content of a signal, is a core operation in numerous applications, such as data compression, medical data analysis (ECG data) [2], pitch detection of musical content [4], and other applications. Widely used estimators of the frequency content are the periodogram and the autocorrelation [5] of a sequence. For statically stored sequences, both methods have O(n log n) complexity using the Fast Fourier Transform (FFT). For dynamically updated sequences (streaming case), the same estimators can be computed incrementally, by continuous update of the summation in the FFT computation, through the use of the Momentary Fourier Transform [12,9,15].
However, in a high-rate data streaming environment with multiple processes ‘competing’ for computational resources, there is no guarantee that each running process will be allotted sufficient processing time to fully complete its operation. Instead of blocking or abandoning the execution of processing threads that cannot fully complete, a desirable compromise would be for the system to make provisions for adaptive process computation. Under this processing model, every analytic unit (e.g., in this case the ‘periodogram estimation unit’) can provide partial (‘coarser’) results under tight processing constraints.
Under the aforementioned processing model and given limited processing time, one may not be seeking results that are accurate or perfect, but only ‘good enough’. Even so, since a typical streaming application will require fast, ‘on-the-fly’ decisions, an intelligent sampling procedure of exemplary efficiency would appear to represent a significant improvement over conventional efforts. A need has thus been recognized in connection with effecting such an improvement, among others.
There is broadly contemplated herein a method and apparatus for periodogram estimation based on resource (such as CPU, memory etc.) availability, in accordance with at least one presently preferred embodiment of the present invention. Also broadly contemplated herein is an intelligent sampling procedure that can decide whether to retain or discard an examined sample, based on a “lightweight” linear predictor whereby a sample is recorded only if its value cannot be predicted by previously seen sequence values.
Also, because the data samples retained by the sampling process (a subset of the examined data window) are not guaranteed to be equi-spaced, there is also contemplated herein an elaboration of a closed-form periodogram estimation in the context of unevenly spaced samples.
In summary, one aspect of the invention provides a method of providing a spectrum estimation for data in a data stream, the method comprising the step of providing a spectrum estimation based on resource availability.
Another aspect of the invention provides an apparatus for providing a spectrum estimation for data in a data stream, the apparatus comprising an arrangement for providing a spectrum estimation based on resource availability.
Furthermore, an additional aspect of the present invention provides a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for providing a spectrum estimation for data in a data stream, the method comprising the step of providing a spectrum estimation based on resource availability.
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
Generally speaking, in considering a data streaming scenario, one objective addressed herein is the provision of efficient mechanisms for estimating and updating the spectrum within a current data window. As such, a periodogram may be used as an estimate of the spectrum. A schematic illustration of resource-adaptive methodology in accordance with a preferred embodiment of the present invention is provided in
Briefly, in the context of an examined window 102 of a data stream 100, a load shedding arrangement provides an intelligent sampling scheme (104). At a decision point 106, if there is insufficient CPU time (108), more points are removed (110) to yield a spectrum estimation (112). Given sufficient CPU time (114), however, no intermediate step is needed and the spectrum estimation (116) is obtained directly. This process will be described in greater detail below.
Essentially, at any given time, there might not be enough processing capacity to provide a periodogram update using all the samples within the data window. The first step toward tackling this problem is the reduction of points using an ‘on-the-fly’ load-shedding scheme. Sub-sampling can lead to data aliasing and deteriorate the quality of the estimated periodogram. Therefore, the sampling should not only be fast but also intelligent, mitigating the impact of the sub-sampling on the squared error of the estimated periodogram. Sampling is based on a linear predictor, which retains a sample only if its value cannot be predicted by its neighbors. An estimator unit is also employed, which adjusts the ‘elasticity’ of the linear predictor over time so that it adapts to the current CPU load.
If there is enough CPU time to process the final number of retained samples, the spectrum is computed. Otherwise, more samples are dropped randomly and the new estimate is computed on the remaining samples.
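The fallback described above can be sketched as follows. This is a minimal illustration under two assumptions: the available CPU time has already been translated into a budget of samples that can be processed (the hypothetical `budget` parameter), and "dropped randomly" is read as uniform random selection without replacement. The name `fit_to_budget` is hypothetical.

```python
import random
from typing import List, Sequence

def fit_to_budget(retained: Sequence[int], budget: int, seed: int = 0) -> List[int]:
    """Keep at most `budget` of the indices chosen by the intelligent sampling.

    If the retained set already fits the budget it is returned unchanged
    (the spectrum is computed directly); otherwise samples are dropped
    uniformly at random and the survivors are returned in time order.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if len(retained) <= budget:
        return list(retained)
    rng = random.Random(seed)  # seeded only to make runs reproducible
    return sorted(rng.sample(list(retained), budget))
```

For example, `fit_to_budget([0, 3, 7], 5)` returns the three indices unchanged, while a 100-index set with a budget of 10 is cut down to 10 indices in increasing order.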
The computation of the approximate periodogram is based on a formulation of the DFT and the periodogram using unevenly spaced samples, a highly desirable step due to the sampling process. Under a sliding window model, some of the previously used samples are discarded, while new samples are added in the window. A periodicity estimation algorithm proposed herein possesses a very simple update structure, requiring only subtraction of contributions from discarded samples and addition of contributions due to the newly included samples.
Turning now to specific implementations, the Discrete Fourier Transform (DFT) is used to analyze the frequency content of a discrete, evenly sampled signal. In particular, for a discrete-time signal $x[n]$, the DFT $X[m]$ is defined for $0 \le m, n \le N-1$ as:
The periodogram P of a signal corresponds to the energy of its DFT:
$$P[m] = \|X[m]\|^2 \qquad (2)$$
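Equation (1) is not reproduced in this text. The following sketch assumes the common unnormalized convention $X[m] = \sum_{n=0}^{N-1} x[n] e^{-j 2\pi m n / N}$ (some formulations add a $1/N$ or $1/\sqrt{N}$ factor, which would only rescale $P[m]$) and evaluates equation (2) directly in $O(N^2)$ time; an FFT would be used in practice. The function names are illustrative.

```python
import cmath
from typing import List, Sequence

def dft(x: Sequence[float]) -> List[complex]:
    """Unnormalized DFT: X[m] = sum_n x[n] * exp(-2j*pi*m*n/N), 0 <= m, n <= N-1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * m * n / N) for n in range(N))
            for m in range(N)]

def periodogram(x: Sequence[float]) -> List[float]:
    """Equation (2): P[m] = |X[m]|^2, the energy of each DFT coefficient."""
    return [abs(c) ** 2 for c in dft(x)]
```

With this convention, a unit-amplitude cosine that completes 5 cycles in 64 samples concentrates its energy in bin 5, where $P[5] = (64/2)^2 = 1024$.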
Consider now a continuous signal $x(t)$ sampled unevenly at the discrete time instants $\{t_0, t_1, \ldots, t_{N-1}\}$. An example of this is shown in
One may write this unevenly sampled signal in discrete notation as $x[k_i]$, where $t_i = k_i T$ ($k_i \in \mathbb{N}$) and $T$ is the sampling interval, of which all sampling instants are multiples. This is also shown in
One can measure the complexity of algorithms in terms of the number of additions (subtractions), multiplications and divisions involved in the computations. Accordingly, the complexity of a single multiplication is labeled $\xi_{Mul}$, that of a division $\xi_{Div}$, and that of an addition/subtraction $\xi_{Sub}$.
In connection with a load shedding scheme, consider the typical problem of running spectral analysis, in which a window is slid across the temporal signal and the signal's DFT (and the respective periodogram) is incrementally updated. Preferably, one starts with an evenly sampled signal with sampling interval $T$. Suppose that the window slides by a fixed amount $Width \times T$. As a result of this sliding, $n_1$ points are discarded from the beginning of the signal and $n_2$ points are added to the end. However, if the available CPU cycles do not allow the DFT to be updated using all the points, one can adaptively prune the set of added points using uneven sub-sampling, so as to meet the CPU constraint while minimizing the impact on the accuracy of the updated DFT.
The disclosure now turns to an algorithm (with linear complexity) for the adaptive pruning of the newly added samples. To decide whether to retain a particular sample, one preferably determines whether it can be linearly predicted from its neighbors. (Higher-order predictors are also possible, but would clearly result in higher complexity.) In particular, to make a decision for sample $k_i$, one preferably compares the interpolated value $x_{int}[k_i]$ with the actual value $x[k_i]$, where the interpolated value is computed as:
where sample $k_{i-1}$ is the last retained sample before sample $k_i$ and sample $k_{i+1}$ is the immediately following sample.
If $x_{int}[k_i]$ approximates $x[k_i]$ to within the threshold Thresh, the sample $k_i$ can be discarded; otherwise it is retained. The parameter Thresh is an adaptive threshold that determines the quality of the approximation: if the threshold is large, more samples are discarded, and if it is small, fewer samples are discarded. (It should be noted that the squared approximation error due to this sub-sampling scheme cannot be bounded in general for all signals; the scheme is nevertheless selected for its computational simplicity. In particular, for the wide variety of signals considered in the experimentation herein, no squared error significantly larger than the absolute squared threshold value has been observed. Modifying this scheme to guarantee bounds on the approximation error is a possible subject for further exploration.) An example of this interpolation scheme is shown in
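The complexity of this sub-sampling step over the $n_2$ newly added samples is:
$$\xi_{interp} = (2\xi_{Mul} + 4\xi_{Sub} + \xi_{Div})(n_2 - 2) \qquad (4)$$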
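A minimal sketch of the linear-predictor pruning over a block of newly added, evenly spaced samples, so that $k_i$ is simply the sample position. It assumes that the first and last samples of the block are always kept and that Thresh bounds the absolute interpolation error; the experiments below describe Thresh as a percentage, so a relative test such as $|x_{int}[k_i] - x[k_i]| \le (Thresh/100)\,|x[k_i]|$ is an equally plausible reading. The function name `prune` is hypothetical.

```python
from typing import List, Sequence

def prune(x: Sequence[float], thresh: float) -> List[int]:
    """Indices of the samples of x retained by linear-predictor load shedding.

    The first and last samples are always kept. Each interior sample i is
    predicted by the straight line through the last retained sample and the
    immediately following sample i + 1; it is discarded when the absolute
    prediction error is at most `thresh` and retained otherwise.
    """
    n = len(x)
    if n <= 2:
        return list(range(n))
    kept = [0]
    last = 0  # index of the last retained sample (k_{i-1} in the text)
    for i in range(1, n - 1):
        nxt = i + 1  # immediately following sample (k_{i+1} in the text)
        x_int = x[last] + (x[nxt] - x[last]) * (i - last) / (nxt - last)
        if abs(x_int - x[i]) > thresh:
            kept.append(i)
            last = i
    kept.append(n - 1)
    return kept
```

For example, `prune([0, 1, 2, 3, 4], 0.0)` returns `[0, 4]`, because every interior point of a straight line is predicted exactly, whereas `prune([0, 5, 0, 5, 0], 1.0)` keeps all five samples.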
Further below is a discussion of how to tune the threshold Thresh in order to obtain the desired number $\hat{n}_2$ of samples out of the $n_2$ samples added by the sliding window.
The load-shedding algorithm assumes the input of a threshold value, which directly affects the resulting number of retained points within the examined window. The desirable number of final points after the thresholding is dictated by the available CPU load. An optimal threshold value would lead to sampling exactly as many points as could be processed in the currently available CPU time. However, there is no way of accurately predicting the correct threshold without having seen the complete data, or without resorting to an expensive processing phase.
A simple estimator of the threshold value with constant complexity can be provided, derived by training on previously seen portions of the data stream. The expectation is that the training will be performed on a data subset that captures sufficient variation of the stream characteristics. The estimator will accept as input the desired number of final samples that should remain within the examined window, along with a small subset of the current data characteristics that, in a way, describe its ‘shape’ or ‘state’ (e.g., a subset of the data moments, its fractal dimensionality, etc.). The output of the estimator is a threshold value that will lead (with high probability) to the desired number of window samples.
The estimator is not expected to have zero error, but it should lead approximately to the desired compression ratio. In the majority of cases, the selected threshold will lead to either a higher or a lower compression ratio than desired. Intuitively, higher compression (i.e., an overestimated threshold) is preferable, because then one does not have to resort to the additional phase of randomly dropping some of the retained samples (a sampling that is ‘blind’ and might discard crucial points, such as important local minima or maxima). In the experimentation discussed further below, it has been verified that this desirable property holds for the threshold estimator presented immediately below.
By way of a training phase, assume that $F$ is a set of features that capture certain desirable characteristics of the examined data window $w$, and that $P \in \{0, 1, \ldots, |w|\}$ describes how many points can be processed at any given time. The threshold estimator will provide a mapping $F \times P \to T$, where $T$ is a set of threshold values.
It is not difficult to visualize that data whose values change only slightly (i.e., that exhibit a small variance) do not require a large threshold value. The reverse holds for sequences that are ‘busy’, i.e., that exhibit a large variance. With this observation in mind, the variance within the examined window is employed as a descriptor of the window state. Higher-order moments of the data could also be used in conjunction with the variance to improve the accuracy of the predictor. However, for simplicity and to keep the computational cost as low as possible, only the variance is employed in the current prototype implementation.
The training phase proceeds as follows. A sliding window is run over the training data. For each data window, the variance is computed and the load-shedding algorithm is executed for different threshold values (typically 20, 40, . . . , 100, 120). After each execution, the remaining number of data points is recorded. This process is repeated for all the extracted data windows. The result is a set of triplets [threshold, variance, number of points]. Given these, one can construct the estimator as a mapping $f(numPoints, variance) \to Thresh$, where the estimator is stored as a 2-dimensional array for constant retrieval time. An example of this mapping is shown in
The training phase is not performed in real time. However, it happens only once (or periodically), and it allows for a very fast prediction step.
It should be pointed out that, even though the training data are assumed to provide ‘sufficient’ clues about the data stream characteristics, the estimator might come upon an input [variance, numPoints] that it has not encountered during the training phase. In this case, one can simply provide the closest match, e.g., the entry with the smallest Euclidean distance to the given variance and number of points. Alternatively, one could extrapolate the values, in other words, explicitly learn the mapping function. This can be achieved by constructing an RBF network [1] based on the training triplets. Since this approach is significantly more expensive and could present overfitting problems, the former alternative is followed in the experiments below.
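The training phase can be sketched as follows, under stated assumptions: the load-shedding step is the linear-predictor test with an absolute threshold, the window variance is the population variance, and the names `train_triplets`, `retained_count`, `window` and `step` are hypothetical. Arranging the triplets into the 2-dimensional lookup array requires a quantization of variance and point counts that the text does not specify, so the sketch stops at the triplets.

```python
import statistics
from typing import List, Sequence, Tuple

def retained_count(x: Sequence[float], thresh: float) -> int:
    """Number of samples kept by linear-predictor load shedding (absolute threshold)."""
    n = len(x)
    if n <= 2:
        return n
    kept, last = 2, 0  # first and last samples are always kept
    for i in range(1, n - 1):
        x_int = x[last] + (x[i + 1] - x[last]) * (i - last) / (i + 1 - last)
        if abs(x_int - x[i]) > thresh:
            kept += 1
            last = i
    return kept

def train_triplets(data: Sequence[float], window: int, step: int,
                   thresholds: Sequence[float]) -> List[Tuple[float, float, int]]:
    """Slide a window over the training data and record
    (threshold, variance, number of points) for every window and threshold."""
    triplets = []
    for start in range(0, len(data) - window + 1, step):
        w = list(data[start:start + window])
        var = statistics.pvariance(w)
        for t in thresholds:
            triplets.append((t, var, retained_count(w, t)))
    return triplets
```

Running `train_triplets` once over a representative stretch of the stream produces the offline table from which the fast per-window lookup is served.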
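A minimal sketch of the closest-match lookup over training triplets of the form (threshold, variance, number of points). Plain Euclidean distance on (variance, number of points) is used, as described above. Because the two coordinates can live on very different scales, hypothetical `var_scale` and `pts_scale` divisors are included so that a deployment could normalize them; with their default value of 1 they reduce to the plain Euclidean distance. The function name `estimate_threshold` is illustrative.

```python
import math
from typing import Sequence, Tuple

Triplet = Tuple[float, float, int]  # (threshold, variance, number of points)

def estimate_threshold(triplets: Sequence[Triplet], num_points: int, variance: float,
                       var_scale: float = 1.0, pts_scale: float = 1.0) -> float:
    """Return the threshold of the training triplet closest to (variance, num_points)."""
    if not triplets:
        raise ValueError("no training triplets")

    def dist(t: Triplet) -> float:
        # Euclidean distance in the (optionally rescaled) (variance, points) plane.
        return math.hypot((t[1] - variance) / var_scale, (t[2] - num_points) / pts_scale)

    return min(triplets, key=dist)[0]
```

For instance, with the triplets `(20, 1.0, 500)`, `(40, 1.0, 300)`, `(20, 9.0, 800)` and `(60, 9.0, 400)`, a request for 310 points at variance 1.2 selects the threshold 40.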
Further, over a period of time, the stream characteristics may gradually change, and in the end may differ completely from the training data, hence leading to inconsistent predictions. One can compensate for this by ‘readjusting’ the predictor, by also recording the observed threshold error during the algorithm execution. This will result in a more extended maintenance phase of the estimator, but this cost is bound to pay off in the long run for datasets that exhibit frequent ‘concept drifts’ [10, 7]. This extension is not further elucidated upon herein, but it is presently noted as a potential addition for a more complex version of the threshold estimator.
Consider a signal $x[k_i]$, $0 \le i \le N-1$, as shown in
where, for $m = 1, \ldots, M-1$,
and, for $m = 0$,
A significant benefit of equation (5) is that the DFT of such unevenly sampled signals can be evaluated incrementally. Hence, if the window is shifted by a fixed width such that the first $n_1$ points are discarded and $n_2$ points are added at the end, the DFT of the signal may be updated as follows:
The complexity of computing this update is now considered. As in several papers that analyze the complexity of the FFT, it is assumed that the complex exponentials
and the intermediate value
are pre-computed for all m and n. Using the complexity labels defined above, the complexity of computing a single updated coefficient $X_n[m]$ for $m = 1, \ldots, M-1$ may be represented as:
$$\hat{\xi} = 6\xi_{Mul} + 5\xi_{Sub} + \xi_{Div} \qquad (9)$$
and for $m = 0$ as
$$\hat{\xi} = 2\xi_{Mul} + 2\xi_{Sub} \qquad (10)$$
Finally, the complexity of updating all the M DFT coefficients in this scenario is:
$$\xi_{update}(M, n_1, n_2) = (n_1 + n_2)\left[(M-1)(6\xi_{Mul} + 5\xi_{Sub} + \xi_{Div}) + (2\xi_{Mul} + 2\xi_{Sub}) + M\xi_{Sub}\right] + 2M\xi_{Sub} \qquad (11)$$
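Equations (5) through (8) are not reproduced in this text, so the following is only a sketch of the subtract-and-add update structure, not of the closed-form estimator itself. As a stand-in it uses the plain non-uniform DFT sum, in which each retained sample $(k, x[k])$ contributes $x[k] e^{-j2\pi m k/M}$. The sums are kept relative to absolute time and re-referenced to the window start with one phase factor per coefficient. All function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

Sample = Tuple[int, float]  # (absolute time index k, value x[k])

def _term(k: int, v: float, m: int, M: int) -> complex:
    """Contribution of one sample to coefficient m (stand-in for the closed form)."""
    return v * cmath.exp(-2j * cmath.pi * m * k / M)

def initial_sums(samples: Sequence[Sample], M: int) -> List[complex]:
    """Absolute-time sums S[m] over all retained samples, m = 0..M-1."""
    return [sum((_term(k, v, m, M) for k, v in samples), 0j) for m in range(M)]

def update_sums(S: Sequence[complex], discarded: Sequence[Sample],
                added: Sequence[Sample], M: int) -> List[complex]:
    """Subtract contributions of the discarded samples and add those of the new ones."""
    out = list(S)
    for m in range(M):
        for k, v in discarded:
            out[m] -= _term(k, v, m, M)
        for k, v in added:
            out[m] += _term(k, v, m, M)
    return out

def window_dft(S: Sequence[complex], start: int, M: int) -> List[complex]:
    """Re-reference absolute-time sums to the window start (time shift = phase factor)."""
    return [S[m] * cmath.exp(2j * cmath.pi * m * start / M) for m in range(M)]
```

When every sample of an evenly spaced window is retained, sliding by two positions with `update_sums` and calling `window_dft(S, 2, M)` reproduces the ordinary DFT of the shifted window; with an uneven subset, the cost of an update is proportional to the number of discarded plus added samples, as in equation (11).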
Using a presently inventive sub-sampling algorithm, one can reduce the number of samples that need to be used to update the DFT. Suppose that, as a result of the pruning, the $n_2$ samples are reduced to a set of $\hat{n}_2$ samples ($\hat{n}_2 \le n_2$). While the reduction in the number of samples directly translates into a reduction in the complexity of the update, one should also factor in the additional cost of the sub-sampling algorithm. Comparing equations (11) and (4), it is apparent that the overall complexity of the update (including the sub-sampling) is reduced when:
$$\xi_{update}(M, n_1, n_2) \ge \xi_{update}(M, n_1, \hat{n}_2) + \xi_{interp} \qquad (12)$$
To determine when this happens, consider the simple case $\hat{n}_2 = n_2 - 1$, i.e., the sub-sampling removes one sample. The increase in complexity due to the sub-sampling is $(2\xi_{Mul} + 4\xi_{Sub} + \xi_{Div})(n_2 - 2)$, while the corresponding decrease in the update complexity is $(M-1)(6\xi_{Mul} + 5\xi_{Sub} + \xi_{Div}) + (2\xi_{Mul} + 2\xi_{Sub}) + M\xi_{Sub}$ (from equation (11)). Since $\hat{n}_2 < n_2 \le M$, it follows that $n_2 - 2 < M - 1$, and each operation count in the first expression is no larger than the corresponding count in the $(M-1)$ term of the second; hence the reduction in complexity far outweighs the increase due to the sub-sampling algorithm. In general, equation (12) holds whenever the sub-sampling algorithm reduces the number of samples (i.e., whenever $\hat{n}_2 < n_2$).
If, at a certain time, the CPU is busy, thereby imposing a computation constraint $\xi_{limit}$, the DFT update should preferably be performed within this constraint. If $\xi_{update}(M, n_1, n_2) > \xi_{limit}$, not all $n_2$ samples can be used for the update, and hence one needs to determine the optimal number $\hat{n}_2$ of samples to retain such that $\xi_{update}(M, n_1, \hat{n}_2) + \xi_{interp} \le \xi_{limit}$. Specifically, one may compute this as:
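The comparison behind inequality (12) can be checked numerically with a small cost model. The sketch below transcribes equations (4) and (11) literally; the unit operation costs (one unit per multiplication, subtraction and division) are an assumption chosen only for illustration, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpCosts:
    """Cost of one multiplication, addition/subtraction and division (assumed values)."""
    mul: float = 1.0
    sub: float = 1.0
    div: float = 1.0

def xi_interp(n2: int, c: OpCosts) -> float:
    """Equation (4): cost of the linear-predictor sub-sampling of n2 added samples."""
    return (2 * c.mul + 4 * c.sub + c.div) * (n2 - 2)

def xi_update(M: int, n1: int, n2: int, c: OpCosts) -> float:
    """Equation (11): cost of updating all M DFT coefficients."""
    per_sample = (M - 1) * (6 * c.mul + 5 * c.sub + c.div) + (2 * c.mul + 2 * c.sub) + M * c.sub
    return (n1 + n2) * per_sample + 2 * M * c.sub

def subsampling_pays_off(M: int, n1: int, n2: int, n2_hat: int, c: OpCosts) -> bool:
    """Inequality (12): is updating with n2_hat samples plus sub-sampling no more expensive?"""
    return xi_update(M, n1, n2, c) >= xi_update(M, n1, n2_hat, c) + xi_interp(n2, c)
```

With unit costs, $M = 1024$ and $n_1 = n_2 = 64$, dropping a single sample saves 13,304 operations, while the sub-sampling itself costs 434, so `subsampling_pays_off(1024, 64, 64, 63, OpCosts())` is true; if nothing is dropped, the sub-sampling cost is pure overhead and the check fails.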
Finally, one can achieve this by tuning the sub-sampling threshold Thresh based on the threshold estimator algorithm described hereinabove.
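The closed-form expression for the optimal $\hat{n}_2$ is not reproduced in this text. The sketch below derives it from the stated constraint under assumed unit operation costs: because equation (11) is linear in $n_1 + \hat{n}_2$, requiring $(n_1 + \hat{n}_2)C + 2M\xi_{Sub} + \xi_{interp} \le \xi_{limit}$, where $C$ is the bracketed per-sample cost in equation (11), gives $\hat{n}_2 \le \lfloor(\xi_{limit} - \xi_{interp} - 2M\xi_{Sub})/C\rfloor - n_1$. The function name `max_retained` and the choice to skip sub-sampling (and its cost) when all $n_2$ samples already fit are assumptions.

```python
import math

def max_retained(M: int, n1: int, n2: int, xi_limit: float,
                 mul: float = 1.0, sub: float = 1.0, div: float = 1.0) -> int:
    """Largest n2_hat with xi_update(M, n1, n2_hat) + xi_interp <= xi_limit."""
    # Bracketed per-sample cost C from equation (11).
    per_sample = (M - 1) * (6 * mul + 5 * sub + div) + (2 * mul + 2 * sub) + M * sub
    if (n1 + n2) * per_sample + 2 * M * sub <= xi_limit:
        return n2  # all new samples fit; no sub-sampling needed
    interp = (2 * mul + 4 * sub + div) * (n2 - 2)  # equation (4)
    budget = xi_limit - interp - 2 * M * sub
    n2_hat = math.floor(budget / per_sample) - n1
    return max(0, min(n2, n2_hat))  # clamp to the admissible range [0, n2]
```

With unit costs, $M = 1024$ and $n_1 = n_2 = 64$, a budget of 1,500,000 operations admits $\hat{n}_2 = 48$ of the 64 new samples; the threshold estimator would then be asked for a Thresh that leaves about that many points.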
The disclosure now turns to experimentation with the algorithms and concepts discussed hereinabove.
The usefulness of the presently inventive resource-adaptive periodicity estimation depends on two factors:
The quality of the approximated Fourier coefficients is measured on a variety of periodic datasets obtained from the time-series archive at UC Riverside [14]. These datasets have a length of only 1024, which makes a meaningful evaluation of the streaming version of the algorithm difficult. However, by providing the whole sequence as input to the periodicity estimation unit, one can evaluate the effectiveness of the load-shedding scheme in conjunction with the closed-form DFT computation on the unevenly spaced samples. The accuracy is computed by comparing the estimated periodogram against the actual one (i.e., the periodogram obtained had no point been discarded from the examined data window). The experiment was run for different threshold values Thresh = 20 . . . 120. For example, a value of Thresh = 20 signifies that the predicted value (using the linear predictor) does not differ by more than 20% from the actual sequence value.
Note that the original periodogram is evaluated on a window of M points (M = 1024), while the one based on uneven sampling uses only the N remaining samples ($N \le M$). In order to provide a meaningful comparison between them, the latter periodogram is evaluated on all M/2 frequencies (see equation (6)), even though this is not necessary in an actual deployment of the algorithm.
The accuracy of the presently inventive methodology is compared against a naive approach that uses equi-sampling every M/N points (i.e., again leaving N points within the examined window). This approach is bound to introduce aliasing and to distort the original periodogram more, because, unlike the intelligent load shedding, it does not adapt to the signal characteristics.
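The comparison protocol can be sketched as follows. This is only a stand-in under stated assumptions: the retained samples are joined by straight lines and resampled onto the full grid before an ordinary DFT is taken (rather than evaluating the closed-form estimator of equation (5)), the error measure is the relative squared error over the first M/2 periodogram bins, and `equi_sample`, `to_grid`, `half_periodogram` and `spectral_error` are hypothetical names. The same `spectral_error` call can be given either the indices chosen by the load-shedding scheme or the equi-spaced ones, with the same number of retained points.

```python
import cmath
from typing import List, Sequence

def equi_sample(M: int, N: int) -> List[int]:
    """Indices of N (2 <= N <= M) roughly equi-spaced samples, both endpoints included."""
    return sorted({round(i * (M - 1) / (N - 1)) for i in range(N)})

def to_grid(idx: Sequence[int], x: Sequence[float], M: int) -> List[float]:
    """Piecewise-linear reconstruction on 0..M-1; assumes idx[0] == 0 and idx[-1] == M - 1."""
    out, j = [], 0
    for n in range(M):
        while j < len(idx) - 2 and idx[j + 1] <= n:
            j += 1  # advance to the segment [idx[j], idx[j + 1]] containing n
        a, b = idx[j], idx[j + 1]
        out.append(x[a] + (x[b] - x[a]) * (n - a) / (b - a))
    return out

def half_periodogram(y: Sequence[float]) -> List[float]:
    """|DFT|^2 on the first M/2 frequencies (unnormalized convention)."""
    M = len(y)
    return [abs(sum(y[n] * cmath.exp(-2j * cmath.pi * m * n / M) for n in range(M))) ** 2
            for m in range(M // 2)]

def spectral_error(x: Sequence[float], idx: Sequence[int]) -> float:
    """Relative squared error of the periodogram rebuilt from the samples in idx."""
    ref = half_periodogram(x)
    est = half_periodogram(to_grid(idx, x, len(x)))
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / sum(r * r for r in ref)
```

Retaining every sample gives zero error, and a straight line is recovered exactly from its two endpoints, which makes these two cases convenient sanity checks before comparing the two sampling strategies on real data.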
The results suggest that the load-shedding scheme employed by the presently inventive technique can lead to spectrum estimates of much higher quality than competing methods. In two cases (
To test the accuracy of the threshold estimator, longer datasets are needed, which can be used to simulate a sliding-window model execution and additionally provide a training subset. Thus, for experimentation purposes, real datasets provided by the automotive industry were utilized. These are diagnostic measurements that monitor the evolution of variables of interest during the operation of a vehicle. Examples of such measurements could be the engine pressure, the torque, vibration patterns, instantaneous fuel economy, engine load at current speed, etc.
Periodic analysis is an indispensable tool in the automotive industry, because predictive maintenance becomes possible by monitoring the changes in the spectrum of the various rotating parts. Therefore, a change in the periodic structure of the various engine measurements can be a good indicator of machine wear and/or of an incipient failure.
The measurements used have a length of 50,000 points and represent the monitoring of a variable over an extended period of time. A sliding window of 1024 points was employed on this data. A synthetic CPU load is generated and provided as input to the periodicity estimation unit. Based on the synthetic CPU trace, at any given point in time the periodicity unit is given adequate time to process a set of points with cardinality in the range of 50 to 1024 (1024 being the length of the window).
The accuracy of the threshold estimator is monitored while executing the presently inventive algorithm on the complete data stream. The estimator is fed with the current CPU load and provides a threshold estimate $Thresh_{est}$ that will lead, with high probability, to $\hat{P}$ remaining points (so that they can be processed within the available CPU time). Suppose that the number of points actually remaining after applying the threshold $Thresh_{est}$ is $P$. An indicator of the estimator accuracy is obtained by contrasting the estimated number of points $\hat{P}$ with the actual number $P$: $error = |\hat{P} - P|$.
The experimental results are very encouraging and indicate an average error in the estimated number of points of about 5% of the data window. For example, if the predicted number of points for a certain threshold is 250, the actual number of remaining points could be 200. This is the case of an overestimated threshold, which compresses the flowing data stream more than intended. As mentioned before, this case is more desirable than an underestimated threshold, because no additional points need to be subsequently dropped from the current data window, so no additional aliasing is introduced by blind dropping.
A histogram of the estimator approximation error is given on the left part of
In summary, there has been presented herein the first resource-adaptive method for periodicity estimation for streaming data. By way of brief, albeit non-restrictive, recapitulation, some key aspects of a proposed method in accordance with at least one embodiment of the present invention are:
(1) An intelligent load-shedding scheme that can adapt to the CPU load using a lightweight predictor.
(2) A DFT estimation that utilizes unevenly spaced samples, provided by the previous phase.
The quality of the approximated DFT has been shown and it has also been demonstrated that the scheme can adapt closely to the available CPU resources. The intelligent load-shedding scheme has been compared against equi-sampling and improvements in the periodogram estimation ranging from 10% to 90% are shown.
Further exploration could involve an examination of whether it is possible to reduce the computational cost even further.
By way of further recapitulation, some of the important contributions set forth herein are as follows:
Other recent work on periodicity estimation on data streams has appeared in [6], where the authors study sampling techniques for period estimation using sublinear space. [8] proposes sampling methods for retaining (with a given approximation error) the most significant Fourier coefficients. In [11] Papadimitriou, et al., adapt the use of wavelet coefficients for modeling a data stream, providing also a periodicity estimator using logarithmic space complexity. However, none of these approaches addresses the issue of resource adaptation, which is one of the main contributions provided herein. It should be further noted that a presently proposed method for periodogram reconstruction based on irregularly spaced samples is significantly more lightweight than the widely used Lomb periodogram [13] (which incurs a very high computational burden).
It is to be understood that the present invention, in accordance with at least one presently preferred embodiment, includes an arrangement for providing a spectrum estimation based on resource availability, which may be implemented on at least one general-purpose computer running suitable software programs. It may also be implemented on at least one integrated circuit or part of at least one integrated circuit. Thus, it is to be understood that the invention may be implemented in hardware, software, or a combination of both.
If not otherwise stated herein, it is to be assumed that all patents, patent applications, patent publications and other publications (including web-based publications) mentioned and cited herein are hereby fully incorporated by reference herein as if set forth in their entirety herein.
Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention.
References
1. D. Broomhead and D. Lowe. Multivariate functional interpolation and adaptive networks. In Complex Systems, 2:321-355, 1988.
2. P. Castiglioni, M. Rienzo, and H. Yosh. A Computationally Efficient Algorithm for Online Spectral Analysis of Beat-to-Beat Signals. In Computers in Cardiology, 29:417-420, 2002.
3. J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. In Math. Comput., 19:297-301, 1965.
4. P. Cuadra, A. Master, and C. Sapp. Efficient Pitch Detection Techniques for Interactive Music. In International Computer Music Conference, 2001.
5. M. G. Elfeky, W. G. Aref, and A. K. Elmagarmid. Using Convolution to Mine Obscure Periodic Patterns in One Pass. In EDBT, 2004.
6. F. Ergün, S. Muthukrishnan, and S. C. Sahinalp. Sublinear methods for detecting periodic trends in data streams. In LATIN, 2004.
7. W. Fan. StreamMiner: A Classifier Ensemble-based Engine to Mine Concept-drifting Data Streams. In Proc. of VLDB, pages 1257-1260, 2004.
8. A. C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Near-optimal Sparse Fourier Representations via Sampling. In STOC, pages 152-161, 2002.
9. M. Kontaki and A. Papadopoulos. Efficient similarity search in streaming time sequences. In SSDBM, 2004.
10. M. Lazarescu, S. Venkatesh, and H. H. Bui. Using Multiple Windows to Track Concept Drift. In Intelligent Data Analysis Journal, Vol 8(1), 2004.
11. S. Papadimitriou, A. Brockwell, and C. Faloutsos. AWSOM: Adaptive, hands-off stream mining. In VLDB, pages 560-571, 2003.
12. A. Papoulis. Signal Analysis. McGraw-Hill, 1977.
13. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 1992.
14. Time-Series Data Mining Archive. [http://]www.cs.ucr.edu/eamonn/TSDMA/.
15. Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB, 2002.
This application is a continuation application of U.S. patent application Ser. No. 11/389,344 filed on Mar. 24, 2006 now U.S. Pat. No. 8,112,247, the contents of which are hereby fully incorporated by reference in its entirety.
This invention was made with Government support under Contract No.: H98230-05-3-0001 awarded by the U.S. Department of Defense. The Government has certain rights in this invention.
Number | Name | Date | Kind |
---|---|---|---|
5463775 | DeWitt et al. | Oct 1995 | A |
5781861 | Kang et al. | Jul 1998 | A |
5943170 | Inbar et al. | Aug 1999 | A |
5943429 | Handel | Aug 1999 | A |
6009129 | Kenney et al. | Dec 1999 | A |
6014620 | Handel | Jan 2000 | A |
6400310 | Byrnes et al. | Jun 2002 | B1 |
6594524 | Esteller et al. | Jul 2003 | B2 |
6996412 | Hunzinger et al. | Feb 2006 | B2 |
7817717 | Malayath et al. | Oct 2010 | B2 |
20020114383 | Belge et al. | Aug 2002 | A1 |
20020140873 | Van Dijk et al. | Oct 2002 | A1 |
20020194251 | Richter et al. | Dec 2002 | A1 |
20030028582 | Kosanovic | Feb 2003 | A1 |
20030163044 | Heimdal et al. | Aug 2003 | A1 |
20030214907 | Berkcan et al. | Nov 2003 | A1 |
20030215867 | Gulati | Nov 2003 | A1 |
20040137915 | Diener et al. | Jul 2004 | A1 |
20060075102 | Cupit | Apr 2006 | A1 |
20070142873 | Esteller et al. | Jun 2007 | A1 |
Number | Date | Country |
---|---|---|
1500260 | May 2004 | CN |
03065351 | Jul 2003 | WO |
Entry |
---|
Kim, Won-Ik, et al., “A New Traffic-Load Shedding Scheme in the WCDMA Mobile Communication Systems”, Proceedings of the Vehicular Technology Conference, 2002, pp. 2405-2409, IEEE, New York, New York, USA. |
Broomhead, D., and, Lowe, D., Multivariate functional interpolation and adaptive networks. In Complex Systems, 2:321:355, 1988. |
Castiglioni, P., De Rienzo, M., and Yosh, H., A Computationally Efficient Algorithm for Online Spectral Analysis of Beat-to-Beat Signals. In Computers in cardiology: 29, 417:420, 2002. |
Cooley, J.W., and Tukey, J.W., An algorithm for the machine calculation of complex Fourier series. In Math. Comput. 19, 297:301, 1965. |
De La Cuadra, P., Master, A., and Sapp, C., Efficient Pitch Detection Techniques for Interactive Music. In International Computer Music Conference, 2001. |
Elfeky, M.G., Aref, W.G., and Elmagarmid, A.K., Using Convolution to Mine Obscure Periodic Patterns in One Pass. In EDBT, 2004. |
Ergun, F., Muthukrishnan, S., and Sahinalp, S.C., Sublinear methods for detecting periodic trends in data streams. In LATIN, 2004. |
Fan, W., StreamMiner: A Classifier Ensemble-based Engine to Mine Concept-drifting Data Streams. In Proc. of VLDB, pp. 1257-1260, 2004. |
Gilbert, A.C., Guha, S., Indyk, P., Muthukrishnan, S., and Strauss, M., Near-optimal Sparse Fourier Representations via Sampling. In STOC, pp. 152-161, 2002. |
Kontaki, M., and Papadopoulos, A., Efficient similarity search in streaming time sequences. In SSDBM, 2004. |
Lazarescu, M., Venkatesh, S., and Bui, H., Using Multiple Windows to Track Concept Drift. In Intelligent Data Analysis Journal, vol. 8(1), 2004. |
Papadimitriou, S., Brockwell, A. and Faloutsos, C., Adaptive, hands-off stream mining. In VLDB, pp. 560-571, 2003. |
Zhu, Y. and Shasha, D., Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB, 2002. |
Number | Date | Country |
---|---|---|
20090074043 A1 | Mar 2009 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11389344 | Mar 2006 | US |
Child | 12177300 |  | US |