1. Field of the Invention
The subject matter disclosed herein relates to processing time series data and, more specifically, determining whether the time series data contains predefined patterns.
2. Brief Description of the Related Art
Data is stored on data storage devices in a variety of different formats. Additionally, various types of data storage devices are used to store data and these data storage devices may vary in cost. In one example, data may be stored according to certain formats on high cost devices such as random access memories (RAMs). In other examples, data may be stored on low cost devices such as on hard disks.
One type of data that is stored is time series data. In one aspect, time series data is obtained by some type of sensor or measurement device and is stored as a function of time. For example, a measurement sensor may take a reading of a parameter at predetermined time intervals, and each of the measurements is stored in memory. Since large amounts of data are typically involved with time series measurements, the storage of this data becomes particularly cumbersome.
The volume of time series data has grown exponentially over time, and this growth presents a unique set of challenges when attempting to store and mine historical data for analysis. Previous approaches for querying time series datasets require retrieving large amounts of data from a data repository at one time, executing an analytic on that complete dataset, and then discarding on the client side any data that is not required after the data has been transferred.
Unfortunately, the accessing of large pieces of data is inefficient, slow, and costly. User dissatisfaction has resulted from these previous approaches.
The approaches described herein provide re-streaming of time series data that minimizes the need to access large pieces of data at once, thereby reducing the number of large-scale input/output (I/O) operations and the memory footprint that can slow processing. The re-streaming of time series data in the present approach accesses the time series data repository, retrieves data elements in small sets, and sends them onward for further processing through a stream-based operation. In some aspects, the re-streamed data is time-synchronized such that the data is replayed in chronological order. Further, depending on the specific analytic requirements, the data may be properly spaced such that a separation (e.g., n seconds) between two data points in the repository appears as a separation (e.g., n seconds) in the data that is being re-streamed.
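By way of illustration only, the retrieval of data elements in small sets might be sketched as follows. The class InMemoryRepository, its methods time_bounds and fetch_window, and the function restream are hypothetical names introduced for this sketch; the window-based query is an assumption and is not asserted to be the interface of any particular repository.

```python
from typing import Iterator, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


class InMemoryRepository:
    """Hypothetical stand-in for a time series data repository."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = list(samples)

    def time_bounds(self) -> Optional[Tuple[float, float]]:
        """Return (earliest, latest) timestamp, or None if empty."""
        if not self._samples:
            return None
        times = [t for t, _ in self._samples]
        return min(times), max(times)

    def fetch_window(self, start: float, end: float) -> List[Sample]:
        """Return samples with start <= timestamp < end, oldest first."""
        hits = [s for s in self._samples if start <= s[0] < end]
        return sorted(hits, key=lambda s: s[0])


def restream(repo: InMemoryRepository, window_seconds: float) -> Iterator[Sample]:
    """Yield samples in chronological order, one small window at a time."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    bounds = repo.time_bounds()
    if bounds is None:
        return
    start, last = bounds
    while start <= last:
        # Only the samples of the current window are held at once.
        for sample in repo.fetch_window(start, start + window_seconds):
            yield sample
        start += window_seconds
```

Because restream is a generator, a downstream analytic can begin consuming samples before later windows have been retrieved.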
In the present approaches, a defined set of time series events can be subscribed to by a user. Additionally, an event producer may re-stream the historical data and actively look for subscribed events, emitting them as they are found. Further, consumers or users may receive the data associated with the events as the events are detected.
As mentioned, the present approaches obtain small pieces of time series data and re-stream the subset of the time series data as though it were being generated in real time. In some aspects, the stream is analyzed and events are generated from historical time series data. Those events could then be subscribed to and consumed by different analytics further downstream. Events that could be generated include, but are not limited to, operations to reduce the size of the data (such as sampling operations or aggregation operations) or more complex pattern matching functions across single or multiple parameters at a point in time or over time. Other examples of analysis are possible.
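As a non-limiting illustration of an operation that reduces the size of the data, an aggregation over fixed time windows might be sketched as follows. The function name mean_per_window, the choice of a mean as the aggregate, and the layout of the emitted event are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


def mean_per_window(
    stream: Iterable[Sample], window_seconds: float
) -> Iterator[Dict[str, float]]:
    """Emit one aggregate event per fixed window of a chronological stream."""
    bucket: List[float] = []
    bucket_index: Optional[int] = None
    for timestamp, value in stream:
        index = int(timestamp // window_seconds)
        # A sample from a later window closes the current one.
        if bucket_index is not None and index != bucket_index:
            yield {
                "window_start": bucket_index * window_seconds,
                "mean": sum(bucket) / len(bucket),
                "count": len(bucket),
            }
            bucket = []
        bucket_index = index
        bucket.append(value)
    # Flush the final, possibly partial, window.
    if bucket and bucket_index is not None:
        yield {
            "window_start": bucket_index * window_seconds,
            "mean": sum(bucket) / len(bucket),
            "count": len(bucket),
        }
```

The sketch relies on the stream arriving in chronological order, so that each window is complete as soon as a sample from a later window is seen.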
In many of these embodiments, time series data is received from a time series data repository and the time series data includes a plurality of sub-portions. The sub-portions of data are sorted in chronological order to appear as if the data is being generated in real time and are sent onward for further processing. The received and sorted time series data is analyzed to determine if one or more predefined events or patterns are found within the data. If one or more predefined events or patterns are found in the time series data by the analysis, a user is informed that the one or more predefined events or patterns have been detected or discovered.
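For purposes of illustration, the analysis of the sorted data for one predefined pattern might be sketched as follows. The pattern chosen here (a value remaining above a threshold for a minimum number of consecutive samples), the function name detect_sustained_high, and the event fields are hypothetical and are offered only as one example of a pattern that could be predefined.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


def detect_sustained_high(
    stream: Iterable[Sample], threshold: float, min_run: int
) -> Iterator[Dict[str, object]]:
    """Emit an event when min_run consecutive samples exceed threshold."""
    run: List[float] = []  # timestamps of the current above-threshold run
    for timestamp, value in stream:
        if value > threshold:
            run.append(timestamp)
            # Emit once per run, at the moment the run becomes long enough.
            if len(run) == min_run:
                yield {
                    "event": "sustained_high",
                    "start": run[0],
                    "end": timestamp,
                }
        else:
            run = []
```

The event is emitted as soon as the pattern is satisfied, so that a user can be informed without waiting for the remainder of the data to be processed.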
In some aspects, at least some of the one or more predefined events or patterns are subscribed to by the user. In other aspects, the predefined events or patterns include an operation to reduce the size of the data and a pattern matching operation. Other examples of analytics are possible.
In some examples, the time series data repository is stored within a cloud or cloud-based network. In other examples, the predefined event or pattern is stored in a data library.
In others of these embodiments, an apparatus that is configured to re-stream stored time series data includes an interface and a controller. The interface has an input and output and is configured to receive time series data from a time series data repository. The time series data includes a plurality of sub-portions and when the sub-portions of data are returned, they are returned sorted in chronological order to appear as if the data is being generated in real time.
The controller is coupled to the interface and is configured to analyze the received and sorted time series data to determine if one or more predefined events or patterns occurred in the data. The controller is further configured, when the predefined events or patterns are detected in the time series data by the analysis, to inform a user at the output that the one or more predefined events or patterns have been discovered.
For a more complete understanding of the disclosure, reference should be made to the following detailed description and accompanying drawings wherein:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
The approaches described herein provide for the re-streaming of time series data. In one aspect, a time series data repository can be searched and a subset of the time series data can be extracted and analyzed in chronological order. A re-streaming analytic execution engine may receive the data stream and execute the selected analytics against the stream, generating and emitting events as they are detected. In another aspect, a library of standard time series events is maintained that can be searched, and this allows users to specify which of those analytics to actively execute.
In still other aspects, a collection of event consumers is maintained. Users can subscribe to the events generated by a re-streaming execution engine. Each event consumer can communicate with the re-streaming execution engine to specify the specific events it is interested in receiving. The re-streaming execution engine understands which events to monitor and where to send those events when the events are detected. In one advantage of the present approaches, a common approach is provided by which historical and current data are analyzed. Analytics become easier to build and maintain since the same analytic is used both for exploration on historical data and for event detection on live data streams in real time. This contrasts with previous data mining approaches, which required analytics to be built twice: once to mine and build analytic models on historical data, and a second time to turn that new model into an analytic that can be executed in real time.
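The subscription of event consumers to a re-streaming execution engine might, by way of example only, be sketched as follows. The class RestreamEngine, its methods register, subscribe, and run, and the per-sample predicate form of an analytic are hypothetical names and assumptions introduced for this sketch.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)
Event = Dict[str, object]
Consumer = Callable[[Event], None]


class RestreamEngine:
    """Hypothetical engine that routes detected events to subscribers."""

    def __init__(self) -> None:
        self._analytics: Dict[str, Callable[[Sample], bool]] = {}
        self._subscribers: DefaultDict[str, List[Consumer]] = defaultdict(list)

    def register(self, name: str, predicate: Callable[[Sample], bool]) -> None:
        """Add a named analytic (here, a simple per-sample test)."""
        self._analytics[name] = predicate

    def subscribe(self, name: str, consumer: Consumer) -> None:
        """Record that consumer wants events produced by analytic name."""
        if name not in self._analytics:
            raise KeyError(f"unknown event: {name}")
        self._subscribers[name].append(consumer)

    def run(self, stream: Iterable[Sample]) -> None:
        """Evaluate only subscribed analytics; deliver events as found."""
        for sample in stream:
            for name, consumers in self._subscribers.items():
                if self._analytics[name](sample):
                    event: Event = {
                        "event": name,
                        "time": sample[0],
                        "value": sample[1],
                    }
                    for consumer in consumers:
                        consumer(event)
```

Because run accepts any iterable of samples, the same registered analytic can in principle be applied to a re-streamed historical dataset or to a live data stream.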
Another advantage of the present approaches is that they allow for events/results to be analyzed as they are found during data exploration. In other words, the entire historical dataset would not have to be completely processed before the detected historical events of interest can be utilized. This reduces the time to make decisions and gain business value from the historical data.
Referring now to
The re-streaming analytic execution engine 104 may include a receive module 120 that receives the chronologically sorted time series data stream; an execution module 122 that executes selected analytics 105 against the stream; a generation module 124 that generates and emits events as they are detected; and a search module 126 that searches for patterns or events in the time series data. The re-streaming analytic execution engine 104 may be located in the cloud-based network 102 or outside the cloud-based network 102. It will be appreciated that the re-streaming analytic execution engine 104 may be disposed at the cloud-based network 102 or at various locations within and outside the cloud-based network 102.
The predefined events and patterns 114 may be a variety of different pieces of information. In some aspects, the predefined events or patterns include an operation to reduce the size of the data and a pattern matching operation. Other examples are possible.
The cloud-based network 102 is any combination of networks. For example, it may be any combination of the Internet, cellular phone networks, wide area networks or local area networks. Other types of networks and combinations of networks are possible.
The time series data repository 110 may in one example be a random access memory (RAM). However, it may be any type of memory storage device. The analytic library 112 may also be any type of data storage device.
The user interface 106 is any combination of hardware and software that allows a user to access information. For example, this may be a computer terminal with a mouse and a keyboard. Other examples of user interfaces are possible.
In one example of the operation of the system of
It will be appreciated that the modules 120, 122, 124, and 126 may be any combination of electronic hardware and software. For example, the modules 120, 122, 124, and 126 may be computer instructions that execute on general purpose processing devices.
In some aspects, at least some of the one or more predefined events or patterns 114 are subscribed to by the user. This is accomplished via a subscribe-to-events-or-patterns message 119.
In some examples, the time series data repository 110 is disposed at the cloud-based network 102. In other examples, the predefined event or pattern 114 is stored in the analytic library 112. In other examples, the analytic library 112 is searched by the re-streaming analytic execution engine 104 for a selected predefined event or pattern and analytics 105 to execute on the stream. In some other aspects, the predefined events or patterns 114 are consumed downstream by a downstream analytic 107. Examples of analytics 105 and 107 include event correlation, anomaly classification, or root cause analysis. Other examples are possible.
In another example of the operation of
The re-streamed data (re-streamed by the re-streaming analytic execution engine 104) is time synchronized such that the data is replayed in chronological order. Further, depending on the specific analytic requirements, the data may be properly spaced such that a separation (e.g., n seconds) between two data points in the repository appears as a separation (e.g., n seconds) in the data re-stream.
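The spacing of the re-streamed data might, as one illustrative possibility, be sketched as follows. The function name paced_replay and the injectable sleep parameter are assumptions introduced for this sketch; the input is assumed to be already sorted in chronological order.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


def paced_replay(
    samples: Iterable[Sample],
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Sample]:
    """Yield sorted samples, waiting the original gap between them."""
    previous: Optional[float] = None
    for timestamp, value in samples:
        if previous is not None:
            gap = timestamp - previous
            if gap > 0:
                # A separation of n seconds in the repository is
                # reproduced as n seconds in the re-stream.
                sleep(gap)
        previous = timestamp
        yield timestamp, value
```

Passing a different callable for sleep allows the pacing to be observed or bypassed where an analytic does not require real-time spacing.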
In another example of the operation of the system of
Referring now to
In some aspects, at least some of the one or more predefined events or patterns are subscribed to by the user. In other aspects, the predefined events or patterns include an operation to reduce the size of the data and a pattern matching operation. Other examples are possible.
In some examples, the time series data repository is disposed at a cloud or cloud-based network. In other examples, the predefined event or pattern is stored in an analytics library. In other examples, the analytics library is searched for analytics to execute in order to find the selected predefined events or patterns. In some other aspects, the predefined events or patterns are consumed downstream by a downstream analytic such as an event correlator or root cause analyzer.
Referring now to
The apparatus 300 includes an interface 302 and a controller 304. The interface 302 has an input 306 and output 308 and is configured to receive time series data 301 from a time series data repository. The time series data includes a plurality of sub-portions and the sub-portions of data are returned sorted in chronological order to appear as if the data is being generated in real time. The sorting may be performed by the controller 304 or the time series data 301 may be received in already-sorted form.
The controller 304 is coupled to the interface 302 and is configured to analyze the received and now sorted time series data in order to detect one or more predefined events or patterns. The controller 304 is further configured, when the predefined events or patterns are detected in the time series data by the analysis, to inform a user at the output 308 by a message 310 that the one or more predefined events or patterns have been found.
It will be appreciated by those skilled in the art that modifications to the foregoing embodiments may be made in various aspects. Other variations clearly would also work, and are within the scope and spirit of the invention. The present invention is set forth with particularity in the appended claims. It is deemed that the spirit and scope of that invention encompasses such modifications and alterations to the embodiments herein as would be apparent to one of ordinary skill in the art and familiar with the teachings of the present application.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US13/44964 | 6/10/2013 | WO | 00 |