The present invention relates to a method for filtering data series, preferably time series of data, prior to further processing. The present invention further relates to a system for filtering data series, preferably time series of data, prior to further processing.
In internet-of-things or machine-to-machine systems, devices that conventionally send data, actuate or perform any other automated data-generating task constantly provide information about any object, also called a “thing”, mostly in the form of so-called time series. Time series usually refer to data that are generated and/or collected at successive times in regular or irregular intervals and comprise key-value pairs. For example, each pair consists of a value of a simple data type, for instance numeric, alphanumeric or binary data, and a corresponding timestamp. Time series stemming from internet-of-things devices are, for example, one of the enablers of so-called big data.
Time series collected by internet-of-things devices are often forwarded and stored via deployments based on a system illustrated in
Conventionally the data is forwarded to and stored in a data center DC, for example a cloud. However, this causes a plurality of problems: For instance, one of the problems is the bandwidth consumption and/or latency between the data delivering devices D or the gateways GW and the network core NC or the data center DC, respectively. Further problems are the storage costs and the database performance of a data center DC. Another problem is the energy consumption on various tiers and, further, the system resilience because of potential concurrent database transactions. With the increasing use of the internet-of-things these problems will become even bigger in the future.
To address these problems the non-patent literature of Tak-chung Fu, “A review on time series data mining”, Engineering Applications of Artificial Intelligence, Volume 24, Issue 1, February 2011, Pages 164-181, ISSN 0952-1976 and of W. Lang, M. Morse, and J. M. Patel, “Dictionary-Based Compression for Long Time-Series Similarity,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 11, pp. 1609-1622, November, 2010 apply conventional reduction procedures like sampling, compression and/or selected forwarding, for example rule based and/or application-specific, or are customized for conventional internet-of-thing architectures such as for example shown in
In the further non-patent literature of J. Zhang, K. Yang, L. Xiang, Y. Luo, B. Xiong, and Q. Tang, “A Self-Adaptive Regression-Based Multivariate Data Compression Scheme with Error Bound in Wireless Sensor Networks”, International Journal of Distributed Sensor Networks, Vol. 2013, Article ID 913497 a method is shown for deciding automatically to transmit either raw or regression coefficients and in the latter case to select the number of data involved in the regressions.
However, these conventional methods act upon already collected data sets. Further, they are often avoided because of the information loss that selected forwarding or data filtering inherently implies.
In
These information “losses” are very difficult to determine, especially when designing a data-agnostic system, i.e. a system that cannot filter based on the semantics of the data or based on application-specific needs. One reason for example is that it is unknown who will use the data and in which way.
In an embodiment, the present invention provides a method for filtering data series, preferably time series of data, prior to further processing, wherein the data series are collected by collecting entities and provided to one or more filtering entities from one or more data delivering devices, and wherein the filtered data series are forwarded to further processing entities. The method includes filtering, by the filtering entities, the data series by the following steps: collecting a data series including original information; reducing the original information of the data series based on and by at least one data reduction procedure to produce at least one set of reduced information of the data series; reconstructing the original information for the at least one set of reduced information of the data series; calculating a level of reconstruction for the reconstructed information based on a comparison between the reconstructed information and the original information for the at least one data reduction procedure; and determining reduced or non-reduced information of the data series to be forwarded based on a comparison between a desired level of reconstruction and the calculated level of reconstruction.
The present invention will be described in even greater detail below based on the exemplary figures. The invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the invention. The features and advantages of various embodiments of the present invention will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:
A method and a system are described herein for filtering data series with enhanced efficiency in terms of storage, bandwidth and averaged data quality, preferably in internet-of-things or machine-to-machine systems.
A method and a system are described herein for filtering data series which can maintain the desired level for the reconstructability of the original data from a subset that has been forwarded and/or stored in a data center.
A method and a system are described herein for filtering data series that enhance control of the degree of information “loss” due to filtering independent of which filtering method is being applied.
Although applicable to any kind of systems, the present invention will be described with regard to data series in connection with the internet-of-things or machine-to-machine systems.
Although applicable in general to any type of data series, the present invention will be described with regard to time series of data.
In an embodiment, a method for filtering data series, preferably time series of data, prior to further processing, wherein the data series are collected by collecting entities and provided to one or more filtering entities from one or more data delivering devices, and wherein the filtered data series are forwarded to further processing entities is defined. The method is characterized in that the filtering by the filtering entities is performed by the steps of:
In an embodiment a system for filtering data series, preferably time series of data, prior to further processing, comprising one or more data delivering devices adapted to provide data series, one or more collecting entities adapted to collect said data series and to provide them to one or more filtering entities and wherein said one or more filtering entities are adapted to forward the filtered data series to further processing entities is defined. The system is characterized in that said one or more filtering entities are adapted to perform the steps of:
The term “reconstructability” can be understood as a degree to which an original data set can be reproduced from a reduced instance of the data set, preferably the original data set with missing points or values, the function that can be used to retrieve points of the data set, or a different data set that has been generated from a transformation of the original data set. It is preferably expressed as a percentage, based on a system-specific metric, etc.
The term “gateway” can be understood in its broadest sense, in particular as an entity at a network edge.
The term “entity” can be understood in its broadest sense, in particular an entity like a filtering entity can preferably also act as further processing entity and/or act as entity of another type, etc.
According to methods and systems described herein, it can be easily determined which data to forward, which to filter and which to cache based on a reconstructability of the data series, preferably of data points of time series.
According to methods and systems described herein, the efficiency in terms of storage, bandwidth and average data quality is enhanced, while simultaneously maintaining a predefined level for the reconstructability of the original data from the subset that has been stored, for example in a data center.
According to methods and systems described herein, filtering or compression techniques for time series can preferably be applied before the data are actually collected, relying preferably on frontend samples.
According to methods and systems described herein, by using reconstructability levels, settings of the data reduction procedures can be translated into degrees of information loss.
According to methods and systems described herein, knowledge about the reconstructability of the data series can be related with decisions about settings of used data reduction procedures.
According to methods and systems described herein, data-agnostic data filtering with controlled degree of information loss can be enabled.
According to methods and systems described herein, a translation of settings of data reduction procedures into degrees of expected information losses can be provided. Further methods and systems described herein can relate the knowledge about reconstructability of data with decisions about settings of the used data reduction procedures. Further, methods and systems described herein can enable data agnostic data filtering with a controlled degree of information loss.
According to a preferred embodiment at least steps a)-d) are performed in irregular and/or regular time intervals, upon prespecified changes and/or upon appearance of prespecified values in the data series. This provides a simple and efficient way to trigger filtering and the analysis of incoming data series, i.e. to determine when and how frequently the data is being (re)examined. A simple timer may trigger the analysis in regular or irregular intervals. An event detector may trigger the analysis upon detection that certain prespecified values of the data series are changing and/or exceed a certain prespecified threshold. Another possibility is that the event detector may trigger the analysis upon appearance of certain prespecified values in the information of the data series that indicate a change of behavior. Of course any other procedure may alternatively or additionally be used.
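The two trigger types named above, a timer and a threshold-based event detector, can be sketched as follows. This is only an illustrative sketch: the class names `TimerTrigger` and `ThresholdTrigger`, the helper `analysis_due` and the injectable `clock` parameter are assumptions made for the example and are not part of the described system.

```python
import time


class TimerTrigger:
    """Fires once at least `interval_s` seconds have passed since it last fired."""

    def __init__(self, interval_s, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable so the trigger can be tested without waiting
        self.last = clock()

    def should_fire(self, value=None):
        now = self.clock()
        if now - self.last >= self.interval_s:
            self.last = now
            return True
        return False


class ThresholdTrigger:
    """Fires when an incoming value exceeds a prespecified threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def should_fire(self, value):
        return value > self.threshold


def analysis_due(triggers, value):
    """True if any trigger fires; a list is built first so every trigger is evaluated."""
    return any([t.should_fire(value) for t in triggers])
```

A gateway loop could call `analysis_due` for each incoming data point and start a new analysis whenever it returns true.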
According to a further preferred embodiment when collecting the data series the highest possible polling rate and/or the highest possible resolution is used. This enables providing the most current and/or most precise data when collecting the data series, for example based on the available bandwidth of the communication between the data delivering devices and the filtering entities. Further, the precision of the reduction procedures is enhanced since the largest possible amount of data can be used for later analysis.
According to a further preferred embodiment reconstructability information is generated specifying for each data series and for each reduction procedure and for corresponding input values for said reduction procedures a value for the level of reconstruction. This greatly enhances the flexibility in deciding which reduced data shall be forwarded for further processing to the further processing entities.
According to a further preferred embodiment the reconstructability information is updated when steps a)-d) are performed. This enables providing the most current reconstructability information for deciding which data is to be forwarded in what way.
According to a further preferred embodiment a reduction procedure is provided in the form of a procedure reducing the dimensionality and/or size of the data series and/or generating a function representing the data series. Dimensionality reduction is for example provided by sampling every, every second, every fourth or no data point of a data series. A function-based reduction procedure, for example, forwards only a function which represents the data “as well as possible”: for example only every second data point is used, a spline function is generated through these data points, and the spline function together with the corresponding data interval is forwarded for further processing. This provides efficient reduction procedures. Of course any other data reduction procedure can be used additionally or alternatively. Applying different reduction procedures sequentially is also possible.
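The two kinds of reduction procedure can be illustrated with a minimal sketch. Note that a least-squares line is used here as a simple stand-in for the spline mentioned above, and that the function names `sample_every`, `fit_line` and `evaluate_line` are hypothetical.

```python
def sample_every(values, k):
    """Dimensionality reduction: keep every k-th data point, starting with the first."""
    return values[::k]


def fit_line(values):
    """Function-based reduction: least-squares line over the indices 0..n-1.

    Returns (slope, intercept, n); only these three numbers would be forwarded.
    A line is an assumed stand-in for the spline named in the description.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x, n


def evaluate_line(model):
    """Reconstruct the n data points represented by a (slope, intercept, n) model."""
    slope, intercept, n = model
    return [intercept + slope * x for x in range(n)]
```

Sequential application, as mentioned above, would correspond to a call such as `fit_line(sample_every(values, 2))`.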
According to a further preferred embodiment the comparison according to step d) is performed on a similarity metric, preferably using Euclidean distance. This provides a fast and efficient way to compare the reconstructed information with the original information.
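One way such a Euclidean-distance comparison could be turned into a level of reconstruction is sketched below. Here x denotes the vector of the original data and x̂ the vector of the reconstructed data, both of length n. The normalisation by the norm of x and the clipping at zero are assumptions made for illustration; the description does not prescribe a particular normalisation.

```latex
d(x, \hat{x}) = \sqrt{\sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2},
\qquad
\rho = \max\!\left( 0,\; 1 - \frac{d(x, \hat{x})}{\lVert x \rVert_2} \right)
```

Under this assumed definition, ρ = 1 corresponds to a perfect reconstruction, and ρ can be reported as a percentage by multiplying by 100.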
According to a further preferred embodiment the collecting entities are configured based on the operational status of said filtering entities. This enhances the flexibility while providing an optimum of communication between the filtering entities and the collecting entities.
According to a further preferred embodiment when the operational status of the filtering entities is dedicated for energy saving then the collecting entities are reconfigured such that only reduced information satisfying the desired level of reconstruction is collected. “Reduced” means here as much as needed to satisfy the reconstructability degree that has been requested. This reduces the collecting entity traffic and saves energy of the collecting entity.
According to a further preferred embodiment when the operational status of the filtering entities is dedicated for network resource saving then the collecting entities are reconfigured such that only reduced information satisfying the desired level of reconstruction is forwarded and preferably the collected information is cached in the filtering entity and/or in the collecting entity. This relieves the network and storage demand in the same way as the reconfiguration of the collecting entities for energy saving, and keeps more collected data in the cache, which might be retrieved later. Therefore, the flexibility is further enhanced since data in the cache can be provided at any time if needed.
According to a further preferred embodiment when the operational status of the filtering entities is dedicated for network resource saving, the collected information is forwarded upon demand of the further processing entities, in regular time intervals and/or never. “Upon demand” means that data can eventually be retrieved upon request. This preferably means that it may take time until the data is delivered, for example to optimize bandwidth usage or because intermediate nodes are unreliable such that manual fetching of the data to the backend system is preferred. Another option is that big amounts of data will not be sent at all to the backend system unless explicitly requested. “In regular time intervals” means that the cached data is copied, i.e. transmitted, to the backend system BS regularly with time intervals that are preferably much bigger than the data capture intervals. If the data is never forwarded, the cached data series can only be used locally and might be dropped at any time.
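The three forwarding options for cached data can be sketched as a small cache with a policy switch. This is an illustrative sketch only: `ForwardPolicy`, `GatewayCache`, the bounded `capacity` and the choice to drop the oldest entries first are all assumptions, not details given in the description.

```python
from enum import Enum


class ForwardPolicy(Enum):
    ON_DEMAND = "on_demand"  # data leaves the cache only when requested
    PERIODIC = "periodic"    # data is also flushed in regular time intervals
    NEVER = "never"          # data stays local and may be dropped


class GatewayCache:
    def __init__(self, policy, period_s=None, capacity=1000):
        self.policy = policy
        self.period_s = period_s
        self.capacity = capacity
        self.buffer = []
        self.last_flush = 0.0

    def store(self, timestamp, value):
        """Cache one collected data point, dropping the oldest when full (assumed)."""
        self.buffer.append((timestamp, value))
        if len(self.buffer) > self.capacity:
            del self.buffer[0]

    def _drain(self):
        data, self.buffer = self.buffer, []
        return data

    def tick(self, now):
        """Return the data to transmit at time `now`; non-empty only for PERIODIC."""
        if self.policy is ForwardPolicy.PERIODIC and now - self.last_flush >= self.period_s:
            self.last_flush = now
            return self._drain()
        return []

    def request(self):
        """Explicit request from the backend; refused under NEVER."""
        if self.policy is ForwardPolicy.NEVER:
            return []
        return self._drain()
```

In this sketch the flush period would be chosen much larger than the capture interval, in line with the text above.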
The above mentioned entities in the gateway GW and the backend system BS are in the following described in more detail:
The Event Detector ED triggers an analysis of incoming time series, for example upon fulfillment of a custom condition. For triggering this, the Event Detector ED has a mechanism or procedure which determines when and how frequently the data is being re-examined in order to update the reconstructability table RT. This mechanism/procedure can be for example:
When the Calibrator C is triggered by the Event Detector ED (Step S1.1), then the following sub-steps are preferably performed:
Upon expiration of the time period T1, the Calibrator C uses the data collected during T1 to compute the reconstructability table RT.
Once the reconstructability table RT has been computed, two options may be performed:
Therefore, the “network-relieving-mode” has preferably three sub-modes:
In this case the gateway GW can be preferably operating either in the “energy-saving-mode” or in the “network-relieving-mode”.
Step S2.1 is preferably never interrupted, but it is dependent on the reconstructability table RT and on further system configuration settings, which can be modified when a new iteration of the entire Phase P1 takes place, triggered in Step S2.2.
Then, the reconstructability table RT is computed as follows: For each triple (t, r, v) where t ∈ TS, r ∈ RM, and v ∈ V1 ∪ V2 ∪ . . . ∪ VY, i.e., for each combination of a time series with a data reduction procedure and a value of this data reduction procedure, the reconstructability degree ρ of the triple is measured. The computation of ρ can be based, for example, on the Euclidean distance between the vector of the original data and the vector of the reconstructed data. Similarly, the reconstruction might be performed with linear interpolation or any similar method. ρ is calculated as the degree to which the data of time series t that was collected during period T1 can be reconstructed after it has been reduced with method r using the value v.
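The computation over all triples (t, r, v) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the "keep every k-th point" reduction, and in particular the mapping from Euclidean distance to a degree in [0, 1] (normalising by the norm of the original data) are chosen for the example and are not prescribed by the description. It requires Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.

```python
import math


def keep_every(values, k):
    """Reduction procedure with setting v = k: keep (index, value) of every k-th point."""
    return [(i, values[i]) for i in range(0, len(values), k)]


def interpolate(samples, n):
    """Reconstruct n points by linear interpolation between the kept samples.

    Points after the last kept sample repeat its value.
    """
    rebuilt, j = [], 0
    for i in range(n):
        while j + 1 < len(samples) and samples[j + 1][0] <= i:
            j += 1
        x0, y0 = samples[j]
        if j + 1 < len(samples):
            x1, y1 = samples[j + 1]
            rebuilt.append(y0 + (y1 - y0) * (i - x0) / (x1 - x0))
        else:
            rebuilt.append(y0)
    return rebuilt


def rho(original, rebuilt):
    """Reconstructability degree in [0, 1] from the Euclidean distance (assumed scaling)."""
    scale = math.hypot(*original) or 1.0
    return max(0.0, 1.0 - math.dist(original, rebuilt) / scale)


def build_table(series_by_name, procedures):
    """procedures maps a name r to (reduce_fn, reconstruct_fn, list of settings v)."""
    table = {}
    for t, values in series_by_name.items():
        for r, (reduce_fn, rebuild_fn, settings) in procedures.items():
            for v in settings:
                reduced = reduce_fn(values, v)
                table[(t, r, v)] = rho(values, rebuild_fn(reduced, len(values)))
    return table
```

A smooth series keeps a high degree even under aggressive sampling, whereas a rapidly oscillating series loses most of its information when every second point is dropped.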
Further two reduction procedures will be used:
Thus, the middle row of graphs of
Now, the Calibrator C:
In this example, it is assumed that the computed reconstructability degrees for 1:2-dimensionality-reduction and 1:4-dimensionality-reduction were 95% and 55%, respectively, while the reconstructability degrees for f(x) and g(x) were 80% and 70%, respectively:
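The degrees assumed in this example can be used to illustrate the forwarding decision. The sketch below is hypothetical: the function name `choose_forwarding` and the rule of ranking the qualifying procedures by their degree are assumptions; an actual deployment might instead rank by the achieved data reduction.

```python
def choose_forwarding(degrees, desired):
    """Return the reduction procedures whose degree meets `desired`, best first.

    `degrees` maps a procedure label to its reconstructability degree in percent.
    An empty result means that no reduction qualifies, so the non-reduced
    data would be forwarded.
    """
    qualifying = [(deg, label) for label, deg in degrees.items() if deg >= desired]
    return [label for deg, label in sorted(qualifying, reverse=True)]


# Degrees taken from the example in the text.
example_degrees = {"1:2": 95, "1:4": 55, "f(x)": 80, "g(x)": 70}
```

With a desired level of 75%, only the 1:2 dimensionality reduction and f(x) qualify in this sketch; with a desired level of 99%, none of the four procedures qualifies.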
In summary the present invention enables determining which data to forward, which to filter and which to cache based on the reconstructability of time series data points. Further the present invention enables using time series compression procedures or techniques before the time series are actually collected, based on frontend samples. The present invention further enables applying a phase-change procedure based on an analysis of data streams, comprising a calibration phase and an operation phase, and triggering the phase change by specific events captured with the local data analytics.
The present invention preferably provides a method for filtering and forwarding of time series data in an internet-of-things environment based on data-reconstructability metrics comprising the steps of:
Embodiments of the present invention may have inter alia the following advantages: Embodiments of the present invention may enhance the efficiency in terms of storage, bandwidth and average data quality, preferably in an internet-of-things system, while simultaneously maintaining the desired level for the reconstructability of the original data from the subset that has been stored in a data center.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below.
The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.
This application is a U.S. National Stage Application under 35 U.S.C. §371 of International Application No. PCT/EP2014/076899 filed on Dec. 8, 2014. The International Application was published in English on Jun. 16, 2016 as WO 2016/091278 A1 under PCT Article 21(2).
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2014/076899 | 12/8/2014 | WO | 00 |