This application claims the benefit of priority of European Patent Application No. 18382443.2 filed Jun. 20, 2018, the contents of which are all incorporated by reference as if fully set forth herein in their entirety.
The present invention has its application within the Information and Communications Technologies (ICT) sector and, more specifically, relates to the deployment of prediction models that can be adjusted dynamically to the variations and evolution of data over time.
More particularly, the present invention refers to a method and system for optimizing event prediction in data systems in order to minimize the amount of data to be transmitted and maximize prediction accuracy.
In many network and data center infrastructures, periodic data monitoring and data processing are done in different, separate places. Therefore, non-negligible amounts of information have to be transmitted from the place where data is gathered to the location where this data is later processed. Even in a data center with optical links, the frequency and size of the data can generate prohibitive bandwidth requirements during data transmission, or at least a highly inefficient consumption of network resources.
Nowadays, there exist many efficient solutions for predicting, encoding and transmitting data and in particular multimedia signals (e.g. images and voice). Some references found in the prior art are the following:
U.S. Pat. No. 4,953,024 belongs to the family of mechanisms for efficiently encoding and transmitting images, aimed at reducing the amount of information to be transmitted for pictures. The reduction is achieved in an encoding circuit placed at the output of a predictor component. No mention is made of the efficiency or accuracy of the predictor mechanism, so a generic predictor is assumed and the effectiveness of the solution relies exclusively on the encoding phase. In particular, with respect to other adaptive quantization methods, this solution emphasizes reducing the quantization errors produced by means of optimally dimensioned variable-length encoders.
U.S. Pat. No. 5,907,351 also belongs to the family of mechanisms for efficiently encoding and transmitting images; it transmits and remotely displays the audio and visual portions of a person speaking so that the audio and visual signals can be synchronized. In this approach, the audio signal is constantly transmitted to the receiver and is also used to create or encode a predicted image of the lips of the person speaking in the image. This technique is referred to as cross-modal or bi-modal encoding. The predictor module of this solution tries to predict lip movements in order to avoid transmitting them, using the previously spoken phonemes as input.
Prior-art solutions do not address the dynamic adaptation of predictors to the evolution of data. Additionally, should several signals be monitored in the same machine, current predictors tend to consider each variable as an isolated signal and so potential correlations emerging among locally generated variables are not exploited.
Therefore, given that a fixed data distribution cannot be assumed for the signals monitored in a data center, there is a need in the state of the art to design data systems with predictors that can detect data evolution and dynamically adapt predictions to new data patterns and distributions without manual intervention.
The present invention solves the aforementioned problems and overcomes the previously explained state-of-the-art limitations by providing a method and system to optimize prediction components in data systems. This invention proposes to minimize the amount of data to be transmitted in data systems by deploying prediction components that can be adjusted dynamically to the variations and evolution of data over time. Both the source (where data is collected) and the destination (where data is processed) execute identical prediction models. If the prediction obtained at an instant of time at the source is similar to the monitored data, no data are sent from the source to the destination, as these data will be regenerated at the destination using its prediction module. Otherwise, only the difference between the predicted data and the monitored data is sent from the source to the destination. When data evolve, the predictor can dynamically adjust its internal parameters in order to maximize prediction accuracy.
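The following minimal Python sketch illustrates the source-side decision rule just described; it is purely illustrative and not part of the claimed embodiment, and it uses a naive last-value predictor as a placeholder for the actual prediction model.

```python
# Purely illustrative sketch of the source-side rule: transmit nothing when the
# local prediction matches the monitored value, otherwise send only the difference.

def source_step(predict, history, v, threshold):
    """predict: callable mapping past values to a forecast of the next one."""
    p = predict(history)                # local prediction, identical at both ends
    history.append(v)                   # keep the observed value for later predictions
    if abs(v - p) <= threshold:
        return None                     # prediction is good enough: send nothing
    return v - p                        # otherwise only the coded difference is sent


# Toy usage with a naive "last value" predictor standing in for the model:
history = [20.0, 20.1]
naive = lambda h: h[-1]
print(source_step(naive, history, 20.1, threshold=0.05))   # None: nothing is sent
print(source_step(naive, history, 22.3, threshold=0.05))   # ~2.2: difference is sent
```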
This invention can be leveraged by the application of the recently emerged deep neural network architectures, in particular convolutional neural networks (CNNs), which clearly outperform traditional time-series forecasting models when trained with large amounts of data examples.
A first aspect of the present invention refers to a method for optimizing event prediction in data systems, wherein at least one source of a data system collects periodically, at different time instants, real data values of at least one variable, the collected values being used to generate a stream of data modeled as a time series of values and each value of the time series corresponding to a time instant, and wherein the stream of data is obtained by a destination of the data system, the method comprising the following steps:
A second aspect of the present invention refers to a system configured to implement the optimization method described before by comprising the following components:
The method and system in accordance with the above described aspects of the invention have a number of advantages with respect to the aforementioned prior art, which can be summarized as follows:
These and other advantages will be apparent in the light of the detailed description of the invention.
For the purpose of aiding the understanding of the characteristics of the invention, according to a preferred practical embodiment thereof and in order to complement this description, the following Figures are attached as an integral part thereof, having an illustrative and non-limiting character:
The matters defined in this detailed description are provided to assist in a comprehensive understanding of the invention. Accordingly, those of ordinary skill in the art will recognize that variations, changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and elements are omitted for clarity and conciseness.
Of course, the embodiments of the invention can be implemented in a variety of architectural platforms, operating and server systems, devices, systems, or applications. Any particular architectural layout or implementation presented herein is provided for purposes of illustration and comprehension only and is not intended to limit aspects of the invention.
The proposed method dynamically optimizes the accuracy of the predictor apparatus when this accuracy starts to decrease due to variations in the statistical distribution of the input time series. The dynamic optimization of the predictor is done by means of a process that adjusts its internal parameters without manual intervention, using as input the last ‘p’ observed values of the time series variable. This process can be implemented by iteratively optimizing an error function that measures the committed error 320 as the distance between the obtained prediction and the expected result. It should be noted that the optimization of the predictor apparatus is only triggered when the last ‘k’ errors become greater than a second predefined threshold ‘th’, which is greater than the previously mentioned first threshold. The main advantage of keeping the predictor apparatus dynamically optimized on evolving data is that the amount of data to be transmitted to the destination 200 is minimized. A complementary advantage of this invention is that it is agnostic with respect to the encoding scheme to be used. Therefore, many of the existing methods for efficiently encoding time series differences can be utilized at the output of the predictor apparatus. Furthermore, the grouped processing of time series variables generated in the same place (e.g. CPU load, network input/output activity and mainboard temperature in a physical machine) raises the prediction accuracy for each variable by taking advantage of the potentially hidden correlations that can exist among the aggregated variables and signals.
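A minimal sketch of the triggering condition described above (the last ‘k’ errors all exceeding the second threshold ‘th’) is shown below; the data structure and names are assumptions made only for illustration.

```python
from collections import deque

# Illustrative trigger: re-optimisation of the predictor starts only when the
# last k prediction errors all exceed the second threshold th.

def should_retrain(recent_errors: deque, k: int, th: float) -> bool:
    """recent_errors holds the most recent prediction errors, newest last."""
    if len(recent_errors) < k:
        return False
    return all(e > th for e in list(recent_errors)[-k:])

errors = deque(maxlen=50)
for e in [0.1, 0.2, 1.4, 1.6, 1.9]:
    errors.append(e)
print(should_retrain(errors, k=3, th=1.0))   # True: the last three errors exceed th
```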
The source 100 starts to collect 101 data values of a variable (e.g. CPU temperature) at periodic time intervals and the collected data are transferred 104 to the first forecast module 120. At each time instant “t” at which a data value “v” is collected, the generated prediction model M1, M2, M3, . . . , Mx, received 102 and used by the first forecast module 120, generates a prediction “p” of this value at instant “t” using as input the “r” consecutive previous values of the variable. As long as the prediction “p” is equal to the real value “v”, the source 100 does not send any data to the destination 200. Conversely, if the values “p” and “v” differ by more than a predefined threshold, this difference “d” is coded and sent 105 to the destination 200. Additionally, when the source 100 detects that the accuracy of the current prediction model Mi has decreased (i.e., its errors exceed the second threshold), a readjusting process is triggered in order to generate a new model Mi+1 that increases the accuracy of the predictions (i.e. decreases the difference between p and v). This allows a dynamic adaptability of predictions when data evolve and hence the number of bytes to be encoded is kept near the minimum theoretical error. This process does not happen for each prediction, because it would require a non-negligible amount of computational resources (to readjust the model) and bandwidth consumption (to transfer the model to the server), but is carried out in a programmed way only when the errors exceed the second threshold.
At each time interval “t”, the destination 200 obtains 201 the prediction “p” from the second forecast module 210. In case the destination 200 receives 202 a difference “d” from the source 100, a correction module 220 combines the “d” and “p” values to obtain 203 the real value “v” as output, by executing the inverse of the operation that the source 100 computed when obtaining “d” from “v” and “p” as input, e.g. d=v−p. Otherwise, if the destination 200 does not receive any difference from the source 100, the correction module 220 uses its own local prediction “p” from the second forecast module 210 to obtain 203 the real value “v” as output, i.e. v=p, where “v” is the value monitored at the source 100.
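The destination-side reconstruction can be illustrated with the following sketch, again using a naive last-value predictor as a stand-in for the second forecast module 210; it merely shows the inverse operation v = p + d and the fallback v = p, and is not the literal embodiment.

```python
# Illustrative sketch of the destination-side reconstruction: with d = v - p
# computed at the source, the inverse operation is v = p + d; when no difference
# arrives, the local prediction itself is taken as the value.

def destination_step(predict, history, d=None):
    p = predict(history)            # same model, same inputs -> same prediction p
    v = p if d is None else p + d   # inverse of d = v - p
    history.append(v)               # keep the reconstructed value for later predictions
    return v

history = [20.0, 20.1]
naive = lambda h: h[-1]
print(destination_step(naive, history))        # 20.1: no difference received, v = p
print(destination_step(naive, history, 2.2))   # 22.3: prediction corrected with d
```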
A predictor model could enhance the accuracy of its predictions by adding to its input other (additional) variables that might exhibit complex correlations with the variable to be predicted. Additional variables may be obtained from outside the source 100 within the data system. The process of adjusting the model parameters is more complex when a set of variables is used as input, but the final accuracy benefits from this aggregation. The exogenous variables can be input to the predictor as time series or as simple variables. For example, when predicting a percentage value of the CPU load variable, the predictor model can consider as input not only a time series of CPU percentage values but also other time series such as RAM memory usage and network I/O (number of packets transmitted and received). It is expected that the three variables (CPU percentage, RAM memory usage and network I/O) can affect each other, and these complex correlations can be exploited by using three time series (ts1: CPU percentage, ts2: memory percentage, and ts3: network I/O) as input to the predictor model instead of the single time series ts1. A further example is attack prediction in cybersecurity scenarios, wherein generating the predicted value 310 of a variable whose anomalous behavior is to be detected may use additional variables obtained from outside the (vulnerable) system which is susceptible to cyberattack; e.g., external variables to be input into the predictor may be obtained from the logs of a web server or from a firewall external to the vulnerable system. Generally speaking, an “extended telemetry” may be applied by using as input to the predictor model all the variables/data that are likely to have some influence on the variable to be predicted.
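As a purely illustrative sketch of such a multivariate input, assuming the three example series above and a window of “r” past samples, the variables may be stacked as channels of a single input array (the shapes and names are assumptions, not part of the claimed method):

```python
import numpy as np

# The three locally monitored time series are stacked as channels of one input
# window, so the predictor can exploit correlations among them.

def build_input_window(cpu, mem, net, r):
    """Return the last r samples of each series as a (3, r) array."""
    return np.stack([cpu[-r:], mem[-r:], net[-r:]])   # channels x time steps

cpu = np.random.rand(100) * 100     # ts1: CPU load (%)
mem = np.random.rand(100) * 100     # ts2: RAM usage (%)
net = np.random.rand(100) * 1e4     # ts3: packets transmitted/received
window = build_input_window(cpu, mem, net, r=16)
print(window.shape)                 # (3, 16): multichannel input to the predictor
```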
Existing variable-length code schemes can be applied to efficiently send 105 the difference “d” from a client or source 100 to the server or destination 200. In addition, a plurality of variables can be locally monitored at the source 100 and sent together to the destination 200 using this scheme.
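One example of an existing variable-length code that could fill this role is a zigzag mapping followed by a LEB128-style varint, sketched below for integer differences; it is mentioned only as one possibility, since the invention is agnostic with respect to the particular encoding scheme.

```python
# Example of an existing variable-length code (zigzag + varint), shown only to
# illustrate that small differences cost few bytes.

def zigzag(n: int) -> int:
    return (n << 1) ^ (n >> 63)          # signed -> unsigned, small |n| stays small

def varint(u: int) -> bytes:
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        out.append(b | (0x80 if u else 0))   # continuation bit while more groups remain
        if not u:
            return bytes(out)

def encode_difference(d: int) -> bytes:
    return varint(zigzag(d))

print(len(encode_difference(3)))        # 1 byte for a small difference
print(len(encode_difference(-100000)))  # 3 bytes for a large one
```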
In order to achieve the dynamic predictor adjustment, the predictor accuracy is continuously measured at the source 100 by computing the error distance between the collected data values and the predicted ones. When the errors become greater than the second predefined threshold value for a sustained period of time, an iterative optimization process is triggered to adjust the predictor model parameters. This iterative process fits these parameters using as input the last observed values of the time series, including the ones that are producing the inaccurate predictions. The cost function to optimize at each iteration is computed as the mean of the distances between each real value and its predicted value. At each iteration the slope of the cost function is computed for each model parameter, and each parameter is then updated at the end of the iteration by subtracting from it a percentage of the computed slope. The rationale of this update is to obtain a new parameter value that yields a lower cost. When a zero value is obtained for the slope, that parameter is producing a minimum of the cost function and therefore there is no room for further improvement of the parameter. The selected percentage value applied to each slope value modulates the convergence speed of the process. The iterative process ends when a fixed number of iterations has been reached or when the cost value obtained at the end of the current iteration does not improve with respect to the previous iteration.
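The following sketch shows one possible realization of this iterative adjustment, with a simple linear autoregressive predictor standing in for the model parameters (an assumption for illustration only, not the claimed predictor): the cost is the mean of squared distances, each parameter is updated by subtracting a fraction of the slope, and the loop stops after a fixed iteration budget or when the cost no longer improves.

```python
import numpy as np

# Illustrative iterative adjustment of predictor parameters on the last observed
# values of the time series, following the update rule described above.

def readjust(params, window, p, lr=0.01, max_iters=200):
    """window: last observed values; p: number of past values per prediction."""
    X = np.array([window[i:i + p] for i in range(len(window) - p)])
    y = np.array(window[p:])
    prev_cost = np.inf
    for _ in range(max_iters):
        pred = X @ params
        cost = np.mean((pred - y) ** 2)           # mean of squared distances
        if cost >= prev_cost:                     # no improvement: stop iterating
            break
        prev_cost = cost
        slope = 2 * X.T @ (pred - y) / len(y)     # slope of the cost per parameter
        params = params - lr * slope              # subtract a percentage of the slope
    return params

window = list(np.sin(np.linspace(0, 20, 120)))    # toy slowly evolving signal
new_params = readjust(np.zeros(4), window, p=4)
print(new_params)
```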
Given that the signal values to be predicted are likely to contain complex non-linear dependencies between present and past events, traditional time series techniques (e.g. the autoregressive integrated moving average model, ARIMA, or generalized autoregressive conditional heteroskedasticity, GARCH) do not obtain accurate predictions. Even traditional machine learning (ML) models (e.g. support vector regression, SVR, or Random Forest) are not likely to benefit from such dependencies. However, deep neural networks, and in particular convolutional neural networks, have the ability to model non-linear relationships in input data, and in particular in time series. In the context of data center scenarios, convolutional neural networks are capable of accurately predicting complex time series (e.g. short-term forecasting of traffic load in data center core networks).
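As an illustration only, a small one-dimensional CNN forecaster of the kind alluded to above could be sketched as follows; the architecture, layer sizes and the use of PyTorch are assumptions made for the example and are not prescribed by this description.

```python
import torch
import torch.nn as nn

# Illustrative 1D CNN forecaster: maps a window of r past samples of several
# monitored variables to a prediction of the next value of one of them.

class CNNForecaster(nn.Module):
    def __init__(self, n_vars=3, r=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_vars, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * r, 1),        # single forecast value
        )

    def forward(self, x):                # x: (batch, n_vars, r)
        return self.net(x).squeeze(-1)

model = CNNForecaster()
window = torch.randn(1, 3, 16)           # e.g. CPU %, RAM %, network I/O channels
print(model(window).shape)               # torch.Size([1])
```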
Distributed versions of these optimization algorithms, which have recently emerged, can be used for this purpose. In addition, the same CNN model can be adjusted only once and shared by a set of clients monitoring data with the same statistical distribution. For example, when working with a farm of similar machines with the same manufactured hardware, the same software running and an equivalent computational load (quite common in cloud and data-center environments), a similar behaviour of the disk space, CPU load, CPU temperature, etc. is expected. Adjusting the model only once and exporting it to multiple clients can simplify the management and optimize the use of computational resources in similar machines.
Note that in this text, the term “comprises” and its derivations (such as “comprising”, etc.) should not be understood in an excluding sense, that is, these terms should not be interpreted as excluding the possibility that what is described and defined may include further elements, steps, etc.