The disclosure relates to analysis of time series data in general and more specifically to using neural networks for identification of anomalies in time series data.
Time-series data is generated and processed in several contexts. Examples of time series data include sensor data, data generated by instrumented software that monitors utilization of resources such as processing resources, memory resources, storage resources, and network resources, application usage data, and so on. Anomaly detection is typically performed to identify issues with systems that generate time series data. For example, anomalies in computing resource utilization may be an indication of a server failure that is likely to happen in the near future. Similarly, anomalies in network resource utilization may be an indication of a network failure that is likely to happen in the near future. Accurate and timely detection of anomalies in time series data allows such failures to be predicted in advance so that preventive actions can be taken.
Various techniques are used for anomaly detection, including clustering analysis, random forest techniques, and machine learning based models, for example, neural networks. Conventional neural network based techniques for anomaly detection require a large amount of training data and significant computing resources for training the neural network. Furthermore, if the characteristics of the time series data being analyzed differ from those of the time series data used for training the neural networks, the neural network based techniques have low accuracy.
The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
The Figures (FIGS.) and the following description describe certain embodiments by way of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures.
A system performs anomaly detection for time series data using machine learning based models, for example, neural networks. The system trains the neural network for a fixed number of iterations using data from a time window of the time series. The system uses the loss value at the end of the fixed number of iterations for identifying anomalies in the time series data. The loss value may represent a difference between the predicted data value and the actual data value of the time-series corresponding to the time value. For example, the system compares the loss value to a predetermined threshold value. The system uses the loss value to adjust parameters of the neural network, for example, using back propagation. The system also uses the loss values determined during the training phase to determine whether a data point of the time series represents an anomaly. If the loss value for a time value exceeds the threshold value, the system determines that the time value corresponds to an anomaly. In an embodiment, the anomaly is a point anomaly. The system then repeats the above steps for a new time interval, reinitializing the neural network for the new time interval.
Conventionally, a neural network is trained using a training dataset and the trained neural network is used at inference time to predict results. In contrast, the system according to various embodiments determines anomalies during the training phase rather than through use of a trained neural network for making predictions.
The system trains the neural network using data within a time window and detects anomalies for data points within the time window based on the loss value determined during the time window. Conventional systems train a neural network until convergence, for example, until the loss value reaches below a threshold value. In contrast, the system according to various embodiments trains the neural network for a fixed number of iterations. After the fixed number of iterations, the system compares the loss values for data points within the time window with a threshold value. The system identifies data points having loss value exceeding the threshold value as anomalous data points. The system repeats the process for subsequent time intervals.
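The training-time detection described above can be illustrated with a minimal sketch. The function name, the network shape (a one-hidden-layer perceptron with 16 tanh units), the fixed iteration count, the learning rate, and the seed are all illustrative assumptions, not part of the disclosure; the sketch trains for a fixed number of iterations on one time window and returns the per-point loss, which a caller compares to a threshold to flag point anomalies.

```python
import numpy as np

def window_point_losses(times, values, hidden=16, iterations=50,
                        lr=0.01, seed=0):
    """Train a small MLP on one time window for a fixed number of
    iterations and return the per-point squared loss after training.
    `times` and `values` are numpy arrays of equal length; all
    hyperparameter defaults are illustrative."""
    rng = np.random.default_rng(seed)
    # Reinitialize the network parameters with random values for this window.
    W1 = rng.normal(0.0, 1.0, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    x = ((times - times.mean()) / (times.std() + 1e-8)).reshape(-1, 1)
    y = values.astype(float)
    for _ in range(iterations):                   # fixed, not to convergence
        h = np.tanh(x @ W1 + b1)                  # forward pass
        pred = (h @ W2 + b2).ravel()
        g_pred = (2.0 * (pred - y) / len(y)).reshape(-1, 1)  # dMSE/dpred
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)    # back propagation
        W2 -= lr * (h.T @ g_pred); b2 -= lr * g_pred.sum(axis=0)
        W1 -= lr * (x.T @ g_h);    b1 -= lr * g_h.sum(axis=0)
    h = np.tanh(x @ W1 + b1)                      # final per-point loss
    pred = (h @ W2 + b2).ravel()
    return (pred - y) ** 2
```

A caller would flag indices where the returned loss exceeds a threshold value; a data point that the partially trained network cannot fit (for example, an isolated spike) retains a large loss and is identified as a point anomaly.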
When the process is repeated for the next time window, the system discards the neural network trained using data of the previous time window. Accordingly, for each time window the system reinitializes the neural network, for example, using random values. The system does not train the neural network for future use as a predictor at inference time; it simply runs the training process and uses the loss determined during training to identify point anomalies. After the anomalies are detected for a time window during the training phase of the neural network, the system discards the neural network and reinitializes it using random values for the next time interval.
Furthermore, the system performs training of the neural network for a fixed number of iterations. Conventionally, the training of a neural network is performed until some convergence criterion is met, for example, until a loss value is below a threshold indicating that convergence has been reached. The system does not attempt to reach convergence but instead uses the training dynamics to determine anomalies. Accordingly, the system does not aim to generate a fully trained neural network.
As a result, the process used for detecting anomalies in time series data is computationally efficient because the neural network is trained only for a few iterations and not until convergence. The accuracy of the disclosed techniques is at least as good as, and often better than, that of techniques that fully train the neural networks. Accordingly, the system achieves high accuracy with fewer computational resources. Therefore, the disclosed techniques improve the computational efficiency of the process of detecting anomalies in time series data and provide a technological advantage over conventional techniques.
The computing system 130 includes a time series processing module 140, a listener module 145, and an action module 160. The listener module 145 receives time series data 135 from one or more sources, for example, external systems 120. The time series processing module 140 performs anomaly detection on the time series data 135 to detect anomalies, for example point anomalies 155. The action module 160 takes an action based on the detected anomaly 155, for example, by sending an alert message to a user or taking an automated remedial action. In some embodiments, the computing system 130 itself may be the source of time series data.
Anomaly detection may be performed for system maintenance, for example, to detect system problems in advance. For example, anomalies in computing resource utilization may be an indication of a server failure that may happen in the near future. Similarly, anomalies in network resource utilization may be an indication of a network failure that may happen in the near future. Therefore, accurate and timely detection of anomalies is important for such time series data analysis.
The computing system 130 receives time series data 135 from sources, for example, from the external system 120. For example, the external system 120 includes computing resources 125 that generate time series data. Examples of computing resources include memory resources, processing resources, storage resources, network resources, and so on. The external system 120 may execute instrumented software that generates time series data representing resource usage of one or more resources. For example, the external system 120 may execute instrumented software that monitors network usage and reports metrics indicating network usage on a periodic basis. The reported data represents a time series 135 that is received by the computing system 130. The time series processing module 140 may detect anomalies 155 that represent potential issues with a computing resource, for example, a potential failure that is likely to occur. The action module 160 may take appropriate action responsive to detection of the anomaly 155, for example, by sending an alert to a system administrator or by taking an automatic remedial action, for example, by allocating additional computing resources for a task or process if the system determines that the anomaly 155 indicates a shortage of a particular computing resource allocated for the task or process. For example, the computing system 130 may determine that a point anomaly detected in a time series representing network usage indicates a lack of sufficient network resources for a communication channel; in response, the action module 160 may reallocate network resources to provide additional network bandwidth to the communication channel. As another example, the time series data may represent a number of pages swapped by a process, and the anomaly 155 may be caused by an increase in the number of pages swapped, indicating a shortage of storage resources. The action module 160, in response to detection of the anomaly 155, may allocate additional storage to the process.
Time series data 135 may be reported by other sources, for example, sensors that monitor real-world data and report it on a periodic basis, for example, temperature, pressure, weight, light intensity, and so on. For example, a sensor may monitor the temperature or pressure of an industrial process that performs a chemical reaction and report it on a periodic basis as time series data 135. The action module 160 may perform an action that controls the industrial process in response to detection of the anomaly 155, for example, by controlling the industrial process to adjust the rate of the chemical reaction.
The time series data 135 may represent user actions, for example, user interactions with an online system. For example, the computing system 130 may monitor user interactions with an online system to detect anomalies in the user interaction. The point anomaly may be an indication of a change in user behavior or an issue with the online system receiving the user interactions. The action module 160 may take appropriate action based on detection of a point anomaly 155, for example, by sending an alert message to a user. The alert message may provide a recommendation of an action that the user may take to adjust the online system parameters in response to the anomaly detection. For example, if the anomaly 155 is determined to be an indication of an increase in demand for a specific product, the online system may initiate an online campaign for the product to provide additional users with information describing the product.
The client devices 110 are computing devices such as smartphones with an operating system such as ANDROID® or APPLE® IOS®, tablet computers, laptop computers, desktop computers, electronic stereos in automobiles or other vehicles, or any other type of network-enabled device on which digital content may be listened to or otherwise experienced. Typical client devices 110 include the hardware and software needed to connect to the network 150 (e.g., via WiFi and/or 4G or other wireless telecommunication standards).
The network 150 provides a communication infrastructure between the client devices 110, external systems 120, and computing system 130. The network 150 is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, or a virtual private network. Portions of the network 150 may be provided by links using communications technologies including WiFi based on the IEEE 802.11 standard, the BLUETOOTH short range standard, and the Wireless Universal Serial Bus (USB) standard.
In an embodiment, the neural network 320 is a multi-layer perceptron.
The loss determination module 340 determines a loss value based on the predictions of a neural network being trained. The loss value represents a difference between the predicted data value and the corresponding known data value of the time-series corresponding to the time value. For example, if the time series data value is D1 for a time value T1, the neural network predicts a value D1′ and the loss determination module 340 determines the loss value based on a difference between D1′ and D1. The loss value may be determined using any of various possible metrics, for example, root mean square, mean absolute value, and so on.
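As an illustrative sketch of the loss computations named above (the predicted and actual values are made up for the example), the per-point squared error and the aggregate root-mean-square and mean-absolute-value metrics may be computed as:

```python
import numpy as np

# Hypothetical predicted values D1' and known values D1 for three time values.
predicted = np.array([1.0, 2.5, 3.0])
actual    = np.array([1.0, 2.0, 5.0])

per_point = (predicted - actual) ** 2          # squared error per time value
rmse = np.sqrt(per_point.mean())               # aggregate: root mean square
mae  = np.abs(predicted - actual).mean()       # aggregate: mean absolute value
```

The per-point values are what the anomaly detection module compares against the threshold; the aggregate values are what the training module minimizes when adjusting the network parameters.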
The training module 330 performs the training process of a neural network. The training module 330 initializes the neural network, for example, by setting the parameters of the neural network to random values. The training module 330 predicts values of the time series data using the neural network and determines the loss values by invoking the loss determination module 340. The training module 330 adjusts the parameters of the neural network based on the loss value, for example, using back propagation, to minimize the loss value.
The threshold determination module 350 determines the threshold value used for determining the point anomalies. The anomaly detection module 310 compares loss values of the neural network with the threshold value determined by the threshold determination module 350 to determine whether a point anomaly exists at a time value in the time series data. In an embodiment, the threshold determination module 350 adjusts the threshold value based on a comparison of the identified anomalies and known anomalies. For example, the point anomalies may be presented to a user to receive feedback describing whether the point anomalies were identified accurately.
If the anomaly detection module 310 receives feedback indicating that one or more known point anomalies were not detected by the anomaly detection module 310, the threshold value may be reduced so that point anomalies similar to the missed point anomalies can be identified subsequently. If the anomaly detection module 310 receives feedback indicating that one or more reported point anomalies were not actual point anomalies, the threshold value may be increased so that point anomalies similar to the spurious point anomalies previously identified are filtered out subsequently and are not detected. The threshold determination module 350 uses the adjusted threshold value for identifying anomalies for subsequent time windows.
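The feedback-driven adjustment above can be sketched as follows. The function name and the multiplicative adjustment factor are illustrative assumptions; the disclosure specifies only the direction of the adjustment (lower on missed anomalies, higher on spurious ones).

```python
def adjust_threshold(threshold, missed, spurious, step=0.1):
    """Adjust the anomaly threshold from user feedback.
    missed: feedback reported known anomalies that were not detected.
    spurious: feedback reported detected anomalies that were not real.
    `step` is an assumed multiplicative adjustment factor."""
    if missed:                       # false negatives: become more sensitive
        threshold *= (1.0 - step)
    if spurious:                     # false positives: become less sensitive
        threshold *= (1.0 + step)
    return threshold
```

The adjusted value is then used when identifying anomalies in subsequent time windows.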
The time series processing module 140 receives 610 a time-series comprising a sequence of data values. Each data value of the time series is associated with a time value. The time series processing module 140 processes different time windows of the time series data to determine point anomalies in each time window. A time window represents a range of time values. Accordingly, the time series processing module 140 trains a neural network based on data values of the time window and detects point anomalies within the time window based on the loss values determined during the training phase of the neural network. The time series processing module 140 repeats the following steps for each time window.
The time series processing module 140 identifies 620 a time window representing a range of time values. The time series processing module 140 initializes 630 the neural network for the time window. The time series processing module 140 trains the neural network for a predetermined number of iterations by repeating the following steps 660, 670, and 680. For each iteration, the time series processing module 140 repeats the steps 660 and 670 for time values within the time window. For a time value of the time window, the time series processing module 140 executes 660 the neural network to predict a data value for the time value. The time series processing module 140 determines 670 a loss value based on the predicted data value. After repeating the steps 660 and 670 for a set of time values of the time window, the time series processing module 140 determines an aggregate loss value across the set of time values.
The time series processing module 140 adjusts 680 parameters of the neural network based on the aggregate loss value. The steps of determining the aggregate loss value and adjusting the parameters of the neural network are repeated for each iteration. After the predetermined number of iterations, the time series processing module 140 identifies 690 point anomalies in the time window as follows. If a loss value corresponding to a particular time value within the time window exceeds a threshold, the time series processing module 140 identifies the corresponding data value as an anomaly. The computing system 130 may store information describing the data values identified as point anomalies. The action module 160 may take actions based on the identified point anomalies, for example, sending an alert message to a user describing a point anomaly, sending the information describing the point anomaly for displaying via a user interface, recommending a remedial action based on the point anomaly, or performing a remedial action based on the point anomaly.
The time series processing module 140 initializes the neural network for each time window and discards the neural network at the end of processing of the time window. The neural network may be initialized by setting the parameter values of the neural network to random values. Accordingly, the time series processing module 140 performs the steps of training the neural network but does not use the trained neural network for any processing. The time series processing module 140 uses the loss values determined during the training process to detect point anomalies in the time series data of the time window and then repeats the process for the next time window. Furthermore, the time series processing module 140 trains the neural network for a predetermined number of iterations rather than training the neural network until an aggregate loss value is below a threshold value. The predetermined number of iterations may be configurable by a user or set to a default value. The predetermined number of iterations is set to a value that is less than the number of iterations required to ensure that the aggregate loss value reaches below a threshold value. This ensures that the anomaly detection process is executed efficiently: the goal of the time series processing module 140 is not to generate a trained model that can be used at inference time for making predictions, but only to go through a partial training process so that the loss values observed during that partial training can be used to identify the point anomalies.
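The per-window orchestration above can be sketched as follows. The function names are illustrative; the key property shown is that the model used inside `window_losses` is freshly created per call and discarded afterward, so no trained model survives between windows.

```python
def detect_over_windows(values, window_size, window_losses, threshold):
    """Process the series one non-overlapping window at a time.
    `window_losses` trains a freshly initialized model on one window
    (for a fixed number of iterations) and returns one loss per point;
    the model is discarded when the call returns."""
    anomalous_indices = []
    for start in range(0, len(values), window_size):
        window = values[start:start + window_size]
        losses = window_losses(window)        # fresh model for each window
        for i, loss in enumerate(losses):
            if loss > threshold:              # training-time loss vs threshold
                anomalous_indices.append(start + i)
    return anomalous_indices
```

For illustration, any per-window loss function with the stated shape can be plugged in; for example, a stub that scores each point by its distance from the window mean flags isolated spikes in each window.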
Experimental data shows the improvement in performance obtained by using the techniques disclosed herein. The following table shows F1 scores obtained by executing various models on different datasets. The F1 score is calculated as F1=2*precision*recall/(precision+recall). Each column represents a particular dataset and each row represents a particular model. The first row represents the data for a system according to an embodiment as disclosed and the remaining rows represent other models that do not use the disclosed techniques: (1) WinStats (window statistics): a technique that uses statistics of data in the time series to determine which specific points are anomalous; (2) ISF (isolation forest): a technique based on a decision tree algorithm; (3) RRCF (robust random cut forest): a technique similar to isolation forest but modified to work on streaming data; and (4) Prophet: a regression model based approach.
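The F1 formula above can be expressed directly in terms of true positives, false positives, and false negatives; the function name and counts below are illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), where
    precision = tp / (tp + fp) and recall = tp / (tp + fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a detector that finds 8 of 10 true anomalies while raising 2 false alarms has precision 0.8 and recall 0.8, giving an F1 score of 0.8.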
As shown in the table above, the F1 scores of the system based on the disclosed techniques were either better than those of all the models tested or close to the best model, even though the system predicts the anomaly without requiring any retraining. For example, the “average” column at the end represents the average performance of each model across the various datasets and shows that the average performance of the system as disclosed is better than that of all the models that were studied. Accordingly, the disclosed system is computationally efficient, since it requires significantly fewer computing resources for training the model compared to other techniques while performing at least as well as, or better than, the other techniques.
The storage device 708 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 706 holds instructions and data used by the processor 702. The input interface 714 is a touch-screen interface, a mouse, track ball, or other type of pointing device, a keyboard, or some combination thereof, and is used to input data into the computer 700. In some embodiments, the computer 700 may be configured to receive input (e.g., commands) from the input interface 714 via gestures from the user. The graphics adapter 712 displays images and other information on the display 718. The network adapter 716 couples the computer 700 to one or more computer networks.
The computer 700 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 708, loaded into the memory 706, and executed by the processor 702.
The types of computers 700 used by the entities described herein can vary depending upon the embodiment and the processing power required by the entity.
It is to be understood that the Figures and descriptions of the disclosed invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for the purpose of clarity, many other elements found in a typical distributed system. Those of ordinary skill in the art may recognize that other elements and/or steps are desirable and/or required in implementing the embodiments. However, because such elements and steps are well known in the art, and because they do not facilitate a better understanding of the embodiments, a discussion of such elements and steps is not provided herein. The disclosure herein is directed to all such variations and modifications to such elements and methods known to those skilled in the art.
Some portions of above description describe the embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
In addition, use of “a” or “an” is employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for detecting anomalies in time series data through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.