As the world becomes increasingly complex, networks offer an abstract representation for organizing the relationships between entities of interest in distributed systems. The entities are represented as nodes, while edges connecting pairs of nodes represent the existence of relationships between the entities. In these distributed systems, a functional network that facilitates reliable and consistent flow of entities through the edges is necessary for the distributed system to achieve its objectives. The building blocks of a distributed system can deteriorate non-uniformly over time, leading to occasional anomalous behavior in certain parts of the system.
Anomalies in a distributed system can disrupt normal operations and prevent the distributed system from meeting its objectives in a timely manner. Some anomalies are critical anomalies, which lead to catastrophic failures and cause major disruptions to a distributed system. Accordingly, critical anomalies are highly noticeable to the stakeholders of the distributed system and are thus quickly identified and localized so that corrections can restore the functions of the distributed system. Other anomalies are non-critical anomalies, which may result in a lower than optimal efficiency of a distributed system. Since the distributed system can continue to function without corrections to these non-critical anomalies, they are often ignored and their locations within the distributed system remain unknown. However, non-critical anomalies that are ignored and not corrected could worsen over time into critical anomalies and cause catastrophic failures of the distributed system at an unforeseeable time in the future.
Accordingly, the systems and methods described herein aim to recognize the non-critical anomalies in a distributed system. An example anomaly detection system can include a non-transitory memory to store machine readable instructions and a processing resource (e.g., one or more processor cores) to execute the machine readable instructions. A receiver can receive network data. A statistical model component can employ a statistical model of the network based on the network data to determine a statistical deviation of a flow. A statistically deviated flow component can discover a number of statistically deviated flows connected to the flow. An output can specify a location and a strength of an anomaly in the distributed system.
As an example, the anomaly detection system 10 can detect an anomaly in the distributed system 28 in a non-intrusive manner. A network 18 can connect the distributed system 28 and the anomaly detection system 10. The network 18 can include wired connections and/or wireless connections. In some examples, the anomaly detection system 10 can be part of the distributed system 28. In other examples, the anomaly detection system 10 can be external to the distributed system 28. For example, the anomaly detection system 10 can be executed by a server or other computing device.
The anomaly detection system 10 can include a non-transitory memory 12 to store machine-executable instructions. Examples of the non-transitory memory 12 can include volatile memory (e.g., RAM), nonvolatile memory (e.g., a hard disk, a flash memory, a solid state drive, or the like), or a combination of both. The anomaly detection system 10 can include a processing unit 14 (e.g., one or more processing cores) to access the non-transitory memory 12 and execute the machine-executable instructions to implement functions of the anomaly detection system 10 (e.g., to detect an anomaly in the distributed system 28). In some examples, the anomaly detection system 10 can also include a display 16 (e.g., a monitor, a screen, a graphical user interface, speakers, etc.) that can illustrate the anomaly in the distributed system 28 in a user-perceivable manner. In some examples, although not illustrated, the anomaly detection system 10 can also include a user interface that can include a user input device (e.g., keyboard, mouse, microphone, etc.). The anomaly detection system 10 can be coupled to the network 18 to exchange data with the distributed system 28 via a transceiver (Tx/Rx) (not illustrated). In some examples, the transceiver can send a request for information to one or more components of the distributed system 28 and/or an external component coupled to the network including information for nodes of interest in the distributed system 28 for further processing by the anomaly detection system 10. The transceiver can receive the information over the network 18. In some instances, the information can include the information for the nodes of interest in the distributed system 28.
The anomaly detection system 10 can include a receiver 20 to receive network data related to the distributed system 28. The network data can include the information received over the network 18 requested by the transceiver. For example, the receiver 20 can perform preprocessing of the information received over the network 18. In some examples, the network data can include source points and end points of a plurality of flows in the distributed system 28. In other examples, the network data can include times associated with a portion of the flows.
The anomaly detection system 10 can also include a statistical model component 22 that can employ a statistical model of the network based on the network data. For example, the statistical model component 22 can determine a statistical deviation of a flow of the plurality of flows. The statistical model component 22 can apply a statistical model that can use all the available information in the data, together with assumptions from domain and contextual knowledge of the flow, to infer the information that is missing during the flow. Using the statistical model, the information that should be observed at the destination of the flow can be estimated in terms of its mean and variance. Comparing the observation with the estimation (mean and variance) can reveal whether a flow is statistically deviated or not.
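As a non-limiting illustration, the comparison of an observation with an estimated mean and variance can be sketched as a standardized deviation score. The names FlowEstimate, deviation_score, and is_statistically_deviated are hypothetical, and both the two-sided test and the default cut-off of two standard deviations are assumptions made for the sketch rather than features of the statistical model itself.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowEstimate:
    """Hypothetical container for what the model expects at the destination."""
    mean: float      # estimated mean of the observed quantity (e.g., a time)
    variance: float  # estimated variance of that quantity


def deviation_score(observed: float, estimate: FlowEstimate) -> float:
    """Return how many standard deviations the observation lies from the mean."""
    std = math.sqrt(estimate.variance)
    if std == 0.0:
        # Degenerate estimate: any difference is treated as an infinite deviation.
        if observed == estimate.mean:
            return 0.0
        return math.inf if observed > estimate.mean else -math.inf
    return (observed - estimate.mean) / std


def is_statistically_deviated(observed: float, estimate: FlowEstimate,
                              cutoff: float = 2.0) -> bool:
    """Assumed rule: flag a flow whose absolute score exceeds the cut-off."""
    return abs(deviation_score(observed, estimate)) > cutoff
```

With an estimated mean of 10 and variance of 4, an observation of 15 lies 2.5 standard deviations from the mean and would be flagged under the assumed cut-off, while an observation of 14 would not.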
The anomaly detection system 10 can also include a statistically deviated flow component 24 that can, for each flow in the data, discover a number of statistically deviated flows from the plurality of flows connected to the flow. The determination can be based on a time and a location related to each statistically deviated flow. The statistically deviated flow component 24 can address the insufficiency of statistical deviations as sole indicators of anomalies by finding relations between flows (e.g., by examining flows connected to a flow). In other words, in addition to the statistical deviation of each flow, a number of statistically deviated flows connected to the flow can be derived for each flow. The derivation depends on the context and nature of the distributed system. For example, the relation can be defined in terms of the time and the physical location of the flow. The indication of whether the flow is an anomaly can be positively correlated with the number of statistically deviated flows that are related to the flow. Using the end (source and destination) points of an anomalous flow, the physical location of the anomaly within the distributed system can be isolated.
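As a non-limiting illustration, counting the statistically deviated flows related to a given flow can be sketched as follows. Because the relation depends on the context and nature of the distributed system, the rule used here (two flows are related when they share an end point and their time intervals overlap) is only an assumption, and the names Flow, are_related, and related_deviated_count are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record with end points, a time interval, and a flag."""
    source: str
    destination: str
    start: float
    end: float
    deviated: bool  # result of the statistical deviation test for this flow


def are_related(a: Flow, b: Flow) -> bool:
    """Assumed relation: a shared end point (location) and overlapping times."""
    shares_endpoint = bool({a.source, a.destination} & {b.source, b.destination})
    overlaps_in_time = a.start <= b.end and b.start <= a.end
    return shares_endpoint and overlaps_in_time


def related_deviated_count(flow: Flow, flows: Sequence[Flow]) -> int:
    """Count the other flows that are statistically deviated and related to `flow`."""
    return sum(
        1
        for other in flows
        if other is not flow and other.deviated and are_related(flow, other)
    )
```

Under this assumed relation, a larger count for a flow gives a stronger indication that the flow is an anomaly rather than an isolated statistical deviation.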
The anomaly detection system 10 can also include an output 26 that can output the location and strength of the anomalies in the distributed system. For example, the strength of an anomaly can be a quantification (e.g., a number of standard deviations from a mean) of the amount of disruption caused to the network by the anomaly relative to the other anomalies. As one example, the output can include a plurality of flows with the associated location and strength of the anomaly for each of the plurality of flows. As another example, the output can include a single flow with the associated location and strength of the anomaly for the flow. In either example, the output can be displayed (e.g., by display 16 or on another computing device) so that further actions can be undertaken.
The anomaly detection system 10 of
For example, the statistical model component 22 of the anomaly detection system of
Building on the network transmission model, the statistically deviated flow component 24 of anomaly detection system 10 of
For example, an edge-based network transmission model can be used to infer the flow speeds of the edges within the networks of distributed systems. With the model, the expected time necessary for an entity to complete its flow can be determined. The localization algorithm can be applied to measure the relationship of each record to all other records with large deviations. For example, a record can be deemed anomalous by comparing the difference between the observed time and the expected time with the standard deviation (e.g., measuring the degree of deviation). In some examples, a value (e.g., one or more standard deviations) may be selected as a cut-off to determine whether the path has a significantly larger observed time than expected. The number of related records can allow the exact path taken by the entity flow to be known or easily inferred.
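As a non-limiting illustration, the expected time for an entity to complete its flow can be sketched by accumulating per-edge statistics along the path. The sketch assumes that each edge already has an inferred mean and variance of traversal time and that edge times are independent, so that means and variances add along the path; neither assumption is stated above. The names EdgeStats, expected_path_time, and exceeds_cutoff, and the default cut-off of two standard deviations, are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical mapping: (from_node, to_node) -> (mean time, variance of time)
EdgeStats = Dict[Tuple[str, str], Tuple[float, float]]


def expected_path_time(path: Sequence[str],
                       edge_stats: EdgeStats) -> Tuple[float, float]:
    """Sum per-edge means and variances along the path (assumes independence)."""
    mean = 0.0
    variance = 0.0
    for edge in zip(path, path[1:]):
        edge_mean, edge_variance = edge_stats[edge]
        mean += edge_mean
        variance += edge_variance
    return mean, variance


def exceeds_cutoff(observed_time: float, path: Sequence[str],
                   edge_stats: EdgeStats, cutoff: float = 2.0) -> bool:
    """Flag a record whose observed time is more than `cutoff` standard
    deviations larger than the expected time for its path."""
    mean, variance = expected_path_time(path, edge_stats)
    return observed_time - mean > cutoff * math.sqrt(variance)
```

For a path A, B, C with edge statistics (5, 1) and (7, 3), the expected time is 12 with variance 4, so an observed time of 17 exceeds the assumed cut-off while an observed time of 15 does not.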
In view of the foregoing structural and functional features described above, example methods will be better appreciated with reference to
The method 70 can include two phases. The first phase, at 72, can include applying a statistical model (e.g., by statistical model component 22). The statistical model can use all the available information in the data, together with assumptions from domain and contextual knowledge of the flow, to infer the information that is missing during the flow. Using the statistical model, the information that should be observed at the destination of the flow can be estimated in terms of its mean and variance, for example. Comparing the actual observation with the estimation (mean and variance) can reveal whether a flow is statistically deviated or not.
The second phase, at 74, can address (e.g., by statistically deviated flow component 24) the insufficiency of statistical deviations as sole indicators of anomalies by finding relations between flows (e.g., by examining flows connected to a flow). In other words, in addition to the statistical deviation of each flow, a number of statistically deviated flows connected to the flow can be derived for each flow. The derivation depends on the context and nature of the distributed system. For example, the relation can be defined in terms of the time and the physical location of the flow. The indication of whether the flow is an anomaly can be positively correlated with the number of statistically deviated flows that are related to the flow. Using the end (source and destination) points of an anomalous flow, the physical location of the anomaly within the distributed system can be isolated. At 76, information about the physical location of the anomaly and/or the strength of the anomaly may be output (e.g., by output 26).
At 92, the information that should be observed at the destination of the flow can be estimated. At 94, the actual information that was observed at the destination of the flow can be determined. The information that is missing during the flow can be inferred using the statistical model. For example, the statistical model can use all the available information in the data, together with assumptions from domain and contextual knowledge of the flow, to infer the missing information. For example, the inference can be expressed in terms of a mean and a variance. At 96, whether the flow is statistically deviated can be determined. For example, the observation can be compared to the estimated mean and variance to determine whether a flow is statistically deviated or not.
At 102, a number of statistically deviated flows related to a flow can be determined (e.g., by statistically deviated flow component 24). For example, a plurality of flows connected to a flow can be examined. In other words, in addition to the statistical deviation of each flow, for each flow a number of statistically deviated flows connected to the flow can be derived. The derivation depends on the context and nature of the distributed system. In some examples, a time and a location related to each statistically deviated flow can be determined. For example, the relation can be defined in terms of the time and the physical location of the flow.
At 104, an indication of whether the flow is an anomaly can be obtained (e.g., by statistically deviated flow component 24). In some examples, the indication of whether the flow is an anomaly is based on the number of statistically deviated flows that are related to the flow. For example, the indication can be positively correlated with that number. Using the end (source and destination) points of an anomalous flow, the physical location of the anomaly within the distributed system can be isolated.
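As a non-limiting illustration, isolating a location from the end points of anomalous flows can be sketched by accumulating a strength per end point and ranking the results. The manner in which end points are combined is not specified above, so the summation used here is an assumption, and the name localize and the (source, destination, strength) tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def localize(
    anomalous_flows: Iterable[Tuple[str, str, float]]
) -> List[Tuple[str, float]]:
    """Rank end points by the total strength of the anomalous flows touching them.

    Each input item is (source, destination, strength), where strength could be,
    for example, the number of related statistically deviated flows.
    """
    scores: Counter = Counter()
    for source, destination, strength in anomalous_flows:
        # Assumed rule: each anomalous flow contributes its full strength
        # to both of its end points.
        scores[source] += strength
        scores[destination] += strength
    # most_common() returns (end point, score) pairs sorted by descending score.
    return scores.most_common()
```

An end point shared by several anomalous flows accumulates the largest score and can be reported as a candidate location of the anomaly.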
At 106, the indication of whether the flow is an anomaly can be output (e.g., by output 26). For example, the output can include a location of the anomaly and the strength of the anomaly. As one example, the output can include a plurality of flows with the associated location and strength of the anomaly for each of the plurality of flows. As another example, the output can include a single flow with the associated location and strength of the anomaly for the flow. In either example, the output can be displayed (e.g., by display 16 or on another computing device) so that further actions can be undertaken.
What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, the disclosure is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite “a,” “an,” “a first,” or “another” element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.