DEVELOPMENT
This invention was developed with support in part from the National Science Foundation grants numbered ANI-9986397 and CCR-0325701.
A network anomaly is an unusual event in a network that is of interest to an entity such as a network provider, a network user, a network operator, or a law enforcement agency. A network anomaly may be created unintentionally as a result of normal network traffic conditions, such as a breakdown in a network resource. A network anomaly may also be created intentionally by a malicious attack by a hacker or a person acting to damage the network or impair the performance of the network.
Typically, a network anomaly is monitored, or analyzed, by collecting data from a network element such as a single link or a single router of the network. Such data collection is done in isolation from other network data or other network elements. In other words, finding a network anomaly is closely related to a link-level traffic characterization.
Another approach to monitoring or analyzing a network anomaly treats the anomaly as a deviation in traffic volume. This enables detection of a network anomaly that visually stands out, but low-rate network anomalies (e.g., worms, port scans, small outage events, etc.) are not detected by an approach based on traffic volume.
Still another approach to monitor or analyze a network anomaly is a manual method where a rule is developed. A match or a violation of the rule decides whether a network anomaly has been encountered. However, rule-based methods cannot detect new, previously unseen anomalies.
Many current methods provide a solution, for an element of the network, for each class of a network anomaly, whereas a solution for many elements of a network is preferable.
The present invention relates to methods and apparatus for detecting, monitoring, or analyzing an unusual network event or a network anomaly in a communication network and the business of so doing for the benefit of others. Embodiments of the present invention can detect, monitor, or analyze the network anomaly by applying many statistical and mathematical methods. Embodiments of the present invention include both methods and apparatus to detect, monitor, or analyze the network anomaly. These include classification and localization.
The invention is a general technique for detecting and classifying unusual events (anomalies) in a network in an efficient, continuous manner. The technique is founded on analyzing the distributional properties of multiple features (addresses, ports, etc.) of network-wide traffic. This distributional analysis of traffic features has two key elements for the classification of network anomalies into meaningful clusters.
Network traffic is analyzed for distributions of multiple traffic features (addresses, ports, protocol, etc.) simultaneously. Anomaly detection using feature distributions is highly sensitive and augments volume-based detection by exposing important low-rate anomalies that cannot be detected by volume-based methods.
Feature distributions of network traffic are created to extract structural knowledge about an anomaly. This structural knowledge of anomalies is used to classify anomalies into distinct clusters that are structurally and semantically meaningful. The classification of anomalies is achieved by an unsupervised approach, so no human intervention or a priori knowledge is needed to categorize anomalies. This unsupervised approach allows the invention to recognize and classify novel (previously unseen) anomalies (e.g., new worms).
Moreover, the invention analyzes multiple features of network-wide data, i.e., data that is collected from multiple resources in a network. Network-wide analysis enables the detection of anomalies that span across a network. Network-wide analysis, coupled with the feature distribution analysis, allows the invention to detect and classify network-wide anomalies, augmenting detections by current schemes that are predominantly volume-based analysis of single-resource data.
Systematic analysis of data collected from multiple network resources (i.e., network links, routers, etc.) is a key feature of the invention. By leveraging whole-network data, the invention is able to diagnose a wide range of anomalies, including those that may span an entire network. Diagnosis allows identification of the time an anomaly is present, identification of the location of the anomaly in the network, and identification of the anomaly type.
Anomalies can arise for a variety of reasons, from abuse (attacks, worms, etc.) to unintentional causes (equipment failure, human error, etc.). The technique is not restricted to point solutions for each type of anomaly. Instead, by treating anomalies as substantial deviations from established normal behavior, the invention provides a general solution for diagnosing a large class of anomalous events.
One embodiment describes forming a time series having at least one dimension corresponding to communication network traffic handled by network elements and decomposing the time series into several communication network traffic patterns existing in those network elements.
Another embodiment forms a uni- or multi-variate model having at least one dimension corresponding to communication network traffic handled by network elements and detects an anomaly in a pattern in the communication network traffic.
Still another embodiment finds a deviation in a feature of communication network traffic.
Still another embodiment generates at least a distribution of a communication network traffic feature, estimates an entropy of the communication network traffic feature, sets a threshold of the entropy of the communication network traffic feature, and designates the communication network traffic feature to be anomalous when the entropy of the communication network traffic feature is found to be different from the set threshold of the entropy of the communication network traffic feature.
These and other features of the invention are described below in the Detailed Description and the accompanying Drawing of which:
The present invention relates to methods and apparatus for detecting, monitoring, or analyzing an unusual network event or a network anomaly in a communication network. Embodiments of the present invention illustrate specific statistical techniques to detect, monitor, or analyze the network anomaly; other known techniques can also be used. Embodiments of the present invention include both methods and apparatus to detect, monitor, or analyze the network anomaly. As used herein, the term "whole network," when applied to the basis for data collection, means at least a substantial part of the network such that the data is meaningful in anomaly detection and analysis.
As an illustration, network node j has been shown to be made up of a lower level communication network having a sub-network, a LAN (Local Area Network), personal computers, and mobile agents. Such a lower level communication network, represented by network element 104, is made up of (sub)network elements 106 that may be similar or different in scope, such as servers, routers or other means. These have network linkages 108 as known in the art. Each sub-network element will typically be composed of similar or distinct personal computers 120 or mobile agents 122. These are linked by network links 110, which could be wireless or conventional.
One computational facility 124 in that network could be used to upload the programming 112 of the invention via media 114 to accomplish the data mining and/or analysis used in the invention. The analyses of the invention could be done there on data 116 received from the nodes or other elements via paths 130, or elsewhere, such as at a processor 120 to which the data 116 is sent. Access to the data is in the hands of the network provider, so obtaining the data is possible. If third parties are performing the analysis, access authorization is needed.
It should be noted that the above description is just for an illustrative architecture of the communication network 100. There could be more or fewer of any of the components of the communication network 100 and there could be many layers of lower level for a given network element and a given network link.
In practicing the method, in a step 202, a process of forming a time series is started. The time series is to have at least one dimension corresponding to communication network 100 traffic on several network elements, such as the network nodes in the flow 118, for each of several periods of time. The elements are termed sources for the purpose of this illustration. In step 204, data for the time series is decomposed into several communication network 100 traffic patterns existing in several network elements, nodes a-m. Element 206 illustrates the mathematical form of this data once decomposed into a matrix 208, representing a time series.
The matrix 208 has a separate source for each column, and each row holds the data collected for one period of time. The data includes information on such variables of network traffic as the number of bytes of traffic, the number of packets and the number of records. The data includes the Internet Protocol (IP) address information for the traffic carried in each link and the port address, such as a PC 110, within each node. The data reveals a number of features such as source IP and source port as well as destination IP and destination port. All of this data is available in the blocks of network traffic. It is collected on a link basis, that is, on an origin-to-destination (OD) basis.
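As an illustration of how a matrix with one row per time period and one column per source might be assembled, the following is a minimal sketch. The record layout (timestamp, source name, byte count), the function name build_traffic_matrix, and the 300-second bin width are assumptions made for this example and are not taken from the specification.

```python
from collections import defaultdict

def build_traffic_matrix(records, sources, bin_seconds):
    """Return (bin_starts, matrix) where matrix[r][c] is the byte count
    observed for sources[c] during time bin r."""
    column = {name: c for c, name in enumerate(sources)}
    totals = defaultdict(int)
    for timestamp, source, byte_count in records:
        if source not in column:
            continue  # ignore records from sources that are not being tracked
        totals[(int(timestamp // bin_seconds), column[source])] += byte_count
    if not totals:
        return [], []
    bins = [b for b, _ in totals]
    first, last = min(bins), max(bins)
    # One row per time period, one column per source; empty cells are zero.
    matrix = [[totals.get((b, c), 0) for c in range(len(sources))]
              for b in range(first, last + 1)]
    bin_starts = [b * bin_seconds for b in range(first, last + 1)]
    return bin_starts, matrix

# Hypothetical records: (timestamp in seconds, source name, bytes).
records = [(0, "a", 100), (10, "b", 50), (299, "a", 25), (300, "a", 70), (650, "b", 5)]
bin_starts, Y = build_traffic_matrix(records, ["a", "b"], bin_seconds=300)
```

The same layout could hold packet or record counts instead of bytes by changing the quantity that is accumulated.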
In
This is done time period by time period so that a step 214 is used to step through all the time periods in the matrix 208. At each iteration, step 214 decides whether the entire set of source data at each time period is below (normal) or above (possibly anomalous) a threshold. The threshold can be preset or updated over time from the data mining results.
When a volume figure exceeds the threshold at one time period, processing turns to
A step 218 analyzes the anomaly found in step 216 by comparing the volume for the suspected source in normal traffic to the anomaly volume. This will give a value in the number of bytes, packets or records for the anomaly at that source. From there a step 220 provides to authorized users the anomaly time, location and quantity. From step 220 a step 222 returns processing back to step 214 for evaluation of the next time period.
The volume difference in the distribution of normal and anomalous traffic is shown in
To reach this point mathematically, some form of dimensionality analysis is typically used. One form used in the invention is PCA (Principal Component Analysis), described below.
PCA is a coordinate transformation method that maps a given set of data points onto new axes. These axes are called the principal axes or principal components. When working with zero-mean data, each principal component has the property that it points in the direction of maximum variance remaining in the data, given the variance already accounted for in the preceding components. As such, the first principal component captures the variance of the data to the greatest degree possible on a single axis. The next principal components then each capture the maximum variance among the remaining orthogonal directions.
We will apply PCA on our link data matrix 208, denoted Y, treating each row of Y as a point in m-dimensional space. It is necessary to adjust Y so that its columns have zero mean. This ensures that PCA dimensions capture true variance, and thus avoids skewing results due to differences in mean link utilization. Hereafter, Y will denote the mean-centered link traffic data.
Applying PCA to Y yields a set of m principal components, {vi}, i=1, . . . , m. The first principal component v1 is the vector that points in the direction of maximum variance in Y:
$$v_1 = \arg\max_{\|v\|=1} \|Yv\| \qquad (1)$$
where ∥Yv∥² is proportional to the variance of the data measured along v. Proceeding iteratively, once the first k−1 principal components have been determined, the k-th principal component corresponds to the maximum variance of the residual. The residual is the difference between the original data and the data mapped onto the first k−1 principal axes. Thus, we can write the k-th principal component vk as:
$$v_k = \arg\max_{\|v\|=1} \left\| \left( Y - \sum_{i=1}^{k-1} Y v_i v_i^{T} \right) v \right\| \qquad (2)$$
An important use of PCA is to explore the intrinsic dimensionality of a set of data points. By examining the amount of variance captured by each principal component, ∥Yvi∥², it is possible to ask whether most of the variability in the data can be captured in a space of lower dimension. If only the variance along the first r dimensions is non-negligible, then it is concluded that the pointset represented by Y effectively resides in an r-dimensional subspace of R^m.
Once the principal axes have been determined, the dataset can be mapped onto the new axes. The mapping of the data to principal axis i is given by Yvi. This vector can be normalized to unit length by dividing it by ∥Yvi∥. Thus, for each principal axis i,
$$u_i = \frac{Y v_i}{\|Y v_i\|}, \qquad i = 1, \ldots, m \qquad (3)$$
The ui are vectors of size t and are orthogonal by construction. The above equation shows that all the link counts, when weighted by vi, produce one dimension of the transformed data. Thus vector ui captures the temporal variation common to the entire ensemble of the link traffic timeseries along principal axis i. Since the principal axes are in order of contribution to overall variance, u1 captures the strongest temporal trend common to all link traffic, u2 captures the next strongest, and so on. The set {ui}, i=1, . . . , 4, captures most of the variance and hence the most significant temporal patterns common to the ensemble of all link traffic timeseries.
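The PCA steps described above can be sketched with a singular value decomposition, which is one common way to obtain the principal axes of a mean-centered matrix. This is an illustrative sketch only: the function name pca_link_traffic and the synthetic traffic (a shared sinusoidal trend plus noise on six hypothetical links) are assumptions, not part of the specification.

```python
import numpy as np

def pca_link_traffic(Y_raw):
    """Mean-center the t x m traffic matrix and return its principal axes."""
    Y = np.asarray(Y_raw, dtype=float)
    Y = Y - Y.mean(axis=0)                      # columns now have zero mean
    # Y = U diag(s) Vt: the rows of Vt are the principal axes v_i,
    # s[i] equals ||Y v_i||, and column i of U equals Y v_i / ||Y v_i||.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    variance_fraction = s ** 2 / np.sum(s ** 2)
    return Y, Vt.T, U, s, variance_fraction

# Synthetic example: one shared temporal trend plus unit-variance noise.
rng = np.random.default_rng(1)
t, m = 288, 6
k = np.arange(t)
trend = np.sin(2 * np.pi * k / t)
weights = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
Y_raw = 100 + 10 * np.outer(trend, weights) + rng.normal(0, 1, size=(t, m))

Y, V, U, s, variance_fraction = pca_link_traffic(Y_raw)
u1 = Y @ V[:, 0] / np.linalg.norm(Y @ V[:, 0])  # strongest common temporal pattern
```

Inspecting variance_fraction is one way to judge how many dimensions carry non-negligible variance. When there are fewer time periods than sources, this decomposition returns only as many axes as there are time periods.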
The subspace method works by separating the principal axes into two sets, corresponding to normal and anomalous variation in traffic. The space spanned by the set of normal axes is the normal subspace S and the space spanned by the anomalous axes is the anomalous subspace S̃. This is shown in
The Ux projection of the data exhibits significant anomalous behavior. Traffic “spike” 230 indicates unusual network conditions, possibly induced by an anomaly. The subspace method treats such projections of the data as belonging to the anomalous subspace.
A variety of procedures can be applied to separate the two types of projections into normal and anomalous sets. Based on examining the differences between typical and atypical projections a simple threshold-based separation method works well in practice. The separation procedure examines the projection on each principal axis in order, maximum spread to minimum spread as would be expected. As soon as a projection is found that exceeds the threshold (e.g., contains a 3σ deviation from the mean), that principal axis and all subsequent axes are assigned to the anomalous subspace. All previous principal axes then are assigned to the normal subspace. This procedure results in placing early principal components in the normal subspace.
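The threshold-based separation procedure can be sketched as follows. The function name count_normal_axes and the three-column synthetic matrix (two smooth periodic sources and one source with a single spike) are assumptions chosen so that the outcome is easy to follow; the 3σ rule is the example threshold mentioned in the text.

```python
import numpy as np

def count_normal_axes(Y, V, n_sigma=3.0):
    """Return r, the number of leading principal axes kept in the normal subspace.

    Axes are examined in order of decreasing variance. The first axis whose
    projection deviates from its mean by more than n_sigma standard deviations
    is assigned, with all later axes, to the anomalous subspace."""
    for i in range(V.shape[1]):
        projection = Y @ V[:, i]
        sigma = projection.std()
        deviation = np.max(np.abs(projection - projection.mean()))
        if sigma > 0 and deviation > n_sigma * sigma:
            return i
    return V.shape[1]

# Synthetic sources: two smooth periodic patterns and one isolated spike.
k = np.arange(500)
spike = np.zeros(500)
spike[250] = 8.0
Y_raw = np.column_stack([10 * np.sin(2 * np.pi * k / 100),
                         5 * np.sin(2 * np.pi * k / 50),
                         spike])
Y = Y_raw - Y_raw.mean(axis=0)
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
V = Vt.T

r = count_normal_axes(Y, V)
P = V[:, :r]   # columns span the normal subspace
```

In this toy case the two periodic axes stay in the normal subspace and the spike axis falls in the anomalous subspace.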
Having separated the space of all possible link traffic measurements into the subspaces S and S̃, the traffic on each link is decomposed into its normal and anomalous components.
The methods used for detecting and identifying volume anomalies draw from theory developed for subspace-based fault detection in multivariate process control.
Detecting volume anomalies in link traffic relies on the separation of link traffic y at any timestep into normal and anomalous components. These are referred to as the modeled and residual parts of y.
In the subspace-based detection step, once S and S̃ have been constructed, this separation can be effectively performed by forming the projection of link traffic onto these two subspaces. The set of link measurements at a given point in time, y, is decomposed:
$$y = \hat{y} + \tilde{y} \qquad (4)$$
such that ŷ corresponds to modeled and ỹ to residual traffic. It is possible to form ŷ by projecting y onto S, and ỹ by projecting y onto S̃.
The set of principal components corresponding to the normal subspace (v1, v2, . . . , vr) is arranged as columns of a matrix P of size m×r, where r denotes the number of normal axes. ŷ and ỹ are:
$$\hat{y} = PP^{T}y = Cy \quad \text{and} \quad \tilde{y} = (I - PP^{T})\,y = \tilde{C}y \qquad (5)$$
where the matrix C=PPᵀ represents the linear operator that performs projection onto the normal subspace S, and C̃ likewise projects onto the anomalous subspace S̃.
Thus, ŷ contains the modeled traffic and ỹ the residual traffic. In general, the occurrence of a volume anomaly will tend to result in a large change to ỹ. A useful statistic for detecting abnormal changes in ỹ is the squared prediction error (SPE):
$$\mathrm{SPE} = \|\tilde{y}\|^{2} = \|\tilde{C}y\|^{2} \qquad (6)$$
Network traffic is considered normal if
$$\mathrm{SPE} \le \delta_{\alpha}^{2} \qquad (7)$$
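where δ_α² denotes the threshold for the SPE at the 1−α confidence level. A statistical test for the residual vector, known as the Q-statistic, is given as:
$$\delta_{\alpha}^{2} = \phi_1 \left[ \frac{c_{\alpha}\sqrt{2\phi_2 h_0^{2}}}{\phi_1} + 1 + \frac{\phi_2 h_0 (h_0 - 1)}{\phi_1^{2}} \right]^{1/h_0} \qquad (8)$$
where
$$h_0 = 1 - \frac{2\phi_1\phi_3}{3\phi_2^{2}}, \qquad \phi_i = \sum_{j=r+1}^{m} \lambda_j^{i} \quad \text{for } i = 1, 2, 3 \qquad (9)$$
and where λj is the variance captured by projecting the data on the j-th principal component (∥Yvj∥²), and cα is the 1−α percentile in a standard normal distribution. The result holds regardless of how many principal components are retained in the normal subspace.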
In this setting, the 1−α confidence limit corresponds to a false alarm rate of α, if the assumptions under which this result is derived are satisfied. The confidence limit for the Q-statistic is derived under the assumption that the sample vector y follows a multivariate Gaussian distribution. However, it has been pointed out that the Q-statistic changes little even when the underlying distribution of the original data differs substantially from Gaussian.
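The detection test can be sketched as below: the SPE of a measurement vector is compared against a Q-statistic threshold computed from the variances along the residual axes. This is a sketch under stated assumptions. The threshold uses the widely published Jackson-Mudholkar form of the Q-statistic, the variances are scaled as sample-covariance eigenvalues (squared singular values divided by t−1) so that they are comparable to the SPE of a single measurement vector, and the function names, the synthetic traffic, and the choice r=2 are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

def q_statistic_threshold(eigenvalues, r, alpha=0.001):
    """SPE threshold at the 1 - alpha confidence level."""
    lam = np.asarray(eigenvalues, dtype=float)[r:]   # variances on residual axes
    phi1, phi2, phi3 = (float(np.sum(lam ** i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)      # standard normal percentile
    inner = (c_alpha * np.sqrt(2.0 * phi2 * h0 ** 2) / phi1
             + 1.0 + phi2 * h0 * (h0 - 1.0) / phi1 ** 2)
    return phi1 * inner ** (1.0 / h0)

def squared_prediction_error(y, P):
    """SPE = ||(I - P P^T) y||^2 for a mean-centered measurement vector y."""
    residual = y - P @ (P.T @ y)
    return float(residual @ residual)

# Synthetic normal traffic: two shared temporal patterns plus unit noise.
rng = np.random.default_rng(0)
t, m, r = 1000, 8, 2
k = np.arange(t)
loadings = np.array([[1.0] * 8, [1.0, -1.0] * 4])
factors = np.column_stack([20 * np.sin(2 * np.pi * k / 100),
                           10 * np.cos(2 * np.pi * k / 37)])
Y_raw = factors @ loadings + rng.normal(0.0, 1.0, size=(t, m))
Y = Y_raw - Y_raw.mean(axis=0)

_, s, Vt = np.linalg.svd(Y, full_matrices=False)
P = Vt[:r].T                        # normal subspace (r chosen by hand here)
eigenvalues = s ** 2 / (t - 1)      # assumed scaling of the variances
threshold = q_statistic_threshold(eigenvalues, r)

spe_train = np.array([squared_prediction_error(row, P) for row in Y])
flagged_fraction = float(np.mean(spe_train > threshold))

y_anomalous = Y[0].copy()
y_anomalous[3] += 50.0              # inject a volume anomaly on one source
spe_anomalous = squared_prediction_error(y_anomalous, P)
is_anomalous = spe_anomalous > threshold
```

With the residual variances all close to one, the computed threshold behaves much like a high percentile of a chi-square distribution, which is why only a small fraction of the normal rows is expected to be flagged.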
In the subspace framework, a volume anomaly represents a displacement of the state vector y away from S. The particular direction of the displacement gives information about the nature of the anomaly. The approach to anomaly identification is to ask which anomaly out of a set of potential anomalies is best able to describe the deviation of y from the normal subspace S.
The set of all possible anomalies is (Fi, i=1, . . . , I). This set should be chosen to be as complete as possible, because it defines the set of anomalies that can be identified.
For simplicity of illustration, only one-dimensional anomalies are considered; that is, anomalies in which the additional per-link traffic can be described as a linear function of a single variable. It is straightforward to generalize the approach to multi-dimensional anomalies as shown infra.
Then each anomaly Fi has an associated vector θi which defines the manner in which this anomaly adds traffic to each link in the network. Assuming that θi has unit norm, then in the presence of anomaly Fi, the state vector y is represented by
$$y = y^{*} + \theta_i f_i \qquad (10)$$
where y* represents the sample vector for normal traffic conditions (and which is unknown when the anomaly occurs), and fi represents the magnitude of the anomaly.
Given some hypothesized anomaly Fi, form an estimate of y* by eliminating the effect of the anomaly, which corresponds to subtracting some traffic contribution from the links associated with anomaly Fi. The best estimate of y* assuming anomaly Fi is found by minimizing the distance to the normal subspace S in the direction of the anomaly:
$$\hat{f}_i = \arg\min_{f_i} \left\| \tilde{y} - \tilde{\theta}_i f_i \right\| \qquad (11)$$
where ỹ=C̃y and θ̃i=C̃θi. This gives f̂i=(θ̃iᵀθ̃i)⁻¹θ̃iᵀỹ.
Thus the best estimate of y* assuming anomaly Fi is:
$$y^{*}_i = y - \theta_i \hat{f}_i = \left( I - \theta_i (\tilde{\theta}_i^{T}\tilde{\theta}_i)^{-1} \tilde{\theta}_i^{T} \tilde{C} \right) y \qquad (12)$$
To identify the best hypothesis from the set of potential anomalies, choose the hypothesis that explains the largest amount of residual traffic. That is, choose the Fi that minimizes the projection of y*i onto S̃.
Thus, in summary, the identification algorithm consists of:
1. for each hypothesized anomaly Fi, i=1, . . . , I, compute y*i using Equation (12)
2. choose anomaly Fj as j=arg min_i ∥C̃y*i∥.
The possible anomalies are {Fi, i=1, . . . , n} where n is the number of OD flows in the network. In this case, each anomaly adds (or subtracts) an equal amount of traffic to each link it affects. Then θi is defined as column i of the routing matrix A, normalized to unit norm: θi=Ai/∥Ai∥.
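The identification algorithm can be sketched as follows for the case where each hypothesized anomaly is a column of the routing matrix. The four-link, three-flow routing matrix, the one-dimensional normal subspace, and the function name identify_anomaly are toy assumptions chosen so the arithmetic can be followed by hand.

```python
import numpy as np

def identify_anomaly(y, P, A):
    """Return (flow index, remaining residual norm, estimated magnitude f)."""
    m = y.shape[0]
    C_tilde = np.eye(m) - P @ P.T            # projector onto the anomalous subspace
    y_tilde = C_tilde @ y
    best_flow, best_score, best_f = None, np.inf, 0.0
    for i in range(A.shape[1]):
        column_norm = np.linalg.norm(A[:, i])
        if column_norm == 0:
            continue                          # flow crosses no monitored link
        theta = A[:, i] / column_norm         # unit-norm anomaly direction
        theta_tilde = C_tilde @ theta
        denom = float(theta_tilde @ theta_tilde)
        if denom < 1e-12:
            continue                          # direction lies inside the normal subspace
        f_hat = float(theta_tilde @ y_tilde) / denom   # least-squares magnitude
        y_star = y - theta * f_hat            # traffic with the hypothesized anomaly removed
        score = float(np.linalg.norm(C_tilde @ y_star))
        if score < best_score:
            best_flow, best_score, best_f = i, score, f_hat
    return best_flow, best_score, best_f

# Toy routing matrix: rows are links, columns are OD flows.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
P = np.full((4, 1), 0.5)                      # normal subspace: equal load on all links
y = np.full(4, 100.0) + 30.0 * A[:, 1]        # flow 1 carries 30 extra units per link

flow, score, f_hat = identify_anomaly(y, P, A)
```

Here the hypothesis for flow 1 removes the entire residual, while the hypotheses for flows 0 and 2 leave it unchanged, so flow 1 is selected.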
With an estimate of the particular volume anomaly, Fi, the number of bytes that constitute this anomaly is estimated. The estimated amount of anomalous traffic on each link due to the chosen anomaly Fi is given by
$$y' = y - y^{*}_i \qquad (13)$$
Then the estimated sum of the additional traffic is proportional to θiᵀy′. Since the additional traffic flows over multiple links, one must normalize by the number of links affected by the anomaly.
In the current case, where anomalies are defined by the set of OD flows, our quantification relies on A. We use Ā to denote the routing matrix normalized so that each column of Ā has unit sum, that is:
$$\bar{A}_{kj} = \frac{A_{kj}}{\sum_{l} A_{lj}} \qquad (14)$$
Then given identification of anomaly Fi, our quantification estimate is:
$$\bar{A}_i^{T} y' \qquad (15)$$
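The quantification step can be sketched as below, taking the anomalous traffic on each link as the difference between the measured traffic y and the estimate of normal traffic y*i, and averaging it over the links the identified flow crosses. The routing matrix, the traffic values, and the function name quantify_anomaly are toy assumptions, and every flow is assumed to cross at least one link so that the column sums are nonzero.

```python
import numpy as np

def quantify_anomaly(y, y_star_i, A, i):
    """Estimate the size of anomaly i from measured and anomaly-free traffic."""
    A_bar = A / A.sum(axis=0, keepdims=True)   # each column now sums to one
    y_prime = y - y_star_i                     # anomalous traffic on each link
    return float(A_bar[:, i] @ y_prime)

# Toy routing matrix: rows are links, columns are OD flows.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
y = np.array([100.0, 130.0, 130.0, 100.0])     # measured link traffic
y_star = np.array([100.0, 100.0, 100.0, 100.0])  # estimate with flow 1's anomaly removed

estimate = quantify_anomaly(y, y_star, A, 1)
```

Because flow 1 crosses two links and each carries 30 extra units, the normalized estimate recovers 30 units rather than the 60 that a plain sum over links would give.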
Some anomalies may lie completely within the normal subspace S and so cannot be detected by the subspace method. Formally, this can occur if C̃θi=0 for some anomaly Fi. In fact this is very unlikely as it requires the anomaly and the normal subspace S to be perfectly aligned. However, the relative relationship between the anomaly θi and the normal subspace can make anomalies of a given size in one direction harder to detect than in other directions.
The principles described above are used in another aspect of the invention to produce a multi-feature (multi-way), multi source (multivariate) distribution of traffic flow data. The process begins in step 310 of
In a step 344 of
Here i occurs ni times and S is the total number of observations in the matrix. The new matrices 336, in
The process of statistical simplification of two different distributions by an entropy metric is illustrated in
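As an illustration of summarizing a feature distribution by an entropy metric, the following sketch computes the sample entropy of a list of observed feature values, where value i occurs ni times out of S observations. The base-2 sample entropy form, the helper entropy_deviates, and the example port lists are assumptions made for this sketch.

```python
import math
from collections import Counter

def sample_entropy(observations):
    """Entropy in bits of the empirical distribution of the observed values."""
    counts = Counter(observations)
    total = sum(counts.values())               # S, the total number of observations
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_deviates(entropy, baseline, tolerance):
    """Hypothetical rule: flag the feature when entropy leaves the expected band."""
    return abs(entropy - baseline) > tolerance

# Destination ports seen in one time period (hypothetical values).
concentrated = [80] * 8                        # all traffic to one port
dispersed = list(range(1024, 1032))            # eight different ports, as in a port scan

h_concentrated = sample_entropy(concentrated)
h_dispersed = sample_entropy(dispersed)
flagged = entropy_deviates(h_dispersed, baseline=h_concentrated, tolerance=1.0)
```

A concentrated distribution yields an entropy near zero, while a dispersed one approaches the logarithm of the number of distinct values, so a change in either direction can be visible even when the traffic volume barely changes.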
In subsequent step 338, the matrices 336 are “unwrapped” into a large 2D matrix 342 in which the rows of each matrix 336 are assembled into long rows such as row 348 in
The matrix 342 is then processed in step 350 and 352 by a subspace clustering technique on the principles as previously described. This is an iterative process in that it steps repeatedly through the procedures in step 352, looping via steps 370 and 380. The following describes the net result of the iteration.
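The unwrapping of several per-feature matrices into one wide matrix can be sketched as a column-wise concatenation, so that each long row holds all features of all sources for one time period. The function name unwrap_feature_matrices, the choice of four features, and the small example dimensions are assumptions for illustration.

```python
import numpy as np

def unwrap_feature_matrices(feature_matrices):
    """Concatenate K matrices of shape (t, p) into one matrix of shape (t, K*p)."""
    matrices = [np.asarray(H, dtype=float) for H in feature_matrices]
    t = matrices[0].shape[0]
    if any(H.shape[0] != t for H in matrices):
        raise ValueError("all feature matrices must cover the same time periods")
    return np.hstack(matrices)

# Hypothetical entropy matrices for four features, t=3 periods, p=2 sources.
t, p = 3, 2
features = [np.full((t, p), float(k)) + np.arange(p) / 10.0 for k in range(4)]
wide = unwrap_feature_matrices(features)
```

The resulting wide matrix can then be mean-centered and decomposed in the same way as a single-feature traffic matrix.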
In a step 354 of an anomaly classification process, for each detected anomaly the residual components are found for each of K features. A detected anomaly yields a set of K numbers, one for each of the features in the matrix 340. The K numbers represent a point in K-dimensional space and are so treated in step 356. That is, the K numbers are treated as positions along K axes in K-space, and they are so plotted in step 358. This plotting occurs in a processor such as processor 120 and an associated database.
Clustering techniques are then applied in step 360 to identify clusters of points that are near to each other according to a threshold value for nearness. Such a threshold value is determined directly from the data points and is also adjusted over time for more accurate results as part of a learning-from-use process. The clustering may be performed in a lower dimensional space, for example by projecting the points onto a 2D space as in
The resulting clusters (an example with K=2 dimensions is shown in 362 ) as in
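One simple way to group anomaly points by a nearness threshold is single-linkage grouping, in which two points share a cluster whenever a chain of points, each within the threshold of the next, connects them. The sketch below uses that approach; it is a swapped-in illustration rather than the specific clustering technique of the invention, and the function name, the K=2 example points, and the threshold of 0.3 are hypothetical.

```python
import math

def cluster_by_nearness(points, threshold):
    """Return one cluster label per point, numbered in order of first appearance."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(j)] = find(i)      # merge the two groups

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]

# Each point holds the residual components of one detected anomaly (K=2 features).
anomalies = [(0.9, -0.8), (1.0, -0.9), (-0.7, 0.8), (-0.8, 0.9), (0.0, 0.0)]
labels = cluster_by_nearness(anomalies, threshold=0.3)
```

Points that fall in the same cluster share a similar pattern of residual components across the features, which is what gives a cluster its interpretation as an anomaly type.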
In this manner, various service providers for networks (e.g., service provider networks or cable providers) that subscribe to or use the invention may be able to take remedial steps to deal with anomalies and provide assurances to their subscribers of that ability. This will potentially make their service more appealing. The providers may also contract this function to independent analysts by giving them the necessary access to network elements, thereby creating a new business opportunity.
The invention has been illustrated for use in service provider networks but can equally be used in other types of networks such as transportation highway networks, postal service networks, and sensor networks.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 60/694,853 and 60/694,840 both filed on Jun. 29, 2005 and the disclosures of both of which are incorporated by reference herein.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US06/25398 | 6/29/2006 | WO | 00 | 12/3/2009

Number | Date | Country
---|---|---
60694853 | Jun 2005 | US
60694840 | Jun 2005 | US