The disclosure relates to apparatuses and methods for processing time series, in particular in the context of anomaly detection.
Anomaly detection in time series is a set of techniques aimed at finding outliers or rare events in data varying with time. Supervised approaches, such as neural networks, or unsupervised approaches, such as clustering, may be used.
In telecommunications, whether a pattern in a time series is anomalous is often highly dependent on several parameters, such as a temporal or topological context. Existing solutions for anomaly detection in complex systems often rely on human experts, which is costly and difficult. The existing solutions are also confronted with high rates of false positives.
Thus, there is a need for apparatuses and techniques to detect anomalies in time series without the need of a human expert and with limited false positive rates.
In some embodiments, the disclosure provides an apparatus for anomaly detection. The apparatus comprises means for:
Collecting a measurement time-series relating to a performance indicator, wherein the measurement time-series relates to a communications network resource, wherein the measurement time-series is collected over a predetermined timeframe,
Thanks to these features, anomalies in time-series may be detected in an unsupervised fashion, which is more cost-effective than supervised learning and may result in the detection of more complex anomalies.
The apparatus for anomaly detection may comprise one or more of the following features.
In an embodiment, the apparatus further comprises means for:
Thanks to these features, a false positive in the primary anomaly label may be detected and the secondary anomaly label may correct the false positive.
In an embodiment, the apparatus further comprises means for:
Thanks to these features, the primary anomaly label is influenced by the cluster anomaly labels of clusters which satisfy a proximity condition and a size condition.
In an embodiment, the apparatus further comprises means for transmitting the primary anomaly label to a correction module, wherein the correction module performs root cause analysis and at least one corrective action relating to the communications network resource.
Thanks to these features, a detected anomaly may be further analyzed and corrected.
In an embodiment, a temporal attribute is associated with said or each of said measurement time-series, wherein each cluster comprises a cluster temporal attribute, and wherein the external similarity condition is a function of a second distance between the cluster temporal attribute and the temporal attribute associated with the measurement time-series.
Thanks to the use of temporal attributes, some anomalies linked to patterns happening periodically (on weekends or at night) may be detected.
In an embodiment, the representative value is a median value of said or each of said measurement time-series.
In an embodiment, the apparatus further comprises means for:
Thanks to the feature vectors, the primary anomaly label may be computed using data extracted from communications network resources having similar behaviours.
In an embodiment, said similarity criterion consists in that the feature vectors associated to the measurement time-series within the time-series subset are identical.
In an embodiment, the means for providing a clustering model are configured for:
Thanks to these features, the clusters may be computed in an unsupervised fashion and each cluster may regroup similar extracts of time-series from the plurality of time-series.
In an embodiment, the apparatus further comprises means for setting the cluster anomaly label associated with a cluster to encode an anomalous cluster in response to determining that a size of the cluster is lower than a threshold of size.
Thanks to these features, larger clusters may be associated with a normal behaviour and smaller clusters may be associated with an anomalous behaviour.
In an embodiment, the apparatus further comprises means for setting the threshold of size as a function of a size distribution of the set of clusters.
Thanks to these features, the threshold of size may be computed automatically.
In some example embodiments, the disclosure also provides a method for anomaly detection, the method comprising the steps of:
In an embodiment, the method further comprises the steps of:
In an embodiment, the method further comprises the steps of:
In an embodiment, the method further comprises the steps of transmitting the primary anomaly label to a correction module, wherein the correction module performs root cause analysis and at least one corrective action relating to the communications network resource.
In an embodiment, a temporal attribute is associated with said or each of said measurement time-series, wherein each cluster comprises a cluster temporal attribute, and wherein the external similarity condition is a function of a second distance between the cluster temporal attribute and the temporal attribute associated with the measurement time-series.
In an embodiment, the representative value is a median value of said or each of said measurement time-series.
In an embodiment, the method further comprises the steps of:
In an embodiment, said similarity criterion consists in that the feature vectors associated to the measurement time-series within the time-series subset are identical.
In an embodiment, the steps of providing a clustering model are configured for:
In an embodiment, the method further comprises the steps of setting the cluster anomaly label associated with a cluster to encode an anomalous cluster in response to determining that a size of the cluster is lower than a threshold of size.
In an embodiment, the method further comprises the steps of setting the threshold of size as a function of a size distribution of the set of clusters.
In some embodiments, the invention provides a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform the above method.
In some example embodiments, the means in the apparatus further comprises:
At least one processor; and
At least one memory including a computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the operations of the apparatus.
The at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus to:
Collect a measurement time-series relating to a performance indicator, wherein the measurement time-series relates to a communications network resource, wherein the measurement time-series is collected over a predetermined timeframe,
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to:
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to:
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to transmit the primary anomaly label to a correction module, wherein the correction module performs root cause analysis and at least one corrective action relating to the communications network resource.
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to:
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to:
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to set the cluster anomaly label associated with a cluster to encode an anomalous cluster in response to determining that a size of the cluster is lower than a threshold of size.
The at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to set the threshold of size as a function of a size distribution of the set of clusters.
In some embodiments, the disclosure provides an apparatus comprising:
In an embodiment, the apparatus further comprises:
In an embodiment, the apparatus further comprises:
In an embodiment, the apparatus further comprises a transmitting circuitry configured to transmit the primary anomaly label to a correction module, wherein the correction module performs root cause analysis and at least one corrective action relating to the communications network resource.
In an embodiment, the apparatus further comprises:
In an embodiment, the means for providing a clustering model comprise:
In an embodiment, the apparatus further comprises a first setting circuitry configured to set the cluster anomaly label associated with a cluster to encode an anomalous cluster in response to determining that a size of the cluster is lower than a threshold of size.
In an embodiment, the apparatus further comprises a second setting circuitry configured to set the threshold of size as a function of a size distribution of the set of clusters.
These and other aspects of the invention will be apparent from and elucidated with reference to example embodiments described hereinafter, by way of example, with reference to the drawings.
With reference to
The communications network resources 101, 102, . . . , 10j may comprise telecommunications network equipment such as Base Station Controllers, Base Station Control Functions, Base Station Transceiver Stations, Transceivers. The communications network resources 101, 102, . . . , 10j may comprise physical or logical entities, on hardware or software.
The remote infrastructure 12 may be on premise or may be deployed in a cloud. The remote infrastructure 12 may receive several telemetry data streams and may contribute to monitoring the communications network resources 101, 102, . . . , 10j. The remote infrastructure 12 comprises an anomaly detection module 20, which may be software embedded in the remote infrastructure 12.
The anomaly detection module 20 may be configured to detect different types of anomalies pertaining to the communications network resources 101, 102, . . . , 10j. Examples of anomalies may comprise a low Call Setup Success rate in a rural zone or a high Drop Call rate during off-peak times.
A number of communications network resources monitored by the remote infrastructure may range up to millions. The communications network resources 101, 102, . . . , 10j are associated with resource metadata 13, which comprise attributes relating to the physical features of the communications network resources and an environment of said communications network resources 101, 102, . . . , 10j.
The resource metadata 13 may include a resource type, a network slice type (e.g.: ultra-reliable low latency, enhanced mobile broadband, . . . ), a geographical area size category (e.g.: low, medium, high), a geographical area density type (e.g.: low, medium, high), a geographical area usage type (e.g.: industrial, commercial, residential, healthcare) or a local network topology.
The remote infrastructure 12 may have access to the resource metadata 13 of the communications network resources 101, 102, . . . , 10j.
The communications network resources 101, 102, . . . , 10j measure values of a plurality of key performance indicators at regular intervals. A reporting period between two measures may range from a minute to more than a day.
The streams of data 111, 112, . . . , 11j transmitted from the communications network resources 101, 102, . . . , 10j to the remote infrastructure 12 comprise the values of the plurality of key performance indicators.
The key performance indicators are variables linked to performance, for example linked to network quality or service quality. The plurality of key performance indicators may comprise a network capacity, a network usage rate, a data rate, a throughput rate, a Call Setup Success rate or a Drop Call rate. The key performance indicators may be univariate or multivariate.
According to an embodiment, the plurality of key performance indicators may be measured and transmitted simultaneously. The reporting period is common to the plurality of key performance indicators.
The remote infrastructure 12 receives new measurements of the plurality of key performance indicators for the plurality of communications network resources 101, 102, . . . , 10j. The remote infrastructure 12 also stores past measurements of the plurality of key performance indicators for the plurality of communications network resources 101, 102, . . . , 10j. Thus, the remote infrastructure 12 may build measurement time-series for the plurality of key performance indicators and the plurality of communications network resources 101, 102, . . . , 10j.
With reference to
A length of the measurement time-series 14 is determined by a sliding window parameter. The sliding window parameter may range from a few minutes to a few months. Small values of the sliding window parameter may be used to detect punctual anomalies. Larger values of the sliding window parameter may be used to detect long-term changes in behavior or new patterns.
The sliding window parameter may take several values over time. Different values of the sliding window parameter may also be used in parallel to detect different kinds of anomalies at the same time.
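As an illustration of how a sliding window parameter could delimit measurement time-series, the following minimal sketch cuts a list of values into fixed-length windows. The function name `sliding_windows` and the `step` parameter are hypothetical and are not part of the disclosure; the window length is expressed here as a number of samples rather than as a duration.

```python
def sliding_windows(values, window, step=1):
    """Return every contiguous extract of `window` samples, advancing by `step`.

    `window` plays the role of the sliding window parameter, expressed as a
    number of samples (an assumption made for this sketch).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(values) - window
    # range() is empty when the series is shorter than the window.
    return [values[i:i + window] for i in range(0, last_start + 1, step)]


# Two window lengths may be applied in parallel to the same series.
series = [1, 2, 3, 4, 5]
short_windows = sliding_windows(series, 2)
long_windows = sliding_windows(series, 4)
```

A short window yields many small extracts suited to punctual anomalies, whereas a long window yields fewer extracts that cover long-term changes.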
The anomaly detection module 20 may also extract derived temporal attributes from timestamps associated with values from the measurement time-series 14. The derived temporal attributes may be Boolean variables (e.g.: a variable encoding whether the timestamps are associated with specific days of the week, for example the weekend) or categorical variables (e.g.: a variable encoding the day of the week or a variable encoding the time of the day, which may be categorized in blocks of several hours each).
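The derivation of temporal attributes from a timestamp may be sketched as follows. This is only an illustrative example: the attribute names, the function `derive_temporal_attributes` and the default block length of six hours are assumptions, not values taken from the disclosure.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_temporal_attributes(ts: datetime, block_hours: int = 6) -> dict:
    """Derive one Boolean and two categorical attributes from a timestamp."""
    return {
        # Boolean attribute: Saturday (5) and Sunday (6) count as weekend.
        "is_weekend": ts.weekday() >= 5,
        # Categorical attribute: day of the week.
        "day_of_week": DAY_NAMES[ts.weekday()],
        # Categorical attribute: index of the block of several hours.
        "time_block": ts.hour // block_hours,
    }


attributes = derive_temporal_attributes(datetime(2022, 11, 5, 14, 30))
```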
The anomaly detection module 20 also has access to the resource metadata 13 associated with the communications network resources 101, 102, . . . , 10j. The anomaly label 30 of the measurement time-series 14 may depend on values of the resource metadata 13.
A clustering model is embedded within the anomaly detection module 20. The clustering model has been trained on a pre-existing dataset of training time-series. The training time-series are associated with the plurality of key performance indicators and the plurality of communications network resources 101, 102, . . . , 10j.
With reference to
A cluster of the set of clusters 80 is associated with a cluster anomaly label 31. The cluster anomaly label 31 encodes whether the partial training time-series displays an anomalous behavior.
The clustering model takes as input a measurement time-series, which is then compared to the set of clusters. An anomaly label is computed based on the cluster anomaly label 31 of clusters of the set of clusters 80 which meet a proximity condition.
The set of clusters 80 may depend on training resource metadata 15 and the anomaly detection module 20 may rely on a plurality of clustering models, with one of a plurality of clustering models being associated with a group of network resources.
With reference to
The anomaly detection module 20 comprises a multi-label classification unit 210, a preliminary classification unit 220 and a false-positive pruning unit 230.
The measurement time-series may be processed in parallel. An input measurement time-series 140 is received by the multi-label classification unit 210. The multi-label classification unit 210 computes a subset 141 of the set of clusters 80 that satisfies a proximity condition.
The multi-label classification unit 210 computes a representation of the input measurement time-series 140, comprising the derived temporal attributes and a centroid of values of the input measurement time-series 140. According to an embodiment, the centroid is a value associated with a timestamp placed at half the sliding window. The centroid may also be any kind of average or weighted average or a median value. The centroid may also be a specific value, such as the first value or last value acquired during the sliding window.
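The alternative definitions of the centroid listed above may be sketched as a single helper. The function name `representative_value` and the `mode` strings are hypothetical labels introduced for this example only.

```python
from statistics import mean, median


def representative_value(values, mode="middle"):
    """Return one representative value (centroid) of a measurement window."""
    if not values:
        raise ValueError("the measurement time-series is empty")
    if mode == "middle":
        # Value whose timestamp lies at half the sliding window.
        return values[len(values) // 2]
    if mode == "mean":
        return mean(values)
    if mode == "median":
        return median(values)
    if mode == "first":
        return values[0]
    if mode == "last":
        return values[-1]
    raise ValueError(f"unknown mode: {mode}")


window = [1.0, 5.0, 2.0, 8.0, 4.0]
centroid = representative_value(window, "median")
```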
The multi-label classification unit also computes centroids of values and derived temporal attributes within the clusters in the set of clusters 80. Centroids of the clusters are averages of the median values of the elements of the clusters, wherein the elements of the clusters are partial time-series.
A number of clusters is chosen based on a proximity criterion. According to an embodiment, the number of clusters is a hyperparameter k. The hyperparameter k may be chosen using any method in the state of the art such as the elbow method or the silhouette method. The hyperparameter k may also be chosen empirically, for example on the basis of geographic features or an area density. In the case of a deployment of the communications network resource in a rural setting, the hyperparameter k may be very low (for example two or four). In the case of a deployment of the communications network resource in an urban area, there might be a high number of clusters, for example dozens.
According to another embodiment, the number of clusters may be variable and depend on an absolute proximity criterion.
An ensemble of clusters 141 which meet a distance condition with the input measurement time-series 140 is selected. The distance condition may consist in selecting the k closest clusters in accordance with some metric. An algorithm such as the k-nearest neighbors algorithm can be used. A distance between the centroid of the input measurement time-series 140 and the centroids of the clusters can be computed as the metric.
A proximity of values is computed using a distance, for example the Euclidean distance or a Minkowski distance of order p, where p is a chosen integer.
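The selection of the k closest clusters may be sketched as follows, assuming that the centroid of the input measurement time-series and the centroids of the clusters are available as numeric tuples of equal length. The names `minkowski` and `k_closest_clusters` and the dictionary layout are assumptions made for this example.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def k_closest_clusters(z, cluster_centroids, k, p=2):
    """Return the identifiers of the k clusters whose centroids are nearest to z.

    `cluster_centroids` maps a cluster identifier to its centroid.
    """
    ranked = sorted(cluster_centroids.items(),
                    key=lambda item: minkowski(z, item[1], p))
    return [cluster_id for cluster_id, _ in ranked[:k]]


centroids = {"a": (1.0,), "b": (5.0,), "c": (0.5,)}
ensemble = k_closest_clusters((0.0,), centroids, k=2)
```

A brute-force sort is used here for readability; with many clusters, a spatial index could serve the same purpose.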
The k closest clusters in the ensemble of clusters 141 are each associated with a cluster anomaly label 31, which may be a positive cluster anomaly label or a negative cluster anomaly label. The ensemble of clusters 141 and the cluster anomaly labels are transmitted to the preliminary classification unit 220.
The preliminary classification unit 220 computes a primary anomaly label 142 based on the ensemble of clusters 141 and the cluster anomaly labels associated with the clusters in the ensemble of clusters 141.
Decision weights are associated with the clusters in the ensemble of clusters 141, according for example to the following equation, where z designates the centroid of the input measurement time-series 140, Ci designates the i-th cluster and where |Ci| designates a number of elements in the i-th cluster:
$$w_i = |C_i| \cdot \mathrm{similarity}(C_i, z)$$
The decision weights may be computed differently. According to an embodiment, the decision weights are the similarity between the clusters and the centroid of the input measurement time-series 140.
The similarity may be computed as follows, as a function of the distance:
similarity=1−distance
Thus, the decision weights are a function of the input measurement time-series 140 and are highest for large clusters with elements which are similar to the input measurement time-series 140.
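The computation of a decision weight from the cluster size and the distance may be sketched as follows. The sketch assumes that the distance has been normalized to the interval [0, 1], so that the similarity stays non-negative; this normalization is an assumption of the example and is not stated in the text.

```python
def decision_weight(cluster_size, distance):
    """Weight of a cluster: its size multiplied by its similarity to z.

    Assumes `distance` is normalized to [0, 1] (assumption of this sketch).
    """
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance is expected to be normalized to [0, 1]")
    similarity = 1.0 - distance
    return cluster_size * similarity


# A large, close cluster outweighs a small, distant one.
large_close = decision_weight(10, 0.25)
small_far = decision_weight(2, 0.75)
```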
A voting strategy is then implemented based on the decision weights and an aggregated abnormality score is computed. The voting strategy may rely on plurality voting, majority voting, dictatorship, or any voting strategy existing in the state of the art.
The voting strategy may depend on the decision weights. The clusters may be sorted by decreasing decision weights and the voting strategy may rely on a subset of the ensemble of clusters 141 with the highest decision weights (for example, the primary anomaly label may be the same as the cluster anomaly label of the cluster with the highest decision weight). A weighted majority voting may take place.
The preliminary classification unit 220 outputs the primary anomaly label 142 associated with the input measurement time-series 140. The primary anomaly label 142 may be positive if the aggregated abnormality score exceeds an abnormality threshold (which can be fixed empirically). The primary anomaly label may also be positive depending on a result of the voting strategy.
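One possible weighted majority voting may be sketched as follows. The aggregated abnormality score is taken here as the share of the total decision weight carried by clusters with a positive cluster anomaly label; this particular definition, the function name `primary_anomaly_label` and the default threshold of 0.5 are assumptions made for illustration.

```python
def primary_anomaly_label(weights, cluster_labels, abnormality_threshold=0.5):
    """Return (label, score) for one input measurement time-series.

    `weights` are the decision weights of the selected clusters and
    `cluster_labels` their cluster anomaly labels (True means anomalous).
    """
    total = sum(weights)
    if total <= 0:
        # No usable vote: fall back to a negative label.
        return False, 0.0
    anomalous_weight = sum(w for w, anomalous in zip(weights, cluster_labels)
                           if anomalous)
    score = anomalous_weight / total
    return score > abnormality_threshold, score


label, score = primary_anomaly_label([7.5, 2.0, 0.5], [False, True, True])
```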
The primary anomaly label 142 is then transmitted to the false-positive pruning unit 230. Although the multi-label classification unit 210 and the preliminary classification unit 220 process measurement time-series relating to each key performance indicator and each resource group independently, the false-positive pruning unit 230 carries out a collaborative process across the measurement time-series associated with a plurality of key performance indicators of a communications network resource. The false-positive pruning unit 230 computes a secondary anomaly label 144 for the measurement time-series.
The false-positive pruning unit 230 corrects erroneous labels by comparing the primary anomaly label 142 with other primary anomaly labels 143 relating to measurement time-series of other key performance indicators for the same communications network resource. An objective of this step is to correct primary anomaly labels which would be false positives.
According to an embodiment, a proportion of positive primary anomaly labels is computed. If the proportion of positive anomaly labels is below a false-positive threshold, the positive primary anomaly labels are deemed false positives. The false-positive pruning unit 230 may then output negative secondary anomaly labels 144.
The false-positive threshold may be fixed empirically and may be absolute or relative.
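The proportion-based pruning may be sketched as follows for one communications network resource, using a relative false-positive threshold. The function name `prune_false_positives`, the key performance indicator names and the default threshold of 0.2 are hypothetical.

```python
def prune_false_positives(primary_labels, false_positive_threshold=0.2):
    """Return secondary anomaly labels for the KPIs of one resource.

    `primary_labels` maps a KPI name to its primary anomaly label.
    If too small a proportion of KPIs is positive, all positives are
    deemed false positives and replaced by negative labels.
    """
    if not primary_labels:
        return {}
    proportion = sum(primary_labels.values()) / len(primary_labels)
    if proportion < false_positive_threshold:
        return {kpi: False for kpi in primary_labels}
    return dict(primary_labels)


# One positive label among ten KPIs: the proportion 0.1 is below 0.2.
labels = {f"kpi_{i}": False for i in range(9)}
labels["drop_call_rate"] = True
secondary = prune_false_positives(labels)
```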
According to an embodiment, primary anomaly labels relating to correlated key performance indicators may also be analyzed. If a first key performance indicator and a second key performance indicator have a causality link, and the first key performance indicator is associated with a positive primary anomaly label and the second key performance indicator is associated with a negative primary anomaly label, the false-positive pruning unit 230 associates a negative secondary anomaly label with the first key performance indicator.
According to an embodiment, the false-positive pruning unit 230 may also confirm a presence of an anomaly by checking measurement time-series of normally independent key performance indicators for new correlations.
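The rule on causally linked key performance indicators may be sketched as follows. The representation of causality links as ordered pairs and the function name `prune_with_causal_links` are assumptions made for this example; how the links are obtained is not addressed here.

```python
def prune_with_causal_links(primary_labels, causal_links):
    """Return secondary labels after applying the causality rule.

    `causal_links` is a list of (first_kpi, second_kpi) pairs. When the
    first KPI is positive and the second is negative, the first KPI
    receives a negative secondary anomaly label.
    """
    secondary = dict(primary_labels)
    for first, second in causal_links:
        first_positive = primary_labels.get(first) is True
        second_negative = primary_labels.get(second) is False
        if first_positive and second_negative:
            secondary[first] = False
    return secondary


primary = {"drop_call_rate": True, "throughput": False, "usage": True}
links = [("drop_call_rate", "throughput")]
secondary_labels = prune_with_causal_links(primary, links)
```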
With reference to
The model-building module 40 comprises a resource-grouping unit 401, a contextual clustering unit 402 and a categorization unit 403.
The resource-grouping unit 401 takes as input the training resource metadata 15 associated with a plurality of training network resources. The resource-grouping unit 401 outputs a set of resource groups 17. A resource group of the set of resource groups 17 comprises at least one communications network resource. The at least one communications network resource satisfies a similarity criterion, wherein the similarity criterion depends on the training resource metadata 15 of the communications network resources in the resource group.
According to an embodiment, an exact metadata matching is employed. This may be the case if there is a relatively limited number of metadata variables and if the metadata variables are all categorical. In this embodiment, a resource group contains communications network resources with identical values of the metadata variables.
According to an embodiment, an exact metadata matching with priority may be applied. A limited set of categorical variables is chosen. The set of resource groups 17 are built so that a resource group contains communications network resources with identical values of the limited set of categorical variables, despite having possibly different values for different variables.
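Exact metadata matching, with or without priority, may be sketched as a grouping by metadata signature. The function name `group_resources`, the `priority_keys` parameter and the metadata field names are hypothetical.

```python
from collections import defaultdict


def group_resources(metadata, priority_keys=None):
    """Group resources whose selected categorical metadata are identical.

    `metadata` maps a resource identifier to a dict of categorical values.
    With `priority_keys=None`, all metadata variables must match (exact
    matching); otherwise only the limited set of variables is compared.
    """
    groups = defaultdict(list)
    for resource_id, attributes in metadata.items():
        keys = priority_keys if priority_keys is not None else sorted(attributes)
        signature = tuple((key, attributes.get(key)) for key in keys)
        groups[signature].append(resource_id)
    return list(groups.values())


metadata = {
    "r1": {"density": "low", "usage": "residential"},
    "r2": {"density": "low", "usage": "industrial"},
    "r3": {"density": "low", "usage": "residential"},
}
exact_groups = group_resources(metadata)
priority_groups = group_resources(metadata, priority_keys=["density"])
```

With exact matching, the resource "r2" forms its own group; with priority given to the density variable only, the three resources fall into one group.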
According to an embodiment, the set of resource groups 17 may be computed using a multiclass classification algorithm (such as, for example, Naïve Bayes, Support Vector Machines, Random forest classifiers or k-nearest neighbors) or clustering algorithms (such as k-Means or affinity propagation). The clustering algorithms may rely on a similarity metric, computed as a function of a chosen distance, such as the Euclidean distance or a Minkowski distance.
Hence, the set of resource groups 17 comprises resources having similar or identical metadata vectors.
The set of resource groups 17 is transmitted to the contextual clustering unit 402. The contextual clustering unit 402 also receives the training time-series 16.
The training time-series 16 are a plurality of time-series extracted from the plurality of training network resources.
The contextual clustering unit 402 computes a clustering model for a resource group, as a function of a subset of the training time-series corresponding to said resource group. The contextual clustering unit 402 may comprise a plurality of clustering models corresponding to the plurality of key performance indicators. Hence, the training time-series 16 relate to one key performance indicator and to one resource group.
A plurality of clustering models may be computed sequentially or in parallel by the model-building module 40, for the plurality of resource groups and of key performance indicators.
The contextual clustering unit 402 splits the training time-series 16 into partial training time-series and clusters the partial training time-series into a set of clusters 80, wherein a cluster of the set of clusters is defined by the values of the partial training time-series but also the derived temporal attributes. The partial training time-series in one cluster all satisfy a similarity condition. Suitable similarity conditions are for example a maximal distance value, where a distance between partial training time-series can be computed as a Euclidean distance or a Minkowski distance.
The set of clusters 80 is then transmitted to the categorization unit 403. The categorization unit 403 outputs the cluster anomaly labels 31 associated with the set of clusters.
According to an embodiment, the cluster anomaly labels 31 may be computed based on an analysis of a size distribution of the clusters. A threshold may be fixed, so that below said threshold a cluster is considered anomalous and associated with a positive anomaly label.
The threshold may be an absolute size or a relative threshold compared to a size distribution. Indeed, small clusters may indicate outliers and therefore anomalies in a given network resource.
For example, a cluster of only one time-series indicates that there is a network resource with a behavior which differs from that of other network resources in the resource group, which would indicate an anomaly.
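Labelling clusters from their sizes may be sketched as follows, using a relative threshold expressed as a fraction of the total number of clustered elements. The function name `label_clusters_by_size` and the default fraction of 5 % are assumptions; an absolute size threshold could be substituted.

```python
def label_clusters_by_size(cluster_sizes, relative_threshold=0.05):
    """Return a cluster anomaly label per cluster (True means anomalous).

    A cluster is labelled anomalous when its size is lower than
    `relative_threshold` times the total number of elements.
    """
    total = sum(cluster_sizes.values())
    size_threshold = relative_threshold * total
    return {cluster_id: size < size_threshold
            for cluster_id, size in cluster_sizes.items()}


sizes = {"c0": 60, "c1": 38, "c2": 2}
cluster_labels = label_clusters_by_size(sizes)
```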
The model-building module 40 outputs clustering models used afterwards in the exploitation phase. However, the model-building module 40 may be further solicited afterwards to update the clustering models, in case of network changes, such as an integration of new communications network resources. The clustering models may be updated on a regular basis.
The clustering models may be updated automatically, for example when the anomaly detection module displays a drop in performance or fails to satisfy a chosen metric, which may be a success rate in reporting anomalies.
An update of the clustering models may also be automatically triggered when the new time-series measurements do not satisfy a similarity condition with the training time-series anymore. The similarity condition may be based on a similarity metric to compare time-series.
With reference to
The contextual clustering unit 402 comprises a formatting module 4021, a row-clustering module 4022 and a resource-clustering module 4023. The contextual clustering unit 402 receives the training time-series 16 relating to a key performance indicator and to a resource group.
The formatting module 4021 outputs tabular data 60, wherein a row corresponds to a timestamp. The row comprises the derived temporal attributes and values of the key performance indicator for the communications network resources in the resource group.
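The tabular layout produced by the formatting step may be sketched as follows, with one row per timestamp and one column per communications network resource. The function name `to_tabular`, the choice of derived temporal attributes and the use of `None` for a missing measurement are assumptions made for this example.

```python
from datetime import datetime
from typing import Dict, List, Optional


def to_tabular(series_by_resource: Dict[str, Dict[datetime, float]]) -> List[dict]:
    """Build one row per timestamp for a single KPI and resource group.

    Each row holds derived temporal attributes and the KPI value of every
    resource at that timestamp (None when the value is missing).
    """
    timestamps = sorted({ts for series in series_by_resource.values()
                         for ts in series})
    rows = []
    for ts in timestamps:
        row: Dict[str, Optional[object]] = {
            "timestamp": ts,
            "is_weekend": ts.weekday() >= 5,
            "hour": ts.hour,
        }
        for resource_id, series in series_by_resource.items():
            row[resource_id] = series.get(ts)
        rows.append(row)
    return rows


t1 = datetime(2022, 11, 5, 10)  # a Saturday
t2 = datetime(2022, 11, 7, 10)  # a Monday
table = to_tabular({"r1": {t1: 0.9, t2: 0.8}, "r2": {t1: 0.7}})
```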
The row-clustering module 4022 then clusters rows with similar values together into a set of intermediate clusters 70. An intermediate cluster contains rows (consecutive or not) put together in the same table.
The set of intermediate clusters 70 is transmitted to the resource-clustering module 4023, wherein the training network resources are grouped together into the set of clusters 80 based on a similarity of the values within an intermediate cluster.
Thus, a cluster of the set of clusters 80 comprises extracts of time-series for a key performance indicator for a plurality of training network resources, wherein the extracts of time series satisfy a similarity condition.
With reference to
The invention is not limited to the described example embodiments. The appended claims are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art, and which fairly fall within the basic teaching as set forth herein.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
Elements such as the apparatus and its components could be or include e.g. hardware means like e.g. an Application-Specific Integrated Circuit (ASIC), or a combination of hardware and software means, e.g. an ASIC and a Field-Programmable Gate Array (FPGA), or at least one microprocessor and at least one memory with software modules located therein, e.g. a programmed computer.
The use of the verb “to comprise” or “to include” and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. Furthermore, the use of the article “a” or “an” preceding an element or step does not exclude the presence of a plurality of such elements or steps. The example embodiments may be implemented by means of hardware as well as software. The same item of hardware may represent several “means”.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
22206456.0 | Nov 2022 | EP | regional |