This disclosure generally relates to signal processing and control applications, and more specifically to detecting anomalies by analyzing signals indicative of the operation of a machine performing a task.
Diagnosis and monitoring of a machine's operating performance are beneficial for a wide variety of applications. Physical machines comprise several components working in tandem to perform tasks. During operation, the machines and/or their components operate in some predefined manner, and certain observations regarding the operation of such machines are helpful in discerning whether the machines and/or their sub-parts are operating normally or anomalously. One example of such observations is acoustic or sound signals produced by the machines during operation. For example, moving parts of some electric machines produce acoustic signals of a certain type that are distinguishable from the acoustic signals produced when there is some immediate or upcoming fault indicating anomalous operation. In some scenarios, automated diagnosis of a machine may be performed to detect anomalous signals of the operation of the machine performing a task, based on deep learning techniques.
While deep learning-based techniques available in the art offer better insights into the analysis of anomalous signals, there are serious limitations to conventional approaches in this field of endeavor. To achieve the desired accuracy for anomaly detection, conventional solutions usually transform the audio signal into high-dimensional embeddings residing in a so-called high-dimensional space, though that space remains much smaller in dimension than the audio measurements themselves. Such a space is high-dimensional because the number of dimensions is still too high for humans to manipulate with any ease. The appropriate number of dimensions needed for optimal performance must balance competing constraints: it needs to be high enough to correctly capture the diversity of anomalous behaviors, but small enough to keep the computational burden of the method under control and to avoid capturing superfluous spurious patterns that are not actually indicative of anomalous behavior. A major drawback of existing anomaly detection systems based on audio/video representations of an operation of a machine is the complexity and richness of the features that must be analyzed to reach practical anomaly detection accuracy. In these available solutions, such a high-dimensional embedding may exceed the computational time and memory budgets of systems with limited computational power, such as embedded systems deployed in various factory automation settings.
Accordingly, there is a need to overcome the above-mentioned problems associated with detecting anomalous operation of machines through signal processing. More specifically, there is a need to develop methods and systems that are feasible and efficient for detecting signal snippets indicative of anomalous operation of the machine.
Various embodiments of the present disclosure disclose systems and methods for detecting anomalous operation of a machine performing a task. Additionally, it is an object of some embodiments to perform anomalous sound detection using deep learning techniques.
It is also an object of some embodiments to provide unsupervised anomalous sound detection by learning a model that can detect anomalies when only data in normal operating conditions is available for training the model parameters. A typical application is condition monitoring and diagnosis of machine sounds in applications such as predictive maintenance and factory automation.
Automated diagnosis of machines may be used to detect anomalous signals using training data that corresponds to normal operating conditions of the machine. Anomaly detection based on such training data is an unsupervised approach. For example, unsupervised anomalous sound detection may be suitable for detecting specific types of anomalies, such as transient disturbances or impulsive sounds that may be identified from abrupt temporal changes. Furthermore, these approaches can capture much more complex anomalous patterns in the data, patterns that might be too subtle or too complicated for human operators to detect, describe, and/or diagnose.
Some example embodiments are based on the realization that, in order to achieve the desired accuracy for anomaly detection, one approach may be to transform the audio signal into high-dimensional embeddings residing in a so-called high-dimensional space. Although such a space still remains much smaller in dimension than the audio measurements themselves, it is characterized as high-dimensional because the number of dimensions is still too high for humans to manipulate with any ease. It is also a realization of some example embodiments that the suitable number of dimensions needed for optimal performance must balance competing constraints: it needs to be high enough to correctly capture the diversity of anomalous behaviors, but small enough to keep the computational burden of the method under control and to avoid capturing superfluous spurious patterns that are not actually indicative of anomalous behavior. For example, an embedding network may embed the features extracted from a signal, including one or a combination of an audio signal and a video signal, into a space having 128 dimensions to capture the diversity of anomalies well and reach satisfactory performance. However, some example embodiments are based on the realization that such a high-dimensional embedding may exceed the computational time and memory budgets of systems with limited computational power, such as embedded systems deployed in various factory automation settings.
It is a realization of some example embodiments that unsupervised anomaly detection remains a complex and challenging task. Among these challenges, one major drawback of existing anomalous sound detection algorithms is that they fail when presented with domain shift. For example, in the context of sound detection, acoustic characteristics may change between the normal data collected for training and the normal data collected at inference time due to factors such as different background noise, different operating voltages, etc. This failure is typically caused by algorithms that are unable to distinguish between anomalous signal changes caused by an anomalous sound and normal signal changes caused by domain shift. It is also a realization of some example embodiments that one possible way of building methods resilient to these task complexities is to increase the dimensionality of the vector quantities manipulated inside the algorithm to detect anomalies. However, such an increase may impose additional computational time and memory requirements on systems with limited computational power, such as embedded systems deployed in various factory automation settings. Additionally, as the dimensionality increases, the risk of capturing spurious behaviors rather than representative behaviors also increases. Accordingly, some example embodiments are directed towards providing alternate measures for processing feature embeddings that incur fewer resources, can therefore be implemented on smaller and embedded devices, and also minimize spurious behaviors in anomaly detection tasks.
Some example embodiments are based on the realization that hyperbolic spaces demonstrate the ability to encode hierarchical relationships much more effectively than Euclidean space when using those embeddings for classification. A corollary of that property is that the distance of a given embedding from the hyperbolic space origin encodes a notion of classification certainty, naturally mapping inlier class samples to the space edges and outliers near the origin. As such, some example embodiments are based on the realization that the hyperbolic embeddings generated by a deep neural network pre-trained to classify short-time Fourier transform frames computed from normal machine sound are more distinctive than Euclidean embeddings when attempting to identify unseen anomalous data.
Some example embodiments are based on recognizing an improvement due to the hyperbolic space in the context of anomaly detection of the operation of machines. This new additional property gives the hyperbolic spaces an advantage over Euclidean spaces that has not been recognized before. Specifically, to carry information sufficient to detect anomalies with practical reliability, the encodings of a signal into hyperbolic space need to have fewer dimensions than the encodings of the same signal into Euclidean space. For example, in the context of sound anomaly detection, the 128-dimensional encoding of the sound signal in the Euclidean space carries the same information as the 2-dimensional encoding of the same signal in the hyperbolic spaces.
Some embodiments discover this property empirically in the context of sound anomaly detection. As such, it was initially unclear whether this new property is a property of hyperbolic spaces or a property of acoustic signals. However, additional experiments justify the conclusion that this is a property of the hyperbolic space rather than of the audio signal. Hence, additionally or alternatively to anomalous sound detection using hyperbolic embeddings, some embodiments use hyperbolic embeddings of different kinds of signals indicative of the operation of the machine to detect the anomaly.
Some example embodiments are thus directed towards providing systems, methods, and programs for performing unsupervised anomaly detection using embeddings from a trained neural network architecture with a hyperbolic embedding layer, using the embeddings generated from a test sample to generate an anomaly score.
In order to achieve the aforesaid objectives and advancements, some example embodiments provide systems, methods, and computer program products for anomaly detection of a machine operating to perform a task.
Some example embodiments provide an anomaly detection system for detecting an anomaly of an operation of a machine based on a signal indicative of the operation of the machine performing a task. The system comprises at least one processor and memory having instructions stored thereon. The instructions, when executed by the at least one processor, cause the anomaly detection system to collect hyperbolic embeddings of the signal indicative of the operation of the machine. The hyperbolic embeddings lie in a hyperbolic space defined by a model for n-dimensional hyperbolic geometry in which the points of the hyperbolic geometry lie in an n-dimensional open subset of the real coordinate space of dimension n. The at least one processor is also configured to perform the detection of the anomaly of the operation of the machine based on the hyperbolic embeddings to determine an anomaly score and to render the anomaly score.
In yet some other example embodiments, a computer-implemented method for detecting an anomaly of an operation of a machine based on a signal indicative of the operation of the machine performing a task is provided. The method comprises collecting hyperbolic embeddings of the signal indicative of the operation of the machine. The hyperbolic embeddings lie in a hyperbolic space defined by a model for n-dimensional hyperbolic geometry in which the points of the hyperbolic geometry lie in an n-dimensional open subset of the real coordinate space of dimension n. The method further comprises performing the detection of the anomaly of the operation of the machine based on the hyperbolic embeddings to determine an anomaly score and rendering the anomaly score.
In yet some other example embodiments, a non-transitory computer-readable medium having stored thereon computer-executable instructions for performing the method for detecting an anomaly of an operation of a machine is provided.
According to some example embodiments, measurements of the signal, or features extracted from the measurements, are processed with an embedding neural network to produce a Euclidean embedding of the signal into Euclidean space. The Euclidean embedding is then projected into the hyperbolic space to produce the hyperbolic embeddings. The hyperbolic embeddings are processed by a classifying neural network to produce at least a portion of the anomaly score.
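By way of a non-limiting illustration, the following minimal sketch traces this pipeline in Python, assuming the geoopt library for the Poincaré ball; the stand-in network, layer sizes, and 2-dimensional embedding are illustrative assumptions rather than the disclosed architecture.

```python
# Minimal sketch of the pipeline: Euclidean embedding -> projection into the
# Poincare ball -> geodesic distance for scoring. Assumes geoopt; shapes are
# illustrative placeholders.
import torch
import torch.nn as nn
import geoopt

ball = geoopt.PoincareBall(c=1.0)        # unit-curvature Poincare ball model

embed_net = nn.Sequential(               # stand-in embedding neural network
    nn.Linear(1025, 256), nn.ReLU(),     # e.g., one magnitude-STFT frame in
    nn.Linear(256, 2),                   # low-dimensional Euclidean embedding
)

frames = torch.randn(8, 1025)            # 8 illustrative signal frames
euclidean = embed_net(frames)            # Euclidean embeddings, shape (8, 2)
hyperbolic = ball.expmap0(euclidean)     # projection into hyperbolic space
dist_to_origin = ball.dist0(hyperbolic)  # geodesic distance used for scoring
```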
In some example embodiments, the classifying neural network is trained with training data generated from non-anomalous operations of the machine. The non-anomalous operations of the machine are defined relative to standard operating performance data of the machine. For example, the standard operating performance data of the machine may define values of one or more operational parameters of the machine. According to some example embodiments, the embedding neural network is jointly trained with the classifying neural network such that weights of the embedding and classifying neural networks are interdependent on each other.
According to some example embodiments, the anomaly score is a function of a combination of a distance between the hyperbolic embeddings and an origin of the hyperbolic space and a probability of correct classification returned by the classifying neural network. A control system operatively connected to the anomaly detection system may control the operation of the machine based on the rendered anomaly score.
The presently disclosed embodiments will be further explained with reference to the following drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings may indicate like elements.
Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
Automatically detecting faulty equipment, i.e., anomaly detection, is an essential task in modern industrial society. Anomaly detection in the context of a machine performing a task may be considered the detection of observations whose attributes and inferences differ from those obtained when the machine performs normal operations. While it is difficult to predict exactly when a fault or anomaly will occur, there have been several attempts at automating the task of detecting an anomaly that has occurred using measurements pertaining to the machine. Signals indicative of the operation of the machine can be of various types, such as audio signals, video signals, vibration, and the like. The choice of signal for detecting anomalous operation varies based on the type of the machine, the environment in which it is operated, and the measurement mechanisms available for acquiring such signals. In some scenarios, anomalous sound detection is advantageous because it can be adapted to a vast variety of machines and operations.
With regard to physical machines, such as those that comprise moving parts, sound or acoustic signals are good indicators of anomalous operation. Also, performing anomaly detection from sound, i.e., anomalous sound detection, is appealing due to factors such as sensor cost and the ability to measure signals without line of sight. However, conventionally training neural networks to predict anomalies requires anomalous samples. First, collecting such samples is difficult since anomalies do not occur when desired and are often rare. Second, deliberately inducing a fault in the operation of machines to cause anomalous operation is not practically feasible due to the potential damage such faults cause to the machines. Furthermore, anomalies are governed by several operating parameters of the machine, and as such, generating a rich set of training data that caters to all types of anomalies would be a long and often impossible task. Therefore, audio or not, practical anomaly detection design is hampered by the difficulty of collecting anomalous samples, which, beyond the cost of labeling, is further affected by issues such as the rare occurrence of anomalies or the cost associated with deliberately provoking them. As such, unsupervised approaches are of particular interest in the field.
For example, unsupervised anomalous sound detection techniques are useful when only data in normal operating conditions (i.e., non-anomalous operation of the machine) is available for training the model parameters. Typical approaches for unsupervised anomalous sound detection include those based on autoencoder-like architectures, where a model trained only on normal data to reconstruct its input should exhibit a large reconstruction error when presented with an anomalous example at inference time. Another class of approaches, referred to as surrogate task models, uses an alternative supervised training task to learn a model of normality, and then measures deviations from normal to predict anomalies. Example surrogate tasks include: (1) outlier exposure, where sounds that are known to be quite different from the machine of interest are used as synthetic anomalies, (2) predicting/classifying metadata (e.g., machine instance) or attributes (e.g., operating load), or (3) learning to predict what augmentations (e.g., time-stretching or pitch-shifting) are applied to an audio clip.
Predicting or classifying metadata (e.g., machine instance) or attributes (e.g., operating load) of an audio clip involves learning to correctly classify these metadata or attributes for normal machine sounds by learning a representation of them through the distribution of learned embeddings (i.e., output vectors from the last hidden layer). An anomaly detector is built on top of that “normal” embedding distribution learned from solely normal machine sounds, using the relative position of an unseen sample's embedding with respect to the “normal” embedding distribution to determine the likely condition, normal or anomalous, of that sample. For example, the distance of the embedding to its K-nearest inlier neighbors in the embedding space can be used as a criterion, setting a distance threshold above which a sample is deemed anomalous.
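For concreteness, a minimal sketch of such a K-nearest-neighbor criterion is shown below, assuming scikit-learn; the embeddings, neighbor count, and threshold are illustrative placeholders.

```python
# Sketch of a K-nearest-neighbor anomaly criterion over learned embeddings.
# Assumes scikit-learn; the data and threshold are illustrative placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
normal_embeddings = rng.normal(size=(1000, 128))  # embeddings of normal sounds
test_embedding = rng.normal(size=(1, 128))        # embedding of an unseen sample

knn = NearestNeighbors(n_neighbors=5).fit(normal_embeddings)
distances, _ = knn.kneighbors(test_embedding)     # distances to 5 nearest inliers
score = float(distances.mean())                   # larger -> farther from "normal"

THRESHOLD = 10.0                                  # tuned on validation data
is_anomalous = score > THRESHOLD
```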
Some example embodiments realize that hyperbolic spaces demonstrate the ability to encode hierarchical relationships much more effectively than Euclidean space when using those embeddings for classification. Some example embodiments are based on recognizing a new property of the hyperbolic space relevant to the context of anomaly detection of the operation of machines. This new additional property gives the hyperbolic spaces an advantage over Euclidean spaces that has not been recognized before. Specifically, to carry information sufficient to detect anomalies with practical reliability, the encodings of a signal into hyperbolic space need to have fewer dimensions than the encodings of the same signal into Euclidean space. For example, in the context of sound anomaly detection, the 128-dimensional encoding of the sound signal in the Euclidean space carries the same information as the 2-dimensional encoding of the same signal in the hyperbolic spaces.
Accordingly, some example embodiments train and analyze embeddings as vectors in hyperbolic space rather than the typical vectors in Euclidean space. A hyperbolic neural network approach is practically appealing because hyperbolic spaces can embed tree structures with arbitrarily low distortion, as volume grows exponentially as a function of the distance to the space origin. Shallower tree nodes are then positioned closer to the origin, and deeper nodes farther away. As such, hyperbolic space may more naturally accommodate hierarchical aspects of many audio tasks and datasets. Example embodiments described herein leverage such aspects, including through the use of hyperbolic neural networks. A particularly appealing aspect in the context of surrogate-task anomaly detection methods is the corollary behavior of embeddings in hyperbolic space whereby, as information gets organized hierarchically in the space, the distance of an embedding to the origin expresses a notion of certainty regarding the characteristics of the input. Some example embodiments explore the benefits of replacing the Euclidean space with a hyperbolic space for learning embeddings in a surrogate-task-based method, which makes for a simple and effective detection method.
The detected anomaly is reliable to a good extent given that the anomaly is quantified as a score, which in turn is aggregated from two strong parameters indicative of the underlying anomaly: first, the probability of a signal belonging to a certain type, and second, the geodesic distance of embeddings of the signal from an origin of the hyperbolic space. As such, the computed anomaly score is reliable enough to engineer control solutions for maintenance and/or some control actions in view of the detected anomaly. Accordingly, some example embodiments provide measures for controlling one or more aspects of the machine for which an anomaly is detected. These and several other advantages and improvements are substantiated through the description that follows.
The anomaly detection system 100 comprises an embedding neural network 20 that produces a Euclidean embedding corresponding to each segment of the input signal 102. In this regard, when the input signal 102 is an acoustic signal, the neural network 20 maps the values of the time-frequency bins into high-dimensional embeddings (i.e., in Euclidean space). For example, when the input signal 102 is an acoustic signal, the embedding neural network 20 generates an embedding, as a vector of numbers, for each frame of sound, where a frame typically has a fixed time duration. For Euclidean embeddings in 2 dimensions, for example, the network 20 outputs two numbers for two coordinates (say x and y) in 2-D. For Euclidean embeddings with more dimensions, the network 20 outputs as many numbers as there are dimensions, for each frame. There typically is no straightforward intrinsic geometric property to exploit in Euclidean space. For example, the origin of the space (x=0 and y=0) has no particular meaning; it can be moved without impacting the system. Accordingly, the anomaly detection system 100 comprises a Euclidean-to-hyperbolic converter 25 that converts the Euclidean embeddings to hyperbolic embeddings. In this regard, the converter 25 projects each of the Euclidean embeddings into the hyperbolic space. The advantage of using hyperbolic embeddings over Euclidean embeddings stems from the fact that hyperbolic embeddings require fewer resources and less time for processing: in the context of sound anomaly detection, the 128-dimensional encoding of the sound signal in the Euclidean space carries the same information as the 2-dimensional encoding of the same signal in the hyperbolic space.
The hyperbolic embeddings are provided to a classifying neural network 30 of the anomaly detection system 100. According to some example embodiments, the classifying neural network 30 is a linear classifier in hyperbolic space that learns to organize the embeddings in such a way that the ancillary aspects of non-anomalous operation of the machine can be identified through some geometric criterion based on their respective locations. By the nature of the hyperbolic space's geometry, “linear” boundaries (i.e., hyperplanes) are not lines but arcs in 2-D, and they are not “flat” but bent in higher dimensions, when considered from the perspective of a Euclidean space. According to some example embodiments, the classifying neural network 30 is a deep neural network trained to classify hyperbolic embeddings as belonging to a certain attribute type. In this regard, for each hyperbolic embedding, the classifying neural network 30 computes a probability that the corresponding segment of the input signal 102 belongs to a certain attribute type. In some example embodiments, the attribute types may be defined in relation to the machine for which anomaly detection is being performed. For example, the attribute type may indicate an identifier such as a model number of the machine, a type identifier of the machine, a running speed, a running mode, and the like. The goal of the classifying neural network 30 is to find geometric boundaries between the different types in hyperbolic space, and it is trained beforehand in such a manner that it learns boundaries for which a certain attribute type of the machine is detected correctly as often as possible. The classifying neural network 30 outputs the probabilities that a particular signal segment of the input signal 102 belongs to the different known types of attributes, depending on the position of the corresponding embeddings with respect to the trained type boundaries.
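One known way to realize such a classifier is the hyperbolic multinomial logistic regression of Ganea et al. (2018); the following sketch computes its logits on a unit-curvature Poincaré ball. The per-class offsets p[k] and normals a[k], as well as all shapes, are assumptions of this illustration rather than the disclosed parameterization.

```python
# Sketch of hyperbolic multinomial logistic regression logits on the
# unit-curvature Poincare ball (one "arc" boundary per attribute type),
# following the formulation of Ganea et al., 2018; shapes are illustrative.
import torch

def mobius_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mobius addition, the hyperbolic analogue of vector addition."""
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2).clamp_min(1e-15)

def hyperbolic_logits(z: torch.Tensor, p: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Logits for embeddings z, given per-class boundary offsets p[k] (points
    in the ball) and normals a[k] (tangent vectors)."""
    logits = []
    for k in range(p.shape[0]):
        m = mobius_add(-p[k], z)                          # recenter boundary at origin
        lam = 2.0 / (1.0 - (p[k] * p[k]).sum()).clamp_min(1e-15)
        an = a[k].norm().clamp_min(1e-15)
        num = 2.0 * (m * a[k]).sum(-1)
        den = (1.0 - (m * m).sum(-1)).clamp_min(1e-15) * an
        logits.append(lam * an * torch.asinh(num / den))
    return torch.stack(logits, dim=-1)                    # (batch, num_classes)

z = 0.4 * torch.randn(8, 2).tanh()                        # embeddings inside the ball
p = 0.4 * torch.randn(6, 2).tanh()                        # 6 attribute-type boundaries
a = torch.randn(6, 2)
probs = hyperbolic_logits(z, p, a).softmax(dim=-1)        # classification probabilities
```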
The anomaly detection system 100 also comprises an anomaly scoring system 40 for quantifying the extent of the anomaly detected in the input signal 102. In this regard, the anomaly scoring system 40 computes an anomaly score 104 for each segment of the input signal 102 (or each hyperbolic embedding) and outputs an anomaly score for the input signal 102 based on an aggregation of the anomaly scores of the individual embeddings. According to some example embodiments, the anomaly scoring is performed based on the probabilities output by the classifying neural network 30. According to some example embodiments, those probabilities and/or some intrinsic geometric aspect of the location of an embedding in the embedding space are aggregated to generate an anomaly score 104 for each hyperbolic embedding. The higher the score, the more likely the signal segment contains anomalous data. The anomaly detection system 100 outputs the anomaly score 104 as a measure of the underlying anomaly in the operation of the machine.
According to some example embodiments, the anomaly score 104 may be output for each segment of the input signal 102. The anomaly score 104 may be rendered to a display device through an output port of the anomaly detection system or to a control system 50 of the machine for further processing. The control system 50 may be internal or external to the anomaly detection system 100. The control system 50 compares the anomaly score 104 with a threshold to initiate some control actions. The threshold may be defined in relation to the operating conditions of the machine, the type of the machine, or some other suitable parameter to deem the operation of the machine anomalous or non-anomalous. In an event where the anomaly score 104 is greater than or equal to the threshold, the control system 50 may output one or more control commands 106 to control the operation of the machine. The control commands 106 may include firing an alert or alarm, changing one or more operational parameters of the machine, initiating a maintenance action for the machine, or a combination thereof.
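A minimal sketch of this threshold comparison follows; the command names and the threshold value are illustrative placeholders, not the disclosed command set.

```python
# Sketch of the control system's threshold logic; command names and the
# threshold value are illustrative placeholders.
def control_commands(anomaly_score: float, threshold: float = 0.5) -> list:
    """Return control commands when the score meets or exceeds the threshold."""
    if anomaly_score >= threshold:
        return ["fire_alert", "change_operational_parameters",
                "initiate_maintenance"]
    return []  # non-anomalous operation: no control action issued
```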
In this way, the anomaly detection system 100 is able to detect an anomaly in the operation of the machine without incurring significant resources and time for computation. Also, since the neural networks 20 and 30 are trained only with training data corresponding to normal operations of the machine, the anomaly detection system 100 is able to categorize any data that deviates from the normal operation of the machine as anomalous.
The classifying neural network 230 is trained in an unsupervised manner to classify some ancillary aspect of the training data (e.g., type of machine, model number, running speed, running mode). The classifying neural network 230 is also trained to learn to organize the hyperbolic embeddings 222 in such a way that those ancillary aspects can be identified through some geometric criterion based on their respective locations in the hyperbolic space. As such, the classification is learned from embeddings in hyperbolic geometry. The embedding neural network 220 generates an embedding as a vector of numbers for each segment of an input training signal from the training data 202A . . . 202N. The embedding is fed to the classifying neural network 230, which finds geometric boundaries 232 between the different types of the sources. These different types are defined through various attributes, each indicative of one or more properties of each of the signal sources 1, 2, . . . N.
The geometric boundaries may allow an interpretation of hyperbolic embeddings as classification probabilities that an ancillary aspect of the training data samples associated with these embeddings is of a specific kind (or value). Geometric boundaries may be determined by finding boundaries that minimize, as best as possible, a loss or cost function. The cost function may correspond to a cross-entropy loss between, on the one hand, the classification probabilities interpreted from the position of the embeddings with respect to the geometric boundaries and, on the other hand, the true kind (or value) of the ancillary aspect of the training data samples associated with those embeddings. Finding the geometric boundaries that minimize a loss function may be done using a stochastic gradient optimization algorithm. The stochastic gradient optimization algorithm may be an adaptive optimization algorithm. The adaptive optimization algorithm may be a Riemannian Adam optimization algorithm, i.e., an algorithm generalizing the Adam optimization algorithm to Riemannian manifolds, a class of spaces that includes hyperbolic spaces.
A variety of examples are presented in the training process, to optimize both the embedding and classifier networks such that the classifier 230 learns boundaries for which the type of signal source is detected correctly as often as possible. In some example embodiments, the output of the classifying network 230 may be checked with a validation set. According to some example embodiments, the embedding neural network is jointly trained with the classifying neural network such that weights of the embedding and classifying neural networks are interdependent on each other.
One example training process and configuration is explained with respect to unsupervised anomalous sound detection where the input signal is an acoustic signal provided as a file. Each input signal file is processed using an STFT with a 1024-sample Hann window and a 256-sample hop size, resulting in 313 frames from which the magnitudes are taken. For each epoch, the network is trained with a block of 32 consecutive STFT frames from each file, i.e., 6000 blocks of size 32×1025, selected randomly for each file. These blocks are grouped in batches of size 32. At testing, the input file's magnitude STFT is broken into overlapping blocks of 32 frames with a step size of 1 frame, and the logits gathered for each of the blocks serve as the basis for the score of that input file.
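A sketch of this feature extraction is given below. It assumes PyTorch, a 2048-point FFT (which would yield the stated 1025 frequency bins), and a clip length of 5 seconds at 16 kHz (which would yield the stated 313 frames); these assumptions are for illustration only.

```python
# Sketch of the training-time feature extraction: magnitude STFT with a
# 1024-sample Hann window and 256-sample hop, then random 32-frame blocks.
# The 2048-point FFT and 5 s / 16 kHz clip length are assumptions chosen to
# reproduce the 1025 bins and 313 frames stated above.
import torch

def stft_magnitude(signal: torch.Tensor) -> torch.Tensor:
    spec = torch.stft(signal, n_fft=2048, hop_length=256, win_length=1024,
                      window=torch.hann_window(1024), return_complex=True)
    return spec.abs().T                       # (num_frames, 1025)

def random_block(mag: torch.Tensor, block: int = 32) -> torch.Tensor:
    start = torch.randint(0, mag.shape[0] - block + 1, (1,)).item()
    return mag[start:start + block]           # (32, 1025), one block per file/epoch

mag = stft_magnitude(torch.randn(16000 * 5))  # 313 frames under these assumptions
training_block = random_block(mag)
```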
Several operational and structural aspects of the anomaly detection system will now be described considering the input signal to be an acoustic signal. However, it may be contemplated that the description of such aspects is non-limiting and can be extended to other types of input signals without deviating from the scope of this disclosure.
An example run-time configuration of the anomaly detection system of
As baselines, a system is trained using a regular multinomial regression classifier block (“Euclidean”), i.e., we have an identity mapping and the logit layer which functions as the anomaly scoring system 40 of
Some examples of the classifier backbone 404 and the classifier block 410 of
For example, the signal 252 may be an acoustic signal from a machine. In this embodiment, the classifier backbone i.e., the embedding neural network 220 of
The hidden data sample output from the layer 504 may be further processed by a 2-dimensional depthwise convolutional layer 506, which may output another hidden data sample. The depthwise convolutional layer 506 may comprise a succession of three operations, which may be a depthwise convolution operation, a batch normalization operation, and a pointwise parametric rectified linear operation.
The hidden data sample output by the layer 506 may be further processed by a succession of N bottleneck layers 508. The input of the first bottleneck layer 508A may be the second hidden data sample. For each of the nth bottleneck layers (i.e., from the one succeeding 508A to 508N), the hidden data sample input to that layer operation may be the hidden data sample output from the previous ((n−1)th) bottleneck layer operation. The hidden data sample output by the Nth bottleneck layer 508N may be further processed by a 2-dimensional convolutional layer 510, which may output another hidden data sample. The layer 510 may comprise a succession of three operations, which may be a convolution operation, a batch normalization operation, and a pointwise parametric rectified linear operation.
The hidden data sample output from the layer 510 may be further processed by a 2-dimensional global depthwise convolutional layer 512 which outputs another hidden data sample. The depthwise convolutional layer 512 may comprise a succession of two operations, which may be a depthwise convolution operation and a batch normalization operation.
The hidden data sample output from the layer 512 may be further processed by a linear layer 514. The linear layer 514 may output an embedding data sample 516. The linear layer 514 may comprise a succession of two operations, which may be a linear operation and a batch normalization operation.
Each convolution and depthwise convolution in the sequence of aforementioned operations may be characterized by a set of parameters. Those parameters may be a pair of integer numbers corresponding to the span of a 2-dimensional convolution operation kernel, an integer number corresponding to a stride of the convolution operation, and an integer number corresponding to a number of channels. The aforementioned linear operation may further be characterized by a set of parameters. Those parameters may be a single integer number corresponding to a number of channels (which in some embodiments may be called an embedding dimension).
Each bottleneck layer operation may process an input (hidden) data sample 522 with a succession of operations. Each of the bottleneck layers 508 may be characterized by a set of parameters. Those parameters may be an integer number corresponding to a bottleneck stride, an integer number corresponding to a bottleneck number of output channels, an integer number corresponding to a bottleneck expansion factor. The input data sample 522 may be processed by a 2-dimensional convolutional layer 524 which may output an internal hidden data sample. The convolutional layer 524 may comprise a succession of three operations, which may be a convolution operation, a batch normalization operation and a pointwise parametric rectified linear operation.
The internal hidden data sample output by the layer 524 may be further processed by a 2-dimensional depthwise convolutional layer 526 which may output another internal hidden data sample. The depthwise convolutional layer 526 may also comprise a succession of three operations, which may be a depthwise convolution operation, a batch normalization operation and a pointwise parametric rectified linear operation. This internal hidden data sample may be further processed by a 2-dimensional convolutional layer 528 which may output another internal hidden data sample. The convolutional layer 528 may comprise a succession of two operations, which may be a convolution operation and a batch normalization.
In some embodiments, the aforementioned 2-dimensional depthwise convolutional layer operation may be set with a stride of 1 or 2, depending on the preferred settings of the bottleneck layer operation. If the stride is set to 1 and the input hidden data sample and the last internal hidden data sample have the same number of channels, this last internal hidden data sample may be added to the input hidden data sample of the bottleneck layer operation to form another internal hidden data sample. The last internal hidden data sample that is computed may be an output data sample 530 of the bottleneck layer operation. Each convolution and depthwise convolution operation may have a kernel size of span 1×1. The first convolution layer 524 may output an internal hidden data sample whose dimension may be the dimension of the input hidden data sample of the block multiplied by a bottleneck expansion factor. The depthwise convolution layer 526 may take as input an internal hidden data sample whose dimension may be the dimension of the input hidden data sample of the bottleneck block multiplied by a bottleneck expansion factor and output an internal hidden data sample of the same dimension. The depthwise convolution operation may further have a stride corresponding to a bottleneck stride. The second convolution operation may take as input an internal hidden data sample whose dimension may be the dimension of the input hidden data sample of the bottleneck block multiplied by a bottleneck expansion factor and output an internal hidden data sample whose dimension may be a bottleneck number of output channels. Each aforementioned convolution operation, depthwise convolution operation, and linear operation may omit a biasing component. The hidden data samples as well as the embedding data sample may be expressed as coordinates in a Euclidean space.
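For illustration, a sketch of one such bottleneck layer in PyTorch follows; it uses a MobileNetV2-style inverted residual with a 3×3 depthwise kernel for concreteness (the description above also contemplates 1×1 kernels), and all channel counts are placeholders.

```python
# Sketch of a bottleneck (inverted-residual) layer: 1x1 expansion convolution,
# depthwise convolution with stride, 1x1 projection, and a skip connection
# when stride is 1 and channel counts match. Shapes are placeholders.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int, expand: int):
        super().__init__()
        mid = in_ch * expand                        # bottleneck expansion factor
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),   # expansion conv (no bias)
            nn.BatchNorm2d(mid), nn.PReLU(mid),     # batch norm + parametric ReLU
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),      # depthwise conv with stride
            nn.BatchNorm2d(mid), nn.PReLU(mid),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # projection conv
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.block(x)
        return x + y if self.use_residual else y   # skip when shapes allow
```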
According to some example embodiments, the parameters defined with reference to
In some embodiments, the mapping function operation 554 may correspond to a mapping between coordinates in a Euclidean space and coordinates in a hyperbolic space. In some other embodiments, the mapping between coordinates in a Euclidean space and coordinates in a hyperbolic space may be approximated by a mapping between coordinates in a Euclidean space and coordinates in a Poincaré ball space. In this context, the output embedding data sample from the mapping function operation 554 may be an embedding data sample expressed as coordinates in hyperbolic space, or Poincaré ball space as appropriate. In some embodiments, the logit function operation 558 may correspond to a hyperbolic multilinear regression function operation.
A hyperbolic space of dimension n is the unique simply connected, n-dimensional Riemannian manifold of constant negative sectional curvature. There are many ways to construct a hyperbolic space as an open subset of $\mathbb{R}^n$ (the real coordinate space of dimension n) with an explicitly written Riemannian metric; such constructions are referred to as models. Some example embodiments define a hyperbolic space as the unique simply connected, n-dimensional complete Riemannian manifold with a constant negative sectional curvature equal to −c. Thus, the hyperbolic spaces are constructed as models for n-dimensional hyperbolic geometry in which the points of the hyperbolic geometry lie in an n-dimensional open subset of the real coordinate space of dimension n. Examples of hyperbolic geometry models include the Poincaré disk and the Poincaré ball. While some concepts described herein are presented in the context of a Poincaré ball, it may be contemplated that example embodiments described herein are not limited to any specific model of hyperbolic geometry nor to any value of the constant negative curvature.
Riemannian geometry generalizes Euclidean geometry, by which an n-D Riemannian manifold is any pair of an n-D differentiable manifold and a so-called metric tensor field. In the context of that theory, a Euclidean manifold is simply a differentiable manifold whose metric tensor field is the identity everywhere. On the other hand, a hyperbolic manifold is any Riemannian manifold with negative constant sectional curvature. Interestingly, even though hyperbolic spaces are not vector spaces in the traditional sense, finding hyperbolic-space equivalents of many typical vector operations found in deep learning is a task of interest. Due to the challenges of embedding a hyperbolic space isometrically into Euclidean space, models of hyperbolic geometry should be used by which a subset of Euclidean space is endowed with a hyperbolic metric. One common model is the n-D Poincaré ball with unit curvature, defined on the manifold of the n-D unit ball $\mathbb{B}^n = \{x \in \mathbb{R}^n : \lVert x \rVert < 1\}$ endowed with the metric tensor field $g^{\mathbb{B}}_x = \lambda_x^2 I_n$, where $\lambda_x = 2/(1 - \lVert x \rVert^2)$ and $I_n$ is the identity. Other hyperbolic spaces with non-unit curvature may also be considered.
A hyperbolic hyperplane $H_p(z)$, passing through a point $p \in \mathbb{B}^n$ with normal direction $z$, may be defined as the set of points $x \in \mathbb{B}^n$ whose logarithmic map at $p$ is orthogonal to $z$. From the Euclidean perspective, such hyperplanes appear in 2-D as circular arcs meeting the ball boundary at right angles, and they serve as classification boundaries in the Poincaré ball.
The “exponential map” may be defined as a mapping from the Euclidean space $\mathbb{R}^n$ to the Poincaré ball, associating a vector $u \in \mathbb{R}^n$ with the point reached from $0 \in \mathbb{B}^n$ in unit time following the geodesic with initial tangent vector $u$, i.e., $\exp_0(u) = \tanh(\lVert u \rVert)\, u/\lVert u \rVert$. The reverse map is the “logarithmic map”, expressed as $\log_0(x) = \operatorname{artanh}(\lVert x \rVert)\, x/\lVert x \rVert$.
A common practice in deep learning approaches for classification tasks is to designate the vectors generated by some hidden layer such as the deepest hidden layer, as embeddings. The embeddings obtained from neural networks trained for general classification have often been found to be useful alternative representations of the input data, where it is expected that the embeddings' distribution starts encoding high-level characteristics of the data as they become better representations for classification. As such, they are often leveraged for downstream tasks different from the original classification task.
Hyperbolic spaces possess geometric properties that make them specifically appealing in that context. In particular, their volume grows exponentially with the distance from the origin, unlike in Euclidean space where volume grows polynomially. As a result, it is possible to embed tree structures in a hyperbolic space with arbitrarily low distortion. At the same time, the geodesic distance between two points behaves similarly to the distance between two nodes in a tree. As such, hierarchical characteristics may be expected to be effectively encoded in that space. Concurrently, high-level aspects of many typical datasets may be expected to exhibit natural hierarchies. Hence, some example embodiments realize the benefit, in many applications, of mapping the embeddings generated by a deep neural network to a hyperbolic space before performing the geometric equivalent of a multinomial regression in that space using hyperplanes such as the one shown in
According to some example embodiments, each hyperbolic embedding corresponds to a vector indicative of a unique attribute type associated with the machine. The unique attribute type may be an identifier of the machine, a model, make, year of manufacture, class of the machine, a section identifier of the machine and the like.
With reference to the configurations above, for an input block $x^{(i)} \in \mathbb{R}_+^{32 \times 1025}$ generating an output logit vector $y^{(i)} \in \mathbb{R}^6$ whose ground-truth class/section is $k_i$, the loss may be expressed as the cross-entropy:

$$\mathcal{L} = -\sum_i \log \frac{e^{y^{(i)}_{k_i}}}{\sum_s e^{y^{(i)}_s}}$$
For the hyperbolic model, some example embodiments use the Riemannian Adam optimizer. For the baselines, the regular PyTorch 1.10 Adam optimizer may be used. A learning rate of $10^{-4}$ may be used, with default parameters otherwise, and each system may be trained for 1000 epochs, with checkpoints every 25 epochs.
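As an illustration, the optimizer setup might look as follows, assuming the geoopt library provides the Riemannian Adam implementation; the stand-in model is a placeholder.

```python
# Sketch of the optimizer setup described above; the model is a placeholder.
import geoopt
import torch

model = torch.nn.Linear(2, 6)  # stand-in for the embedding/classifier networks

# Riemannian Adam for the hyperbolic model (handles manifold-valued
# parameters; behaves like Adam for ordinary Euclidean parameters).
optimizer = geoopt.optim.RiemannianAdam(model.parameters(), lr=1e-4)

# Baselines may use the regular PyTorch Adam optimizer instead.
baseline_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```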
The score of a given signal segment is based on the negative logit corresponding to the ground-truth section of that signal segment. In the case where a given signal is split into multiple segments, the score of the signal becomes the segment-average of the scores. In other words, if $\psi_s(x_k)$ denotes the predicted probability that the $k$th of $K$ segments of a signal $x$, of ground-truth section or machine $t$, belongs to section or machine $s$, the score may be given as:

$$\mathcal{A}(x) = -\frac{1}{K} \sum_{k=1}^{K} \log \frac{\psi_t(x_k)}{1 - \psi_t(x_k)}$$
For hyperbolic embeddings, the opposite of the segment-averaged geodesic distance to the origin in the Poincaré ball is also considered for computing the anomaly score, correlating that distance with a notion of classifier uncertainty. In other words, denoting by $e_k$ the hyperbolic embedding of the $k$th segment and by $d_{\mathbb{B}}$ the geodesic distance in the Poincaré ball, the score may be written as:

$$\mathcal{A}^{*}(x) = -\frac{1}{K} \sum_{k=1}^{K} d_{\mathbb{B}}(e_k, 0)$$
The two scores $\mathcal{A}$ and $\mathcal{A}^{*}$ may be ensembled. Since the range of $\mathcal{A}$ (resp. $\mathcal{A}^{*}$) is $\left]-\infty, \infty\right[$ (resp. $\left]-\infty, 0\right]$), we first apply its typical mapping to $[0, 1]$, i.e., the sigmoid $\sigma$ (resp. $1 + \tanh$). Then, using a weight $w$ tuned at validation, the score $\mathcal{A}_{\mathrm{ens}}$ may be given as:

$$\mathcal{A}_{\mathrm{ens}} = w\,\sigma(\mathcal{A}) + (1 - w)\left(1 + \tanh(\mathcal{A}^{*})\right)$$
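A sketch of this ensembling is shown below; the weighted combination follows the mappings just described, with w a validation-tuned placeholder.

```python
# Sketch of the ensembled anomaly score: map each score to [0, 1] with its
# typical squashing (sigmoid, resp. 1 + tanh), then mix with a tuned weight w.
import numpy as np

def ensemble_score(logit_score: float, dist_score: float, w: float) -> float:
    s1 = 1.0 / (1.0 + np.exp(-logit_score))  # sigmoid: (-inf, inf) -> (0, 1)
    s2 = 1.0 + np.tanh(dist_score)           # (-inf, 0] -> (0, 1]
    return w * s1 + (1.0 - w) * s2

score = ensemble_score(logit_score=1.3, dist_score=-0.7, w=0.5)  # example values
```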
However, in order to improve the reliability of the anomaly score, some example embodiments aggregate the probabilities 832, 834 and/or some intrinsic geometric aspect of the embedding location in the embedding space to generate an anomaly score. In this regard, some example embodiments recognize that, unlike in Euclidean space, the origin (center) of the hyperbolic space has a unique meaning. Some embodiments are based on the recognition that, additionally or alternatively to the anomaly score determined based on the classification of the hyperbolic embedding, the location of the hyperbolic embedding with respect to the origin of the hyperbolic space provides additional clues relevant for anomaly detection. This is because the results of classification depend not only on the data but also on the training of the classifier, whereas the location with respect to the origin depends on the quality of the data and can be independent of the classification. Because training relies primarily on normal data, such an indicator of the quality of the input data can be advantageous. Notably, the anomaly indicator dependent on the distance to the origin of the embedding space is not available in Euclidean spaces.
Referring to
The processor 902 may include suitable logic, circuitry, and/or interfaces that may be configured to execute program instructions associated with different operations to be executed by the anomaly detection system 900. The processor 902 may include one or more specialized processing units, which may be implemented as an integrated processor or a cluster of processors that perform the functions of the one or more specialized processing units, collectively. The processor 902 may be implemented based on a number of processor technologies known in the art. Examples of implementations of the processor 902 may be an x86-based processor, a Graphics Processing Unit (GPU), a Reduced Instruction Set Computing (RISC) processor, an Application-Specific Integrated Circuit (ASIC) processor, a Complex Instruction Set Computing (CISC) processor, a microcontroller, a central processing unit (CPU), and/or other computing circuits.
The memory 904 may include suitable logic, circuitry, and/or interfaces that may be configured to store program instructions to be executed by the processor 902. The memory 904 may be further configured to store the trained neural networks such as the embedding neural network or the embedding classifier. Without deviating from the scope of the disclosure, trained neural networks such as the embedding neural network and the embedding classifier may also be stored in a database. Example implementations of the memory 904 may include, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Hard Disk Drive (HDD), a Solid-State Drive (SSD), a CPU cache, and/or a Secure Digital (SD) card.
The output interface 906 may include suitable logic, circuitry, interfaces, and/or code that may be configured to receive an input and provide an output based on the received input. The output interface 906 may include various input and output devices, which may be configured to communicate with the processor 902. For example, the anomaly detection system 900 may receive a user input via the input interface 908 to select a region on the hyperbolic space or a hyperplane on the hyperbolic space. Examples of the input interface 908 may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, or a microphone.
The input interface 908 may also include suitable logic, circuitry, interfaces, and/or code that may be configured to enable the processor 902 to communicate with a database and/or other communication devices via the communication network 910. The input interface 908 may be implemented by use of various known technologies to support wireless communication of the anomaly detection system 900 via the communication network 910. The input interface 908 may include, for example, an antenna, a radio frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a subscriber identity module (SIM) card, a local buffer circuitry, and the like.
The input interface 908 may be configured to communicate via the communication network 910. The communication network 910 may comprise wired or wireless networks, such as the Internet, an Intranet, a wireless network, a cellular telephone network, a wireless local area network (LAN), or a metropolitan area network (MAN). The wireless communication may be configured to use one or more of a plurality of communication standards, protocols, and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), Long Term Evolution (LTE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, or IEEE 802.11n), voice over Internet Protocol (VoIP), light fidelity (Li-Fi), or Worldwide Interoperability for Microwave Access (Wi-MAX).
Various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. The above-described embodiments of the present disclosure can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. Though, a processor may be implemented using circuitry in any suitable format.
Embodiments of the present disclosure may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts concurrently, even though shown as sequential acts in illustrative embodiments.
Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the intent of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.