The present invention relates to the field of management of normal and abnormal data. More specifically, it relates to a neural network based anomaly detector, a method of neural network based anomaly detection and a method of training a neural network based anomaly detector.
The distinction between normal and abnormal data is a growing field of research that has a number of applications.
One of them is anomaly detection and localization. Its purpose is to detect automatically whether a sample of data is “normal” or “abnormal”, and, when an anomaly is detected, to localize it. A concrete application of this is the detection, in a production line, of normal or abnormal products. This can be done by taking a picture of each product, and automatically detecting whether the picture corresponds to a normal or an abnormal product.
The automatic detection of what is “normal” and what is “abnormal” is a notoriously difficult problem, which has been addressed in different ways, which generally rely on learning and generating one or more data models.
A first approach to tackle this issue consists in performing supervised learning. Supervised learning consists in learning models from labeled input data: each learning sample is associated with a label indicating whether the sample is normal or abnormal. Abnormal samples may also be associated with labels indicating a type of anomaly. Once the model is trained, it can be used to classify new samples as either normal or abnormal. The problem with such approaches is that the model can only learn anomalies which have already been encountered. Therefore, they present a strong risk that a sample which is abnormal, but whose anomaly has not been learnt previously, will be classified as normal.
On the other hand, unsupervised learning can detect anomalies without needing labeled abnormal learning data. In order to do so, some solutions learn a generative model of the data using a set of learning samples representing normal data: the purpose of such a model is to output a sample that could be considered to be part of the original data distribution, given an input in some compressed data space. In image processing, typical values can be to generate 256*256 pixel images from a 64-dimensional compressed data space. Such models are mainly generative adversarial networks (GAN), variational auto-encoders (VAE), PixelCNN, and hybrids of those models. Given a sample, to detect an anomaly, existing solutions encode the sample into their compressed data space, then decode the compressed representation to obtain a new, generated, sample that we call the “reconstruction”. They also allow localizing the anomaly, by comparing the reconstruction to the input sample, for example pixel by pixel, or using more global filters, and considering that a zone of the sample that is different from the reconstruction is the localization of an anomaly. A characteristic of prior art systems is a tendency to flag as anomalies deviations that a human assessor would deem insignificant, whilst overlooking other deviations that a human assessor would consider unacceptable.
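As a rough illustration of the reconstruction-based localization described above, the following sketch compares an input image with its reconstruction pixel by pixel and marks the pixels whose absolute difference exceeds a tolerance. The function name `localize_by_reconstruction`, the tolerance value and the toy arrays are assumptions made for illustration only; no actual generative model is involved.

```python
import numpy as np

def localize_by_reconstruction(sample, reconstruction, tolerance=0.2):
    """Return a boolean mask of pixels that differ from the reconstruction.

    `tolerance` is a hypothetical per-pixel threshold chosen for this sketch.
    """
    sample = np.asarray(sample, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return np.abs(sample - reconstruction) > tolerance

# Toy 4x4 "image": the reconstruction is uniformly grey,
# while the input sample carries one bright simulated defect.
reconstruction = np.full((4, 4), 0.5)
sample = reconstruction.copy()
sample[1, 2] = 1.0

mask = localize_by_reconstruction(sample, reconstruction)
defect_pixels = [tuple(int(i) for i in idx) for idx in np.argwhere(mask)]
```

Here `defect_pixels` lists the (row, column) positions flagged as differing from the reconstruction.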
The article by Paul Bergmann et al. entitled “Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings”, published by MVTec Software GmbH, presents a mechanism based on a group of regressive models processing respective parts of an image.
There is therefore a need for a method and device able to effectively identify anomalies on the basis of a limited training phase.
In accordance with the present invention in a first aspect there is provided a method of constructing an anomaly detector for detecting an anomaly in a digital sample of a predetermined type and predetermined first resolution. The method comprises exposing a teacher neural network trained to extract features from digital data sets, to a plurality of digital samples of a training dataset of the predetermined type, to extract features representing each said digital sample at one or more levels, exposing an auto-encoder to each digital sample to reconstruct features representing the digital sample at one or more levels, determining a difference value reflecting the difference between the extracted features and respective reconstructed features for each said sample and repeating the steps of reconstructing features representing said training dataset with further said parameters until a minimal difference value is obtained across the training dataset.
In a development of the first aspect, the training dataset of the neural network is greater than the training dataset of the anomaly detector.
In a further development of the first aspect, the method comprises the further step of selecting a threshold indicating the presence of an anomaly with reference to the distribution of difference values obtained across the training dataset.
In a further development of the first aspect, the minimum difference value obtained across the training dataset is selected as the threshold indicating the presence of an anomaly.
In a further development of the first aspect, the method comprises the further steps of identifying a subset of the datasets of the training dataset as constituting anomalous datasets, and isolating the difference values output by the anomaly detector for the anomalous datasets to derive a characteristic difference value, and selecting a threshold indicating the presence of an anomaly with reference to the characteristic difference value.
In a further development of the first aspect, the method comprises the further step of adjusting the resolution of the features output by the teacher neural network or the auto-encoder or the output of one or more error determinations to a standard resolution.
In a further development of the first aspect, the method comprises the further steps of adjusting the resolution of the features output by each said error determination to a standard resolution, wherein said step of determining a difference value comprises up-sampling each said set of features to a predetermined resolution, consolidating the up-sampled sets of features and then summing over the consolidated dataset to obtain said difference value.
In a further development of the first aspect, the method comprises the further steps of exposing a teacher neural network trained to extract features from digital data sets to said digital sample to extract features representing said digital sample at one or more levels, exposing an auto-encoder trained to reconstruct said features of a training dataset of the predetermined type to the digital sample to reconstruct features representing the digital sample at one or more levels, determining a difference value reflecting the difference between each extracted feature and a respective reconstructed feature and comparing said difference value to a threshold, and in a case where said difference value exceeds said threshold, identifying said digital sample as anomalous.
In a further development of the first aspect, the method comprises the further step of adjusting the resolution of the features output by the teacher neural network or the auto-encoder or the output of one or more error determinations to a standard resolution.
In a further development of the first aspect, the method comprises the further steps of adjusting the resolution of the features output by each error determination to a standard resolution, wherein said step of determining a difference value comprises up-sampling each set of features to a predetermined resolution, consolidating the up-sampled sets of features and then summing over the consolidated dataset to obtain a difference value map, and comparing each value of said difference value map to a second threshold, and flagging values in an anomaly map exceeding said threshold as anomalous.
In accordance with the present invention in a second aspect there is provided an anomaly detector for detecting anomalies in digital samples. The anomaly detector comprises a teacher neural network trained to extract features from digital data sets at one or more levels, an auto-encoder trained to reconstruct features representing the digital sample at one or more levels, a difference calculator adapted to determine a difference value reflecting the difference between said extracted features and a respective said reconstructed feature, and to compare said difference value to a threshold, and in a case where said difference value exceeds said threshold, to identify said digital sample as anomalous.
In a further development of the second aspect the anomaly detector further comprises an adaptor unit configured to adjust the resolution of the features output by the teacher neural network or the auto-encoder or by said difference calculator to a standard resolution.
In a further development of the second aspect the adaptor unit is configured to adjust the resolution of the features output by said difference calculator to a standard resolution, said anomaly detector further comprising an error mapper comprising an up-sampler configured to up-sample each set of features to a predetermined resolution, to consolidate the up-sampled sets of features and then sum the error values over the consolidated dataset to compile a difference value map, and to compare each value of the difference value map to a second threshold, and to flag values in an anomaly map exceeding said threshold as anomalous.
In a further development of the second aspect the teacher neural network comprises a trained convolutional neural network.
In accordance with the present invention in a third aspect there is provided a computer program comprising instructions implementing the steps of the first aspect.
The invention will be better understood and its various features and advantages will emerge from the following description of a number of exemplary embodiments provided for illustration purposes only and its appended figures in which:
As shown in
The digital samples 121, 122, 123, 124 etc. will generally be of a particular type. For example, the digital samples may comprise images, sound data, data from an electronic nose, and the like. For the purposes of the following examples, the digital samples will be described generally in terms of image data, however the skilled person will appreciate that embodiments may process data samples of any consistent type.
In particular, data samples may represent samples of an industrial product, for example on a production line, whereby the detection of anomalies may contribute to a quality control or other manufacturing process.
The teacher neural network 110 may comprise any convolutional neural network trained to extract features from digital data sets at one or more levels as discussed in more detail below. The training of this teacher neural network 110 is outside the scope of the present invention. The teacher neural network is trained to classify data samples of the type to be processed as discussed above, but need not be specifically trained for the specific expected content of the data samples. For example, if an embodiment is intended to identify structural anomalies in engine parts on the basis of image data, a teacher neural network trained to classify general image data may be selected, but need not be specifically trained to classify engine parts.
A teacher neural network trained to classify general image data may be selected that is not specifically trained to classify the expected content of the data. By using a teacher neural network trained to classify general image data, the output of the anomaly detector in accordance with embodiments has been found to attach significance to certain features in the anomaly detection process in a manner more closely aligned with the degree of significance that a human assessor would assign to these same features.
As shown in
An auto-encoder is a type of artificial neural network that encodes samples into a representation, or encoding, of lower dimension, then decodes that representation into a reconstructed sample, and is described for example in Liou, C. Y., Cheng, W. C., Liou, J. W., & Liou, D. R. (2014). Auto-encoder for words. Neurocomputing, 139, 84-96. The principle of the auto-encoder is described in more detail with reference to
On this basis, as shown, the auto-encoder 130 comprises an encoding section 131 and a decoding section 132.
A method of training the auto-encoder 130 is described in further detail with reference to
As shown, the anomaly detector further comprises a difference calculator 140 adapted to determine a difference value reflecting the difference between the extracted features and a respective reconstructed feature.
More particularly, as shown, the teacher neural network 110 outputs extracted features at two levels indicated by arrows 112 and 113. It will be understood that in the context of convolutional neural networks and as discussed in more detail with reference to
As shown, the outputs 112, 113 of the teacher neural network, as well as the original data sample are provided to the difference calculator 140. In other embodiments, the original data sample may not be provided to the difference calculator 140, and optionally one or more additional levels output by the teacher neural network may be used instead.
As shown, the difference calculator 140 further receives outputs from the decoding section 132 of the auto-encoder 130. The auto-encoder outputs a final encoded representation of the data sample at output 1321. Lower resolution intermediate outputs are also retrieved: as for the teacher neural network, the auto-encoder 130 outputs features at two intermediate levels, indicated by arrows 1322 and 1323. As shown, the auto-encoder thus outputs features at three levels: the final encoded representation, represented by arrow 1321, and the two intermediate levels, represented by arrows 1322 and 1323. If the original data sample is not provided to the difference calculator, the final encoded representation may also not be required. It will be appreciated that features may be output at any number of levels to the extent that these are supported by the underlying structure of the auto-encoder 130. For the sake of simplicity it is assumed in the present embodiment that each feature output from the teacher neural network is matched with a corresponding feature output from the auto-encoder, that the final encoded output from the auto-encoder is matched with the original data sample, and that the respective resolutions of each matched pair of features are the same. As discussed below, in other embodiments the number of feature outputs from the auto-encoder and the teacher neural network need not necessarily be the same, and the resolution of matched features as output by the auto-encoder and the teacher neural network need not be identical. In certain embodiments at least two respective levels may be taken into consideration by the difference calculator. Optionally, one of these levels may correspond to the comparison of the original data sample with a final encoded representation determined by the auto-encoder.
As shown, on this basis the difference calculator 140 performs, e.g. in error calculators 141, 142, 143, a value by value comparison of each matched pair of feature outputs. For example, as shown, error calculator 141 performs a value by value comparison of the original input sample with the final encoded output 1321 of the auto-encoder 130. Error calculator 142 performs a value by value comparison of the first intermediate feature output 112 of the teacher neural network 110 with a corresponding intermediate feature output 1322 of the auto-encoder 130. Error calculator 143 performs a value by value comparison of the second intermediate feature output 113 of the teacher neural network 110 with a corresponding intermediate feature output 1323 of the auto-encoder 130.
In the context of a digital image sample, the value by value comparison may comprise a pixel by pixel comparison.
The sum of the value by value error for each matched pair of extracted features is determined by each error calculator, and output as a level error value. The level error values are then summed in summer 145, possibly with level weighting factors as discussed below, to obtain a difference value, representing the degree of deviation of the input sample.
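The level error calculation and weighted summation just described can be sketched as follows. This is a minimal illustration rather than the implementation of the difference calculator 140: the use of squared differences as the value by value error, the function names and the toy feature arrays are all assumptions.

```python
import numpy as np

def level_error(extracted, reconstructed):
    """Sum of squared value-by-value differences for one matched pair of features."""
    extracted = np.asarray(extracted, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if extracted.shape != reconstructed.shape:
        raise ValueError("matched features must have the same shape")
    return float(np.sum((extracted - reconstructed) ** 2))

def difference_value(matched_pairs, level_weights=None):
    """Weighted sum of the level error values over all matched pairs."""
    if level_weights is None:
        level_weights = [1.0] * len(matched_pairs)
    errors = [level_error(t, r) for t, r in matched_pairs]
    return sum(w * e for w, e in zip(level_weights, errors))

# Toy matched pairs at three levels (hypothetical values).
pairs = [
    (np.ones((4, 4)), np.zeros((4, 4))),            # level error 16
    (np.ones((2, 2)), np.ones((2, 2))),             # level error 0
    (np.full((1, 1), 3.0), np.full((1, 1), 1.0)),   # level error 4
]
unweighted = difference_value(pairs)                    # 16 + 0 + 4
weighted = difference_value(pairs, [0.5, 1.0, 2.0])     # 8 + 0 + 8
```

The level weights allow one level to count more than another in the final difference value.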
The difference value is then compared to a stored threshold 150. In a training phase, the difference value is used to determine optimisation of the auto-encoder parameters as described with reference to
As mentioned above, the teacher neural network 110 may comprise any convolutional neural network trained to extract features from digital data sets at one or more levels as discussed in more detail below.
As shown, a convolutional neural network 200 comprises a convolutional part 210 and a fully connected/output layer 220.
Sample data 201 is provided to the left of the neural network, and in operation the data generally flows from left to right, for the final categorisation information for the input sample data to be output from the output section 220 on the right.
As shown, the convolutional neural network comprises a series of convolutional layers 211, 212, 213, 214, 215. The first convolutional layer processes the input sample data at its native resolution, while the subsequent layers 212, 213, 214, 215 each comprise an initial pooling layer 212a, 213a, 214a, 215a, which down-samples the output of the preceding layer to a new, lower resolution for processing in the current layer.
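The effect of the pooling layers on resolution can be illustrated with the short sketch below. It assumes non-overlapping 2x2 max pooling and omits the convolutions entirely; the pooling size, the function names and the 32x32 toy input are assumptions chosen only to show how each level sees a lower resolution than the one before.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Down-sample a (H, W) map by taking the maximum over non-overlapping 2x2 blocks."""
    h, w = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def resolutions_per_level(sample, n_levels=5):
    """Resolution seen by each of `n_levels` layers (convolutions are omitted)."""
    shapes = [sample.shape]
    current = sample
    for _ in range(n_levels - 1):
        current = max_pool_2x2(current)
        shapes.append(current.shape)
    return shapes

sample = np.arange(32 * 32, dtype=float).reshape(32, 32)
shapes = resolutions_per_level(sample)
```

With five levels and a 32x32 input, the resolution halves at each level after the first.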
By way of example, the neural network may be a VGG16 neural network, as developed by the Oxford University Visual Geometry Group. The VGG16 neural network is a high performance deep convolutional neural network developed for image classification. This neural network is available “off the shelf” for free, ready-trained for the general classification of images, for example as represented by the ImageNet image database. The structure shown in
In particular, the VGG16 neural network is trained to extract features from digital data sets of the digital image type, and the layers 211, 212, 213, 214, 215 are trained to extract features 231, 232, 233, 234, 235 representing said training dataset at one or more levels as described above.
The skilled person will appreciate that many other Neural Networks can be conceived having the general structure of
The Teacher neural network may be a generic, pre-trained neural network as described above, or may be developed and/or trained for the purposes of the present invention, however as stated above the development and training of the teacher neural network is outside the scope of the present invention.
The
Auto-encoders have been described for example in Liou, Cheng-Yuan; Huang, Jau-Chi; Yang, Wen-Chie (2008). “Modeling word perception using the Elman network”. Neurocomputing. 71 (16-18), and Liou, Cheng-Yuan; Cheng, Wei-Chen; Liou, Jiun-Wei; Liou, Daw-Ran (2014). “Auto-encoder for words”. Neurocomputing. 139: 84-96. Auto-encoders are a type of neural network trained to perform efficient data coding in an unsupervised manner.
An auto-encoder consists of a first neural network 320, which encodes the input vector $x_t$ into a compressed vector noted $z_t$ ($t$ representing the index of the iteration), and a second neural network 330, which decodes the compressed vector $z_t$ into a decompressed or reconstructed vector $\hat{x}_t$. The compressed vector $z_t$ has a lower dimensionality than the input vector $x_t$ and the reconstructed vector $\hat{x}_t$: it is expressed using a set of variables called latent variables, which are considered to represent essential features of the vector. Therefore, the reconstructed vector $\hat{x}_t$ is similar, but in general not strictly equal, to the input vector $x_t$.
It is thus possible, at the output of the decoding, to compute both a reconstruction error, or loss function, and a gradient of the loss function.
The loss function is noted $L(x_t, \hat{x}_t)$, and can be for example a quadratic function:
$$L(x_t, \hat{x}_t) = \lVert x_t - \hat{x}_t \rVert^2 \qquad \text{(Equation 1)}$$
The gradient of the loss function can be noted ∇x
An auto-encoder will typically be trained in a training phase, with a set of reference vectors. The training phase of an auto-encoder consists in adapting the weights and biases of the neural networks 320 and 330 in order to minimize the reconstruction loss for the training set. By doing so, the latent variables of the compressed vectors are trained to represent the salient high-level features of the training set. Stated otherwise, the training phase of the auto-encoder provides an unsupervised learning of compressing the training samples into a low number of latent variables that best represent them.
Therefore, the training of the auto-encoder with a training set of normal samples results in latent features which are optimized to represent normal samples. After the training phase, when the auto-encoder encodes and decodes a normal sample, the compressed vector provides a good representation of the sample, and the reconstruction error is low. On the contrary, if the input vector represents an abnormal sample, or more generally a sample which is not similar to the samples of the training set, the dissimilarities will not be properly compressed, and the reconstruction error will be much higher.
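The behaviour described above, a low reconstruction error for samples resembling the training set and a higher one for dissimilar samples, can be reproduced with a deliberately simple stand-in. The sketch below replaces the neural networks 320 and 330 with a linear auto-encoder fitted by singular value decomposition (equivalent to principal component analysis), which is a swapped-in technique and not the auto-encoder of the embodiments; the synthetic data, dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: 8-dimensional vectors lying in a 2-dimensional subspace.
basis = rng.normal(size=(2, 8))
normal_train = rng.normal(size=(200, 2)) @ basis

# Fit a linear auto-encoder with 2 latent variables via the SVD.
mean = normal_train.mean(axis=0)
_, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
components = vt[:2]  # rows span the learned latent space

def encode(x):
    """Compress an 8-d vector into 2 latent variables."""
    return (x - mean) @ components.T

def decode(z):
    """Reconstruct an 8-d vector from its 2 latent variables."""
    return z @ components + mean

def reconstruction_error(x):
    """Quadratic reconstruction error, in the form of Equation 1."""
    return float(np.sum((x - decode(encode(x))) ** 2))

normal_sample = rng.normal(size=2) @ basis                     # same structure as training data
abnormal_sample = normal_sample + 3.0 * rng.normal(size=8)     # leaves the learned subspace

err_normal = reconstruction_error(normal_sample)
err_abnormal = reconstruction_error(abnormal_sample)
```

Because the abnormal sample contains components that the two latent variables cannot represent, its reconstruction error is larger than that of the normal sample.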
The training set of reference samples can thus be adapted to the intended training. For example:
It should be noted that, although an auto-encoder will work with a training set which is generally suited to the intended purpose, the results can typically be further improved by selecting training samples which are as representative as possible of the samples to process. For example:
The skilled person could thus select the training set that best suits his or her needs according to the intended application. However, the input vector and vectors of the training set need to be of the same type, that is to say have the same dimension, and the corresponding elements of the vectors need to have the same meaning. For example, the input vectors and the vectors of the training set may represent images of the same dimension with the same color representation and bit depth, audio tracks of the same duration with the same bit depth, etc.
In a number of embodiments of the invention, the auto-encoder is a variational auto-encoder (VAE). The variational auto-encoders are described for example by Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, or Diederik P. Kingma and Volodymyr Kuleshov. Stochastic Gradient Variational Bayes and the Variational Auto-encoder. In ICLR, pp. 1-4, 2014. The variational auto-encoder advantageously provides a very good discrimination of normal and abnormal samples on certain datasets. The invention is however not restricted to this type of auto-encoder, and other types of auto-encoder may be used in the course of the invention.
In a number of embodiments of the invention, the loss of the variational auto-encoder is calculated as:
$$L(x_t, \hat{x}_t) = \lVert x_t - \hat{x}_t \rVert^2 - D_{KL}\big(q(z_t \mid x_t),\, p(z_t)\big) \qquad \text{(Equation 2)}$$
This function allows ensuring a generative model is used, that is to say that the model is able to produce samples that have never been used for training.
In the VAE, a decoder model tries to approximate the dataset distribution with a simple latent variables prior $p(z)$, with $z \in \mathbb{R}^l$, and conditional distributions output by the decoder $p(x|z)$. This leads to the estimate $p(x) = \int p(x|z)\,p(z)\,dz$ that we would like to optimize using maximum likelihood estimation on the dataset. To render the learning tractable with a stochastic gradient descent (SGD) estimator with reasonable variance it is possible to use importance sampling, introducing density functions $q(z|x)$ output by an encoder network, and Jensen's inequality to get the variational lower bound:
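For reference, the variational lower bound in its standard form (as given by Kingma and Welling, cited above) can be written with the symbols of this description as follows. That the equation of the original text took exactly this form is an assumption.

$$\log p(x) \;\geq\; \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] - D_{KL}\big(q(z \mid x) \,\|\, p(z)\big)$$

The first term rewards decoded distributions that assign high likelihood to the input, and the second term keeps the encoded distribution $q(z \mid x)$ close to the prior $p(z)$.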
The reconstruction of the VAE can thus be defined as the deterministic sample fVAE(x) obtained by encoding x, decoding the mean of the encoded distribution q(z|x), and taking again the mean of the decoded distribution p(x|z).
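The deterministic reconstruction defined above can be sketched with toy linear Gaussian networks. The weights, dimensions and function names below are hypothetical; the point is only that decoding the mean of $q(z|x)$ and keeping the decoded mean yields the same output on every call, unlike a reconstruction that samples $z$.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 2))   # hypothetical encoder weights: 4-d input, 2-d latent
W_dec = rng.normal(size=(2, 4))   # hypothetical decoder weights

def encoder(x):
    """Return the mean and log-variance of q(z|x) (toy linear Gaussian encoder)."""
    mu = x @ W_enc
    log_var = np.zeros_like(mu)   # unit variance, fixed for this sketch
    return mu, log_var

def decoder(z):
    """Return the mean of p(x|z) (toy linear Gaussian decoder)."""
    return z @ W_dec

def f_vae(x):
    """Deterministic reconstruction: decode the mean of q(z|x), keep the decoded mean."""
    mu, _ = encoder(x)
    return decoder(mu)

def sampled_reconstruction(x, generator):
    """For contrast: a stochastic reconstruction that samples z from q(z|x)."""
    mu, log_var = encoder(x)
    z = mu + np.exp(0.5 * log_var) * generator.normal(size=mu.shape)
    return decoder(z)

x = np.array([1.0, 0.0, -1.0, 2.0])
first = f_vae(x)
second = f_vae(x)
noisy = sampled_reconstruction(x, np.random.default_rng(1))
```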
It may be noted that variational auto-encoders have the additional ability to model an uncertainty (e.g. in terms of a variance) on their reconstruction.
In order to produce more detailed reconstructions, it is possible to learn the variance of the decoded distribution p(x|z), for example as proposed by Bin Dai and David P. Wipf. Diagnosing and enhancing VAE models. CoRR, abs/1903.05789, 2019. In such cases, one variance parameter may be learned per feature channel (independently of position). This provides an improved representation of normal features and allows anomalies that are more apparent in a particular channel to be distinguished effectively, thereby improving anomaly detection. The variance parameters may act as weights per feature channel so that the channels “properly” scale with one another.
On this basis, in a method of constructing an anomaly detector as described herein the operation of exposing an auto-encoder to each digital sample to reconstruct features representing said digital sample at one or more levels as described herein may further comprise modelling uncertainty as a variance parameter per feature, and the operation of determining a difference value reflecting the difference between said extracted features and respective said reconstructed features for each said sample may be weighted by the respective variance parameters.
Correspondingly, in a method of detecting an anomaly as described herein, in the operation of exposing an auto-encoder trained to reconstruct the features of a training dataset of predetermined type to said digital sample to reconstruct features representing said digital sample at one or more levels, the features may be associated with a variance parameter per feature, and in the operation of determining a difference value reflecting the difference between each extracted feature and a respective reconstructed feature, the difference value for each extracted feature may be weighted by the respective variance parameter.
Correspondingly, in an anomaly detector as described herein, the auto-encoder may be trained to reconstruct features representing said digital sample at one or more levels, the features being associated with a variance parameter per feature, and the difference calculator may be adapted to determine a difference value reflecting the difference between each said extracted feature and a respective said reconstructed feature weighted by the respective variance parameters.
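One possible reading of the variance weighting described above is sketched below, under the assumption that each channel's squared error is scaled by the inverse of its learned variance, as in a Gaussian likelihood. The text does not specify the exact form of the weighting, so the inverse-variance choice, the function name and the toy arrays are all assumptions.

```python
import numpy as np

def variance_weighted_difference(extracted, reconstructed, channel_variance):
    """Sum of squared errors, each channel scaled by the inverse of its variance.

    Feature arrays have shape (channels, height, width); `channel_variance`
    has shape (channels,). Inverse-variance scaling is an assumption of this sketch.
    """
    extracted = np.asarray(extracted, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    variance = np.asarray(channel_variance, dtype=float)
    if np.any(variance <= 0):
        raise ValueError("variances must be positive")
    per_channel = np.sum((extracted - reconstructed) ** 2, axis=(1, 2))
    return float(np.sum(per_channel / variance))

extracted = np.zeros((2, 2, 2))
reconstructed = np.ones((2, 2, 2))   # squared error of 4 in each channel
value = variance_weighted_difference(extracted, reconstructed, [1.0, 4.0])
```

With this choice, a channel whose reconstruction is known to be uncertain (large variance) contributes less to the difference value than a channel that is normally reconstructed precisely.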
As shown, the method starts at step 400 before proceeding to steps 405 and 410.
At step 405 a teacher neural network trained to extract features from digital samples of the predetermined type is exposed to digital samples from a training dataset.
From step 405 the method proceeds to step 415, at which features representing each digital sample of the training dataset are extracted at one or more levels. It should be borne in mind that, as discussed above, the teacher neural network is pre-trained, and the training data are not exposed to the teacher neural network for the purpose of training it, but to elicit feature outputs for the purpose of optimising the training of the auto-encoder as described below.
The teacher neural network of the method of
At step 410, an auto-encoder is exposed to the same respective digital samples of the training dataset.
From step 410 the method proceeds to step 420 of reconstructing features representing each respective digital sample at one or more levels with a first set of parameters. For example, if the teacher neural network extracts features at three levels (e.g. 111, 112 and 113 in
The operation of the auto-encoder to generate reconstructed features is substantially as described with respect to
It will be appreciated that while steps 405/415 and 410/420 are shown as being performed in parallel, they may equally be performed in series. Whether performed in series or parallel, they may be performed in a number of different sequences, e.g. “405, 410, 420, 415”, “410, 405, 415, 420”, “405, 415, 410, 420” or “410, 420, 405, 415”.
From steps 415/420 the method proceeds to step 430, at which a difference value is determined, reflecting the sum across all samples, of the difference between each extracted feature of each sample and the respective reconstructed feature.
It will be appreciated that while
The determination of the difference value may comprise summing the difference values obtained across the training set, determining an average, or any other suitable operation. Still further, the dataset may be processed in sub-batches, with an adjustment of parameters between each batch. For example, the difference for 128 samples may be summed or averaged and a gradient descent step taken. This approach reduces the memory requirements for storage of the intermediate computation necessary for gradient descent determination.
The method next proceeds to step 440 at which it is determined whether the difference value is minimised. Determining whether the difference value is minimised may comprise determining whether the difference value has plateaued for a number of epochs (one epoch being one pass through the entire dataset), determining that a fixed number of epochs has expired, comparing the difference value to a predetermined minimum acceptable difference threshold, determining that the best difference level has not improved over a certain number of iterations by more than a minimum improvement threshold, and the like.
In a case where it is determined at step 440 that the difference value is not minimised, the method adjusts the parameters of the auto-encoder at step 450 and loops back to steps 405/410 to repeat the steps of reconstructing features representing said training dataset with new auto-encoder parameters until a minimal said difference value is obtained at step 440.
The adjustment of the parameters of the auto-encoder may be performed in any of the manners known to the skilled person, for example on the basis of a stochastic gradient descent algorithm, where model weights are updated each iteration using the back-propagation of error algorithm.
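The stopping criteria listed for step 440 can be combined in a small helper such as the one below. The function name `is_minimised`, its parameter names and the default values are hypothetical; the sketch simply checks a maximum number of epochs, an acceptable target value, and a lack of improvement over a given number of epochs.

```python
def is_minimised(history, patience=3, min_improvement=1e-3, max_epochs=100, target=None):
    """Decide whether training may stop, given the difference value after each epoch.

    Stops when (a) `max_epochs` epochs have elapsed, (b) the latest value is at or
    below `target`, or (c) the best value has not improved by more than
    `min_improvement` over the last `patience` epochs.
    """
    if not history:
        return False
    if len(history) >= max_epochs:
        return True
    if target is not None and history[-1] <= target:
        return True
    if len(history) > patience:
        best_before = min(history[:-patience])
        best_recent = min(history[-patience:])
        return best_before - best_recent <= min_improvement
    return False

still_improving = is_minimised([10.0, 8.0, 6.0, 4.0, 2.0])
plateaued = is_minimised([10.0, 5.0, 5.0, 5.0, 5.0])
reached_target = is_minimised([10.0, 0.5], target=1.0)
```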
If it is determined that the difference value is minimised, the method terminates at step 460.
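The loop of steps 405 to 460 can be sketched end to end with toy stand-ins. In the sketch below the frozen teacher is a fixed linear map and the auto-encoder is a pair of linear maps with a 3-dimensional code, both assumptions made so that the gradients can be written by hand; real embodiments would use the convolutional teacher and auto-encoder described above. The sub-batch size of 128 follows the example given earlier, while the learning rate, epoch limit and plateau test are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "teacher": a fixed linear feature extractor, never updated.
W_teacher = rng.normal(size=(6, 4))

def teacher_features(x):
    return x @ W_teacher

# Hypothetical "auto-encoder": 6-d input -> 3-d code -> 4-d reconstructed features.
params = {
    "enc": 0.1 * rng.normal(size=(6, 3)),
    "dec": 0.1 * rng.normal(size=(3, 4)),
}

def reconstructed_features(x, p):
    return (x @ p["enc"]) @ p["dec"]

def difference(x, p):
    """Mean squared difference between extracted and reconstructed features."""
    return float(np.mean((teacher_features(x) - reconstructed_features(x, p)) ** 2))

def gradient_step(x, p, lr):
    """One gradient-descent update of the auto-encoder parameters on a batch."""
    code = x @ p["enc"]
    residual = code @ p["dec"] - teacher_features(x)
    scale = 2.0 / residual.size
    grad_dec = scale * code.T @ residual
    grad_enc = scale * x.T @ (residual @ p["dec"].T)
    p["dec"] -= lr * grad_dec
    p["enc"] -= lr * grad_enc

train = rng.normal(size=(256, 6))   # stand-in for the training dataset of normal samples
initial = difference(train, params)
history = []
for epoch in range(200):                        # fixed upper bound on epochs
    for start in range(0, len(train), 128):     # sub-batches of 128 samples
        gradient_step(train[start:start + 128], params, lr=0.05)
    history.append(difference(train, params))
    if len(history) > 5 and history[-6] - history[-1] < 1e-6:
        break                                   # difference value has plateaued
final = history[-1]
```

Only the auto-encoder parameters are adjusted; the teacher is used solely to provide the target features for each sample.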
By this means, the parameters of the auto-encoder, that is to say, the weights and biases of the neural networks 320 and 330, are optimized not only to best reflect the training data, but to do so in a way best aligned with the output of the teacher neural network for the same sample values.
Since the teacher neural network is trained to categorize general datasets, rather than whatever specific content is present in the training dataset, it may be considered to better reflect general human conceptions of the relative importance of different sample features, meaning that the auto-encoder trained in this way will not only identify anomalies in an abstract sense, but give greater significance to anomalies that a human being might also consider to be most significant.
As discussed above, the teacher neural network is pre-trained. Nevertheless, the characteristics of the data used to train the teacher neural network will typically be known. In many cases the size of the data used to train the teacher neural network may be very great, and much greater than the amount of training data available for the training process described with respect to
As discussed above, a minimal difference value is obtained at step 440. This value may be retained as the basis of a threshold for anomaly detection in accordance with methods of anomaly detection in accordance with the embodiments of the invention, for example as described below. Accordingly, there may be provided a further step of selecting a threshold indicating the presence of an anomaly with reference to the distribution of difference values obtained across said training dataset. For example the minimum said difference value obtained across said training dataset may be selected as a threshold indicating the presence of an anomaly. Other statistical characteristics may equally be used to select the threshold, for example taking a value corresponding to a certain number of standard deviations from the average difference, and the like, as appropriate.
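The statistical variant of threshold selection mentioned above, a certain number of standard deviations from the average difference, can be sketched as follows. The function names, the choice of the population standard deviation and the toy difference values are assumptions.

```python
import statistics

def select_threshold(difference_values, n_std=3.0):
    """Threshold = mean + n_std * standard deviation of the training difference values."""
    mean = statistics.fmean(difference_values)
    std = statistics.pstdev(difference_values)
    return mean + n_std * std

def is_anomalous(difference_value, threshold):
    """A sample is flagged when its difference value exceeds the threshold."""
    return difference_value > threshold

# Hypothetical difference values obtained across a training dataset.
training_differences = [1.0, 2.0, 3.0, 4.0, 5.0]
threshold = select_threshold(training_differences, n_std=2.0)
```

A larger `n_std` makes the detector more tolerant, at the cost of missing milder anomalies.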
In accordance with certain embodiments, the training dataset may comprise a number of samples pre-identified as representing anomalous data. These may be detected and identified in an existing dataset, or deliberately injected. On this basis, the method of
As discussed above, it has generally been assumed that the resolution of the features output by the teacher neural network is the same as the resolution of the features output by the auto-encoder at each pair of corresponding levels. It will be appreciated that this need not necessarily be the case: the resolution is dictated by the structure of the underlying neural networks, and in some cases it may be expedient to use an available neural network which offers good performance, but which for technical reasons outputs features at resolutions different to those available from the other neural network. Where this is the case, there may be provided a further step of adjusting the resolution of the features output by said teacher neural network or said auto-encoder or the output of one or more said error determinations to a standard resolution.
It may be borne in mind that difference calculations are performed for features at each of the levels output by the neural networks, and that depending on the manner in which difference values are expressed, this may naturally lead to difference values at higher resolutions being higher than those obtained at lower resolutions. According to certain embodiments, this may be compensated by multiplying difference values by a resolution correction factor, or otherwise. Alternatively, the features themselves may be up-sampled so that all difference values are calculated at the same reference resolution. On this basis, the method may comprise the further steps of adjusting the resolution of the features output by each said error determination to a standard resolution, wherein said step of determining a difference value comprises up-sampling each said set of features to a predetermined resolution, consolidating the up-sampled sets of features and then summing over the consolidated dataset to obtain said difference value.
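Purely as a non-limiting sketch, the resolution correction factor mentioned above might be implemented as follows. It is assumed here, for illustration, that each level's features form an array of shape (channels, height, width), that the per-level difference is a sum of squared differences, and that the correction factor is the reciprocal of the number of spatial locations; all function names are hypothetical.

```python
import numpy as np

def level_difference(extracted, reconstructed):
    """Sum of squared differences between two feature sets of shape (C, H, W)."""
    return float(np.sum((extracted - reconstructed) ** 2))

def corrected_difference(extracted_levels, reconstructed_levels):
    """Sum the per-level differences, each scaled by an assumed factor 1 / (H * W)."""
    total = 0.0
    for extracted, reconstructed in zip(extracted_levels, reconstructed_levels):
        _, height, width = extracted.shape
        correction = 1.0 / (height * width)  # assumed resolution correction factor
        total += correction * level_difference(extracted, reconstructed)
    return total

# Two hypothetical levels with the same per-element error of 0.5.
fine_extracted, fine_reconstructed = np.zeros((4, 8, 8)), np.full((4, 8, 8), 0.5)
coarse_extracted, coarse_reconstructed = np.zeros((4, 2, 2)), np.full((4, 2, 2), 0.5)

raw_fine = level_difference(fine_extracted, fine_reconstructed)
raw_coarse = level_difference(coarse_extracted, coarse_reconstructed)
value = corrected_difference([fine_extracted, coarse_extracted],
                             [fine_reconstructed, coarse_reconstructed])
```

Without the correction, the higher-resolution level dominates (`raw_fine` is sixteen times `raw_coarse` in this example) although the per-element error is identical; with the correction, both levels contribute equally.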
As shown in
At step 505 a teacher neural network trained to extract features from digital samples of the predetermined type is exposed to a digital sample.
From step 505 the method proceeds to step 515, at which features representing the digital sample are extracted at one or more levels.
The teacher neural network of the method of
At step 510, an auto-encoder trained to reconstruct said features of a training dataset of said predetermined type is exposed to the same digital sample.
The auto-encoder may have been trained to reconstruct said features of a training dataset of said predetermined type by means of the method described above with regard to
From step 510 the method proceeds to step 520 of reconstructing features representing each respective digital sample at one or more levels. That is to say, if the teacher neural network extracts features at three levels (e.g. 112 and 113 in
It can be noted that even though the different features of one level represent different parts of the image, which permits anomaly localisation, these features are reconstructed in parallel or simultaneously and from a common global context, so that all parts of the data sample (e.g. an image) are processed together, and not independently. This parallel processing at multiple levels is significant because certain anomalies in parts of the data sample are only apparent with reference to information concerning other parts of the data sample.
For example, a pixel of a particular colour may not be identified as an anomaly as such in isolation, but when a pixel (or other sub-division of the data sample) of that colour occurs in a field of pixels of some other colour, it may be validly identified as an anomaly. The mechanism of certain embodiments inherently incorporates this approach, and thereby facilitates the process of identifying anomalies of this kind.
The operation of the auto-encoder to generate reconstructed features is substantially as described with respect to
It will be appreciated that while steps 505/515 and 510/520 are shown as being performed in parallel, they may equally be performed in series. Whether performed in series or parallel, they may be performed in a number of different sequences, e.g. “505, 510, 520, 515”, “510, 505, 515, 520”, “505, 515, 510, 520” or “510, 520, 505, 515”.
From steps 515/520 the method proceeds to step 530, at which a difference value is determined, reflecting the respective difference values obtained for each level comparison performed for a pair of features as output by the teacher neural network and auto-encoder respectively as described above for the input sample, that is to say, of the difference between each extracted feature of the sample and the respective reconstructed feature, and between the input sample and the respective said reconstructed feature.
The method next proceeds to step 540 at which the difference value obtained at step 530 is compared to a threshold, and in the case where the difference value exceeds the threshold, the sample is identified as anomalous at step 550.
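By way of example only, steps 530 to 560 might be sketched as follows. The sketch assumes that the extracted and reconstructed features are already available as lists of arrays, one per level, and that the difference value is the sum over levels of the mean squared difference; this particular difference measure, the function names and the numerical values are assumptions made for illustration.

```python
import numpy as np

def difference_value(extracted_levels, reconstructed_levels):
    """Step 530 (sketch): mean squared difference per level, summed over levels."""
    return float(sum(np.mean((extracted - reconstructed) ** 2)
                     for extracted, reconstructed
                     in zip(extracted_levels, reconstructed_levels)))

def classify(extracted_levels, reconstructed_levels, threshold):
    """Steps 540 to 560 (sketch): anomalous if the difference value exceeds the threshold."""
    value = difference_value(extracted_levels, reconstructed_levels)
    label = "anomalous" if value > threshold else "normal"
    return label, value

# Hypothetical features at two levels for a single sample.
extracted = [np.ones((2, 4, 4)), np.ones((2, 2, 2))]
well_reconstructed = [np.ones((2, 4, 4)), np.ones((2, 2, 2))]
poorly_reconstructed = [np.zeros((2, 4, 4)), np.zeros((2, 2, 2))]

normal_result = classify(extracted, well_reconstructed, threshold=0.5)
anomalous_result = classify(extracted, poorly_reconstructed, threshold=0.5)
```

A sample whose features the auto-encoder reconstructs faithfully yields a small difference value and is treated as normal, whereas a poor reconstruction yields a large difference value and is identified as anomalous.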
In a case where a sample is identified as anomalous some further steps may be implemented as required, for example halting a production line, diverting an anomalous article to a waste bin or for further inspection, performing some remedial action, issuing an alarm, marking the article corresponding to the anomalous determination in some way, or otherwise. The method may then terminate, or as shown loop back to steps 505 and 510 for a new sample.
In a case where a sample is identified as not anomalous the method proceeds to step 560 of identifying the sample as normal, and some further steps may be implemented as required, for example moving an article to a next processing step in a production line, issuing a chime or other indication of approval, marking the article corresponding to the non-anomalous determination in some way for example affixing a quality control marking, or otherwise. The method may then terminate, or as shown loop back to steps 505 and 510 for a new sample. It will be appreciated that in some embodiments, steps 560 or 550 may be performed tacitly, for example a sample may be identified as normal simply by the fact that it is not identified as anomalous, and allowed to proceed in the production chain, etc.
As discussed above, it has generally been assumed that the resolution of the features output by the teacher neural network is the same as the resolution of the features output by the auto-encoder at each pair of corresponding levels. It will be appreciated that this need not necessarily be the case: the resolution is dictated by the structure of the underlying neural networks, and in some cases it may be expedient to use an available neural network which offers good performance, but which for technical reasons outputs features at resolutions different to those available from the other neural network. Where this is the case, there may be provided a further step of adjusting the resolution of the features output by said teacher neural network or said auto-encoder or the output of one or more said error determinations to a standard resolution.
It may be borne in mind that difference calculations are performed for features at each of the levels output by the neural networks, and that depending on the manner in which difference values are expressed, this may naturally lead to difference values at higher resolutions being higher than those obtained at lower resolutions. According to certain embodiments, this may be compensated by multiplying difference values by a resolution correction factor, or otherwise. Alternatively, the features themselves may be up-sampled so that all difference values are calculated at the same reference resolution. By up-sampling the features and/or error calculations, it becomes possible to superimpose the features or error sets to obtain an overall mapping of the location of error values across a sample, by adding errors bitwise, pixel-wise, or generally on a value by value basis, so as to obtain an anomaly map. On this basis, the method may comprise the further steps of adjusting the resolution of the features output by each said error determination to a standard resolution, wherein the step of determining a difference value comprises up-sampling each said set of features to a predetermined resolution, consolidating the up-sampled sets of features and then summing over the consolidated dataset to obtain a difference value map, and comparing each value of said difference value map to a second threshold, and flagging values in an anomaly map exceeding said threshold as anomalous.
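As a non-limiting sketch of the anomaly map described above, per-level error maps may be up-sampled to a common resolution, added value by value, and compared to the second threshold. The sketch assumes square feature maps of shape (channels, size, size), a target size that is an integer multiple of each level's size, and nearest-neighbour up-sampling; these choices and all names are hypothetical.

```python
import numpy as np

def error_map(extracted, reconstructed):
    """Per-location squared error summed over channels: (C, H, W) -> (H, W)."""
    return np.sum((extracted - reconstructed) ** 2, axis=0)

def upsample(error, target_size):
    """Nearest-neighbour up-sampling of a square map by an integer factor."""
    factor = target_size // error.shape[0]
    return np.repeat(np.repeat(error, factor, axis=0), factor, axis=1)

def anomaly_map(extracted_levels, reconstructed_levels, target_size, second_threshold):
    """Superimpose the up-sampled error maps and flag values above the second threshold."""
    total = np.zeros((target_size, target_size))
    for extracted, reconstructed in zip(extracted_levels, reconstructed_levels):
        total += upsample(error_map(extracted, reconstructed), target_size)
    return total, total > second_threshold

# Two hypothetical levels, each with a single erroneous location at the top-left.
extracted = [np.zeros((1, 4, 4)), np.zeros((1, 2, 2))]
reconstructed = [np.zeros((1, 4, 4)), np.zeros((1, 2, 2))]
reconstructed[0][0, 0, 0] = 1.0
reconstructed[1][0, 0, 0] = 1.0

difference_map, flags = anomaly_map(extracted, reconstructed,
                                    target_size=8, second_threshold=1.5)
```

In this example only the region where the errors of both levels overlap exceeds the second threshold, which illustrates how superimposing levels can sharpen the localisation.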
The anomaly detector of
As shown the adaptor unit 600 comprises adaptor sub-units 651, 652 and 653, adjusting the resolution of the three sets of features output by auto-encoder 130 so as to correspond to the resolution of the corresponding level features output by the teacher neural network 110.
It will be appreciated that in certain embodiments it may be necessary to adjust some outputs in this manner, and not others, depending on the structure and configuration of the respective neural networks.
It will be appreciated that while as shown the adaptor unit is part of the auto-encoder unit such that from the point of view of the difference calculator 140 the auto-encoder outputs features at the required resolution directly, the adaptor unit may be physically and/or logically separate from the auto-encoder.
Furthermore, it will be appreciated that the adaptor unit may equally be implemented so as to adjust the output of the teacher neural network instead of, or as well as, the output of the auto-encoder. In some embodiments the adaptor unit will up-sample feature sets to the native resolution of the input data sample or samples, but in other cases some other convenient common resolution may be selected.
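For illustration only, the function of the adaptor sub-units might be sketched as a nearest-neighbour resize applied level by level, so that each set of features output by the auto-encoder takes the resolution of the corresponding teacher features. The interpolation method, the (channels, height, width) array layout and the function names are assumptions and are not prescribed by the embodiments.

```python
import numpy as np

def adapt_resolution(features, target_height, target_width):
    """Nearest-neighbour resize of (C, H, W) features to (C, target_height, target_width)."""
    _, height, width = features.shape
    rows = np.arange(target_height) * height // target_height
    cols = np.arange(target_width) * width // target_width
    return features[:, rows[:, None], cols[None, :]]

def adapt_levels(autoencoder_levels, teacher_levels):
    """One hypothetical adaptor sub-unit per level: match the teacher's resolution."""
    return [adapt_resolution(ae_features, *teacher_features.shape[1:])
            for ae_features, teacher_features
            in zip(autoencoder_levels, teacher_levels)]

# Hypothetical auto-encoder output at 2x2, teacher output at 4x4.
autoencoder_levels = [np.arange(4.0).reshape(1, 2, 2)]
teacher_levels = [np.zeros((1, 4, 4))]
adapted = adapt_levels(autoencoder_levels, teacher_levels)
```

The same function also down-samples when the target resolution is lower, so it could equally be applied to the teacher output instead of, or as well as, the auto-encoder output.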
The anomaly detector of
As such, as shown in
The difference value map may be presented graphically to a human user, or used to direct additional process steps, for example to remediate the anomaly, or subjected to further analysis, for example with a view to determining the likely cause of the anomaly, or to trace the corresponding article back through the manufacturing process, supply chain or the like.
As shown in
Certain samples exhibit anomalies in the form of scratches and other blemishes. Column B presents the output of a conventional neural network on the basis of the respective sample image. It may be observed that generally the output in the second row does not effectively highlight or isolate anomalies.
Column C presents an example of a difference value map as may be obtained as discussed above, for example by unit 740. It may be seen in each case that a heat map representing a difference level is superposed over the original sample, with high energy heat map levels over the areas of each sample exhibiting anomalies.
Column D presents an example of a difference value map as may be obtained as discussed above, for example by unit 740. It may be seen in each case that a heat map representing a difference level is superposed over the original sample, with high energy heat map levels over the areas of each sample exhibiting anomalies, and further comprising in some cases a manually inscribed white marking, representing the location of anomalies as determined by a human assessor. As discussed above, in certain embodiments training datasets may comprise samples known to comprise anomalies. The images in column D may comprise such known anomalous samples. Furthermore, where the location of the anomalies is indicated, training of embodiments capable of indicating the location of anomalies may be extended to assess the degree to which the system effectively determines the location of the anomalies, and to take this into account in optimising the auto-encoder parameters.
It will be appreciated that embodiments may be implemented wholly or partially in software. Software embodiments include but are not limited to applications, firmware, resident software, microcode, etc. The invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or an instruction execution system. Software embodiments include software adapted to implement the mechanisms discussed above with reference to
In some embodiments, the methods and processes described herein may be implemented in whole or part by a user device. These methods and processes may be implemented by computer-application programs or services, an application-programming interface (API), a library, and/or other computer-program product, or any combination of such entities.
The user device may be a mobile device such as a smart phone or tablet, a drone, a computer or any other device with processing capability, such as a robot or other connected device, including IoT (Internet of Things) devices, head mounted displays with or without see through technology, glasses or any device allowing the display of lines or the like.
Accordingly, as described, an anomaly detector uses two neural networks: the first, a general purpose classifying convolutional neural network, operates as a teacher neural network, while the second is provided in an auto-encoder type configuration. Each of the two neural networks receives the same input stream, and generates respective feature outputs at different levels, corresponding to different resolutions for image data. The respective outputs of the two neural networks are compared at each level, and the resulting difference values consolidated across the different levels to obtain a final difference value. In a training phase this difference value is used to drive the determination of the weights and biases of the auto-encoder, so as to obtain an auto-encoder trained for a particular input type, under the influence of the teacher neural network. In an operational mode, the difference value is compared to a threshold to determine whether a particular sample is anomalous or not. In certain embodiments, difference values at different levels may be scaled so as to be superimposed at a common resolution, thereby providing an error map indicating the location of anomalous values across the sample.
The examples described above are given as non-limitative illustrations of embodiments of the invention. They do not in any way limit the scope of the invention which is defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
20305106.5 | Feb 2020 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/052467 | 2/3/2021 | WO | |