The present application claims priority to Chinese patent application No. 202011212882.X, filed with the State Intellectual Property Office of China on Nov. 4, 2020 and entitled “Autoregression Image Abnormity Detection Method of Enhancing Latent Space Based on Memory”, the entire content of which is incorporated in the present application by reference.
The present application relates to the field of abnormity detection in computer vision, and in particular, to an autoregression image abnormity detection method of enhancing a latent space based on memory.
Abnormity detection, also known as outlier detection or novelty detection, is a detection process that finds objects whose behavior differs significantly from what is expected. The detected objects are also called abnormal points or outliers. Abnormity detection is widely used in production and daily life, for example in credit card anti-fraud, advertisement click anti-cheating, and network intrusion detection.
With the rise of deep learning in recent years, research on abnormity detection in computer vision has begun to flourish. Abnormity detection in computer vision follows the general definition of abnormity detection, with images, videos and other such information regarded as the input objects. For example, images that do not conform to the expected type are found among a large number of images; wrongly produced components are detected during industrial production; and abnormity detection is applied to surveillance videos to automatically analyze abnormal behaviors, objects and the like. Owing to the rapid development of computers and the rapid expansion of data, a technology capable of analyzing information such as images and videos is urgently demanded.
With the development of machine learning, especially deep learning technology, image abnormity detection technologies based on machine learning emerge continuously. Compared with traditional abnormity detection, a more compact information expression needs to be extracted for images. In the traditional machine learning stage, abnormity detection requires manually analyzing the data distribution, designing appropriate characteristics, and then using traditional machine learning algorithms (support vector machines, isolation forests, etc.) to model and analyze the data. Compared with traditional machine learning, deep learning can automatically learn the characteristics of the data and then model and analyze them, which gives higher robustness.
At present, the abnormity detection methods in computer vision mainly include: methods based on reconstruction loss differences, methods based on classification learning, and methods based on density estimation.
Of course, many algorithms that are variants and combinations of the above algorithms may also be used for abnormity detection, including the combination of an autoencoder with a generative adversarial network, and the combination of an autoencoder with a density estimation method.
However, existing abnormity detection methods lack clear supervision information (abnormal data is difficult to collect, and collecting normal data is too time-consuming and laborious to yield complete data), and therefore struggle to achieve good results. In particular, models based on deep autoencoders lack a good solution to problems such as large data distribution and large data variance.
The present application provides an autoregression image abnormity detection method of enhancing a latent space based on memory that can better determine abnormal images.
The technical solution adopted by the present application for solving its technical problems is as follows.
on to obtain ŵ, and finally making
expressed by the memory;
and finally restoring to an original size after passing through the up-sampling module twice, with the number of channels changing from 64 to 32 and then to 16; and
Optionally, in Step 5, the model may have a loss function of:
$L = L_{rec} + \alpha L_{llk} + \beta L_{mem}$;
where $L_{llk}$ represents a negative log likelihood loss;
$L_{mem}$ represents an entropy of a weight coefficient of the characteristic and the memory module; and α, β respectively represent the weight coefficients of the loss function, which balance the ratio of the different losses. α and β differ between data sets. For MNIST and CIFAR10, α is equal to 1 and 0.1, respectively, and β is equal to 0.0002 for both.
The beneficial effects of the present application at least comprise the following.
With the above-mentioned autoregression image abnormity detection method of enhancing a latent space based on memory, an autoregression model of enhancing a latent space based on memory is constructed and trained. No prior distribution needs to be set, so the distribution of the data itself is not damaged; the model is prevented from reconstructing abnormal images; and abnormal images can ultimately be judged better.
The technical solutions of the present application will be described in detail below with reference to the drawings and embodiments.
This embodiment proposes an autoregression image abnormity detection method of enhancing a latent space based on memory. The flowchart is shown in
S1: selecting a data set, and dividing the data set into a training set and a test set.
In this embodiment, two mainstream image abnormity detection data sets, namely MNIST and CIFAR10, are selected for experimentation.
The MNIST data set is a handwritten digit data set that is chosen for many tasks. It contains a training set of 60,000 examples and a test set of 10,000 examples. The data set contains handwritten digits 0-9, in 10 categories in total. Each image is a gray-scale image with a size of 28*28.
The CIFAR10 data set is a color image data set of more general objects. It contains 50,000 training images and 10,000 test images. It contains 10 categories of color RGB images: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks, wherein each picture is a 32*32 color image.
The above two data sets are selected to verify the adaptability and robustness of the model with respect to different types of data sets. Both MNIST and CIFAR10 contain 10 categories, and most experiments choose these two data sets. The 10 categories fit the setting of abnormity detection well and provide data diversity.
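One way the 10 categories can be mapped onto an abnormity detection setting is sketched below. It assumes a one-class protocol in which a single category is treated as normal and all other categories as abnormal; the text above does not state this protocol, and the function name `one_class_split` is hypothetical.

```python
def one_class_split(train_labels, test_labels, normal_class):
    """Assumed protocol: train only on the normal class, test on all classes.

    Returns the indices of training examples belonging to the normal class,
    and a 0/1 flag per test example (1 = abnormal, 0 = normal).
    """
    train_indices = [i for i, y in enumerate(train_labels) if y == normal_class]
    test_is_abnormal = [0 if y == normal_class else 1 for y in test_labels]
    return train_indices, test_is_abnormal


# Toy labels standing in for MNIST or CIFAR10 category labels.
train_idx, test_flags = one_class_split([0, 1, 0, 2], [0, 1, 2, 0], normal_class=0)
```

Repeating the split once per category would give 10 experiments per data set under this assumed protocol.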
S2: constructing a network structure of an autoregression model of enhancing a latent space based on memory.
As shown in
the autoregression module may be configured to model the data using the characteristics of the latent space and fit to a true distribution, with a fitting process expressed by following formula:
Here, the memory module stores a sparse, distributed characteristic expression, which strengthens the generation effect of the autoencoder, and it limits the weight, which effectively prevents the model from reconstructing abnormal images.
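A minimal sketch of reading from such a memory is given below. The cosine similarity, the softmax normalization, and the names `cosine_similarity`, `softmax`, and `read_memory` are assumptions made for illustration, and the weight-limiting step mentioned above is not modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def softmax(values):
    """Numerically stable softmax so that the weights sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(z, memory):
    """Express z as a weighted combination of the memory blocks."""
    similarities = [cosine_similarity(z, block) for block in memory]
    weights = softmax(similarities)
    z_hat = [
        sum(w * block[d] for w, block in zip(weights, memory))
        for d in range(len(z))
    ]
    return weights, z_hat


memory = [[1.0, 0.0], [0.0, 1.0]]  # two toy memory blocks
weights, z_hat = read_memory([1.0, 0.0], memory)
```

Because the output is built only from stored blocks, a latent code far from every block cannot be reproduced exactly, which is the intuition behind limiting the reconstruction of abnormal images.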
See
In this embodiment, the decoder network structure of the autoencoder may include a fully connected layer, two up-sampling modules, and a convolutional layer. Each block uses a residual network structure and is composed of three sub-structures in cascade, which can respectively be: transposed convolutional layer + batch normalization + activation function; convolutional layer + batch normalization + activation function; and transposed convolutional layer + batch normalization + activation function.
The network structure of the autoregression module is constructed using the structure shown in
The network structure of the memory module can be constructed by using the structure shown in
It should be pointed out that the encoder in the autoencoder is expressed mathematically as $z = en(X)$ and the decoder as $\hat{X} = de(z)$; the autoregression module $z_{dist} = H(z)$ and $\hat{z}$ act on $z$, in which case $\hat{X} = de(\hat{z})$.
In a specific application process, the process that the autoencoder processes an image may include the following steps:
a. inputting one image with a size of N*N, wherein in the encoding stage of the autoencoder, the size of the image is halved each time it passes through the down-sampling module, the number of channels changes from 1 to 32 and then to 64, and finally, after a flattening operation, the result is input to the fully connected layer in the encoder to obtain the latent space $z \in \mathbb{R}^{64}$, and at this time $\hat{z} \in \mathbb{R}^{64}$;
b. sending z to the memory module to obtain a similarity between z and each block of memory, and performing, for one time, an operation of
on to obtain and finally making
expressed by the memory;
c. passing $z \in \mathbb{R}^{64}$ through the fully connected layer of the decoder to obtain a characteristic of size of
and finally restoring to an original size after passing through the up-sampling module twice, with the number of channels changing from 64 to 32 and then to 16; and
d. restoring the characteristics to an original image space through a last convolutional layer.
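The shape changes in the encoding stage of step a can be traced with a short sketch. It assumes two down-sampling modules, each halving the spatial size, which is inferred from the channel counts (1 to 32 to 64) and from the two up-sampling passes in step c; the function name `encoder_shapes` is hypothetical.

```python
def encoder_shapes(n, in_channels=1, channels=(32, 64), latent_dim=64):
    """Trace tensor shapes through the assumed encoder layout.

    Each down-sampling module is assumed to halve the spatial size and to
    set the channel count to the next entry of `channels`.
    """
    shapes = [(in_channels, n, n)]
    size = n
    for out_channels in channels:
        if size % 2 != 0:
            raise ValueError("spatial size must be even before halving")
        size //= 2
        shapes.append((out_channels, size, size))
    shapes.append((channels[-1] * size * size,))  # after flattening
    shapes.append((latent_dim,))  # after the fully connected layer
    return shapes


mnist_shapes = encoder_shapes(28)  # 28*28 gray-scale input
```

For a 28*28 input the trace ends in a 64*7*7 = 3136-element vector before the fully connected layer maps it to the 64-dimensional latent space.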
S3: preprocessing the training set.
In the process of training the model, all images are resized to N*N and converted to the corresponding image space. Depending on the data, operations such as random rotation, flipping, and noise addition can be applied as appropriate.
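The preprocessing operations can be illustrated with a small sketch on images stored as nested lists of pixel values in [0, 1]. The nearest-neighbor resizing, the Gaussian noise model, and the function names are assumptions; the embodiment does not specify them.

```python
import random


def resize_nearest(image, n):
    """Resize a 2-D image to n*n using nearest-neighbor sampling (assumed method)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // n][c * width // n] for c in range(n)]
        for r in range(n)
    ]


def horizontal_flip(image):
    """Mirror every row of the image."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tiny = [[0.0, 1.0], [0.5, 0.25]]
resized = resize_nearest(tiny, 4)
augmented = add_gaussian_noise(horizontal_flip(resized), sigma=0.05, rng=rng)
```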
S4: initializing the autoregression model of enhancing a latent space based on memory.
Since model initialization effectively helps the network to train and converge, a random initialization method is used here for the autoencoder module and the autoregression module. The random initialization keeps the network weights as small as possible, and the biases are set to zero.
As for the memory module $M \in \mathbb{R}^{N \times \text{feature\_dim}}$, N represents the size of the memory module, and feature_dim indicates that the size of the information stored in each block of memory is consistent with the dimension of the latent space. For every block n of the N blocks, each of its feature_dim entries is initialized by sampling from the uniform distribution π ~ U(0,1); that is, every block in the memory is initialized.
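The uniform initialization of the memory can be sketched as follows. The function name `init_memory` and the seed parameter are hypothetical; the sizes in the example (100 blocks of dimension 64) are the MNIST values given in the training settings of this embodiment.

```python
import random


def init_memory(num_blocks, feature_dim, seed=0):
    """Create a num_blocks x feature_dim memory with entries drawn from U(0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(feature_dim)] for _ in range(num_blocks)]


memory = init_memory(num_blocks=100, feature_dim=64)
```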
S5: using the preprocessed training set to train the initialized autoregression model of enhancing a latent space based on memory.
In the training process, two data sets, MNIST and CIFAR10, are mainly used.
Here, the sizes of the images input to the network are 28*28 and 32*32, respectively; feature_dim is set to 64 for both; the output dimension of the autoregression module is 100 for both; the numbers of memory blocks are set to 100 and 500, respectively; and the batch size is 256 for both. The learning rates are set to 0.0001 and 0.001, respectively. The Adam optimizer is used for learning. The total number of epochs is set to 100, and the learning rate is multiplied by 0.1 every 20 epochs. Here, the memory module is initialized with a uniform distribution and given a separate learning rate, which effectively solves the problem that the memory module is difficult to train.
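The training settings and the step decay of the learning rate can be collected in a short sketch. The dictionary layout and the names `TRAIN_CONFIG` and `learning_rate` are hypothetical; the numeric values are those stated above. The separate learning rate of the memory module is not shown, since its value is not given.

```python
TRAIN_CONFIG = {
    "MNIST": {
        "image_size": 28, "feature_dim": 64, "ar_output_dim": 100,
        "memory_blocks": 100, "batch_size": 256, "base_lr": 1e-4,
    },
    "CIFAR10": {
        "image_size": 32, "feature_dim": 64, "ar_output_dim": 100,
        "memory_blocks": 500, "batch_size": 256, "base_lr": 1e-3,
    },
}
TOTAL_EPOCHS = 100
DECAY_EVERY = 20    # epochs between decays
DECAY_FACTOR = 0.1  # multiplier applied at each decay


def learning_rate(dataset, epoch):
    """Learning rate at a zero-based epoch under the stated step decay."""
    base_lr = TRAIN_CONFIG[dataset]["base_lr"]
    return base_lr * DECAY_FACTOR ** (epoch // DECAY_EVERY)


schedule = [learning_rate("MNIST", e) for e in range(TOTAL_EPOCHS)]
```

Under this schedule the rate is decayed four times over the 100 epochs, at epochs 20, 40, 60, and 80.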
In addition, the loss function of the model is as follows:
$L = L_{rec} + \alpha L_{llk} + \beta L_{mem}$;
where $L_{rec}$ represents the reconstruction loss of the original image with respect to the reconstructed image,
$L_{llk}$ represents the negative log likelihood loss,
$L_{mem}$ represents the entropy of the weight coefficient of the characteristic and the memory module, and α, β respectively represent the weight coefficients of the loss function, which balance the ratio of the different losses. α and β differ between data sets. For MNIST and CIFAR10, α is equal to 1 and 0.1, respectively, and β is equal to 0.0002 for both.
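The weighted combination of the three losses can be sketched as below, using the α and β values stated for each data set. The three loss terms are taken as already computed numbers, and the entropy helper shows one plausible reading of the memory term (the Shannon entropy of the addressing weights), which is an assumption; the names `LOSS_WEIGHTS`, `addressing_entropy`, and `total_loss` are hypothetical.

```python
import math

# (alpha, beta) per data set, as stated in this embodiment.
LOSS_WEIGHTS = {"MNIST": (1.0, 0.0002), "CIFAR10": (0.1, 0.0002)}


def addressing_entropy(weights, eps=1e-12):
    """Shannon entropy of the memory weights (assumed form of the memory term)."""
    return -sum(w * math.log(w + eps) for w in weights)


def total_loss(l_rec, l_llk, l_mem, dataset):
    """Combine the losses as L = L_rec + alpha * L_llk + beta * L_mem."""
    alpha, beta = LOSS_WEIGHTS[dataset]
    return l_rec + alpha * l_llk + beta * l_mem


uniform_entropy = addressing_entropy([0.25, 0.25, 0.25, 0.25])
example_loss = total_loss(1.0, 2.0, 3.0, "MNIST")
```

Minimizing the entropy term pushes the addressing weights toward a few memory blocks, which is consistent with the sparse expression described for the memory module.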
S6: verifying the trained autoregression model of enhancing a latent space based on memory through the test set, and using the trained autoregression model of enhancing a latent space based on memory, to judge whether an input image is an abnormal image.
This embodiment mainly uses the area under the ROC curve (AUC) to evaluate the method. This indicator is usually calculated from the four elements of the confusion matrix of a classification problem: True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN), wherein the confusion matrix is shown in Table 1 below:
Table 1
 | Predicted positive | Predicted negative |
---|---|---|
Actual positive | True Positive (TP) | False Negative (FN) |
Actual negative | False Positive (FP) | True Negative (TN) |
In addition, the following formulas are calculated:
$TPR = \frac{TP}{TP + FN}$, $FPR = \frac{FP}{FP + TN}$
The ROC curve is defined by two coordinates, the abscissa FPR and the ordinate TPR. One curve can be drawn by varying the threshold. AUC is the area under this curve.
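The construction of the ROC curve and the AUC can be sketched in a few lines. The sketch assumes that a higher score means "more abnormal" and that label 1 marks the positive (abnormal) class; the function names and the toy scores are illustrative only.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold downward."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this threshold before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


toy_points = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
toy_auc = auc(toy_points)
```

An AUC of 1 means every abnormal example scores above every normal one, while 0.5 corresponds to random ordering.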
In addition, the performance of the model can be tested on the MNIST and CIFAR10 data sets, respectively, where it achieves good performance compared with currently popular methods. The test comparison results are shown in
It can be seen from
The present application provides an autoregression image abnormity detection method of enhancing a latent space based on memory, which belongs to the field of abnormity detection in computer vision. The present application comprises: selecting a training data set; constructing a network structure of an autoregression model of enhancing a latent space based on memory; preprocessing the training data set; initializing the autoregression model of enhancing a latent space based on memory; training the autoregression model of enhancing a latent space based on memory; and verifying the model on the selected data set, and using the trained model to judge whether the input image is an abnormal image. The present application does not need to set a prior distribution such that the distribution of the data itself will not be damaged, and it can prevent the model from reconstructing abnormal images, and ultimately can better judge abnormal images.
In addition, it can be understood that the autoregression image abnormity detection method of enhancing a latent space based on memory of the present application is reproducible and can be used in a variety of industrial applications. For example, the autoregression image abnormity detection method of enhancing a latent space based on memory of the present application can be used in applications that require image abnormity detection.
Number | Date | Country | Kind |
---|---|---|---|
202011212882.X | Nov 2020 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/122056 | 9/30/2021 | WO |