The present application claims priority to Chinese Patent Application No. 202211687353.4, filed on Dec. 27, 2022, the content of which is incorporated herein by reference in its entirety.
The present disclosure relates to fields such as industrial defect detection, semantic segmentation, and abnormality detection, and in particular, to a method for detecting an abnormal defect on a steel surface based on semi-supervised contrastive learning.
The value of semantic segmentation algorithms in the field of industrial defect detection is beyond doubt. Industrial defect detection refers to the inspections carried out before and after production and processing to ensure the safety and quality of industrial products, such as aerospace vehicles and micro/nano-scale electronic components. Industrial defect detection is therefore a key link and technology for ensuring the quality of industrial products and the safety and stability of production.
However, industrial defect detection technology based on semantic segmentation faces numerous challenges, which are closely related to the real industrial environment. Examples include insufficient defect samples, unclear and incomplete definitions of defect types, poor visibility of defects, and variable defect shapes. These problems have led to a series of technical difficulties in practical scenes, hindering the development of industrial defect detection. The specific problems are as follows: 1) it is difficult to annotate data in real industrial scenarios, and human annotation has a large error; 2) the amount of annotated data is limited, while the algorithm model relies heavily on the data; and 3) there are a large number of normal samples in real industrial scenes, while the number of abnormal samples is too small, so the positive and negative samples are unbalanced.
An object of the present disclosure is to provide a method for detecting an abnormal defect on a steel surface based on semi-supervised contrastive learning, to address the shortcomings of existing semantic segmentation methods for industrial defect detection.
The object of the present disclosure is achieved by the following technical solution: a method for detecting an abnormal defect on a steel surface based on semi-supervised contrastive learning. The method includes the following steps:
Step (1), obtaining steel surface data in a real industrial scene, selecting images of the machine tool area taken from different perspectives, taking data with a surface defect as an abnormal sample, and taking data without the surface defect as a normal sample, wherein the surface defect comprises edge cracks and creases.
Step (2), simulating abnormality for the normal sample by generating Perlin noise that imitates abnormalities in the real industrial scene, and combining the noise with the equally scaled normal sample to obtain a simulated abnormal sample image Ic:
where Ir represents the compressed, equally scaled version of a normal sample image I, A is a texture pattern of a randomly simulated abnormality, Pt is a binary image of a random Perlin noise image P, β represents a proportion coefficient, which is the fusion ratio between the normal sample image and a simulated abnormal defect, ⊙ represents pixel-by-pixel multiplication between image matrices, and
Step (3), constructing an abnormality reconstruction network based on an encoder-decoder structure for learning how to restore and reconstruct the abnormal sample into the normal sample, using the simulated abnormal sample of the step (2) as a network input, obtaining a reconstructed recovery sample, calculating a loss relative to the normal sample, and training the abnormality reconstruction network.
Step (4), constructing a contrastive discriminative network based on semantic segmentation, inputting the abnormal sample of the step (1) into the trained abnormality reconstruction network to obtain the reconstructed recovery sample, concatenating the reconstructed recovery sample with the input abnormal sample along the channel dimension as an input of the contrastive discriminative network, obtaining a difference between the reconstructed recovery sample and the input abnormal sample by contrastive learning, and outputting a steel surface defect detection result.
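The data flow of step (4) can be sketched as follows. This is a minimal illustration, not the networks of the disclosure: `detect_defects`, `median_reconstruct`, and `difference_discriminate` are hypothetical names, the median fill is only a placeholder for the trained abnormality reconstruction network, and the thresholded difference is only a placeholder for the contrastive discriminative network.

```python
import numpy as np

def detect_defects(image, reconstruct, discriminate):
    """Reconstruct the input, stack both images on the channel axis, segment."""
    restored = reconstruct(image)                       # (h, w, c)
    pair = np.concatenate([restored, image], axis=-1)   # (h, w, 2c)
    return discriminate(pair)

def median_reconstruct(image):
    # Placeholder (assumption): a trained encoder-decoder would go here.
    flat = np.median(image, axis=(0, 1))
    return np.broadcast_to(flat, image.shape).copy()

def difference_discriminate(pair, threshold=0.2):
    # Placeholder (assumption): a segmentation network would go here.
    c = pair.shape[-1] // 2
    restored, original = pair[..., :c], pair[..., c:]
    diff = np.abs(restored - original).mean(axis=-1)
    return (diff > threshold).astype(np.uint8)

# Toy input: a uniform surface with one bright 2x2 "defect".
image = np.full((8, 8, 3), 0.5)
image[2:4, 2:4] = 1.0
mask = detect_defects(image, median_reconstruct, difference_discriminate)
```

In this toy case the mask marks only the 2×2 patch, because that is the only region where the restored image and the input differ.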
In an embodiment of the present disclosure, a mask dilated convolution module is embedded in an encoder of the abnormality reconstruction network to expand a receptive field of the abnormality reconstruction network, and a transformer is used to replace a fully connected integration operation of the mask dilated convolution module to achieve feature aggregation.
In an embodiment of the present disclosure, for the contrastive discriminative network, a self-attention mechanism module is used to obtain channel and spatial self-attention over the input of the contrastive discriminative network, so as to optimize the steel surface defect detection result.
In an embodiment of the present disclosure, for an input feature map of the contrastive discriminative network, attention is extracted in a channel-before-spatial manner, and after each attention extraction the extracted attention is multiplied with the feature map it was computed from and the result is fed back.
In an embodiment of the present disclosure, the texture pattern of the randomly simulated abnormality enhances a diversity of a simulated defect by randomized data augmentation, and wherein three of the following are randomly selected and combined for usage: rotation, affine transformation, image brightness, sharpness, equalization value, contrast and saturation.
In an embodiment of the present disclosure, the compressed image Ir and the abnormal binary image Pt overlap each other to obtain a simulated abnormal portion comprising original image information, while the compressed image Ir and the inverse image of the abnormal binary image Pt overlap each other to obtain an image region portion without the simulated abnormality.
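The overlap described in this embodiment can be written compactly. The following is one form consistent with the symbol definitions given for step (2). It is an assumption, not a formula quoted from the disclosure: in particular, the assignment of β and 1−β to the two masked terms and the notation $\overline{P_t}$ for the inverse of the binary image are illustrative.

```latex
\[
  I_c \;=\; \overline{P_t} \odot I_r
        \;+\; (1-\beta)\,\bigl(P_t \odot I_r\bigr)
        \;+\; \beta\,\bigl(P_t \odot A\bigr),
  \qquad \overline{P_t} = 1 - P_t .
\]
```

Under this form, pixels where $P_t = 0$ are copied unchanged from $I_r$, and pixels where $P_t = 1$ are a blend of the original image and the texture $A$.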
Advantages of the present disclosure are as follows.
(1) More realistic simulation of industrial abnormal defects and preservation of image defect features are achieved based on abnormal simulation with Perlin noise and equally scaled industrial images, providing a foundation for the training of abnormality reconstruction networks.
(2) Compared to the application of basic semantic segmentation methods in the industrial defect detection, adopting a semi-supervised contrastive learning structure of “abnormality reconstruction+contrastive discriminative” is capable of better utilizing the information value of normal samples, thereby reducing model data dependence.
(3) The self-supervised learning module of mask dilated convolution is used to enhance the ability of image restoration and reconstruction, providing convenient conditions for contrastive learning.
(4) A self-attention mechanism operating at the spatial and channel levels of the CNN is used to self-learn the information of contrastive images, improving the final segmentation detection effect.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the related art, the drawings that need to be used in the description of the embodiments or the related art will be briefly described below. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and for those skilled in the art, other drawings may further be obtained according to these drawings without inventive effort.
The preferred embodiments of the present disclosure will now be described with reference to the drawings. To explain the structure, process, and functions of the present disclosure more clearly, the method of the present disclosure is divided into three parts for detailed explanation.
1. Abnormal data simulation.
(1) The steel surface data in the real industrial scene is obtained, the images of the machine tool area taken from different perspectives are selected to form a dataset, the data with the surface defect is taken as the abnormal sample, and the data without the surface defect is taken as the normal sample. The surface defect comprises edge cracks and creases.
(2) A noise image P is randomly generated from Perlin noise, which effectively simulates random and abnormal defects of different shapes and sizes. Because Perlin noise is not a binary image, it cannot be used directly as a defect abnormality image. It must first be binarized with a uniformly sampled random threshold to obtain an image Pt as shown in
(3) For abnormality simulation, it is required not only to use the binary image Pt to randomize the position of simulated defects, but also to randomize the texture information of the defects at the same time. The present disclosure randomly selects different textures as defect textures, and A in
(4) For an original image I, the images in the dataset all have a resolution of 1440×2560, and most of them contain only small defects. If an image is forcibly compressed to 256×256 to match the resolution of the network input, much of the information on the small defects is lost. Therefore, the present disclosure first overlays the original image on a 2560×2560 square image, and then compresses the whole to a resolution of 256×256. By preserving the shape information of small defects to a certain extent for subsequent detection, the compressed image Ir as shown in
Finally, as shown in
Where β represents the proportion coefficient, which is the fusion ratio between the original image and a simulated abnormal defect. This design not only produces a random noise abnormality, but also incorporates the information of the original image, making the final simulated defect abnormality information closer to the pixel distribution characteristics of the original image. ⊙ represents pixel-by-pixel multiplication between image matrices, and
To account for the actual range, region truncation is performed on the simulated abnormal image Ic and the simulated abnormality Pt to obtain the final results IA and PA.
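The abnormal data simulation described in this part can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: all function names are hypothetical, the image is centered on the square canvas, block averaging is used as the compression method, the threshold is sampled between the minimum and maximum of the noise, and β and 1−β are assigned to the texture and the original image as shown in `simulate_anomaly`. The final region truncation step is not covered. A small 90×160 image stands in for the 1440×2560 images so that the example runs quickly.

```python
import numpy as np

def pad_to_square(img, fill=0.0):
    """Center an (h, w, c) image on a square canvas of side max(h, w)."""
    h, w, c = img.shape
    side = max(h, w)
    canvas = np.full((side, side, c), fill, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def downsample(img, factor):
    """Block-average downsampling; side lengths must be divisible by factor."""
    h, w, c = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def perlin_noise(size, cells, rng):
    """2D Perlin noise on a size x size grid with `cells` lattice cells per side."""
    assert size % cells == 0
    step = size // cells
    # Random unit gradient vectors at the lattice corners.
    angles = rng.uniform(0.0, 2.0 * np.pi, (cells + 1, cells + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    coords = np.arange(size) / step
    cell = coords.astype(int)       # lattice cell index of each pixel
    frac = coords - cell            # offset of each pixel inside its cell
    iy, ix = np.meshgrid(cell, cell, indexing="ij")
    fy, fx = np.meshgrid(frac, frac, indexing="ij")

    def corner(dy, dx):
        # Dot product of the corner gradient with the offset to that corner.
        g = grads[iy + dy, ix + dx]
        return g[..., 0] * (fy - dy) + g[..., 1] * (fx - dx)

    def fade(t):
        return 6 * t**5 - 15 * t**4 + 10 * t**3

    u, v = fade(fy), fade(fx)
    top = corner(0, 0) * (1 - v) + corner(0, 1) * v
    bottom = corner(1, 0) * (1 - v) + corner(1, 1) * v
    return top * (1 - u) + bottom * u

def binarize(noise, rng):
    """Binarize with a uniformly sampled threshold (range is an assumption)."""
    threshold = rng.uniform(noise.min(), noise.max())
    return (noise > threshold).astype(noise.dtype)

def simulate_anomaly(ir, texture, pt, beta):
    """Blend texture into ir where pt == 1; keep ir where pt == 0."""
    m = pt[..., None]
    return (1 - m) * ir + (1 - beta) * m * ir + beta * m * texture

rng = np.random.default_rng(0)
image = rng.random((90, 160, 3))                    # stand-in for 1440x2560
ir = downsample(pad_to_square(image), factor=10)    # 160x160 -> 16x16
p = perlin_noise(size=16, cells=4, rng=rng)
pt = binarize(p, rng)
texture = rng.random(ir.shape)                      # stand-in for texture A
ic = simulate_anomaly(ir, texture, pt, beta=0.5)
```

With the resolutions given in the text, the same calls would pad 1440×2560 to 2560×2560 and use `factor=10` to reach 256×256.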
2. Construction of the abnormality reconstruction network and the contrastive discriminative network.
(1) A semi-supervised contrastive learning architecture of “abnormality reconstruction+contrastive discriminative” is constructed. For this network structure, a two-stage semi-supervised training mode is designed: the reconstruction network is first trained with the normal samples, and the discriminative network is then trained with the abnormal samples. The reconstruction network can be the generator of a GAN or an encoder-decoder model based on an FCN, in which convolutional layers downsample and upsample the basic features. The contrastive discriminative network can be a basic semantic segmentation network such as UNet, SegNet, or DeepLab. This design fully utilizes the information value of the previously unusable normal samples, which play a crucial role in reconstructing abnormal samples.
(2) In the first stage shown in
(3) In the second stage shown in
3. Optimization design for the abnormality reconstruction network and the contrastive discriminative network.
(1) For the abnormality reconstruction network, the mask dilated convolution module (Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection) is used to expand the receptive field of the abnormality reconstruction network, so that the information around each pixel is aggregated and the restoration result of the current pixel is predicted more accurately. At the same time, a Transformer is used to replace the basic fully connected integration operation in feature fusion, achieving better feature aggregation.
The mask dilated convolution requires two parameters to be determined, namely a dilation rate d and a convolution size k, so that the convolution used is k×k×c, where c is the number of channels in the convolutional kernel.
For an image of size h×w×c, the first step is to pad the image by a distance of k+d on every side to obtain an extended image of (h+2k+2d)×(w+2k+2d)×c.
The extended image is divided into four sub-image blocks: upper left corner, lower left corner, upper right corner, and lower right corner. Each sub-image block has a size of (h+k+2d)×(w+k+2d)×c. Convolution is applied to each of the four sub-image blocks $P_i \in \mathbb{R}^{(h+k+2d) \times (w+k+2d) \times c}$, ∀i∈{1,2,3,4}, and the results are fused and activated. This corresponds to predicting the actual value of each pixel in the original image from the four perspectives surrounding it, thereby achieving self-supervised learning.
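The idea of predicting a pixel from four surrounding corner windows can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the module of the disclosure: `masked_dilated_conv` is a hypothetical name, the padding is zero padding, fusion is a plain sum, the activation is ReLU, and the output has a single channel. The corner offset `k + 2 * d + 1` is chosen here so that each output pixel sees four k×k windows separated from it by a gap of d pixels; this indexing is an assumption and is not derived from the sub-block size stated above.

```python
import numpy as np

def masked_dilated_conv(x, kernels, d):
    """x: (h, w, c); kernels: (4, k, k, c), one k x k kernel per corner."""
    k = kernels.shape[1]
    h, w, _ = x.shape
    pad = k + d
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))   # zero padding
    off = k + 2 * d + 1                                # jump over gap + center
    corners = [(0, 0), (off, 0), (0, off), (off, off)]
    out = np.zeros((h, w))
    for kern, (oy, ox) in zip(kernels, corners):
        for i in range(k):
            for j in range(k):
                patch = xp[oy + i:oy + i + h, ox + j:ox + j + w, :]
                out += (patch * kern[i, j]).sum(axis=-1)
    return np.maximum(out, 0.0)                        # ReLU

rng = np.random.default_rng(0)
x = rng.random((12, 12, 3))
kernels = rng.standard_normal((4, 3, 3, 3))            # k = 3, c = 3
out = masked_dilated_conv(x, kernels, d=1)

# Changing a pixel must not change the prediction at that same pixel.
x_perturbed = x.copy()
x_perturbed[6, 6] += 10.0
out_perturbed = masked_dilated_conv(x_perturbed, kernels, d=1)
```

Because the center pixel and its d-pixel neighborhood lie outside all four windows, `out[6, 6]` is the same for `x` and `x_perturbed`, which is the masking property that makes the prediction self-supervised.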
The feature map output $Z \in \mathbb{R}^{h \times w \times c}$ obtained from the mask dilated convolution represents the feature maps of size h×w obtained from the c mask dilated convolutions. Firstly, spatial average pooling is used to reduce the feature map to the pooled feature map $\hat{Z} \in \mathbb{R}^{h' \times w' \times c}$, where h′≤h and w′≤w. Next, a reshape operation is performed on $\hat{Z}$ along the channel dimension to obtain the flattened features $A \in \mathbb{R}^{c \times n}$, where n=h′×w′, and A represents the flattened features of the different channels. Afterwards, tokens $T \in \mathbb{R}^{c \times d}$ ... $\mathbb{R}^{c \times d}$
(2) The Self-Attention Mechanism Module of the Contrastive Discriminative Network
For the two samples input to the contrastive discriminative network, whose differences are the abnormal defects to be segmented, a scheme in which the channel attention module precedes the spatial attention module is adopted:
For the channel attention module, the spatial dimensions of the feature map are compressed in a way that differs from the commonly adopted approach of using average pooling alone to obtain attention. As in the Convolutional Block Attention Module (CBAM), both the average pooling method and the maximum pooling method are adopted to calculate the spatial information statistics, collecting the attention features of different channels and improving the representation ability of the network.
For the spatial attention module, the average pooling method and the maximum pooling method are adopted to compress channel dimension, effectively calculating the information statistics on the channel, and finally obtaining the attention features of different spaces.
Finally, the two attention modules are combined: for the feature maps, attention is extracted in a channel-before-spatial manner, and after each attention extraction the extracted attention is multiplied with the feature map it was computed from and the result is fed back, thereby obtaining the final attention mechanism module and improving the final effect of industrial defect detection.
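The channel-before-spatial scheme can be sketched in NumPy as follows. This is a minimal sketch in the style of CBAM, not the trained module: the function names are hypothetical, the weights are random, and the shared two-layer perceptron with reduction ratio 2 and the 7×7 spatial kernel are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (h, w, c). Pool over space, then score each channel."""
    avg = x.mean(axis=(0, 1))                  # (c,)
    mx = x.max(axis=(0, 1))                    # (c,)

    def mlp(v):                                # shared two-layer perceptron
        return np.maximum(v @ w1, 0.0) @ w2

    return sigmoid(mlp(avg) + mlp(mx))         # (c,)

def spatial_attention(x, kernel):
    """x: (h, w, c); kernel: (ks, ks, 2). Pool over channels, then convolve."""
    stats = np.stack([x.mean(axis=-1), x.max(axis=-1)], axis=-1)   # (h, w, 2)
    ks = kernel.shape[0]
    p = ks // 2
    padded = np.pad(stats, ((p, p), (p, p), (0, 0)))
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (ks, ks), axis=(0, 1))         # (h, w, 2, ks, ks)
    logits = np.einsum("hwcij,ijc->hw", windows, kernel)
    return sigmoid(logits)                     # (h, w)

def channel_then_spatial(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention on the result."""
    x = x * channel_attention(x, w1, w2)[None, None, :]
    x = x * spatial_attention(x, kernel)[..., None]
    return x

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 8))
w1 = rng.standard_normal((8, 4)) * 0.1         # reduction ratio 2 (assumption)
w2 = rng.standard_normal((4, 8)) * 0.1
kernel = rng.standard_normal((7, 7, 2)) * 0.1  # 7x7 kernel (assumption)
refined = channel_then_spatial(features, w1, w2, kernel)
```

Both attention maps lie strictly between 0 and 1, so each multiplication rescales the feature map without changing its shape.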
The steel surface defect dataset used in this embodiment includes 2387 training set images, of which 338 are normal samples and 2049 are abnormal samples, and 175 test set images. Each type of sample includes images from four different perspectives, and the normal samples are not labeled. For the test set, a certain number of images are randomly selected from each of the four perspectives, ultimately yielding 175 test set images that include 26 normal samples and 149 abnormal samples.
The abnormal samples in the training set and all samples in the test set are provided with two types of data annotation. The first is the data annotation for defect detection, which includes defect annotations for creases and edge cracks. The second is a pixel-level annotation of the machine tool range, which is mainly used for machine tool positioning.
The results of this experiment show that, for defect detection, the basic semantic segmentation method UNet has an MIOU (i.e., Mean Intersection over Union) of 74.5%. In contrast, the method designed by the present disclosure has an MIOU of 86.3%, and even when only 50% of the abnormal sample data is used, the MIOU of the method designed by the present disclosure still reaches 77.0%, demonstrating the superiority of the method.
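For reference, the MIOU metric quoted above can be computed as in the following sketch. The function name `mean_iou` is hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is an assumption about the evaluation protocol, which the text does not specify.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for integer label maps."""
    ious = []
    for cls in range(num_classes):
        p, t = pred == cls, target == cls
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                    # class absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x2 label maps: class 0 = background, class 1 = defect.
pred = np.array([[0, 1],
                 [1, 1]])
target = np.array([[0, 1],
                   [0, 1]])
score = mean_iou(pred, target, num_classes=2)
```

Here the background has an IoU of 1/2 and the defect class an IoU of 2/3, so `score` is their mean, 7/12.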
The above embodiments are used to explain the present disclosure, rather than limit the present disclosure, and any modifications and changes made to the present disclosure within the spirit and scope of the claims of the present disclosure will fall within the scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202211687353.4 | Dec 2022 | CN | national |