This application is based upon and claims priority to Chinese Patent Application No. 202210857092.X, filed on Jul. 20, 2022, the entire contents of which are incorporated herein by reference.
The present invention pertains to the technical field of classifying types of concrete cracks based on inverted residual (IR) 7-ECA and CBAM (EC) networks. Specifically, it relates to a fast, intelligent, vision-based method for classifying concrete crack types.
Concrete structures such as bridges, dams, and other civil infrastructures are susceptible to degradation and damage during operation. This can have a significant impact on safety, making it crucial to diagnose potential damage. The advancement of unmanned aerial vehicle and wireless transmission technologies has enabled the efficient collection of mass data on concrete structures. This serves as a basis for the development of intelligent classification. Compared to traditional methods for classifying structural damage, vision-based deep learning technology offers several advantages. It is highly efficient, precise, and objective when it comes to classifying cracks.
Deep learning neural networks are widely used to solve multi-classification problems in the computer field. They are designed to be universal and efficient networks that can be applied in various scenarios with high precision. However, when such a network is used for classifying concrete cracks, it only needs to distinguish up to ten types of cracks. As a result, using a universal network leads to parameter redundancy and wasted training time and hardware memory. Therefore, it is necessary to develop a dedicated neural network for classifying concrete cracks that is lightweight, converges quickly, and achieves high classification precision.
In order to solve the above problems, the present invention provides a crack type classification method based on an inverted residual (IR) 7-ECA and CBAM (EC) network model specially designed for crack classification of mass concrete images.
A quick and intelligent IR7-EC network based classification method for a concrete image crack type includes the following steps:
Preferably, the building an IR7-EC network model specifically includes the following step:
Preferably, training of the IR7-EC network model includes the following steps:
Preferably, normalization of the batch normalization layer specifically includes:
where xi is a feature map before input into batch normalization, yi is a feature map after output from batch normalization, m is the number of feature maps input into the layer in a current training batch, and γ and β are variables that change with update of a network gradient;
$$f(x_i) = \min(\max(x_i, 0), 6)$$
Preferably, the highlighting some channels with higher network precision by weighting network channels through the ECA attention mechanism, so as to obtain the enhanced concrete crack feature extraction map includes the following step:
where $|t|_{odd}$ represents the odd number nearest to $t$; $C$ is the number of channels of data input into the ECA attention mechanism, and $\gamma$ and $b$ are two hyper-parameters, $\gamma$ being 2 and $b$ being 1; and $E_s(F)$ is the ECA attention mechanism, $\sigma$ is a sigmoid operation, $f^{k \times k}[\cdot]$ represents a convolutional operation with a kernel of size $k \times k$, $F$ is an input feature map, and AvgPool( ) is average pooling.
Preferably, the inputting data after feature extraction through the 9th layer into the CBAM attention mechanism of the 10th layer, and further conducting feature extraction at a channel and space level on the data, so as to obtain a feature extraction map containing more crack information includes the following steps:
$$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$$
$$M_s(F) = \sigma(f^{7 \times 7}[\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)])$$
Preferably, the method further includes:
$$r_j^{(l)} \sim \mathrm{Bernoulli}(p)$$
$$\tilde{y}^{(l)} = r^{(l)} * y^{(l)}$$
Preferably, the computing an error through a loss function includes the following step:
Preferably, the method further includes:
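$$f(\theta) = \mathrm{Loss}(y_{o,c}, p_{o,c})$$
$$g_t = \nabla_\theta f_t(\theta_{t-1})$$
$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$$
$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$$
$$\hat{m}_t = m_t / (1 - \beta_1^t)$$
$$\hat{v}_t = v_t / (1 - \beta_2^t)$$
$$\theta_t = \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$$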
where $\mathrm{Loss}(y_{o,c}, p_{o,c})$ is a loss function between a network predicted value and a true value, $\theta$ is a parameter to be updated in a model, $g_t$ is a gradient obtained by differentiating the loss function $f(\theta)$ with respect to $\theta$, $\beta_1$ is a first-order moment attenuation coefficient, $\beta_2$ is a second-order moment attenuation coefficient, $m_t$ is an expectation of the gradient $g_t$, $v_t$ is an expectation of $g_t^2$, $\hat{m}_t$ is an offset correction of $m_t$, $\hat{v}_t$ is an offset correction of $v_t$, $\theta_{t-1}$ is a parameter before network update, and $\theta_t$ is a parameter after network update.
The present invention has the following beneficial effects:
According to the present invention, an IR7-EC network model specially designed for crack classification of mass concrete images is created based on the inverted residual structure and other machine vision algorithms. Compared with currently popular computer vision networks, the IR7-EC network model has fewer parameters and a shorter training time while maintaining higher concrete crack classification precision.
The present invention avoids the disadvantages of current general classification networks such as alexnet, vgg16, resnet50, GoogLeNet and mobilenet_v3_large, namely parameter redundancy, long training time and large hardware memory occupation. It has a small number of network model parameters, a high training convergence speed and high concrete crack classification precision. It thus forms a dedicated model for intelligently and efficiently classifying cracks from concrete image big data, and has great engineering application potential.
For making objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with the accompanying drawings and examples. It should be understood that specific examples described herein are merely used to explain the present invention, and are not used to limit the present invention.
The present invention provides a quick and intelligent inverted residual (IR) 7-ECA and CBAM (EC) network based classification method for a concrete image crack type. A flow diagram of using an IR7-EC network to classify concrete crack is as shown in
With reference to
Specifically, in step 1, after crack images are collected, the concrete crack images are manually classified, which include: images of a transverse crack, a vertical crack, an oblique crack, a mesh crack, an irregular crack, a hole, and a background, as shown in
Specifically, in step 2, pre-processing of the training set includes random horizontal flipping of images and image normalization, where the mean required for normalization is set as [0.485, 0.456, 0.406], and the standard deviation is set as [0.229, 0.224, 0.225]. Pre-processing of the validation set includes normalization, with the same parameters [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225].
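The pre-processing of step 2 can be sketched as follows. This is a minimal pure-Python illustration, not the implementation of the filing: it assumes pixel values are already scaled to [0, 1] and that an image is a channel-first nested list (C x H x W). The function names `hflip`, `normalize`, `preprocess_train` and `preprocess_val`, and the flip probability of 0.5, are hypothetical choices made for the example.

```python
import random

MEAN = [0.485, 0.456, 0.406]   # first per-channel parameter list given in the text
SCALE = [0.229, 0.224, 0.225]  # second per-channel parameter list given in the text

def hflip(image):
    """Mirror a channel-first image (C x H x W nested lists) left to right."""
    return [[row[::-1] for row in channel] for channel in image]

def normalize(image, mean=MEAN, scale=SCALE):
    """Subtract the per-channel mean and divide by the per-channel scale."""
    return [
        [[(px - m) / s for px in row] for row in channel]
        for channel, m, s in zip(image, mean, scale)
    ]

def preprocess_train(image, flip_prob=0.5, rng=random):
    """Training path: random horizontal flip, then normalization.
    flip_prob=0.5 is an assumption; the text does not state the probability."""
    if rng.random() < flip_prob:
        image = hflip(image)
    return normalize(image)

def preprocess_val(image):
    """Validation path: normalization only, no augmentation."""
    return normalize(image)
```

Keeping the random flip out of the validation path means validation precision is measured on unaltered images.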
In the example, the step of building a neural network model in step 3 includes the following step:
A 1st layer includes a convolutional layer with a convolutional kernel of 3×3, a batch normalization layer, and a hardswish activation function, a step size being 2, the number of input channels being 3, and the number of output channels being 16.
The 2nd to 8th layers are the inverted residual-efficient channel attention (ECA) structures. A comparison between the inverted residual-ECA structure and a traditional inverted residual structure is as shown in
A 9th layer includes a convolutional layer with a convolutional kernel of 3×3, a batch normalization layer and a hardswish activation function, a step size being 1, the number of input channels being 96, and the number of output channels being 576.
A 10th layer is the CBAM attention mechanism and specifically includes two parts: a first part being a channel attention mechanism, which includes an average pooling layer, a maximum pooling layer, the fully connected layer 1, an ReLU6 activation function, the fully connected layer 2, and a sigmoid function; and a second part being a space attention mechanism, which includes an average pooling layer, a maximum pooling layer, a convolutional layer with a convolutional kernel of 7×7, and a sigmoid function.
An 11th layer is an average pooling layer.
A 12th layer includes a one-dimensional convolutional layer, a hardswish activation function, and a dropout layer with a sparsity ratio of 0.2.
A 13th layer is a one-dimensional convolutional layer.
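Because the small parameter count is a stated advantage of the network, it may help to see how the counts for the two fully specified convolutional layers arise. The sketch below is an illustration under assumptions: it treats the 1st and 9th layers as standard (non-grouped) convolutions without bias terms, which the text does not state, and the helper names `conv_weights` and `bn_params` are hypothetical.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Weights of a standard 2-D convolution (bias assumed absent)."""
    return kernel * kernel * in_ch * out_ch

def bn_params(channels):
    """Learnable batch-normalization parameters: one gamma and one beta per channel."""
    return 2 * channels

# 1st layer: 3x3 kernel, 3 input channels, 16 output channels.
layer1 = conv_weights(3, 3, 16) + bn_params(16)     # 432 + 32

# 9th layer: 3x3 kernel, 96 input channels, 576 output channels.
layer9 = conv_weights(3, 96, 576) + bn_params(576)  # 497664 + 1152
```

The count grows with the product of input and output channels, which is why the channel widths chosen for each layer dominate the model size.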
With reference to
Specifically, in order to prevent a vanishing network gradient, normalization of the batch normalization layer in each layer is conducted through the following formulas:
where xi is a feature map before input into batch normalization, yi is a feature map after output from batch normalization, m is the number of feature maps input into the layer in a current training batch, and γ and β are variables that change with update of a network gradient.
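The normalization formulas themselves are not reproduced in this text. As a sketch only, the code below assumes the commonly used form of batch normalization, which matches the symbols defined above: the batch mean and variance are computed over the m inputs, each input is standardized, and the result is scaled by γ and shifted by β. The stabilizing constant `eps` and the function name `batch_norm` are assumptions.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of m scalar activations belonging to one channel.
    eps is a small stabilizing constant (assumed; not given in the text)."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
```

With γ = 1 and β = 0 the output has approximately zero mean and unit variance, which keeps activations in a range where gradients do not vanish.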
Specifically, data passing the ReLU6 activation function in each layer is processed nonlinearly through the following formula:
$$f(x_i) = \min(\max(x_i, 0), 6)$$
Specifically, data passing the hardswish activation function in each layer is processed nonlinearly through the following formula:
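The hardswish formula is not reproduced in this text. The sketch below implements ReLU6 exactly as given above and, as an assumption, uses the widely used definition of hardswish, x · ReLU6(x + 3) / 6. The function names are illustrative.

```python
def relu6(x):
    """ReLU6 as given in the text: clamp the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def hardswish(x):
    """Assumed definition: x * ReLU6(x + 3) / 6 (formula missing from the text)."""
    return x * relu6(x + 3.0) / 6.0
```

Under this definition hardswish is zero for inputs at or below -3, equals the identity for inputs at or above 3, and is a piecewise-quadratic blend in between.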
Specifically, cross-channel interaction is conducted on data passing the ECA attention mechanism in each layer, so as to obtain the enhanced concrete crack feature extraction map, by using the following formula:
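The ECA formula is not reproduced in this text. Based on the symbols defined earlier (C, γ = 2, b = 1, and the nearest-odd operator), the sketch below assumes the adaptive kernel size k = |log2(C)/γ + b/γ| rounded to an odd number, followed by a one-dimensional convolution of size k over the globally average-pooled channel descriptors and a sigmoid. How the nearest odd number is obtained (truncate, then add 1 if even) is an assumption, as are the zero padding and all function names.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size; truncate, then bump even values to the next odd."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def eca_weights(feature_map, kernel):
    """feature_map: C x H x W nested lists; kernel: 1-D weights of odd length k."""
    # Global average pooling: one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad  # zero padding (assumed)
    # 1-D convolution across neighbouring channels, then sigmoid.
    return [
        sigmoid(sum(w * padded[c + j] for j, w in enumerate(kernel)))
        for c in range(len(pooled))
    ]

def eca(feature_map, kernel):
    """Re-weight every channel of the feature map by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [
        [[w * px for px in row] for row in ch]
        for ch, w in zip(feature_map, weights)
    ]
```

Under these assumptions, the channel widths named in the text (16, 96 and 576) give kernel sizes of 3, 3 and 5, so each channel interacts only with a handful of neighbours and the mechanism adds very few parameters.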
Specifically, average pooling and maximum pooling are used to aggregate space information of feature mapping, a space dimension of an input feature map is compressed, and summation and merging are conducted element by element, so as to generate a channel attention map, through the following formula.
$$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$$
Specifically, average pooling and maximum pooling are used to compress the input feature map in a space attention module, so as to obtain the feature extraction map containing more crack information through the following formula.
$$M_s(F) = \sigma(f^{7 \times 7}[\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)])$$
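The channel and space attention maps can be sketched in pure Python as follows. This is an illustration under assumptions: feature maps are C x H x W nested lists, the shared MLP omits bias terms, the 7×7 convolution uses zero padding, and the two maps are applied one after the other (channel first, then space), which the text does not spell out. All function names and weight arguments are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp(vec, w1, w2):
    """Shared two-layer perceptron with ReLU6 between the layers (no biases)."""
    hidden = [min(max(sum(w * v for w, v in zip(row, vec)), 0.0), 6.0) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(fmap, w1, w2):
    """M_c(F): one weight per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]

def spatial_attention(fmap, kernel):
    """M_s(F): one weight per pixel; kernel has shape 2 x k x k (k = 7 in the text)."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Pool across channels, giving a 2 x H x W stack [average, maximum].
    stack = [
        [[sum(fmap[i][y][x] for i in range(c)) / c for x in range(w)] for y in range(h)],
        [[max(fmap[i][y][x] for i in range(c)) for x in range(w)] for y in range(h)],
    ]
    k = len(kernel[0])
    pad = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for p in range(2):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding (assumed)
                            acc += kernel[p][dy][dx] * stack[p][yy][xx]
            row.append(sigmoid(acc))
        out.append(row)
    return out

def cbam(fmap, w1, w2, kernel):
    """Apply channel attention, then space attention (ordering is an assumption)."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[[mc[i] * px for px in row] for row in ch] for i, ch in enumerate(fmap)]
    ms = spatial_attention(refined, kernel)
    return [
        [[ms[y][x] * px for x, px in enumerate(row)] for y, row in enumerate(ch)]
        for ch in refined
    ]
```

The channel map decides which feature channels matter, while the space map decides where in the image to look, so together they emphasize crack pixels in crack-sensitive channels.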
Specifically, sparsification is conducted on data passing the dropout layer in each layer, so as to avoid network over-fitting through the following formulas:
$$r_j^{(l)} \sim \mathrm{Bernoulli}(p)$$
$$\tilde{y}^{(l)} = r^{(l)} * y^{(l)}$$
where the Bernoulli(p) function is used to generate a random vector $r^{(l)}$ with elements $r_j^{(l)}$, such that a neuron stops working with probability p, $y^{(l)}$ is the feature map output by the preceding layer, and $\tilde{y}^{(l)}$ is the feature map output after passing the dropout layer.
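A minimal sketch of the dropout step is given below, following the description that a neuron stops working with probability p. It is an illustration only: the function name `dropout` and the `rng` argument are hypothetical, and the sketch does not rescale the surviving activations, a step that many frameworks add and that the text does not mention.

```python
import random

def dropout(y, p, rng=random):
    """Zero each element of y with probability p.
    The mask element r_j is 1.0 (keep) or 0.0 (drop)."""
    r = [0.0 if rng.random() < p else 1.0 for _ in y]
    return [rj * yj for rj, yj in zip(r, y)]
```

With the sparsity ratio of 0.2 named for the 12th layer, roughly one activation in five would be silenced on each training pass under this reading.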
Specifically, loss of a network is computed through the following formula:
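The loss formula is not reproduced in this text. Given the notation Loss(yo,c, po,c) used elsewhere in the description, the sketch below assumes the standard multi-class cross-entropy, where yo,c is 1 if sample o belongs to class c and 0 otherwise, and po,c is the predicted probability. The averaging over samples, the clamp `eps`, and the function names are assumptions.

```python
import math

def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean over samples o of -sum_c y[o][c] * log(p[o][c]).
    eps guards against log(0) and is an assumption."""
    n = len(y_true)
    return -sum(
        y * math.log(max(p, eps))
        for ys, ps in zip(y_true, p_pred)
        for y, p in zip(ys, ps)
    ) / n
```

The loss is zero only when the network assigns probability 1 to the true class of every sample, and it grows as confidence in the true class falls.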
Specifically, internal parameters of a network are optimized through the following formulas:
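$$f(\theta) = \mathrm{Loss}(y_{o,c}, p_{o,c})$$
$$g_t = \nabla_\theta f_t(\theta_{t-1})$$
$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$$
$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$$
$$\hat{m}_t = m_t / (1 - \beta_1^t)$$
$$\hat{v}_t = v_t / (1 - \beta_2^t)$$
$$\theta_t = \theta_{t-1} - \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$$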
where $\mathrm{Loss}(y_{o,c}, p_{o,c})$ is a loss function between a network predicted value and a true value, $\theta$ is a parameter to be updated in a model, $g_t$ is a gradient obtained by differentiating the loss function $f(\theta)$ with respect to $\theta$, $\beta_1$ is a first-order moment attenuation coefficient, $\beta_2$ is a second-order moment attenuation coefficient, $m_t$ is an expectation of the gradient $g_t$, $v_t$ is an expectation of $g_t^2$, $\hat{m}_t$ is an offset correction of $m_t$, $\hat{v}_t$ is an offset correction of $v_t$, $\theta_{t-1}$ is a parameter before network update, and $\theta_t$ is a parameter after network update.
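The update rule above can be sketched as a single function operating on a flat list of parameters. The default values of α, β1, β2 and ϵ are commonly used settings and are assumptions here, since the text does not state them; the function name `adam_step` is likewise illustrative.

```python
import math

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update at step t (t starts at 1).
    Returns the updated parameters and both moment estimates."""
    # First- and second-order moment estimates of the gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Offset (bias) correction of both moments.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [
        th - alpha * mh / (math.sqrt(vh) + eps)
        for th, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v
```

A caller would initialize `m` and `v` to zeros, then call `adam_step` once per batch with the current gradient and an incrementing `t`. The offset correction matters most in the first few steps, when the zero-initialized moments would otherwise underestimate the gradient statistics.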
A concrete image captured in the field is used as input to the trained network, and a classification result for the concrete crack image is finally output.
With reference to
A computation formula of loss is as follows:
Table 4 shows the number of parameters and operations of the IR7-EC network and other convolutional neural networks (CNNs), where FLOPs is the number of floating-point operations, which includes all multiplication and addition operations in the network model and is used to measure the computational complexity of the model. According to results in Table 3, Table 4 and
Reference is made to
The recall (true positive rate) is the ratio of positive samples predicted correctly to all actual positive samples. The higher the recall, the lower the possibility that the network misses a positive sample. A computation formula of the recall is as follows:
The specificity (true negative rate) is a ratio of all negative samples predicted correctly to all actual negative samples. A computation formula of the specificity is as follows.
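The recall and specificity formulas are not reproduced in this text. Following the verbal definitions above, the sketch below assumes recall = TP / (TP + FN) and specificity = TN / (TN + FP), and derives the four counts for one class from a multi-class confusion matrix in a one-vs-rest manner. The function names and the confusion-matrix layout (rows are actual classes, columns are predicted classes) are assumptions.

```python
def recall(tp, fn):
    """Correctly predicted positives divided by all actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Correctly predicted negatives divided by all actual negatives."""
    return tn / (tn + fp)

def one_vs_rest_counts(confusion, cls):
    """confusion[i][j] = number of samples of actual class i predicted as class j.
    Returns (TP, TN, FP, FN) treating class cls as positive and all others as negative."""
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, tn, fp, fn
```

For a seven-class problem such as the crack types listed in step 1, the counts would be computed once per class, giving one recall and one specificity value for each crack type.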
Reference is made to Table 6 for TP, TN, FP and FN. The second letter, positive (P) or negative (N), represents the predicted condition, and the first letter, true (T) or false (F), indicates whether the prediction agrees with the actual condition. Specifically,
With reference to Table 5 and
The above descriptions are merely preferred examples of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principle of the present invention should fall within the protection scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
202210857092.X | Jul 2022 | CN | national |