This application claims priority to Chinese Patent Application No. 202111091274.2, filed on Sep. 17, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to the technical field of image processing, and more particularly to an image haze removal method, an image haze removal apparatus, and a device.
In recent years, haze has become a common atmospheric phenomenon and a prevalent problem in computer vision. Numerous suspended particles in the air scatter light as it propagates, which blurs outdoor images, distorts their colors and reduces their contrast. High-level tasks such as object detection, object recognition and industrial IoT applications require clear images as input, and hazy images degrade the quality and robustness of these downstream tasks. Image haze removal, as an image preprocessing task, is therefore a classic image processing problem and a long-standing research topic.
At present, image haze removal algorithms are mainly based on deep learning, and deep learning-based haze removal networks treat channel and pixel features equally. However, haze is non-homogeneous (for example, light haze and dense haze may coexist in one scene), and the pixel weights of near and distant regions should differ significantly. Because these networks treat channel and pixel features equally, their haze removal effect is poor: the restored images inevitably retain dense haze and lose detail.
In order to overcome the defects in the prior art, the present disclosure provides an image haze removal method and apparatus, and a device.
In a first aspect, the present disclosure provides an image haze removal method. The method includes:
acquiring a hazy image to be processed; and
obtaining a haze-free image corresponding to the hazy image to be processed by inputting the hazy image to be processed into a pre-trained haze removal model.
The pre-trained haze removal model includes a plurality of residual groups, each of the residual groups includes a plurality of residual dual attention fusion modules connected in series, each of the residual dual attention fusion modules includes a residual block, a first convolutional layer, a channel attention module, a pixel attention module, and a second convolutional layer, an output of the residual block is connected to inputs of the channel attention module and the pixel attention module via the first convolutional layer, and outputs of the channel attention module and the pixel attention module are fused for output processing, such that pixel features are obtained while global dependency of each feature map is enhanced.
Further, the haze removal model includes three residual groups, and the outputs of the three residual groups are concatenated along the channel dimension in reverse order, from the last residual group to the first.
Further, each of the residual groups includes three residual dual attention fusion modules.
Further, the output of each residual dual attention fusion module is obtained by summing, element by element, the outputs of the channel attention module and the pixel attention module and the input of the residual block, and inputting the sum into the second convolutional layer for fusion.
Further, the haze removal model further includes a feature extraction convolutional layer, a channel attention module, a pixel attention module, and an output convolutional layer. The hazy image to be processed is subjected to feature extraction by the feature extraction convolutional layer and then enters the residual groups; the output of the residual groups is processed in sequence by the channel attention module, the pixel attention module and the output convolutional layer to obtain output features; and the haze-free image is obtained by element-by-element summation of the output features and the hazy image to be processed.
Further, the haze removal model is trained by:
acquiring the RESIDE dataset, and constructing a training sample set by randomly selecting 6000 pairs of hazy and haze-free images from the RESIDE dataset; and
training a pre-established neural network with the training sample set.
Further, a loss function L of the neural network is expressed as:
where N is the number of training samples, $J_i^{gt}$ is the real clear image of the i-th training sample, and $\hat{J}_i$ is the haze-free image estimated by the neural network for the i-th training sample.
In a second aspect, the present disclosure further provides an image haze removal apparatus. The apparatus includes:
an image acquiring module, configured to acquire a hazy image to be processed; and
an image haze removal module, configured to input the hazy image to be processed into a haze removal model for processing, and output a haze-free image corresponding to the hazy image to be processed.
The haze removal model includes a plurality of residual groups, each of the residual groups includes a plurality of residual dual attention fusion modules connected in series, each of the residual dual attention fusion modules includes a residual block, a first convolutional layer, a channel attention module, a pixel attention module, and a second convolutional layer, an output of the residual block is connected to inputs of the channel attention module and the pixel attention module via the first convolutional layer, and outputs of the channel attention module and the pixel attention module are fused, such that pixel features are obtained while global dependency of each feature map is enhanced.
Further, the output of each residual dual attention fusion module is obtained by summing, element by element, the outputs of the channel attention module and the pixel attention module and the input of the residual block, and inputting the sum into the second convolutional layer for fusion.
In a third aspect, the present disclosure further provides a device. The device includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, and the processor, when executing the computer program, implements the image haze removal method according to any implementation of the first aspect.
Compared with the prior art, the present disclosure has the following beneficial effects:
First, the present disclosure improves on convolutional neural networks with a fixed receptive field by using residual dual attention fusion modules as the basic building blocks, each of which fuses a residual block, a channel attention module and a pixel attention module. By combining related features from different feature maps, the model obtains pixel features while enhancing the global dependency of each feature map, which better preserves details with fewer parameters and improves the haze removal effect.
Second, the present disclosure adopts an end-to-end haze removal network with only three residual dual attention fusion modules in each residual group, which reduces model complexity and improves training efficiency.
The present disclosure is further illustrated below in conjunction with the accompanying drawings. The following examples are provided merely to more clearly illustrate the technical solution of the present disclosure, and are not intended to limit the scope of the present disclosure.
As shown in
Specifically, as shown in
As shown in
where $Z_c(x, y)$ represents the pixel value of the input $Z_c$ of the c-th channel at position (x, y), and c∈{R, G, B}; after global average pooling, the dimension of the feature map changes from C×H×W to C×1×1; δ represents the ReLU activation function, σ represents the Sigmoid activation function, and ⊗ represents element-by-element multiplication; and the mapping function from the input $F_c$ of the channel attention module to its output $F_{CAB}^c$ is $H_{CAB}$.
The first convolutional layer of the channel attention module uses 8 convolutional kernels of size 1×1, and the second convolutional layer uses 64 convolutional kernels of size 1×1.
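The channel attention described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single unbatched C×H×W feature map and randomly initialized weights, and the helper names (`conv1x1`, `channel_attention`) are hypothetical. It follows the sequence given in the text: global average pooling, a 1×1 convolution with 8 kernels, ReLU, a 1×1 convolution with 64 kernels, Sigmoid, and element-by-element multiplication with the input.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution on a (C_in, H, W) map; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, w1, b1, w2, b2):
    """Sketch of H_CAB: pool -> 1x1 conv (8) -> ReLU -> 1x1 conv (64) -> Sigmoid -> reweight."""
    # Global average pooling: C x H x W -> C x 1 x 1
    g = F.mean(axis=(1, 2), keepdims=True)
    # One weight in (0, 1) per channel, shape C x 1 x 1
    ca = sigmoid(conv1x1(relu(conv1x1(g, w1, b1)), w2, b2))
    # Element-by-element multiplication, broadcast over H and W
    return ca * F

rng = np.random.default_rng(0)
C, H, W = 64, 16, 16
F = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((8, C)) * 0.1, np.zeros(8)
w2, b2 = rng.standard_normal((C, 8)) * 0.1, np.zeros(C)
out = channel_attention(F, w1, b1, w2, b2)
print(out.shape)  # (64, 16, 16)
```

Because the weight map is C×1×1, every pixel of a given channel is scaled by the same factor; the channels themselves are weighted differently.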
As shown in
$$F_{PA} = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(F)))) \tag{4}$$
$$F_{PAB} = F_{PA} \otimes F \tag{5}$$
$$F_{PAB} = H_{PAB}(F) \tag{6}$$
where $F_{PA}$ represents the output feature weight, whose dimension changes from C×H×W to 1×H×W, and the mapping function from the input F of the pixel attention module to its output $F_{PAB}$ is $H_{PAB}$.
The first convolutional layer of the pixel attention module uses 8 convolutional kernels of size 1×1, and the second convolutional layer uses 1 convolutional kernel of size 1×1. The other convolutional layers use 64 convolutional kernels of size 3×3.
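Eqs. (4) and (5) can be traced with a similar NumPy sketch, again assuming one unbatched C×H×W map and random weights; `conv1x1` and `pixel_attention` are hypothetical names, and this is an illustration rather than the authors' code.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution on a (C_in, H, W) map; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def pixel_attention(F, w1, b1, w2, b2):
    """Sketch of H_PAB: F_PA = sigmoid(Conv(ReLU(Conv(F)))), F_PAB = F_PA * F."""
    hidden = np.maximum(conv1x1(F, w1, b1), 0.0)             # 8 kernels of 1x1, then ReLU
    f_pa = 1.0 / (1.0 + np.exp(-conv1x1(hidden, w2, b2)))    # 1 kernel of 1x1 -> 1 x H x W
    return f_pa * F, f_pa                                    # broadcast over the C channels

rng = np.random.default_rng(0)
C, H, W = 64, 16, 16
F = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((8, C)) * 0.1, np.zeros(8)
w2, b2 = rng.standard_normal((1, 8)) * 0.1, np.zeros(1)
out, f_pa = pixel_attention(F, w1, b1, w2, b2)
print(f_pa.shape, out.shape)  # (1, 16, 16) (64, 16, 16)
```

In contrast to channel attention, the weight map here is 1×H×W, so all channels at a given pixel share one factor while different pixels (for example, near and distant regions) can be weighted differently.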
As shown in
$$F_{g,m} = H_{RDAFM}(F_{g,m-1}) \tag{7}$$
$$F_g = \mathrm{Conv}(F_{g,3}) \oplus F_{g,0} \tag{8}$$
$$F_g = H_{RG}(F_{g,0}) \tag{9}$$
where $F_{g,m-1}$ and $F_{g,m}$ represent the input and output of the m-th residual dual attention fusion module in the g-th residual group, respectively, with g = 1, 2, 3 and m = 1, 2, 3; the mapping function from the input $F_{g,m-1}$ to the output $F_{g,m}$ of the residual dual attention fusion module is $H_{RDAFM}$; and the mapping function from the input $F_{g,0}$ of the residual group to its output $F_g$ is $H_{RG}$.
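The structure of Eqs. (7) and (8) can be sketched by passing the three residual dual attention fusion modules and the trailing convolution in as callables. The function name `residual_group` and the stand-in layers in the usage lines are hypothetical placeholders, not trained modules.

```python
import numpy as np

def residual_group(F_g0, rdafms, conv):
    """Sketch of H_RG: RDAFMs in series, a convolution, and a skip from F_{g,0}."""
    F = F_g0
    for h_rdafm in rdafms:      # F_{g,m} = H_RDAFM(F_{g,m-1}), m = 1, 2, 3
        F = h_rdafm(F)
    return conv(F) + F_g0       # F_g = Conv(F_{g,3}) (+) F_{g,0}

x = np.ones((64, 8, 8))
stand_ins = [lambda t: t + 1.0] * 3                  # hypothetical placeholders for trained RDAFMs
y = residual_group(x, stand_ins, lambda t: 2.0 * t)  # hypothetical placeholder for the conv
print(y[0, 0, 0])  # 9.0
```

With these placeholders, the three modules raise the value from 1 to 4, the "convolution" doubles it to 8, and the group-level skip connection adds the original 1 back.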
As shown in
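$$F_{RB} = \delta(\mathrm{Conv}(F)) \oplus F \tag{10}$$
$$F^{*} = \mathrm{Conv}(F_{RB}) \tag{11}$$
$$F_{RDAFM} = \mathrm{Conv}(F_{CAB}(F^{*}) \oplus F_{PAB}(F^{*}) \oplus F) \tag{12}$$
$$F_{RDAFM} = H_{RDAFM}(F) \tag{13}$$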
where ⊕ represents element-by-element summation, $F_{RB}$ represents the output of the residual block, $F^{*}$ represents the input of the attention modules, and the mapping function from the input F of the residual dual attention fusion module to its output $F_{RDAFM}$ is $H_{RDAFM}$.
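Eqs. (10) to (12) can be traced with the NumPy sketch below. It assumes "same"-padded 3×3 convolutions for the residual block and for the first and second convolutional layers (consistent with the statement that the other convolutional layers use 3×3 kernels), a single unbatched feature map, and attention modules passed in as callables. The names `conv2d` and `rdafm` and the `0.5 * t` attention stand-ins are hypothetical placeholders, not the authors' code.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same'-padded 2-D convolution (cross-correlation); x: (C_in,H,W), w: (C_out,C_in,k,k)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((w.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out + b[:, None, None]

def rdafm(F, p_rb, p_star, p_fuse, h_cab, h_pab):
    """Sketch of H_RDAFM; each p_* is a (weight, bias) pair."""
    F_rb = np.maximum(conv2d(F, *p_rb), 0.0) + F   # Eq. (10): residual block
    F_star = conv2d(F_rb, *p_star)                 # Eq. (11): first convolutional layer
    fused = h_cab(F_star) + h_pab(F_star) + F      # element-by-element summation
    return conv2d(fused, *p_fuse)                  # Eq. (12): second convolutional layer

rng = np.random.default_rng(0)
C, H, W = 64, 16, 16
F = rng.standard_normal((C, H, W))

def params(c_out, c_in, k=3):
    return rng.standard_normal((c_out, c_in, k, k)) * 0.01, np.zeros(c_out)

out = rdafm(F, params(C, C), params(C, C), params(C, C),
            h_cab=lambda t: 0.5 * t, h_pab=lambda t: 0.5 * t)  # placeholder attention modules
print(out.shape)  # (64, 16, 16)
```

Note that the skip term in Eq. (12) is the module input F, not the residual block output $F_{RB}$, so the original features bypass both the residual block and the attention branches before fusion.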
The haze removal model is trained as follows: acquire the RESIDE dataset, construct a training sample set by randomly selecting 6000 pairs of hazy and haze-free images from it, and train the neural network with the training sample set to obtain the haze removal model. In use, the hazy image to be processed is acquired and input into the haze removal model to obtain the haze-free image.
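The random selection of training pairs can be sketched with Python's standard `random` module. The key point is that whole (hazy, clear) pairs are sampled, so each hazy image stays matched with its own clear image. The file names and the pool size of 10000 are hypothetical; the text does not describe how the RESIDE data are stored.

```python
import random

def sample_training_pairs(pairs, n=6000, seed=None):
    """Pick n (hazy, clear) pairs uniformly at random without replacement."""
    rng = random.Random(seed)
    # Sampling whole tuples keeps each hazy image matched with its own clear image.
    return rng.sample(pairs, n)

# Hypothetical file layout and pool size, for illustration only.
all_pairs = [(f"hazy/{i:05d}.png", f"clear/{i:05d}.png") for i in range(10000)]
train_set = sample_training_pairs(all_pairs, seed=0)
print(len(train_set))  # 6000
```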
A loss function L of the neural network is expressed as:
where N is the number of training samples, $J_i^{gt}$ is the real clear image of the i-th training sample, and $\hat{J}_i$ is the haze-free image estimated by the neural network for the i-th training sample.
The network weights are optimized with the Adam optimizer, with β1 and β2 set to their default values of 0.9 and 0.999, respectively. The initial learning rate α is set to 1×10⁻⁴ and is updated with a cosine annealing strategy that decays it from the initial value to 0:
where T is the total number of batches, α is the initial learning rate, t is the current batch, and $\alpha_t$ is the adaptively updated learning rate.
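A sketch of the schedule, assuming the common cosine annealing form α_t = ½(1 + cos(πt/T))·α, which decays from α at t = 0 to 0 at t = T as described, together with the step-to-batch bookkeeping from the training description (1×10⁵ steps, 200 steps per batch, 500 batches). The function names are hypothetical.

```python
import math

def cosine_lr(t, T=500, alpha=1e-4):
    """Assumed cosine annealing: alpha_t = 0.5 * (1 + cos(pi * t / T)) * alpha."""
    return 0.5 * (1.0 + math.cos(math.pi * t / T)) * alpha

STEPS_PER_BATCH = 200     # from the text: every 200 steps form one batch
TOTAL_STEPS = 100_000     # from the text: 1e5 training steps

def lr_at_step(step):
    # The batch index t advances once every 200 training steps.
    return cosine_lr(step // STEPS_PER_BATCH, T=TOTAL_STEPS // STEPS_PER_BATCH)

print(TOTAL_STEPS // STEPS_PER_BATCH)  # 500
print(cosine_lr(0))                    # 0.0001
```

Under this assumption the rate is exactly α during the first batch, halves at t = T/2, and reaches 0 at t = T.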
For each sample image in the training set, the total loss between the real clear image and the haze-removed image restored by the network is first computed by forward propagation, and the weight parameters are then updated with the Adam optimizer. Training runs for 1×10⁵ steps in total, with every 200 steps forming one batch, for a total of 500 batches. These steps are repeated until the maximum number of steps is reached, yielding the trained haze removal network model, which is expressed as follows:
$$F_0 = \mathrm{Conv}(I) \tag{16}$$
$$F_g = H_{RG}(F_{g-1}) \tag{17}$$
$$F = \{F_3, F_2, F_1\} \tag{18}$$
$$\hat{J} = \mathrm{Conv}(\mathrm{Conv}(H_{PAB}(H_{CAB}(F)))) \oplus I \tag{19}$$
where I represents the input hazy image, $F_{g-1}$ and $F_g$ represent the input and output of the g-th residual group, respectively, with g = 1, 2, 3, {⋅} represents in-channel (channel-wise) concatenation, and $\hat{J}$ represents the restored output image.
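Eqs. (16) to (19) can be traced with the sketch below, which passes every layer in as a callable so that only the data flow is shown. The function name and all placeholder layers are hypothetical; the placeholder groups add 1, 2 and 3, so $F_1$, $F_2$ and $F_3$ hold 1, 3 and 6, which makes the reverse channel order of Eq. (18) visible.

```python
import numpy as np

def haze_removal_forward(I, conv_in, groups, h_cab, h_pab, conv_mid, conv_out):
    """Sketch of the forward pass; every layer is supplied as a callable."""
    F = conv_in(I)                                      # Eq. (16): F_0 = Conv(I)
    outs = []
    for h_rg in groups:                                 # Eq. (17): F_g = H_RG(F_{g-1})
        F = h_rg(F)
        outs.append(F)
    F_cat = np.concatenate(outs[::-1], axis=0)          # Eq. (18): {F_3, F_2, F_1}
    return conv_out(conv_mid(h_pab(h_cab(F_cat)))) + I  # Eq. (19): global residual

H, W = 8, 8
I = np.random.default_rng(0).random((3, H, W))
seen = {}

def record(t):
    seen["F_cat"] = t   # placeholder channel attention that records its input
    return t

J = haze_removal_forward(
    I,
    conv_in=lambda x: np.zeros((64, H, W)),            # placeholder feature extraction conv
    groups=[lambda t, g=g: t + g for g in (1, 2, 3)],  # placeholder residual groups
    h_cab=record,
    h_pab=lambda t: t,                                 # placeholder pixel attention
    conv_mid=lambda t: t,                              # placeholder convolution
    conv_out=lambda t: np.zeros((3, H, W)),            # placeholder output convolution
)
print(seen["F_cat"].shape)                # (192, 8, 8)
print(seen["F_cat"][[0, 64, 128], 0, 0])  # [6. 3. 1.]
```

Because Eq. (19) adds I back at the end, a network whose output convolution produces zeros returns the hazy input unchanged; the layers therefore only need to learn a residual correction to the hazy image.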
In this example, an image haze removal apparatus is further provided. The apparatus includes:
an image acquiring module, configured to acquire a hazy image to be processed; and
an image haze removal module, configured to input the hazy image to be processed into a haze removal model for processing, and output a haze-free image corresponding to the hazy image to be processed.
The haze removal model includes a plurality of residual groups. Each of the residual groups includes a plurality of residual dual attention fusion modules connected in series. Each of the residual dual attention fusion modules includes a residual block, a first convolutional layer, a channel attention module, a pixel attention module, and a second convolutional layer. An output of the residual block is connected to inputs of the channel attention module and the pixel attention module via the first convolutional layer. Outputs of the residual dual attention fusion modules are obtained by inputting outputs of the channel attention module and the pixel attention module and an input of the residual block into the second convolutional layer for fusion after element-by-element summation, such that pixel features are obtained while global dependency of each feature map is enhanced.
In this example, a device is further provided. The device includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor. The processor, when executing the computer program, implements the image haze removal method according to Example 1.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Besides, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory and the like) containing computer-usable program code.
The present application is described with reference to the flow diagram and/or block diagram of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flow diagram and/or block diagram and the combination of flows and/or blocks in the flow diagram and/or block diagram may be implemented by computer program instructions. These computer program instructions may be provided to processors of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing devices to generate a machine, such that instructions executed by processors of a computer or other programmable data processing devices generate an apparatus for implementing the functions specified in one or more flows of the flow diagram and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or other programmable data processing devices to work in a specific manner, such that instructions stored in the computer-readable memory generate a manufactured product including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flow diagram and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded on a computer or other programmable data processing devices, such that a series of operation steps are executed on the computer or other programmable devices to produce computer-implemented processing, and thus, the instructions executed on the computer or other programmable devices provide steps for implementing the functions specified in one or more flows of the flow diagram and/or one or more blocks of the block diagram.
The above are only preferred implementations of the present disclosure. It should be noted that those of ordinary skill in the art can also make several improvements and modifications without departing from the technical principle of the present disclosure, and these improvements and modifications shall also fall within the scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202111091274.2 | Sep 2021 | CN | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2022/107282 | Jul 2022 | US |
Child | 17987763 | | US |