The present application claims priority to Chinese Patent Application No. 202311084703.2, filed Aug. 28, 2023, the entire disclosure of which is incorporated herein by reference.
The present application relates to a low-light image enhancement method and device based on wavelet transform and Retinex-Net, and belongs to the field of image enhancement technology in image processing.
Images, as carriers that store, transmit, display and analyse information, have gradually become an important means of communication, and their quality is directly related to the amount of information people can obtain. Remote sensing satellites, medical equipment, cameras, surveillance systems, etc. can all be used to obtain the desired image information. However, due to unavoidable environmental and technical limitations, such as insufficient light and limited exposure time, images are often captured under sub-optimal lighting conditions. When interfered with by backlighting, non-uniform light, low light, multi-coloured light, etc., such images suffer from buried scene content, reduced contrast, strong noise and colour inaccuracies, and fail to transfer enough information for high-level tasks such as object tracking, recognition and detection. As a result, more and more researchers have devoted their efforts to the field of low-light image enhancement.
The traditional low-light image enhancement methods mainly include methods based on histogram equalization, defogging, wavelet transform, and the Retinex model. The wavelet transform method can better preserve the details of the image. The method using the Retinex model decomposes the image into a reflection component and an illumination component; the colour of an object depends on its ability to reflect light of different wavelengths, which is not affected by the light intensity. Although these methods can achieve low-light image enhancement to a certain extent, their enhancement performance, speed and accuracy remain poor, so deep learning based methods have been proposed.
Deep learning based solutions offer better accuracy, robustness and speed compared to traditional methods. Deep learning methods based on Retinex achieve better enhancement performance in most cases due to the physically interpretable Retinex theory. Retinex-Net, a deep learning method based on Retinex theory, enhances the image by decomposing the low-light image into illumination and reflection components, which are processed separately and then fused to obtain the enhanced image. The luminance of the processed image is significantly improved and the overall contour of the image is more obvious, but there is obvious colour distortion and a loss of certain edge detail information.
The present application aims to provide a low-light image enhancement method and device based on wavelet transform and Retinex-Net, in order to overcome the defects of the related art, namely the obvious colour distortion and the loss of certain edge detail information.
A low-light image enhancement method based on wavelet transform and Retinex-Net, including:
In one embodiment, a decomposition expression of the discrete wavelet transform is:
HF(a1,m,n) is the low-frequency component of the to-be-processed image, HFu(a2,m,n) is the high-frequency component of the to-be-processed image including a horizontal high-frequency component H, a vertical high-frequency component V, and a diagonal high-frequency component D,
In one embodiment, the inputting the reflection component and the illumination component of the low-frequency component into the enhancement network to enhance the illumination component of the low-frequency component, to obtain the enhanced illumination component includes:
In one embodiment, the enhancement network is an encoder-decoder architecture as a whole, including a plurality of up-sampling layers, convolutional layers and jump connections.
In one embodiment, the decomposition network includes a decomposition network loss function, and the decomposition network is trained by the decomposition network loss function; the decomposition network loss function includes a reconstruction loss Lrecon, an invariable reflectance loss Lir, and an illumination smoothness loss Lis, and is specifically expressed as:
where λir is an invariable reflectance coefficient, λis is an illumination smoothness coefficient, ∇ is a gradient, and λg is a coefficient balancing the strength of structure awareness; when the weight is exp(−λg∇Ri), Lis relaxes the smoothness constraint in regions where the reflection component of the normal-light image has a large gradient.
In one embodiment, the enhancement network includes an enhancement network loss function, and the enhancement network is trained by the enhancement network loss function; the enhancement network loss function includes a reconstruction loss Lrecon and an illumination smoothness loss Lis, which is expressed as:
In one embodiment, a calculation formula for transferring an RGB component to an HSV component is expressed as:
H is the actual hue component, H′ is the enhanced hue component, Mmax is the maximum value of the red component R, the green component G, and the blue component B, and Nmin is the minimum value of the red component R, the green component G, and the blue component B.
In one embodiment, the fusing the high-frequency component of the to-be-processed image based on regional characteristics is expressed as:
$$E_A(m,n)=\sum_{(m,n)\in w} D_A^2(m,n)$$
$$E_B(m,n)=\sum_{(m,n)\in w} D_B^2(m,n)$$
where D is the pixel value of the high-frequency component, w is the pixel range of the region around the pixel point (m, n), and E is the energy value in the pixel region;
among the many pixel points in the pixel region, the matrix with the pixel value 1 appears only once, and the pixel point in that matrix has the maximum energy; in the calculation, the maximum energy is taken as the fusion coefficient, which is expressed as:
$$C_a(m,n)=\sum_{(m,n)\in w} A_h(m,n)$$
$$C_b(m,n)=\sum_{(m,n)\in w} B_h(m,n)$$
wherein Rh is the fused high-frequency component.
In one embodiment, the using the value fusion technology of the wavelet transform to fuse the first value component V1 and the second value component V2 to obtain the third value component V and stretching the third value component V to obtain the fourth value component V′, which is expressed as:
two coefficients a and b are introduced, where a+b=1, and the stretched value component is compared with an unprocessed value component to reflect a degree of value change, which is expressed as:
where γ is a value ratio and μ is a coefficient reflecting the proportionality relationship; the ratio of the value component is normalized to obtain the value ratio reflecting the degree of image stretching, which is expressed as:
according to
the fourth value component V′ of the stretched image is obtained.
A low-light image enhancement device based on wavelet transform and Retinex-Net, including:
Beneficial effects achieved by the present application compared with the related art:
The present application preserves more edge and texture details in the image by adding the wavelet transform and fusing the high-frequency components based on regional characteristics. Since the processed low-frequency and high-frequency components are transferred to the HSV space, where the value components are fused and stretched, the colour distortion of the image is effectively improved and the expression of image details is enhanced, so that the problems of colour distortion and lack of details in Retinex-Net processing are solved, and the enhancement effect is satisfactory.
In order to make the technical means, the creative features, the purpose and the efficacy achieved by the present application easy to understand, the present application is further elaborated in the following with specific embodiments.
As shown in
In the embodiment, in the step S1, a decomposition expression of the discrete wavelet transform is:
HF(a1,m,n) is the low-frequency component of the to-be-processed image, HFu(a2,m,n) is the high-frequency component of the to-be-processed image comprising a horizontal high-frequency component H, a vertical high-frequency component V, and a diagonal high-frequency component D,
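By way of illustration only, a minimal sketch of the single-level two-dimensional discrete wavelet transform of the step S1 is given below, using the PyWavelets library; the choice of the Haar wavelet and the function names are assumptions for illustration and are not prescribed by the present application.

```python
# Illustrative sketch of step S1: one-level 2-D discrete wavelet transform.
# Assumptions: PyWavelets (pywt) and a Haar basis; the text does not
# prescribe a specific wavelet, so "haar" is illustrative only.
import numpy as np
import pywt

def dwt_decompose(image: np.ndarray):
    """Split a greyscale image into the low-frequency component LL and the
    three high-frequency components (horizontal H, vertical V, diagonal D)."""
    LL, (H, V, D) = pywt.dwt2(image, "haar")
    return LL, H, V, D

def dwt_reconstruct(LL, H, V, D) -> np.ndarray:
    """Inverse transform applied after the components have been processed."""
    return pywt.idwt2((LL, (H, V, D)), "haar")
```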
In the embodiment, in step S2, the inputting the reflection component and the illumination component of the low-frequency component into the enhancement network to enhance the illumination component of the low-frequency component, to obtain the enhanced illumination component comprises:
In the embodiment, the network adjustment stage includes the denoising of the reflection component and the enhancement network. In the step S3, the reflection component of the low-frequency component is denoised using BM3D. In the step S4, the reflection component and the illumination component of the low-frequency component are inputted into the enhancement network to enhance the illumination component of the low-frequency component. The enhancement network as a whole is an encoder-decoder architecture which includes a plurality of up-sampling layers, convolutional layers, and jump connections. The specific process is as follows. Firstly, a convolutional layer with a 3×3 convolutional kernel and a step size of 1 is used for feature extraction, and the extracted data is sequentially input into the first down-sampling layer, the second down-sampling layer and the third down-sampling layer, where each down-sampling includes a convolutional layer with a step size of 2 and a ReLU activation function. The output of the third down-sampling is processed by a convolutional layer with a 3×3 convolutional kernel, and then three up-samplings are carried out, with a convolutional layer with a 3×3 convolutional kernel used for processing after each up-sampling. The output of the first up-sampling is jump-connected with the output of the second down-sampling, and the spliced data is used as the input of the second up-sampling; the output of the second up-sampling is jump-connected with the output of the first down-sampling, and the spliced data is used as the input of the third up-sampling; and the output of the third up-sampling is jump-connected with the output of the initial feature-extraction convolutional layer. Each up-sampling layer uses a resize-convolution, i.e., the up-sampling layer includes a nearest-neighbour interpolation operation and a convolutional layer with a step size of 1 and a ReLU activation function. The outputs of the first up-sampling, the second up-sampling and the third up-sampling are spliced, the spliced data is passed through a 1×1 convolutional layer to reduce the cascaded features to C channels, and finally a 3×3 convolutional layer is used to reconstruct the illumination component.
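By way of illustration only, the following PyTorch sketch reflects one possible reading of the enhancement network described above; the channel width C, the four-channel input (three-channel reflection component spliced with a one-channel illumination component) and the exact placement of the jump connections are assumptions.

```python
# Illustrative sketch of the enhancement network (encoder-decoder with
# jump connections and resize-convolution up-sampling). Not a definitive
# implementation: C = 64 and the input layout are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhanceNet(nn.Module):
    def __init__(self, C: int = 64):
        super().__init__()
        self.stem  = nn.Conv2d(4, C, 3, padding=1)             # feature extraction
        self.down1 = nn.Conv2d(C, C, 3, stride=2, padding=1)   # each down-sampling:
        self.down2 = nn.Conv2d(C, C, 3, stride=2, padding=1)   # stride-2 conv + ReLU
        self.down3 = nn.Conv2d(C, C, 3, stride=2, padding=1)
        self.mid   = nn.Conv2d(C, C, 3, padding=1)             # 3x3 conv after down3
        self.up1   = nn.Conv2d(C, C, 3, padding=1)             # conv after each
        self.up2   = nn.Conv2d(2 * C, C, 3, padding=1)         # up-sampling
        self.up3   = nn.Conv2d(2 * C, C, 3, padding=1)
        self.fuse  = nn.Conv2d(4 * C, C, 1)                    # 1x1: reduce to C
        self.recon = nn.Conv2d(C, 1, 3, padding=1)             # reconstruct I

    @staticmethod
    def _up(x, ref):
        # resize-convolution up-sampling uses nearest-neighbour interpolation
        return F.interpolate(x, size=ref.shape[2:], mode="nearest")

    def forward(self, R, I):
        x0 = F.relu(self.stem(torch.cat([R, I], dim=1)))
        x1 = F.relu(self.down1(x0))                            # 1/2 resolution
        x2 = F.relu(self.down2(x1))                            # 1/4 resolution
        x3 = F.relu(self.down3(x2))                            # 1/8 resolution
        m  = F.relu(self.mid(x3))
        u1 = F.relu(self.up1(self._up(m, x2)))                 # first up-sampling
        # jump connection with down2 output -> input of the second up-sampling
        u2 = F.relu(self.up2(self._up(torch.cat([u1, x2], 1), x1)))
        # jump connection with down1 output -> input of the third up-sampling
        u3 = F.relu(self.up3(self._up(torch.cat([u2, x1], 1), x0)))
        u3 = torch.cat([u3, x0], 1)        # jump connection with the stem output
        # splice the up-sampling outputs at full resolution, then 1x1 and 3x3
        cascade = torch.cat([self._up(u1, x0), self._up(u2, x0), u3], 1)
        return self.recon(self.fuse(cascade))                  # enhanced illumination
```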
In the embodiment, the decomposition network includes a decomposition network loss function, and the decomposition network is trained by the decomposition network loss function; the decomposition network loss function includes a reconstruction loss Lrecon, an invariable reflectance loss Lir, and an illumination smoothness loss Lis, and is specifically expressed as:
λir is an invariable reflectance coefficient, λis is an illumination smoothness coefficient, ∇ is a gradient, and λg is a coefficient balancing the strength of structure awareness; when the weight is exp(−λg∇Ri), Lis relaxes the smoothness constraint in regions where the reflection component of the normal-light image has a large gradient, i.e., where the image structure is more complex and the illumination is discontinuous, while maintaining smoothness elsewhere, so that a clearer illumination image is obtained.
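By way of illustration only, the following sketch assumes the published Retinex-Net loss formulation, which the description above matches; since the equation images of the present application are not reproduced, the mutual-reconstruction weight of 0.001 and the default coefficient values are assumptions.

```python
# Illustrative sketch of the decomposition-network loss (L_recon, L_ir, L_is).
# Default values of lambda_ir, lambda_is, lambda_g and the 0.001 weight on
# the mutual-reconstruction terms are assumptions, not prescribed by the text.
import torch
import torch.nn.functional as F

def gradient(img: torch.Tensor) -> torch.Tensor:
    """Magnitude of horizontal plus vertical forward differences (the ∇ above)."""
    dx = torch.abs(img[..., :, 1:] - img[..., :, :-1])
    dy = torch.abs(img[..., 1:, :] - img[..., :-1, :])
    return F.pad(dx, (0, 1, 0, 0)) + F.pad(dy, (0, 0, 0, 1))

def smoothness(I: torch.Tensor, R: torch.Tensor, lambda_g: float = 10.0):
    """L_is term: illumination gradients weighted by exp(-lambda_g * ∇R)."""
    R_gray = R.mean(dim=1, keepdim=True)          # collapse RGB reflectance
    return torch.mean(gradient(I) * torch.exp(-lambda_g * gradient(R_gray)))

def decom_loss(R_low, I_low, R_high, I_high, S_low, S_high,
               lambda_ir: float = 0.001, lambda_is: float = 0.1):
    # L_recon: each reflectance paired with each illumination
    recon = (F.l1_loss(R_low * I_low, S_low) +
             F.l1_loss(R_high * I_high, S_high) +
             0.001 * F.l1_loss(R_high * I_low, S_low) +
             0.001 * F.l1_loss(R_low * I_high, S_high))
    ir = F.l1_loss(R_low, R_high)                 # invariable reflectance loss L_ir
    smooth = smoothness(I_low, R_low) + smoothness(I_high, R_high)
    return recon + lambda_ir * ir + lambda_is * smooth
```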
In the embodiment, the enhancement network includes an enhancement network loss function, and the enhancement network is trained by the enhancement network loss function; the enhancement network loss function comprises a reconstruction loss Lrecon and an illumination smoothness loss Lis, which is expressed as:
In one embodiment, a calculation formula for transferring an RGB component to an HSV component is expressed as:
H is the actual hue component, H′ is the enhanced hue component, Mmax is the maximum value of the red component R, the green component G, and the blue component B, and Nmin is the minimum value of the red component R, the green component G, and the blue component B. After transferring to the HSV space, the value and hue components of the image are relatively independent, and operations on the value component do not affect the proportionality of the original hue component, so the image colour can be better preserved.
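By way of illustration only, the following sketch implements the standard RGB-to-HSV conversion (H in degrees, S and V in [0, 1]) with Mmax and Nmin as defined above; NumPy and the function name are assumptions.

```python
# Illustrative sketch of the RGB-to-HSV transfer used before value fusion.
import numpy as np

def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """rgb: float array in [0, 1], shape (..., 3). Returns stacked H, S, V."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    Mmax = rgb.max(axis=-1)                  # maximum of R, G, B
    Nmin = rgb.min(axis=-1)                  # minimum of R, G, B
    delta = Mmax - Nmin
    H = np.zeros_like(Mmax)
    nz = delta > 0
    r_is_max = nz & (Mmax == R)
    g_is_max = nz & (Mmax == G) & ~r_is_max
    b_is_max = nz & (Mmax == B) & ~r_is_max & ~g_is_max
    H[r_is_max] = 60.0 * (((G - B)[r_is_max] / delta[r_is_max]) % 6)
    H[g_is_max] = 60.0 * ((B - R)[g_is_max] / delta[g_is_max] + 2)
    H[b_is_max] = 60.0 * ((R - G)[b_is_max] / delta[b_is_max] + 4)
    S = np.where(Mmax > 0, delta / np.where(Mmax > 0, Mmax, 1), 0)
    V = Mmax                                 # the value component
    return np.stack([H, S, V], axis=-1)
```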
In the embodiment, the high-frequency components of the to-be-processed image are fused based on regional characteristics in the step S6. The wavelet decomposition of the to-be-processed image yields three high-frequency components, which contain the image edges and texture details. The rule of the current mainstream high-frequency fusion algorithm is to take the maximum among the absolute values of all pixel points; this rule is too broad and easily mixes the noise generated in the multi-scale decomposition into the high-frequency information, which affects the subsequent value stretching and reduces the quality of the image. Therefore, the extraction of locally significant features is added to the high-frequency fusion process, and the high-frequency components are fused according to the regional energy, which is expressed as:
$$E_A(m,n)=\sum_{(m,n)\in w} D_A^2(m,n)$$
$$E_B(m,n)=\sum_{(m,n)\in w} D_B^2(m,n)$$
where D is the pixel value of the high-frequency component, w is the pixel range of the region around the pixel point (m, n), and E is the energy value in the pixel region;
Among the many pixel points in the pixel region, the matrix with the pixel value 1 appears only once, and the pixel point in that matrix has the maximum energy; in the calculation, the maximum energy is taken as the fusion coefficient, which is expressed as:
$$C_a(m,n)=\sum_{(m,n)\in w} A_h(m,n)$$
$$C_b(m,n)=\sum_{(m,n)\in w} B_h(m,n)$$
Rh is the fused high-frequency component. The formulas above select the value with the maximum energy in the pixel region, which contains the most information, thereby enhancing the details and the uniformity of the image.
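By way of illustration only, the following sketch implements the maximum-regional-energy rule described above for a pair of corresponding high-frequency components; the 3×3 window w and the use of SciPy are assumptions, and the Ca/Cb weighting is simplified to a direct maximum-energy selection.

```python
# Illustrative sketch of regional-energy fusion of two corresponding
# high-frequency components D_A and D_B (applied separately to the
# horizontal, vertical and diagonal components).
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_highfreq(DA: np.ndarray, DB: np.ndarray, window: int = 3) -> np.ndarray:
    # regional energy E: sum of squared coefficients over the window w
    EA = uniform_filter(DA * DA, size=window) * window**2
    EB = uniform_filter(DB * DB, size=window) * window**2
    # keep the coefficient whose region has the larger energy -> fused R_h
    return np.where(EA >= EB, DA, DB)
```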
In the embodiment, the using the value fusion technology of the wavelet transform to fuse the first value component V1 and the second value component V2 to obtain the third value component V and stretching the third value component V to obtain the fourth value component V′, which is expressed as:
two coefficients a and b are introduced, where a+b=1, so that the fusion effect of the fused value component is most uniform; the stretched value component is compared with an unprocessed value component to reflect a degree of value change, which is expressed as:
where γ is a value ratio and μ is a coefficient reflecting the proportionality relationship; the ratio of the value component is normalized to obtain the value ratio reflecting the degree of image stretching, which is expressed as:
according to
the fourth value component V′ of the stretched image is obtained.
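By way of illustration only, the following sketch fuses the two value components with the weighted coefficients a and b (a+b=1) as described above; since the equation images are not reproduced here, the min-max stretch and the normalization of the value ratio γ are illustrative assumptions.

```python
# Illustrative sketch of value fusion and stretching. The weighted sum with
# a + b = 1 follows the text; the stretch and normalization are assumptions.
import numpy as np

def fuse_and_stretch(V1: np.ndarray, V2: np.ndarray, a: float = 0.5):
    b = 1.0 - a                       # a + b = 1
    V = a * V1 + b * V2               # third value component V
    # assumed stretch: expand V to the full [0, 1] range (fourth component V')
    V_stretched = (V - V.min()) / max(V.max() - V.min(), 1e-6)
    # degree of value change relative to the unprocessed component
    gamma = V_stretched / np.maximum(V, 1e-6)
    gamma /= gamma.max()              # normalized value ratio
    return V_stretched, gamma
```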
The present application discloses a low-light image enhancement device based on wavelet transform and Retinex-Net, including:
The foregoing is only a preferred embodiment of the present application. It should be noted that, for those skilled in the art, a number of improvements and modifications may be made without departing from the technical principles of the present application, and these improvements and modifications shall also be regarded as falling within the scope of protection of the present application.