The present disclosure relates to the field of image processing and, more particularly, to a method and a system for processing images using cross-stage skip connections.
When images are captured under, for example, low-light conditions or underwater conditions, it may be hard to identify the content of an image due to a low signal-to-noise ratio (SNR), low contrast, and/or a narrow dynamic range. Image denoising techniques remove image noise, and image enhancement techniques improve perceptual qualities such as contrast. Image denoising techniques and/or image enhancement techniques aim to provide images with saturated colors and rich details even though the images are captured under, for example, low-light conditions or underwater conditions.
In a first aspect of the present disclosure, a computer-implemented method includes receiving and processing a first image, and outputting a first feature map by an encoder. The encoder includes a plurality of first convolutional stages that receive the first image and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The second feature maps have gradually decreased scales. For each second convolutional stage of the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage.
In a second aspect of the present disclosure, a system includes at least one memory and at least one processor. The at least one memory is configured to store program instructions. The at least one processor is configured to execute the program instructions, which cause the at least one processor to perform steps including receiving and processing a first image, and outputting a first feature map by an encoder. The encoder includes a plurality of first convolutional stages that receive the first image and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The second feature maps have gradually decreased scales. For each second convolutional stage of the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage.
In a third aspect of the present disclosure, a system includes at least one memory and at least one processor. The at least one memory is configured to store program instructions. The at least one processor is configured to execute the program instructions, which cause the at least one processor to perform steps including receiving and processing a first feature map, and outputting a first image by a decoder. The first feature map is output by an encoder, and the decoder includes a plurality of first convolutional stages that receive the first feature map and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The first feature map and the second feature maps have gradually increased scales. For each second convolutional stage of the last convolutional stage of the encoder and the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage. The last convolutional stage of the encoder outputs the first feature map. Each second convolutional stage outputs a corresponding third feature map of which a scale is increased in a corresponding third convolutional stage of the first convolutional stages. The corresponding third convolutional stage is immediately subsequent to each second convolutional stage.
In order to more clearly illustrate the embodiments of the present disclosure or the related art, the figures to be described in the embodiments are briefly introduced below. It is obvious that the drawings merely depict some embodiments of the present disclosure, and a person having ordinary skill in the art can obtain other figures from these figures without creative effort.
Embodiments of the present disclosure are described in detail, with reference to the accompanying drawings, in terms of technical matters, structural features, achieved objectives, and effects. Specifically, the terminologies in the embodiments of the present disclosure are merely for describing particular embodiments and are not intended to limit the invention.
As used herein, the term "using" refers to a case in which an object is directly employed for performing a step, or a case in which the object is modified by at least one intervening step and the modified object is directly employed to perform the step.
According to a first aspect, a computer-implemented method is provided and includes receiving and processing a first image, and outputting a first feature map by an encoder. The encoder includes: a plurality of first convolutional stages that receive the first image and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The second feature maps have gradually decreased scales. For each second convolutional stage of the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage.
In some embodiments, the first skip connection includes downscaling one of the second feature maps to generate a third feature map, and adding the third feature map to obtain a sum $X_j$ of fourth feature maps according to the following equation:

$$X_j = \sum_{i=a}^{j} F_{ij}$$

where $a$ is a stage number of the first of the first convolutional stages, $i$ is a stage number of a first source stage in the first convolutional stages, and $j$ is a stage number of a first destination stage of the first convolutional stages; when $i<j$, $F_{ij}$ is the third feature map obtained by the first skip connection between the first source stage with the stage number $i$ and the first destination stage with the stage number $j$, and when $i=j$, $F_{ij}$ is a fifth feature map obtained by the first destination stage with the stage number $j$. A scale of the third feature map is the same as a scale of the fifth feature map.
In some embodiments, downscaling is performed by a first downscaling stage including a first activation function that outputs the third feature map; the first destination stage with the stage number j comprises a first convolutional layer and a second activation function; the first convolutional layer outputs the fifth feature map, and the second activation function receives the sum of the fourth feature maps and outputs a sixth feature map of the second feature maps.
In some embodiments, the first downscaling stage further comprises a second convolutional layer preceding the first activation function and having a first stride such that the second convolutional layer decreases the scale of the one of the second feature maps to the scale of the third feature map.
In some embodiments, the second convolutional layer is a 1×1 convolutional layer.
In some embodiments, the first downscaling stage further comprises a first pooling layer that decreases the scale of the one of the second feature maps to the scale of the third feature map, and a third convolutional layer following the first pooling layer and having a stride of 1.
In some embodiments, the third convolutional layer is a 1×1 convolutional layer.
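For illustration only, the two downscaling-stage variants described above may be sketched as follows in PyTorch; the module names, channel arguments, and the Leaky ReLU slope are hypothetical assumptions, not part of the disclosure:

```python
import torch
import torch.nn as nn

class DownscaleStageConv(nn.Module):
    """Variant 1: a 1x1 convolutional layer whose stride equals the
    downscaling factor, followed by the first activation function."""
    def __init__(self, in_ch, out_ch, factor):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=factor)
        self.act = nn.LeakyReLU(0.2)  # slope is an assumption

    def forward(self, x):
        return self.act(self.conv(x))

class DownscaleStagePool(nn.Module):
    """Variant 2: a pooling layer that decreases the scale, followed by a
    1x1 convolutional layer with a stride of 1 and the activation."""
    def __init__(self, in_ch, out_ch, factor):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=factor)  # average pooling also fits
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.conv(self.pool(x)))

# Either variant halves a 128x128 map to 64x64 when factor=2:
y = DownscaleStagePool(64, 128, factor=2)(torch.randn(1, 64, 128, 128))
assert y.shape == (1, 128, 64, 64)
```

Both variants map a source feature map to the scale and channel count of the destination stage, so the element-wise sum of the equation above is well-defined.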
In some embodiments, a number of channels of the fifth feature map is set such that the fifth feature map does not have information which is redundant with respect to the third feature map.
In some embodiments, the last of the second feature maps is the first feature map.
In some embodiments, the encoder further includes a bottleneck stage that receives the last of the second feature maps and outputs the first feature map, wherein the bottleneck stage comprises a global pooling layer.
In some embodiments, the method further includes: receiving and processing the first feature map, and outputting a second image by a decoder. The decoder includes: a plurality of third convolutional stages that receive the first feature map and output stage-by-stage a plurality of seventh feature maps corresponding to the third convolutional stages. The first feature map and the seventh feature maps have gradually increased scales. For each fourth convolutional stage of the last convolutional stage of the encoder and the third convolutional stages, a second skip connection is added between each fourth convolutional stage and each of at least one remaining convolutional stage of the third convolutional stages corresponding to each fourth convolutional stage. The last convolutional stage of the encoder outputs the first feature map. Each fourth convolutional stage outputs a corresponding eighth feature map of which a scale is increased in a corresponding fifth convolutional stage of the third convolutional stages, wherein the corresponding fifth convolutional stage is immediately subsequent to each fourth convolutional stage.
In some embodiments, the second skip connection comprises upscaling one of the first feature map and the seventh feature maps to generate a ninth feature map, and adding the ninth feature map to obtain a sum $X_n$ of tenth feature maps according to the following equation:

$$X_n = \sum_{m=b}^{n} F_{mn}$$

where $b$ is a stage number of the last convolutional stage of the encoder, $m$ is a stage number of a second source stage which is one of the last convolutional stage of the encoder and the third convolutional stages, and $n$ is a stage number of a second destination stage of the third convolutional stages; when $m<n$, $F_{mn}$ is the ninth feature map obtained by the second skip connection between the second source stage with the stage number $m$ and the second destination stage with the stage number $n$, and when $m=n$, $F_{mn}$ is an eleventh feature map obtained by the second destination stage with the stage number $n$. A scale of the ninth feature map is the same as a scale of the eleventh feature map.
In some embodiments, upscaling is performed by a first upscaling stage comprising a third activation function that outputs the ninth feature map; the second destination stage with the stage number n comprises a fourth convolutional layer and a fourth activation function; the fourth convolutional layer outputs the eleventh feature map, and the fourth activation function receives the sum of the tenth feature maps and outputs a twelfth feature map of the seventh feature maps.
In some embodiments, the first upscaling stage further includes a first deconvolutional layer preceding the third activation function and having a second stride such that the first deconvolutional layer increases the scale of the one of the first feature map and the seventh feature maps to the scale of the ninth feature map.
In some embodiments, the first deconvolutional layer is a 1×1 deconvolutional layer.
In some embodiments, the first upscaling stage further comprises a first upsampling layer that increases the scale of the one of the first feature map and the seventh feature maps to the scale of the ninth feature map, and a fifth convolutional layer following the first upsampling layer and having a stride of 1.
In some embodiments, the fifth convolutional layer is a 1×1 convolutional layer.
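A mirrored sketch of the two upscaling-stage variants, under the same hypothetical naming. Note that the disclosure mentions a 1×1 deconvolutional layer, whereas this sketch uses a kernel size equal to the upscaling factor so that the output scale is exactly the input scale multiplied by the factor; that choice is a simplifying assumption:

```python
import torch.nn as nn

class UpscaleStageDeconv(nn.Module):
    """Variant 1: a deconvolutional (transposed-convolution) layer whose
    stride equals the upscaling factor, followed by the activation."""
    def __init__(self, in_ch, out_ch, factor):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch,
                                         kernel_size=factor, stride=factor)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.deconv(x))

class UpscaleStageUpsample(nn.Module):
    """Variant 2: an upsampling layer (bilinear interpolation here),
    followed by a 1x1 convolutional layer with a stride of 1."""
    def __init__(self, in_ch, out_ch, factor):
        super().__init__()
        self.up = nn.Upsample(scale_factor=factor, mode='bilinear',
                              align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.conv(self.up(x)))
```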
In some embodiments, a number of channels of the eleventh feature map is set such that the eleventh feature map does not have information which is redundant with respect to the ninth feature map.
According to a second aspect, a system is provided and includes at least one memory configured to store program instructions; and at least one processor configured to execute the program instructions, which cause the at least one processor to perform steps including: receiving and processing a first image, and outputting a first feature map by an encoder. The encoder includes a plurality of first convolutional stages that receive the first image and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The second feature maps have gradually decreased scales. For each second convolutional stage of the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage.
According to a third aspect, a system is provided and includes: at least one memory configured to store program instructions; and at least one processor configured to execute the program instructions, which cause the at least one processor to perform steps including: receiving and processing a first feature map, and outputting a first image by a decoder. The decoder includes: a plurality of first convolutional stages that receive the first feature map and output stage-by-stage a plurality of second feature maps corresponding to the first convolutional stages. The first feature map and the second feature maps have gradually increased scales. For each second convolutional stage of the last convolutional stage of the encoder and the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage. The last convolutional stage of the encoder outputs the first feature map. Each second convolutional stage outputs a corresponding third feature map of which a scale is increased in a corresponding third convolutional stage of the first convolutional stages, wherein the corresponding third convolutional stage is immediately subsequent to each second convolutional stage.
The digital camera module 102 is an inputting hardware module and is configured to capture an input image 206 (labeled in the drawings).
When the input image is captured, for example, under a low-light condition or an underwater condition, or with an insufficient amount of exposure time, it may be hard to identify the content of the input image due to a low signal-to-noise ratio (SNR), low contrast, and/or a narrow dynamic range. The memory module 106 may be a transitory or non-transitory computer-readable medium that includes at least one memory storing program instructions that, when executed by the processor module 104, cause the processor module 104 to process the input image. The processor module 104 implements an encoder-decoder network 200 (shown in the drawings) that processes the input image 206 and generates an output image 208.
The display module 108 is an outputting hardware module and is configured to display the output image 208 that is received from the processor module 104 through the buses 114. Alternatively, the output image 208 may be output using another outputting hardware module, such as the storage module 110, or the wired or wireless communication module 112. The storage module 110 is configured to store the output image 208 that is received from the processor module 104 through the buses 114. The wired or wireless communication module 112 is configured to transmit the output image 208 to the network through wired or wireless communication, wherein the output image 208 is received from the processor module 104 through the buses 114.
The terminal 100 is one type of computing system, all components of which are integrated together by the buses 114. Other types of computing systems, such as a computing system that has a remote digital camera module instead of the digital camera module 102, are within the contemplated scope of the present disclosure.
In an embodiment, the encoder-decoder network 200 has a U-net architecture. Examples of the U-net architecture are described in more detail in "U-net: Convolutional networks for biomedical image segmentation," O. Ronneberger, P. Fischer, and T. Brox, arXiv preprint arXiv:1505.04597 [cs.CV], 2015. The encoder-decoder network 200 includes an encoder 202 and a decoder 204. The encoder 202 is configured to receive the input image 206, extract features of the input image 206, and output a feature map F5. The decoder 204 is configured to receive the feature map F5, reconstruct an image from the feature map F5, and output the output image 208. The encoder 202 includes a plurality of convolutional stages S1 to S5. The decoder 204 includes a plurality of convolutional stages S6 to S10. The convolutional stages S1 to S5 receive the input image 206 and output stage-by-stage a plurality of feature maps F1 to F5 corresponding to the convolutional stages S1 to S5. The convolutional stages S6 to S9 receive the feature map F5 and output stage-by-stage a plurality of feature maps F6 to F9 corresponding to the convolutional stages S6 to S9. The convolutional stage S10 receives the feature map F9 and outputs the output image 208. In an embodiment, the convolutional stage S10 includes a 1×1 vanilla convolutional layer that receives the feature map F9 and outputs the output image 208.
The feature maps F1 to F9 are multi-channel feature maps. For the encoder 202, the feature maps F1 to F5 have gradually decreased scales (i.e., spatial resolutions), which is represented by the decreasing sizes of the rectangles corresponding to the convolutional stages S1 to S5, and have gradually increased numbers of channels. For the decoder 204, the feature maps F5 to F9 have gradually increased scales, which is represented by the increasing sizes of the rectangles corresponding to the convolutional stages S5 to S9, and have gradually decreased numbers of channels.
Cross-stage skip connections S12 to S45 are added for the convolutional stages S1 to S5. For each convolutional stage S1, . . . , or S4 of the convolutional stages S1 to S5, the skip connection S12, . . . , or S45 is added between each convolutional stage S1, . . . , or S4 and each of at least one remaining convolutional stage S2 to S5, . . . , or S5 of the convolutional stages S1 to S5 corresponding to each convolutional stage S1, . . . , or S4. Cross-stage skip connections S56 to S89 are added for the convolutional stage S5 of the encoder 202, and the convolutional stages S6 to S9 of the decoder 204. For each convolutional stage S5, . . . , or S8 of the last convolutional stage S5 of the encoder 202 and the convolutional stages S6 to S9 of the decoder 204, the skip connection S56, . . . , or S89 is added between each convolutional stage S5, . . . , or S8 and each of at least one remaining convolutional stage S6 to S9, . . . , or S9 of the convolutional stages S6 to S9 corresponding to each convolutional stage S5, . . . , or S8. The last convolutional stage S5 of the encoder 202 outputs the feature map F5. Each convolutional stage S5, . . . , or S8 of the last convolutional stage S5 of the encoder 202 and the convolutional stages S6 to S9 of the decoder 204 outputs a corresponding feature map F5, . . . , or F8 of which a scale is increased in a corresponding convolutional stage S6, . . . , or S9 of the convolutional stages S6 to S9, and the corresponding convolutional stage S6, . . . , or S9 is immediately subsequent to each convolutional stage S5, . . . , or S8.
The first number of a reference numeral of a skip connection is a stage number of a source stage, and the second number is a stage number of a destination stage. For example, the first number "1" of the reference numeral "S12" of the skip connection S12 is the stage number "1" of the source stage S1, and the second number "2" is the stage number "2" of the destination stage S2. For simplicity, the skip connections in the drawings are referred to by these reference numerals.
The term "at least one remaining convolutional stage corresponding to a first convolutional stage" means that, in a group of convolutional stages, the at least one remaining convolutional stage corresponding to the first convolutional stage is all of the at least one convolutional stage that is subsequent to the first convolutional stage.
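Reading this definition together with the summation in equation (1) below, a destination stage receives one skip connection from every preceding stage in the group. A short, purely illustrative enumeration of the encoder-side reference numerals:

```python
from itertools import combinations

# Every encoder stage S1..S5 connects to each subsequent stage, so the
# cross-stage skip connections run from S12 through S45 (naming as in the text).
encoder_skips = [f"S{i}{j}" for i, j in combinations(range(1, 6), 2)]
print(encoder_skips)
# ['S12', 'S13', 'S14', 'S15', 'S23', 'S24', 'S25', 'S34', 'S35', 'S45']
```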
The convolutional stage S2 includes a downscaling layer B1, a convolutional layer B2 with the first activation function, a convolutional layer B3 without the first activation function, and an activation function B4. The downscaling layer B1 receives the feature map F1; the downscaling layer B1, the convolutional layers B2 and B3, and the activation function B4 perform processing layer-by-layer; and the activation function B4 outputs the feature map F2. The downscaling layer B1 downscales the feature map F1 with a downscaling factor such as 2. In an embodiment, the downscaling layer B1 is a pooling layer such as a max pooling layer or an average pooling layer. Other downscaling layers, such as a convolutional layer with a stride of 2, are within the contemplated scope of the present disclosure. In an embodiment, the convolutional layers B2 and B3 are 3×3 convolutional layers. In an embodiment, the activation function B4 is a nonlinear activation function such as a Leaky ReLU operation.
The convolutional stage S3 includes a downscaling layer C1, a convolutional layer C2 with the first activation function, a convolutional layer C3 without the first activation function, a summation block 302, and an activation function C4. The downscaling layer C1 receives the feature map F2; the downscaling layer C1, the convolutional layers C2 and C3, the summation block 302, and the activation function C4 perform processing layer-by-layer; and the activation function C4 outputs the feature map F3. The downscaling layer C1 downscales the feature map F2 with a downscaling factor such as 2. In an embodiment, the downscaling layer C1 is a pooling layer such as a max pooling layer or an average pooling layer. Other downscaling layers, such as a convolutional layer with a stride of 2, are within the contemplated scope of the present disclosure. In an embodiment, the convolutional layers C2 and C3 are 3×3 convolutional layers. In an embodiment, the activation function C4 is a nonlinear activation function such as a Leaky ReLU operation.
The skip connection S13 or S23 includes downscaling the feature map F1 or F2 of the feature maps F1 to F5 (shown in the drawings) by a downscaling stage L13 or L23, to generate a feature map F13 or F23, and adding the feature map F13 or F23 to obtain a sum $X_j$ of feature maps according to equation (1):

$$X_j = \sum_{i=a}^{j} F_{ij} \tag{1}$$

where $a$ is the stage number "1" of the first of the convolutional stages S1 to S5, $i$ is the stage number "1" or "2" of a source stage S1 or S2 in the convolutional stages S1 to S5, and $j$ is the stage number "3" of a destination stage S3 of the convolutional stages S1 to S5; when $i<j$, $F_{ij}$ is the feature map F13 or F23 obtained by the skip connection S13 or S23 between the source stage S1 or S2 with the stage number $i$ and the destination stage S3 with the stage number $j$, and when $i=j$, $F_{ij}$ is a feature map F33 obtained by the destination stage S3 with the stage number $j$. A scale and a number of channels of each of the feature maps F13 and F23 are the same as a scale and a number of channels of the feature map F33. In an embodiment, because the downscaling layer B1 and the downscaling layer C1 each have a downscaling factor of 2, the downscaling stage L13 has a downscaling factor of 4, and the downscaling stage L23 has a downscaling factor of 2. Each summation operation (i.e., adding operation) in equation (1) is an element-wise summation operation.
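As a non-limiting sketch, the destination stage S3 with its main path C1-C4 and the cross-stage contributions of equation (1) could look as follows in PyTorch. Channel counts, the pooling type, and the activation slope are assumptions, and the downscaling stages L13 and L23 use the strided 1×1-convolution variant described earlier:

```python
import torch
import torch.nn as nn

class EncoderStageS3(nn.Module):
    """Destination stage S3: X_3 = F13 + F23 + F33 per equation (1),
    followed by the activation function C4."""
    def __init__(self, ch1, ch2, ch3):
        super().__init__()
        self.c1 = nn.MaxPool2d(2)                                  # downscaling layer C1
        self.c2 = nn.Sequential(nn.Conv2d(ch2, ch3, 3, padding=1),
                                nn.LeakyReLU(0.2))                 # conv layer C2 + activation
        self.c3 = nn.Conv2d(ch3, ch3, 3, padding=1)                # conv layer C3, no activation
        self.c4 = nn.LeakyReLU(0.2)                                # activation function C4
        # Downscaling stages L13 (factor 4) and L23 (factor 2).
        self.l13 = nn.Sequential(nn.Conv2d(ch1, ch3, 1, stride=4),
                                 nn.LeakyReLU(0.2))
        self.l23 = nn.Sequential(nn.Conv2d(ch2, ch3, 1, stride=2),
                                 nn.LeakyReLU(0.2))

    def forward(self, f1, f2):
        f33 = self.c3(self.c2(self.c1(f2)))     # feature map F33 from the main path
        x3 = self.l13(f1) + self.l23(f2) + f33  # element-wise sum of equation (1)
        return self.c4(x3)                      # feature map F3

# Hypothetical shapes: F1 (1, 32, 256, 256) and F2 (1, 64, 128, 128)
# yield F3 of shape (1, 128, 64, 64).
f3 = EncoderStageS3(32, 64, 128)(torch.randn(1, 32, 256, 256),
                                 torch.randn(1, 64, 128, 128))
```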
In an embodiment, a number of channels of the feature map F33 is set such that the feature map F33 does not have information that is redundant with respect to the feature map F13 or F23. In this way, the convolutional stage S3 does not need to learn and generate information that has already been learned and generated by the convolutional stages S1 and S2. The reuse of the feature map F13 or F23, instead of carrying redundant information with respect to the feature map F13 or F23 in the feature map F33, is represented by three dashed lines with different dashed-line styles corresponding to the feature maps F13, F23, and F33.
The kernel sizes of the convolutional layers, the downscaling factors, and the use of a same activation function for the first activation function and the activation functions B4 and C4 are only exemplary, and the present embodiment is not limited to these particular configurations.
The convolutional stage S2 also includes a summation block similar to the summation block 302; the summation block of the convolutional stage S2 is omitted in the drawings for simplicity.
The convolutional stage S6 includes an upscaling layer I1, a convolutional layer I2 with the first activation function, a convolutional layer I3 without the first activation function, and an activation function I4. The upscaling layer I1 receives the feature map F5; the upscaling layer I1, the convolutional layers I2 and I3, and the activation function I4 perform processing layer-by-layer; and the activation function I4 outputs the feature map F6. The upscaling layer I1 upscales the feature map F5 with an upscaling factor such as 2. In an embodiment, the upscaling layer I1 is an upsampling layer that performs linear interpolation or bilinear interpolation. Other upscaling layers, such as a deconvolutional layer with a stride of 2, are within the contemplated scope of the present disclosure. In an embodiment, the convolutional layers I2 and I3 are 3×3 convolutional layers. In an embodiment, the activation function I4 is a nonlinear activation function such as a Leaky ReLU operation.
The convolutional stage S7 includes an upscaling layer J1, a convolutional layer J2 with the first activation function, a convolutional layer J3 without the first activation function, a summation block 502, and an activation function J4. The upscaling layer J1 receives the feature map F6; the upscaling layer J1, the convolutional layers J2 and J3, the summation block 502, and the activation function J4 perform processing layer-by-layer; and the activation function J4 outputs the feature map F7. The upscaling layer J1 upscales the feature map F6 with an upscaling factor such as 2. In an embodiment, the upscaling layer J1 is an upsampling layer that performs linear interpolation or bilinear interpolation. Other upscaling layers, such as a deconvolutional layer with a stride of 2, are within the contemplated scope of the present disclosure. In an embodiment, the convolutional layers J2 and J3 are 3×3 convolutional layers. In an embodiment, the activation function J4 is a nonlinear activation function such as a Leaky ReLU operation.
The skip connection S57 or S67 includes upscaling the feature map F5 or F6 of the feature maps F5 to F9 (shown in the drawings) by an upscaling stage L57 or L67, to generate a feature map F57 or F67, and adding the feature map F57 or F67 to obtain a sum $X_n$ of feature maps according to equation (2):

$$X_n = \sum_{m=b}^{n} F_{mn} \tag{2}$$

where $b$ is the stage number "5" of the last convolutional stage S5 of the encoder 202, $m$ is the stage number "5" or "6" of a source stage S5 or S6 which is one of the last convolutional stage S5 of the encoder 202 and the convolutional stages S6 to S9 of the decoder 204, and $n$ is the stage number "7" of a destination stage S7 of the convolutional stages S6 to S9; when $m<n$, $F_{mn}$ is the feature map F57 or F67 obtained by the skip connection S57 or S67 between the source stage S5 or S6 with the stage number $m$ and the destination stage S7 with the stage number $n$, and when $m=n$, $F_{mn}$ is a feature map F77 obtained by the destination stage S7 with the stage number $n$. A scale and a number of channels of each of the feature maps F57 and F67 are the same as a scale and a number of channels of the feature map F77. In an embodiment, because the upscaling layer I1 and the upscaling layer J1 each have an upscaling factor of 2, the upscaling stage L57 has an upscaling factor of 4, and the upscaling stage L67 has an upscaling factor of 2. Each summation operation (i.e., adding operation) in equation (2) is an element-wise summation operation.
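The decoder side mirrors this construction. A sketch of destination stage S7 implementing equation (2), with the upscaling stages L57 and L67 realized as transposed convolutions; channel counts and layer choices are assumptions:

```python
import torch.nn as nn

class DecoderStageS7(nn.Module):
    """Destination stage S7: X_7 = F57 + F67 + F77 per equation (2),
    followed by the activation function J4."""
    def __init__(self, ch5, ch6, ch7):
        super().__init__()
        self.j1 = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)                 # upscaling layer J1
        self.j2 = nn.Sequential(nn.Conv2d(ch6, ch7, 3, padding=1),
                                nn.LeakyReLU(0.2))                 # conv layer J2 + activation
        self.j3 = nn.Conv2d(ch7, ch7, 3, padding=1)                # conv layer J3, no activation
        self.j4 = nn.LeakyReLU(0.2)                                # activation function J4
        # Upscaling stages L57 (factor 4) and L67 (factor 2).
        self.l57 = nn.Sequential(nn.ConvTranspose2d(ch5, ch7, 4, stride=4),
                                 nn.LeakyReLU(0.2))
        self.l67 = nn.Sequential(nn.ConvTranspose2d(ch6, ch7, 2, stride=2),
                                 nn.LeakyReLU(0.2))

    def forward(self, f5, f6):
        f77 = self.j3(self.j2(self.j1(f6)))     # feature map F77 from the main path
        x7 = self.l57(f5) + self.l67(f6) + f77  # element-wise sum of equation (2)
        return self.j4(x7)                      # feature map F7
```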
In an embodiment, a number of channels of the feature map F77 is set such that the feature map F77 does not have information that is redundant with respect to the feature map F57 or F67. In this way, the convolutional stage S7 does not need to learn and generate information that has already been learned and generated by the convolutional stages S5 and S6. The reuse of the feature map F57 or F67, instead of carrying redundant information with respect to the feature map F57 or F67 in the feature map F77, is represented by three dashed lines with different dashed-line styles corresponding to the feature maps F57, F67, and F77.
The kernel sizes of the convolutional layers, the upscaling factors, and the use of a same activation function for the first activation function and the activation functions I4 and J4 are only exemplary, and the present embodiment is not limited to these particular configurations.
The convolutional stage S6 also includes a summation block similar to the summation block 502; the summation block of the convolutional stage S6 is omitted in the drawings for simplicity.
In an embodiment, the bottleneck stage G5 includes a global pooling layer, and at least one convolutional layer with the first activation function. The global pooling layer receives the feature map F5, the global pooling layer and the at least one convolutional layer process layer-by-layer, and the at least one convolutional layer outputs the feature map F5′. In an embodiment, a number of layers of the at least one convolutional layer is 3. Each of the at least one convolutional layer is a 1×1 convolutional layer.
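A minimal sketch of the bottleneck stage G5 as described (global pooling followed by three 1×1 convolutional layers with activations). The pooling type is an assumption, and how the resulting global descriptor is used downstream is not detailed here:

```python
import torch.nn as nn

class BottleneckStageG5(nn.Module):
    """Global pooling followed by three 1x1 convolutions, each with the
    first activation function; outputs the feature map F5'."""
    def __init__(self, ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling -> 1x1 spatial size
        layers = []
        for _ in range(3):
            layers += [nn.Conv2d(ch, ch, kernel_size=1), nn.LeakyReLU(0.2)]
        self.convs = nn.Sequential(*layers)

    def forward(self, f5):
        return self.convs(self.pool(f5))  # feature map F5'
```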
In an embodiment, the feature map F4 output by the activation function of the convolutional stage S4 and a feature map output by the upscaling layer of the convolutional stage S6 have a substantially same scale. The skip connection 810 includes concatenating the feature map F4 and the feature map output by the upscaling layer of the convolutional stage S6. The feature map output by the upscaling layer of the convolutional stage S6 is input to layers of the convolutional stage S6 subsequent to the upscaling layer of the convolutional stage S6, to generate the feature map F6′ output by the convolutional stage S6. Similarly, the feature maps F3, F2, and F1 correspondingly output by the activation functions of the convolutional stages S3, S2, and S1 and feature maps correspondingly output by the upscaling layers of the convolutional stages S7, S8, and S9 have substantially same corresponding scales. The skip connections 812, 814, and 816 include correspondingly concatenating the feature maps F3, F2, and F1 and the feature maps output by the upscaling layers of the convolutional stages S7, S8, and S9. The feature maps output by the upscaling layers of the convolutional stage S7, S8, and S9 are correspondingly input to layers of the convolutional stages S7, S8, and S9 subsequent to the upscaling layers of the convolutional stages S7, S8, and S9, to correspondingly generate the feature map F7′, F8′, and F9′ correspondingly output by the convolutional stages S7, S8, and S9.
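For contrast with the element-wise sums above, the concatenation-based skip connections 810 to 816 can be sketched in a single operation (shapes hypothetical):

```python
import torch

f4 = torch.randn(1, 128, 64, 64)     # feature map F4 (hypothetical shape)
up6 = torch.randn(1, 128, 64, 64)    # output of the upscaling layer of stage S6
fused = torch.cat([f4, up6], dim=1)  # skip connection 810: channel-wise concatenation
# The concatenated map is then processed by the layers of stage S6 subsequent
# to its upscaling layer (standard U-net practice), producing F6'.
```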
Furthermore, in an embodiment, during training, the input image 206 of the encoder-decoder network 200 is a short-exposure image captured under, for example, a low-light condition or an underwater condition. A loss function is calculated between the output image 208 of the encoder-decoder network 200 and a ground-truth image, which is a corresponding long-exposure image. The loss function is a weighted joint loss of $\ell_1$ and multi-scale structural similarity index (MS-SSIM), which is defined by equation (3):

$$\mathcal{L} = \lambda\,\mathcal{L}_{\ell_1} + (1 - \lambda)\,\mathcal{L}_{\text{MS-SSIM}} \tag{3}$$

where $\lambda$ is set to 0.16 empirically, $\mathcal{L}_{\ell_1}$ is the $\ell_1$ loss defined by equation (4), and $\mathcal{L}_{\text{MS-SSIM}}$ represents the MS-SSIM loss given by equation (5). Equation (4) is as follows:

$$\mathcal{L}_{\ell_1} = \frac{1}{N} \sum_{p} \left| \hat{I}(p) - I(p) \right| \tag{4}$$
where $\hat{I}$ and $I$ are the output image 208 and the ground-truth image, respectively, $p$ indexes pixels, and $N$ is the total number of pixels in the input image 206. Equation (5) is as follows:
$$\mathcal{L}_{\text{MS-SSIM}} = 1 - \text{MS-SSIM} \tag{5}$$
where MS-SSIM for pixel $i$ is defined by equations (6)-(8). Equations (6)-(8) are as follows:

$$\text{MS-SSIM}(x, y) = \left[ l_M(x, y) \right]^{\alpha} \cdot \prod_{j=1}^{M} \left[ cs_j(x, y) \right]^{\beta_j} \tag{6}$$

$$l_M(x, y) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{7}$$

$$cs_j(x, y) = \frac{2 \sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{8}$$

where $x$ and $y$ represent two discrete non-negative signals that have been aligned with each other (e.g., two image patches extracted from the same spatial location of the two images being compared, respectively); $\mu_x$ and $\mu_y$ are means, $\sigma_x$ and $\sigma_y$ are standard deviations, $\sigma_{xy}$ is the covariance, $C_1$ and $C_2$ are small stabilizing constants, $M$ is the number of levels, and $\alpha$, $\beta_j$ are the weights that adjust the contribution of each component. The means $\mu_x$ and $\mu_y$ and the standard deviations $\sigma_x$ and $\sigma_y$ are calculated with a Gaussian filter, $G_g$, with zero mean and a standard deviation $\sigma_g$. Examples of MS-SSIM are described in more detail in "Multiscale structural similarity for image quality assessment," Z. Wang, E. P. Simoncelli, and A. C. Bovik, Conference on Signals, Systems and Computers, 2004.
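A hedged sketch of the training loss of equations (3)-(5); `ms_ssim_fn` is a placeholder for any MS-SSIM implementation that returns a similarity in [0, 1] (for example, `ms_ssim` from the `pytorch_msssim` package) and is not part of the disclosure:

```python
import torch
import torch.nn.functional as F

def joint_loss(output, target, ms_ssim_fn, lam=0.16):
    """Weighted joint loss of equation (3):
    L = lam * L_l1 + (1 - lam) * L_MS-SSIM."""
    l1 = F.l1_loss(output, target)                   # equation (4): mean absolute error
    ms_ssim_loss = 1.0 - ms_ssim_fn(output, target)  # equation (5)
    return lam * l1 + (1.0 - lam) * ms_ssim_loss

# Example usage with the (assumed) pytorch_msssim package:
# from pytorch_msssim import ms_ssim
# loss = joint_loss(pred, gt, lambda a, b: ms_ssim(a, b, data_range=1.0))
```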
Table 1, below, illustrates experimental results that may be achieved by the embodiments described with reference to the drawings.
Some embodiments have one or a combination of the following features and/or advantages. In an embodiment, an encoder of an encoder-decoder network includes a plurality of first convolutional stages. For each second convolutional stage of the first convolutional stages, a first skip connection is added between each second convolutional stage and each of at least one remaining convolutional stage of the first convolutional stages corresponding to each second convolutional stage. In an embodiment, a decoder of the encoder-decoder network includes a plurality of third convolutional stages. For each fourth convolutional stage of the last convolutional stage of the encoder and the third convolutional stages, a second skip connection is added between each fourth convolutional stage and each of at least one remaining convolutional stage of the third convolutional stages corresponding to each fourth convolutional stage. Because information flow and gradient propagation of the encoder-decoder network may be improved by the first skip connection and the second skip connection, performance such as PSNR of an output image of the encoder-decoder network may be improved. In an embodiment, each of the first skip connection and the second skip connection is between a destination stage and a source stage. A number of the channels of a feature map of the destination stage is set such that the feature map of the destination stage does not have information which is redundant with respect to the feature map of the source stage modified by the first skip connection or the second skip connection. Because the feature map of the source stage modified by the first skip connection or the second skip connection is reused, a number of parameters of the encoder-decoder network may be reduced without sacrificing performance of the output image.
A person having ordinary skill in the art understands that each of the units, modules, layers, blocks, algorithms, and steps of the system or the computer-implemented method described and disclosed in the embodiments of the present disclosure can be realized using hardware, firmware, software, or a combination thereof. Whether a function runs in hardware, firmware, or software depends on the application conditions and the design requirements of the technical solution. A person having ordinary skill in the art can use different ways to realize each function for each specific application, and such realizations should not go beyond the scope of the present disclosure.
It is understood that the disclosed system and computer-implemented method in the embodiments of the present disclosure can be realized in other ways. The above-mentioned embodiments are exemplary only. The division of the modules is merely based on logical functions, and other divisions exist in realization. The modules may or may not be physical modules. It is possible that a plurality of modules are combined or integrated into one physical module, or that any of the modules is divided into a plurality of physical modules. It is also possible that some characteristics are omitted or skipped. The mutual coupling, direct coupling, or communicative coupling shown or discussed may operate through certain ports, devices, or modules, and may be electrical, mechanical, or of other forms.
The modules described as separate components for purposes of explanation may or may not be physically separated. The modules may be located in one place or distributed over a plurality of network modules. Some or all of the modules are used according to the purposes of the embodiments.
If a software functional module is realized, used, and sold as a product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution proposed by the present disclosure can be realized, essentially or partially, in the form of a software product, or the part of the technical solution that is beneficial over the conventional technology can be realized in the form of a software product. The software product is stored in a computer-readable storage medium and includes a plurality of commands for at least one processor of a system to run all or some of the steps disclosed by the embodiments of the present disclosure. The storage medium includes a USB disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a floppy disk, or other kinds of media capable of storing program instructions.
While the present disclosure has been described in connection with what is considered the most practical and preferred embodiments, it is understood that the present disclosure is not limited to the disclosed embodiments but is intended to cover various arrangements made without departing from the scope of the broadest interpretation of the appended claims.
This application is a continuation-application of International (PCT) Patent Application No. PCT/CN2019/105464 filed on Sep. 11, 2019, which claims priority to a U.S. application No. 62/767,942 filed on Nov. 15, 2018, the entire contents of both of which are hereby incorporated by reference.
Related Application Data:
- U.S. Provisional Application No. 62/767,942, filed November 2018 (US)
- Parent Application: PCT/CN2019/105464, filed September 2019
- Child Application: U.S. application Ser. No. 17/319,597