The present disclosure relates to a method and apparatus for extracting and reducing noise of a compressed image, and more particularly, to a method and apparatus for extracting and reducing noise including a compressed artifact occurring in compression of an image, based on artificial intelligence (AI).
An image is encoded by a codec according to data compression standards, e.g., the Moving Picture Experts Group (MPEG) standards, and is stored in a recording medium or transmitted through a communication channel in the form of a bitstream.
In image compression, a compressed artifact may occur. Thus, there is a need to improve the quality of a compressed image by reducing the compressed artifact.
The present disclosure provides a method of extracting noise of a compressed image and a method and apparatus for reducing the noise of the compressed image, in which structural information having a high probability of reconstruction and a high-frequency noise region having a high damage probability are separated from the compressed image based on artificial intelligence (AI) to reduce a compressed artifact occurring in image compression, thereby improving the quality of the compressed image.
In an embodiment, a method of extracting noise of a compressed image, provided in the present disclosure, includes obtaining a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtaining a second image by subtracting the first image from the compressed image, and obtaining noise including high-frequency information and a compressed artifact of the compressed image from the second image.
In an embodiment, a method of reducing noise of a compressed image, provided in the present disclosure, includes obtaining a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtaining a second image by subtracting the first image from the compressed image, obtaining noise including high-frequency information and a compressed artifact of the compressed image from the second image, obtaining a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact to the noise, and reconstructing the compressed image by summing the first image and the third image.
In an embodiment, an apparatus for extracting noise of a compressed image, provided in the present disclosure, includes a memory storing one or more instructions and a processor configured to execute the one or more instructions stored in the memory, in which the processor is further configured to obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtain a second image by subtracting the first image from the compressed image, and obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image.
In an embodiment, an apparatus for reducing noise of a compressed image, provided in the present disclosure, includes a memory storing one or more instructions and a processor configured to execute the one or more instructions stored in the memory, in which the processor is further configured to obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtain a second image by subtracting the first image from the compressed image, obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image, obtain a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact to the noise, and reconstruct the compressed image by summing the first image and the third image.
In one embodiment, disclosed is a method of extracting noise of a compressed image, the method including: obtaining a first image by applying a convolution filter, the convolution filter sequentially performing down-convolution and up-convolution corresponding to the down-convolution, to the compressed image of an original image; obtaining a second image by subtracting the first image from the compressed image; and obtaining from the second image, noise including high-frequency information of the compressed image and a compressed artifact of the compressed image.
In one embodiment, disclosed is a method of reducing noise of a compressed image, the method including: obtaining a first image by applying a convolution filter, the convolution filter sequentially performing down-convolution and up-convolution corresponding to the down-convolution, to the compressed image of an original image; obtaining a second image by subtracting the first image from the compressed image; obtaining from the second image, noise including high-frequency information of the compressed image and a compressed artifact of the compressed image; obtaining a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact, to the noise; and reconstructing the compressed image by summing the first image and the third image.
In one embodiment, disclosed is an apparatus for reducing noise of a compressed image, the apparatus including: a memory storing one or more instructions; and a processor configured to execute the one or more instructions stored in the memory, wherein the processor is further configured to: obtain a first image by applying a convolution filter, the convolution filter sequentially performing down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image; obtain a second image by subtracting the first image from the compressed image; obtain from the second image, noise including high-frequency information of the compressed image and a compressed artifact of the compressed image; obtain a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact to the noise; and reconstruct the compressed image by summing the first image and the third image.
By using a convolution filter including down-convolution and up-convolution operations together with subtraction and summation operations, a compressed image may be separated into an image including low-frequency and intermediate-frequency information and an image including noise, the noise including high-frequency information and a compressed artifact, so as to extract the noise. The image including the noise may then be input to a deep neural network (DNN) for removing the compressed artifact, and the image having the compressed artifact removed therefrom may be summed with the image including the low-frequency and intermediate-frequency information, thereby improving the quality of the compressed image.
Brief descriptions of the respective drawings are provided to facilitate a sufficient understanding of the drawings referenced in the present specification.
According to an embodiment of the present disclosure, a method of extracting noise of a compressed image includes obtaining a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtaining a second image by subtracting the first image from the compressed image, and obtaining noise including high-frequency information and a compressed artifact of the compressed image from the second image.
According to an embodiment, the first image may include intermediate-frequency information and low-frequency information of the compressed image.
According to an embodiment, the convolution filter may be trained to obtain the first image including the intermediate-frequency information and the low-frequency information of the compressed image, with the compressed image as an input.
According to an embodiment, the down-convolution may be a convolution operation trained to reduce the size of the compressed image.
According to an embodiment, the up-convolution may be a convolution operation trained to increase the down-convoluted compressed image to the original size of the compressed image.
According to an embodiment, the down-convolution and the up-convolution may be transposed to each other.
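The transposed relationship between the down-convolution and the up-convolution may be illustrated with a minimal 1-D sketch (the kernel values, sizes, and the helper name `conv_matrix` are hypothetical, not taken from the disclosure): writing the strided down-convolution as a matrix C, the corresponding up-convolution is multiplication by the transpose of C, which restores the original number of samples.

```python
import numpy as np

def conv_matrix(kernel, in_len, stride):
    """Build the dense matrix of a 1-D strided convolution (valid padding)."""
    k = len(kernel)
    out_len = (in_len - k) // stride + 1
    C = np.zeros((out_len, in_len))
    for i in range(out_len):
        C[i, i * stride:i * stride + k] = kernel
    return C

# Hypothetical 3-tap smoothing kernel; a trained filter would learn these weights.
kernel = np.array([0.25, 0.5, 0.25])
C = conv_matrix(kernel, in_len=9, stride=2)

x = np.arange(9, dtype=float)   # 9 input samples
down = C @ x                    # down-convolution: 9 samples -> 4 samples
up = C.T @ down                 # up-convolution (transposed): 4 samples -> 9 samples
```

Because the up-convolution uses the transpose of the down-convolution matrix, the two operations pair each output sample with exactly the input positions that produced it, which is why the up-convolution restores the original sample count.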
According to another embodiment of the present disclosure, a method of reducing noise of a compressed image includes obtaining a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtaining a second image by subtracting the first image from the compressed image, obtaining noise including high-frequency information and a compressed artifact of the compressed image from the second image, obtaining a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact to the noise, and reconstructing the compressed image by summing the first image and the third image.
According to an embodiment, the DNN for removing the compressed artifact may be trained to reduce the compressed artifact, with the high-frequency information and the compressed artifact of the compressed image as an input.
According to an embodiment, the DNN for removing the compressed artifact may include a convolution layer and an activation layer.
According to an embodiment, the DNN for removing the compressed artifact may further include a batch normalization layer.
According to an embodiment, the first image may include intermediate-frequency information and low-frequency information of the compressed image.
According to an embodiment, the convolution filter may be trained to obtain the first image including the intermediate-frequency information and the low-frequency information of the compressed image.
According to an embodiment, the down-convolution may be a convolution operation trained to reduce the size of the compressed image.
According to an embodiment, the up-convolution may be a convolution operation trained to increase the down-convoluted compressed image to the original size of the compressed image.
According to an embodiment, the down-convolution and the up-convolution may be transposed to each other.
According to another embodiment of the present disclosure, an apparatus for extracting noise of a compressed image includes a memory storing one or more instructions and a processor configured to execute the one or more instructions stored in the memory, in which the processor is further configured to obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtain a second image by subtracting the first image from the compressed image, and obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image.
According to another embodiment of the present disclosure, an apparatus for reducing noise of a compressed image includes a memory storing one or more instructions and a processor configured to execute the one or more instructions stored in the memory, in which the processor is further configured to obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image, obtain a second image by subtracting the first image from the compressed image, obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image, obtain a third image by removing the compressed artifact by applying a deep neural network (DNN) for removing the compressed artifact to the noise, and reconstruct the compressed image by summing the first image and the third image.
While embodiments of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit embodiments of the present disclosure to the particular forms disclosed, but conversely, embodiments of the present disclosure are to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
In the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted when it may obscure the subject matter of the present disclosure. Moreover, an ordinal number (e.g., first, second, etc.) used in describing the present specification is merely an identification symbol for distinguishing one component from another component.
Moreover, herein, when a component is mentioned as being “connected” or “coupled” to another component, it may be directly connected or directly coupled to the other component, but unless described otherwise, it should be understood that the component may also be connected or coupled to the other component via yet another intervening component.
In the present disclosure, two or more elements expressed as “units”, “modules”, or the like may be combined into one element, or one element may be divided into two or more elements for subdivided functions. Each element described herein may not only perform main functions thereof but also additionally perform some or all functions of other elements, and some main functions of each element may be exclusively performed by another element.
As used herein, the term ‘image’ or ‘picture’ may refer to a still image, a moving image including a plurality of consecutive still images (or frames), or a video.
Herein, a ‘deep neural network (DNN)’ is a representative example of an artificial neural network model simulating the neurons of a brain, and is not limited to an artificial neural network model using a specific algorithm.
Herein, a ‘parameter’ may be a value used in an operation process of each layer constituting a neural network, and may include, for example, a weight used in application of an input value to a predetermined calculation formula. The parameter may be expressed in the form of a matrix. The parameters are values set as a result of training, and may be updated based on separate training data when necessary.
Herein, a ‘convolution filter’ means a trained filter used to obtain an image having the intermediate-frequency and low-frequency information of a compressed image by sequentially performing down-convolution and up-convolution on the compressed image, and is a type of DNN in that it performs, and is trained on, two convolution operations. A ‘DNN for removing an AI compressed artifact’ means a DNN used to remove a compressed artifact from an image having the high-frequency information and the compressed artifact of the compressed image.
Herein, ‘filter setting information’ includes the above-described parameter as information related to the convolution filter. The convolution filter may be set using the filter setting information.
Herein, ‘DNN setting information’ may include the above-described parameter as information related to an element constituting the DNN. The DNN for removing the AI compressed artifact may be set using the DNN setting information.
Herein, the ‘original image’ means an image subject to image compression, the ‘compressed image’ means an image generated as a result of image compression with respect to the original image, and a ‘first image’ means an image to which a convolution filter that sequentially performs down-convolution and up-convolution is applied. A ‘second image’ means an image obtained by subtracting the first image from the compressed image, a ‘third image’ means an image obtained by removing an AI compressed artifact from the second image in a compressed artifact removal process, and a ‘reconstructed image’ means an image obtained by summing the first image with the third image.
Herein, ‘subtraction’ means element-wise subtraction that subtracts pixel values pixel by pixel between images, and ‘summation’ means element-wise summation that sums pixel values pixel by pixel.
Herein, the ‘compressed artifact’ means an error occurring due to damage in the vicinity of edges and textures during image compression, and the compressed artifact generally includes a block artifact, a ringing artifact, etc.
In a compressed image generated by image compression of the original image, noise including a compressed artifact occurring in compression is generated, such that a scheme is required to effectively separate and extract the noise and effectively reduce the compressed artifact included in the noise.
The compressed image includes a region including structural information (low-frequency band information and intermediate-frequency band information) having a high probability of reconstruction and a high-frequency noise region (high-frequency band information and the compressed artifact) having a high damage probability.
As shown in
The convolution filter 110 may be a filter that sequentially performs down-convolution 115 and up-convolution 125 corresponding thereto, and may be trained to obtain the first image 135 including low-frequency and intermediate-frequency information of the compressed image 105 using the compressed image 105 as an input. That is, the first image 135 having the low-frequency and intermediate-frequency information may be obtained as a result of applying the convolution filter 110, and the first image 135 may be subtracted from the compressed image 105 to obtain the second image 145 having high-frequency information and a compressed artifact of the compressed image 105. A convolution operation of the convolution filter will be described with reference to
That is, the artifact-reducing DNN 130 may be trained to obtain the third image 155 from which the compressed artifact is reduced, using the second image 145 having the high-frequency information and the compressed artifact of the compressed image 105 as an input. That is, the third image 155 from which the compressed artifact is reduced as a result of applying the artifact-reducing DNN 130 may be obtained, and the first image 135 and the third image 155 may be summed to obtain the reconstructed image 165 from which the compressed artifact is reduced, from the compressed image 105. A structure of the artifact-reducing DNN 130 will be described later with reference to
The method of extracting the noise including the compressed artifact and reducing the compressed artifact may be performed several times. Moreover, the method may also be performed per block of an image, or several times per block of the image. Moreover, when removal of the compressed artifact is required after image compression, the method of extracting the noise including the compressed artifact and reducing the compressed artifact may be applied at any stage.
When the artifact is reduced based on the AI without separating the compressed image into the image including the low-frequency and intermediate-frequency information and the image including the compressed artifact, the compressed artifact that needs to be reduced and an artifact that needs to be maintained may not be distinguished from each other, such that the compressed artifact may not be separately reduced. However, as described with reference to
As shown in
By obtaining the output feature map 230 with a reduced size compared to the input feature map 210 through the down-convolution 115, input information corresponding to a high-frequency region is lost in proportion to the discarded pixels; thereafter, the feature map size may be increased through the up-convolution 125 to restore the number of pixels.
That is, the output feature map 230 of a small dimension with the high-frequency information lost may be obtained while reducing the input feature map 210 having large-dimension information to a small dimension through a down-convolution operation using a trained filter kernel, and the output feature map 260 with the original large dimension may be obtained while expanding the small-dimension input feature map 240 to a large dimension through an up-convolution operation using a trained filter kernel corresponding to the down-convolution operation.
Thus, the first image 135 including the low-frequency and intermediate-frequency information of the compressed image 105 input to the convolution filter may be obtained.
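The size reduction and restoration described above may be sketched in 2-D as follows. This is a minimal illustration: a fixed 2×2 averaging kernel stands in for the trained filter kernels, and the names `down_conv` and `up_conv` are illustrative, not from the disclosure.

```python
import numpy as np

def down_conv(img, k=2, stride=2):
    """Stride-2 'down-convolution' using a hypothetical 2x2 averaging kernel."""
    h, w = img.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = img[i*stride:i*stride+k, j*stride:j*stride+k].mean()
    return out

def up_conv(feat, k=2, stride=2):
    """Matching transposed convolution: scatters each value back over a 2x2 patch."""
    oh, ow = feat.shape
    out = np.zeros(((oh - 1) * stride + k, (ow - 1) * stride + k))
    for i in range(oh):
        for j in range(ow):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += feat[i, j]
    return out

rng = np.random.default_rng(0)
compressed = rng.random((8, 8))          # stand-in for the compressed image 105
feat = down_conv(compressed)             # 8x8 -> 4x4: high-frequency detail lost
first = up_conv(feat)                    # 4x4 -> 8x8: low/intermediate-frequency image
second = compressed - first              # high-frequency information + artifact
```

The down-convolution halves each spatial dimension, so fine detail cannot survive the round trip; subtracting the up-convolved result from the input therefore isolates the high-frequency noise region, as described for the second image 145.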
According to an embodiment, the convolution filter may be a DNN including one convolution layer that performs down-convolution and one convolution layer that performs up-convolution.
By using a trained convolution filter that sequentially performs a convolution operation that reduces a size and a transposed convolution that enlarges a size, rather than a fixed filter based on size reduction and size enlargement of an interpolation scheme, an image of a noise region including the high-frequency information and the compressed artifact may be effectively separated from the compressed image as a result of subtracting the convolution-filtered image from the compressed image. By applying the image of the noise region separated in this way to the DNN for removing the compressed artifact described later with reference to
The convolution operations described with reference to
As shown in
Feature maps generated as the convolution result may be input to a batch normalization layer that calculates an average and a standard deviation for each feature and normalizes them in units of a batch. The batch normalization method performs normalization and learning in units of mini-batches to improve the speed of learning. Herein, a batch may mean a subset of the data.
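Batch normalization over a mini-batch may be sketched as follows (a minimal NumPy illustration with per-channel statistics; `gamma` and `beta` are the usual learnable scale and shift parameters, fixed here for simplicity):

```python
import numpy as np

def batch_norm(feats, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of feature maps per channel (illustrative sketch,
    not the trained layer from the disclosure)."""
    # feats: (batch, channels, height, width)
    mean = feats.mean(axis=(0, 2, 3), keepdims=True)
    var = feats.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (feats - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
batch = rng.random((4, 3, 8, 8)) * 10 + 5   # mini-batch of 4 feature maps
normed = batch_norm(batch)                  # per-channel mean ~0, std ~1
```

Normalizing each mini-batch keeps activation statistics stable across layers, which is what allows the learning speed improvement the text mentions.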
Feature maps output from the first convolution layer 305 may be input to the first batch normalization layer 310 and then to a first activation layer 315.
In another example, because batch normalization is optional, the feature maps output from the first convolution layer 305 may be input immediately to the first activation layer 315 without batch normalization being performed.
The first activation layer 315 may give a non-linear feature to each feature map. The first activation layer 315 may include a sigmoid function, a hyperbolic tangent (tanh) function, a rectified linear unit (ReLU) function, etc., but the present disclosure is not limited thereto.
Giving the non-linear feature by the first activation layer 315 may mean changing and outputting some sample values of a feature map that is an output of the first convolution layer 305 (or of the first convolution layer 305 and the first batch normalization layer 310). In this case, the change is performed by applying the non-linear feature.
The first activation layer 315 may determine whether to transmit output sample values of feature maps to the second convolutional layer 320. For example, some of the sample values of the feature maps may be activated by the first activation layer 315 and transmitted to the second convolutional layer 320, and other sample values may be inactivated by the first activation layer 315 and not transmitted to the second convolutional layer 320. Among the features of the second image 145 represented by the feature maps, a feature corresponding to the compressed artifact may be inactivated, and a feature not corresponding to the compressed artifact may be enhanced, by the first activation layer 315.
The feature maps output from the first activation layer 315 may be input to the second convolutional layer 320. 3×3×1 marked on the second convolutional layer 320 shown in
In another example, because batch normalization is optional, the feature maps output from the second convolution layer 320 may be input immediately to the second activation layer 330 without batch normalization being performed.
According to an embodiment, the second activation layer 330 may output the third image 155 by removing the compressed artifact included in the second image 145.
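The convolution → activation → convolution flow described above may be sketched in NumPy as follows. This is a single-channel sketch with random untrained kernels for illustration only; the optional batch normalization layers and the learned parameters of the actual DNN are omitted.

```python
import numpy as np

def conv3x3(x, kernel):
    """'Same' 3x3 convolution on a single-channel image (illustrative only)."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (padded[i:i+3, j:j+3] * kernel).sum()
    return out

def relu(x):
    """ReLU activation: inactivates non-positive sample values."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))   # untrained stand-ins

second_image = rng.random((8, 8))        # noise: high-frequency info + artifact
h = relu(conv3x3(second_image, k1))      # first conv layer + first activation layer
third_image = conv3x3(h, k2)             # second conv layer -> artifact-reduced output
```

In the trained network the kernels are set by the DNN setting information, so the activation gates pass the high-frequency detail while suppressing the features corresponding to the compressed artifact.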
As will be described later with reference to
The convolution process described with reference to
Referring to
The convolution filter 410, the subtracting unit 420, and the filter setting unit 430 may be implemented through one processor. In this case, the convolution filter 410, the subtracting unit 420, and the filter setting unit 430 may be implemented as a dedicated processor, or through a combination of software (S/W) and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). The dedicated processor may include a memory to implement an embodiment of the disclosure, or may include a memory processor to use an external memory.
The convolution filter 410, the subtracting unit 420, and the filter setting unit 430 may be implemented as a plurality of processors. In this case, the convolution filter 410, the subtracting unit 420, and the filter setting unit 430 may be implemented as a combination of dedicated processors, or through a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs. In an embodiment, the convolution filter 410 may be implemented as a first processor, the subtracting unit 420 may be implemented as a second processor that is different from the first processor, and the filter setting unit 430 may be implemented as a third processor that is different from the first processor and the second processor.
The filter setting unit 430 may provide trained filter setting information corresponding to the compressed image 105 to the convolution filter 410 based on a predetermined criterion.
The filter setting unit 430 may store the filter setting information to obtain the first image 135 corresponding to the compressed image 105.
The filter setting information may have been trained to obtain the first image 135 having the low-frequency and intermediate-frequency information of the compressed image 105.
The filter setting information may include the number of filter kernels, the size of a filter kernel, a parameter of the filter kernel, etc.
The convolution filter 410 may sequentially perform down-convolution and up-convolution according to the filter setting information provided from the filter setting unit 430.
The convolution filter 410 may output the first image 135 with the compressed image 105 as an input by using the filter setting information provided from the filter setting unit 430.
The subtracting unit 420 may subtract the first image 135 output through application of the convolution filter 410 from the compressed image 105.
The subtracting unit 420 may output the second image 145 having noise including high-frequency information and a compressed artifact of the compressed image 105, as a result of subtraction.
Referring to
While the noise extracting unit 510 and the image reconstructing unit 550 are shown as separate units in
The noise extracting unit 510 and the image reconstructing unit 550 may be configured as a plurality of processors. In this case, the noise extracting unit 510 and the image reconstructing unit 550 may be implemented as a combination of dedicated processors, or through a combination of S/W and a plurality of general-purpose processors such as APs, CPUs, or GPUs. In an embodiment, the convolution filter 520 and the subtracting unit 530 may be implemented as a first processor, the artifact-reducing DNN 560 and the summing unit 570 may be implemented as a second processor that is different from the first processor, and the filter setting unit 540 and the AI setting unit 580 may be implemented as a third processor that is different from the first processor and the second processor.
The filter setting unit 540 may provide trained filter setting information corresponding to the compressed image 105 to the convolution filter 520 based on a predetermined criterion.
The filter setting unit 540 may store the filter setting information to obtain the first image 135 corresponding to the compressed image 105.
The filter setting information may have been trained to obtain the first image 135 having the low-frequency and intermediate-frequency information of the compressed image 105.
The filter setting information may include the number of filter kernels, the size of a filter kernel, a parameter of the filter kernel, etc.
The convolution filter 520 may sequentially perform down-convolution and up-convolution according to the filter setting information provided from the filter setting unit 540.
The convolution filter 520 may output the first image 135 with the compressed image 105 as an input by using the filter setting information provided from the filter setting unit 540.
The subtracting unit 530 may subtract the first image 135 output using the convolution filter 520 from the compressed image 105.
The subtracting unit 530 may output the second image 145 having noise including high-frequency information and a compressed artifact of the compressed image 105, as a result of subtraction.
The AI setting unit 580 may provide the trained DNN setting information corresponding to the second image 145 to the artifact-reducing DNN 560 according to the predetermined criterion.
The AI setting unit 580 may store the DNN setting information to obtain the third image 155 in which the compressed artifact included in the second image 145 is reduced.
The DNN setting information has been trained to obtain the third image 155 in which the compressed artifact is removed from the second image 145 including the high-frequency information and the compressed artifact of the compressed image 105.
The DNN setting information may include information about at least one of the number of convolution layers included in the artifact-reducing DNN 560, the number of filter kernels for each convolution layer, and a parameter of each filter kernel.
The artifact-reducing DNN 560 may output the third image 155, in which the compressed artifact included in the second image 145 is removed, with the second image 145 as an input, by using the DNN setting information provided by the AI setting unit 580.
The summing unit 570 may output the reconstructed image 165 by summing the first image 135 output through the convolution filter 520 of the noise extracting unit 510 with the third image 155 output through the artifact-reducing DNN 560. The first image 135 includes the low-frequency and intermediate-frequency information of the compressed image 105 and the third image 155 includes the high-frequency information of the compressed image 105, such that the summed reconstructed image is an image in which the compressed artifact is removed from the compressed image. Thus, the apparatus 500 for reducing the noise of the compressed image according to an embodiment may obtain the reconstructed image 165 in which the compressed artifact occurring in image compression, included in the compressed image 105, is removed.
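The end-to-end flow of the apparatus 500 may be sketched with simple stand-ins for the trained components: a box blur in place of the trained convolution filter 520 and a mild shrinkage in place of the artifact-reducing DNN 560, both hypothetical substitutes chosen only to make the pipeline runnable.

```python
import numpy as np

def blur(img):
    """Hypothetical stand-in for the trained convolution filter 520:
    a 3x3 box blur keeping low/intermediate-frequency information."""
    padded = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i+3, j:j+3].mean()
    return out

def reduce_artifact(noise):
    """Hypothetical stand-in for the artifact-reducing DNN 560."""
    return 0.9 * noise

rng = np.random.default_rng(1)
compressed = rng.random((8, 8))          # compressed image 105

first = blur(compressed)                 # first image 135: low/mid-frequency info
second = compressed - first              # second image 145: subtracting unit 530
third = reduce_artifact(second)          # third image 155: artifact removed
reconstructed = first + third            # reconstructed image 165: summing unit 570
```

Note that subtraction followed by summation is exactly lossless when the noise branch is left unchanged (first + second equals the compressed image), so any quality difference in the reconstructed image comes solely from the artifact reduction applied to the noise branch.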
In
The original training image 605 may include a still image or a moving image including a plurality of frames. In an embodiment, the original training image 605 may include a luminance image extracted from the still image or the moving image including the plurality of frames. In addition, in an embodiment, the original training image 605 may include a patch image extracted from the still image or the moving image including the plurality of frames. When the original training image 605 includes the plurality of frames, the compressed training image 615, the first training image 635, and a low-frequency and intermediate-frequency training image 630 may also include a plurality of frames. When the plurality of frames of the original training image 605 are subject to the image compression 610, the plurality of frames of the compressed training image 615 may be obtained and the plurality of frames of the first training image 635 may be obtained through the convolution filter 620.
For the image compression 610 of the original training image 605, any one of the MPEG-2, H.264, MPEG-4, high efficiency video coding (HEVC), VC-1, VP8, VP9, and AV1 codecs may be used.
Referring to
The low-frequency and intermediate-frequency training image 630 may mean an image including the low-frequency and intermediate-frequency information of the compressed training image, obtained by passing the compressed training image 615 through the legacy frequency-band filter 625. For training of the convolution filter 620, the low-frequency and intermediate-frequency training image 630 including the low-frequency and intermediate-frequency information of the compressed training image may be obtained.
Before training, the convolution filter 620 may be set by predetermined filter setting information. As training progresses, filter loss information 640 may be determined.
The filter loss information 640 may be determined based on a result of comparison between the first training image 635 and the low-frequency and intermediate-frequency training image 630.
The filter loss information 640 may include at least one of an L1-norm value, an L2-norm value, a structural similarity (SSIM) value, a peak signal-to-noise ratio (PSNR)-human vision system (HVS) value, a multiscale SSIM (MS-SSIM) value, a variance inflation factor (VIF) value, and a video multimethod assessment fusion (VMAF) value regarding a difference between the first training image 635 and the low-frequency and intermediate-frequency training image 630. The filter loss information 640 may indicate an extent to which the first training image 635 is similar to the low-frequency and intermediate-frequency training image 630. As the filter loss information 640 decreases, the first training image 635 becomes more similar to the low-frequency and intermediate-frequency training image 630.
The convolution filter 620 may update a parameter to reduce or minimize the filter loss information 640.
Final loss information of the convolution filter 620 may be determined as Equation 1 provided below.
LossCF=a*Filter Loss Information [Equation 1]
In Equation 1, LossCF may indicate final loss information that needs to be reduced or minimized for training of the convolution filter 620 and a may indicate a predetermined weight.
That is, the convolution filter 620 may update parameters in a direction to reduce LossCF of Equation 1. When the parameters of the convolution filter 620 are updated based on LossCF derived in a training process, the first training image 635 obtained based on the updated parameters may differ from the first training image 635 of a previous training process.
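Equation 1 can be sketched as follows, using an L1 norm as one of the permitted choices of filter loss information. The function names, the toy values, and the weight are illustrative assumptions, not the actual trained filter of the disclosure:

```python
def l1_loss(pred, target):
    # Mean absolute difference between two equally sized 1-D "images"
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def loss_cf(first_train, lowmid_train, a=1.0):
    # Equation 1: LossCF = a * (filter loss information), here an L1-norm value
    return a * l1_loss(first_train, lowmid_train)

# Toy stand-ins for the first training image 635 and the
# low-frequency and intermediate-frequency training image 630
first_train = [1.0, 2.0, 3.0]
lowmid_train = [1.5, 2.0, 2.5]
loss = loss_cf(first_train, lowmid_train)
```

The convolution filter's parameters would then be updated in the direction that decreases this value, so that the first training image approaches the low-frequency and intermediate-frequency training image.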
In
Herein, the first training image 635 may result from applying the convolution filter 620, which has been trained, to the compressed training image 615. This is because the convolution filter 620 is trained for effective separation into the first image 135 having the low-frequency and intermediate-frequency information and the second image 145 having the high-frequency information and the compressed artifact, and the AI artifact-reducing DNN 715 is trained for effective removal of the compressed artifact of the second image 145. That is, the convolution filter 620 and the AI artifact-reducing DNN 715 have to be independently trained.
The original training image 605 may include a still image or a moving image including a plurality of frames. In an embodiment, the original training image 605 may include a luminance image extracted from the still image or the moving image including the plurality of frames. In addition, in an embodiment, the original training image 605 may include a patch image extracted from the still image or the moving image including the plurality of frames. When the original training image 605 includes the plurality of frames, the compressed training image 615, the first training image 635, the second training image 710, and the third training image 720 may also include a plurality of frames. When the plurality of frames of the original training image 605 are subject to the image compression 610, the plurality of frames of the compressed training image 615 are obtained, the plurality of frames of the first training image 635 are obtained through the convolution filter 620 that has been trained, the plurality of frames of the second training image 710 are obtained through the subtraction 705 of the first training image 635 from the compressed training image 615, the plurality of frames of the third training image 720 are obtained through the AI artifact-reducing DNN 715, and the plurality of frames of the reconstructed training image 730 are obtained through the summation 725 of the first training image 635 and the third training image 720.
Before training, the AI artifact-reducing DNN 715 may be set with predetermined DNN setting information. As training progresses, reduced loss information 740 may be determined.
The reduced loss information 740 may be determined based on a result of comparison between the original training image 605 and the reconstructed training image 730.
The reduced loss information 740 may include at least one of an L1-norm value, an L2-norm value, an SSIM value, a PSNR-HVS value, an MS-SSIM value, a VIF value, and a VMAF value regarding a difference between the original training image 605 and the reconstructed training image 730. The reduced loss information 740 may indicate an extent to which the reconstructed training image 730 is similar to the original training image 605. As the reduced loss information 740 decreases, the reconstructed training image 730 becomes more similar to the original training image 605. That is, a reconstructed training image in which the compressed artifact occurring due to image compression of the original training image is reduced may be obtained.
The AI artifact-reducing DNN 715 may update a parameter to reduce or minimize the reduced loss information 740.
Final loss information of the AI artifact-reducing DNN 715 may be determined as Equation 2 below.
LossAR=b*Reduced Loss Information [Equation 2]
In Equation 2, LossAR may indicate final loss information that needs to be reduced or minimized for training of the AI artifact-reducing DNN 715 and b may indicate a predetermined weight.
That is, the AI artifact-reducing DNN 715 may update parameters in a direction to reduce LossAR of Equation 2. When the parameters of the AI artifact-reducing DNN 715 are updated based on LossAR derived in the training process, the third training image 720 obtained based on the updated parameters may differ from the third training image 720 of the previous training process.
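The update direction described above can be illustrated with a deliberately tiny stand-in: a single scalar parameter w scales the residual, and finite-difference gradient descent on LossAR (Equation 2, here with an L2 norm) drives the reconstruction toward the original. Everything below is an illustrative assumption, not the actual DNN of the disclosure:

```python
def l2_loss(pred, target):
    # Mean squared difference between two equally sized 1-D "images"
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def loss_ar(reconstructed, original, b=1.0):
    # Equation 2: LossAR = b * (reduced loss information), here an L2-norm value
    return b * l2_loss(reconstructed, original)

def reconstruct(first, second, w):
    # One-parameter stand-in for the DNN: scale the residual by w, then sum
    return [f + w * s for f, s in zip(first, second)]

original = [10.0, 12.0, 11.0, 13.0]
first = [10.5, 11.5, 11.5, 12.5]
second = [o - f for o, f in zip(original, first)]

w, lr, eps = 0.0, 0.5, 1e-4
for _ in range(20):
    # Central finite-difference estimate of dLossAR/dw, then a descent step
    g = (loss_ar(reconstruct(first, second, w + eps), original)
         - loss_ar(reconstruct(first, second, w - eps), original)) / (2 * eps)
    w -= lr * g
```

As training progresses, w approaches 1 and LossAR approaches zero, mirroring how the parameters of the AI artifact-reducing DNN 715 are moved in the direction that reduces LossAR.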
In operation S810, the apparatus 400 for extracting noise of a compressed image may obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image.
According to an embodiment, the first image may include intermediate-frequency information and low-frequency information of the compressed image.
According to an embodiment, the convolution filter may be trained to obtain the first image including the intermediate-frequency information and the low-frequency information of the compressed image, with the compressed image as an input.
According to an embodiment, the down-convolution may be a convolution operation trained to reduce the size of the compressed image.
According to an embodiment, the up-convolution may be a convolution operation trained to restore the down-convoluted compressed image to the original size of the compressed image.
According to an embodiment, the down-convolution and the up-convolution may be transposed to each other.
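The transposed relationship between the down-convolution and the up-convolution can be shown in matrix form. The stride-2 averaging matrix below is a hypothetical stand-in for a learned down-convolution, chosen only to make the shapes concrete:

```python
def matvec(m, v):
    # Multiply matrix m (list of rows) by vector v
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def transpose(m):
    return [[m[i][j] for i in range(len(m))] for j in range(len(m[0]))]

# Hypothetical 4 -> 2 down-convolution matrix (stride-2 averaging kernel)
D = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
U = transpose(D)  # the up-convolution is the transpose of the down-convolution

x = [10.0, 12.0, 11.0, 13.0]
down = matvec(D, x)      # half-size representation
y = matvec(U, down)      # back to the original size
```

Note that applying D and then its transpose U restores the original size but not the original values; the result retains only the smoothed (lower-frequency) content, which is consistent with the first image carrying the low-frequency and intermediate-frequency information.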
In operation S830, the apparatus 400 for extracting the noise of the compressed image may obtain a second image by subtracting the first image from the compressed image.
In operation S850, the apparatus 400 for extracting the noise of the compressed image may obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image.
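Operations S810 to S850 can be sketched end to end with simple stand-ins for the trained filter: stride-2 averaging for the down-convolution and sample duplication for the up-convolution. These stand-ins and the toy values are illustrative assumptions, not the disclosure's trained convolution filter:

```python
def down_conv(x):
    # Stand-in for the trained down-convolution: stride-2 averaging halves the length
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

def up_conv(x):
    # Stand-in for the corresponding up-convolution: duplication restores the length
    out = []
    for v in x:
        out.extend([v, v])
    return out

# Operation S810: obtain the first image by applying the convolution filter
compressed = [10.0, 12.0, 11.0, 13.0]
first = up_conv(down_conv(compressed))
# Operation S830: obtain the second image by subtracting the first image
second = [c - f for c, f in zip(compressed, first)]
# Operation S850: the second image carries the high-frequency information
# and the compressed artifact, i.e., the extracted noise
```

The smooth (low/intermediate-frequency) content survives the down/up round trip into the first image, while the rapidly alternating component is left behind in the second image.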
In operation S910, the apparatus 500 for reducing noise of a compressed image may obtain a first image by applying a convolution filter, which sequentially performs down-convolution and up-convolution corresponding to the down-convolution, to a compressed image of an original image.
According to an embodiment, the first image may include intermediate-frequency information and low-frequency information of the compressed image.
According to an embodiment, the convolution filter may be trained to obtain the first image including the intermediate-frequency information and the low-frequency information of the compressed image, with the compressed image as an input.
According to an embodiment, the down-convolution may be a convolution operation trained to reduce the size of the compressed image.
According to an embodiment, the up-convolution may be a convolution operation trained to restore the down-convoluted compressed image to the original size of the compressed image.
According to an embodiment, the down-convolution and the up-convolution may be transposed to each other.
In operation S930, the apparatus 500 for reducing the noise of the compressed image may obtain a second image by subtracting the first image from the compressed image.
In operation S950, the apparatus 500 for reducing the noise of the compressed image may obtain noise including high-frequency information and a compressed artifact of the compressed image from the second image.
In operation S970, the apparatus 500 for reducing the noise of the compressed image may remove the compressed artifact by applying a DNN for removing the compressed artifact to the noise, thereby obtaining a third image.
According to an embodiment, the DNN for removing the compressed artifact may be trained to reduce the compressed artifact, with the high-frequency information and the compressed artifact of the compressed image as an input.
According to an embodiment, the DNN for removing the compressed artifact may include a convolution layer and an activation layer.
According to an embodiment, the DNN for removing the compressed artifact may further include a batch normalization layer.
In operation S990, the apparatus 500 for reducing the noise of the compressed image may reconstruct the compressed image by summing the first image and the third image.
Meanwhile, the above-described embodiments of the disclosure may be written as a program or instruction executable on a computer, and the program or instruction may be stored in a medium.
The medium may continuously store an executable program or instruction or temporarily store the same for execution or downloading. The medium may include various recording means or storage means in a form of single hardware or a combination of several hardware, and may be distributed over a network without being limited to a medium directly connected to a certain computer system. Examples of the medium may include a magnetic medium such as a hard disk, a floppy disk, and a magnetic tape, an optical recording medium such as a CD-ROM and a DVD, a magneto-optical medium such as a floptical disk, ROM, RAM, flash memory, etc., to store a program instruction. Other examples of the medium may include a recording medium or a storage medium managed by an app store that distributes applications, a site that supplies or distributes various software, a server, etc.
Meanwhile, the model associated with the DNN described above may be implemented as a software module. When implemented as a software module (e.g., a program module including an instruction), a DNN model may be stored on a computer-readable recording medium.
The DNN model may be integrated in the form of a hardware chip so as to be a part of the apparatus 400 for extracting noise of a compressed image or the apparatus 500 for reducing noise of a compressed image. For example, the DNN model may be made in the form of a dedicated hardware chip for artificial intelligence, or as a part of a conventional general-purpose processor (e.g., a CPU or an application processor (AP)) or a dedicated graphics processor (e.g., a GPU).
In addition, the DNN model may be provided in the form of downloadable software. A computer program product may include a product (e.g., a downloadable application) in the form of a software program distributed electronically through a manufacturer or an electronic market. For the electronic distribution, at least a part of the software program may be stored in a storage medium or temporarily generated. In this case, the storage medium may be a server of the manufacturer or the electronic market, or a storage medium of a relay server.
While the disclosure has been particularly shown and described with reference to embodiments of the present disclosure, it will be understood by one of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0058331 | May 2020 | KR | national |
10-2020-0158043 | Nov 2020 | KR | national |
This application is a Continuation Application of International Application PCT/KR2021/006074 filed on May 14, 2021, which claims benefit of Korean Patent Application No. 10-2020-0058331 filed on May 15, 2020, at the Korean Intellectual Property Office, and Korean Patent Application No. 10-2020-0158043 filed on Nov. 23, 2020, at the Korean Intellectual Property Office, the disclosures of which are incorporated herein in their entireties by reference.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/KR2021/006074 | May 2021 | US |
Child | 17987582 | US |