APPARATUS FOR IMPROVING IMAGE QUALITY AND METHOD THEREOF

Information

  • Patent Application
    20240029222
  • Date Filed
    December 16, 2022
  • Date Published
    January 25, 2024
Abstract
The present disclosure provides an apparatus for improving image quality including an image rendering module configured to generate a first data buffer and a second data buffer; an artificial neural network module configured to receive and learn the first data buffer and the second data buffer generated from the image rendering module; a combination module configured to generate a first corrected image of the first data buffer and a second corrected image of the second data buffer; and a self-supervised loss calculation module that includes a first self-supervised loss function using pixel values of the first corrected image and the independent and correlated images of the second data buffer and a second self-supervised loss function using pixel values of the second corrected image and the independent and correlated images of the first data buffer. A method for improving image quality is also provided.
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Korean Provisional Patent Application No. 10-2022-0090740, filed on Jul. 22, 2022, and Korean Patent Application No. 10-2022-0092050, filed on Jul. 25, 2022, and all the benefits accruing therefrom under 35 U.S.C. § 119, the disclosures of which are incorporated by reference herein in their entirety.


BACKGROUND
1. Field

The present disclosure relates to an image quality improving apparatus and an image quality improving method, and more particularly, to an apparatus and method for improving the quality of a rendered image upon rendering an image.


2. Description of the Related Art

As a method for generating an image through photorealistic rendering, the Monte Carlo path tracing method is widely known. Since Monte Carlo path tracing uses a statistical method, noise exists in the rendered image. In addition, many samples are required to obtain good quality, which takes a long time. An image provided in this way may be said to be an independent image, since there is no inter-pixel correlation in the image.


Methods of reducing the noise of an independent image have been devised over a long period of time. These methods reduce noise on the assumption that correlation between pixels does not exist, so when an image having correlation between pixels is used as an input, some of the existing error in the image cannot be effectively removed.


An image provided by such processing may be said to be a correlated image, since some correlation between pixels exists.


Methods for providing a correlated image include a post-processing denoising technique (Non-Patent Documents 1, 4, and 5), a correlated sampling technique (Non-Patent Document 2), and a light transport simulation method introducing some inter-pixel correlation (Non-Patent Document 3).


The correlated image may have less noise than the independent image, but method-specific residual noise or systematic errors cannot be prevented.


RELATED ART DOCUMENT
Non-Patent Document

(Non-Patent Document 1) Bitterli et al., "Nonlinearly Weighted First-order Regression for Denoising Monte Carlo Renderings", Computer Graphics Forum 35, 4 (2016), 107-117.


(Non-Patent Document 2) Sadeghi et al., "Coherent Path Tracing", Journal of Graphics, GPU, and Game Tools 14, 2 (2009), 33-43.


(Non-Patent Document 3) Hachisuka et al., "Progressive Photon Mapping", ACM Trans. Graph. 27, 5, Article 130 (2008), 8 pages.


(Non-Patent Document 4) Kalantari et al., "A Machine Learning Approach for Filtering Monte Carlo Noise", ACM Trans. Graph. 34, 4, Article 122 (2015), 12 pages.


(Non-Patent Document 5) Bako et al., "Kernel-Predicting Convolutional Networks for Denoising Monte Carlo Renderings", ACM Trans. Graph. 36, 4, Article 97 (2017), 14 pages.


(Non-Patent Document 6) Xu et al., "Adversarial Monte Carlo Denoising with Conditioned Auxiliary Feature Modulation", ACM Trans. Graph. 38, 6, Article 224 (2019), 12 pages.


(Non-Patent Document 7) Yu et al., "Monte Carlo Denoising via Auxiliary Feature Guided Self-Attention", ACM Trans. Graph. 40, 6, Article 273 (2021), 13 pages.


It is to be understood that this background of the technology section is intended to provide useful background for understanding the technology, and as such, the technology background section may include ideas, concepts, or recognitions that were not part of what was known or appreciated by those skilled in the pertinent art prior to a corresponding effective filing date of the subject matter disclosed herein.


SUMMARY

In view of the above, the present disclosure provides an apparatus for improving image quality and a method for improving image quality capable of reducing noise of an independent image and an error of a correlated image.


According to embodiments of the present disclosure, an apparatus for improving image quality includes: an image rendering module configured to generate an independent image, a correlated image, and an auxiliary feature into a first data buffer and a second data buffer; an artificial neural network module configured to receive and learn the first data buffer and the second data buffer generated from the image rendering module; a combination module configured to generate a first corrected image of the first data buffer and a second corrected image of the second data buffer that are output from the artificial neural network module; and a self-supervised loss calculation module configured to calculate self-supervised losses of the first data buffer and the second data buffer, including a first self-supervised loss function using pixel values of the first corrected image and the independent and correlated images of the second data buffer and a second self-supervised loss function using pixel values of the second corrected image and the independent and correlated images of the first data buffer.


The image rendering module may generate an auxiliary feature including at least one of normal information, texture information, and visibility information.


The artificial neural network module may calculate parameters used in Equations 4 and 5 for outputting post-corrected images of the first data buffer and the second data buffer.










$$w_i=\begin{cases}\exp\!\left(-\dfrac{\log_e\left(1+\lVert y_c-y_i\rVert^2\right)}{(\gamma_c^y)^2+\varepsilon}-\dfrac{\log_e\left(1+\lVert z_c-z_i\rVert^2\right)}{(\gamma_c^z)^2+\varepsilon}\right)\times\exp\!\left(-\dfrac{\lVert\rho_c-\rho_i\rVert^2}{(\gamma_c^\rho)^2+\varepsilon}-\dfrac{\lVert n_c-n_i\rVert^2}{(\gamma_c^n)^2+\varepsilon}-\dfrac{(v_c-v_i)^2}{(\gamma_c^v)^2+\varepsilon}\right),&\text{if }i\neq c,\\[1.5ex]\tau_c,&\text{otherwise.}\end{cases}\qquad\text{(Equation 5)}$$

    • γ_c^y, γ_c^z, γ_c^ρ, γ_c^n, γ_c^v denote bandwidth parameters for an independent and correlated pixel pair (y and z) and the auxiliary feature including the albedo ρ, the normal n, and the visibility buffer v at pixel c. τ_c denotes a center weight for the pixel c.





The combination module outputs a post-corrected image for each buffer by combining the first data buffer and the second data buffer using Equation 4, which is a combination function.











$$g_c(y,z)=\frac{\sum_{i\in\Omega_c} w_i\odot\left\{y_i+\beta_c^z\odot(z_c-z_i)+\beta_c^\rho\odot(\rho_c-\rho_i)+\beta_c^n\odot(n_c-n_i)\right\}}{\sum_{i\in\Omega_c} w_i}\qquad\text{(Equation 4)}$$

The symbol ⊙ denotes the Hadamard product, ρ_c and n_c denote the albedo and normal values of size 3×1 at the pixel c, and β_c^z, β_c^ρ, β_c^n of size 3×1 denote scale parameters at the pixel c that control the relative importance of z_c−z_i, ρ_c−ρ_i, and n_c−n_i.


The first self-supervised loss function is represented by Equation 10.











$$\hat{\mathcal{L}}(\hat{\mu}_c^a)=\frac{\lVert\hat{\mu}_c^a-y_c^b\rVert^2}{(\bar{z}_c^b)^2+0.01}\qquad\text{(Equation 10)}$$





The artificial neural network module updates a learning parameter θ in the artificial neural network module for generating a post-corrected estimate to minimize the self-supervised loss output from the self-supervised loss calculation module.


According to other embodiments of the present disclosure, a method for improving image quality includes: generating an independent image, a correlated image, and an auxiliary feature into a first data buffer and a second data buffer; calculating parameters for outputting post-corrected images of the first data buffer and the second data buffer; outputting a post-corrected image for each buffer by combining the first data buffer and the second data buffer using a combination function; calculating a self-supervised loss using a first self-supervised loss function and a second self-supervised loss function for the post-corrected image; and updating a learning parameter to minimize the first self-supervised loss function and the second self-supervised loss function.


The present disclosure has the advantage of improving image quality.


According to the present disclosure, it is possible to further improve image quality during rendering by faithfully reflecting specific image features.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram illustrating an apparatus for improving image quality according to an embodiment of the present disclosure.



FIG. 2 is a diagram illustrating a method for improving image quality according to an embodiment of the present disclosure.



FIG. 3 is a diagram illustrating the effects of the apparatus for improving image quality and the method for improving image quality according to the embodiment of the present disclosure.



FIG. 4 is a diagram comparing denoised estimates and post-corrected results obtained by applying a self-supervised learning method according to the present disclosure to the denoised estimates.



FIG. 5 is a view comparing a result of applying the conventional combination module to a denoising technique (e.g., AFGSA) and post-corrected results obtained by applying the self-supervised learning method according to the present disclosure to a result of applying the conventional combination module.



FIG. 6 is a diagram illustrating numerical convergence of denoised estimates and the post-corrected results obtained by applying the method for improving image quality according to the present disclosure to the denoised estimates for various scenes.





DETAILED DESCRIPTION

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. The spirit of the present disclosure is not limited to the embodiments presented below, and those skilled in the art who understand the spirit of the present disclosure will be able to easily propose other embodiments included within the scope of the same spirit by supplementing, changing, deleting, and adding components, which will also fall within the scope of the spirit of the present disclosure.


Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. The same or similar components will be denoted by the same reference numerals independent of the drawing numerals, and an overlapping description for the same or similar components will be omitted. In addition, the terms “module” and “unit” for components used in the following description are used only to easily explain the disclosure. Therefore, these terms do not have meanings or roles that are distinguished from each other. Further, when it is decided that a detailed description for the known art related to the present disclosure may obscure the gist of the present disclosure, the detailed description will be omitted. Further, it should be understood that the accompanying drawings are provided only in order to allow exemplary embodiments of the present disclosure to be easily understood, and the spirit of the present disclosure is not limited by the accompanying drawings, but includes all the modifications, equivalents, and substitutions included in the spirit and the scope of the present disclosure.


Terms including ordinal numbers such as “first,” “second,” and the like, may be used to describe various components. However, these components are not limited by these terms. The terms are used only to distinguish one component from another component.


It is to be understood that when one component is referred to as being “connected to” or “coupled to” another component, one component may be connected directly to or coupled directly to another component or be connected to or coupled to another component with the other component interposed therebetween. On the other hand, it should be understood that when one element is referred to as being “connected directly to” or “coupled directly to” another element, it may be connected to or coupled to another element without the other element interposed therebetween.


Artificial intelligence (AI) refers to the field of research into artificial intelligence and the methodologies capable of producing it, and machine learning refers to the field of research into methodologies that define and solve the various problems dealt with in the field of artificial intelligence. Machine learning is also defined as an algorithm that improves the performance of a task through continuous experience.


An artificial neural network (ANN) is a model used in machine learning, and may refer to any model having problem-solving ability that is composed of artificial neurons (nodes) forming a network through synaptic connections. An artificial neural network may be defined by a connection pattern between neurons of different layers, a learning process for updating learning parameters, and an activation function for generating an output value.


The artificial neural network may include an input layer, an output layer, and optionally one or more hidden layers. Each layer includes one or more neurons, and the artificial neural network may include synapses connecting the neurons. In an artificial neural network, each neuron outputs the value of an activation function applied to the input signals, weights, and biases received through its synapses.


The learning parameters of the model refer to parameters determined through learning and include weights of synaptic connections and biases of neurons. In addition, the hyperparameter refers to a parameter that should be set before learning in a machine learning algorithm, and includes a learning rate, an iteration count, a mini-batch size, and an initialization function.


The purpose of learning the artificial neural network may be seen as determining the learning parameters that minimize the loss function. The loss function may be used as an index to determine an optimal learning parameter in the learning process of the artificial neural network.


The machine learning may be classified into supervised learning, unsupervised learning, and reinforcement learning according to the learning method.


Supervised learning refers to a method of training an artificial neural network in a state in which a label is given for the learning data, where the label may refer to the correct answer (or result value) that the artificial neural network should infer when the learning data is input. Unsupervised learning may refer to a method of training an artificial neural network in a state in which a label is not given for the learning data. Reinforcement learning may mean a learning method in which an agent defined in a certain environment learns to select an action, or sequence of actions, that maximizes the cumulative reward in each state.


Among the artificial neural networks, machine learning implemented as a deep neural network (DNN) including a plurality of hidden layers is sometimes referred to as deep learning, and the deep learning is a part of the machine learning. Hereinafter, the machine learning is used as a meaning including the deep learning.


The artificial intelligence may be performed by an artificial neural network module (refer to 2 in FIG. 1).


The present disclosure is characterized by using the independent image and the correlated image together, combining the two to provide an image.



FIG. 1 is a diagram illustrating an apparatus for improving image quality according to an embodiment of the present disclosure.


Referring to FIG. 1, according to an embodiment of the present disclosure, an apparatus for improving image quality includes: an image rendering module configured to generate an independent image, a correlated image, and an auxiliary feature into a first data buffer and a second data buffer; an artificial neural network module configured to receive and learn the first data buffer and the second data buffer generated from the image rendering module; a combination module configured to generate a first corrected image of the first data buffer and a second corrected image of the second data buffer that are output from the artificial neural network module; and a self-supervised loss calculation module configured to calculate self-supervised losses of the first data buffer and the second data buffer, including a first self-supervised loss function using pixel values of the first corrected image and the independent and correlated images of the second data buffer and a second self-supervised loss function using pixel values of the second corrected image and the independent and correlated images of the first data buffer.


An image rendering module 1 generates an independent image 11, a correlated image 12, and an auxiliary feature 13 into a first data buffer and a second data buffer.


The independent image may be an image without correlation between pixels in the image. The correlated image may be an image with some inter-pixel correlation and may be an image to which a denoising technique is applied. Each pixel of the image may include RGB information.


The image rendering module 1 may generate an auxiliary feature 13 of each image. The auxiliary feature 13 may include normal information. The normal information may be normal vector information of a surface in an image. In addition, the auxiliary feature 13 may include texture information and/or visibility information. The auxiliary feature 13 may be calculated in advance during the process of providing the independent image 11 and the correlated image 12. The auxiliary feature 13 may be utilized as additional information, such as for boundary identification of an object in an image. In this way, a result with much improved quality can be obtained.


The image rendering module 1 generates a first data buffer and a second data buffer using two pairs of images from the independent image 11 and the correlated image 12: independent pixel estimates y^a and y^b containing some noise, and denoised correlated pixel estimates z^a and z^b. Following Equations 1 and 2 below, the correlated pixel estimates z^a and z^b may be generated by removing noise from the independent pixel estimates y^a and y^b.


As a result, the image rendering module 1 generates a first data buffer (y^a and z^a) and a second data buffer (y^b and z^b) that are independent of each other from the correlated pixel estimates z^a and z^b and the independent pixel estimates y^a and y^b. In the same way, the auxiliary feature is also included in the first data buffer and the second data buffer independently of each other. The image rendering module 1 inputs the generated first data buffer and second data buffer to the artificial neural network module 2, as sketched below.
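For illustration, a minimal sketch of this two-buffer generation follows. The helper names render_samples and denoise are hypothetical, since the disclosure does not prescribe a specific rendering or denoising interface.

```python
# Minimal sketch of the two-buffer setup. render_samples(n) is assumed to
# return n independent per-pixel sample images of shape (n, H, W, 3);
# denoise() is assumed to map a noisy image to a correlated (denoised)
# image of the same shape. Both names are hypothetical.
import numpy as np

def make_buffers(render_samples, denoise, n_samples):
    samples = render_samples(n_samples)
    y_a = samples[: n_samples // 2].mean(axis=0)  # independent estimate y^a
    y_b = samples[n_samples // 2:].mean(axis=0)   # independent estimate y^b
    z_a = denoise(y_a)                            # correlated estimate z^a
    z_b = denoise(y_b)                            # correlated estimate z^b
    return (y_a, z_a), (y_b, z_b)                 # first and second data buffers
```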


The artificial neural network module 2 receives and learns the first data buffer and the second data buffer. The artificial neural network module 2 may include a convolutional neural network (CNN). In addition, the artificial neural network module 2 may perform deep learning on machine learning implemented as a deep neural network (DNN) including a plurality of hidden layers.


In this embodiment, the CNN of the artificial neural network module 2 may process an input image through a plurality of convolutional layers, a maximum pooling layer, a region-of-interest pooling layer, and/or a fully connected layer. The input image may be iteratively processed through each layer of the CNN and output as a convolutional feature map. The CNN may be, for example, a Zeiler and Fergus model or a Simonyan and Zisserman model.


As an embodiment of the present invention, the artificial neural network module 2 includes nine layers, and each layer uses a convolution filter with a 3×3 kernel. The final layer of the artificial neural network module 2 uses 15 filters (that is, the number of parameters βc, γc, and τc), and the other layers use 16 filters.


In the final layer, the tanh function is used as the activation for βc and softplus for γc and τc, while ReLU is used in the other layers.
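A sketch of this network in TensorFlow/Keras follows. The nine-layer structure, 3×3 kernels, filter counts, and the tanh/softplus/ReLU activations come from the text; the input channel layout (y, z, albedo, normal, visibility), the padding, and the 9/5/1 split of the 15 output maps are assumptions.

```python
# Sketch of the nine-layer parameter network (assumptions noted above).
import tensorflow as tf

def build_parameter_network():
    # y(3) + z(3) + albedo(3) + normal(3) + visibility(1): assumed input layout
    x_in = tf.keras.Input(shape=(None, None, 13))
    x = x_in
    for _ in range(8):  # eight layers with 16 filters each, ReLU activation
        x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    raw = tf.keras.layers.Conv2D(15, 3, padding="same")(x)  # final layer: 15 parameter maps
    beta = tf.tanh(raw[..., :9])              # scale parameters beta_c^z, beta_c^rho, beta_c^n
    gamma = tf.math.softplus(raw[..., 9:14])  # bandwidths gamma_c^y, ^z, ^rho, ^n, ^v
    tau = tf.math.softplus(raw[..., 14:])     # center weight tau_c
    return tf.keras.Model(x_in, [beta, gamma, tau])
```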


An apparatus for improving image quality using a self-supervised learning framework is implemented using TensorFlow.


The image rendering module 1 extracts 128×128 patches from an input color and an auxiliary feature for training the artificial neural network module 2 at runtime.


The artificial neural network module 2 is trained for 20 epochs using an Adam optimizer.


The artificial neural network module 2 may calculate the parameters βc, γc, and τc for correcting the pixel estimate by learning the first data buffer and the second data buffer.


In addition, the artificial neural network module 2 may obtain the optimal parameters βc, γc, and τc that may minimize a self-supervised loss.


The combination module 3 may include a combination function. The combination module 3 outputs a post-corrected image having improved quality by the combination function with the parameters βc, γc, and τc corresponding to the first data buffer and the second data buffer.


The combination module 3 may use a weight w_i when combining pixels of each image.


Accordingly, the combination module 3 outputs a post-corrected combined image 4 by combining the independent image and the correlated image.


Hereinafter, it will be described in detail that the combination module 3 combines the first data buffer and the second data buffer using the combination function.


A model for predicting a pixel having no correlation with other pixels may be given by Equation 1.






$$y_c=\mu_c+\varepsilon_c\qquad\text{(Equation 1)}$$

Here, y refers to the independent pixel estimates, c refers to a target pixel, μ refers to the ground truth, and ε refers to the error.


Equation 1 above may be a model for an independent image. The method of Equation 1 above may be used for modeling an image provided by a path tracing method.


A model for predicting a pixel having some correlation with other neighboring pixels may be given by Equation 2.






$$z_c-z_i=\mu_c-\mu_i+\varepsilon_{ci}\qquad\text{(Equation 2)}$$

Here, z refers to the correlated pixel estimates, c refers to a target pixel, i refers to a neighboring pixel centered at the target pixel, μ refers to the ground truth, and ε_ci refers to a correlated error with respect to the ground truth values at the two pixels c and i. Therefore, z_c−z_i refers to the difference between the correlated pixel estimates at pixel c and pixel i.


The method of Equation 2 may be applied to various methods in which correlation within images exists. The correlated image may be an image from which noise has been removed by a denoising technique.


The combination module 3 may combine the independent image and the correlated image using Equations 1 and 2.


By combining the independent image and the correlated image, the information of the two images can be considered together.


The independent image and the correlated image may be combined by adding them with their respective weights w_i and then normalizing by the sum of the weights.


Equation 3 is a combination function combining Equations 1 and 2, and presents an example of the pixel estimate μ̂_c = f_c(y,z) as a weighted average of the independent pixel estimates y_i and the differences z_c−z_i between the correlated pixel estimates.











$$f_c(y,z)=\frac{1}{\sum_{i\in\Omega_c}w_i}\left\{\sum_{i\in\Omega_c}w_i\,y_i+\sum_{i\in\Omega_c}w_i\,(z_c-z_i)\right\}\qquad\text{(Equation 3)}$$



Here, Ωc denotes a local window including a set of neighboring pixels centered at a target pixel c. wi denotes a weight of the neighboring pixel i included in the local window Ωc centered at the target pixel c.


The weight w_i is a weight for the sum of the independent pixel estimate y_i of a neighboring pixel centered at the target pixel c and the difference z_c−z_i between the correlated pixel estimates at pixel c and pixel i.


In Equations 1 and 2, it is preferable that the errors ε_c and ε_ci be small. For example, assuming that the expected values of the errors ε_c and ε_ci are 0, image quality can be improved by finding appropriate weights.


Accordingly, an image with improved quality may be obtained by finding weights that make the error of the pixel estimate given in Equation 3 smaller.


According to Equation 3, both the independent pixel estimates y_c and y_i at the target pixel c and the adjacent pixel i and the correlated pixel estimates z_c and z_i at those pixels may be utilized. The independent pixel estimates and correlated pixel estimates are values included in the independent image and the correlated image, respectively.


The combination module 3 generates the pixel estimate as a weighted average of the independent color yi and the difference zc-zi between the correlated colors within a local window Ωc centered at the pixel c.


Since the variances of y_i and z_c−z_i may vary locally, the pixel estimate is controlled by the weight w_i, which should be adjusted per pixel. A direct transcription of this combination is sketched below.
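As a concrete illustration, the following NumPy sketch evaluates Equation 3 at a single target pixel; the k×k window layout and the scalar per-neighbor weights are assumptions.

```python
# Equation 3 at one target pixel c (window-local arrays, assumed layout).
import numpy as np

def combine_eq3(y_win, z_win, w, center):
    """y_win, z_win: (k, k, 3) patches; w: (k, k) weights; center: index of c."""
    z_c = z_win[center]  # correlated estimate at the target pixel
    weighted = w[..., None] * (y_win + (z_c - z_win))  # w_i * (y_i + (z_c - z_i))
    return weighted.sum(axis=(0, 1)) / w.sum()         # normalized weighted average
```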


Meanwhile, the combination module 3 may generate the pixel estimate by the combination function defined by Equation 4 using a smaller number of learning parameters, thereby preventing overfitting and reducing training time.


Equation 4 rewrites Equation 3 as follows.











$$g_c(y,z)=\frac{\sum_{i\in\Omega_c} w_i\odot\left\{y_i+\beta_c^z\odot(z_c-z_i)+\beta_c^\rho\odot(\rho_c-\rho_i)+\beta_c^n\odot(n_c-n_i)\right\}}{\sum_{i\in\Omega_c} w_i}\qquad\text{(Equation 4)}$$



The symbol ⊙ is the Hadamard product. β_c^z⊙(z_c−z_i)+β_c^ρ⊙(ρ_c−ρ_i)+β_c^n⊙(n_c−n_i) denotes a substitution for z_c−z_i in Equation 3 using the albedo and normal buffers ρ and n.


In particular, ρ_c and n_c denote the albedo and normal values of size 3×1 at the pixel c, and β_c^z, β_c^ρ, β_c^n denote scale parameters at pixel c controlling the relative importance of z_c−z_i, ρ_c−ρ_i, and n_c−n_i.


The artificial neural network module 2 calculates the parameters β_c^z, β_c^ρ, β_c^n for compensating for approximation errors using rendering information, for example, the albedo and normal values, and outputs the calculated parameters to the combination module 3.


The weight wi of Equation 4 is defined as Equation 5 in a cross-bilateral form.










$$w_i=\begin{cases}\exp\!\left(-\dfrac{\log_e\left(1+\lVert y_c-y_i\rVert^2\right)}{(\gamma_c^y)^2+\varepsilon}-\dfrac{\log_e\left(1+\lVert z_c-z_i\rVert^2\right)}{(\gamma_c^z)^2+\varepsilon}\right)\times\exp\!\left(-\dfrac{\lVert\rho_c-\rho_i\rVert^2}{(\gamma_c^\rho)^2+\varepsilon}-\dfrac{\lVert n_c-n_i\rVert^2}{(\gamma_c^n)^2+\varepsilon}-\dfrac{(v_c-v_i)^2}{(\gamma_c^v)^2+\varepsilon}\right),&\text{if }i\neq c,\\[1.5ex]\tau_c,&\text{otherwise.}\end{cases}\qquad\text{(Equation 5)}$$

    • γ_c^y, γ_c^z, γ_c^ρ, γ_c^n, γ_c^v denote bandwidth parameters for an independent and correlated pixel pair (y and z) and the auxiliary feature that includes the albedo ρ, the normal n, and the visibility buffer v at pixel c. τ_c denotes the center weight for pixel c. As a result, the combination function g_c(y,z) for the post-correction in Equation 4 requires one set of scale parameters β_c ≡ {β_c^z, β_c^ρ, β_c^n}, bandwidth parameters γ_c ≡ {γ_c^y, γ_c^z, γ_c^ρ, γ_c^n, γ_c^v}, and a center weight τ_c at pixel c.





As a result, the artificial neural network module 2 generates a parameter set βc, γc, and τc per pixel through the learning parameter θ in the module.


The combination module 3 outputs a post-corrected image by generating a combined image for each buffer using Equation 4, the combination function, for the first data buffer and the second data buffer. A sketch of this combination with the cross-bilateral weights of Equation 5 follows.
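The following NumPy sketch puts Equations 4 and 5 together for a single target pixel. The window-local array shapes, the scalar treatment of each w_i, and the value of ε are assumptions; the β, γ, and τ values would come from the artificial neural network module.

```python
# Sketch of the cross-bilateral weights (Equation 5) and the combination
# (Equation 4) at one target pixel c; shapes and eps are assumptions.
import numpy as np

def weights_eq5(y, z, alb, nrm, vis, c, gamma, tau, eps=1e-4):
    """y, z, alb, nrm: (k, k, 3); vis: (k, k); c: center index; gamma: 5 scalars."""
    g_y, g_z, g_r, g_n, g_v = gamma
    d_col = (np.log1p(((y[c] - y) ** 2).sum(-1)) / (g_y ** 2 + eps)
             + np.log1p(((z[c] - z) ** 2).sum(-1)) / (g_z ** 2 + eps))
    d_feat = (((alb[c] - alb) ** 2).sum(-1) / (g_r ** 2 + eps)
              + ((nrm[c] - nrm) ** 2).sum(-1) / (g_n ** 2 + eps)
              + (vis[c] - vis) ** 2 / (g_v ** 2 + eps))
    w = np.exp(-d_col) * np.exp(-d_feat)  # case i != c
    w[c] = tau                            # center weight tau_c for i == c
    return w

def combine_eq4(y, z, alb, nrm, c, beta, w):
    """beta: three (3,) scale vectors; returns the post-corrected color g_c(y, z)."""
    b_z, b_r, b_n = beta
    corr = y + b_z * (z[c] - z) + b_r * (alb[c] - alb) + b_n * (nrm[c] - nrm)
    return (w[..., None] * corr).sum(axis=(0, 1)) / w.sum()
```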


Table 1 shows the post-corrected results with and without Equations 4 and 5 applied.














TABLE 1

Scenes       Method   Setting A    Setting B    Setting C    Setting D
(illegible)  KPCN     0.002903     0.001015     0.002200     0.001003
(illegible)  AFGSA    (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  KPCN     (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  AFGSA    (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  KPCN     (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  AFGSA    (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  KPCN     (illegible)  (illegible)  (illegible)  (illegible)
(illegible)  AFGSA    0.007512     0.0968.45    0.010412     0.006721

(Scene names and most numeric entries are missing or illegible in the filed document; the value "0.0968.45" is reproduced as printed.)



Referring to Table 1, the self-supervised corrected results using the cross-bilateral weights of Equations 4 and 5 (settings B and D in Table 1) are more accurate than the results without weights (settings A and C in Table 1). The combination function according to the present disclosure with the cross-bilateral weight (setting D) shows better performance than the combination function of Equation 3 with the same weighting system (setting B), except in the case of AFGSA on the Dragon scene, owing to the additional use of the auxiliary feature for bias compensation. In the original table, the best and second-best results are indicated in brown and cyan.


A conventional apparatus for improving image quality learns in advance the learning parameters of the artificial neural network module 2 that minimize the loss given by the loss function in Equation 6.











$$\mathcal{L}(\hat{\mu}_c)=\frac{\lVert\hat{\mu}_c-\mu_c\rVert^2}{\bar{\mu}_c^2+0.01}\qquad\text{(Equation 6)}$$




In Equation 6, μ_c denotes the ground truth value at the pixel c, and μ̄_c denotes the average value of the ground truth μ_c.


The loss function of Equation 6 thus relies on the ground truth μ_c at the pixel c.


The apparatus for improving image quality according to the embodiment of the present disclosure may train the artificial neural network module 2 without relying on the ground truth μc.


The self-supervised loss calculation module 5 according to the embodiment of the present disclosure includes a first self-supervised loss function and a second self-supervised loss function. The self-supervised loss calculation module 5 calculates the self-supervised loss by applying the first self-supervised loss function and the second self-supervised loss function to the post-corrected image output from the combination module 3, thereby enabling the combination module 3 to output an optimal corrected image without learning being performed in advance in the artificial neural network module 2.


Accordingly, when the self-supervised loss calculation module 5 calculates the self-supervised loss by the first self-supervised loss function and the second self-supervised loss function, the artificial neural network module 2 can update the learning parameter minimizing the self-supervised loss, and thus performs the training for each input y and z without the ground truth μ at runtime.


The first self-supervised loss function and the second self-supervised loss function of the self-supervised loss calculation module 5 according to the present disclosure use only the test input, without using the actual error value ‖μ̂_c−μ_c‖², to predict the expected value of the actual error E‖μ̂_c−μ_c‖².


Hereinafter, the first self-supervised loss function and the second self-supervised loss function will be described in detail.


The self-supervised loss calculation module 5 receives the independent and correlated images of the first data buffer and the second data buffer output from the image rendering module 1.


In addition, the self-supervised loss calculation module 5 receives the post-corrected pixel estimates μ̂_c^a and μ̂_c^b of the first data buffer and the second data buffer output from the combination module 3.


The self-supervised loss optimizing the correction process of the first data buffer may be defined by Equation 7.











$$\mathcal{L}(\hat{\mu}_c^a)=\frac{E\lVert\hat{\mu}_c^a-\mu_c\rVert^2}{\bar{\mu}_c^2+0.01}\qquad\text{(Equation 7)}$$




The expected value in the numerator of this self-supervised loss function may be approximated as in Equation 8.






$$E\lVert\hat{\mu}_c^a-\mu_c\rVert^2\approx\lVert\hat{\mu}_c^a-y_c^b\rVert^2-\mathbf{1}^T\hat{\sigma}^2(y_c^b)\qquad\text{(Equation 8)}$$

Here, σ̂²(y_c^b) denotes an unbiased estimate of the variance, i.e., the sample variance of the pixel color y_c^b, and 1 denotes a 3×1 vector in which each element is 1.


Accordingly, if Equation 8 is substituted into Equation 7, the self-supervised loss of the first data buffer is expressed by Equation 9.











$$\mathcal{L}(\hat{\mu}_c^a)\approx\frac{\lVert\hat{\mu}_c^a-y_c^b\rVert^2-\mathbf{1}^T\hat{\sigma}^2(y_c^b)}{\bar{\mu}_c^2+0.01}\qquad\text{(Equation 9)}$$




Since this self-supervised loss function still relies on the average value of the ground truth as a reference, this unknown value must be estimated.


For this estimation, it is desirable to select values that are statistically independent of the estimate, since the variance-related term 1^T σ̂²(y_c^b)/(μ̄_c²+0.01) may then be ignored. The intensity ȳ_c^b or z̄_c^b of the pixel values of the independent or correlated image of the second data buffer is preferably substituted for the intensity of the unknown ground truth. The correlated image generally has lower error than the independent image.


As a result, when the intensity z̄_c^b of the pixel value of the correlated image is substituted for the unknown value, the first self-supervised loss function for calculating the self-supervised loss of the first data buffer may be expressed by Equation 10.











$$\hat{\mathcal{L}}(\hat{\mu}_c^a)=\frac{\lVert\hat{\mu}_c^a-y_c^b\rVert^2}{(\bar{z}_c^b)^2+0.01}\qquad\text{(Equation 10)}$$



As a result, the first self-supervised loss function uses the post-corrected estimate μ̂_c^a of the first data buffer, the intensity z̄_c^b of the correlated image of the second data buffer, and the pixel value y_c^b of the independent image of the second data buffer.


In the same manner, the second self-supervised loss function for the second data buffer may be defined using the independent pixel value and the correlated pixel value of the first data buffer in place of the unknown value. Both losses are sketched below.
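A sketch of the paired losses of Equation 10 follows; using the channel mean as the intensity z̄_c^b, and the full-image array shapes, are assumptions.

```python
# Equation 10, applied to each buffer with the other buffer as reference.
import numpy as np

def self_supervised_loss(mu_hat, y_other, z_other):
    """mu_hat, y_other, z_other: (H, W, 3) arrays; returns a per-pixel loss."""
    z_bar = z_other.mean(axis=-1, keepdims=True)  # intensity of the correlated image
    return ((mu_hat - y_other) ** 2 / (z_bar ** 2 + 0.01)).sum(axis=-1)

# First loss: buffer a corrected, buffer b as reference; second loss swaps roles.
# loss_a = self_supervised_loss(mu_hat_a, y_b, z_b)
# loss_b = self_supervised_loss(mu_hat_b, y_a, z_a)
```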


The self-supervised loss calculation module 5 outputs the self-supervised loss of the first data buffer and the self-supervised loss of the second data buffer, calculated by the first self-supervised loss function and the second self-supervised loss function, to the artificial neural network module 2.


The artificial neural network module 2 receives the self-supervised loss output from the self-supervised loss calculation module 5 and optimizes the learning parameter θ used to generate the post-corrected estimates μ̂^a and μ̂^b.


The optimal learning parameter θ* of the artificial neural network module 2 is defined by Equation 11.










$$\theta^*=\arg\min_\theta\frac{1}{3N}\sum_{c=1}^{N}0.5\left(\hat{\mathcal{L}}(\hat{\mu}_c^a)+\hat{\mathcal{L}}(\hat{\mu}_c^b)\right)\qquad\text{(Equation 11)}$$



Here, N denotes the number of pixels in an input image.


The artificial neural network module 2 may update the learning parameter defined by Equation 11 by minimizing the first self-supervised loss function and the second self-supervised loss function according to Equation 10.
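One optimization step for Equation 11 might look like the following TensorFlow sketch. Here net, combine, and loss_fn stand for the parameter network, combination function, and loss sketched earlier; they are assumptions rather than names from the disclosure, and the reduce_mean matches the 1/(3N) normalization only up to a constant factor.

```python
# Sketch of one training step minimizing Equation 11 (assumptions noted above).
import tensorflow as tf

# Placeholder rate; the text sets the rate from a variance-based formula given later.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

def train_step(net, combine, loss_fn, buf_a, buf_b, feats):
    with tf.GradientTape() as tape:
        beta_a, gamma_a, tau_a = net(tf.concat([buf_a["y"], buf_a["z"], feats], -1))
        beta_b, gamma_b, tau_b = net(tf.concat([buf_b["y"], buf_b["z"], feats], -1))
        mu_a = combine(buf_a, feats, beta_a, gamma_a, tau_a)  # Equation 4 per pixel
        mu_b = combine(buf_b, feats, beta_b, gamma_b, tau_b)
        loss = 0.5 * (tf.reduce_mean(loss_fn(mu_a, buf_b["y"], buf_b["z"]))
                      + tf.reduce_mean(loss_fn(mu_b, buf_a["y"], buf_a["z"])))
    grads = tape.gradient(loss, net.trainable_variables)
    optimizer.apply_gradients(zip(grads, net.trainable_variables))
    return loss
```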


The learning parameter of Equation 11 may serve as an optimized factor of the denoising filter presented in Non-Patent Document 4, and, to the extent necessary, the content of Non-Patent Document 4 is incorporated into the present disclosure. Of course, the first self-supervised loss function and the second self-supervised loss function may obtain more accurate information by additionally using the auxiliary feature 13.


A method for improving image quality according to an embodiment of the present disclosure will be described as follows. FIG. 2 is a diagram illustrating a method for improving image quality according to an embodiment of the present disclosure.


Referring to FIG. 2, a method for improving image quality according to an embodiment of the present disclosure generates the first data buffer and the second data buffer with the independent image 11, the correlated image 12, and the auxiliary feature 13 (S1).


Next, parameters for correcting pixel estimate are calculated by learning the first data buffer and the second data buffer (S2).


Next, the first data buffer and the second data buffer are combined using Equation 5, which is the cross bilateral weight, and Equation 4, which is the combination function, to generate combined images for each buffer and output a post-corrected final image (S3).


Next, the self-supervised loss is calculated for the post-corrected image by using the first self-supervised loss function and the second self-supervised loss function defined by Equation 10 (S4).


Here, the first self-supervised loss function is calculated using the post-corrected estimate of the first data buffer and the pixel values of the independent and correlated images of the second data buffer.


The second self-supervised loss function is calculated using the post-corrected estimate of the second data buffer and the pixel values of the independent and correlated images of the first data buffer.


Next, the learning parameter defined by Equation 11 is updated to minimize the first self-supervised loss function and the second self-supervised loss function according to Equation 10 (S5).


Referring to FIG. 1, as an embodiment of the present invention, the artificial neural network module 2 includes nine layers, and each layer uses a convolution filter with a 3×3 kernel.


The final layer of the artificial neural network module 2 uses 15 filters (that is, the number of parameters βc, γc, and τc) and other layers use 16 filters.


In the final layer, the tanh function is used as the activation for βc and softplus for γc and τc, while ReLU is used in the other layers.


The apparatus and method for improving image quality using a self-supervised learning framework are implemented using TensorFlow.


The image rendering module 1 extracts 128×128 patches from the input color and the auxiliary feature for training the artificial neural network module 2 at runtime.


The artificial neural network module 2 is trained for 20 epochs using the Adam optimizer.


The learning rate is set by the following equation.






$$0.01\times\frac{1}{3N}\sum_{c=1}^{N}\mathbf{1}^T\hat{\sigma}^2(y_c)$$





Here, σ̂²(y_c) denotes an estimated variance using the independent pixel estimates of the first data buffer and the second data buffer (i.e., σ̂²(y_c) = (y_c^a − y_c^b) ∘ (y_c^a − y_c^b)/4).


The batch size is set to 16 and a Xavier uniform initializer is used. The size |Ω_c| of the local window is set to 19×19. These runtime settings are sketched below.
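A small sketch of the variance estimate and the learning-rate formula above; the array shapes are assumptions.

```python
# Learning rate = 0.01 x (1/(3N)) * sum_c 1^T sigma^2(y_c), with the
# half-buffer variance estimate sigma^2(y_c) = (y^a - y^b)^2 / 4.
import numpy as np

def runtime_learning_rate(y_a, y_b):
    """y_a, y_b: (H, W, 3) independent half-buffer estimates."""
    sigma2 = (y_a - y_b) ** 2 / 4.0  # per-channel variance estimate
    return 0.01 * sigma2.mean()      # mean over all 3N channel values
```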



FIG. 3 is a diagram illustrating the effects of the apparatus for improving image quality and the method for improving image quality according to the embodiment of the present disclosure.


Referring to FIG. 3, the apparatus for improving image quality and the method for improving image quality according to the embodiment of the present disclosure prevent overfitting to noise and show a more accurate post-correction result.


Referring to FIG. 4, state-of-the-art denoising methods such as KPCN [Bako et al. 2017] (Non-Patent Document 5), AMCD [Xu et al. 2019] (Non-Patent Document 6), and AFGSA [Yu et al. 2021] (Non-Patent Document 7) were post-corrected by the apparatus and method for improving image quality according to the embodiment of the present disclosure. Referring to FIG. 5, the self-supervised learning method according to the present disclosure was further compared with a deep combiner [Back et al. 2020].


Referring to FIG. 3, it can be seen that FIG. 3(c), the method for improving image quality according to the present disclosure, is further improved compared with FIG. 3(a), the result of applying AMCD, and FIG. 3(b), the result of applying the conventional combination module to AMCD.


Also, referring to the graphs in FIGS. 3(e) and 3(f), despite using fewer learning parameters, FIG. 3(c), the method for improving image quality according to the present disclosure, shows a post-correction result that is more robust against overfitting and more stable than FIG. 3(b), the result of applying the conventional combination module to AMCD.



FIG. 4 is a diagram comparing denoised estimates and post-corrected results obtained by applying a self-supervised learning method according to the present disclosure to the denoised estimates.


Referring to FIG. 4, the latest denoising methods, KPCN, AMCD, and AFGSA, do not always show good denoising results for all test scenes (see FIGS. 4(c), 4(e), and 4(g)). For example, KPCN generates relatively high-quality results with details in Dragon and Sanmiguel, but some details in Bathroom and Hair are overly blurred. AFGSA preserves high-frequency information in Bathroom but, similarly to AMCD, excessively smooths the detailed information in other scenes.


At runtime, the test images may often differ from the actual training images, so even with extensive pre-training, a learning-based denoiser is not ideal for all possible scenarios.


The method for improving image quality according to the present disclosure uses unseen data (i.e., test images) to train a post-correction artificial neural network at runtime and improves denoising results, for example in the case of excessively blurry artifacts.



FIG. 5 is a view comparing the result of applying the conventional combination module to the denoising technique (e.g., AFGSA) and the post-corrected results obtained by applying the self-supervised learning method according to the present disclosure to the result of applying the conventional combination module.


As illustrated in FIG. 5, the combination module (deep combiner (DC)) for post-correction does not effectively restore the lost high-frequency details (see FIG. 5(a)). Because of its flexibility, the method for improving image quality according to the present disclosure may take the post-corrected estimate from the combination module as an input and further correct the result (see FIG. 5(b)). This indicates that the method for improving image quality according to the present disclosure can supplement various state-of-the-art denoising techniques.



FIG. 6 is a diagram illustrating numerical convergence of denoised estimates and the post-corrected results obtained by applying the method for improving image quality according to the present disclosure to the denoised estimates for various scenes.


The method for improving image quality according to the present disclosure helps other denoising techniques generate numerically more accurate results. For example, AMCD and AFGSA do not effectively reduce errors even as the sample size increases. Referring to FIG. 6, it can be seen that the method for improving image quality according to the present disclosure provides more meaningful results when applied to these two results in situations with a large number of samples. Technically, since the proposed self-supervised loss relies on the independent image (e.g., y_c^b in Equation 10), the loss value becomes more accurate as the number of samples grows.


The description of the presented embodiments is provided to allow any person skilled in the art to implement or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.

Claims
  • 1. An apparatus for improving image quality, comprising: an image rendering module configured to generate an independent image, a correlated image, and an auxiliary feature into a first data buffer and a second data buffer; an artificial neural network module configured to receive and learn the first data buffer and the second data buffer generated from the image rendering module; a combination module configured to generate a first corrected image of the first data buffer and a second corrected image of the second data buffer that are output from the artificial neural network module; and a self-supervised loss calculation module configured to calculate self-supervised losses of the first data buffer and the second data buffer, including a first self-supervised loss function using pixel values of the first corrected image and the independent and correlated images of the second data buffer and a second self-supervised loss function using pixel values of the second corrected image and the independent and correlated images of the first data buffer.
  • 2. The apparatus of claim 1, wherein the image rendering module generates an auxiliary feature including at least one of normal information, texture information, and visibility information.
  • 3. The apparatus of claim 1, wherein the artificial neural network module calculates parameters used in Equations 4 and 5 for outputting post-corrected images of the first data buffer and the second data buffer:
  • 4. The apparatus of claim 1, wherein the combination module outputs a post-corrected image for each buffer by combining the first data buffer and the second data buffer using Equation 4 which is a combination function:
  • 5. The apparatus of claim 1, wherein the first self-supervised loss function is represented by Equation 10.
  • 6. The apparatus of claim 1, wherein the artificial neural network module updates a learning parameter in the artificial neural network module for generating a post-corrected estimate to minimize the self-supervised loss output from the self-supervised loss calculation module.
  • 7. The apparatus of claim 6, wherein the optimal learning parameter is defined by Equation 11:
  • 8. The apparatus of claim 6, wherein the learning parameter is updated to minimize the first self-supervised loss function and the second self-supervised loss function.
  • 9. The apparatus of claim 1, wherein the artificial neural network module includes 9 layers, and each layer includes a convolutional neural network (CNN) using a convolutional filter with a 3×3 kernel.
  • 10. A method for improving image quality, comprising: generating an independent image, a correlated image, and an auxiliary feature into a first data buffer and a second data buffer; calculating parameters for outputting post-corrected images of the first data buffer and the second data buffer; outputting a post-corrected image for each buffer by using the first data buffer and the second data buffer with a combination function; calculating a self-supervised loss using a first self-supervised loss function and a second self-supervised loss function for the post-corrected image; and updating a learning parameter to minimize the first self-supervised loss function and the second self-supervised loss function.
  • 11. The method of claim 10, wherein the first self-supervised loss function is calculated using pixel values of the post-corrected image of the first data buffer and the independent and correlated image of the second data buffer, and the second self-supervised loss function is calculated using pixel values of the post-corrected image of the second data buffer and the independent and correlated image of the first data buffer.
  • 12. The method of claim 10, wherein the weight is defined by Equation 5:
  • 13. The method of claim 10, wherein the combination function is defined by Equation 4:
  • 14. The method of claim 10, wherein the optimal learning parameter is defined by Equation 11:
  • 15. The apparatus of claim 2, wherein the artificial neural network module calculates parameters used in Equations 4 and 5 for outputting post-corrected images of the first data buffer and the second data buffer:
Priority Claims (2)
Number Date Country Kind
10-2022-0090740 Jul 2022 KR national
10-2022-0092050 Jul 2022 KR national