Various embodiments relate to an image processing device for performing deconvolution for up-scaling an image, and an operation method therefor.
With the development of computer technology, data traffic has increased exponentially, and artificial intelligence has become an important driver of future innovation. Because artificial intelligence emulates the human way of thinking, it is applicable to virtually all industries. Representative artificial intelligence technologies include pattern recognition, machine learning, expert systems, neural networks, and natural language processing.
A neural network models the characteristics of human biological neurons by mathematical expressions and uses an algorithm that emulates the human ability to learn. Through this algorithm, the neural network may generate a mapping between input data and output data, and the capability of generating such a mapping may be expressed as the learning capability of the neural network. Also, based on a learning result, the neural network has a generalization capability of generating correct output data for input data that was not used for learning.
In a convolutional neural network (CNN) or the like, a deconvolution layer may be used to generate an output image having a size greater than that of an input image. However, when an image is up-scaled by a non-integer multiple by using the deconvolution layer, the number of kernels applied to the deconvolution operation is greater than when the image is up-scaled by an integer multiple. Accordingly, the memory required for the deconvolution operation increases, and because parallel processing cannot be performed, the throughput increases and the operation speed decreases.
Various embodiments may provide an image processing device capable of reducing a required throughput and memory when performing deconvolution for up-scaling by a non-integer multiple, and an operation method therefor.
In an image processing device according to an embodiment, the number of kernels required for a deconvolution operation for up-scaling by a non-integer multiple can be reduced.
In an image processing device according to an embodiment, a throughput and memory required when performing deconvolution for up-scaling by a non-integer multiple can be reduced, and an operation speed can be increased.
An image processing device according to an embodiment can up-scale an image in various sizes (resolutions) by performing a deconvolution operation for up-scaling by a non-integer multiple.
An image processing device according to an embodiment can generate an up-scaled image of high quality by performing a deconvolution operation for up-scaling by a non-integer multiple.
An image processing device according to an embodiment includes: a memory storing one or more instructions; and a processor configured to execute the one or more instructions stored in the memory to: obtain location change feature information based on values of pixels included in a first image and relative location relationships of the pixels included in the first image and pixels included in a second image; and generate the second image obtained by up-scaling the first image by k times, by performing a convolution operation on the location change feature information and a kernel, wherein k is a real number.
k may be a real number that is not an integer.
Values of weights included in the kernel may not change according to locations of values of samples included in the location change feature information on which the convolution operation is performed.
The processor may be further configured to execute the one or more instructions to obtain a value of one pixel included in the second image by using values of n pixels among a plurality of pixels included in the first image, and obtain the location change feature information based on relative location relationships of the n pixels with respect to a location of the one pixel.
When the first image and the second image have a same size, locations of the pixels included in the first image may be indicated by i and locations of the pixels included in the second image may be indicated by j, wherein i is an integer and j is a real number, and the processor may be further configured to execute the one or more instructions to: determine kj by performing a rounding-down operation on the value of the index j indicated by the one pixel, wherein kj is an integer, and determine uj that is a difference between j and kj; and obtain the location change feature information based on kj and uj.
A distance between adjacent pixels included in the first image may be 1.
The n pixels may be determined based on a value of the index j indicated by the one pixel.
The values of samples included in the location change feature information may be represented as a linear expression for uj.
A number of weights included in the kernel may be determined based on a value of n.
An operation method of an image processing device, according to an embodiment, includes: obtaining location change feature information based on values of pixels included in a first image and relative location relationships of the pixels included in the first image and pixels included in a second image; and generating the second image obtained by up-scaling the first image by k times, by performing a convolution operation on the location change feature information and a kernel, wherein k is a real number.
The terms used in the specification will be briefly defined, and the present disclosure will be described in detail.
All terms including descriptive or technical terms which are used herein should be construed as having meanings that are obvious to one of ordinary skill in the art. However, the terms may have different meanings according to the intention of one of ordinary skill in the art, precedent cases, or the appearance of new technologies. Also, some terms may be arbitrarily selected by the applicant, and in this case, the meaning of the selected terms will be described in detail in the detailed description of the present disclosure. Thus, the terms used herein have to be defined based on the meaning of the terms together with the description throughout the specification.
When a part “includes” or “comprises” an element, unless there is a particular description contrary thereto, the part may further include other elements, not excluding the other elements. In addition, terms such as “unit” and “module” described in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware or software, or implemented in a combination of hardware and software.
Hereinafter, embodiments will be described in detail with reference to the accompanying drawings such that one of ordinary skill in the art may easily implement the present disclosure. However, the present disclosure may be implemented in various different forms and is not limited to embodiments described herein. Also, in the drawings, parts irrelevant to the description are omitted in order to clearly describe the present disclosure, and like reference numerals designate like elements throughout the specification.
Referring to
The neural network 20 may include one or more deconvolution layers, and each deconvolution layer may perform a deconvolution operation 30 on a kernel and an image (input) input to the deconvolution layer and generate an output image (output) as a result of the deconvolution operation 30.
The deconvolution operation 30 is generally used to up-scale an input image in a convolutional neural network (CNN), and may be used in various fields, such as super-resolution image generation, auto-encoders, and style transfer. However, the present disclosure is not limited thereto.
A size of the up-scaled second image (output) generated as a result of the deconvolution operation 30 is greater than a size of the first image (input).
Meanwhile, there may be a case where the first image is to be up-scaled by a real number multiple (a non-integer multiple) instead of an integer multiple.
Referring to
Here, when an image of a 4K resolution, which is outputtable on the first display 210, is to be output on the second display 220, the image of the 4K resolution may need to be up-scaled by 9/8 times.
As such, when an image is up-scaled by a real number multiple (a non-integer multiple) instead of an integer multiple by using a deconvolution operation, a number of kernels required for the deconvolution operation may be greater than when the image is up-scaled by an integer multiple. This will be described with reference to
(a) of
For example, a value of a fourth pixel P4 included in the second image HR is determined based on values of a first pixel P1 and a second pixel P2 included in the first image LR, and the value of the fourth pixel P4 may be calculated by performing a convolution operation applying a first kernel K0 on the values of the first pixel P1 and second pixel P2. Here, the first kernel K0 includes values of weights applied to the values of the first pixel P1 and second pixel P2. The values of weights may be determined based on relative locations of the first pixel P1 and second pixel P2 with respect to the fourth pixel P4, but are not limited thereto.
Also, a value of a fifth pixel P5 included in the second image HR is determined based on values of first to third pixels P1 to P3 included in the first image LR, and the value of the fifth pixel P5 may be calculated by performing a convolution operation applying a second kernel K1 to the values of the first to third pixels P1 to P3. Here, the second kernel K1 includes values of weights applied to the values of the first to third pixels P1 to P3. The values of weights may be determined based on relative locations of the first to third pixels P1 to P3 with respect to the fifth pixel P5, but are not limited thereto.
Meanwhile, referring to (a) of
Similarly, for each of fifth and seventh pixels P5 and P7, relative locations of a corresponding pixel and pixels used to calculate a value of the corresponding pixel are the same, and thus a kernel (for example, the second kernel K1) applied to calculate the fifth pixel P5 and a kernel (for example, the second kernel K1) applied to calculate the seventh pixel P7 are the same.
Accordingly, when the first image LR is up-scaled by 2 times, the number of required kernels is 2, and when the first image LR is up-scaled by k1 times (k1 being an integer), the number of required kernels is k1.
(b) of
For example, a value of a ninth pixel P9 included in the second image HR is determined based on the values of the first pixel P1 and second pixel P2 included in the first image LR, and the value of the ninth pixel P9 may be calculated by performing a convolution operation applying a third kernel Ki on the values of the first pixel P1 and second pixel P2. Here, the third kernel Ki includes the values of weights applied to the values of the first pixel P1 and second pixel P2. The values of weights may be determined based on relative locations of the first pixel P1 and second pixel P2 with respect to the ninth pixel P9.
Also, a value of a tenth pixel P10 may be calculated by performing a convolution operation applying a fourth kernel Ki+1 on the values of first pixel P1 and second pixel P2 included in the first image LR. Values of weights included in the fourth kernel Ki+1 may be determined based on relative locations of the first pixel P1 and second pixel P2 with respect to the tenth pixel P10.
Meanwhile, referring to (b) of
For example, when the first image LR is up-scaled by 1001/1000 times, the number of kernels required for a deconvolution operation is 1001; accordingly, the memory and throughput required for the deconvolution operation increase significantly and the operation speed decreases.
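The kernel count can be checked with a short calculation: writing the up-scaling factor as an irreducible fraction p/q, the output pixels cycle through p distinct fractional phases, so a conventional deconvolution needs p per-phase kernels. A minimal sketch (the function name is illustrative):

```python
from fractions import Fraction

def num_phase_kernels(scale):
    """Number of distinct fractional phases (hence per-phase kernels)
    a conventional deconvolution needs for a rational up-scaling factor."""
    f = Fraction(scale).limit_denominator(10**6)
    # An output pixel j maps to input coordinate j * q / p; the fractional
    # offset repeats with period p = numerator of the reduced fraction.
    return f.numerator

print(num_phase_kernels(2))                     # 2
print(num_phase_kernels(Fraction(9, 8)))        # 9
print(num_phase_kernels(Fraction(1001, 1000)))  # 1001
```

For 2x up-scaling this yields 2 kernels, for 9/8 it yields 9, and for 1001/1000 it yields 1001, matching the counts discussed above.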
Referring to
For example, when the first image and the second image have a same size and a distance between adjacent pixels included in the first image is 1, locations of the pixels of the first image may be represented by integers and locations of the pixels of the second image may be represented by real numbers. For example, the pixels of the first image may be represented as xi (i=0, 1, 2, 3, . . . , n), wherein the index i indicates a coordinate value (location) of a corresponding pixel. The pixels of the second image may be represented as yj (j being a real number), wherein j indicates a coordinate value (location) of a corresponding pixel.
Referring to
For example, yj = B0(u)·xi−1 + B1(u)·xi + B2(u)·xi+1 + B3(u)·xi+2, where the subscripts i−1, i, i+1, and i+2 denote pixel locations in the first image.
Here, when the first image and the second image have the same size and a point corresponding to the pixel yj included in the second image is located between the pixels xi and xi+1 of the first image, u may be a distance between points corresponding to the pixels xi and yj. Referring to
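As a concrete sketch of this weighted sum, the following assumes the uniform cubic B-spline as the basis curves B0..B3 (an assumption — the patent's exact curves are given only in its drawings):

```python
import math

def bspline_basis(u):
    """Uniform cubic B-spline basis curves B0(u)..B3(u) for a fractional
    offset u in [0, 1). (Assumed basis for illustration; the document's
    exact curves B0..B3 are defined in its drawings.)"""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    )

def interpolate(x, j):
    """Compute yj = B0(u)*x[i-1] + B1(u)*x[i] + B2(u)*x[i+1] + B3(u)*x[i+2],
    where i = floor(j) and u = j - i."""
    i = math.floor(j)
    u = j - i
    b = bspline_basis(u)
    return sum(bm * x[i - 1 + m] for m, bm in enumerate(b))

signal = [10.0, 10.0, 10.0, 10.0, 10.0]
print(round(interpolate(signal, 1.3), 6))  # 10.0 -- the basis weights sum to 1
```

A flat signal stays flat because the four basis weights sum to 1 at every offset u, which is a quick sanity check on any chosen basis.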
Meanwhile, the curves B0(u), B1(u), B2(u), and B3(u) may be represented as a matrix B(u) as shown in
yj = <B(u), Ii>F [Equation 1]
Here, <A, B>F denotes a convolution operation between a matrix A and a matrix B: the elements at the same locations in the matrix A and the matrix B are multiplied, and the resulting products are all added.
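This element-wise multiply-and-add (the Frobenius inner product) can be sketched in a few lines (`frob` is an illustrative name):

```python
import numpy as np

def frob(A, B):
    """<A, B>_F: multiply elements at the same locations in A and B,
    then add all the products."""
    return float(np.sum(A * B))

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 0.0], [0.0, 1.0]])
print(frob(A, B))  # 14.0  (1*10 + 2*0 + 3*0 + 4*1)
```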
In the matrix B of
Referring to
K(u, Θ) = Θ·U [Equation 2]
Here, the matrix Θ 510 and the matrix U may each be represented as shown in
Also, a deconvolution operation for up-scaling the first image may be represented as Equation 3 below.
yj = <Kj, Ii>F = <Θ·Uj, Ii>F = <Θ, Uj·IiT>F [Equation 3]
In Equation 3, a matrix Ii indicates the values of the pixels of the first image used to calculate the value of the pixel yj included in the second image. Also, the position of the matrix Uj in the operation may be changed such that Uj is operated with the matrix Ii instead of with the matrix Θ 510.
Accordingly, a matrix Kj (=Θ·Uj) is expressed as a function of uj and is a space-variant matrix that changes according to the value of uj, whereas the matrix Θ 510 is a space-invariant matrix that does not change according to the value of uj. Also, because the order of the operands of the convolution operation may be exchanged, the values of the pixels yj of the second image may be represented as Equation 4 below.
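The rearrangement that moves the space-variant part out of the kernel and into the data can be checked numerically. The sketch below uses one consistent row/column convention (an assumption — the exact matrix shapes are defined in the drawings): applying a space-variant kernel Θ·U to the pixels gives the same scalar as applying the fixed, space-invariant Θ to a feature matrix built from U and the pixels:

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.standard_normal((4, 4))  # space-invariant weights (fixed)
U = rng.standard_normal((4, 1))      # space-variant part, a function of uj
I = rng.standard_normal((4, 1))      # the n input pixels used for yj

# Space-variant kernel: Kj = Theta @ U, applied to the pixels I.
y_kernel = (I.T @ (Theta @ U)).item()

# Equivalent rearrangement: fold the space-variant part into the data
# and apply only the fixed Theta (element-wise multiply-and-add).
y_feature = float(np.sum(Theta * (I @ U.T)))

print(np.isclose(y_kernel, y_feature))  # True
```

This is the key property exploited below: only the data-side feature matrix changes with pixel location, while the weights Θ stay the same everywhere.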
yj = <Uj·IiT, Θ>F [Equation 4]
Meanings of parameters in Equation 4 will be described in detail below with reference to
Referring to
As described in
Also, the matrix Ii is differently determined according to the index j of the pixel yj. For example, the index i is determined according to the index j, and the matrix Ii=[xi−1, xi, xi+1, xi+2] is determined according to the index i. For example, when the index j is 1.2 and the index i is 1, I1=[x0, x1, x2, x3]. However, the matrix Ii may be differently configured according to a number of elements included in the matrix Ii.
Meanwhile, the matrix Θ is determined via training, and may be identically applied regardless of the index j of the value of the pixel yj to be calculated, or uj.
The image processing device 100 according to an embodiment may obtain pieces of location change feature information 740 based on values of pixels included in a first image 710 and relative location relationships of the pixels included in the first image 710 and pixels included in a second image 720.
For example, the image processing device 100 may generate pieces of transform feature information 730 by using the values of pixels included in the first image 710. As shown in
The pieces of transform feature information 730 according to an embodiment may include four pieces of transform feature information (first transform feature information Xk−1, second transform feature information Xk, third transform feature information Xk+1, and fourth transform feature information Xk+2). A method of generating the transform feature information 730 will be described in detail with reference to
Referring to
The pieces of transform feature information 730 may be generated to have a same size as the second image 720. For example, when the size of the second image 720 is W×H, the size of the pieces of transform feature information 730 may also be W×H.
Values of samples included in each of the pieces of transform feature information 730 may be determined based on a location (index) of a pixel of the second image 720 corresponding to each sample. For example, a first pixel 840 of the second image 720 may have a value of an index j0 and a value k0 may be determined by performing a rounding-down operation on j0. Accordingly, values of samples S11 to S41 at locations corresponding to the first pixel 840 may be determined based on the value k0.
For example, a value of a first sample S11 of first transform feature information 831 is determined to be a value of a pixel xk0−1 of the first image 710 having an index of k0−1, a value of a second sample S21 of second transform feature information 832 is determined to be a value of a pixel xk0 of the first image 710 having an index of k0, a value of a third sample S31 of third transform feature information 833 is determined to be a value of a pixel xk0+1 of the first image 710 having an index of k0+1, and a value of a fourth sample S41 of fourth transform feature information 834 is determined to be a value of a pixel xk0+2 of the first image 710 having an index of k0+2.
In the same manner, the image processing device 100 may obtain first to fourth transform feature information 831 to 834 having a same size as the second image 720.
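A 1-D sketch of building the four transform feature maps follows. The border handling (clamping) and the mapping from output sample to input-grid coordinate are assumptions for illustration; the document does not fix a padding rule:

```python
import math

def transform_features(x, w_out):
    """Build the four transform feature maps X_{k-1}, X_k, X_{k+1}, X_{k+2}
    (1-D sketch). Each map has the output size; sample t of map m holds the
    input pixel x[k0 - 1 + m], where k0 = floor(j) and j is the coordinate
    of output sample t expressed on the input pixel grid."""
    w_in = len(x)
    feats = [[0.0] * w_out for _ in range(4)]
    for t in range(w_out):
        j = t * w_in / w_out                         # coordinate on input grid
        k0 = math.floor(j)
        for m in range(4):
            idx = min(max(k0 - 1 + m, 0), w_in - 1)  # clamp at the borders
            feats[m][t] = x[idx]
    return feats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
f = transform_features(x, 9)   # e.g. 9/8 up-scaling
print(len(f), len(f[0]))       # 4 9
```

Each of the four maps has the same size as the output, as described above, so a subsequent convolution can run over them with a single fixed kernel.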
Referring back to
Also, the image processing device 100 may obtain the second image 720 by up-scaling the first image 710 by a non-integer multiple, by performing a convolution operation on the pieces of location change feature information 740 and a kernel 750. Here, values of weights included in the kernel 750 may not change according to locations of values of samples included in the pieces of location change feature information 740 on which the convolution operation is performed. A number of the weights included in the kernel 750 may be determined based on the number n of pixels included in the first image 710 used to calculate a value of one pixel included in the second image 720.
For example, a value of the pixel yj0 of the second image 720 up-scaled as the deconvolution operation of
yj0=(θ00·uj0^3·x0+θ01·uj0^3·x1+θ02·uj0^3·x2+θ03·uj0^3·x3)+(θ10·uj0^2·x0+θ11·uj0^2·x1+θ12·uj0^2·x2+θ13·uj0^2·x3)+(θ20·uj0·x0+θ21·uj0·x1+θ22·uj0·x2+θ23·uj0·x3)+(θ30·x0+θ31·x1+θ32·x2+θ33·x3) [Equation 5]
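This per-pixel weighted sum can be sketched end-to-end in 1-D. Here the uniform cubic B-spline polynomial coefficients stand in for the trained matrix Θ (an assumption — the document obtains Θ via training), and one fixed set of 16 weights is applied to location change features of the form u^p·x, instead of one kernel per fractional phase:

```python
import math

# Space-invariant weights Theta: row m holds the polynomial coefficients of
# basis curve B_m(u) over [u^3, u^2, u, 1] (uniform cubic B-spline assumed
# for illustration).
THETA = [
    [-1/6,  1/2, -1/2, 1/6],   # B0(u) = (1-u)^3 / 6
    [ 1/2, -1.0,  0.0, 2/3],   # B1(u) = (3u^3 - 6u^2 + 4) / 6
    [-1/2,  1/2,  1/2, 1/6],   # B2(u) = (-3u^3 + 3u^2 + 3u + 1) / 6
    [ 1/6,  0.0,  0.0, 0.0],   # B3(u) = u^3 / 6
]

def upscale(x, w_out):
    """Up-scale the 1-D signal x to w_out samples with ONE fixed 16-weight
    kernel applied to location change features u^p * x[k0-1+m]."""
    w_in = len(x)
    y = []
    for t in range(w_out):
        j = t * w_in / w_out                  # coordinate on the input grid
        k0, u = math.floor(j), t * w_in / w_out - math.floor(j)
        powers = [u ** 3, u ** 2, u, 1.0]
        acc = 0.0
        for m in range(4):                    # 4 transform features ...
            xm = x[min(max(k0 - 1 + m, 0), w_in - 1)]
            for p in range(4):                # ... times 4 powers of u
                acc += THETA[m][p] * powers[p] * xm   # fixed weights
        y.append(acc)
    return y

y = upscale([5.0] * 8, 9)                     # 9/8 up-scaling of a flat signal
print(all(abs(v - 5.0) < 1e-9 for v in y))    # True (weights sum to 1)
```

The weights never change with the output position; only the features u^p·x do, which is what removes the per-phase kernel count of a conventional deconvolution.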
Referring to
The variables vj, pj, and qj may be represented as functions of uj, as indicated in Equation 6, and may be represented as nonlinear functions of uj.
Here, WS denotes the size of the first image and WT denotes the size of the second image.
For example, Equation 5 may be modified as Equation 7 below by using relationships between uj and the variables vj, pj, and qj of Equation 6.
yj0=(θ′00·uj0^3·x0+θ′01·uj0^3·x1+θ′02·uj0^3·x2+θ′03·uj0^3·x3)+(θ′10·uj0^3·x0+θ′11·uj0^3·x1+θ′12·uj0^3·x2+θ′13·uj0^3·x3)+(θ′20·uj0^3·x0+θ′21·uj0^3·x1+θ′22·uj0^3·x2+θ′23·uj0^3·x3)+(θ′30·uj0^3·x0+θ′31·uj0^3·x1+θ′32·uj0^3·x2+θ′33·uj0^3·x3) [Equation 7]
Also, vj, pj, and qj may be represented as Equation 8 below.
vj=uj−1
pj=uj+1
qj=uj+2 [Equation 8]
By using Equations 7 and 8, yj+1 may be represented as a general equation, such as Equation 9 below.
yj+1=(θ′00·uj^3·xi+θ′01·uj+1^3·xi+1+θ′02·uj+2^3·xi+2+θ′03·uj+3^3·xi+3)+(θ′10·uj^3·xi+θ′11·uj+1^3·xi+1+θ′12·uj+2^3·xi+2+θ′13·uj+3^3·xi+3)+(θ′20·uj^3·xi+θ′21·uj+1^3·xi+1+θ′22·uj+2^3·xi+2+θ′23·uj+3^3·xi+3)+(θ′30·uj^3·xi+θ′31·uj+1^3·xi+1+θ′32·uj+2^3·xi+2+θ′33·uj+3^3·xi+3) [Equation 9]
The image processing device 100 may generate pieces of location change feature information 1040 from a first image 1010 by multiplying uj3, uj2, uj1, and uj0 respectively by the pieces of transform feature information 1030. Also, the image processing device 100 may obtain a second image 1020 in which the first image 1010 is up-scaled by a non-integer multiple by performing a convolution operation on the pieces of location change feature information 1040 and a kernel 1050. Here, values of weights included in the kernel 1050 may not change according to locations of values of samples included in the pieces of location change feature information 1040 on which the convolution operation is performed. A number of the weights included in the kernel 1050 may be determined based on a number n of pixels included in the first image 1010 used to calculate a value of one pixel included in the second image 1020.
For example, a value of a pixel yj1 of the second image 1020 up-scaled as the deconvolution operation of
yj1=(θ′00·uj0^3·x0+θ′01·uj1^3·x1+θ′02·uj2^3·x2+θ′03·uj3^3·x3)+(θ′10·uj0^3·x0+θ′11·uj1^3·x1+θ′12·uj2^3·x2+θ′13·uj3^3·x3)+(θ′20·uj0^3·x0+θ′21·uj1^3·x1+θ′22·uj2^3·x2+θ′23·uj3^3·x3)+(θ′30·uj0^3·x0+θ′31·uj1^3·x1+θ′32·uj2^3·x2+θ′33·uj3^3·x3) [Equation 10]
Referring to
The image processing device 100 according to an embodiment obtains a value of one pixel in the second image by using values of n pixels among the plurality of pixels included in the first image. Here, relative location relationships of the n pixels in the first image with respect to a location of the one pixel in the second image may be obtained.
For example, when the first image and the second image have a same size and a distance between adjacent pixels in the first image is 1, locations of the pixels included in the first image may be indicated by an index i (i=integer) and locations of pixels included in the second image may be indicated by an index j (j=real number). For example, x0 and x1 may denote values of pixels located at coordinates 0 and 1 in the first image, respectively, and y0.2 may denote a value of a pixel located at a coordinate 0.2 in the second image.
The image processing device 100 may determine kj by performing a rounding-down operation on a value of the index j indicated by the pixel included in the second image, and determine uj that is a difference between the index j indicated by the pixel and kj. For example, in a case of y0.2, kj may be determined to be 0 and uj may be determined to be 0.2.
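This rounding-down decomposition can be sketched directly (`decompose_index` is an illustrative name):

```python
import math

def decompose_index(j):
    """Split an output-pixel coordinate j (a real number) into
    kj = floor(j) and the fractional offset uj = j - kj in [0, 1)."""
    kj = math.floor(j)
    return kj, j - kj

print(decompose_index(0.2))  # (0, 0.2)
```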
The image processing device 100 according to an embodiment may obtain location change feature information based on kj, uj, and the number of pixels of the first image used to obtain the value of one pixel of the second image. Because a method of obtaining the location change feature information has been described in detail with reference to
The image processing device 100 may generate the second image in which the first image is up-scaled by k times (k is a real number) by performing a convolution operation on the location change feature information and a kernel, in operation S1120.
Here, k may denote a real number that is not an integer, and a size of the kernel (for example, a number of weights included in the kernel) may be determined based on the number n of the pixels in the first image used to obtain the value of one pixel in the second image. Values of the weights included in the kernel do not change according to locations of values of samples included in the location change feature information on which the convolution operation is performed.
Referring to
Also, the processor 120 according to an embodiment may control overall operations of the image processing device 100. The processor 120 according to an embodiment may execute one or more programs stored in the memory 130.
The memory 130 according to an embodiment may store various types of data, programs, or applications for driving and controlling the image processing device 100. The program stored in the memory 130 may include one or more instructions. The program (one or more instructions) or application stored in the memory 130 may be executed by the processor 120.
The processor 120 according to an embodiment may obtain relative location relationships between pixels included in a first image and pixels included in a second image, and obtain location change feature information based on the obtained location relationships and values of the pixels included in the first image.
The processor 120 according to an embodiment obtains a value of one pixel in the second image by using values of n pixels among the plurality of pixels included in the first image. Here, relative location relationships of n pixels in the first image with respect to a location of the one pixel in the second image may be obtained. For example, when the first image and the second image have a same size and a distance between adjacent pixels in the first image is 1, locations of the pixels included in the first image may be indicated by an index i (i=integer) and locations of pixels included in the second image may be indicated by an index j (j=real number).
The processor 120 may determine kj by performing a rounding-down operation on a value of the index j indicated by the pixel included in the second image, and determine uj that is a difference between the index j indicated by the pixel and kj. For example, in a case of y0.2, kj may be determined to be 0 and uj may be determined to be 0.2.
The processor 120 according to an embodiment may obtain location change feature information based on kj, uj, and the number of pixels of the first image used to obtain the value of one pixel of the second image. Because a method of obtaining the location change feature information has been described in detail with reference to
The processor 120 may generate the second image in which the first image is up-scaled by k times (k is a real number) by performing a convolution operation on the location change feature information and a kernel. Here, k may denote a real number that is not an integer, and a size of the kernel (for example, a number of weights included in the kernel) may be determined based on the number n of the pixels in the first image used to obtain the value of one pixel in the second image. Values of the weights included in the kernel do not change according to locations of values of samples included in the location change feature information on which the convolution operation is performed.
Also, the values of weights included in the kernel according to an embodiment are values determined via training, and the values of weights determined via training may be stored in the memory 130.
Meanwhile, the block diagram of the image processing device 100 of
An operation method of an image processing device, according to an embodiment, may be recorded on a computer-readable recording medium by being implemented in a form of program commands executed by using various computers. The computer-readable recording medium may include at least one of a program command, a data file, and a data structure. The program commands recorded in the computer-readable recording medium may be specially designed or well known to one of ordinary skill in the computer software field. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specially configured to store and perform program commands, such as read-only memory (ROM), random-access memory (RAM), and flash memory. Examples of the program commands include machine code generated by a compiler and high-level language code that may be executed by a computer by using an interpreter.
Furthermore, the image processing device and the operation method of the image processing device, according to embodiments, may be provided by being included in a computer program product. The computer program product is a product that may be traded between a seller and a buyer.
The computer program product may include a software program or a computer-readable storage medium storing a software program. For example, the computer program product may include a product (for example, a downloadable application) in a form of a software program that is electronically distributable through a manufacturer of the electronic device or an electronic market (for example, Google PlayStore™ or AppStore™). For electronic distribution, at least a part of the software program may be stored in the storage medium or temporarily generated. In this case, the storage medium may be a storage medium of a server of a manufacturer, a server of an electronic market, or a relay server that temporarily stores the software program.
The computer program product may include a storage medium of a server or a storage medium of a client apparatus in a system including the server and the client apparatus. Alternatively, when there is a third device, e.g., a smartphone, that communicates with the server or the client apparatus, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the software program transmitted from the server to the client apparatus or the third device, or transmitted from the third device to the client apparatus.
In this case, one of the server, the client apparatus, and the third device may perform a method according to embodiments of the disclosure by executing the computer program product. Alternatively, two or more of the server, the client apparatus, and the third device may execute the computer program product to perform the method according to the embodiments of the disclosure in a distributed fashion.
For example, the server, for example, a cloud server or an artificial intelligence server, may execute the computer program product stored in the server to control the client apparatus communicatively connected to the server to perform the method according to the embodiments.
While the embodiments have been particularly shown and described in detail, it will be understood by one of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the following claims.
Priority claim: Number 10-2019-0057605 | Date: May 2019 | Country: KR | Kind: national
Filing document: PCT/KR2020/004596 | Filing date: 4/3/2020 | Country: WO