1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a program, and more particularly, to an image processing apparatus, an image processing method, and a program performing a super resolving process for increasing a resolution of an image.
2. Description of the Related Art
As a method of generating a high resolution image from a low resolution image, a super resolving process is well known.
For example, as a super resolving process method, there are the following methods.
(a) Reconstruction Type Super Resolving Method
(b) Learning Type Super Resolving Method
The reconstruction type super resolving method (a) is a method of deriving parameters representing photographing conditions such as "blur caused by a lens and atmospheric scattering", "motion of a subject and the entire camera", and "sampling by the imaging device" based on the low resolution image which is the photographed image, and estimating an ideal high resolution image by using the parameters.
In addition, in the related art, the reconstruction type super resolving method is disclosed in, for example, Japanese Unexamined Patent Application Publication No. 2008-140012.
The overview of the processes of the reconstruction type super resolving method is as follows.
(1) An image photographing model is expressed by the equations by taking into consideration the blur, the motion, the sampling, and the like.
(2) A cost calculation equation is obtained from the image photographing model expressed by the equations. At this time, in some cases, regularization terms based on prior knowledge or the like may be added by using Bayes' theorem.
(3) An image for minimizing the cost is obtained.
The reconstruction type super resolving method is a method of obtaining a high resolution image by using the above processes. In addition, the specific processes are described in detail later in this specification.
Although the high resolution image obtained according to the reconstruction type super resolving method depends on the input image, the super resolving effect (resolution recovering effect) is high.
On the other hand, the learning type super resolving method (b) is a method of performing a super resolving process using learned data which are generated in advance. The learned data are constructed with, for example, transform information for a high resolution image from a low resolution image, or the like. A learned data generating process is performed as a process of comparing an assumed input image (low resolution image) generated through, for example, a simulation or the like with an ideal image (high resolution image) and generating transform information for generating a high resolution image from the low resolution image.
The learned data are generated, and the low resolution image as a new input image is converted into the high resolution image by using the learned data.
In addition, in the related art, the learning type super resolving method is disclosed in, for example, Japanese Patent No. 3321915.
According to the learning type super resolving method, if the learned data are generated, the high resolution image can be obtained as stabilized output results with respect to various input images.
However, with respect to the reconstruction type super resolving method (a), although high performance can generally be expected, there are restrictions such as "a plurality of the low resolution images is necessarily input" and "there is a limitation in the frequency band of the input image, or the like". In the case where an input image (low resolution image) satisfying these restrictive conditions cannot be obtained, there is a problem in that the reconstruction performance may not be sufficiently obtained and a sufficient high resolution image may not be generated.
On the other hand, with respect to the learning type super resolving method (b), although there are few restrictions on the number and properties of the input images and the output is stable, there is a problem in that the peak performance of the finally obtained high resolution image does not reach that of the reconstruction type super resolving method.
It is desirable to provide an image processing apparatus, an image processing method, and a program capable of implementing a super resolving method using advantages of a reconstruction type super resolving method and a learning type super resolving method.
According to an embodiment of the invention, there is provided an image processing apparatus including a super resolving processor including: a high frequency estimator which generates difference image information between a low resolution image input as a processing object image of a super resolving process and a mid-processing image of the super resolving process or a processed image, that is, an initial image; and a calculator which performs a process of updating the processed image through a process of calculation between the difference image information output from the high frequency estimator and the processed image, wherein the high frequency estimator performs a learning type data process using learned data in the difference image information generating process.
In addition, in the image processing apparatus according to the above embodiment of the invention, the high frequency estimator performs the learning type super resolving process in an upsampling process of a downsampling processed image which is converted to have the same resolution as that of the low resolution image through a downsampling process of the processed image constructed with the high resolution images.
In addition, in the image processing apparatus according to the above embodiment of the invention, the high frequency estimator may perform the learning type super resolving process in an upsampling process of the low resolution image input as a processing object image of the super resolving process.
In addition, in the image processing apparatus according to the above embodiment of the invention, the high frequency estimator may perform the upsampling process as a learning type super resolving process using the learned data including data corresponding to feature amount information of a localized image area of the low resolution image and the high resolution image generated based on the low resolution image and image transform information for converting the low resolution image into the high resolution image.
In addition, in the image processing apparatus according to the above embodiment of the invention, the high frequency estimator may perform the learning type super resolving process in an upsampling process on the difference image between a downsampling processed image, which is converted to have the same resolution as that of the low resolution image through a downsampling process of the processed image constructed with the high resolution images, and the low resolution image input as a processing object image of the super resolving process.
In addition, in the image processing apparatus according to the above embodiment of the invention, the high frequency estimator may perform the upsampling process as a learning type super resolving process using the learned data including data corresponding to feature amount information of a localized image area of the difference image between the low resolution image and the high resolution image generated based on the low resolution image and image transform information for converting the difference image into the high resolution difference image.
In addition, in the image processing apparatus according to the above embodiment of the invention, the super resolving processor may have a configuration of performing a resolution converting process by using a reconstruction type super resolving method and performs the learning type super resolving process using the learned data in the upsampling process of the resolution converting process.
In addition, in the image processing apparatus according to the above embodiment of the invention, the super resolving processor may have a configuration of performing the resolution converting process by taking into consideration a blur and a motion of an image and a resolution of an imaging device according to the reconstruction type super resolving method and performs the learning type super resolving process using the learned data in the upsampling process of the resolution converting process.
In addition, in the image processing apparatus according to the above embodiment of the invention, the image processing apparatus may further include a convergence determination portion which performs convergence determination on a calculation result of the calculator, wherein the convergence determination portion performs the convergence determination process according to a predefined convergence determination algorithm and outputs a result corresponding to the convergence determination.
In addition, according to another embodiment of the invention, there is provided an image processing method performed in an image processing apparatus, including the steps of: allowing a high frequency estimator to generate difference image information between a low resolution image input as a processing object image of a super resolving process and a mid-processing image of the super resolving process or a processed image, that is, an initial image; and allowing a calculator to perform a process of updating the processed image through a process of calculation between the difference image information output from the step of allowing the high frequency estimator to generate the difference image information and the processed image, wherein in the step of allowing the high frequency estimator to generate the difference image information, a learning type data process using learned data is performed in the difference image information generating process.
In addition, according to still another embodiment of the invention, there is provided a program allowing an image processing apparatus to perform an image process, including steps of: allowing a high frequency estimator to generate difference image information between a low resolution image input as a processing object image of a super resolving process and a mid-processing image of the super resolving process or a processed image, that is, an initial image; and allowing a calculator to perform a process of updating the processed image through a process of calculation between the difference image information output from the step of allowing the high frequency estimator to generate the difference image information and the processed image, wherein in the step of allowing the high frequency estimator to generate the difference image information, a learning type data process using learned data is performed in the difference image information generating process.
In addition, the program according to the invention is a program which may be provided to, for example, an information processing apparatus or a computer system which can execute various types of program codes by a storage medium or a communication medium which is provided in a computer-readable format. The program is provided in a computer-readable format, so that a process according to the program can be implemented in the information processing apparatus or the computer system.
The other objects, features, and advantages of the invention will be clarified in more detailed description through the later-described embodiments of the invention and the attached drawings. In addition, in the specification, a system denotes a logical set configuration of a plurality of apparatuses, but the apparatus of each configuration is not limited to be in the same casing.
According to a configuration of an embodiment of the invention, there are provided an apparatus and method of generating a high resolution image by performing a process of combination of a reconstruction type super resolving process and a learning type super resolving process. According to an embodiment of the invention, difference image information between a low resolution image which becomes a processing object of the super resolving process and a mid-processing image of the super resolving process or a processed image, that is, an initial image is generated, and a process of updating the processed image through a process of calculation between the difference image information and the processed image is performed to generate a high resolution image. In the high frequency estimator which generates the difference image, a learning type super resolving process using learned data is performed. More specifically, for example, an upsampling process is performed as a learning type super resolving process. According to this configuration, defects of the reconstruction type super resolving process are overcome, so that it is possible to generate a high-quality high resolution image.
Hereinafter, an image processing apparatus, an image processing method, and a program according to the invention will be described in detail with reference to the drawings. In addition, the description is made in the following order.
1. Description of Definition of Terminology Used in Description
2. Overview of Super Resolving Method
(2a) Overview of Reconstruction Type Super Resolving Method
(2b) Overview of Learning Type Super Resolving Method
(2c) Problems of Super Resolving Methods
3. Embodiments of Super Resolving Method According to the Invention
(3a) First Embodiment
(3b) Second Embodiment
(3c) Third Embodiment
4. Example of Hardware Configuration of Image Processing Apparatus
First, definitions of terminology used in the following description are described before the description of the invention.
(Input Image)
An input image is an image actually photographed by an imaging device or the like and an image input to an image processing apparatus performing a super resolving process.
The input image is an image which is likely to have deterioration caused by, for example, the photographing conditions, transmission, recording, or the like. In general, the input image is a low resolution image.
(Output Image)
An output image is an image obtained as a result of performing a super resolving process on the input image in an image processing apparatus. In addition, the output image can be output as a high resolution image obtained by magnifying or reducing the input image with an arbitrary magnification ratio.
(Ideal Image)
An ideal image is an image that would be obtained if the quality deterioration and the restrictions of the photographing did not exist in the aforementioned input image. The ideal image is the target high resolution image to be acquired as a process result of the super resolving process.
(Reconstruction Type Super Resolving Method)
A reconstruction type super resolving method is an example of a method of a super resolving process in the related art. The reconstruction type super resolving method is a method of estimating a high resolution image as an ideal image from photographing conditions such as “blur caused by lens and atmosphere scattering”, “motion of a subject and the entire camera”, and “sampling by the imaging device”.
The reconstruction type super resolving process is configured by the following processes.
(a) An image photographing model is expressed by the equations by taking into consideration the blur, the motion, the sampling, and the like.
(b) A cost equation is obtained from the image photographing model. At this time, in some cases, regularization terms based on prior knowledge may be added by using Bayes' theorem.
(c) An image for minimizing the cost is obtained.
Although the result depends on the input image, the super resolving effect (resolution recovering effect) is high.
(Learning Type Super Resolving Method)
The learning type super resolving method is a method of comparing an assumed input image (low resolution image) generated in a simulation or the like with an ideal image (high resolution image), generating the learned data for generating a high resolution image from a low resolution image, and converting a low resolution image as a new input image into a high resolution image by using the learned data.
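The patch-based flavor of this method can be illustrated with a minimal sketch. The function names, the nearest-neighbor lookup, and the 2x box-average degradation below are illustrative assumptions, not the learned data format of the cited related art:

```python
import numpy as np

def box_down2(img):
    """2x downsampling by box averaging (an assumed degradation model)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def build_learned_data(ideal_patches, downscale):
    """Learned data: pairs of (assumed low-res patch, ideal high-res patch)."""
    return [(downscale(hp), hp) for hp in ideal_patches]

def super_resolve_patch(lr_patch, learned_data):
    """Convert a low-res patch by looking up the best-matching learned pair."""
    best = min(learned_data, key=lambda pair: np.sum((pair[0] - lr_patch) ** 2))
    return best[1]
```

In practice the learned data would also be keyed on localized feature amounts, as described for the embodiments later.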
Next, in the overview of the super resolving method of converting the low resolution image into the high resolution image, the following two methods are sequentially described.
(2a) Overview of Reconstruction Type Super Resolving Method
(2b) Overview of Learning Type Super Resolving Method
(2a) Overview of Reconstruction Type Super Resolving Method
First, the overview of the reconstruction type super resolving method is described.
The reconstruction type super resolving method is a method of generating one high resolution image by using a plurality of low resolution images having, for example, positional differences. An ML (Maximum Likelihood) method or a MAP (Maximum A Posteriori) method is known as a reconstruction type super resolving method.
Hereinafter, the overview of a general MAP method is described.
Herein, the case where n low resolution images are input and a high resolution image is generated is described.
First, a relationship between the low resolution images (gk) obtained in a photographing process of a camera and an ideal image (f) which is an ideal high resolution image is described with reference to
The ideal image (f) 10 may be referred to as an image having a pixel value corresponding to a real environment where a subject is photographed as illustrated in
The images obtained by photographing of the camera are set to the low resolution images (gk) 20 as photographed images. In addition, the low resolution image (gk) 20 becomes the input image with respect to the image processing apparatus performing the super resolving process.
The low resolution image (gk) 20 which is the object of performance of the super resolving process and which is the photographed image may be referred to as an image formed when some portion of image information of the ideal image (f) 10 is lost due to various factors.
As main factors of loss in the image information, there are the following factors illustrated in
Motion (image warping) 11 (=Wk),
Blur 12 (=H),
Camera resolution (Camera Resolution Decimation) 13 (=D),
Noise 14 (=nk)
The motion (Wk) 11 is a motion of the subject or a motion of the camera.
The blur (H) 12 is a blur caused by scattering in the atmosphere, frequency deterioration in an optical system of a camera, or the like.
The camera resolution (D) 13 is a limitation in the sampling data defined by the resolution (the number of pixels) of the imaging device of the camera.
The noise (nk) 14 is other noise, for example, deterioration in image quality occurring in signal processing or the like.
Due to the various factors, the image photographed by the camera becomes a low resolution image (gk) 20.
In addition, k indicates the k-th image among the images continuously photographed by the camera.
The blur (H) 12 and the camera resolution (D) 13 are not the parameters changed according to the photographing timing of the k-th image, but the motion (Wk) 11 and the noise (nk) 14 are the parameters changed according to the photographing timing.
In this manner, the low resolution image (gk) 20 which is the photographed image is image data formed when some portion of the image information of the ideal image (f) 10 is lost due to various factors. The correspondence relationship between the low resolution image (gk) 20 and the ideal image (f) 10 can be expressed by the following equation.
gk=DHWkf+nk (Equation 1)
The above equation expresses that the low resolution image (gk) 20, which is the object of the super resolving process, is generated from the ideal image (f) 10 through the deterioration caused by the motion (Wk), the blur (H), and the sampling at the camera resolution (D), and through the addition of the noise (nk).
In addition, as data representing the input image (gk) and the ideal image (f), data expressing the pixel values constituting each image may be used, and various expressions can be used.
For example, as illustrated in
The input image (gk) is a vertical vector having the number of elements of L.
The ideal image (f) is a vertical vector having the number of elements of J.
The number of elements corresponds to the number of pixels in one vertical column.
Other parameters have the following configurations.
n: the number of images as input images (low resolution images)
f: an ideal image, a vertical vector (the number of elements J)
gk: a k-th low resolution image, a vertical vector (the number of elements L)
nk: noise (the number of elements L) overlapped with the k-th image
Wk: a matrix (J×J) performing a k-th motion (warping)
H: a blur filter matrix (J×J) expressing deterioration of high frequency components caused by a lens, optical scattering, or the like
D: a matrix (L×J) expressing sampling by an imaging device
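Under these definitions, the photographing model of (Equation 1) can be sketched numerically. The warp, blur, and sampling operators below are toy 1D stand-ins (a circular shift, a 3-tap average, and 2x box sampling) chosen only for illustration, not the patent's actual matrices:

```python
import numpy as np

J, L = 8, 4  # number of elements of the ideal image f and of each input image gk

def warp(f, s):   # Wk: toy motion (circular shift by s pixels)
    return np.roll(f, s)

def blur(f):      # H: toy blur (3-tap moving average with circular boundary)
    return (np.roll(f, -1) + f + np.roll(f, 1)) / 3.0

def sample(f):    # D: toy sampling by the imaging device (2x box average, J -> L)
    return f.reshape(L, 2).mean(axis=1)

rng = np.random.default_rng(0)
f = rng.random(J)                       # ideal image f
nk = 0.01 * rng.standard_normal(L)      # noise nk
gk = sample(blur(warp(f, 1))) + nk      # gk = DHWkf + nk  (Equation 1)
```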
In the above equation (Equation 1), the motion (Wk), the blur (H), and the camera resolution (D) are acquirable parameters, that is, known parameters.
In this case, a process of calculating the ideal image (f) which is a high resolution image may be considered to be a process of calculating an image (f) having the highest probability according to the following equation (Equation 2) by using a plurality (n) of the low resolution images (g1) to (gn).
f=argmaxf Pr(f|g1,g2, . . . gn) (Equation 2)
The above equation can be modified by using Bayes' theorem as follows.
Pr(f|g1,g2, . . . gn)=Pr(g1,g2, . . . gn|f)·Pr(f)/Pr(g1,g2, . . . gn) (Equation 3)
Herein, a plurality (n) of the low resolution images (g1) to (gn) are photographed images and known images. Therefore, the denominator Pr(g1, g2, . . . gn) of the above equation (Equation 3) becomes a constant number. Accordingly, the above equation can be expressed by using only the numerator as follows.
Pr(f|g1,g2, . . . gn)=Pr(g1,g2, . . . gn|f)·Pr(f) (Equation 4)
Furthermore, by taking the logarithm of both sides, the above equation (Equation 4) can be modified into the following equation (Equation 5).
log(Pr(f|g1,g2, . . . gn))=log(Pr(g1,g2, . . . gn|f))+log(Pr(f)) (Equation 5)
Through a series of the modifications, the problem in the original equation (Equation 2) can be expressed as follows.
f=argmaxf(log(Pr(g1,g2, . . . gn|f))+log(Pr(f))) (Equation 6)
On the other hand, the noise nk of the k-th photographed image gk can be expressed according to the aforementioned equation (Equation 1) as follows.
nk=gk−DHWkf (Equation 7)
Herein, if it is assumed that the noise has a Gauss distribution with a variance σ2, Pr(g1, g2, . . . gn|f) in the aforementioned equation (Equation 6) can be expressed by the following equation (Equation 8).
Pr(g1,g2, . . . gn|f)=exp(−(1/(2σ2))·Σk∥gk−DHWkf∥2) (Equation 8)
In addition, under the assumption that the low resolution images (g1) to (gn) input as photographed images are flat images, a prior probability of the image is defined as the following equation (Equation 9). Herein, L is the Laplacian operator.
Pr(f)=exp(−α·∥Lf∥2) (Equation 9)
By inserting these, the original problem, that is, the problem of calculation of the ideal image (f) can be defined as a process of obtaining (f) so that the cost, that is, E(f) is minimized in the following equation (Equation 10).
E(f)=Σk∥gk−DHWkf∥2+α·∥Lf∥2 (Equation 10)
In the cost calculation equation (Equation 10), the (f) for minimizing the cost E(f) can be obtained by using a gradient method.
If f0 is an arbitrary initial value and fm is an image after iteration of m image processes (super resolving processes), the following super resolution convergence equation (Equation 11) can be defined.
fm+1=fm−β(Σk(WkTHTDT(DHWkfm−gk))+α·LTLfm) (Equation 11)
In the super resolution convergence equation (Equation 11), β is a scale coefficient, and α is an arbitrary user setting parameter of the image process (super resolving process). T represents a transpose matrix.
According to the above relational equation (Equation 11), the ideal image (f) for minimizing a cost E(f), that is, the high resolution image can be obtained by a gradient method.
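The gradient method described above can be sketched end to end with toy 1D stand-ins for Wk, H, and D (circular shift, symmetric 3-tap average, 2x box sampling); the operators and the parameter values are illustrative assumptions, not the patent's:

```python
import numpy as np

J, L = 8, 4

def warp(f, s):  return np.roll(f, s)                               # Wk (toy)
def blur(f):     return (np.roll(f, -1) + f + np.roll(f, 1)) / 3.0  # H (toy, symmetric)
def sample(f):   return f.reshape(L, 2).mean(axis=1)                # D (toy)
def up(g):       return np.repeat(g, 2) / 2.0                       # DT (toy transpose)
def lap(f):      return np.roll(f, -1) - 2.0 * f + np.roll(f, 1)    # Laplacian operator

def reconstruct(gs, shifts, alpha=0.01, beta=0.5, iters=100):
    """Gradient descent minimizing the cost of (Equation 10) via the
    update of (Equation 11).  The blur here is symmetric, so it serves
    as its own transpose HT; warp(., -s) plays the role of WkT."""
    f = up(gs[0]) * 2.0                          # arbitrary initial value f0
    for _ in range(iters):
        grad = alpha * lap(lap(f))               # LTLfm term
        for g, s in zip(gs, shifts):
            r = sample(blur(warp(f, s))) - g     # DHWkfm - gk
            grad += warp(blur(up(r)), -s)        # WkT HT DT (DHWkfm - gk)
        f = f - beta * grad                      # fm+1 = fm - beta * (gradient)
    return f
```

Each iteration of the loop corresponds to one pass of the super resolving processor described below.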
The image processing apparatus 110 illustrated in
The image processing apparatus 110 is input with a plurality (n) of the low resolution images g1 to gn and outputs one high resolution image fm.
The g1, g2, . . . gn illustrated in
The initial image generation unit 111 sets an initial value of the super resolving process result. The initial value may be an arbitrary value. In the embodiment, as an example, g1 is input, and an image obtained by upsampling g1 is output.
The switch 112 turns to the output side of the initial image generation unit 111 only at the time of the first performance; in other cases, the switch 112 is operated so that the previous-time output of the convergence determination portion 114 is input to the super resolving processor 113.
The super resolving processor 113 is input with the n low resolution images g1, g2, g3 . . . gn and the image from the switch 112 and outputs the result to the convergence determination portion 114. Details of the super resolving processor 113 are described later.
The convergence determination portion 114 is input with the output of the super resolving processor 113 and determines whether or not sufficient convergence is performed. In the case where sufficient convergence is determined to be performed from the result of the super resolving processor 113, the convergence determination portion 114 outputs the result of the process to an external portion and stops the process. In the case where the process is determined to be insufficient, the above data are input through the switch 112 to the super resolving processor 113, and the calculation is performed again. For example, the convergence determination portion 114 extracts a difference between the newest process result and the previous-time process result, and in the case where the difference is equal to or smaller than a predetermined value, it is determined that the convergence is performed. Alternatively, in the case where the number of processes reaches a predetermined number of processes, it is determined that the convergence is performed, and the process result is output.
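The determination rule described above can be sketched as follows; the threshold and the iteration limit are illustrative values, not specified by the patent:

```python
import numpy as np

def is_converged(f_new, f_prev, iteration, eps=1e-4, max_iter=50):
    """Convergence determination: converged when the difference between the
    newest result and the previous-time result is at most eps, or when the
    number of processes reaches max_iter (both thresholds illustrative)."""
    diff = np.mean(np.abs(f_new - f_prev))
    return bool(diff <= eps or iteration >= max_iter)
```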
Details of a configuration and process of the super resolving processor 113 are described with reference to
The super resolving processor 113 is input with the input from the switch 112 illustrated in
Each of the high frequency estimators 121 is input with an image as the mid-reconstruction result which is the input from the switch 112 and one of the low resolution images g1, g2, . . . gn and outputs the process result to the adder 122. Each of the high frequency estimators 121 calculates a correction value for recovering the high frequency of the image. Details of the process of the high frequency estimator 121 are described later.
The adder 122 adds the results of the high frequency estimators 121 and outputs the process result to the adder 125.
The image quality controller 123 calculates a control value of the pixel values to be used for an ideal image based on a prior model of the image. The output of the image quality controller 123 is input to the multiplier 124.
The multiplier 124 multiplies the output of the image quality controller 123 by the user setting value α. The image quality of the final image is controlled according to the user setting value α. In addition, in the configuration illustrated in the figure, the user setting value is taken so as to perform the control of the image quality. However, a fixed value may be used without any problem.
The adder 125 adds the output of the adder 122 and the output of the multiplier 124 and outputs the calculation result to the scale calculator 126 and the multiplier 127. The scale calculator 126 is input with the mid-calculation result from the switch 112 and the pixel value control signal from the adder 125 to determine the scale value for the final control value. The result of the scale calculator 126 is output to the multiplier 127. The multiplier 127 multiplies the control value of the adder 125 with the output value of the scale calculator 126 and outputs the calculation result to the adder 128. The adder 128 subtracts the result of the multiplier 127 from the mid-processing result from the switch 112 and outputs the result to the convergence determination portion 114.
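The dataflow of the super resolving processor 113 reduces to a single update step, which may be sketched as below; the inputs stand for the outputs of the numbered blocks and are not a full implementation:

```python
import numpy as np

def processor_update(fm, estimator_outputs, quality_control, alpha, beta):
    """One pass of the super resolving processor:
    - adder 122 sums the outputs of the high frequency estimators,
    - multiplier 124 weights the image quality control value by alpha,
    - adder 125 forms the gradient vector (Va),
    - multiplier 127 scales it by beta, and adder 128 subtracts it:
      fm+1 = fm - beta * (Va)."""
    va = np.sum(estimator_outputs, axis=0) + alpha * quality_control
    return fm - beta * va
```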
A detailed configuration and process of each of a plurality of the high frequency estimators 121 set in the super resolving processor 113 illustrated in
The high frequency estimator 121 performs a process corresponding to the lower line portion illustrated in (1) of
The motion detector 130 is input with the high resolution image from the switch 112 and the low resolution image gk to detect a size of the motion between the two images. More specifically, the motion detector 130 calculates the motion vector.
In addition, as a preparation, since the resolution is different between the two images, the resolution converter 138, constructed with, for example, an upsampling filter, performs an upsampling process on the low resolution image gk to match its resolution to that of the to-be-generated high resolution image.
The motion corrector (MC) 131 is input with the high resolution image from the switch 112 and the motion vector from the motion detector 130 and performs motion compensation on the input high resolution image. The process corresponds to the process of calculation of the motion (Wk) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The spatial filter 132 performs a process of simulating the deterioration in the spatial resolution. Herein, convolution is performed on the image by using a pre-measured point spread function as a filter. The process corresponds to the process of calculation of the blur (H) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1)
The downsampling processor 133 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process of calculation of the camera resolution (D) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
After that, the subtractor 134 calculates the difference value for each pixel between the output of the downsampling processor 133 and the low resolution image gk.
The upsampling processor 135 performs an upsampling process of the difference value. The process corresponds to the process of calculation of the transpose matrix (DT) of the camera resolution (D) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The reverse spatial filter 136 performs a process corresponding to the process of calculation of the transpose matrix (HT) of the blur (H) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The reverse motion corrector 137 performs reverse correction of the motion. The motion compensated for by the motion corrector 131 is reversely applied to the difference value.
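The pipeline of the high frequency estimator 121 can be sketched end to end with toy 1D operators (circular-shift motion, a symmetric 3-tap blur, 2x box sampling); these stand-ins are assumptions for illustration only:

```python
import numpy as np

J, L = 8, 4

def warp(f, s):   return np.roll(f, s)                               # motion corrector 131 (Wk)
def blur(f):      return (np.roll(f, -1) + f + np.roll(f, 1)) / 3.0  # spatial filter 132 (H)
def sample(f):    return f.reshape(L, 2).mean(axis=1)                # downsampling processor 133 (D)
def upsample(g):  return np.repeat(g, 2) / 2.0                       # upsampling processor 135 (DT)

def high_frequency_estimate(fm, gk, shift):
    """Toy end-to-end pass of the high frequency estimator: simulate the
    photographing of the current estimate, subtract gk, and project the
    difference back.  The symmetric blur serves as its own transpose HT,
    and warp(., -shift) plays the reverse motion corrector 137."""
    simulated = sample(blur(warp(fm, shift)))    # DHWkfm
    diff = simulated - gk                        # subtractor 134
    return warp(blur(upsample(diff)), -shift)    # WkT HT DT (DHWkfm - gk)
```

When the current estimate already explains gk, the returned correction is zero.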
Next, a detailed configuration and process of the image quality controller 123 set in the super resolving processor 113 illustrated in
As illustrated in
The image quality controller 123 performs a process corresponding to the calculation process LTLfm, that is, the calculation in the lower line portion of the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The Laplacian transformation portion 141 applies the Laplacian operator (L) two times on the high resolution image (fm) input from the switch 112 illustrated in
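A minimal sketch of this double application of the Laplacian, assuming a 1D operator with circular boundaries:

```python
import numpy as np

def laplacian(f):
    """1D Laplacian operator L with circular boundary (illustrative)."""
    return np.roll(f, -1) - 2.0 * f + np.roll(f, 1)

def image_quality_control(fm):
    """Apply the Laplacian operator two times, i.e. LTLfm
    (this L is symmetric, so applying it twice equals LTL)."""
    return laplacian(laplacian(fm))
```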
Next, a process of the scale calculator 126 set in the super resolving processor 113 illustrated in
The scale calculator 126 determines the scale (coefficient β) for the gradient vector in the image convergence calculation using a steepest descent method. In other words, the scale calculator 126 determines the coefficient β in the aforementioned supper resolution convergence equation (Equation 11) illustrated in (1) of
The scale calculator 126 is input with the gradient vector ((a) gradient vector (Va) in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The scale calculator 126 obtains the coefficient β based on these inputs so that the cost E(fm+1) expressed in the cost calculation equation described as the aforementioned equation (Equation 10) illustrated in (2) of
As a process of calculation of the coefficient β, the coefficient β for the minimization is generally calculated by using a method such as binary search. In addition, in the case where a reduction in the computational cost is desired, a configuration where a constant is output regardless of the input may be used.
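The step size selection performed by the scale calculator can be sketched as a one-dimensional search over β. The sketch below uses a ternary search in place of the binary search mentioned above; both assume the cost E(fm − βVa) is unimodal in β, and the function name and interval bounds are illustrative assumptions.

```python
def select_beta(cost, lo=0.0, hi=1.0, iters=40):
    """Ternary search for the step size beta minimizing cost(beta).

    Sketch: assumes cost is unimodal on [lo, hi], as holds for the
    quadratic cost E(f_m - beta * Va) of the steepest descent step.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2          # minimum lies to the left of m2
        else:
            lo = m1          # minimum lies to the right of m1
    return (lo + hi) / 2.0
```

The configuration that outputs a constant regardless of the input corresponds to replacing this search with a fixed return value.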
The result (β) of the scale calculator 126 is output to the multiplier 127. The multiplier 127 multiplies the gradient vector (Va) obtained as an output of the adder 125 with the output value (β) of the scale calculator 126 and outputs β(Va) to the adder 128. The adder 128 subtracts β(Va), which is the input from the multiplier 127, from the super resolving processed image fm, that is, the m-th super resolving process result input as a mid-processing result of the super resolving process from the switch 112, and calculates the (m+1)-th super resolving process result fm+1.
In other words, fm+1=fm−β(Va).
The (m+1)-th super resolving process result fm+1 is calculated based on the above equation. This equation corresponds to the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The super resolving processor 113 outputs the (m+1)-th super resolving process result, that is, fm+1=fm−β(Va) to the convergence determination portion 114.
The convergence determination portion 114 is input with the (m+1)-th super resolving process result, that is, fm+1=fm−β(Va), from the super resolving processor 113 and determines based on the input whether or not sufficient convergence has been achieved. In the case where the result of the super resolving processor 113 is determined to have sufficiently converged, the convergence determination portion 114 outputs the result of the process to an external portion and stops the process. In the case where the convergence is determined to be insufficient, the data are input through the switch 112 to the super resolving processor 113, and the calculation is performed again. For example, the convergence determination portion 114 extracts a difference between the newest process result and the previous-time process result, and in the case where the difference is equal to or smaller than a predetermined value, it is determined that convergence has been achieved. Alternatively, in the case where the number of processes reaches a predetermined number, it is determined that convergence has been achieved, and the process result is output.
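Taken together, the update fm+1 = fm − β(Va) and the convergence determination can be sketched as a steepest descent loop. The gradient callable, the fixed step size, and the thresholds below are illustrative assumptions, not values from the specification.

```python
import numpy as np

def super_resolve(f0, gradient, beta=0.1, eps=1e-6, max_iters=100):
    """Iterate f_{m+1} = f_m - beta * Va until convergence.

    Sketch: `gradient` returns the vector Va for the current estimate.
    The loop stops when the per-pixel update difference falls below
    `eps`, or after `max_iters` passes (the iteration-count cutoff).
    """
    f = np.asarray(f0, dtype=float)
    for _ in range(max_iters):
        f_next = f - beta * gradient(f)
        if np.max(np.abs(f_next - f)) <= eps:  # difference-based convergence test
            return f_next
        f = f_next
    return f
```

With a quadratic cost the iteration converges geometrically, so the difference-based test and the count cutoff give comparable results.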
Next, a configuration and process of the image processing apparatus performing the reconstruction type super resolving process in the case where the processing object is specified to a moving picture are described with reference to
As illustrated in
In the process for a moving picture, the following definitions are used.
gt: one frame of a low resolution moving picture at a time point t
ft: one frame of a high resolution moving picture at a time point t
In this manner, a low resolution image gt is set to one frame of the low resolution moving picture at a time point t, and a high resolution image ft is the high resolution image obtained as a result of the super resolving process applied on the low resolution image gt.
In the image processing apparatus 200 performing the reconstruction type super resolving process illustrated in
The moving picture initial image generation unit 201 is input with the previous-frame moving picture super resolving process results (ft−1) and (gt) and outputs the generated initial image to the moving picture super resolving processor 202. Details of the moving picture initial image generation unit 201 are described later.
The moving picture super resolving processor 202 generates the high resolution image (ft) by applying the low resolution image (gt) to the input initial image and outputs the high resolution image (ft). Details of the moving picture super resolving processor 202 are described later.
The high resolution image output from the moving picture super resolving processor 202 is output to the image buffer 203 at the same time of being output to an external portion, so that the high resolution image is used for the super resolving process for the next frame.
Next, a detailed configuration and process of the moving picture initial image generation unit 201 are described with reference to
First, a process of matching the resolution of the low resolution image gt with the resolution of the to-be-generated high resolution image is performed through the upsampling process by the resolution converter 206 constructed with, for example, an upsampling filter.
The motion detector 205 detects the magnitude of the motion between the previous-frame high resolution image ft−1 and the upsampled low resolution image gt. More specifically, the motion detector 205 calculates the motion vector.
The motion corrector (MC) 207 performs a motion correction process on the high resolution image ft−1 by using the motion vector detected by the motion detector 205. Since the motion correction is performed on the high resolution image ft−1, a motion correction image where the position of the subject is the same as that in the upsampled low resolution image gt is generated.
The MC non-applied area detector 208 detects an area where the motion correction (MC) is not well applied by comparing the high resolution image generated by the motion correction (MC) process with the upsampled low resolution image. The MC non-applied area detector 208 sets appropriateness information α [0:1] of MC application in units of a pixel and outputs the appropriateness information.
The blend processor 209 is input with the motion correction resulting image for the high resolution image ft−1, which is generated by the motion corrector (MC) 207, the upsampled image which is obtained by upsampling the low resolution image gt in the resolution converter 206, and the MC non-applied area detection information which is detected by the MC non-applied area detector 208.
The blend processor 209 outputs the moving picture super resolution initial image as a blend result based on the following equation by using the above input information.
moving picture super resolution initial image (blend result) = (1 − α)(upsampled image) + α(motion correction resulting image)
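The blend equation above translates directly into code. In the sketch below, α is the per-pixel appropriateness information of MC application output by the MC non-applied area detector 208; the function name is an illustrative assumption.

```python
import numpy as np

def blend_initial_image(upsampled, mc_image, alpha):
    """Moving picture super resolution initial image per the blend equation.

    alpha is the per-pixel MC appropriateness in [0, 1]: 1 where motion
    correction succeeded, 0 where the upsampled image should be trusted.
    """
    alpha = np.asarray(alpha, dtype=float)
    return (1.0 - alpha) * upsampled + alpha * mc_image
```

With α = 0 the output is the upsampled image; with α = 1 it is the motion correction resulting image.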
Next, a configuration and process of the moving picture super resolving processor 202 in the image processing apparatus 200 performing the reconstruction type super resolving process illustrated in
As illustrated in
The moving picture super resolving processor 202 is input with the moving picture super resolution initial image as the aforementioned blend result from the moving picture initial image generation unit 201 illustrated in
Furthermore, the moving picture super resolving processor 202 is input with the low resolution image gt and the user setting value α as an image adjustment parameter to generate the high resolution image (ft) as a process result and outputs the high resolution image (ft).
A detailed configuration and process of the moving picture high frequency estimator 211 in the moving picture super resolving processor 202 are described with reference to
Unlike the high frequency estimator 121 corresponding to the still image described above with reference to
The moving picture high frequency estimator 211 is input with the moving picture super resolution initial image as the aforementioned blend result generated by the moving picture initial image generation unit 201 and the low resolution image (gt) and outputs the process result to the adder 214.
The spatial filter 221 illustrated in
The downsampling processor 222 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process (refer to (1) of
After that, the subtractor 223 calculates the difference value for each pixel between the output of the downsampling processor 222 and the low resolution image gt.
The upsampling processor 224 performs an upsampling process on the difference value. The process corresponds to the process (refer to (1) of
The reverse spatial filter 225 performs a process corresponding to the process ((1) of
In the moving picture super resolving processor 202 illustrated in
In other words, as illustrated in
The Laplacian transformation portion is input with the moving picture super resolution initial image as the aforementioned blend result generated by the moving picture initial image generation unit 201 and applies the Laplacian operator (L) two times on the moving picture super resolution initial image to output the process result to the multiplier 213 illustrated in
The scale calculator 215 determines the scale for the gradient vector in the image convergence calculation using a steepest descent method. In other words, the scale calculator 215 determines the coefficient β in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The scale calculator 215 is input with the gradient vector ((a) gradient vector in the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The scale calculator 215 obtains the coefficient β based on these inputs so that the cost E(fm+1) expressed in the cost calculation equation described as the aforementioned equation (Equation 10) illustrated in (2) of
As a process of calculation of the coefficient β, the coefficient β for the minimization is generally calculated by using a method such as binary search. In addition, in the case where a reduction in the computational cost is desired, a configuration where a constant is output regardless of the input may be used.
As a result, the coefficient β by which the minimum cost can be set is determined. The subsequent processes are the same as the processes described above with reference to
In other words, the result (β) of the scale calculator 215 illustrated in
In other words, ft=f0−β(Va)
The super resolving process result ft is calculated based on the above equation. This equation corresponds to the aforementioned super resolution convergence equation (Equation 11) illustrated in (1) of
The moving picture super resolving processor 202 outputs the super resolving process result and stores the super resolving process result in the image buffer 203.
(2b) Overview of Learning Type Super Resolving Method
Next, the overview of the learning type super resolving method is described.
The learning type super resolving method is a method of comparing an assumed input image (low resolution image) generated through simulation or the like with an ideal image (high resolution image), generating learned data for generating the high resolution image from the low resolution image, and converting a low resolution image as a new input image into a high resolution image by using the learned data.
The overview of a configuration and process of an image processing apparatus performing the learning type super resolving method is described with reference to
In the case where the learning type super resolving process is performed, as a preparation, the learned data are necessarily generated. First, the learning data generating process is described with reference to
The learning data generating unit 300 is input with the ideal image 351 as a high resolution image and generates the low resolution image 352 as a virtual deteriorated image. The ideal image 351 and the low resolution image 352 are treated as data for the learning. For example, the learning process performing unit 320 illustrated in
As illustrated in
Many combinations of the ideal image 351 as a high resolution image and the low resolution image 352 as a virtual deteriorated image are generated, and the learning process is performed by using the combinations in the learning process performing unit 320 illustrated in
The learning process of the learning process performing unit 320 is described with reference to
The learning process performing unit 320 is sequentially input with the image pairs of the ideal image 351 and the low resolution image 352 generated by the learning data generating unit 300, generates the learned data, and stores the learned data in database (DB) 325.
The block dividers 321 and 322 divide the ideal image 351 and the low resolution image 352 into corresponding blocks (localized areas).
The image feature amount extractor 323 extracts the image feature of the block (localized area) selected from the low resolution image 352. Details of the extracting process are described later.
The transform filter coefficient derivation portion 324 is input with the corresponding blocks extracted from the ideal image 351 and the low resolution image 352 and calculates an optimal transform filter coefficient (filter tap or the like) for performing a spreading process for generating the ideal image 351 from the low resolution image 352.
The database (DB) 325 stores the image feature amount in units of a block generated by the image feature amount extractor 323 and the transform filter coefficient generated by the transform filter coefficient derivation portion 324.
Details of the image feature amount extracting process performed by the image feature amount extractor 323 are described with reference to
The vector transformation portion 331 converts the block image 337 which is the localized area image data of the low resolution image 352 selected by the block divider 321 into a one-dimensional vector 338.
Furthermore, the quantization processor 332 performs conversion such as quantization on each vector element of the one-dimensional vector 338 to generate a quantized vector 339. The value obtained by the calculation is set to the feature amount of the localized image (block). The feature amount data are stored as learned data in the database 325.
The quantized vectors which are the feature amount data in units of a block and the data corresponding to the transform filter coefficient corresponding to the block are stored in the database (DB) 325.
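The feature amount extraction by the vector transformation portion 331 and the quantization processor 332 can be sketched as follows. The uniform quantization of 8-bit pixel values into a fixed number of levels is an illustrative assumption; the specification does not fix a particular quantization scheme.

```python
import numpy as np

def block_feature(block, levels=8):
    """Feature amount of a localized block.

    Sketch: flatten the block image into a one-dimensional vector, then
    quantize each element uniformly into `levels` bins, yielding the
    quantized vector stored as learned data in the database.
    """
    vec = np.asarray(block).reshape(-1)               # one-dimensional vector
    return tuple(int(v) for v in vec * levels // 256)  # quantized vector (hashable key)
```

Returning the quantized vector as a tuple lets it serve directly as a database key when the learned data are stored.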
Next, a configuration and process of the learning type super resolving process performing unit performing the learning type super resolving process using the learned data are described with reference to
The learning type super resolving process performing unit 340 illustrated in
First, the block divider 341 is input with the low resolution image 371 which is the object of the performance of the super resolving process and divides the blocks (small areas).
The image feature amount extractor 342 extracts the image feature amount in units of a block. The feature amount is the same quantized vector data as those described with reference to
The transform filter coefficient selector 344 searches for the data that are most similar to the feature amount (quantized vector data) corresponding to the block extracted by the image feature amount extractor 342 from the input data of the database (DB) 343.
The database (DB) 343 corresponds to the database 325 described with reference to
The transform filter coefficient selector 344 selectively extracts the transform filter coefficient, which is in correspondence with the data having the maximum likelihood with respect to the feature amount (quantized vector data) corresponding to the block extracted by the image feature amount extractor 342, from the database 343 and outputs the transform filter coefficient to the filter applying portion 345.
The filter applying portion 345 performs a data transform process by using a filter process set with the transform filter coefficient supplied from the transform filter coefficient selector 344 and generates a localized image which becomes a constituting block of the high resolution image 372.
The block combiner 346 combines the blocks sequentially output from the filter applying portion 345 to generate the high resolution image 372.
In this manner, in the high resolution image generating process using the learning type super resolving process, the assumed input image (low resolution image) generated in a simulation or the like is compared with the ideal image (high resolution image), the learned data for generating the high resolution image from the low resolution image are generated, and the low resolution image as a new input image is converted into the high resolution image by using the learned data.
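The per-block inference path of the learning type super resolving process performing unit 340 (feature extraction, maximum likelihood selection from the database, and filter application) can be sketched as follows. The database layout, the nearest-feature selection used to approximate the maximum likelihood search, and the linear filter application are illustrative assumptions.

```python
import numpy as np

def learning_sr_block(lr_block, database, levels=8):
    """Convert one low resolution block using learned data.

    Sketch: `database` maps quantized feature vectors to transform filter
    coefficient matrices. The stored feature most similar to the block's
    feature (nearest in Euclidean distance here) selects the filter,
    which is applied as a linear transform producing the high resolution
    block pixels (the spreading process).
    """
    vec = np.asarray(lr_block, dtype=float).reshape(-1)
    feat = (vec * levels // 256).astype(int)          # quantized feature vector
    # maximum likelihood selection approximated by nearest stored feature
    key = min(database, key=lambda k: np.sum((np.asarray(k) - feat) ** 2))
    coeffs = database[key]                            # learned transform filter (filter tap)
    return coeffs @ vec                               # low-res block -> high-res pixels
```

The block combiner then tiles the converted blocks back into the full high resolution image.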
(2c) Problems of Super Resolving Methods
As described above, as a method of generating a high resolution image from a low resolution image, there are the following methods.
(a) Reconstruction Type Super Resolving Method
(b) Learning Type Super Resolving Method
However, with respect to the reconstruction type super resolving method (a), although high performance can generally be expected, there are restrictions as follows:
“A plurality of the low resolution images is necessarily input.”
“There is a limitation in the frequency band of the input image, or the like.”
In the case where an input image (low resolution image) satisfying these restrictive conditions may not be obtained, there is a problem in that the reconstruction performance may not be sufficiently obtained and a sufficient high resolution image may not be generated.
In this manner, since the reconstruction type super resolving method is based on the use of a plurality of images, the effect thereof is limited in the case where there is a single input image or a small number of input images.
In addition, in the reconstruction type super resolving method, the following processes are performed in terms of a practical effect.
(a) Component estimation for high frequency component (equal to or higher than the Nyquist frequency) based on aliasing component (aliasing) in the input screen
(b) Elimination of aliasing component (aliasing) in low frequency component (equal to or lower than the Nyquist frequency) and recovery of high frequency component (equal to or higher than the Nyquist frequency)
However, in the case where there are a small number of the input images, there is a problem in that the estimation of the aliasing component (aliasing) is not appropriately performed. In addition, even in the case where the aliasing component may not be detected in the input image due to an extreme deterioration of the input image, similarly, there is a case where the high frequency performance is insufficient.
Therefore, in the reconstruction type super resolving method, in the case where a plurality of the input images is used and aliasing distortion occurs caused by the sampling, a great effect can be expected. However, in the case where there are a small number of the input images or in the case where there is no aliasing distortion in the input image, there is a disadvantage in that the resolution improvement effect is low.
On the other hand, with respect to the learning type super resolving method (b), although the restrictions caused by the number of the input images and the properties of the input image are low and the method is stable, there is a problem in that the peak performance of the finally-obtained high resolution image does not reach that of the reconstruction type super resolving method.
In the learning type super resolving method (b), in the case where the learned data are sufficient and reference information at the time of selecting the learned data is sufficient, a great effect can be obtained.
However, practically, there are restrictions as follows.
Upper limit of the data amount of the learned data
Limitation in the reference information at the time of selecting the learned data
Due to these restrictions, in the learning type super resolving method, a final high resolution image is generated as a result of combining the processes in units of a block, so that the whole balance may deteriorate. Therefore, there is a case where a sufficient resolution improvement effect may not be obtained.
Hereinafter, embodiments of the super resolving method according to the invention are described. The image processing apparatus according to the invention implements the super resolving method using advantages of the reconstruction type super resolving method and the learning type super resolving method. First, the overview of the super resolving process according to the invention is described.
The image processing apparatus according to an embodiment of the invention performs a process based on the following super resolution convergence equation (Equation 12), which is obtained by modifying the super resolution convergence equation described as the aforementioned equation (Equation 11).
In the super resolution convergence equation (Equation 12), α is an arbitrary user setting parameter in an image process (super resolving process). T represents a transpose matrix.
According to the above relational equation (Equation 12), the ideal image (f) for minimizing a cost E(f) can be obtained by a gradient method.
In the super resolution convergence equation (Equation 12), (HTDT) and (DH) have the meaning corresponding to the execution of the following processes as processes of the image processing apparatus.
DH: Process of applying downsampling filter
HTDT: Process of applying upsampling filter
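The correspondence between (DH) and (HTDT) can be illustrated with explicit matrices for a short 1-D signal: once the downsampling filter DH is written as a matrix, the upsampling filter is simply its transpose. The 3-tap blur, circular boundary handling, and signal length below are illustrative assumptions.

```python
import numpy as np

def downsample_matrix(n, scale=2):
    """Matrix form of the downsampling filter DH for a 1-D signal of length n.

    Sketch: H is a 3-tap blur with circular boundaries; D keeps every
    `scale`-th sample. The upsampling filter is then (DH)^T = HT DT.
    """
    h = np.array([0.25, 0.5, 0.25])
    H = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(h):
            H[i, (i + j - 1) % n] += w   # circular convolution rows
    D = np.eye(n)[::scale]               # decimation
    return D @ H

DH = downsample_matrix(8)                # process of applying downsampling filter
HTDT = DH.T                              # process of applying upsampling filter
```

Each row of DH sums to one, so the downsampling filter preserves the mean brightness of a flat image.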
Simple downsampling and upsampling processes which are calculated from the model equation expressed in the aforementioned super resolution convergence equation (Equation 12) derive mathematically correct results. However, there is a case where these results are not necessarily coincident with subjective estimation.
In the invention, the simple downsampling process calculated from the model equation is replaced with a reducing process using the learned data, and the upsampling process is replaced with a spreading process using the same learned data or a learning type super resolving process, so that the subjective quality of the super resolving result is improved. By using this method, in the case where the number of input images is small or even in the case where the input image is extremely deteriorated, the effect of image quality improvement can be expected.
Hereinafter, the super resolving processes according to a plurality of embodiments (first to third embodiments) of the invention are sequentially described.
First, an image processing apparatus according to a first embodiment of the invention is described with reference to
The image processing apparatus 500 illustrated in
In the image processing apparatus 500 according to the invention, the configuration of the high frequency estimator 521 constructed in the super resolving processor 503 is different from the aforementioned configuration in the related art, so that a different process is performed.
The image processing apparatus 500 illustrated in FIG. 16 is input with a plurality (n) of the low resolution images g1 to gn and outputs one high resolution image fm. The g1, g2, . . . gn illustrated in
The initial image generation unit 501 sets an initial value of the super resolving process result. The initial value may be an arbitrary value. For example, the low resolution image g1 is input, and an image obtained by spreading g1 is output.
The switch 502 turns to the output side of the initial image generation unit 501 only at the time of first performance, and in other cases, the switch 502 is operated so that the previous-time output of the convergence determination portion 504 is input to the super resolving processor 503.
The super resolving processor 503 is input with the n low resolution images g1, g2, g3 . . . gn and the image from the switch 502 and outputs the result to the convergence determination portion 504. Details of the super resolving processor 503 are described later.
The convergence determination portion 504 is input with the output of the super resolving processor 503 and determines whether or not sufficient convergence has been achieved. In the case where the result of the super resolving processor 503 is determined to have sufficiently converged, the convergence determination portion 504 outputs the result of the process to an external portion and stops the process. In the case where the convergence is determined to be insufficient, the data are input through the switch 502 to the super resolving processor 503, and the calculation is performed again. For example, the convergence determination portion 504 extracts a difference between the newest process result and the previous-time process result, and in the case where the difference is equal to or smaller than a predetermined value, it is determined that convergence has been achieved. Alternatively, in the case where the number of processes reaches a predetermined number, it is determined that convergence has been achieved, and the process result is output.
Details of a configuration and process of the super resolving processor 503 are described with reference to
As illustrated in
The super resolving processor 503 is input with the input from the switch 502 illustrated in
Each of the high frequency estimators 521 is input with an image as a mid-reconstruction result image which is the input from the switch 502 and one of the low resolution images g1, g2, . . . gn and outputs the process result to the adder 522. Each of the high frequency estimators 521 calculates a correction value for recovering the high frequency of the image. Details of the process of the high frequency estimator 521 are described later.
The adder 522 adds the results of the high frequency estimators 521 and outputs the process result to the adder 525.
The image quality controller 523 calculates a control value of the pixel value to be used for an ideal image based on a pre-establishment model of the image. The output of the image quality controller 523 is input to the multiplier 524.
The multiplier 524 multiplies the output of the image quality controller 523 with the user setting value α. The image quality of the final image is controlled according to the value of the user setting value α. In addition, in the configuration illustrated in the figure, the user setting value is taken so as to perform the control of the image quality. However, a fixed value may be used without occurrence of a problem.
The adder 525 adds the output of the adder 522 and the output of the multiplier 524 and outputs the calculation result to the scale calculator 526 and the multiplier 527. The scale calculator 526 is input with the mid-calculation result from the switch 502 and the pixel value control signal from the adder 525 to determine the scale value for the final control value. The result of the scale calculator 526 is output to the multiplier 527. The multiplier 527 multiplies the control value of the adder 525 with the output value of the scale calculator 526 and outputs the calculation result to the adder 528. The adder 528 subtracts the result of the multiplier 527 from the mid-processing result from the switch 502 and outputs the result to the convergence determination portion 504.
A detailed configuration and process of each of a plurality of the high frequency estimators 521 set in the super resolving processor 503 illustrated in
The high frequency estimator 521 performs a process corresponding to the process of calculation in the lower line portion illustrated in
The motion detector 601 is input with the high resolution image from the switch 502 and the low resolution image gk to detect the magnitude of the motion between the two images. More specifically, the motion detector 601 calculates the motion vector.
In addition, as a preparation, since the resolution is different between the two images, the resolution converter 602 constructed with, for example, an upsampling filter performs an upsampling process on the low resolution image gk to match its resolution with that of the to-be-generated high resolution image.
The motion corrector (MC) 603 is input with the high resolution image from the switch 502 and the motion vector from the motion detector 601 and performs a transform on the input high resolution image. The process corresponds to the process of calculation of the motion (Wk) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The spatial filter 604 performs a process of simulation of the deterioration in the spatial resolution. Herein, convolution is performed on the image by using a pre-measured point spread function as a filter. The process corresponds to the process of calculation of the blur (H) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The downsampling processor 605 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process of calculation of the camera resolution (D) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The low resolution image generated through the downsampling of the high resolution image in the downsampling processor 605 is input to the learning type super resolving processor 606.
The learning type super resolving processor 606 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
The low resolution image generated through the downsampling provided from the downsampling processor 605 corresponds to the low resolution image 371 illustrated in
At the same time, the learning type super resolving processor 608 is input with the low resolution image (gk) which is input to the high frequency estimator 521. The learning type super resolving processor 608 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
In addition, the learning type super resolving processor 606 and the learning type super resolving processor 608 perform an upsampling process as a learning type super resolving process using the learned data including data corresponding to feature amount information of a localized image area of the low resolution image and the high resolution image generated based on the low resolution image and image transform information for converting the low resolution image into the high resolution image.
In addition, the learning type super resolving processor 606 and the learning type super resolving processor 608 may have a configuration where only the input data are different and the same processes are simultaneously performed or a configuration where individual processes using learned data or algorithm optimized to each process are performed.
Through these processes, the learning type super resolving processor 606 generates a first high resolution image, and the learning type super resolving processor 608 generates a second high resolution image.
The first high resolution image generated by the learning type super resolving processor 606 is a high resolution image generated by inputting the low resolution image generated through the downsampling process of the high resolution image input from the switch 502 and performing the learning type super resolving process.
The second high resolution image generated by the learning type super resolving processor 608 is a high resolution image generated by inputting the low resolution image (gk) input to the high frequency estimator 521 and performing the learning type super resolving process.
The first high resolution image generated by the learning type super resolving processor 606 and the second high resolution image generated by the learning type super resolving processor 608 are respectively input to the reverse motion correctors 607 and 609.
Each of the reverse motion correctors 607 and 609 performs reverse correction of the motion on each of the high resolution images. This reverse motion correction cancels the motion correction process of the motion corrector 603.
The adder 610 subtracts the output of the reverse motion corrector 609 from the output of the reverse motion corrector 607. In other words, difference data between the reverse motion corrected images are generated: the difference between the first high resolution image, generated by inputting the low resolution image obtained through the downsampling process of the high resolution image input from the switch 502 and performing the learning type super resolving process, and the second high resolution image, generated by inputting the low resolution image (gk) input to the high frequency estimator 521 and performing the learning type super resolving process. The difference data are output to the adder 522.
In addition, as illustrated in
The output of the reverse motion corrector 607 corresponds to Wk^T H^T D^T D H Wk fm in the super resolution convergence equation (Equation 12), and the output of the reverse motion corrector 609 corresponds to Wk^T H^T D^T gk in the super resolution convergence equation (Equation 12).
As illustrated in
In this manner, the adder 522 adds the results of the high frequency estimators 521 and outputs the process result to the adder 525.
The image quality controller 523 calculates a control value of the pixel value to be used for an ideal image based on a pre-establishment model of the image. The output of the image quality controller 523 is input to the multiplier 524.
The multiplier 524 multiplies the output of the image quality controller 523 with the user setting value α. The image quality of the final image is controlled according to the value of the user setting value α.
The output of the multiplier 524 corresponds to αL^T L fm in the super resolution convergence equation (Equation 12).
In addition, in the configuration illustrated in the figure, the user setting value is used so as to control the image quality. However, a fixed value may be used without any problem.
The processes of the adder 525, the scale calculator 526, and the like are described with reference to
The scale calculator 526 determines the scale (coefficient β) for the gradient vector in the image convergence calculation using a steepest descent method. In other words, the scale calculator 526 determines the coefficient β in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The scale calculator 526 is input with the gradient vector ((a) gradient vector (Va) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The scale calculator 526 obtains the coefficient β based on these inputs so that the cost E(fm+1) expressed in the cost calculation equation described as the aforementioned equation (Equation 10) illustrated in
As the calculation of the coefficient β, the coefficient β for the minimization is generally calculated by using a method such as binary search. In addition, in the case where a reduction in the calculation cost is desired, a configuration where a constant is output regardless of the input may be used.
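As an illustration of such a search, the following sketch narrows an interval on β, assuming the cost is unimodal along the gradient direction (a ternary variant of the binary search mentioned above; the toy quadratic cost, the interval [0, 2], and the iteration count are assumptions for illustration only).

```python
import numpy as np

def search_beta(cost, f, v, lo=0.0, hi=2.0, iters=60):
    # Minimize cost(f - beta * v) over beta in [lo, hi] by interval narrowing,
    # assuming the cost is unimodal along the direction v.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(f - m1 * v) < cost(f - m2 * v):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Toy cost: squared distance to an ideal image f_star.
f_star = np.array([1.0, 2.0, 3.0])
f = np.array([2.0, 3.0, 4.0])
cost = lambda x: float(np.sum((x - f_star) ** 2))
v = 2.0 * (f - f_star)          # gradient of the cost at f
beta = search_beta(cost, f, v)  # for this quadratic the optimal step is 0.5
```

The constant-output configuration mentioned above corresponds to skipping the search and returning a fixed β.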
The result (β) of the scale calculator 526 is output to the multiplier 527. The multiplier 527 multiplies the gradient vector (Va), obtained as an output of the adder 525, by the output value (β) of the scale calculator 526 and outputs β(Va) to the adder 528. The adder 528 subtracts β(Va), which is input from the multiplier 527, from the super resolving processed image fm as the m-th super resolving process result, which is input as a mid-processing result of the super resolving process from the switch 502, and outputs the (m+1)-th super resolving process result fm+1.
In other words, fm+1=fm−β(Va)
The (m+1)-th super resolving process result fm+1 is calculated based on the above equation. This equation corresponds to the aforementioned super resolution convergence equation (Equation 12) illustrated in
The super resolving processor 503 outputs the (m+1)-th super resolving process result, that is, fm+1=fm−β(Va) to the convergence determination portion 504.
The convergence determination portion 504 is input with the (m+1)-th super resolving process result, that is, fm+1=fm−β(Va), from the super resolving processor 503 and determines based on the input whether or not the result has sufficiently converged.
In the case where the result of the super resolving processor 503 is determined to have sufficiently converged, the convergence determination portion 504 outputs the result of the process to an external portion and stops the process. In the case where the convergence is determined to be insufficient, the above data are input through the switch 502 to the super resolving processor 503, and the calculation is performed again. For example, the convergence determination portion 504 calculates a difference between the newest process result and the previous process result, and in the case where the difference is equal to or smaller than a predetermined value, it is determined that the convergence is achieved. Alternatively, in the case where the number of processes reaches a predetermined number, it is determined that the convergence is achieved, and the process result is output.
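The iterative update and the convergence determination described above can be sketched together as follows (a minimal illustration: the fixed β, the toy gradient, the threshold, and the iteration cap are assumptions, not values from the disclosed apparatus).

```python
import numpy as np

def super_resolve(f0, gradient, beta=0.5, eps=1e-6, max_iters=100):
    # Repeat f_{m+1} = f_m - beta * (Va) until the change between successive
    # results is small enough, or the number of processes reaches the cap.
    f = f0
    for m in range(max_iters):
        f_next = f - beta * gradient(f)
        if np.max(np.abs(f_next - f)) <= eps:  # difference small: converged
            return f_next, m + 1
        f = f_next
    return f, max_iters                        # iteration cap reached

# Toy example: converge toward a known ideal image.
ideal = np.array([1.0, 2.0, 3.0])
grad = lambda f: f - ideal          # gradient of 0.5 * ||f - ideal||^2
result, iters = super_resolve(np.zeros(3), grad)
```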
Next, an image processing apparatus according to a second embodiment of the invention is described with reference to
In the second embodiment, the basic configuration is the same as that of the aforementioned first embodiment except that the configuration of the high frequency estimator 521 in the first embodiment is modified.
The basic configuration of the image processing apparatus according to the second embodiment is the same as that of the first embodiment as illustrated in
The configuration of the super resolving processor 503 is also the same as that of the first embodiment as illustrated in
The configuration of the high frequency estimator 521 in the super resolving processor 503 is different from that of the first embodiment (
The configuration and process of the high frequency estimator 521 according to the second embodiment are described with reference to
The high frequency estimator 521 performs a process corresponding to the calculation process in the lower line portion illustrated in
The motion detector 651 is input with the high resolution image from the switch 502 and the low resolution image gk in the image processing apparatus 500 illustrated in
In addition, as a preparation, since the resolution is different between the two images, the resolution converter 652 constructed with, for example, an upsampling filter performs an upsampling process on the low resolution image gk and performs a process of matching its resolution to that of the to-be-generated high resolution image.
The motion corrector (MC) 653 is input with the high resolution image from the switch 502 and the motion vector from the motion detector 651 and performs a transform on the input high resolution image. The process corresponds to the process of calculation of the motion (Wk) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The spatial filter 654 performs a process of simulating the deterioration in the spatial resolution. Herein, convolution is performed on the image by using a pre-measured point spread function as a filter. The process corresponds to the process of calculation of the blur (H) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The downsampling processor 655 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process of calculation of the camera resolution (D) in the aforementioned super resolution convergence equation (Equation 12) illustrated in
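The chain of processes just described, that is, motion (Wk), blur (H), and camera resolution (D) applied to the high resolution estimate, can be sketched as follows. This is a simplified illustration: the integer-pixel shift, the box point spread function, and the decimation factor are assumptions standing in for the measured motion vector, the pre-measured PSF, and the actual camera resolution.

```python
import numpy as np

def motion_correct(img, dy, dx):
    # Wk: motion correction, here a simple integer-pixel shift.
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def spatial_filter(img, k=3):
    # H: convolution with a point spread function (a k-by-k box PSF here).
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)

def downsample(img, s=2):
    # D: downsampling to the camera resolution (every s-th pixel).
    return img[::s, ::s]

high = np.arange(64, dtype=float).reshape(8, 8)
simulated_low = downsample(spatial_filter(motion_correct(high, 1, 0)))
```

The simulated low resolution image produced this way is what the subsequent stages compare against the input low resolution image gk.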
The low resolution image generated through the downsampling of the high resolution image in the downsampling processor 655 is input to the subtractor 656.
The subtractor 656 calculates the difference value for each pixel between the low resolution image generated by the downsampling processor 655 and the low resolution image gk input to the high frequency estimator 521.
The difference image as a difference value calculated by the subtractor 656 is input to the learning type super resolving processor 657.
The learning type super resolving processor 657 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
The difference image data generated by the subtractor 656, that is, the difference image data which are constructed with difference values of pixels between the low resolution image generated through the downsampling of the high resolution image and the input low resolution image gk, correspond to the low resolution image 371 illustrated in
The learning type super resolving processor 657 performs the learning type super resolving process using the learned data stored in advance in the database to generate the high resolution difference image constructed with the difference data. In other words, the learning type super resolving processor 657 generates the high resolution difference image constructed with the difference data as data corresponding to the high resolution image 372 illustrated in
In addition, the learned data stored in the database, which are used for the learning type super resolving process, are the learned data for generating the difference data corresponding to the high resolution difference image from the difference image data which are constructed with the difference values of pixels in the low resolution images.
In this manner, the learning type super resolving processor 657 performs a learning type super resolving process as the upsampling process on the difference image between the downsampled image, which is converted to have the same resolution as that of the low resolution image through a downsampling process of the processed image constructed with the high resolution image, and the low resolution image input as a processing object image of the super resolving process.
In addition, the learning type super resolving processor 657 performs an upsampling process as a learning type super resolving process using the learned data, which include data corresponding to feature amount information of a localized image area of the difference image between the low resolution images and of the high resolution image generated based on the low resolution image, and image transform information for converting the difference image into the high resolution difference image.
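The second-embodiment flow, in which the difference is formed first and the upsampling is applied to the difference image, can be sketched as follows. Plain nearest-neighbor upsampling stands in here for the learning type super resolving process, and the component names in comments only indicate the correspondence; the sketch is an assumption-laden illustration, not the apparatus itself.

```python
import numpy as np

def downsample(img, s=2):
    # Downsampling of the high resolution estimate (downsampling processor 655).
    return img[::s, ::s]

def upsample_residual(residual, s=2):
    # Placeholder for the learned upsampling of the difference image
    # (learning type super resolving processor 657).
    return np.kron(residual, np.ones((s, s)))

def high_frequency_correction(high_estimate, g_k, s=2):
    # Form the per-pixel difference first (subtractor 656), then upsample it.
    residual = downsample(high_estimate, s) - g_k
    return upsample_residual(residual, s)

high = np.full((4, 4), 5.0)   # current high resolution estimate
g_k = np.full((2, 2), 3.0)    # input low resolution image
corr = high_frequency_correction(high, g_k)
```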
The high resolution difference image data generated by the learning type super resolving processor 657 are input to the reverse motion corrector 658.
The reverse motion corrector 658 performs reverse correction of the motion on the high resolution image as a difference image. This reverse motion correction cancels the motion correction process of the motion corrector 653. The difference data are output to the adder 522.
The output of the reverse motion corrector 658 is the data corresponding to the output of the adder 610 of the high frequency estimator 521 illustrated in
In other words, the second embodiment is different from the first embodiment in the following point: in the first embodiment, the learning type super resolving process is performed not on the image difference data but on the individual images, and the differences of the results are then calculated; in the second embodiment, the difference data are generated in advance, and the learning type super resolving process is performed on the difference data.
With respect to the processes after outputting the difference data to the adder 522, the processes of the second embodiment are the same as those of the first embodiment, and thus, the description thereof is omitted.
As described above, in the first and second embodiments, the upsampling process is configured to be performed as a learning type super resolving process using learned data.
In the first embodiment, the upsampling process which is performed as a process of generating the high resolution image from the low resolution image is configured to be performed as a learning type super resolving process using learned data.
In addition, in the second embodiment, the upsampling process for the difference image between the low resolution images is configured to be performed as a learning type super resolving process using learned data.
In other words, as illustrated in
As described with reference to
For example, the upsampling process is performed as a learning type super resolving process using learned data, so that the subjective quality of the super resolving result is improved. In addition, in the case where the number of input low resolution images is small or even in the case where the input image is extremely deteriorated, a high resolution image having low deterioration in the image quality can be generated.
As described above, in the case where only the reconstruction type super resolving method is used, the following processes are performed as an upsampling process.
(a) Component estimation for high frequency component (equal to or higher than the Nyquist frequency) based on aliasing component (aliasing) in the input screen
(b) Elimination of aliasing component (aliasing) in low frequency component (equal to or lower than the Nyquist frequency) and recovery of high frequency component (equal to or higher than the Nyquist frequency)
However, with respect to this method, in the case where there is a small number of input images, there is a problem in that the estimation of the aliasing component (aliasing) is not appropriately performed. In addition, even in the case where the aliasing component cannot be detected in the input image due to an extreme deterioration of the input image, the high frequency performance is similarly insufficient.
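The limitation described above can be illustrated numerically: after decimation, a component above the new Nyquist frequency becomes indistinguishable from a lower-frequency component, so it cannot be recovered from a single image alone (the particular frequencies below are chosen only for illustration).

```python
import numpy as np

# A component at normalized frequency 0.375 and its alias at 0.125 produce
# identical samples after 2x decimation, so the decimated signal alone cannot
# tell them apart.
n = np.arange(16)
high_freq = np.cos(2 * np.pi * 0.375 * n)  # above Nyquist after 2x decimation
alias = np.cos(2 * np.pi * 0.125 * n)      # lower-frequency alias

decimated_equal = np.allclose(high_freq[::2], alias[::2])
```

Multiple shifted input images, or learned data as in the present configuration, provide the extra information needed to resolve this ambiguity.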
In the configuration of the invention, at the time of the upsampling process, the learning type super resolving process using the learned data is configured to be performed, so that the upsampling process can be performed without the above-described defects of the reconstruction type super resolving process.
Next, an image processing apparatus according to a third embodiment of the invention is described with reference to
The basic configuration of the image processing apparatus according to the third embodiment is the same as that of the image processing apparatus 200 performing the reconstruction type super resolving process described above with reference to
However, the configuration and process of the high frequency estimator in the image processing apparatus according to the third embodiment are different from those of the image processing apparatus 200.
The basic configuration of the image processing apparatus according to the third embodiment is described with reference to
As illustrated in
In the process for a moving picture, the following definitions are used.
gt: one frame of a low resolution moving picture at a time point t
ft: one frame of a high resolution moving picture at a time point t
In this manner, a low resolution image gt is set to one frame of the low resolution moving picture at a time point t, and a high resolution image ft is the high resolution image obtained as a result of the super resolving process applied on the low resolution image gt.
In the image processing apparatus 700 illustrated in
The moving picture initial image generation unit 701 is input with the previous-frame moving picture super resolving process result (ft−1) and the low resolution image (gt) and outputs the generated initial image to the moving picture super resolving processor 702. Details of the moving picture initial image generation unit 701 are described later.
The moving picture super resolving processor 702 generates the high resolution image (ft) by applying the low resolution image (gt) to the input initial image and outputs the high resolution image (ft). Details of the moving picture super resolving processor 702 are described later.
The high resolution image output from the moving picture super resolving processor 702 is output to the image buffer 703 at the same time as being output to an external portion, so that the high resolution image is used for the super resolving process for the next frame.
Next, a detailed configuration and process of the moving picture initial image generation unit 701 are described with reference to
First, a process of matching the resolution of the low resolution image gt to the resolution of the to-be-generated high resolution image is performed through the upsampling process by the resolution converter 706 constructed with, for example, an upsampling filter.
The motion detector 705 detects the magnitude of the motion between the previous-frame high resolution image ft−1 and the upsampled low resolution image gt. More specifically, the motion detector 705 calculates the motion vector.
The motion corrector (MC) 707 performs a motion correction process on the high resolution image ft−1 by using the motion vector detected by the motion detector 705. Therefore, the motion correction is performed on the high resolution image ft−1, so that a motion correction image where the position of the subject is set to be the same as that in the upsampled low resolution image gt is generated.
The MC non-applied area detector 708 detects an area where the motion correction (MC) is not well applied by comparing the high resolution image generated by the motion correction (MC) process with the upsampled low resolution image. The MC non-applied area detector 708 sets appropriateness information α [0:1] of the MC application in units of a pixel and outputs the appropriateness information.
The blend processor 709 is input with the motion correction resulting image for the high resolution image ft−1, which is generated by the motion corrector (MC) 707, the upsampled image which is obtained by upsampling the low resolution image gt in the resolution converter 706, and the MC non-applied area detection information which is detected by the MC non-applied area detector 708.
The blend processor 709 outputs the moving picture super resolution initial image as a blend result based on the following equation by using the above input information.
moving picture super resolution initial image (blend result) = (1 − α) × (upsampled image) + α × (motion correction resulting image)
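The blend above can be sketched per pixel as follows. The difference-based α map is an assumption for illustration; the disclosed MC non-applied area detector may determine the appropriateness information differently.

```python
import numpy as np

def alpha_map(mc_image, upsampled, scale=10.0):
    # Appropriateness of MC application in units of a pixel: a small difference
    # between the motion corrected image and the upsampled image means the MC
    # was applied well, so alpha is near 1 (scale is an assumed tuning value).
    return np.clip(1.0 - np.abs(mc_image - upsampled) / scale, 0.0, 1.0)

def blend(upsampled, mc_image):
    # blend = (1 - alpha) * (upsampled image) + alpha * (MC resulting image)
    a = alpha_map(mc_image, upsampled)
    return (1.0 - a) * upsampled + a * mc_image

up = np.array([[10.0, 10.0], [10.0, 10.0]])  # upsampled low resolution image
mc = np.array([[10.0, 30.0], [10.0, 10.0]])  # MC failed at one pixel (value 30)
initial = blend(up, mc)
```

At the pixel where the motion correction failed, α falls to 0 and the upsampled image is used instead, which is the intended behavior of the blend processor.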
Next, a configuration and process of the moving picture super resolving processor 702 in the image processing apparatus 700 performing the reconstruction type super resolving process illustrated in
The moving picture super resolving processor 702 is input with the moving picture super resolution initial image as the aforementioned blend result from the moving picture initial image generation unit 701 illustrated in
Furthermore, the moving picture super resolving processor 702 is input with the low resolution image gt and the user setting value α as an image adjustment parameter to generate the high resolution image (ft) as a process result and outputs the high resolution image (ft).
A detailed configuration and process of the moving picture high frequency estimator 711 in the moving picture super resolving processor 702 are described with reference to
The spatial filter 801 illustrated in
The downsampling processor 802 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process (refer to
After that, the low resolution image generated through the downsampling of the high resolution image in the downsampling processor 802 is input to the learning type super resolving processor 803.
The learning type super resolving processor 803 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
The low resolution image generated through the downsampling provided from the downsampling processor 802 corresponds to the low resolution image 371 illustrated in
At the same time, the learning type super resolving processor 804 is input with the low resolution image (gt) which is input to the moving picture high frequency estimator 711. The learning type super resolving processor 804 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
In addition, the learning type super resolving processor 803 and the learning type super resolving processor 804 perform an upsampling process as a learning type super resolving process using the learned data, which include data corresponding to feature amount information of a localized image area of the low resolution image and of the high resolution image generated based on the low resolution image, and image transform information for converting the low resolution image into the high resolution image.
In addition, the learning type super resolving processor 803 and the learning type super resolving processor 804 may have a configuration where only the input data are different and the same processes are simultaneously performed, or a configuration where individual processes using learned data or algorithms optimized for each process are performed.
Through these processes, the learning type super resolving processor 803 generates a first high resolution image, and the learning type super resolving processor 804 generates a second high resolution image.
The first high resolution image generated by the learning type super resolving processor 803 is a high resolution image generated by inputting the low resolution image generated through the downsampling process of the initial super resolving image input from the moving picture initial image generation unit 701 and performing the learning type super resolving process.
The second high resolution image generated by the learning type super resolving processor 804 is a high resolution image generated by inputting the low resolution image (gt) input to the moving picture high frequency estimator 711 and performing the learning type super resolving process.
The adder 805 generates difference image data by subtracting corresponding pixels of the second high resolution image generated by the learning type super resolving processor 804 from the first high resolution image generated by the learning type super resolving processor 803. The difference data are output to the adder 714.
The image quality controller 712 of the moving picture super resolving processor 702 illustrated in
The Laplacian transformation portion is input with the moving picture super resolution initial image as the aforementioned blend result generated by the moving picture initial image generation unit 701 and applies the Laplacian operator (L) two times on the moving picture super resolution initial image to output the process result to the multiplier 713 illustrated in
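Applying the Laplacian operator (L) two times, as described above, can be sketched as follows. A standard 4-neighbor discrete Laplacian with edge padding is assumed here; the actual operator used by the image quality controller may differ.

```python
import numpy as np

def laplacian(img):
    # 4-neighbor discrete Laplacian with edge replication at the borders.
    p = np.pad(img.astype(float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def quality_control_term(img):
    # Applying L two times yields the L^T L term of the convergence equation
    # (L is symmetric under this discretization, so L^T L = L L).
    return laplacian(laplacian(img))

flat = np.ones((5, 5))
term = quality_control_term(flat)  # a flat image incurs no smoothness penalty
```

The result is then scaled by the user setting value α by the multiplier 713, so larger α penalizes non-smooth solutions more strongly.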
The scale calculator 715 illustrated in
The scale calculator 715 is input with the gradient vector ((a) gradient vector in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The scale calculator 715 obtains the coefficient β based on these inputs so that the cost E(fm+1) expressed in the cost calculation equation described as the aforementioned equation (Equation 10) illustrated in
As the calculation of the coefficient β, the coefficient β for the minimization is generally calculated by using a method such as binary search. In addition, in the case where a reduction in the calculation cost is desired, a configuration where a constant is output regardless of the input may be used.
As a result, the coefficient β that minimizes the cost is determined. The coefficient (β) is output to the multiplier 716. The multiplier 716 multiplies the gradient vector (Va) (refer to
In other words, ft=f0−β(Va)
The super resolving process result ft is calculated based on the above equation. This equation corresponds to the aforementioned super resolution convergence equation (Equation 12) illustrated in
The moving picture super resolving processor 702 outputs the super resolving process result and stores the super resolving process result to the image buffer 703.
Next, an image processing apparatus according to a fourth embodiment of the invention is described with reference to
The basic configuration of the image processing apparatus according to the fourth embodiment is the same as the aforementioned configuration of the third embodiment illustrated in
Similarly to the third embodiment, the moving picture initial image generation unit 701 has the configuration illustrated in
The difference between the fourth embodiment and the third embodiment is the configuration of the moving picture high frequency estimator 711 in the moving picture super resolving processor 702 illustrated in
In the third embodiment, the moving picture high frequency estimator 711 is described to have the configuration illustrated in
The configuration and process of the moving picture high frequency estimator 711 included in the moving picture super resolving processor 702 (refer to
The moving picture high frequency estimator 711 calculates the correction value for recovering the high frequency of the image. The moving picture high frequency estimator 711 is input with the moving picture super resolution initial image as the aforementioned blend result generated by the moving picture initial image generation unit 701 and the low resolution image (gt) and outputs the process result to the adder 714.
The spatial filter 851 illustrated in
The downsampling processor 852 performs a downsampling process on the high resolution image down to the resolution equal to that of the input image. The process corresponds to the process (refer to
After that, the low resolution image generated through the downsampling of the high resolution image in the downsampling processor 852 is input to the subtractor 853.
The subtractor 853 calculates the difference value for each pixel between the low resolution image generated by the downsampling processor 852 and the low resolution image gt input to the moving picture high frequency estimator 711.
The difference image as a difference value calculated by the subtractor 853 is input to the learning type super resolving processor 854.
The learning type super resolving processor 854 has the same configuration as that of the learning type super resolving process performing unit 340 described above with reference to
The difference image data generated by the subtractor 853, that is, the difference image data which are constructed with difference values of pixels between the low resolution image generated through the downsampling of the high resolution image and the input low resolution image gt, correspond to the low resolution image 371 illustrated in
The learning type super resolving processor 854 performs the learning type super resolving process using the learned data stored in advance in the database to generate the high resolution difference image corresponding to the difference image. In other words, the learning type super resolving processor 854 generates the high resolution difference image constructed with the difference data as data corresponding to the high resolution image 372 illustrated in
In addition, the learned data stored in the database, which are used for the learning type super resolving process, are the learned data for generating the difference data corresponding to the high resolution difference image from the difference image data which are constructed with the difference values of pixels in the low resolution images.
In this manner, the learning type super resolving processor 854 performs a learning type super resolving process as the upsampling process on the difference image between the downsampled image, which is converted to have the same resolution as that of the low resolution image through a downsampling process of the processed image constructed with the high resolution image, and the low resolution image input as a processing object image of the super resolving process.
In addition, the learning type super resolving processor 854 performs an upsampling process as a learning type super resolving process using the learned data, which include data corresponding to feature amount information of a localized image area of the difference image between the low resolution images and of the high resolution image generated based on the low resolution image, and image transform information for converting the difference image into the high resolution difference image.
The high resolution difference image data generated by the learning type super resolving processor 854 are output to the adder 714. The subsequent processes are the same as those of the third embodiment.
In other words, the image quality controller 712 of the moving picture super resolving processor 702 illustrated in
The Laplacian transformation portion is input with the moving picture super resolution initial image as the aforementioned blend result generated by the moving picture initial image generation unit 701 and applies the Laplacian operator (L) two times on the moving picture super resolution initial image to output the process result to the multiplier 713 illustrated in
The scale calculator 715 illustrated in
The scale calculator 715 is input with the gradient vector ((a) gradient vector in the aforementioned super resolution convergence equation (Equation 12) illustrated in
The scale calculator 715 obtains the coefficient β based on these inputs so that the cost E(fm+1) expressed in the cost calculation equation described as the aforementioned equation (Equation 10) illustrated in
As the calculation of the coefficient β, the coefficient β for the minimization is generally calculated by using a method such as binary search. In addition, in the case where a reduction in the calculation cost is desired, a configuration where a constant is output regardless of the input may be used.
As a result, the coefficient β that minimizes the cost is determined. The coefficient (β) is output to the multiplier 716. The multiplier 716 multiplies the gradient vector (Va) (refer to
In other words, ft=f0−β(Va)
The super resolving process result ft is calculated based on the above equation. This equation corresponds to the aforementioned super resolution convergence equation (Equation 12) illustrated in
The moving picture super resolving processor 702 outputs the super resolving process result and stores the super resolving process result to the image buffer 703.
The fourth embodiment is different from the third embodiment in the following point: in the third embodiment, the learning type super resolving process is performed not on the image difference data but on the individual images, and the differences of the results are then calculated; in the fourth embodiment, the difference data are generated in advance, and the learning type super resolving process is performed on the difference data.
As described above, in the third and fourth embodiments, in the image processing apparatus performing a process in the case where the super resolving processing object is a moving picture, the upsampling process is configured to be performed as a learning type super resolving process using learned data.
In the third embodiment, the upsampling process which is performed as a process of generating the high resolution image from the low resolution image is configured to be performed as a learning type super resolving process using learned data.
In addition, in the fourth embodiment, the upsampling process for the difference image between the low resolution images is configured to be performed as a learning type super resolving process using learned data.
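The two orderings described above can be sketched as follows. The upsample_learned function is a hypothetical stand-in (a simple nearest-neighbour repeat) for the learning type super resolving upsampler; with this linear stand-in both orderings coincide, whereas a real learned upsampler is generally nonlinear, which is why the two embodiments differ in practice.

```python
import numpy as np

def upsample_learned(img, scale=2):
    # Stand-in for the learning type super resolving upsampler; here a
    # nearest-neighbour repeat so the sketch stays self-contained.
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

lr_prev = np.array([[1.0, 2.0], [3.0, 4.0]])
lr_curr = np.array([[2.0, 3.0], [4.0, 5.0]])

# Third embodiment ordering: upsample the individual images first,
# then calculate the difference between the results.
diff_third = upsample_learned(lr_curr) - upsample_learned(lr_prev)

# Fourth embodiment ordering: generate the difference data between the
# low resolution images in advance, then upsample only the difference.
diff_fourth = upsample_learned(lr_curr - lr_prev)
```

Because the stand-in upsampler is linear, both variables hold the same 4x4 difference image here; the fourth embodiment's advantage is that the learned process is applied once, to the (typically sparse) difference data.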
In other words, as illustrated in
As described with reference to
For example, the upsampling process is performed as a learning type super resolving process using learned data, so that the subjective quality of the super resolving result is improved. In addition, even in the case where the number of input low resolution images is small or the input image is extremely deteriorated, a high resolution image having low deterioration in image quality can be generated.
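A learning type upsampling step of the kind referred to here can be sketched as a lookup into learned data: each low resolution patch is matched against stored low resolution examples, and the high resolution counterpart of the best match is substituted. The learned_lr/learned_hr tables and the one-dimensional one-pixel "patches" below are toy assumptions, not the patent's learned data.

```python
import numpy as np

# Hypothetical learned data: low resolution "patches" (single samples)
# paired with the high resolution patches observed during training.
learned_lr = np.array([[0.0], [1.0], [2.0]])
learned_hr = np.array([[0.0, 0.0], [1.0, 1.2], [2.0, 2.1]])

def upsample_by_lookup(lr_signal):
    """Example-based upsampling: for each LR sample, find the nearest
    learned LR patch and substitute its stored HR counterpart."""
    out = []
    for v in lr_signal:
        idx = int(np.argmin(np.abs(learned_lr[:, 0] - v)))
        out.extend(learned_hr[idx])
    return np.array(out)

hr = upsample_by_lookup(np.array([1.0, 2.0]))
```

The point of the sketch is that the high frequency detail in the output comes from the learned pairs, not from the input alone, which is why the result does not degrade when only one deteriorated input image is available.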
As described above, in the case where only the reconstruction type super resolving method is used, the following processes are performed as an upsampling process.
(a) Component estimation for high frequency component (equal to or higher than the Nyquist frequency) based on aliasing component (aliasing) in the input screen
(b) Elimination of aliasing component (aliasing) in low frequency component (equal to or lower than the Nyquist frequency) and recovery of high frequency component (equal to or higher than the Nyquist frequency)
However, with this method, in the case where there is only a small number of input images, there is a problem in that the estimation of the aliasing component (aliasing) is not appropriately performed. Similarly, in the case where the aliasing component cannot be detected in the input image due to extreme deterioration of the input image, the recovery of the high frequency component is insufficient.
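The aliasing that this estimation relies on can be illustrated directly: a component above the Nyquist frequency folds down and becomes indistinguishable, on the sampled grid, from a lower-frequency component. The sampling rate and frequencies below are assumed for illustration only.

```python
import numpy as np

fs = 8.0       # sampling rate of the low resolution grid
f_true = 5.0   # component above the Nyquist frequency fs / 2 = 4
n = np.arange(64)
t = n / fs

# The 5 Hz cosine, sampled at 8 Hz, folds down to |fs - f_true| = 3 Hz:
# on this grid it is exactly equal to a 3 Hz cosine.
x = np.cos(2 * np.pi * f_true * t)
x_alias = np.cos(2 * np.pi * (fs - f_true) * t)
```

A single sampled image cannot separate the two components; the reconstruction type method needs multiple differently-sampled inputs to disambiguate them, which is exactly where it fails when the inputs are few or heavily deteriorated.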
In the configuration of the invention, the learning type super resolving process using the learned data is performed at the time of the upsampling process, so that the upsampling process can be performed without the occurrence of the above-described defects of the reconstruction type super resolving process.
In addition, although the aforementioned first to fourth embodiments describe examples where the upsampling process is performed as a process using the learned data, with respect to the downsampling process performed in the image processing apparatus, the learned data may also be prepared in advance, and the downsampling process may be configured to be performed using the learned data.
Finally, an example of a hardware configuration of the image processing apparatus performing the aforementioned processes is described with reference to
The CPU 901 is connected to an input/output interface 905 via the bus 904. An input unit 906 constructed with a keyboard, a mouse, a microphone, or the like and an output unit 907 constructed with a display, a speaker, or the like are connected to the input/output interface 905. The CPU 901 performs various processes according to commands input from the input unit 906 and outputs the process results to, for example, the output unit 907.
The storage unit 908 connected to the input/output interface 905 is constructed with, for example, a hard disk and stores the programs performed by the CPU 901 and various data. A communication unit 909 communicates with external apparatuses through a network such as the Internet or a local area network.
A drive 910 connected to the input/output interface 905 drives a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory to acquire the recorded programs, data, or the like. The acquired programs or data are transmitted to and stored in the storage unit 908 if necessary.
Hereinbefore, the invention is described in detail with reference to specific embodiments. However, it is obvious that modifications and alterations of the embodiments can be made by those ordinarily skilled in the related art without departing from the spirit of the invention. In other words, the invention is disclosed through exemplary embodiments, and thus, the embodiments should not be construed in a limited manner. In determining the spirit of the invention, the claims should be considered.
In addition, the series of processes described in the specification can be implemented in a hardware configuration, a software configuration, or a combination thereof. In the case of performing the processes in a software configuration, a program recording the process sequence may be installed in a memory in a computer assembled with dedicated hardware and then performed, or the program may be installed in a general-purpose computer in which various types of processes can be performed. For example, the program may be recorded in a recording medium in advance. In addition to installing the program from the recording medium to the computer, the program may be received via a network such as a LAN (Local Area Network) or the Internet and installed in a recording medium such as an embedded hard disk.
In addition, the various types of processes described in the specification may be performed in time sequence according to the description, or simultaneously or individually according to the processing capability of the apparatus performing the processes or as necessary. In addition, the term “system” in the specification denotes a logical set configuration of a plurality of apparatuses, and it is not limited to a system where the apparatus of each configuration is contained in the same casing.
The present application contains subject matter related to that disclosed in Japanese Priority Patent Application JP 2010-043699 filed in the Japan Patent Office on Mar. 1, 2010, the entire contents of which are hereby incorporated by reference.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.