This nonprovisional application claims priority under 35 U.S.C. § 119(a) on Patent Application No. 2007-201860 filed in Japan on Aug. 2, 2007, the entire contents of which are hereby incorporated by reference.
1. Field of the Invention
The present invention relates to an image processing apparatus and an image processing method, which are used for generating a high-resolution image from a plurality of low-resolution images, in particular. Also, the present invention relates to an electronic appliance utilizing the image processing apparatus.
2. Description of Related Art
Recently, image sensing apparatuses for obtaining digital images by using a solid-state image sensing device such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) image sensor, as well as display devices for displaying the digital images, have become widely available along with the development of various digital techniques. The image sensing apparatus is, for instance, a digital still camera or a digital video camera, while the display device is, for instance, a liquid crystal display or a plasma television set. For the image sensing apparatus and the display device, an image processing technique is proposed in which a plurality of digital images obtained at different time points are used for converting the resolution of the images into a higher one. Hereinafter, this conversion is referred to as a high resolution conversion.
The image processing technique with the high resolution conversion is used for generating primary color images of R (red), G (green) and B (blue) having the same resolution as a CFA (Color Filter Array) image from the CFA image obtained by using a single solid-state image sensing device having a micro primary color filter of R, G and B or the like arranged in a tessellated manner, or for generating a CFA image having a higher resolution. The digital images including the CFA image are generated by obtaining image information of subject images at digitized sampling positions. If the sampling positions are shifted from each other between different frames, a plurality of CFA images of different sampling positions among frames are obtained. As an example of such a plurality of CFA images,
On this occasion, an interpolation process based on the high resolution conversion is utilized so that an R image made up of only R signals, a G image made up of only G signals and a B image made up of only B signals can be generated as shown in
In addition, it is possible to perform the high resolution conversion so that a positional relationship among R, G and B pixels in the CFA image is maintained as it is. Thus, it is possible to generate the CFA image as the high-resolution image having an increased pixel number as shown in
As a process for realizing the high resolution conversion described above, a super-resolution process is proposed, in which a plurality of low-resolution images having position errors (displacements) from each other are used for estimating one high-resolution image. One type of the super-resolution process is called a reconstruction type. The reconstruction type super-resolution process includes estimating the process by which the low-resolution image is generated from the high-resolution image, and then generating the high-resolution image by performing a process corresponding to the reverse of the estimated process on the obtained low-resolution images.
In a method other than the reconstruction type, a uniformly sampled high-resolution image is obtained by resampling from a plurality of low-resolution images whose sampling positions are nonuniform among different frames, and blur generated in the obtained high-resolution image is then removed by an image restoring process or the like. In this method, the pixel values at the sampling points of the plurality of low-resolution images are used for the high resolution conversion: a weighting factor is set for each of the pixel values in accordance with the distance between the resampling point in the high-resolution image and the corresponding sampling point in the low-resolution image. Then, a weighted average of the pixel values used for the high resolution conversion is calculated in accordance with the set weighting factors, so that the pixel value at the resampling point in the high-resolution image is obtained.
For instance, as shown in
In this case, the weighting factors w1 to w8 are calculated so as to have a low pass characteristic. For instance, values of the weighting factors w1 to w8 are varied exponentially in accordance with the distances L1 to L8. Although blur may occur due to the weighted average, the blur can be cancelled by the image restoring process performed after the weighted average.
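As a rough illustration only, the following Python sketch computes one resampled pixel value as a distance-weighted average of neighboring low-resolution samples, with weights that decay exponentially so that the filter has a low pass characteristic; the decay shape, the parameter sigma and the sample values are assumptions for illustration, not taken from the method described above.

```python
import numpy as np

def resample_pixel(sample_values, distances, sigma=1.0):
    """Weighted average of nearby low-resolution samples at one resampling point.
    sample_values: pixel values of the N nearest low-resolution samples.
    distances: distances L1..LN from the resampling point to those samples."""
    v = np.asarray(sample_values, dtype=float)
    d = np.asarray(distances, dtype=float)
    w = np.exp(-d / sigma)              # weighting factors w1..wN decay with distance
    return np.sum(w * v) / np.sum(w)    # weighted average -> pixel value at the resampling point

# Example with eight neighboring samples (values and distances are illustrative).
print(resample_pixel([10, 12, 11, 9, 13, 10, 12, 11],
                     [0.3, 0.5, 0.7, 0.9, 1.1, 1.2, 1.4, 1.6]))
```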
In contrast, in a repeating computational algorithm, which is a calculation method for realizing the reconstruction type super-resolution process described above, an initial high-resolution image is first estimated from the plurality of low-resolution images in STEP 1. Next, in STEP 2, the original low-resolution images constituting the high-resolution image are estimated by reverse conversion based on the current high-resolution image. After that, in STEP 3, the original low-resolution images are compared with the estimated low-resolution images, and in STEP 4 a new high-resolution image is estimated based on the result of the comparison so that the difference between the pixel values of the compared images at each pixel position becomes small. The process from STEP 2 to STEP 4 is performed repeatedly so that the difference converges, and thus the high-resolution image becomes close to an ideal one.
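A minimal Python/NumPy sketch of this repeating computational algorithm is shown below. It ignores the per-frame position errors and uses simple box averaging and pixel replication in place of the actual degradation and reverse-projection processes, so it only illustrates the STEP 1 to STEP 4 loop structure; the function names, the step size and the iteration count are assumptions.

```python
import numpy as np

def degrade(hr, scale=2):
    """Crude stand-in for blur + thinning from the high- to the low-resolution grid."""
    h, w = hr.shape
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def upsample(lr, scale=2):
    """Crude stand-in for reverse projection: replicate each low-resolution pixel."""
    return np.kron(lr, np.ones((scale, scale)))

def reconstruct(low_res_images, n_iter=20, step=0.5, scale=2):
    # STEP 1: initial high-resolution estimate from the first (datum) frame.
    x = upsample(low_res_images[0], scale)
    for _ in range(n_iter):
        # STEP 2: estimate the low-resolution images from the current high-resolution image.
        estimated = [degrade(x, scale) for _ in low_res_images]
        # STEP 3: compare the actual and the estimated low-resolution images.
        diffs = [obs - est for obs, est in zip(low_res_images, estimated)]
        # STEP 4: update the high-resolution image so that the differences become small.
        x = x + step * sum(upsample(d, scale) for d in diffs) / len(diffs)
    return x

frames = [np.random.default_rng(i).random((4, 4)) for i in range(3)]
print(reconstruct(frames, n_iter=10).shape)   # (8, 8)
```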
As the super-resolution process method that can be realized by the repeating calculation (repeating computational algorithm), some methods are proposed, including an ML (Maximum-Likelihood) method, an MAP (Maximum A Posterior) method, a POCS (Projection Onto Convex Set) method, an IBP (Iterative Back Projection) method and the like. In the ML method, the square errors between the pixel values of the low-resolution images estimated from the high-resolution image and the pixel values of the real low-resolution images are taken as an evaluation function, and the high-resolution image that minimizes this evaluation function is generated. In other words, the super-resolution process of this ML method is a process based on maximum likelihood estimation.
In the MAP method, probability information of the high-resolution image is added to the square errors between the pixel values of the low-resolution images estimated from the high-resolution image and the pixel values of the actual low-resolution images, and the sum is taken as the evaluation function. Then, the high-resolution image is generated so as to minimize the evaluation function. In other words, the MAP method obtains an optimal high-resolution image by estimating the high-resolution image that maximizes the occurrence probability in a posterior probability distribution based on prescient information. Note that the prescient information here is information with respect to the high-resolution image.
In the POCS method, simultaneous equations are made with respect to the pixel values of the high-resolution image and the pixel values of the low-resolution images. Then, the simultaneous equations are solved sequentially, so that the optimal values of the pixel values of the high-resolution image are obtained for generating the high-resolution image. In the IBP method, the errors between the low-resolution images estimated from the high-resolution image calculated temporarily and actually obtained low-resolution images are reversely projected onto the temporary high-resolution image in a repeated manner (corresponding to a repeated reverse projection method), so as to obtain the high-resolution image with high definition.
As a method for obtaining a high-resolution image by using such a super-resolution process, a conventional method 1 is also proposed as described below. In the conventional method 1, an image region of the high-resolution image is divided into predetermined small regions. Then, a mean value of the pixel values of the low-resolution image included in each small region is calculated, so that the pixel values of the small region are represented by the mean value (in other words, the mean value is used as a representative value of the pixel values in the small region) for speeding up the super-resolution process. More specifically, conventional methods other than the conventional method 1 need to perform the estimation operation with respect to every observed pixel included in the small region for estimating the high-resolution image from the low-resolution image. In contrast, the super-resolution process according to the conventional method 1 requires only one estimation operation for each small region. Thus, the operation quantity of the calculation process necessary for estimating the high-resolution image can be reduced, so that the super-resolution process can be sped up.
In addition, a super-resolution process method (hereinafter referred to as a conventional method 2) is also proposed, in which an evaluation function corresponding to the conventional method 1 is utilized for speeding up the calculation of the evaluation function and of its differentiation with respect to the high-resolution image. In the conventional method 2, four types of images are used in the evaluation function and a differential equation of the evaluation function so as to speed up the operation. The four types of images include a high-resolution image obtained by the super-resolution process, an average observation image obtained by approximating the nonuniformly spaced pixel positions, viewed upon alignment of the plurality of low-resolution images, to pixel positions of the high-resolution image, a PSF image made up of a "point spread function" that is used for multiplication with the high-resolution image, and a weight image whose pixel value at each pixel position is the number of pixels approximated to that position when the average observation image is formed.
Furthermore, the MAP method is regarded as providing the most powerful process with the highest precision in the reconstruction type super-resolution process that is adopted in the conventional methods 1 and 2. The MAP method will be described in detail as follows. When each process of the above-mentioned STEP 1 to STEP 4 is performed, the evaluation function is used for estimating the low-resolution images from the high-resolution image, so that a calculation process for calculating an update quantity of the high-resolution image is performed. This evaluation function will be described mainly for the description of the MAP method.
In the super-resolution process based on the MAP method, a plurality of low-resolution images obtained by actual shooting or the like (hereinafter each may be referred to as an actual low-resolution image in particular) are used for estimating one high-resolution image. All the pixel values of the high-resolution image to be estimated, expressed as a vector, are represented by "x", and all the pixel values of the plurality of actual low-resolution images used for estimating the one high-resolution image, expressed as a vector, are represented by "y".
Therefore, if one high-resolution image is made up of 400 pixels for instance, the vector x becomes a 400-dimensional vector, and the values of the 400 elements constituting the vector x are the 400 pixel values forming the high-resolution image. In addition, if four actual low-resolution images are used for estimating one high-resolution image and each of the actual low-resolution images is made up of 100 pixels for instance, the vector y becomes a 400-dimensional vector, and the values of the 400 elements constituting the vector y are the total of 400 pixel values of the four actual low-resolution images. The vector x is formed by listing the pixel values of the estimated high-resolution image, so "x" can also be referred to as the pixel values (a pixel value group) of the high-resolution image. Similarly, the vector y is formed by listing the pixel values of the plurality of actual low-resolution images, so "y" can also be referred to as the pixel values (a pixel value group) of the actual low-resolution images.
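The vector conventions above can be written down directly; the following short sketch uses the sizes from this example (one high-resolution image of 400 pixels, here assumed to be 20×20, and four 10×10 actual low-resolution images), so the concrete shapes are assumptions.

```python
import numpy as np

high_res = np.zeros((20, 20))                       # 400 pixels of the high-resolution image
low_res = [np.zeros((10, 10)) for _ in range(4)]    # four actual low-resolution images, 100 pixels each

x = high_res.ravel()                                # vector x: 400 elements
y = np.concatenate([f.ravel() for f in low_res])    # vector y: 4 x 100 = 400 elements
print(x.shape, y.shape)                             # (400,) (400,)
```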
When the reverse conversion in the above-mentioned STEP 2 is performed, a plurality of processes including the first to the third processes below are performed in turn. The first process is an appropriate low pass filter process performed on the high-resolution image, the second process is a process for performing rotation and parallel displacement corresponding to a position error between the low-resolution images, and the third process is a thinning process from the pixel number of the high-resolution image to the pixel number of the low-resolution image. Note that the low-resolution image estimated by the reverse conversion performed on the high-resolution image is also referred to as an estimated low-resolution image in particular.
The characteristic of the process that combines the above-mentioned first to third processes is expressed by a matrix A. More specifically, the relationship between the pixel value y of the obtained actual low-resolution image and the pixel value x of the estimated high-resolution image is expressed by the matrix equation (A2) below. Note that NOIZE in the equation (A2) indicates noise generated when the low-resolution image is obtained.
y = Ax + NOIZE   (A2)
In the MAP method, an evaluation function E[x] expressed by the equation (A3) below is defined based on a square error between the estimated low-resolution image expressed by Ax and the actual low-resolution image expressed by y, and the high-resolution image (i.e., x) such that the evaluation function E[x] is minimized is calculated.
E[x] = ∥y − Ax∥^2 + f(x)   (A3)
A square error between a pixel value of the estimated low-resolution image and a pixel value of the actual low-resolution image is determined for each pixel position, and the total sum of the square errors determined for the individual pixel positions corresponds to the first term ∥y − Ax∥^2 on the right-hand side of the equation (A3). The second term f(x) on the right-hand side of the equation (A3) is defined by prescient information based on a prior probability model and is generally referred to as a normalization term. This prescient information is information with respect to the high-resolution image to be estimated. The term f(x) is set based on prior knowledge that a high-resolution image has few high frequency components, for instance. More specifically, for instance, f(x) is expressed by the equation (A4) below, using a matrix P formed by a high-pass filter such as a Laplacian filter and a parameter λ indicating the strength of the weight of the normalization term with respect to the evaluation function E[x].
f(x) = λ∥Px∥^2   (A4)
When the normalization term according to the equation (A4) is substituted into the equation (A3), the evaluation function E[x] is expressed as shown in the equation (A5) below. The equation (A5) includes unknown quantities corresponding to the pixel number of the high-resolution image. Therefore, for instance, if the image to be a target of the super-resolution process has a normal size such as 1280×960 pixels, it is difficult to solve the equation directly because of the large pixel number. Therefore, a repeating computational algorithm such as the steepest descent method or the conjugate gradient method is generally used.
E[x] = ∥y − Ax∥^2 + λ∥Px∥^2   (A5)
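For reference, the evaluation function of the equation (A5) can be transcribed directly into NumPy as below; the toy matrices A and P and the value of λ (lam) are assumptions made only to exercise the formula, not the matrices defined above.

```python
import numpy as np

def evaluation(x, y, A, P, lam):
    residual = y - A @ x                    # error between actual and estimated low-resolution pixels
    penalty = P @ x                         # high-pass response used as the normalization term
    return residual @ residual + lam * (penalty @ penalty)   # ||y - Ax||^2 + lam*||Px||^2

rng = np.random.default_rng(0)
A = rng.standard_normal((36, 36))                   # toy conversion matrix (blur, shift, thinning)
P = np.eye(36) - np.roll(np.eye(36), 1, axis=1)     # crude high-pass operator in place of a Laplacian
x = rng.standard_normal(36)
y = rng.standard_normal(36)
print(evaluation(x, y, A, P, lam=0.1))
```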
In contrast, the pixel value x of the high-resolution image can be calculated directly from the equation (A5) by using the property that the derivative of the evaluation function E[x] becomes zero when the evaluation function E[x] is minimized. More specifically, when the derivative ∂E[x]/∂x of the evaluation function with respect to the pixel value x of the high-resolution image becomes zero, i.e., when the equation (A6) below holds, the equation (A7) also holds. Therefore, the pixel value x of the high-resolution image can be calculated in accordance with the equation (A7). Here, a matrix with the superscript T indicates the transposed matrix of the original matrix. Therefore, for instance, A^T indicates the transposed matrix of the matrix A (the same is true of the matrix P and the like).
∂E[x]/∂x = −A^T(y − Ax) + λP^T P x = 0   (A6)
x = (A^T A + λP^T P)^(−1) A^T y   (A7)
When the pixel value y of the actual low-resolution image is multiplied by (A^T A + λP^T P)^(−1) A^T of the equation (A7), the pixel value x of the high-resolution image can be obtained. In practice, a filter having the elements of the matrix expressed by (A^T A + λP^T P)^(−1) A^T as its filter factors is formed, and this filter is made to act on the pixel value y so that the pixel value x is obtained. However, the number of filter factors of this filter depends on the number of actual low-resolution images to be used for the super-resolution process, the pixel number thereof, and the pixel number of the high-resolution image.
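A minimal sketch of this direct solution, again with small toy matrices, is shown below; it simply evaluates x = (A^T A + λP^T P)^(−1) A^T y, and the matrix sizes are assumptions chosen so the example runs quickly.

```python
import numpy as np

def solve_direct(y, A, P, lam):
    M = A.T @ A + lam * (P.T @ P)
    filt = np.linalg.solve(M, A.T)     # rows of this matrix are the filter factors
    return filt @ y                    # pixel values x of the high-resolution image

rng = np.random.default_rng(1)
A = rng.standard_normal((36, 36))
P = np.eye(36) - np.roll(np.eye(36), 1, axis=1)
y = rng.standard_normal(36)
print(solve_direct(y, A, P, lam=0.1).shape)   # (36,)
```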
Therefore, as described above, if the image to be a target of the super-resolution process has a normal size such as 1280×960 pixels, the number of filter factors for calculating the pixel values of the high-resolution image becomes too large. As a result, the circuit scale of an arithmetic circuit constituting the filter for performing the super-resolution process becomes large. In addition, the quantity of calculation becomes so massive that the calculation cannot be performed in practice. In view of these circumstances, the conventional super-resolution process includes updating the high-resolution image repeatedly based on a gradient quantity obtained by using a gradient method or the like so as to obtain a final high-resolution image. However, the number of repetitions of the update process must be increased in order to obtain a high-resolution image with high reproducibility, so the calculation requires a long time. In addition, since there is an upper limit on the period of time available for one frame of a moving image or the like, there is also a limit on the number of repetitions of the above-mentioned process. As a result, it is difficult to obtain a high-resolution image with high reproducibility.
An image processing apparatus according to an embodiment of the present invention includes a high resolution processing portion for generating a high-resolution image from a first low-resolution image to be a datum frame and M (M is an integer of one or larger) second low-resolution images, the high-resolution image having a higher resolution than the low-resolution images, and a region cutting out portion for setting a first target region in an image region of the first low-resolution image and for setting a second target region in an image region of the second low-resolution image. The high resolution processing portion calculates a pixel value of a region corresponding to the first target region in an image region of the high-resolution image based on pixel values of the first and the second target regions set by the region cutting out portion. The region cutting out portion scans a position of the first target region to be set in the first low-resolution image and sets the second target region at a position corresponding to the position of the first target region after the scan every time when the high resolution processing portion calculates the pixel value of the high-resolution image.
More specifically, for instance, if the integer M is one, the image processing apparatus may further include a motion amount calculation portion for calculating an amount of motion between the first low-resolution image and the second low-resolution image, and the region cutting out portion may set the position of the second target region based on the amount of motion.
More specifically, for instance, if the integer M is two or larger, the image processing apparatus may further include a motion amount calculation portion for calculating an amount of motion between the first low-resolution image and the second low-resolution image for each of the second low-resolution images, and the region cutting out portion may set the position of the second target region based on the amount of motion for each of the second low-resolution images.
In addition, for instance, the high resolution processing portion may be made up of a filter for calculating the pixel value of the high-resolution image from pixel values of the first and the second target regions, and a filter factor of the filter may be updated based on a positional relationship between the first and the second target regions set by the region cutting out portion every time when the position of the first target region is scanned.
In addition, for instance, the positional relationship may be classified into a plurality of types of positional relationships, the image processing apparatus may further include a filter factor storage portion for storing a filter factor of the filter for each of the types of the positional relationships, and a filter factor corresponding to the positional relationship between the first and the second target regions set by the region cutting out portion may be read out from the filter factor storage portion, so that the read-out filter factor is set as the filter factor of the filter constituting the high resolution processing portion.
Further, when the high resolution processing portion performs the high resolution processing on the pixels in the first and the second target regions, one or more pixels positioned at the middle of the first target region may be handled as target pixels so that the pixel values of the pixels on the high-resolution image corresponding to the target pixels can be calculated. On this occasion, only the filter factors of the lines corresponding to the pixels to be obtained in the high-resolution image may be used in the calculation for performing the high resolution processing.
In addition, the filter may be made up of the matrix (A^T A + λP^T P)^(−1) A^T obtained by the equation (A7) so that the derivative of the evaluation function for the super-resolution process becomes zero. In addition, the filter factors of the filter constituting the high resolution processing portion may be factors obtained when the calculation of the reconstruction type super-resolution process is performed two times.
An electronic appliance according to an embodiment of the present invention has the above-mentioned image processing apparatus and obtains (M+1) images as an external input or by exposure so that an image signal of the (M+1) images is supplied to the image processing apparatus. The (M+1) images include the first low-resolution image and the M second low-resolution images.
An image processing method according to an embodiment of the present invention includes a high resolution processing step for generating a high-resolution image from a first low-resolution image to be a datum frame and M (M is an integer of one or larger) second low-resolution images, the high-resolution image having a higher resolution than the low-resolution images, and a region cutting out step for setting a first target region in an image region of the first low-resolution image and for setting a second target region in an image region of the second low-resolution image. The high resolution processing step includes calculating a pixel value of a region corresponding to the first target region in an image region of the high-resolution image based on pixel values of the first and the second target regions set by the region cutting out step. The region cutting out step includes scanning a position of the first target region to be set in the first low-resolution image and setting the second target region at a position corresponding to the position of the first target region after the scan every time when the pixel value of the high-resolution image is calculated in the high resolution processing step.
Hereinafter, an embodiment of the present invention will be described with reference to the attached drawings. In the drawings to be referred to, the same part is denoted by the same reference numeral or symbol, so overlapping description of the same part will be omitted as a rule. In the description below, an image sensing apparatus such as a digital camera or a digital video camera is mainly exemplified as an electronic appliance equipped with an image processing apparatus (corresponding to an image processing portion that will be described later) performing the image processing according to the present invention. However, as described later, it is also possible to form a display device (such as a liquid crystal display or a plasma television set) that performs the digital image processing with a similar image processing apparatus. Note that the definitions of symbols (such as the symbol A representing the matrix in the equation (A2)) described in "BACKGROUND OF THE INVENTION" are also applied to the description of the embodiment.
[Structure of Image Sensing Apparatus]
First, an internal structure of the image sensing apparatus according to the embodiment of the present invention will be described with reference to the drawings.
The image sensing apparatus shown in
When the operating portion 15 instructs to perform an exposure operation for taking a moving image in this image sensing apparatus, an analog image signal is obtained by a photoelectric conversion action of the image sensor 1 and is delivered to the AFE 2. On this occasion, the timing control signal is supplied from the timing generator 12 to the image sensor 1, so that horizontal scanning and vertical scanning are performed in the image sensor 1, and data of the individual pixels of the image sensor 1 are output as the image signal. In the AFE 2, the analog image signal is converted into the digital image signal. When the digital image signal is supplied to the image processing portion 4, various types of image processing are performed on the image signal. The image processing includes a signal conversion process for generating a luminance signal and a color difference signal.
When the operation for obtaining a high resolution image is performed with respect to the operating portion 15, the image processing portion 4 performs the super-resolution process based on the image signal of a plurality of frames supplied from the image sensor 1. The image processing portion 4 generates the luminance signal and the color difference signal based on the image signal obtained by performing the super-resolution process. Note that an amount of motion between different frames is calculated when the super-resolution process is performed, and alignment between frames is performed corresponding to the amount of motion (that will be described later in detail).
The image sensor 1 can expose successively at a predetermined frame period under control of the timing generator 12, so that an image sequence arranged in time series is obtained by the successive exposure. Each image constituting the image sequence is referred to as a frame image or a frame simply.
The image signal after the image processing (that may include the super-resolution process) performed by the image processing portion 4 is supplied to the compression processing portion 6. On this occasion, the analog audio signal obtained when the microphone 3 receives sounds is converted into the digital audio signal by the audio processing portion 5, which is supplied to the compression processing portion 6. The compression processing portion 6 compresses and codes the digital image signal and the audio signal from the image processing portion 4 and the audio processing portion 5 in accordance with the MPEG compression method, and it makes the external memory 20 store them via the driver portion 7. In addition, the compressed signal recorded in the external memory 20 is read out by the driver portion 7 and supplied to the expansion processing portion 8, which performs the expansion process so that the image signal based on the compressed signal is obtained. This image signal is supplied to the display portion 9, which displays the subject image obtained by the image sensor 1.
Although the operation when the moving image is obtained is described above, an operation when an exposure action for obtaining a still image is instructed with respect to the operating portion 15 is the same as that when the moving image is obtained. However, when obtaining a still image is instructed, the obtaining process of the audio signal by the microphone 3 is not performed, and the compressed signal including only the image signal is recorded in the external memory 20. In addition, not only the obtained still image is recorded but also the currently shot image is displayed. The compressed signal of the currently shot image is supplied to the display portion 9 via the expansion processing portion 8, so that the user can confirm the image obtained by the image sensor 1 at the present time. Note that it is possible to supply the image signal generated by the image processing portion 4 to the display portion 9 as it is without the compression coding process and the expansion process.
The image sensor 1, the AFE 2, the image processing portion 4, the audio processing portion 5, the compression processing portion 6 and the expansion processing portion 8 perform operations in accordance with the timing control signal from the timing generator 12 in synchronization with the exposure operation for each frame performed by the image sensor 1. Furthermore, when a still image is to be obtained, the timing generator 12 supplies the timing control signal to the image sensor 1, the AFE 2, the image processing portion 4 and the compression processing portion 6 so that their action timings are synchronized.
In addition, when reproduction of the moving image stored in the external memory 20 is instructed via the operating portion 15, the compressed signal corresponding to the moving image stored in the external memory 20 is read out by the driver portion 7 and is supplied to the expansion processing portion 8. Then, the expansion processing portion 8 expands and decodes the read-out compressed signal based on the MPEG compression method, so that the image signal and the audio signal are obtained. The obtained image signal is supplied to the display portion 9 for displaying the image, and the obtained audio signal is supplied to the speaker portion 11 via the audio output circuit portion 10 for reproducing and outputting sounds. In this way, the moving image based on the compressed signal recorded in the external memory 20 is reproduced together with sounds. Furthermore, if the compressed signal includes only the image signal, only the image is reproduced and displayed on the display portion 9.
As described above, the image processing portion 4 is formed to be capable of performing the super-resolution process. The super-resolution process enables one high-resolution image to be generated from a plurality of low-resolution images. The image signal of the high-resolution image can be recorded in the external memory 20 via the compression processing portion 6. The resolution of the high-resolution image is higher than that of the low-resolution image, and the pixel numbers in the horizontal direction and in the vertical direction of the high-resolution image are larger than those of the low-resolution image. For instance, when exposure of a still image is instructed, a plurality of frames (frame images) are obtained as the plurality of low-resolution images, and the super-resolution process is performed on them so that the high-resolution image is generated. Alternatively, for instance, when a moving image is shot, the super-resolution process is performed on a plurality of frames (frame images) as the obtained plurality of low-resolution images.
In the image sensing apparatus according to this embodiment, a plurality of low-resolution images obtained by using the image sensor 1 are used for estimating one high-resolution image. Each low-resolution image obtained by using the image sensor 1 is referred to as an actual low-resolution image.
The high-resolution image is generated with reference to one of the plurality of actual low-resolution images. The actual low-resolution image to be the reference is referred to as a datum frame. Among the plurality of actual low-resolution images for generating the high-resolution image, one that is different from the datum frame is referred to as a consulted frame.
Furthermore, in the following description, abbreviations of the low-resolution image and the like may be used by adding reference symbols. For instance, in the following description, if “Fa” is assigned as a symbol indicating a certain actual low-resolution image, the actual low-resolution image Fa may be referred to as an “image Fa” simply, which represents the same matter.
[Basic Action of the Super-Resolution Process]
A basic action of the super-resolution process performed by the image processing portion 4 shown in
When the image processing portion 4 performs the super-resolution process, an amount of motion between the actual low-resolution image as a consulted frame and the actual low-resolution image as the datum frame is calculated first. If a plurality of consulted frames exist, the amount of motion between each of the consulted frames and the datum frame is calculated. The amount of motion between the two images indicates a quantity of a position error (displacement) between the two images. The amount of motion is a two-dimensional quantity and is also called a motion vector or a displacement vector in general. The amount of motion includes an amount of motion in a translational direction and an amount of motion in a rotational direction. In other words, the amount of motion is divided into a translational component and a rotational component. The amount of motion in the translational direction can be further divided into a horizontal component and a vertical component. The image processing portion 4 is formed to be capable of calculating the amounts of motion in the translational and the rotational directions. In the image processing portion 4, the alignment between the datum frame and the consulted frame is performed based on the amount of motion between actual low-resolution images that are used for the super-resolution process. The alignment is realized by translational and/or rotational motion of one of the two images such that the position error corresponding to the amount of motion can be cancelled. Note that the term “alignment” has the same meaning as “position error correction” that will be described later.
For instance, if the amount of motion between two actual low-resolution images Fa and Fb includes the translational component and the rotational component, a positional relationship between the actual low-resolution images Fa and Fb after the alignment is the positional relationship as shown in
After the alignment is performed between the plurality of actual low-resolution images to be a target of the super-resolution process, a region in which the super-resolution process is performed (hereinafter referred to as a super-resolution target region) is set with respect to each of the actual low-resolution images after the alignment. On this occasion, a pixel to be a target of the super-resolution process (hereinafter referred to as a super-resolution target pixel) is selected from the pixels forming the actual low-resolution image as the datum frame, and the super-resolution target region including the super-resolution target pixel and a plurality of pixels surrounding the super-resolution target pixel is set with respect to the actual low-resolution image as the datum frame.
The super-resolution target pixel is made up of one or more pixels. It is supposed that the super-resolution target pixel is made up of Tx×Ty pixels (Tx and Ty are natural numbers). When the super-resolution process on a certain super-resolution target pixel is completed, the position of the super-resolution target pixel is shifted by Tx pixels in the horizontal direction, and the super-resolution target region is set again for the super-resolution target pixel after the shift. Such scanning of the position of the super-resolution target pixel in the horizontal direction is performed sequentially, so that the super-resolution process is performed for one line of pixels set as the super-resolution target pixel. Then, the line of pixels is shifted in the vertical direction by Ty pixels so that pixels on the new line are selected as the super-resolution target pixel, and the super-resolution target region is set correspondingly. This scan of changing the positions of the super-resolution target pixel and the super-resolution target region sequentially is referred to as a "raster scan".
More specifically, in the raster scan, the positions of the super-resolution target pixel and the super-resolution target region are shifted in the horizontal direction in turn while selection of the super-resolution target pixel and the super-resolution target region is performed for one line. After that, the pixel and the region to be selected as the super-resolution target pixel and the super-resolution target region are shifted in the vertical direction by Ty pixels, and the positions of the super-resolution target pixel and the super-resolution target region are again shifted in the horizontal direction in turn while the selection is performed. "Tx×Ty pixels" means a group of (Tx×Ty) pixels in total, in which Tx pixels are arranged in the horizontal direction and Ty pixels are arranged in the vertical direction. Expressions "1×1 pixel" and "3×3 pixels" that will be described later are interpreted in the same manner.
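The raster scan described above can be sketched as two nested loops, as below; the image size and the block size Tx×Ty are illustrative values, and only the block positions are produced (the actual region setting and super-resolution processing are omitted).

```python
def raster_scan(image_height, image_width, Tx, Ty):
    """Yield the top-left position of each Tx x Ty block of super-resolution target pixels."""
    for top in range(0, image_height, Ty):       # shift the line of target pixels vertically by Ty
        for left in range(0, image_width, Tx):   # shift the target pixel horizontally by Tx
            yield top, left

for top, left in raster_scan(image_height=6, image_width=8, Tx=1, Ty=1):
    pass  # here the super-resolution target region around (top, left) would be set and processed
```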
For instance, as shown in
The actual low-resolution images Fa and Fb after the alignment in the positional relationship as shown in
As shown in
In other words, along with the scan of the region Rta in the image Fa, the region Rtb is also scanned in the image Fb. However, since the images Fa and Fb have the positional relationship as shown in
This situation of the scan of the region Rtb will be described with reference to
The rectangular region Rta2 of the broken line indicates the super-resolution target region Rta that is set next to the region Rta1 by the raster scan in the horizontal direction, and the rectangular region Rta3 of the broken line indicates the super-resolution target region Rta that is set next to the region Rta2 by the raster scan in the horizontal direction. The positions (center positions) of the regions Rta2 and Rta3 are shifted with respect to the position (center position) of the region Rta1 in the horizontal direction by one pixel and by two pixels, respectively.
The rectangular region Rtb2 of the broken line is a region including 3×3 pixels shifted from the region Rtb1 by one pixel in the horizontal direction of the image Fb. The rectangular region Rtb2′ of the solid line is a region including 3×3 pixels shifted from the region Rtb2 by one pixel in the vertical direction of the image Fb. The rectangular region Rtb3′ of the dashed dotted line is a region including 3×3 pixels shifted from the region Rtb2′ by one pixel in the horizontal direction of the image Fb. Note that the individual regions are shown at positions shifted upward, downward, rightward or leftward a little from their original positions so that different regions can be distinguished from each other in
It is supposed that an influence of the rotational component of the amount of motion between the images Fa and Fb can be ignored. Then, if the super-resolution target region Rta in the image Fa is the region Rta2, the super-resolution target region Rtb in the image Fb is to be the region Rtb2. However, the amount of motion includes the rotational component in this example. Therefore, if the region Rtb is scanned only in the horizontal direction while the region Rta is scanned in the horizontal direction, the amount of motion (position error amount) between the regions Rta and Rtb will increase every time when the regions Rta and Rtb are scanned.
In order to satisfy the requirement that the overlapping area between the regions Rta and Rtb should be as large as possible, the image processing portion 4 compares the size of the overlapping area between the regions Rta2 and Rtb2 with the size of the overlapping area between the regions Rta2 and Rtb2′ when the super-resolution target region Rtb corresponding to the region Rta2 is set in the image Fb. Then, if the former is larger than the latter, the image processing portion 4 sets the region Rtb2 as the super-resolution target region Rtb corresponding to the region Rta2 in the image Fb. On the contrary, if the latter is larger than the former, the image processing portion 4 sets the region Rtb2′ as the super-resolution target region Rtb corresponding to the region Rta2 in the image Fb. If the region Rtb2′ is set as the super-resolution target region Rtb, the super-resolution target region Rtb to be set in the image Fb corresponding to the super-resolution target region Rta3 becomes the region Rtb3′.
In this way, the super-resolution target region is set with respect to the actual low-resolution image as the datum frame by the raster scan corresponding to the pixel number of the super-resolution target pixels. In contrast, the super-resolution target region with respect to the actual low-resolution image as a consulted frame is set based on the amount of motion between itself and the datum frame so that a size of the overlapping area between the super-resolution target regions set with respect to the datum frame and the consulted frame is as large as possible.
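The rule for placing the consulted-frame region can be sketched as a simple overlap comparison, as follows; the regions are modeled as axis-aligned squares of a fixed size on a common coordinate system, and the candidate positions and numeric values are assumptions for illustration.

```python
def overlap_area(a, b, size):
    """Overlap of two size x size regions with top-left corners a and b (common coordinates)."""
    dy = max(0.0, size - abs(a[0] - b[0]))
    dx = max(0.0, size - abs(a[1] - b[1]))
    return dy * dx

def choose_consulted_region(rta, candidates, size=3):
    """Pick the candidate position of Rtb whose overlap with the datum-frame region Rta is largest."""
    return max(candidates, key=lambda rtb: overlap_area(rta, rtb, size))

# Example: a candidate shifted only horizontally versus one also shifted vertically by one pixel.
rta2 = (10.0, 11.0)                                  # datum-frame region Rta2 after alignment
print(choose_consulted_region(rta2, [(10.7, 10.8), (9.7, 10.8)]))
```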
When the super-resolution target region is set with respect to each of the plurality of actual low-resolution images to be used for the super-resolution process, calculation based on pixel values of pixels in the set super-resolution target region is performed, so that the pixel value at the pixel position on the high-resolution image corresponding to the position of the super-resolution target pixel in the datum frame is calculated. For instance, as shown in
The position of the region Gh on the high-resolution image to be generated is defined with respect to the position of the super-resolution target pixel Gt on the actual low-resolution image as the datum frame. For instance, the center position of the region Gh is made to be identical to the center position of the pixel Gt on the image coordinate system in which arbitrary images including the actual low-resolution image and the high-resolution image are commonly defined. Therefore, every time when the super-resolution target region is moved by the raster scan on the datum frame, the position of the region Gh also moves on the high-resolution image, so that the pixel values on the high-resolution image with respect to the individual pixel positions are calculated sequentially.
If the enlargement ratios of the resolution of the high-resolution image with respect to the low-resolution image in the vertical direction and in the horizontal direction are V times and H times respectively, the pixel values on the high-resolution image are calculated for (V×H) times the number of super-resolution target pixels set in the datum frame. For instance, if V=3 and H=4, the pixel numbers in the vertical and in the horizontal directions of the high-resolution image are respectively three times and four times the pixel numbers in the vertical and in the horizontal directions of the actual low-resolution image. Further, if the number of super-resolution target pixels set in the datum frame is one, (V×H), i.e., (3×4)=12 pixel values of the high-resolution image are calculated with respect to the one super-resolution target pixel. Then, all the pixels constituting the actual low-resolution image as the datum frame are set as the super-resolution target pixel one by one, so that the pixel values of all the pixels of the high-resolution image to be generated can be obtained.
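The pixel-count bookkeeping above amounts to mapping each super-resolution target pixel of the datum frame to a V×H block of high-resolution pixel positions, as in this small sketch (V=3 and H=4 follow the example in the text; the coordinate convention is an assumption).

```python
def high_res_block(target_row, target_col, V=3, H=4):
    """High-resolution pixel coordinates produced for one super-resolution target pixel."""
    return [(target_row * V + dv, target_col * H + dh)
            for dv in range(V) for dh in range(H)]

print(len(high_res_block(0, 0)))   # 12 pixel values per target pixel when V=3 and H=4
```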
[Basic Concept of Super-Resolution Process]
Next, the basic concept of the super-resolution process according to the embodiment will be described. As described above in "BACKGROUND OF THE INVENTION", the relational expression "y=Ax+NOIZE" of the above equation (A2) holds between the vector x formed from all the pixel values of the estimated high-resolution image and the vector y formed from all the pixel values of the plurality of actual low-resolution images to be used for estimating the high-resolution image. Then, the evaluation function E[x] expressed by the above equation (A5) is defined based on the equation (A2), and the pixel value x of the high-resolution image can also be determined so that the evaluation function E[x] is minimized. In practice, for instance, the pixel value x such that the derivative ∂E[x]/∂x becomes zero is determined so that the high-resolution image can be estimated.
In this way, the super-resolution process can be performed by performing the calculation so that the derivative ∂E[x]/∂x of the evaluation function E[x] becomes zero. In the super-resolution process using the repeating computational algorithm, the original low-resolution images (i.e., actual low-resolution images) are estimated from the high-resolution image that is once estimated. Then, the derivative ∂E[x]/∂x is determined based on a difference between the low-resolution images obtained by the estimation and the actual low-resolution images, and the high-resolution image is reconstructed so that a value of the derivative ∂E[x]/∂x becomes close to zero. The low-resolution images obtained by estimating the original low-resolution images from the high-resolution image that is once estimated are each also called an estimated low-resolution image in particular. Note that the super-resolution process of the reconstruction type can also be realized by using the repeating computational algorithm. Therefore, it is also possible to adopt a reconstruction type super-resolution process using the repeating computational algorithm as the super-resolution process using the repeating computational algorithm.
A general outline of the super-resolution process using the repeating computational algorithm will be described in more detail in relationship with the process from the STEP 31 to the STEP 34 with reference to
In each of
It is supposed that at the time point T1 the luminance of the subject is sampled at the sampling points S1, (S1+ΔS) and (S1+2ΔS) (see
It is supposed that the time point T1 and the time point T2 are different from each other and that there is a deviation between positions at the sampling points S1 and S2 due to hand vibration or the like. Therefore, the actual low-resolution image Fb shown in
The actual low-resolution images Fa and Fb shown in
It is supposed that the actual low-resolution image Fa is the datum frame (in this case, the actual low-resolution image Fb is the consulted frame). Then, the pixel values of the pixels P1, P2 and P3 in the high-resolution image Fx1 are the pixel values pa1, pa2 and pa3 in the actual low-resolution image Fa. The pixel value of the pixel P4 is, for instance, the pixel value of the pixel closest to the pixel position of the pixel P4 among the pixels (P1, P2 and P3) in the actual low-resolution images Fa and Fb after the position error correction. The pixel position of a certain noted pixel indicates the center position of the noted pixel. It is supposed that the pixel position of the pixel P1 in the actual low-resolution image Fb after the position error correction is closest to the pixel position of the pixel P4. Then, the pixel value of the pixel P4 is set to pb1. The pixel value of the pixel P5 is determined in the same manner, and it is supposed that the pixel value of the pixel P5 is set to pb2. In this way, the high-resolution image in which the pixel values of the pixels P1 to P5 are set to pa1, pa2, pa3, pb1 and pb2 respectively can be estimated as the high-resolution image Fx1.
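The initial estimate of STEP 31 can be sketched as a nearest-sample lookup over the aligned actual low-resolution images, as below; the one-dimensional sampling positions and pixel values are illustrative numbers, not the values pa1 to pb2 above.

```python
import numpy as np

def initial_estimate(hr_positions, lr_positions, lr_values):
    """For each high-resolution pixel position, take the value of the nearest sample
    among the aligned actual low-resolution images (datum frame plus consulted frames)."""
    lr_positions = np.asarray(lr_positions, dtype=float)
    lr_values = np.asarray(lr_values, dtype=float)
    nearest = [int(np.argmin(np.abs(lr_positions - p))) for p in hr_positions]
    return lr_values[nearest]

# Fa sampled at 0, 1, 2; Fb, after position error correction, sampled at 0.4, 1.4, 2.4.
positions = [0.0, 1.0, 2.0, 0.4, 1.4, 2.4]
values    = [10., 20., 30., 12., 22., 32.]          # illustrative pixel values of Fa and Fb
print(initial_estimate([0.0, 0.5, 1.0, 1.5, 2.0], positions, values))
```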
After that, a conversion equation having parameters of a down sampling quantity, a blur quantity due to a low resolution process and a position error amount (corresponding to the amount of motion) is exerted on the high-resolution image Fx1, so that estimated low-resolution images Fa1 and Fb1 as estimated images of the actual low-resolution images Fa and Fb are generated as shown in
In the first STEP 32, the pixel values at the sampling points S1, (S1+ΔS) and (S1+2ΔS) are estimated based on the high-resolution image Fx1, and the estimated low-resolution image Fa1 having the estimated pixel values pa11, pa21 and pa31 as the pixel values of the pixels P1, P2 and P3 is generated. Similarly, the pixel values at the sampling points S2, (S2+ΔS) and (S2+2ΔS) are estimated based on the high-resolution image Fx1, and the estimated low-resolution image Fb1 having the estimated pixel values pb11, pb21 and pb31 as the pixel values of the pixels P1, P2 and P3 is generated.
Then, as shown in
The difference values (pa11−pa1), (pa21−pa2) and (pa31−pa3) of the pixel values of the pixels P1, P2 and P3 between the estimated low-resolution image Fa1 and the actual low-resolution image Fa are pixel values of the difference image ΔFa1. In addition, the difference values (pb11−pb1), (pb21−pb2) and (pb31−pb3) of the pixel values of the pixels P1, P2 and P3 between the estimated low-resolution image Fb1 and the actual low-resolution image Fb are pixel values of the difference image ΔFb1.
Then, the pixel values of the difference images ΔFa1 and ΔFb1 are combined, and the difference values of the pixels P1 to P5 are calculated, so that the difference image ΔFx1 with respect to the high-resolution image Fx1 is generated. When the pixel values of the difference images ΔFa1 and ΔFb1 are combined so as to generate the difference image ΔFx1, the square error is used as the evaluation function in the MAP (Maximum A Posterior) method and the ML (Maximum-Likelihood) method (however, a normalization term is added to the evaluation function in the MAP method). More specifically, the value of the evaluation function in the MAP method or the ML method is the sum over frames of the square values of the pixel values of the difference images ΔFa1 and ΔFb1. Therefore, the gradient as the derivative of the evaluation function corresponds to a value proportional to two times the pixel values of the difference images ΔFa1 and ΔFb1. Therefore, the difference image ΔFx1 with respect to the high-resolution image Fx1 is calculated by using the value proportional to two times the pixel values of the difference images ΔFa1 and ΔFb1.
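The combination step can be sketched as back-projecting, for each frame, a value proportional to two times the per-pixel difference and summing the results, as below; the pixel-replication back-projection and the array sizes are assumptions standing in for the actual reverse conversion.

```python
import numpy as np

def back_project(diff, scale=2):
    """Crude reverse projection of a low-resolution difference image: pixel replication."""
    return np.kron(diff, np.ones((scale, scale)))

def gradient_image(estimated_lr, actual_lr, scale=2):
    """Difference image for the high-resolution image, proportional to the gradient of the
    squared-error evaluation function (two times the per-frame differences, back-projected)."""
    return sum(back_project(2.0 * (est - obs), scale)
               for est, obs in zip(estimated_lr, actual_lr))

est = [np.full((2, 2), 1.5), np.full((2, 2), 2.0)]   # estimated low-resolution images Fa1, Fb1
obs = [np.full((2, 2), 1.0), np.full((2, 2), 1.8)]   # actual low-resolution images Fa, Fb
print(gradient_image(est, obs))
```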
After the difference image ΔFx1 is generated, as shown in
When the process from the STEP 32 to the STEP 34 is performed repeatedly, the pixel value of the difference image ΔFxn obtained in the STEP 33 decreases, so that the pixel value of the high-resolution image Fxn converges to a pixel value substantially matching the luminance distribution of the subject shown in
When the reconstruction type super-resolution process is performed utilizing the repeating calculation described above, the above-mentioned process from the STEP 31 to the STEP 34 is performed for each of the super-resolution target regions set in the actual low-resolution image. In this case, the process from the STEP 32 to the STEP 34 is performed repeatedly for each of the super-resolution target regions set in the actual low-resolution image, so that the pixel value of the pixel in the high-resolution image corresponding to the super-resolution target pixel is obtained. Then, every pixel of the actual low-resolution image as the datum frame is handled sequentially as the super-resolution target pixel so that the process from the STEP 31 to the STEP 34 is performed with respect to every pixel of the actual low-resolution image as the datum frame. Thus, the pixel values of all the pixels in the high-resolution image can be obtained.
On the other hand, the pixel value x of the high-resolution image can also be obtained by multiplying the pixel value y of the actual low-resolution image by (A^T A + λP^T P)^(−1) A^T in the above equation (A7), as described above in "BACKGROUND OF THE INVENTION", so that the super-resolution process can be realized. More specifically, an FIR (Finite Impulse Response) filter having the elements of the matrix expressed by (A^T A + λP^T P)^(−1) A^T as its filter factors is formed, and the pixel value of the super-resolution target pixel in the low-resolution image is supplied to the FIR filter, so that the pixel value of the high-resolution image can be obtained.
The basic action of the process of obtaining the pixel value of the high-resolution image by using the FIR filter will be described. In the description of the basic action of this process, it is supposed that the super-resolution target region set in the actual low-resolution image is a 3×3 pixel region (see
Under this assumption, although being different from the description in “BACKGROUND OF THE INVENTION”, it is considered that the vector y is a vector of the pixel values of the pixels in four super-resolution target regions set with respect to the four actual low-resolution images, and that the vector x to be obtained from the vector y is a vector of pixel values of the pixels in the 6×6 pixel region (corresponding to the region Rh) on the high-resolution image. Therefore, since 4 (frames)×3 (pixels)×3 (pixels)=36, the vector y is a 36-dimensional vector having 36 pixel values as its vector elements. In addition, since 6 (pixels)×6 (pixels)=36, the vector x is also a 36-dimensional vector having 36 pixel values as its vector elements.
Then, the matrix expressed by (A^T A + λP^T P)^(−1) A^T becomes a matrix having 36×36 elements, so the filter size of the FIR filter having the matrix elements as its filter factors is also 36×36. In other words, the FIR filter is made up of 36×36 matrix elements. The pixel values in the region Xa of the high-resolution image (corresponding to the region Rh in
When the vector having the pixel values of the pixels x[1, 1] to x[6, 6] (i.e., 36 pixel values in total) in the region Xa as its elements is expressed by the vector x, the pixel values of the pixels x[3, 3], x[3, 4], x[4, 3] and x[4, 4] become respectively the 15th, the 16th, the 21st and the 22nd elements constituting the vector x. Therefore, in order to obtain the pixel values of the pixels x[3, 3], x[3, 4], x[4, 3] and x[4, 4], a sum of products should be calculated between the filter factors (elements) of the 15th line, the 16th line, the 21st line and the 22nd line of the FIR filter expressed by (A^T A + λP^T P)^(−1) A^T and the pixel values of the pixels in the super-resolution target regions of the four actual low-resolution images (i.e., the elements of the vector y). For instance, a sum of products is calculated between the 36 filter factors belonging to the 15th line of the FIR filter and the 36 elements forming the vector y, so that the pixel value of the pixel x[3, 3] is obtained.
In this way, in order to obtain the high-resolution image by the super-resolution process using the FIR filter expressed by (A^T A + λP^T P)^(−1) A^T, the filter factors on specific lines of the FIR filter should be calculated for each of the super-resolution target regions. A specific line means a line, among the lines constituting the FIR filter, for calculating the pixel value of the pixel on the high-resolution image corresponding to the super-resolution target pixel. In the example described above, the specific lines correspond to the 15th, 16th, 21st and 22nd lines. Since the positions of the super-resolution target pixel and the super-resolution target region are changed sequentially by the raster scan, the filter factors on the specific lines of the FIR filter are calculated every time they are changed. Then, the sum of products between the filter factors on the specific lines and the pixel values of the pixels in the super-resolution target regions of the four actual low-resolution images is determined, so as to determine the pixel value of the pixel on the high-resolution image corresponding to the super-resolution target pixel.
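The per-region filtering can be sketched as selecting the specific rows of the filter matrix and taking their products with the 36 gathered pixel values, as below; the random matrix merely stands in for (A^T A + λP^T P)^(−1) A^T, and the row indices are the 0-based equivalents of the 15th, 16th, 21st and 22nd lines mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
fir = rng.standard_normal((36, 36))        # 36 x 36 filter factors (toy values)
regions = rng.standard_normal((4, 3, 3))   # 3x3 super-resolution target regions of four frames
y = regions.ravel()                        # the 36-dimensional vector y

specific_rows = [14, 15, 20, 21]           # rows for x[3,3], x[3,4], x[4,3], x[4,4]
target_pixels = fir[specific_rows] @ y     # sum of products -> four high-resolution pixel values
print(target_pixels)
```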
Although the super-resolution process based on the MAP (Maximum A Posterior) method is described above, it is possible to utilize another super-resolution process based on the ML (Maximum-Likelihood) method, the POCS (Projection Onto Convex Set) method or the IBP (Iterative Back Projection) method. For instance, the FIR filter may be formed so as to correspond to performing the repeating calculation of the super-resolution process two times (which will be described later). In this case, it is possible to adopt the ML method without the normalization term (constrained term) based on a prior probability model in order to determine the filter factors of the FIR filter easily, for instance.
Methods for concretely realizing the super-resolution process described above will be described as first to third examples. The items described above are applied to the first to the third examples appropriately. In the following description, it is supposed that the FIR filter for performing the super-resolution process is disposed in the image processing portion 4 and that the filter factors of the FIR filter are stored. The image processing portion 4 shown in
A structure of the image processing portion 4 according to the first example of the present invention will be described with reference to
The image processing portion 4 shown in
The filter process in the filter portion 47 is realized by the FIR filter that is provided to the filter portion 47. In the first example, the FIR filter includes elements of the matrix expressed by (A^T A + λP^T P)^{-1} A^T as its filter factors.
When the high-resolution image is generated based on the actual low-resolution images of the plurality of frames from the AFE 2 in the image processing portion 4 having the structure described above, each of the blocks disposed in the image processing portion 4 works, so that the super-resolution process is performed for each of the super-resolution target regions as described above. In order to realize this operation, the pixel values of the actual low-resolution images of a plurality of frames are read out from the frame memory 41 for each of the super-resolution target regions, which will be described later in detail.
In this case, the filter factors corresponding to the amount of motion between the super-resolution target regions of different frames are given to the filter portion 47 by the filter factor storage portion 46. The filter process based on the given filter factors is performed on the pixel values read out from the frame memory 41, so that the super-resolution process is performed for each of the super-resolution target regions. Then, the pixel values obtained by the super-resolution process performed for each of the super-resolution target regions are supplied to the frame memory 48 and are stored therein as the pixel values of pixels in the high-resolution image.
If the operation for requesting the high resolution processing of the image is not given to the operating portion 15, the image signal converted into the digital signal in the AFE 2 is supplied to the signal processing portion 49 frame by frame. Then, the signal processing portion 49 generates the luminance signal and the color difference signal from the supplied image signal. Then, the obtained luminance signal and color difference signal are supplied to the compression processing portion 6 frame by frame so that the compression processing portion 6 performs a compressing and coding process on the signals.
[Detection of Amount of Motion]
As to the image processing portion 4 having the structure shown in
The first, the second, . . . , the (F−1)th and the F-th actual low-resolution images arranged in time series are obtained sequentially, and the motion amount calculation portion 42 first handles one of two actual low-resolution images that are adjacent on the time base as a reference image and the other as a non-reference image. Then, the motion amount calculation portion 42 detects the amount of motion between the two actual low-resolution images that are adjacent on the time base (i.e., detects the amount of motion between neighboring frames). This detection is performed sequentially between the first and the second actual low-resolution images, between the second and the third actual low-resolution images, . . . , and between the (F−1)th and the F-th actual low-resolution images. Next, a sum of the detected amounts of motion is determined so that the amount of motion between two actual low-resolution images that are not adjacent on the time base is determined. Thus, the amount of motion between the datum frame and each of the consulted frames can be detected. For instance, if the datum frame is the first actual low-resolution image and the amount of motion between that image and the third actual low-resolution image as the consulted frame is to be determined, a sum of the amount of motion between the first and the second actual low-resolution images and the amount of motion between the second and the third actual low-resolution images should be determined.
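A minimal sketch of this accumulation is given below, assuming each frame-to-frame amount of motion is represented as a (horizontal, vertical, rotational) triple; the representation and the simple addition of the rotational component are assumptions made only for illustration.

    # Hypothetical motion between frame k and frame k+1: (horizontal, vertical, rotation).
    adjacent_motion = [(0.4, -0.2, 0.01), (0.3, 0.5, -0.02), (-0.1, 0.2, 0.00)]

    def motion_between(datum_index, consulted_index, adjacent_motion):
        """Sum the adjacent-frame motions from the datum frame to the consulted frame."""
        dx = dy = dtheta = 0.0
        for k in range(datum_index, consulted_index):
            mx, my, mt = adjacent_motion[k]
            dx, dy, dtheta = dx + mx, dy + my, dtheta + mt
        return dx, dy, dtheta

    # Amount of motion between the first (index 0) and the third (index 2) actual low-resolution image.
    print(motion_between(0, 2, adjacent_motion))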
Note that it is possible to handle the actual low-resolution image that is the datum frame as the reference image and to handle any actual low-resolution image that is the consulted frame as the non-reference image, and then to determine the amount of motion between the reference image and the non-reference image so that the amount of motion between the datum frame and each of the consulted frames can be determined directly.
The motion amount calculation portion 42 detects the amount of motion between the reference image and the non-reference image forming the two actual low-resolution images that are adjacent on the time base, and the objects of the detection include the amount of motion in the translational direction and the amount of motion in the rotational direction. It is possible to adopt any known method as the method of detecting the amounts of motion in the translational and the rotational directions.
The amount of motion to be detected has a so-called sub-pixel resolution, which is finer than the pixel interval of the actual low-resolution image. In other words, the amount of motion is detected with a minimum unit of distance shorter than the space between two neighboring pixels in the actual low-resolution image. Therefore, the process for detecting the amount of motion between the reference image and the non-reference image can be considered to include a motion amount detection process with a pixel unit and a motion amount detection process with a sub pixel unit. It is possible to perform the latter process after the former process by using a result of the former process. As the method of the motion amount detection process with a pixel unit and the motion amount detection process with a sub pixel unit, any known method can be adopted. An example of the processes is as described below. Note that the amount of motion to be detected in the example of the motion amount detection processes with a pixel unit and with a sub pixel unit described below is the amount of motion in the translational direction. In order to detect the amount of motion in the rotational direction between the reference image and the non-reference image, the method described in JP-A-11-195125 can be used, for instance.
Motion Amount Detection Process with Pixel Unit
In the motion amount detection process with a pixel unit, a well-known image matching method is used for detecting the amount of motion of the non-reference image with respect to the reference image with a pixel unit. As an example, a case of using a representative point matching method will be described. Of course, it is possible to use a block matching method or the like.
After setting the detection region E and the small region e in this way, an SAD (Sum of Absolute Differences) or an SSD (Sum of Squared Differences) of the pixel values (luminance values) between the reference image and the non-reference image is calculated for each of the detection regions in accordance with the representative point matching method. Using a result of this calculation, a sampling point S having the highest correlation with the representative point R is determined for each of the detection regions, and a position variation quantity of the sampling point S with respect to the representative point R is determined with a pixel unit. Then, a mean value of the position variation quantities determined for the detection regions is detected as the amount of motion between the reference image and the non-reference image with a pixel unit.
More specifically, the following processes are performed. The small region e in the reference image and the small region e in the non-reference image that is located at the same position are noted. Then, a difference between the pixel value of the representative point R in the noted small region e in the reference image and the pixel value of the sampling point S in the noted small region e in the non-reference image is determined as a correlation value for each of the sampling points S. After that, for each detection region E, the correlation values with respect to the sampling points S having the same relative position to the representative point R are accumulated over the small regions e belonging to that detection region E. If the number of the small regions e belonging to one detection region E is 48, the 48 correlation values are accumulated so that one cumulative correlation value is determined with respect to one sampling point S. In this way, as many cumulative correlation values as the number of sampling points S set in one small region e are determined for each of the detection regions E.
In this way, after a plurality of cumulative correlation values are determined for each of the detection regions E (i.e., the cumulative correlation values with respect to each of the sampling points S), a minimum value among the plurality of cumulative correlation values is detected for each of the detection regions E. The correlation between the representative point R and the sampling point S corresponding to the minimum value is considered to be higher than correlations with respect to other sampling points S. Therefore, the position variation quantity between the representative point R and the sampling point S corresponding to the minimum value is detected as the amount of motion with respect to the noted detection region E. This detection is performed for each of the detection regions E. The amounts of motion found one for each detection region E are averaged to obtain an average value, which is detected as the amount of motion with a pixel unit between the reference image and the non-reference image.
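The pixel-unit detection can be sketched as follows, assuming grayscale images and collapsing the detection regions E and the small regions e into a single list of representative points; this simplification and all names are assumptions for illustration, not the embodiment itself.

    import numpy as np

    def pixel_unit_motion(reference, non_reference, rep_points, search=4):
        """For every candidate displacement (sampling point S), accumulate absolute
        differences over all representative points R and keep the displacement whose
        cumulative correlation value is smallest."""
        h, w = reference.shape
        best, best_value = None, np.inf
        for du in range(-search, search + 1):
            for dv in range(-search, search + 1):
                value = 0.0
                for (ry, rx) in rep_points:
                    sy, sx = ry + dv, rx + du
                    if 0 <= sy < h and 0 <= sx < w:
                        value += abs(float(reference[ry, rx]) - float(non_reference[sy, sx]))
                if value < best_value:
                    best_value, best = value, (du, dv)
        return best  # amount of motion with a pixel unit (horizontal, vertical)

    rng = np.random.default_rng(1)
    ref = rng.integers(0, 255, (64, 64)).astype(np.float32)
    non_ref = np.roll(ref, shift=(2, -1), axis=(0, 1))   # known shift for the test
    points = [(y, x) for y in range(8, 56, 8) for x in range(8, 56, 8)]
    print(pixel_unit_motion(ref, non_ref, points))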
Motion Amount Detection Process with Sub Pixel Unit
After the amount of motion with a pixel unit is detected, the amount of motion with a sub pixel unit is further detected. The sampling point S having the highest correlation with the representative point R determined by the above-mentioned representative point matching method is denoted by SX. Then, the amount of motion with a sub pixel unit is determined for each of the small regions e based on the pixel value of the pixel at the representative point R in the reference image and the pixel values of the pixel at the sampling point SX and its surrounding pixels in the non-reference image, for instance.
This process will be described with reference to
In addition, it is supposed that the pixel value changes linearly from Lb to Lc when the pixel position moves from the sampling point SX in the horizontal direction by one pixel as shown in
After that, the amounts of motion with a sub pixel unit determined for the small regions e are averaged, and the amount of motion obtained by the averaging process is detected as the amount of motion with a sub pixel unit between the reference image and the non-reference image. Then, the amount of motion with a sub pixel unit between the reference image and the non-reference image is added to the amount of motion with a pixel unit between the reference image and the non-reference image, and the sum is detected as the amount of motion between the reference image and the non-reference image to be obtained finally.
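Since the exact interpolation formula depends on the figures, the following one-dimensional sketch only illustrates the linear-change assumption around the sampling point SX; the variable names and the clamping to one pixel are hypothetical.

    def subpixel_offset(l_ref, l_sx, l_next):
        """Under the linear-change assumption, the pixel value varies linearly from l_sx
        (at SX) to l_next (one pixel away); the sub-pixel offset is where this line
        reaches the reference value l_ref, clamped to the range [0, 1] pixel."""
        if l_next == l_sx:          # flat signal: no sub-pixel information
            return 0.0
        t = (l_ref - l_sx) / (l_next - l_sx)
        return min(max(t, 0.0), 1.0)

    # Hypothetical luminance values: representative point R, sampling point SX, pixel right of SX.
    print(subpixel_offset(120.0, 110.0, 130.0))   # 0.5 pixel in the horizontal direction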
Using the method described above, the amount of motion between the two actual low-resolution images that are adjacent on the time base is detected. If the amount of motion between a certain actual low-resolution image and the actual low-resolution image as the datum frame that is not adjacent to it on the time base is to be determined, a sum of the amounts of motion over the actual low-resolution images obtained between the two should be determined as described above. The amount of motion between the datum frame and each of the consulted frames determined as described above is supplied to the motion amount storage portion 43 shown in
[Region Designation]
When the motion amount calculation portion 42 determines the amount of motion between the datum frame and each of the consulted frames as described above, the region designating portion 44 shown in
Hereinafter, it is supposed in the first example that four actual low-resolution images Fa, Fb, Fc and Fd are stored in the frame memory 41, and that the actual low-resolution image Fa is handled as the datum frame while the actual low-resolution images Fb, Fc and Fd are each handled as the consulted frame so that one high-resolution image is generated. In this case, the motion amount storage portion 43 stores the amounts of motion between the image Fa and each of the images Fb, Fc and Fd. When the super-resolution target region is set on the image Fa, the super-resolution target regions on the images Fb, Fc and Fd are set at positions such that they overlap with the super-resolution target region on the image Fa.
In this case, the region designating portion 44 performs the alignment between the image Fa and each of the images Fb, Fc and Fd based on the amount of motion between the image Fa and each of the images Fb, Fc and Fd. More specifically, since the image Fb can be regarded as an image having a position error (displacement) corresponding to the amount of motion between the images Fa and Fb with respect to the image Fa, coordinate values of pixels on the image Fb are converted into coordinate values on the image Fa by a geometric conversion so that the position error is canceled (the same is true for the images Fc and Fd). This conversion realizes the alignment. In this way, the alignment is performed on the actual low-resolution images Fb, Fc and Fd with respect to the image Fa based on the amounts of motion stored in the motion amount storage portion 43, so that the positions of the super-resolution target regions on the images Fb, Fc and Fd, which should be set corresponding to the super-resolution target region set on the image Fa, are specified. In other words, it is possible to recognize the positions of the super-resolution target regions on the images Fb, Fc and Fd, each of which has an amount of motion smaller than one pixel between itself and the super-resolution target region on the image Fa.
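A hedged sketch of this geometric conversion is shown below, assuming the detected amount of motion is a translation (dx, dy) plus a small rotation θ about the image origin; the order in which the rotation and the translation are cancelled is an assumption for illustration.

    import math

    def to_datum_coordinates(xb, yb, motion):
        """Map a pixel position on a consulted frame (e.g. Fb) onto the coordinate system
        of the datum frame Fa by cancelling the detected motion (dx, dy, theta)."""
        dx, dy, theta = motion
        xr = math.cos(-theta) * xb - math.sin(-theta) * yb   # undo the rotation
        yr = math.sin(-theta) * xb + math.cos(-theta) * yb
        return xr - dx, yr - dy                              # undo the translation

    print(to_datum_coordinates(10.0, 20.0, (0.4, -0.2, 0.01)))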
As described above in “Basic action of the super-resolution process”, the position of the super-resolution target region set on the image Fa is changed sequentially by the raster scan, so the positions of the super-resolution target regions on the images Fb, Fc and Fd are also changed along with the change of the super-resolution target region set on the image Fa. However, as described above, when the position of the super-resolution target region on the image Fa is scanned in the horizontal direction, the position of the super-resolution target region on the image Fb, Fc or Fd can move not only in the horizontal direction but also in the vertical direction.
In this way, the region designating portion 44 designates the super-resolution target region for performing the super-resolution process with reference to each of the F actual low-resolution images (F=4 in this example) stored in the frame memory 41. On this occasion, the region address in the frame memory 41 storing the pixel value of the pixel in the designated super-resolution target region is set. The region designating portion 44 informs the region cutting out portion 45 about the region address set for each of the F actual low-resolution images.
The region cutting out portion 45 reads out the pixel value stored in the region address from the frame memory 41, so as to read out the pixel value of the pixel in the super-resolution target region in each of the F actual low-resolution images to be used for the super-resolution process. In other words, the region cutting out portion 45 reads out the pixel value of the pixel in the super-resolution target region in each of the images Fa to Fd.
The above description of “Basic action of the super-resolution process” exemplifies that the super-resolution target region is a 3×3 pixel region, but accuracy of the super-resolution process is insufficient with the 3×3 pixel region size. In order to enhance the accuracy of the super-resolution process to a sufficient extent, it is necessary to increase the size of the super-resolution target region to a size of approximately a 10×10 pixel to 20×20 pixel region. If the super-resolution target region set for each of the images Fa to Fd is a 10×10 pixel to 20×20 pixel region, the region cutting out portion 45 will read out 400 to 1600 pixel values at one time from the frame memory 41. In the following description of the first example, it is supposed appropriately that the super-resolution target region is a 10×10 pixel region.
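A minimal sketch of the read-out performed by the region cutting out portion 45 follows, assuming each region address is given as the top-left corner of a 10×10 super-resolution target region; the frame sizes and addresses are placeholders.

    import numpy as np

    def cut_out_regions(frames, top_left_per_frame, size=10):
        """Read the size x size super-resolution target region from each stored
        actual low-resolution image, given one region address per frame."""
        return [frame[top:top + size, left:left + size]
                for frame, (top, left) in zip(frames, top_left_per_frame)]

    rng = np.random.default_rng(2)
    frames = [rng.integers(0, 255, (480, 640)) for _ in range(4)]    # Fa, Fb, Fc, Fd
    addresses = [(100, 200), (101, 200), (100, 201), (101, 201)]     # hypothetical region addresses
    regions = cut_out_regions(frames, addresses)
    print(sum(r.size for r in regions))   # 4 x 10 x 10 = 400 pixel values read out at one time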
[Setting of Filter Factor]
As described above, the region designating portion 44 designates the region address in the frame memory 41 storing the pixel value to be read out from the frame memory 41 by the region cutting out portion 45. On this occasion, the alignment between the actual low-resolution images is performed as described above based on the amounts of motion stored in the motion amount storage portion 43. Then, the super-resolution target region is set for each of the actual low-resolution images after the alignment, so that a size of the amount of motion (position error amount) between the super-resolution target regions becomes smaller than a size of one pixel on the actual low-resolution image.
The region designating portion 44 confirms the amount of motion between the super-resolution target region in the datum frame and the super-resolution target region in the consulted frame generated after the alignment, i.e., a position error amount between the position (center position) of the super-resolution target region in the datum frame and the position (center position) of the super-resolution target region in the consulted frame. A size of this amount of motion is smaller than a size of one pixel on the actual low-resolution image as described above. Hereinafter, the amount of motion (position error amount) between the super-resolution target region in the datum frame and the super-resolution target region in the consulted frame generated after the alignment is referred to as an “amount of motion smaller than one pixel between the super-resolution target regions”. The amount of motion smaller than one pixel between the super-resolution target regions is transmitted to the filter factor storage portion 46. Based on the amount of motion smaller than one pixel between the super-resolution target regions that is confirmed by the region designating portion 44, a filter factor corresponding to the amount of motion is read out from the filter factor storage portion 46. Then, the read-out filter factor is supplied to the filter portion 47 as a filter factor of the FIR filter to be used for the super-resolution process (hereinafter also referred to as an FIR filter factor).
It is supposed that the super-resolution target region is an M×N pixel region and that the enlargement ratios of the resolution of the high-resolution image with respect to the low-resolution image are V and H times in the vertical and horizontal directions, respectively. Then, the FIR filter to be used for the super-resolution process is a filter expressed by a matrix of (M×N×F)×(M×V×N×H) elements. Here, M and N are natural numbers, which are usually integers of three or larger. In addition, V and H satisfy “V>1” and “H>1”, and are typically integers of two or larger, for instance. Furthermore, F indicates the number of actual low-resolution images to be used for the super-resolution process as described above. Then, if the pixel in the Mx×Nx pixel region positioned at the middle of the super-resolution target region in the actual low-resolution image as the datum frame is regarded as the super-resolution target pixel, the pixels in the high-resolution image corresponding to the super-resolution target pixel are to be positioned in a (Mx×V)×(Nx×H) pixel region (Mx and Nx are integers of one or larger).
Therefore, the filter portion 47 is not required to perform the calculation by using all the (M×N×F)×(M×V×N×H) filter factors constituting the FIR filter as described above in “Basic concept of super-resolution process”. Instead, the filter portion 47 should use (M×N×F)×(Mx×V)×(Nx×H) FIR filter factors for calculating pixel values in the (Mx×V)×(Nx×H) pixel region on the high-resolution image from (M×N×F) pixel values on the F actual low-resolution images.
On the other hand, concrete values of the (M×N×F)×(Mx×V)×(Nx×H) FIR filter factors should be changed corresponding to the amount of motion smaller than one pixel between the super-resolution target regions. In addition, if the F actual low-resolution images include the images Fa, Fb, Fc and Fd as described above, there are the amount of motion mab smaller than one pixel between the super-resolution target regions with respect to the images Fa and Fb, the amount of motion mac smaller than one pixel between the super-resolution target regions with respect to the images Fa and Fc, and the amount of motion mad smaller than one pixel between the super-resolution target regions with respect to the images Fa and Fd, as the amounts of motion smaller than one pixel between the super-resolution target regions. Therefore, values corresponding to the amounts of motion mab, mac and mad are stored in the filter factor storage portion 46 as the (M×N×F)×(Mx×V)×(Nx×H) FIR filter factors for calculating the pixel values of the pixels on the high-resolution image corresponding to the super-resolution target pixel. When the filter factor storage portion 46 recognizes a combination of the amounts of motion mab, mac and mad, the (M×N×F)×(Mx×V)×(Nx×H) FIR filter factors corresponding to the combination are read out.
Therefore, if “M=N=10”, “F=4”, “V=2” and “H=2” hold, for instance, the FIR filter of the filter portion 47 is made up of 400×400 (=(10×10×4)×(10×2×10×2)) filter factors. In addition, if the super-resolution target pixel in the image Fa as the datum frame is one pixel, the pixels on the high-resolution image corresponding to the super-resolution target pixel are positioned in a 2×2 pixel region.
As understood from the above description, if “M=N=10, F=4, V=2 and H=2” holds, the pixel values of the pixels on the high-resolution image corresponding to the super-resolution target pixel can be calculated by determining a sum of products between pixel values of total 400 pixels in the super-resolution target regions in the images Fa to Fd and the filter factors (elements) on a specific line of the FIR filter. Here, the specific line means a line in the FIR filter storing the filter factors with respect to four pixels on the high-resolution image corresponding to the super-resolution target pixel. Therefore, as to the FIR filter of the filter portion 47, the number of filter factors necessary for calculating the pixel values of the pixels on the high-resolution image corresponding to the super-resolution target pixel is 1600 (400×4 lines) since the number of filter factors belonging to one line is 400. Therefore, if “M=N=10, F=4, V=2 and H=2” holds, the filter portion 47 is provided with 1600 filter factors.
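The dimension arithmetic of this example can be checked with a few lines; the variable names are illustrative only.

    M = N = 10      # super-resolution target region: 10 x 10 pixels
    F = 4           # number of actual low-resolution images
    V = H = 2       # resolution enlargement ratios
    Mx = Nx = 1     # one super-resolution target pixel

    inputs_per_region = M * N * F                        # 400 pixel values (vector y)
    full_filter_size = (M * N * F) * (M * V * N * H)     # 400 x 400 factors in the full FIR filter
    output_pixels = (Mx * V) * (Nx * H)                  # 4 pixels on the high-resolution image
    needed_factors = inputs_per_region * output_pixels   # 1600 factors actually used

    print(inputs_per_region, full_filter_size, output_pixels, needed_factors)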
In addition, since the super-resolution target region is designated after the alignment between the actual low-resolution images, the amount of motion between the super-resolution target regions is sufficiently small even if the amount of motion between the actual low-resolution images is large. In addition, since the super-resolution target region is a region that is sufficiently smaller than the entire image region of the actual low-resolution image, the rotational component of the amount of motion between the super-resolution target regions becomes very small even if the amount of motion between the actual low-resolution images includes a rotational component that cannot be omitted. Therefore, the amount of motion between the super-resolution target regions can be regarded to have only the translational component.
In summary, a size of the amount of motion between the super-resolution target regions of the datum frame and any one of the consulted frames is smaller than a size of one pixel on the actual low-resolution image (more specifically, sizes of the horizontal component and the vertical component of the amount of motion between the super-resolution target regions are smaller than sizes of one pixel in the horizontal and the vertical directions on the actual low-resolution image, respectively). Further, the amount of motion between the super-resolution target regions of the datum frame and any one of the consulted frames can be regarded to have only the translational component.
Therefore, a positional relationship between the super-resolution target region in the datum frame and the super-resolution target region in the consulted frame can be defined only by the horizontal component and the vertical component of the amount of motion between the super-resolution target regions. In addition, the sizes of the horizontal component and the vertical component of the amount of motion can be detected by quantizing them into α and β steps, respectively. More specifically, as shown in
When the quantization as shown in
Therefore, if “F=4” and “α=β=5” hold like the example shown in
In this way, the filter factor storage portion 46 stores FIR filter factors corresponding to combinations of the amount of motion between the super-resolution target regions of the F actual low-resolution images (in other words, combinations of the positional relationship between the super-resolution target regions of the F actual low-resolution images). The region designating portion 44 calculates the amount of motion between the super-resolution target regions of the datum frame and each of the consulted frames. When they are supplied to the filter factor storage portion 46, the FIR filter factor corresponding to the combination of the amount of motion is read out from the filter factor storage portion 46 and is supplied to the filter portion 47.
In this case, if the filter factor storage portion 46 stores the FIR filter factors considering the redundancy of the positional relationship of the super-resolution target regions between the consulted frames, the order of the lines to which the read-out FIR filter factors are assigned may be changed, in accordance with the combination of the amounts of motion, into an order corresponding to the order of the actual low-resolution images supplied to the filter portion 47. Alternatively, instead of changing the order of the FIR filter factors, the order of the actual low-resolution images supplied to the filter portion 47 may be changed in accordance with the arrangement of the read-out FIR filter factors.
On the other hand, if the redundancy of the positional relationship of the super-resolution target regions between the consulted frames is not taken into account concerning the FIR filter factors stored in the filter factor storage portion 46, the FIR filter factors to be used are determined uniquely by the combination of the positional relationships (the combination of the amounts of motion). In this case, therefore, the FIR filter factors are stored in the filter factor storage portion 46 for each of the combinations, and the FIR filter factors that are unique to the combination of the amounts of motion are read out from the filter factor storage portion 46 and are supplied to the filter portion 47 with respect to every combination of the amounts of motion indicating the positional relationship of the super-resolution target regions between the consulted frames.
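One possible organization of the filter factor storage portion 46 is a table keyed by the quantized amounts of motion, sketched below; the quantization rule (assuming the sub-pixel motion components are normalized into [0, 1)), the placeholder factors and the dictionary layout are assumptions for illustration.

    import numpy as np

    ALPHA = BETA = 5   # quantization steps of the horizontal / vertical motion components

    def quantize(motion, alpha=ALPHA, beta=BETA):
        """Quantize an amount of motion smaller than one pixel into alpha x beta steps."""
        dx, dy = motion
        return (min(int(dx * alpha), alpha - 1), min(int(dy * beta), beta - 1))

    rng = np.random.default_rng(3)
    factor_table = {}

    def factors_for(m_ab, m_ac, m_ad):
        """Return one set of FIR filter factors per combination of the quantized motions
        between the datum frame and the consulted frames Fb, Fc and Fd."""
        key = (quantize(m_ab), quantize(m_ac), quantize(m_ad))
        if key not in factor_table:                        # placeholder factors for the sketch
            factor_table[key] = rng.standard_normal((4, 400))
        return factor_table[key]

    w = factors_for((0.2, 0.6), (0.7, 0.1), (0.4, 0.9))
    print(w.shape)   # 4 specific lines x 400 factors = 1600 filter factors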
[Super-Resolution Calculation Process]
When the FIR filter factors stored in the filter factor storage portion 46 are supplied to the filter portion 47, only the necessary FIR filter factors among those constituting the above-mentioned FIR filter expressed by (A^T A + λP^T P)^{-1} A^T are supplied. The necessary FIR filter factors mean the FIR filter factors necessary for calculating the pixel values of the pixels disposed at the pixel positions on the high-resolution image corresponding to the pixel position of the super-resolution target pixel, which correspond to the filter factors on the above-mentioned specific line. On the other hand, pixel values of the super-resolution target region corresponding to the region address designated by the region cutting out portion 45 are supplied to the filter portion 47 sequentially from the frame memory 41, and the filter portion 47 calculates the sum of products between the supplied pixel values and the filter factors from the filter factor storage portion 46.
More specifically, the filter portion 47 performs the calculation according to the above equation (A7), i.e., the equation “x = (A^T A + λP^T P)^{-1} A^T y”, with respect to the super-resolution target regions of the actual low-resolution images, so as to calculate the pixel values x of the high-resolution image. Here, y is the vector of the pixel values of the pixels in the super-resolution target regions set with respect to the images Fa, Fb, Fc and Fd, and x is the vector of the pixel values of the pixels on the high-resolution image corresponding to the super-resolution target pixel. Upon this calculation, the FIR filter factors supplied to the filter portion 47 from the filter factor storage portion 46 are the above-mentioned necessary filter factors. In other words, only the FIR filter factors of the lines corresponding to the pixels arranged at the pixel positions on the high-resolution image corresponding to the super-resolution target pixel are supplied to the filter portion 47. The pixel values of the high-resolution image calculated in this way are supplied to the frame memory 48 and are stored therein. On this occasion, the address position on the frame memory 48 is designated so that each calculated pixel value is stored at the address position corresponding to the pixel position, on the high-resolution image, at which the pixel value is calculated.
Note that since the amount of motion indicating the positional relationship between the super-resolution target regions is detected by quantization as shown in
In order to realize this method, the following process is performed, for instance. The amount of motion indicating the positional relationship between the super-resolution target regions is detected with a unit of the split region shown in
The above-mentioned operations of individual blocks in the image processing portion 4 shown in
This process is performed repeatedly, so that the pixel values of the pixel positions on the high-resolution image corresponding to the super-resolution target pixel are obtained in turn. Then, when the above-mentioned calculation process is finished for every pixel in the datum frame as the super-resolution target pixel, the pixel values of all the pixels constituting the high-resolution image are obtained and are stored in the frame memory 48.
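Putting the steps together, the raster-scan flow of the first example can be sketched as below; the two callbacks standing in for the region designating portion 44 and the filter factor storage portion 46, and the simplified mapping from the super-resolution target pixel to the output coordinates, are assumptions rather than the embodiment itself.

    import numpy as np

    def super_resolve(frames, get_region_addresses, get_factors, out_h, out_w, V=2, H=2, size=10):
        """For every super-resolution target pixel of the datum frame: designate the target
        regions, cut out their pixel values, look up the FIR filter factors for the current
        motion combination, and write the resulting V x H high-resolution pixels."""
        high_res = np.zeros((out_h, out_w))
        datum = frames[0]
        for py in range(datum.shape[0] - size):
            for px in range(datum.shape[1] - size):
                addresses, motions = get_region_addresses(py, px)     # region designating portion
                y = np.concatenate([frames[i][t:t + size, l:l + size].ravel()
                                    for i, (t, l) in enumerate(addresses)])   # region cutting out
                w = get_factors(motions)                              # filter factor storage portion
                x = w @ y                                             # filter portion: sum of products
                high_res[py * V:(py + 1) * V, px * H:(px + 1) * H] = x.reshape(V, H)
        return high_res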
When the high-resolution image is stored in the frame memory 48, the image signal based on the pixel values of the high-resolution image stored in the frame memory 48 is supplied to the signal processing portion 49. The signal processing portion 49 generates the luminance signal and the color difference signal from the image signal indicating the supplied one frame of high-resolution image and sends them to the compression processing portion 6.
Next, a second example of the present invention will be described. The structure of the image processing portion 4 according to the second example of the present invention is the same as that shown in
As described above in “Basic concept of super-resolution process”, when the reconstruction type super-resolution process is performed by the repeating calculation (repeating computational algorithm), the original low-resolution images are estimated from the high-resolution image that is once estimated. Then, the high-resolution image is reconstructed based on a difference between the estimated low-resolution images and the actual low-resolution images so that a value of the derivative ∂E[x]/∂x of the evaluation function E[x] becomes close to zero. Therefore, instead of using the FIR filter expressed by (A^T A + λP^T P)^{-1} A^T in the filter portion 47 as shown in the first example, it is possible to use factors corresponding to the repeating calculation by the reconstruction type super-resolution process as the filter factors of the FIR filter.
The second example exemplifies the case where the FIR filter to be used in the filter portion 47 is made up of factors that are obtained by repeating the calculation by the reconstruction type super-resolution process two times. The following description will be performed by noting a relationship between the FIR filter in this example and the calculation by the reconstruction type super-resolution process. Furthermore, it is supposed in the following description of the second example that one high-resolution image is generated from three actual low-resolution images Fa, Fb and Fc (therefore, F=3), and that the actual low-resolution image Fa is the datum frame.
In the super-resolution process according to the second example, a gradient, i.e., the derivative ∂E[x]/∂x of the evaluation function E[x], is calculated based on the high-resolution image Fx1 that is set initially from the actual low-resolution images Fa, Fb and Fc, and on the actual low-resolution images Fa, Fb and Fc themselves. More specifically, the gradient ∂E[x]/∂x based on the square errors between the estimated low-resolution images Fa1, Fb1 and Fc1 estimated from the high-resolution image Fx1 and the actual low-resolution images Fa, Fb and Fc is calculated in accordance with the equation (A8) below. Here, the pixel values of the high-resolution image Fx1 expressed as a vector are denoted by “x”, and the pixel values of the actual low-resolution images Fa, Fb and Fc expressed as a vector are denoted by “y”.
∂E[x]/∂x = 2A^T(Ax − y) + 2λP^T P x   (A8)
Then, a pixel value x1 (= x − ∂E[x]/∂x) of a new high-resolution image Fx2 is determined based on this gradient ∂E[x]/∂x. Note that “x1” denotes the pixel values of the high-resolution image Fx2 expressed as a vector. In addition, using the pixel value x1 of the high-resolution image Fx2 determined in this way, a gradient ∂E[x1]/∂x based on the pixel values y of the actual low-resolution images Fa, Fb and Fc is calculated in accordance with the equation (A9) below. When this gradient ∂E[x1]/∂x is subtracted from the pixel value x1 of the high-resolution image Fx2, the pixel value x2 (= x1 − ∂E[x1]/∂x) of the high-resolution image Fx3, obtained by two updating actions, is determined. Note that “x2” denotes the pixel values of the high-resolution image Fx3 expressed as a vector.
∂E[x1]/∂x = 2A^T(A x1 − y) + 2λP^T P x1   (A9)
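A minimal numerical sketch of these two updating actions is shown below, assuming A, P, λ and an initial high-resolution image x0 are given as dense arrays; no step size is applied, matching the plain subtraction written above, and all sizes are placeholders.

    import numpy as np

    def update_twice(A, P, y, x0, lam=0.1):
        """x1 = x0 - dE[x0]/dx and x2 = x1 - dE[x1]/dx, with the gradient of equations
        (A8)/(A9): dE[x]/dx = 2 A^T (A x - y) + 2 lam P^T P x."""
        def gradient(x):
            return 2.0 * A.T @ (A @ x - y) + 2.0 * lam * P.T @ P @ x
        x1 = x0 - gradient(x0)
        x2 = x1 - gradient(x1)
        return x2

    rng = np.random.default_rng(4)
    A = rng.standard_normal((12, 16)) * 0.1   # 12 low-resolution samples, 16 high-resolution pixels
    P = rng.standard_normal((16, 16)) * 0.1
    y = rng.standard_normal(12)
    x0 = rng.standard_normal(16)
    print(update_twice(A, P, y, x0).shape)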
The FIR filter for performing this calculation is provided to the filter portion 47 according to the second example. Hereinafter, the calculation method of the FIR filter factors constituting the FIR filter will be described with reference to the drawings. Furthermore, it is supposed for a simple description below that the high-resolution image having a resolution two times higher than the resolution of the actual low-resolution image in each of the horizontal and the vertical directions is generated by the super-resolution process (i.e., H=V=2). In addition, it is supposed that each of the actual low-resolution images Fb and Fc selected as the consulted frames from the three actual low-resolution images stored in the frame memory 41 has a position error with respect to the actual low-resolution image Fa as the datum frame in one of the horizontal direction and the vertical direction, and that a unit of the size of this position error (i.e., the size of the amount of motion) is a pixel unit in the high-resolution image. In other words, it is supposed that the amount of motion between the image Fb and the image Fa is an amount of motion in the horizontal or the vertical direction of the image and that the size of the amount of motion is an integer multiple of the adjacent pixel interval of the high-resolution image. The same is true for the amount of motion between the image Fc and the image Fa.
In addition, it is supposed that a point spread function (hereinafter referred to as PSF) for generating the estimated low-resolution images from the high-resolution image is made up of a filter having a 3×3 filter size (a blur filter) 250 as shown in
Thus, when the filter 250 is exerted on the noted pixel x[p, q], the pixel values x[p−1, q−1], x[p, q−1], x[p+1, q−1], x[p−1, q], x[p, q], x[p+1, q], x[p−1, q+1], x[p, q+1] and x[p+1, q+1] are multiplied by the factors k11, k21, k31, k12, k22, k32, k13, k23 and k33, respectively.
Furthermore, in symbols indicating pixel such as the noted pixel, a left symbol in a square bracket “[ ]” (e.g., p in x[p, q]) denotes a horizontal position of the pixel. The horizontal position of the pixel goes to right as a value of the symbol increases. In symbols indicating pixel such as the noted pixel, a right symbol in a square bracket “[ ]” (e.g., q in x[p, q]) denotes a vertical position of the pixel. The vertical position of the pixel goes downward as a value of the symbol increases. The same is true on the symbols ya[p, q] and the like that will be described later.
Further, it is supposed that the amount of motion between the actual low-resolution image Fa and the actual low-resolution image Fb is an amount of motion in the horizontal direction and that a size of the amount of motion corresponds to one pixel of the high-resolution image. In addition, it is supposed that the amount of motion between the actual low-resolution image Fa and the actual low-resolution image Fc is an amount of motion in the vertical direction and that a size of the amount of motion corresponds to one pixel of the high-resolution image. When the adjacent pixel interval of the actual low-resolution image is denoted by ΔS, a size of the amount of motion between the images Fa and Fb as well as a size of the amount of motion between the images Fa and Fc, which corresponds to a width of one pixel of the high-resolution image (a width in the horizontal or the vertical direction), is denoted by ΔS/2.
More specifically, it is supposed that the image Fb has a position error with respect to the image Fa in the horizontal direction (specifically, in the right direction) by one pixel of the high-resolution image (i.e., by ΔS/2) as shown in
Then, if a pixel ya[1, 1] of the actual low-resolution image Fa and a pixel x[1, 1] of the initial high-resolution image Fx1 overlap with each other at their center positions as shown in
Hereinafter, it is supposed that the positional relationship between each of the pixels of the actual low-resolution images Fa, Fb and Fc and each of the pixels of the initial high-resolution image Fx1 have the relationship shown in
Pixel Value Deriving Process of Estimated Low-Resolution Image (First Element Process)
A process for deriving a pixel value of the estimated low-resolution image (hereinafter also referred to as a first element process) will be described. Pixel values A[p, q], B[p, q] and C[p, q] of each of the estimated low-resolution images Fa1, Fb1 and Fc1 estimated from the initial high-resolution image Fx1 are expressed by the equations (B1) to (B3) below.
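Since equations (B1) to (B3) are tied to the figures, the following sketch only illustrates the idea of the first element process: blur the current high-resolution image with the 3×3 PSF and subsample it onto the low-resolution grid (H = V = 2). The PSF values, the grid alignment and the shift handling are assumptions.

    import numpy as np

    def estimate_low_resolution(high_res, psf, dx=0, dy=0):
        """Shift the high-resolution image by (dx, dy) high-resolution pixels to account
        for the motion of the consulted frame, blur it with the 3x3 PSF, and keep one
        pixel per 2x2 block as the estimated low-resolution image."""
        shifted = np.roll(high_res, shift=(-dy, -dx), axis=(0, 1))
        blurred = np.zeros_like(shifted, dtype=float)
        h, w = shifted.shape
        for j in range(1, h - 1):
            for i in range(1, w - 1):
                blurred[j, i] = np.sum(psf * shifted[j - 1:j + 2, i - 1:i + 2])
        return blurred[::2, ::2]

    psf = np.array([[0.05, 0.1, 0.05],
                    [0.10, 0.4, 0.10],
                    [0.05, 0.1, 0.05]])        # hypothetical blur factors k11..k33
    rng = np.random.default_rng(5)
    fx1 = rng.standard_normal((8, 8))          # stand-in for the initial high-resolution image Fx1
    print(estimate_low_resolution(fx1, psf).shape)   # estimated image, e.g. Fa1: 4 x 4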
Gradient Deriving Process at Each Pixel of High-Resolution Image (Second Element Process)
A process for deriving a gradient at each pixel of the high-resolution image (hereinafter also referred to as a second element process) will be described. When the pixel values of the estimated low-resolution images Fa1 to Fc1 are obtained as described above, the gradient ∂E[x]/∂x with respect to the initial high-resolution image Fx1 is calculated based on a difference between the obtained pixel values and the pixel values of the actual low-resolution images Fa to Fc. Hereinafter, the calculation method of the gradient ∂E[x]/∂x in the case where the noted pixel of the high-resolution image is each of the pixels x[2p−1, 2q−1], x[2p, 2q−1] and x[2p−1, 2q] will be described individually.
First, the calculation method of the gradient ∂E[x]/∂x in the case where the noted pixel is the pixel x[2p−1, 2q−1] will be described.
When the noted pixel is the pixel x[2p−1, 2q−1], the region 261 including 3×3 pixels x[2p−2, 2q−2] to x[2p, 2q] with the center pixel x[2p−1, 2q−1] shown in
More specifically, the following process is performed. As understood from
Second, the calculation method of the gradient ∂E[x]/∂x when the noted pixel is the pixel x[2p, 2q−1] will be described.
When the noted pixel is the pixel x[2p, 2q−1], the region 262 including 3×3 pixels x[2p−1, 2q−2] to x[2p+1, 2q] with the center pixel x[2p, 2q−1] shown in
More specifically, the following process is performed. As understood from
Third, the calculation method of the gradient ∂E[x]/∂x when the noted pixel is the pixel x[2p−1, 2q] will be described.
When the noted pixel is the pixel x[2p−1, 2q], the region 263 including 3×3 pixels x[2p−2, 2q−1] to x[2p, 2q+1] with the center pixel x[2p−1, 2q] shown in
More specifically, the following process is performed. As understood from
Pixel Value Updating Process of Each Pixel of High-Resolution Image (Third Element Process)
A process for updating the pixel value of each pixel of the high-resolution image (hereinafter referred to as a third element process) will be described. After the gradient ∂E[x]/∂x at each pixel of the high-resolution image is calculated as described above, the calculated gradient is subtracted from the pixel value of the initial high-resolution image so that a pixel value of the updated high-resolution image can be calculated. In other words, the pixel value at the pixel x[p, q] is updated by using the gradient ∂E[x]/∂x_x[p, q] at the pixel x[p, q], so that the pixel value at the pixel x1[p, q] in the high-resolution image Fx2 can be calculated. The symbol x1[p, q] indicates a pixel constituting the image Fx2 or its pixel value. Furthermore, “x1[p, q]=x[p, q]−∂E[x]/∂x_x[p, q]” holds.
When the first to the third element processes described above are performed, the high-resolution image is updated. In the second update, the pixel values of the estimated low-resolution images Fa2 to Fc2 are determined in the first element process based on the above equations (B1) to (B3). On this occasion, instead of the pixel values (x[2p, 2q] and the like) of the high-resolution image Fx1, the pixel values (x1[2p, 2q] and the like) of the high-resolution image Fx2 are used. More specifically, the pixel values of the estimated low-resolution images Fa2, Fb2 and Fc2 are expressed by the equations (B7) to (B9) below, respectively. Similarly to the images Fa1, Fb1 and Fc1, the pixel values of the images Fa2, Fb2 and Fc2 are also expressed by A[p, q], B[p, q] and C[p, q] for convenience sake.
After the pixel values of the estimated low-resolution images Fa2 to Fc2 are determined, the gradient ∂E[x1]/∂x with respect to each pixel in the high-resolution image is calculated based on the above equations (B4) to (B6). Then, this gradient ∂E[x1]/∂x is subtracted from the pixel value of the high-resolution image Fx2 so that the second updating process is performed for generating the high-resolution image Fx3.
As described above, when the calculation process is performed by using the PSF expressed in a 3×3 matrix, the image region IR shown in
Further, the pixel value of the estimated low-resolution image can be obtained by substituting pixel values of the 3×3 pixels of the initial high-resolution image Fx1 into the PSF as described above. On the other hand, the pixel values of the 3×3 pixels of the initial high-resolution image Fx1 are used for obtaining the pixel value of the pixel on the estimated low-resolution image, which is located at the pixel position other than the noted pixel x[p, q] in the reference image region IR. Therefore, in the action of the first updating process, 5×5 pixels x[p−2, q−2] to x[p+2, q+2] of the initial high-resolution image Fx1 located at a position inside a frame 280 shown in
More specifically, in order to generate the high-resolution image Fx2 by updating the initial high-resolution image Fx1 only once, pixel values of the 5×5 pixels x[p−2, q−2] to x[p+2, q+2] of the initial high-resolution image Fx1 are necessary for the noted pixel x[p, q] in the initial high-resolution image Fx1. In addition, the reference image region made up of the 3×3 pixels x[p−1, q−1] to x[p+1, q+1] of the initial high-resolution image Fx1 is noted, and the pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in the reference image region are necessary.
When the high-resolution image Fx2 obtained by the update of one time is further updated, the process is performed by using the pixel values after the update. More specifically, pixel values of 5×5 pixels x1[p−2, q−2] to x1[p+2, q+2] of the high-resolution image Fx2 are necessary for the noted pixel x1[p, q] in the high-resolution image Fx2. In addition, the reference image region made up of 3×3 pixels x1[p−1, q−1] to x1[p+1, q+1] of the high-resolution image Fx2 is noted, and pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in the reference image region are necessary.
However, when the high-resolution image Fx2 is obtained from the initial high-resolution image Fx1 by updating the pixel value, pixel values of 5×5 pixels of the initial high-resolution image Fx1 and pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in the reference image region of the initial high-resolution image Fx1 are necessary for each pixel. More specifically, the pixel values of 5×5 pixels x1[p−2, q−2] to x1[p+2, q+2] of the high-resolution image Fx2 that are used for the noted pixel x1[p, q] in the high-resolution image Fx2 are calculated in the first updating process by using pixel values of the 5×5 pixels of the initial high-resolution image Fx1 and pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in the reference image region of the initial high-resolution image Fx1.
Therefore, if the initial high-resolution image Fx1 is updated two times, the noted pixel x[p, q] is further updated by using the updated pixel values of the 5×5 pixels x[p−2, q−2] to x[p+2, q+2]. In other words, this updating process is performed by using pixel values of 5×5 pixels of the initial high-resolution image Fx1 with the center pixel that is each of the 5×5 pixels x[p−2, q−2] to x[p+2, q+2]. Further, in this updating process, reference image regions of the initial high-resolution image Fx1 when each of the 5×5 pixels x[p−2, q−2] to x[p+2, q+2] is regarded as the noted pixel (total 25 reference image regions) are noted, and pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in their reference image regions are also used. Therefore, if the initial high-resolution image Fx1 is updated two times, pixel values of 9×9 pixels x[p−4, q−4] to x[p+4, q+4] of the initial high-resolution image Fx1 positioned in the solid line frame 291 shown in
Therefore, the FIR filter constituting the filter portion 47 shown in
Factors of this FIR filter can be obtained as described below. First, the pixel values of the estimated low-resolution images obtained based on the above equations (B1) to (B3) are substituted into the equations (B4) to (B6) so that the gradient is determined. After that, the pixel values of the high-resolution image updated by the gradient based on the equations (B4) to (B6) are substituted into equations (B7) to (B9), and the obtained pixel values of the estimated low-resolution image are substituted into the equations (B4) to (B6) so that the new gradient (the second gradient) is determined. Then, expanding a subtract equation for updating the high-resolution image by using the new gradient, factors by which the “pixel values of 9×9 pixels of the initial high-resolution image Fx1” and the “pixel values of the pixels of the actual low-resolution images Fa to Fc positioned in the image region 292 of the initial high-resolution image Fx1” are multiplied, are determined. The determined factors can be obtained as the factors of the FIR filter.
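As a hedged numerical sketch of this expansion (not the embodiment's exact derivation), the two updating actions can be composed into a single linear operator acting on y, assuming the initial high-resolution image is itself obtained from y by a linear interpolation B; the matrices below are placeholders.

    import numpy as np

    def two_step_filter(A, P, B, lam=0.1):
        """With M = I - 2 A^T A - 2 lam P^T P, one update is x' = M x + 2 A^T y, so two
        updates give x2 = M^2 x0 + (M + I) 2 A^T y. If x0 = B y, the whole mapping is
        W y with W = M^2 B + 2 (M + I) A^T; the rows of W are the FIR filter factors."""
        n = A.shape[1]
        M = np.eye(n) - 2.0 * A.T @ A - 2.0 * lam * P.T @ P
        return M @ M @ B + 2.0 * (M + np.eye(n)) @ A.T

    rng = np.random.default_rng(6)
    A = rng.standard_normal((12, 16)) * 0.1   # low-resolution generation model (placeholder)
    P = rng.standard_normal((16, 16)) * 0.1   # constraint (prior) matrix (placeholder)
    B = rng.standard_normal((16, 12)) * 0.1   # initial interpolation from y to x0 (assumption)
    W = two_step_filter(A, P, B)
    print(W.shape)   # 16 x 12: one line of filter factors per high-resolution pixel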
Although it is supposed that an amount of motion with a pixel unit of the high-resolution image is generated between different actual low-resolution images in the above description, it is possible that an amount of motion with a sub pixel unit of the high-resolution image may be generated between them.
Furthermore, although the update quantity when the updating action that is repeated in the above-mentioned reconstruction type super-resolution process is performed two times is calculated in this example, it is possible to adopt a structure in which the update quantity when the updating action is repeated three times or more is calculated. For instance, it is supposed that the update quantity when the updating action in the super-resolution process is repeated HA times is calculated, and that the PSF, which is, so to speak, a blur function, is made up of a (2KA+1)×(2KA+1) matrix. Here, HA is an integer of three or more, and KA is a natural number. In this case, the FIR filter constituting the filter portion 47 shown in
Next, a third example of the present invention will be described. Although the image sensing apparatus having the structure shown in
The display device shown in
When the display device shown in
Then, when an operation requesting the high resolution processing of the image is performed by the operating portion 15, the image processing portion 4 shown in
In addition, as to the image sensing apparatus shown in
The present invention can be applied to an electronic appliance (e.g., an image sensing apparatus or a display device) equipped with the image processing apparatus performing the high resolution processing of an image by the super-resolution process.
According to the present invention, the image region of the actual low-resolution image is divided into relatively small regions, and the high resolution processing is performed for each of the regions obtained by the division. Therefore, the number of factors in the calculation equation for performing the high resolution processing can be reduced compared with the conventional method in which the high resolution processing is performed on all the pixels of the actual low-resolution image at one time. As a result, setting of the factors in the calculation equation can be facilitated, and a quantity of the calculation for the high resolution processing can be reduced. In addition, when a filter is used for performing the high resolution processing, setting of the filter factor can be facilitated.
In addition, when a super-resolution target region is set for each of the actual low-resolution images so as to obtain pixel values of the high-resolution image from a plurality of actual low-resolution images, it is possible to make the image processing apparatus store the filter factors based on a positional relationship between the super-resolution target regions since the number of the filter factors to be set can be reduced. If the filter factors stored in this way are used for performing the filter process, it is possible to perform the high resolution processing of an image easily at a high speed.