This invention relates to a method for simulating an image captured with a long exposure time.
Camera shake occurs while the shutter is open and exposing the image sensor in a digital camera. Any movement of the camera will show up in the image as motion lines, ghost images, and other motion blurs. This often happens in low light, where a longer exposure time is needed to fully expose the image. A long focal length further exaggerates the camera shake. One solution to camera shake is to use a tripod to stabilize the camera. However, this solution is inconvenient as the user has to carry the tripod.
Thus, what is needed is a method that addresses camera shake for a digital camera.
Use of the same reference numbers in different figures indicates similar or identical elements.
In one embodiment of the invention, a method for simulating an image captured at a long exposure time (“simulated image”), includes (1) capturing each of first, second, and third images at a short exposure time, (2) determining a first relative motion between the first and the second images, (3) transforming the first image to remove the first relative motion, (4) determining a second relative motion between the third and the second images, (5) transforming the third image to remove the second relative motion, and (6) combining the first, the second, and the third images to form the simulated image. Relative motions between images are determined by matching blocks at multiple resolutions to determine corresponding points between the images. Transformation to remove relative motion is determined by fitting corresponding points between the images using a minimum square error (MSE) algorithm in a random sample consensus (RANSAC) framework.
In embodiments of the invention, three images are each captured with a short exposure time and then combined to simulate an image captured with a long exposure time. Due to the short exposure time, the three images will have no perceptible motion blur from camera shake. The three images are motion-compensated so the simulated image will not have any motion blur due to the change in the camera position between shots.
In step 102, the processor detects a user attempting to take an image using a long exposure time. In one embodiment, the processor detects that the user has set the exposure time to greater than ⅕ second and has pressed the shutter release button to capture the image.
In step 104, the processor instructs the digital camera to take a number of images each with a short exposure time. In one embodiment, the processor instructs the digital camera to capture three images 302-1, 304-1, and 306-1 (
In step 106, the processor determines the corresponding points in the three images. In one embodiment, the processor selects second image 304-1 as the reference image. The processor compares first image 302-1 with second image 304-1 to match blocks between them, and then compares third image 306-1 with the second image 304-1 to match blocks between them. From the center points of these matching blocks, the processor determines the corresponding points between the two pairs of images.
In step 202, the processor down-samples images 302-1 and 304-1 to two additional resolutions. In one embodiment, images 302-1 and 304-1 are first down-sampled to ½ of their original resolution (shown as images 302-2 and 304-2 in
In step 204, the processor performs block matching between the two images 302-1 and 304-1 at ⅛ resolution. In one embodiment, the processor breaks the images into blocks. For blocks in the current image, the processor searches for the corresponding blocks in the reference image that yield the minimum sum of absolute differences (SAD).
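The SAD search described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the square block shape, the exhaustive search, and the search window given by `start` and `radius` are all assumptions, and images are represented as plain lists of rows of grayscale values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks.

    (cx, cy) is the top-left corner of the block in the current image,
    (rx, ry) the top-left corner of the candidate block in the reference.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def match_block(cur, ref, cx, cy, size, start, radius):
    """Exhaustively search the reference image for the lowest-SAD block.

    The search covers top-left corners within +/- radius pixels of `start`
    (a hypothetical parameter), clamped to the image bounds.
    Returns ((rx, ry), cost) for the best candidate.
    """
    height, width = len(ref), len(ref[0])
    sx, sy = start
    best_pos, best_cost = None, None
    for ry in range(max(0, sy - radius), min(height - size, sy + radius) + 1):
        for rx in range(max(0, sx - radius), min(width - size, sx + radius) + 1):
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (rx, ry), cost
    return best_pos, best_cost
```

The center point of a matched pair of blocks, for example `(rx + size / 2, ry + size / 2)` in the reference image, would then serve as one corresponding point.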
In step 206, the processor performs block matching between the two images 302-1 and 304-1 at ½ resolution. The results of the block matching at ⅛ resolution are propagated to the block matching at ½ resolution. Specifically, the locations of the best matched blocks in reference image 304-1 at ⅛ resolution are used as the starting points for searching in reference image 304-1 at ½ resolution. Once the best matching blocks are located, the processor has identified corresponding pixel points (the center points of the blocks) between images 302-1 and 304-1 at ½ resolution. This correspondence is propagated to images 302-1 and 304-1 at their original resolution.
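The coordinate bookkeeping in this coarse-to-fine propagation can be illustrated with a short sketch. The helper names and the rounding to whole pixels are assumptions made for illustration; only the ⅛ and ½ resolution factors come from the description above.

```python
def propagate_start(coarse_pos, coarse_scale=1 / 8, fine_scale=1 / 2):
    """Map a best-match position found at the coarse resolution to the
    starting point of the search at the finer resolution.

    Going from 1/8 to 1/2 resolution multiplies coordinates by 4.
    """
    factor = fine_scale / coarse_scale
    return (round(coarse_pos[0] * factor), round(coarse_pos[1] * factor))


def block_center_full_res(top_left, block_size, scale=1 / 2):
    """Center point of a matched block, expressed in original-resolution
    pixel coordinates (coordinates at 1/2 resolution are doubled)."""
    cx = (top_left[0] + block_size / 2) / scale
    cy = (top_left[1] + block_size / 2) / scale
    return (cx, cy)
```

For example, a best match at (3, 5) in the ⅛-resolution reference image would start the ½-resolution search at (12, 20), and an 8×8 block matched at (10, 6) at ½ resolution has its center at (28, 20) in the original image.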
Block matching is not performed for the two images at their original resolution because experiments show that block matching at ½ resolution is already sufficient for accurate motion estimation. Furthermore, as even images captured at the short exposure time (e.g., 1/25 sec) have motion blur (although imperceptible to the human eye), block matching at the original resolution may not achieve better performance than block matching at ½ resolution.
Returning to
where xi and yi are the coordinates of a pixel point in first image 302-1 (or third image 306-1); a, b, dx, and dy are the global motion parameters between the first image 302-1 (or third image 306-1) and second image 304-1; and xi′ and yi′ are the coordinates of the pixel point after motion compensation.
The processor then fits the corresponding points determined in step 106 into equation 1 using a minimum square error (MSE) algorithm. To improve the robustness of the motion estimation, the MSE algorithm is incorporated into a random sample consensus (RANSAC) framework.
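A least-squares fit inside a RANSAC loop can be sketched as follows. Because equation 1 is not reproduced in this text, the sketch assumes a four-parameter similarity model, x′ = a·x + b·y + dx and y′ = −b·x + a·y + dy; that form, the function names, the two-point sample size, the iteration count, and the inlier tolerance are all assumptions rather than details of the embodiment.

```python
import random


def fit_similarity(pairs):
    """Least-squares fit of the ASSUMED model
        x' = a*x + b*y + dx,   y' = -b*x + a*y + dy
    to a list of ((x, y), (x', y')) correspondences."""
    n = len(pairs)
    mx = sum(p[0][0] for p in pairs) / n
    my = sum(p[0][1] for p in pairs) / n
    mxp = sum(p[1][0] for p in pairs) / n
    myp = sum(p[1][1] for p in pairs) / n
    s = sa = sb = 0.0
    for (x, y), (xp, yp) in pairs:
        ux, uy, vx, vy = x - mx, y - my, xp - mxp, yp - myp
        s += ux * ux + uy * uy
        sa += ux * vx + uy * vy
        sb += uy * vx - ux * vy
    if s == 0.0:  # degenerate sample: all source points coincide
        return None
    a, b = sa / s, sb / s
    return (a, b, mxp - a * mx - b * my, myp + b * mx - a * my)


def apply_model(model, point):
    """Apply the assumed similarity model to one (x, y) point."""
    a, b, dx, dy = model
    x, y = point
    return (a * x + b * y + dx, -b * x + a * y + dy)


def ransac_similarity(pairs, iterations=200, tol=1.0, seed=0):
    """Fit on random minimal samples, keep the largest consensus set,
    then refit on that set. Returns (model, inliers)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        model = fit_similarity(rng.sample(pairs, 2))
        if model is None:
            continue
        inliers = []
        for src, dst in pairs:
            px, py = apply_model(model, src)
            if (px - dst[0]) ** 2 + (py - dst[1]) ** 2 <= tol * tol:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        return None, []
    return fit_similarity(best_inliers), best_inliers
```

The final refit on the consensus set is what makes the estimate robust: mismatched blocks fall outside the tolerance and never enter the least-squares sums.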
In step 110, first image 302-1 and third image 306-1 are motion compensated so they match second image 304-1.
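One way to apply such a compensation is inverse mapping with nearest-neighbor sampling, sketched below. The model form (x′ = a·x + b·y + dx, y′ = −b·x + a·y + dy) is an assumption, since equation 1 is not reproduced in this text; the function name, the nearest-neighbor interpolation, and the fill value for pixels that map outside the source image are likewise illustrative choices.

```python
def warp_image(image, model, fill=0):
    """Resample `image` so that it aligns with the reference image.

    `model` is (a, b, dx, dy) for the ASSUMED transform
        x' = a*x + b*y + dx,   y' = -b*x + a*y + dy.
    Each output pixel (x', y') is traced back to its source (x, y) by
    inverting the transform, then filled with the nearest source pixel.
    """
    a, b, dx, dy = model
    det = a * a + b * b
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for yp in range(height):
        for xp in range(width):
            tx, ty = xp - dx, yp - dy
            x = (a * tx - b * ty) / det
            y = (b * tx + a * ty) / det
            xi, yi = round(x), round(y)
            if 0 <= xi < width and 0 <= yi < height:
                out[yp][xp] = image[yi][xi]
    return out
```

Inverse mapping is used here so that every output pixel receives exactly one value; forward mapping would leave holes wherever no source pixel lands.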
In step 112, images 302-1, 304-1, and 306-1 are linearly combined and then scaled as follows:
where I(i, j) is the pixel value of a pixel located at (i, j) in the simulated image, I₀(i, j) is the original pixel value of a pixel located at (i, j) in reference image 304-1, N is the number of images captured to generate the simulated image, Iₗ′(i, j) is the pixel value of a pixel located at (i, j) in the l-th of the other captured images after motion compensation, and k is a linear coefficient for scaling the results. In one embodiment where N=3, k is set as 5.0/3.0.
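The combination step can be sketched as below. Since the combining equation itself is not reproduced in this text, the sketch assumes the form I(i, j) = k × (I₀(i, j) + Σ Iₗ′(i, j)), with the sum taken over the N−1 motion-compensated images; that form, the function name, the rounding, and the clipping to an 8-bit range are assumptions.

```python
def combine_images(reference, compensated, k, max_value=255):
    """Linearly combine the reference image with the motion-compensated
    images and scale by k (ASSUMED form: k * (I0 + sum of I_l')).

    `reference` is a list of rows; `compensated` is a list of such images.
    Results are rounded and clipped to max_value (an assumed 8-bit range).
    """
    height, width = len(reference), len(reference[0])
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            total = reference[i][j] + sum(img[i][j] for img in compensated)
            row.append(min(max_value, round(k * total)))
        out.append(row)
    return out
```

With N=3 and k=5.0/3.0, pixel values 10, 12, and 11 at the same location combine to 55, five times their average, which matches a fivefold increase in exposure.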
As the pixel values are amplified by a factor of N×k, it is important to determine the resulting signal-to-noise ratio (SNR) of the simulated image. Suppose the value of a particular point on the image is s and it is corrupted by additive noise n, so that the observed value is r = s + n. If the noise is distributed as n ~ N(0, σ²), where σ is the standard deviation, then the SNR is s²/σ². A linear combination is as follows:
where r′ is the total observed value after combining the images, s′ is the total pixel value after combining the images, and n′ is the total noise value after combining the images.
If motion estimation is perfect, then sₗ = s and the noise terms are independent with nₗ ~ N(0, σ²), so s′ = Ns and n′ ~ N(0, Nσ²). Thus, the SNR is (Ns)²/(Nσ²) = N×(s²/σ²), which is N times the original SNR of each of the three images. Thus, the SNR is increased by combining the images to generate the simulated image.
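The SNR argument above can be written out step by step. The following is a sketch under the stated assumptions of perfect motion estimation and zero-mean noise of variance σ² that is independent from image to image; the index l runs over the N captured images.

```latex
\begin{aligned}
r' &= \sum_{l=1}^{N} r_l
    = \sum_{l=1}^{N} s_l + \sum_{l=1}^{N} n_l
    = s' + n', \\
s' &= N s, \qquad
\operatorname{Var}(n') = \sum_{l=1}^{N} \sigma^2 = N\sigma^2, \\
\mathrm{SNR}' &= \frac{(s')^2}{\operatorname{Var}(n')}
    = \frac{(Ns)^2}{N\sigma^2}
    = N\,\frac{s^2}{\sigma^2}.
\end{aligned}
```

The scaling coefficient k does not affect this result, because it multiplies the signal power by k² and the noise variance by k² alike. For N=3 the improvement is a factor of 3, or about 4.8 dB.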
Various other adaptations and combinations of features of the embodiments disclosed are within the scope of the invention. Numerous embodiments are encompassed by the following claims.