The invention is related to image focusing, and in particular but not exclusively, to a method and device for automatic focus (auto-focus) using depth-from-defocus in point-and-shoot digital cameras.
The standard auto-focusing methods currently used in point-and-shoot digital cameras are collectively called “depth from focus.” In the depth-from-focus method, the whole range of focus positions is scanned (from infinity to the closest possible distance). At each focus position, an image is taken, and a metric quantifying the sharpness of the region in the image on which the camera is to be focused is calculated. The focus position having the highest sharpness metric is then used for acquiring the still image. Some kind of gradient operator is usually employed to define the sharpness metric.
The second class of auto-focus methods is collectively called “depth from defocus.” Unlike depth-from-focus, depth-from-defocus is not used in digital cameras, but is used in academic applications, such as optical measurement instruments or astrophysics. Depth from defocus is a method that estimates the depth map of a scene from a set of two or more images of the scene taken from the same point of view. The images are obtained by varying the camera parameters (typically the focus position, the zoom position, and/or the aperture size/iris). The information about the distance to the object is contained in the blur quantification of the defocused images.
Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following drawings, in which:
Various embodiments of the present invention will be described in detail with reference to the drawings, wherein like reference numerals represent like parts and assemblies throughout the several views. Reference to various embodiments does not limit the scope of the invention, which is limited only by the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the claimed invention.
Throughout the specification and claims, the following terms take at least the meanings explicitly associated herein, unless the context dictates otherwise. The meanings identified below do not necessarily limit the terms, but merely provide illustrative examples for the terms. The meaning of “a,” “an,” and “the” includes plural reference, and the meaning of “in” includes “in” and “on.” The phrase “in one embodiment,” as used herein does not necessarily refer to the same embodiment, although it may. As used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based, in part, on”, “based, at least in part, on”, or “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise.
Briefly stated, a camera performs multiple iterations of a calculation of an estimate of the distance from the camera to an object for auto-focusing. Each estimate is made using depth-from-defocus with at least two images used to make the estimate. When the two images are taken, the camera parameters are different. For example, the focus position, aperture, or zoom may be different between the two images. In each subsequent iteration, the previous estimate of the distance from the camera to the object is used to set the varied parameter (focus position or zoom) closer to the value corresponding to the estimated distance, so that the estimated distance moves closer to the actual distance with each iteration.
As discussed above, one camera parameter (such as focus, zoom, or aperture) may be modified while keeping the other parameters constant. In other embodiments, two or more parameters may be adjusted.
For example, in one embodiment two images are taken with the camera while the focus position v is at two different values, but all of the other parameters remain the same. A portion of the first image and the same portion of the second image are compared and used to estimate the distance from the camera to the object (u). The estimate of the distance u may be based on, for example, the blur widths measured in the two images. The focus position v relates to the distance u according to the lensmaker's equation 1/u+1/v=1/f, where f represents the focal length, and f is fixed.
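The relation 1/u+1/v=1/f can be turned into a pair of conversion helpers. The following is a minimal sketch, not part of the specification; the function names and the requirement that all lengths share one unit are assumptions made for illustration.

```python
def focus_position_for_distance(u: float, f: float) -> float:
    """Return the focus position v that satisfies 1/u + 1/v = 1/f.

    u is the object distance and f the focal length, in the same unit.
    (Hypothetical helper; the name is not taken from the specification.)
    """
    if u <= f:
        raise ValueError("object distance must exceed the focal length")
    return u * f / (u - f)


def distance_for_focus_position(v: float, f: float) -> float:
    """Inverse relation: the object distance u that is in focus at position v."""
    if v <= f:
        raise ValueError("focus position must exceed the focal length")
    return v * f / (v - f)
```

For instance, with f = 1 and u = 3 the first helper gives v = 1.5, and feeding v = 1.5 back into the second helper recovers u = 3.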
In the first iteration, in the embodiment under discussion, two arbitrary focus positions v1 and v2 are used. In some embodiments, to avoid a very high estimation error in the first iteration, the two focus positions should not be placed too close to each other or too close to the ends of the focus range. For example, in some embodiments, the first focus position v1 may be placed at the distance of one-third of the entire focus range from the focus position corresponding to the closest possible distance, while the second focus position v2 may be placed at the distance of one-third of the entire focus range from the focus position corresponding to infinity. After u is obtained, a maximum error Δu is also obtained based on, in one embodiment, estimates of the errors in the blur widths measured in the images. In the next iteration in this embodiment, the first focus position v1 is changed to the value corresponding to u−Δu (again, using the lensmaker's equation 1/u+1/v1=1/f to determine the value of v1 that corresponds to u−Δu), and the second focus position v2 is set to the focus position corresponding to u+Δu. In this embodiment, a comparison of the two images created at the new focus positions is made to calculate a final value for u, which is translated to the final focus value that auto-focuses the camera on the object. Additional iterations may be made if increased accuracy is needed.
Light enters device 100 through lens 112 and is received by sensor 114. In embodiments in which device 100 is a digital camera, sensor 114 may be a charge-coupled device (CCD) sensor, a complementary metal oxide semiconductor (CMOS) sensor, or the like. Sensor 114 provides an image to controller 110.
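The placement of the focus positions described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, v_near and v_inf stand for the focus positions corresponding to the closest possible distance and to infinity, and the check that u−Δu exceeds f is an added assumption.

```python
def initial_focus_positions(v_near: float, v_inf: float) -> tuple:
    """First-iteration positions, each one-third of the focus range from an end.

    v_near: focus position for the closest possible distance.
    v_inf:  focus position for infinity.
    """
    span = v_near - v_inf
    v1 = v_near - span / 3.0  # one-third of the range from the near end
    v2 = v_inf + span / 3.0   # one-third of the range from the infinity end
    return v1, v2


def bracket_focus_positions(u: float, du: float, f: float) -> tuple:
    """Next-iteration positions corresponding to u - du and u + du."""
    if u - du <= f:
        raise ValueError("u - du must exceed the focal length")

    def v_of(distance: float) -> float:
        # Solve 1/distance + 1/v = 1/f for v.
        return distance * f / (distance - f)

    return v_of(u - du), v_of(u + du)
```

With a focus range running from 6 (infinity) to 9 (closest distance), the first helper returns positions 8 and 7, which keeps both away from the ends of the range and from each other.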
Controller 110 may provide various processing for the image received from sensor 114. For example, in some embodiments, controller 110 performs functions such as filtering, interpolation, anti-aliasing, and/or the like. Controller 110 also performs auto-focusing.
In some embodiments, controller 110 includes a processor and a memory, wherein instructions from the memory are loaded into the processor to enable actions to be performed, such as the actions shown in the flow chart of the embodiment illustrated in
Controller 110 is configured to perform iterative auto-focusing. The iterative auto-focusing may be enabled by processor-readable code stored in an article of manufacture including a processor-readable memory in the controller, which enables the actions discussed below when executed by one or more processors. The memory may be RAM, ROM, a disc, or the like. In the iterative auto-focusing, two or more iterations of depth from defocus are used to determine the distance from the device 100 to the object from which the image is taken. In general, the closer the two or more images are to the in-focus image, the more accurate the estimation is. Depth from defocus is used to estimate the distance from the device 100 to the object in each iteration, in which each of the two or more images used in the iteration has one or more parameters varied, such as aperture, zoom, and focus position. For example, in one embodiment, the parameter varied is the focus position. After the first estimation, the lens is moved closer to or actually to the estimated position, and two (or more) additional images are acquired, from which a more accurate estimation of the actual location is calculated.
In one embodiment, controller 110 is configured to perform process 220, illustrated in
The process then advances to block 224, where a second image of the object is obtained using a second set of camera parameters. The second set of parameters is not identical to the first set of parameters. In one embodiment, one of the parameters is changed while the other parameters remain the same. For example, in one embodiment, the second set of parameters is the same as the first set of parameters, except that the focus position is different.
The process then moves to block 226, where a first estimate of a distance from the camera to the object is generated from at least the first image and the second image using depth from defocus. Although not shown in
The process then proceeds to block 228, where a third image of the object is obtained using a third set of camera parameters. The process then advances to block 230, where a fourth image of the object is obtained using a fourth set of camera parameters. The fourth set of parameters is not identical to the third set of parameters. In one embodiment, one of the parameters is changed while the other parameters remain the same. For example, in one embodiment, the fourth set of parameters is the same as the third set of parameters, except that the focus position is different.
The third and fourth sets of parameters are based, in part, on the first estimate. In particular, the camera focus position may be changed to two values equidistant from the value determined by the first estimate. As a result, the new focus positions are closer to the correct focus position, so that the estimate of the distance from the camera to the object generated from them is closer to the actual distance.
The process then moves to block 232, where a second estimate of the distance from the camera to the object is generated from at least the third image and the fourth image using depth from defocus. In some embodiments, more than two images may be used to generate the second estimate. Any suitable depth from defocus method may be used to generate the second estimate, including any depth from defocus method known in the prior art or one of the new depth from defocus methods described herein.
After block 232, the process proceeds to the return block, where other processing is resumed. Although not shown in
The process then advances to block 326, where a first estimate of u (where u represents the distance from the camera to the object) is generated from the first set of images with a depth-from-defocus algorithm. Any suitable depth-from-defocus algorithm may be employed. For example, either of the depth-from-defocus algorithms shown in
The process then proceeds to block 327, where Δu, the maximum error of the estimation of u, is calculated. Examples of the calculation of Δu are described in greater detail below. The process then moves to block 328, where a second set of at least two images of the object is obtained. The images are the same, except for the focus position, which is different in each image. At block 328, the focus position used is the focus position corresponding to u−Δu for the first image, and the focus position corresponding to u+Δu for the second image. The u and Δu referred to are the ones calculated during the previous iteration at blocks 326 and 327. The focus position v relates to u according to the lensmaker's equation:

$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f}$$
Accordingly, the focus position (v) corresponding to u−Δu based on the above equation is used for the first image, and the focus position (v) corresponding to u+Δu based on the above equation is used for the second image. If desired, these may be designated as u1 for the first estimate and Δu1 for the maximum error of the first estimate, to distinguish them from the subsequent estimates. The process then advances to block 332, where a second estimate of u (u2) is generated from the second set of images with a depth from defocus algorithm.
The process then proceeds to block 334, where Δu2 is calculated for the second estimate, i.e. u2. The process then moves to block 336, where a third set of at least two images of the object are obtained. The images are the same, except for the focus position, which is different in each image. At block 336, the focus position (v) used is the focus position corresponding to u2−Δu2 for the first image, and the focus (v) corresponding to u2+Δu2 for the second image. The u2 and Δu2 referred to are the ones calculated during the previous iteration at blocks 332 and 334.
The process then advances to block 338, where u is generated from the third set of images with a depth from defocus algorithm. The process then proceeds to a return block, where other processing is resumed.
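The sequence of blocks above (acquire two images, estimate u and Δu, re-bracket the focus positions at u−Δu and u+Δu, repeat) can be sketched as a loop. This is a sketch under stated assumptions, not the specification's implementation: `capture` and `estimate` are hypothetical callables supplied by the caller, the depth-from-defocus algorithm itself is left pluggable, and the clamp that keeps u−Δu above the focal length is an added safeguard.

```python
from typing import Any, Callable, Tuple


def focus_for_distance(u: float, f: float) -> float:
    """Focus position v satisfying 1/u + 1/v = 1/f."""
    return u * f / (u - f)


def iterative_autofocus(
    capture: Callable[[float], Any],
    estimate: Callable[[Any, Any, float, float], Tuple[float, float]],
    v1: float,
    v2: float,
    f: float,
    iterations: int = 3,
) -> Tuple[float, float]:
    """Run `iterations` depth-from-defocus estimates, re-bracketing each time.

    capture(v) returns an image taken at focus position v (hypothetical).
    estimate(img1, img2, v1, v2) returns (u, du) (hypothetical).
    Returns the last estimate of u and the matching focus position.
    """
    if iterations < 1:
        raise ValueError("at least one iteration is required")
    u = du = 0.0
    for i in range(iterations):
        first, second = capture(v1), capture(v2)
        u, du = estimate(first, second, v1, v2)
        if i + 1 < iterations:
            # Assumption: keep the near bracket beyond the focal length.
            near = max(u - du, 1.001 * f)
            v1 = focus_for_distance(near, f)
            v2 = focus_for_distance(u + du, f)
    return u, focus_for_distance(u, f)
```

The default of three iterations mirrors the three image sets in the flow described above; any estimator that returns a distance and a maximum error can be passed in, including a prior-art one.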
Although a particular embodiment of iterative auto-focusing is illustrated and described above with respect to
It is preferable to perform the iterative auto-focusing with a depth-from-defocus method that is relatively stable, that is, one that provides substantially the same result when repeated. If the method used is not completely accurate with one iteration, that is acceptable, because the accuracy can be improved with repeated iterations, thus giving an accurate result after multiple iterations are performed. Since existing prior art methods of depth-from-defocus are relatively unstable, it is preferred to use a depth-from-defocus method that is more stable than prior art methods, such as those discussed in greater detail below. However, as discussed above, using the iterative auto-focusing with prior art depth-from-defocus algorithms is also within the scope and spirit of the invention.
The process then proceeds to block 446, where a coarse registration is performed between image I1 and image I2. The coarse registration is performed to account for possible camera and/or scene motion. The coarse registration aligns the two images and finds the overlap region. The coarse registration does not need to be very accurate, so it can be done even though the images are blurred, for instance using scaled-down images.
The process then moves to block 448, where the gradient of the pixel values in image I1 in the part of the region of interest common to I1 and I2 is calculated. The image may be graphed three-dimensionally, as a two-dimensional image with the pixel value used as the height at each point in the two-dimensional image. The gradient of the image I1 is therefore the slope or grade at each point of this three-dimensional surface. The process then advances to block 450, where the maximum gradient d1 of the pixel values of image I1 is calculated.
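One simple way to perform such a coarse registration is an exhaustive search over small integer translations on scaled-down images. The specification does not name a registration method, so the following is only an assumed example; the function names, the block-averaging downscale, and the mean-absolute-difference score are all illustrative choices. Images are plain lists of rows.

```python
def downscale(img, k):
    """Shrink an image by averaging non-overlapping k x k blocks."""
    h, w = len(img) // k, len(img[0]) // k
    return [
        [
            sum(img[y * k + dy][x * k + dx] for dy in range(k) for dx in range(k))
            / (k * k)
            for x in range(w)
        ]
        for y in range(h)
    ]


def coarse_shift(img1, img2, max_shift):
    """Return (dx, dy) such that img2[y + dy][x + dx] best matches img1[y][x].

    The score is the mean absolute difference over the overlap region.
    """
    h, w = len(img1), len(img1[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(img1[y][x] - img2[y + dy][x + dx])
                    count += 1
            if count == 0:
                continue
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Running `coarse_shift` on the outputs of `downscale` keeps the search cheap; the shift found on the small images is multiplied by k to obtain the shift at full resolution, and the overlap region follows from that shift.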
The maximum gradient is calculated as follows. In the following equations, I1(x, y), x=0, ..., N1x−1, y=0, ..., N1y−1, and I2(x, y), x=0, ..., N2x−1, y=0, ..., N2y−1 are the first and the second images after inverse gamma correction. In the calculations, a number δ much smaller than 1 (for example, δ=10⁻⁴) is chosen, and d1 is a value such that the ratio of the number of pixels in the first image whose absolute gradient value is greater than or equal to d1 to the total number of pixels in the first image is equal to δ:

$$\frac{\#\{(x, y) : |\nabla I_1(x, y)| \ge d_1\}}{N_{1x} N_{1y}} = \delta \qquad (1)$$

The process then advances to block 452, where the gradient of I2 in the part of the region of interest common to I1 and I2 is calculated. The process then proceeds to block 454, where the maximum gradient d2 of image I2 is calculated, as follows. In the calculations, d2 is a value such that the ratio of the number of pixels in the second image whose absolute gradient value is greater than or equal to d2 to the total number of pixels in the second image is equal to δ:

$$\frac{\#\{(x, y) : |\nabla I_2(x, y)| \ge d_2\}}{N_{2x} N_{2y}} = \delta \qquad (2)$$
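The definition of d1 and d2 amounts to taking a high quantile of the gradient magnitudes, which makes the measure less sensitive to isolated noisy pixels than a true maximum. A minimal sketch follows, assuming central-difference gradients on interior pixels and images stored as lists of rows; the function names are hypothetical, and the fraction δ is matched only as closely as the pixel count allows.

```python
import math


def gradient_magnitudes(img):
    """Central-difference gradient magnitude at every interior pixel."""
    h, w = len(img), len(img[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mags.append(math.hypot(gx, gy))
    return mags


def max_gradient(img, delta=1e-4):
    """Value d such that roughly a fraction delta of pixels have gradient >= d."""
    mags = sorted(gradient_magnitudes(img), reverse=True)
    k = max(1, math.ceil(delta * len(mags)))  # number of pixels at or above d
    return mags[k - 1]
```

For a small image the default δ selects the single largest gradient; for a multi-megapixel image it selects roughly the top hundred pixels per million.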
The process then proceeds to block 456, where the distance u is calculated, as follows (as previously discussed, u represents the distance from the camera to the object).
In the calculation an edge profile in an ideal image may be modeled as a step function of height A smoothed by a Gaussian of width σ0:
In the calculations, E1(x; u) represents the edge response of the optical system when an object located at distance u from the camera is photographed with set S1 of camera parameters. In a similar way, E2(x; u) denotes the edge response of the optical system when an object located at distance u from the camera is photographed with the set S2 of camera parameters. In the calculations, E1(x; u) and E2(x; u) are also each modeled as a step function smoothed by Gaussians of widths σ1 and σ2:
The edge profiles in defocused images 1 and 2 are obtained as the convolution between the edge profile in the ideal image and the point spread function of the optical system. The point spread function of the optical system can be viewed as a filter through which the ideal image is passed to obtain the blurred, de-focused images. The convolution between the edge profile in the ideal image and the point spread function of the optical system is:
where widths $\tilde{\sigma}_1$ and $\tilde{\sigma}_2$ are given by
$$\tilde{\sigma}_1 = \sqrt{\sigma_0^2 + \sigma_1^2} \qquad (3)$$
$$\tilde{\sigma}_2 = \sqrt{\sigma_0^2 + \sigma_2^2} \qquad (4)$$
The maximum gradients of the defocused images are therefore
In the calculation of u, the distance u is found that simultaneously satisfies the following two equations:
In de-focused images, the edges of objects are blurred, so these edges are wider than those in the ideal image. The widths of the point spread functions forming the de-focused images are referred to as “blur widths”. An edge of an image is identified based on the pixel value changing relatively quickly at the edge. An edge can be identified by high gradient values, with the edge traced in the direction of maximum change (the direction of the gradient), until the maximum/minimum value is reached on both sides of the edge. In one embodiment, the edge width is defined as the distance in pixels from 10% to 90% of the edge height in the direction of the maximum change (wherein the pixel value is the height). A comparison of the amount of blurring between the two images is used to determine the distance u. Two images are needed because the characteristics of the object in the ideal image are not known. Two ways of quantifying blur in each image are 1) the maximum gradient; and 2) the minimum edge width. The variables σ1 and σ2 are the widths of the point spread functions of images 1 and 2, respectively, and the variables $\tilde{\sigma}_1$ and $\tilde{\sigma}_2$ are the edge widths of images 1 and 2, respectively.
If the steepest edges in the ideal, i.e. unblurred, image are much narrower than the blur widths, then σ1, σ2 ≫ σ0. In this case, the algorithm is based on the fact that the ratio d1/d2 approximately follows the ratio of the gradients of E1(x; u) and E2(x; u), or in other words, the reciprocal of the ratio of the widths of E1(x; u) and E2(x; u), i.e. σ1 and σ2:

$$\frac{d_1}{d_2} \approx \frac{\sigma_2(u)}{\sigma_1(u)}$$
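The Gaussian edge model described here can be written out explicitly. The following is a sketch under assumed conventions, not a reproduction of the specification's own equations: Φ denotes the standard normal cumulative distribution function, I_0 the ideal edge profile, I_i the edge profile in defocused image i, and every width is treated as a Gaussian standard deviation. These symbols and the normalization are assumptions made for illustration.

```latex
\begin{align*}
I_0(x) &= A\,\Phi\!\left(\frac{x}{\sigma_0}\right), &
E_i(x;u) &= \Phi\!\left(\frac{x}{\sigma_i(u)}\right), \quad i = 1, 2 \\
I_i(x) &= A\,\Phi\!\left(\frac{x}{\tilde{\sigma}_i}\right), &
\tilde{\sigma}_i &= \sqrt{\sigma_0^2 + \sigma_i^2} \\
d_i &= \max_x \frac{dI_i}{dx} = \frac{A}{\sqrt{2\pi}\,\tilde{\sigma}_i}, &
\frac{d_1}{d_2} &= \frac{\tilde{\sigma}_2}{\tilde{\sigma}_1}
\end{align*}
```

The unknown edge height A cancels in the ratio of the two maximum gradients, which is why two images suffice even though the ideal image is not known. If the edge width is measured from 10% to 90% of the edge height instead, it is proportional to the standard deviation under this model, so the same ratio holds.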
At the calibration stage, the edge responses E1(x; u) and E2(x; u) are measured or calculated for the entire range of distances u. At the focusing stage, d1 and d2 are extracted from the images at hand. The object's distance is then estimated as the distance u for which the ratio of the gradients of E1(x; u) and E2(x; u) is closest to the ratio d1/d2. In some cases, there are two different values of u giving the same ratio of the gradients of E1(x; u) and E2(x; u) (more than one solution to the equations). In these cases, a third image is then taken to resolve the ambiguity (to determine which of the solutions to the equations is correct).
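The focusing-stage lookup described above can be sketched as a nearest-ratio search over a calibration table. The table layout, a list of (u, g1, g2) rows holding the maximum gradients of E1(x; u) and E2(x; u) at each calibrated distance, and the function name are assumptions made for illustration.

```python
def estimate_distance_from_ratio(d1, d2, calibration):
    """Return the calibrated distance whose gradient ratio is closest to d1/d2.

    calibration: iterable of (u, g1, g2) rows, where g1 and g2 are the
    maximum gradients of the edge responses E1 and E2 at distance u
    (hypothetical table layout).
    """
    target = d1 / d2
    best_u, best_err = None, None
    for u, g1, g2 in calibration:
        err = abs(g1 / g2 - target)
        if best_err is None or err < best_err:
            best_u, best_err = u, err
    return best_u
```

This sketch returns a single distance and does not detect the case where two distances give the same ratio; resolving that ambiguity with a third image, as described above, would need an additional step.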
The process then moves to a return block, where other processing is resumed.
Although a particular embodiment of the invention as shown in process 440 is discussed above, many variations are within the scope and spirit of the invention. For example, in some embodiments, in place of the gradient magnitude of the image, other derivative-like operators are used. For example, in one embodiment, the derivatives in the x- or the y-direction may be used. In addition, in some embodiments, the derivatives may be calculated on a scaled-down or a filtered version of the image.
Further, in some embodiments, the maximum gradients d1 and d2 may be defined in a way not exactly identical to Eqs. (1-2). A modification of the algorithm may be used which considers not only the maximum gradients, but also, for example, the average value of the gradients that are above a certain threshold.
Additionally, in some embodiments, the image can be divided into smaller regions, and the distance can be estimated for each region.
The process then proceeds to block 566, where a coarse registration is performed between image I1 and image I2. The coarse registration is performed to account for possible camera and/or scene motion. The coarse registration aligns the two images and finds the overlap region. The coarse registration does not need to be very accurate, so it can be done even though the images are blurred, for instance using scaled-down images. The process then moves to block 568, where the minimum edge width of I1 ($\tilde{\sigma}_1$) in the part of the region of interest common to I1 and I2 is calculated.
The minimum edge width of I1 ($\tilde{\sigma}_1$) may be calculated, for example, as follows in one embodiment. At a first stage, the image is analyzed to find candidates for edges in different directions. In order to find edges in direction φ, one can use, for example, the template-matching technique with the template depicted in
In this embodiment, after edge candidates have been found, the cross-section of every candidate edge is analyzed to check whether it is a single, isolated edge, so that there is as little influence as possible from other adjacent edges. The width of each single, isolated edge is then recorded. In one embodiment, the edge width is defined as the distance in pixels from 10% to 90% of the edge height in the direction of the maximum change. For each direction, the minimum edge width is found. Finally, under the assumption that the optical point spread function is isotropic, the minimum edge width is defined as the minimum value over all directions.
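The 10%-to-90% edge width of a single cross-section can be measured as in the sketch below. It assumes the cross-section has already been extracted along the direction of maximum change as a list of pixel values that rises from a low plateau to a high plateau; the function names and the use of linear interpolation between samples are illustrative assumptions.

```python
def edge_width_10_90(profile):
    """Distance in pixels between the 10% and 90% levels of a rising edge."""
    low, high = profile[0], profile[-1]
    height = high - low
    if height <= 0:
        raise ValueError("profile must rise from its first to its last sample")

    def crossing(level):
        # First position where the profile crosses `level`, interpolated.
        for i in range(len(profile) - 1):
            a, b = profile[i], profile[i + 1]
            if b > a and a <= level <= b:
                return i + (level - a) / (b - a)
        raise ValueError("level not crossed by the profile")

    return crossing(low + 0.9 * height) - crossing(low + 0.1 * height)


def minimum_edge_width(profiles):
    """Smallest 10%-90% width over all candidate edge cross-sections."""
    return min(edge_width_10_90(p) for p in profiles)
```

A falling edge can be handled by reversing the cross-section before measuring it. Passing the cross-sections from all directions to `minimum_edge_width` corresponds to taking the minimum over all directions under the isotropy assumption.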
The process then advances to block 570, where the minimum edge width of I2 ($\tilde{\sigma}_2$) in the part of the region of interest common to I1 and I2 is calculated. The minimum edge width of I2 ($\tilde{\sigma}_2$) may be calculated as described above with regard to I1. The process then proceeds to block 572, where equations (3) and (4) above are solved simultaneously for σ0 and u (in conjunction with the other equations for the values of σ1 and σ2).
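One way to solve equations (3) and (4) together is to eliminate σ0 by subtracting their squares, which leaves a condition on u alone: the difference of the squared measured edge widths must equal σ1(u)²−σ2(u)². The sketch below assumes that σ1(u) and σ2(u) are available as a calibration table of (u, σ1, σ2) rows; the table layout, the function name, and the nearest-row search are assumptions, not the specification's method.

```python
import math


def solve_sigma0_and_u(s1_tilde, s2_tilde, calibration):
    """Solve equations (3) and (4) for sigma0 and u by table search.

    s1_tilde, s2_tilde: measured minimum edge widths of images 1 and 2.
    calibration: iterable of (u, sigma1, sigma2) rows (hypothetical layout).
    """
    # Subtracting the squares of (3) and (4) eliminates sigma0.
    target = s1_tilde ** 2 - s2_tilde ** 2
    u, sigma1, _ = min(
        calibration,
        key=lambda row: abs((row[1] ** 2 - row[2] ** 2) - target),
    )
    # Back-substitute into (3); clamp at zero to absorb measurement noise.
    sigma0 = math.sqrt(max(0.0, s1_tilde ** 2 - sigma1 ** 2))
    return sigma0, u
```

A finer estimate could interpolate between neighboring table rows instead of returning the nearest one.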
In other embodiments, a relation even more precise than equations (3) and (4) between σ0, σ1, and σ2 may be established and used in the algorithm, in order to achieve higher accuracy of the distance estimation, if the shape of the lens point spread function deviates substantially from a Gaussian.
Process 440 (of
The maximum error of the estimation, Δu, may be calculated as follows. The distance estimation is
where r1 and r2 are the blur widths measured from the images, f is the focal length, v1 and v2 are the distances between the lens and the sensor at positions 1 and 2, respectively, and u is the estimate for the object's distance. The error (Δu) in u is caused by errors in r1 and r2 which can be estimated, for example, based on the noise level in the images. The error in u can be then calculated using the standard error analysis technique.
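A first-order error analysis of this kind can be sketched numerically without committing to a particular form of the distance estimate. In the sketch below, `estimator` is a hypothetical callable that maps the blur widths (r1, r2) to u, with f, v1, and v2 already fixed inside it; the worst-case (absolute-sum) combination of the two error terms is an assumption about what "maximum error" means here.

```python
def max_error(estimator, r1, r2, dr1, dr2, h=1e-6):
    """First-order worst-case error in u = estimator(r1, r2).

    dr1, dr2: estimated errors in the measured blur widths r1 and r2.
    The partial derivatives are approximated by central differences.
    """
    du_dr1 = (estimator(r1 + h, r2) - estimator(r1 - h, r2)) / (2.0 * h)
    du_dr2 = (estimator(r1, r2 + h) - estimator(r1, r2 - h)) / (2.0 * h)
    return abs(du_dr1) * dr1 + abs(du_dr2) * dr2
```

If the errors in r1 and r2 are treated as independent random quantities instead, the two terms would be combined in quadrature, which gives a smaller and less conservative Δu.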
The above specification, examples and data provide a description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention also resides in the claims hereinafter appended.
This application claims the benefit of U.S. Provisional Patent Application No. 61/094,338, filed on Sep. 4, 2008 and U.S. Provisional Patent Application No. 61/101,897 filed on Oct. 1, 2008, the benefit of the earlier filing date of which is hereby claimed under 35 U.S.C. §119(e) and which is further incorporated by reference.
Number | Date | Country | |
---|---|---|---|
20100053417 A1 | Mar 2010 | US |
Number | Date | Country | |
---|---|---|---|
61094338 | Sep 2008 | US | |
61101897 | Oct 2008 | US |