The invention is related to digital cameras, and in particular but not exclusively, to a method and device for stitching individual images into a panoramic view.
Panoramic photography often employs specialized cameras, equipment and/or software to capture a sequence of images that are reconstructed into a single image that takes the form of a wide strip with an elongated field of view. Panoramic photography is sometimes known as wide format photography. Typically, a panoramic image shows a field of view that is greater than that of a film camera equipped with a wide angle lens, which can generally capture an image with a field of view of about 90 degrees across the diagonal of the captured image, e.g., a 35 millimeter film camera with a 22 millimeter lens. One way to capture a panoramic image is to mount a film camera on a tripod and, as the camera is physically rotated about its axis, take a succession of images of a scene that are subsequently stitched together by physically cutting and pasting strips of exposed film, with the boundaries between the edges of the film carefully aligned. In some cases, a wider than usual strip of film can be used with a film camera that employs special movable or stereo optics. In other film cameras, conventional format film, such as 35 millimeter, can be masked during the exposure in the camera to provide a wide aspect or panoramic effect.
Recently, the benefits of electronic photography have led to the general acceptance of digital cameras, which, unlike their film-based counterparts, store captured images in a digital memory such as flash memory. And some digital cameras can also provide a “panorama” feature, which allows a user of the digital camera to capture a sequence of adjacent images that are subsequently “stitched” together into a single image with a wide coverage of field. For example, some digital cameras with a panoramic feature can interface with a personal computer that provides software to externally join together two or more images at their edge boundaries to generate a single image with a wide panoramic format for display on the personal computer. And other digital cameras can employ internal software for in-camera stitching of multiple images into a single image with a wide panoramic effect. However, in-camera stitching based on software processes alone is often hampered by a relatively poor alignment of the images and a relatively long period of time to compose an image having a wide panoramic format based on a plurality of captured images of a scene.
Non-limiting and non-exhaustive embodiments of the present invention are described with reference to the following drawings, in which:
FIGS. 2a and 2b graphically illustrate a cylindrical side view and a cylindrical top-down view of an image, where a projection is performed by back-tracing rays from the sphere surface to a camera center through the image plane;
Various embodiments of the present invention will be described in detail with reference to the drawings, where like reference numerals represent like parts and assemblies throughout the several views. Reference to various embodiments does not limit the scope of the invention, which is limited only by the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the claimed invention.
Throughout the specification and claims, the following terms take at least the meanings explicitly associated herein, unless the context dictates otherwise. The meanings identified below do not necessarily limit the terms, but merely provide illustrative examples for the terms. The meaning of “a,” “an,” and “the” includes plural reference, and the meaning of “in” includes “in” and “on.” The phrase “in one embodiment,” as used herein does not necessarily refer to the same embodiment, although it may. As used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based, in part, on”, “based, at least in part, on”, or “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. The term “coupled” means at least either a direct electrical connection between the items being connected, or an indirect connection through one or more passive or active intermediary devices.
Briefly stated, embodiments of the invention are related to a camera that provides for a panorama mode of operation that employs internal software and internal acceleration hardware to stitch together two or more captured images to create a single panorama image with a wide format. In the panorama mode, a plurality of captured images from a live view of a scene are initially projected/converted from rectilinear coordinates into cylindrical coordinates. This cylindrical projection employs look up tables (LUTs), sparse sampling, and interpolation acceleration hardware to quickly generate the cylindrical coordinates from the rectilinear coordinates. Second, matches are quickly determined between each pair of images with a block based search that employs motion estimation acceleration hardware. Third, a set of affine transformations are identified that can use the Random Sample Consensus (RANSAC) process to align the captured images with each other. Fourth, the identified affine transformations are applied to the images using the interpolation acceleration hardware. Optionally, the color and exposure between the images may be adjusted by utilizing the knowledge of camera parameters for each image or by detecting color transformation between each image pair based on at least an analysis of the overlap (warp) region between adjacent images. The camera parameters may include, but are not limited to, exposure time, aperture, and white balance. Finally, a determination is made for an optimal seam to stitch images together in the overlap region by finding a path which cuts through relatively non-noticeable regions. And once the optimal seams are identified, the images are stitched together into a single image with a wide panoramic effect. Typically, the relatively non-noticeable regions are where the image pairs are substantially similar and there are relatively few details, edges, and the like.
The combination of internal software and specialized acceleration hardware enables significantly faster processing than other embodiments that do not employ the specialized acceleration hardware. Also, the invention provides for improved alignment (registration) and fewer artifacts in panoramic images. In particular, the invention compensates for un-modeled distortions such as camera motion through the affine warping of the projected images.
I. Exemplary Camera Device
In operation, the image sensors 102 receive input light through the optics 101 and, in response, produce analog output primary color signals such as Red, Green and Blue to the A/D converters. The A/D converters convert those input color signals to digital form, which are provided to Integrated Circuit 111.
Integrated Circuit 111 includes processor 104 as well as specialized acceleration hardware, e.g., Image Interpolation Accelerator 109 and Motion Estimation Accelerator 110. Processor(s) 104 and Accelerators 109 and 110 may perform any of various well-known types of processing on those input color signals. The processor(s) 104 and Accelerators 109 and 110 may be or include, for example, any one or more of: a programmed microprocessor or digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), a programmable logic device (PLD), etc. Integrated Circuit 111 may perform various processes, such as the processes described below.
The memory 105 may be or include, for example, any one or more of: flash memory, read-only memory, random access memory (RAM), etc.
Processed or raw color data can be output to the display device 106 for display and/or to one or more external devices, such as a computer, printer, video game console, another mobile electronic device, and the like.
II. Exemplary Cylindrical Projection Process
FIG. 2a graphically illustrates a cylindrical side view, and FIG. 2b a cylindrical top-down view, of this projection.
In at least one embodiment, look up tables (LUTs) are employed to calculate the trigonometric functions required for the projection, and the cylindrical coordinate values are subsequently interpolated out of these LUTs. Also, an image can be warped quickly into a cylindrical projection by the use of dedicated acceleration hardware, such as Image Interpolation Accelerator 109 described above.
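By way of illustration only, the following Python/numpy sketch performs such a cylindrical warp entirely in software by inverse mapping: for every output pixel, the backward transform (x, y) = (f·tan(φ), f·h·sec(φ)) described in Section VI is evaluated directly, and the input image is sampled with bilinear interpolation, standing in for the LUTs and the Image Interpolation Accelerator. The focal length f (in pixels), the grayscale input, and the choice of output sampling grid are assumptions made for this example.

import numpy as np

def warp_to_cylinder(img, f):
    # img: H x W grayscale array; f: focal length in pixels (assumed known from calibration).
    # Output pixel grid: column -> angle phi, row -> normalized height h (an illustrative choice).
    H, W = img.shape
    cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
    phi = np.arctan2(np.arange(W) - cx, f)        # horizontal angle of each output column
    h = (np.arange(H) - cy) / f                   # normalized height of each output row
    # Backward transform: (x, y) = (f*tan(phi), f*h*sec(phi)), then shift to pixel coordinates.
    x = f * np.tan(phi)[None, :] + cx + np.zeros((H, 1))
    y = (f * h)[:, None] / np.cos(phi)[None, :] + cy
    # Plain bilinear sampling of the input at the back-projected coordinates.
    m = np.clip(np.floor(x).astype(int), 0, W - 2)
    n = np.clip(np.floor(y).astype(int), 0, H - 2)
    fm, fn = x - m, y - n
    out = ((1 - fm) * (1 - fn) * img[n, m] + fm * (1 - fn) * img[n, m + 1]
           + (1 - fm) * fn * img[n + 1, m] + fm * fn * img[n + 1, m + 1])
    inside = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    return np.where(inside, out, 0.0)             # pixels mapping outside the input become 0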
III. Exemplary Motion Estimation Process
For at least one embodiment, an assumption is made that the overlap between two consecutive and adjacent images is predetermined (e.g. 20% of the angular field of view is overlapping between each two consecutive images). Also, an assumption is made that these images are roughly aligned within a range of ±80 pixels in all directions.
Since the images are typically taken from a hand-held mobile device such as a camera, there are several reasons for the images to not be precisely aligned. One reason for poor alignment might be that when an image is taken, the user sees a relatively low resolution preview image. The full panorama image, however, is much more detailed, so misalignment that might have been unnoticeable to the user on the low resolution preview image could be quite noticeable when the panorama image is later viewed on a high resolution monitor. Another reason for poor alignment might be that the user's hand shakes and moves randomly when the image is taken. Also, another reason for poor alignment could be that the user does not perform the ideal rotation motion between two consecutive images, which can cause un-modeled distortions between each pair of consecutive images.
To compensate for the alignment inaccuracy, automatic detection of the precise image alignment is necessary. The alignment is achieved by estimating the parameters of an image transformation that minimizes some error metric between the images. An example of such a parametric transformation is the affine transformation:
$x' = a_{11}x + a_{12}y + a_{13}, \qquad y' = a_{21}x + a_{22}y + a_{23}$
The transformation is estimated and applied on the images after they have been warped to a cylindrical surface.
The affine transform is relatively simple to evaluate; it is a generalization of the translation transformation required in the ideal case of pure camera rotation; and it is relatively easy to estimate even in a system where only fixed-point arithmetic is available.
Correspondences, i.e., locations in one image that match locations in the other image, are found between the two (projected) images. This is done by selecting the overlap region of one of the images as a ‘target’ and the corresponding overlap region of the other image as a ‘reference’, and performing a block based motion search from target to reference.
For motion search, many techniques can be employed. However, in at least one embodiment, the motion search is based on the mean-normalized sum of absolute differences (MNSAD): i.e., the motion of a block of pixels in the target frame is determined by the location of the block in the reference frame for which the MNSAD is minimal. The MNSAD of a candidate displacement is the sum of absolute differences between the target block and the reference block after the mean value of each block has been subtracted, where BX, BY define the size of the block and AX, AY define the search area over which the displacement is varied. The MNSAD algorithm is described in greater detail in at least a publication by Tzur, M., Pinto, V., and Pinhasov, E., Published Patent Application No. 2008/0291288 A1, pub. date Nov. 27, 2008, entitled “TECHNIQUE OF MOTION ESTIMATION WHEN ACQUIRING AN IMAGE OF A SCENE THAT MAY BE ILLUMINATED WITH A TIME VARYING LUMINANCE.”
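The exact MNSAD formulation is given in the cited application; the following sketch illustrates one straightforward software interpretation, in which the mean of each block is subtracted before the absolute differences are summed and the search is exhaustive over the ±AX, ±AY area. The block size, the search range (matching the ±80 pixel assumption above), and the function names are illustrative; in the camera the search is carried out by the Motion Estimation Accelerator, not by code like this.

import numpy as np

def mnsad(target_block, ref_block):
    # Mean-normalized SAD: subtract each block's mean before comparing, so that a
    # global brightness difference between the two images does not bias the match.
    t = target_block - target_block.mean()
    r = ref_block - ref_block.mean()
    return np.abs(t - r).sum()

def block_motion_search(target, reference, x, y, BX=16, BY=16, AX=80, AY=80):
    # Exhaustive search: find the displacement (u, v), |u| <= AX and |v| <= AY, that
    # minimizes the MNSAD between the BX x BY target block at (x, y) and the
    # reference block at (x + u, y + v).
    H, W = reference.shape
    tb = target[y:y + BY, x:x + BX]
    best, best_score = (0, 0), np.inf
    for v in range(-AY, AY + 1):
        for u in range(-AX, AX + 1):
            xx, yy = x + u, y + v
            if xx < 0 or yy < 0 or xx + BX > W or yy + BY > H:
                continue  # candidate block falls outside the reference image
            score = mnsad(tb, reference[yy:yy + BY, xx:xx + BX])
            if score < best_score:
                best_score, best = score, (u, v)
    return best, best_score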
The motion search is performed quickly by a specific hardware accelerator such as implemented on Integrated Circuit 111 with Motion Estimation Accelerator 110 in
The motion search can be performed on a low resolution copy of the images and then refined by performing an additional search at higher resolution. The motion search can be further improved by applying corner detection to the image, since corners are more likely to return reliable motion vectors. Also, a robustness measure can be extracted by observing the MNSAD map as a function of the (u,v) motion vector and checking whether the minimum that produced MV(x,y) is unique.
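A minimal coarse-to-fine refinement might look like the following sketch, which reuses mnsad and block_motion_search from the sketch above; the subsampling factor and the size of the refinement window are illustrative assumptions.

def coarse_to_fine_search(target, reference, x, y, BX=16, BY=16, step=4):
    # Coarse pass: subsample both images and search a proportionally smaller area.
    (cu, cv), _ = block_motion_search(target[::step, ::step], reference[::step, ::step],
                                      x // step, y // step,
                                      BX // step, BY // step, 80 // step, 80 // step)
    # Fine pass: refine in a +/- step window around the scaled-up coarse vector.
    H, W = reference.shape
    tb = target[y:y + BY, x:x + BX]
    best, best_score = (step * cu, step * cv), np.inf
    for dv in range(-step, step + 1):
        for du in range(-step, step + 1):
            u, v = step * cu + du, step * cv + dv
            xx, yy = x + u, y + v
            if 0 <= xx and 0 <= yy and xx + BX <= W and yy + BY <= H:
                s = mnsad(tb, reference[yy:yy + BY, xx:xx + BX])
                if s < best_score:
                    best_score, best = s, (u, v)
    return best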
After motion search there is a set of correspondences in each image pair:
$(x_i, y_i) \rightarrow (x_i', y_i') = \left(x_i + MV_x(x_i, y_i),\; y_i + MV_y(x_i, y_i)\right)$
An assumption is made that there is a set of K correspondences between the two images:
$(x_i, y_i) \rightarrow (x_i', y_i'), \quad i = 1, \ldots, K$
And there is a need to estimate an affine function that links the images:
$\hat{x}' = h_1 x + h_2 y + h_3, \qquad \hat{y}' = h_4 x + h_5 y + h_6$
By minimizing an SSE (sum of squared error) energy function: $E = \sum_{i=1}^{K}\left[(x_i' - \hat{x}_i')^2 + (y_i' - \hat{y}_i')^2\right]$, where $(\hat{x}_i', \hat{y}_i')$ is obtained by applying the affine function above to $(x_i, y_i)$.
This is a linear regression problem with a closed-form solution, which is illustrated below.
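The explicit normal-equation expressions are not reproduced here; as a sketch of the same computation, note that the SSE decouples into two ordinary regressions with a shared design matrix, one predicting x' (coefficients h1, h2, h3) and one predicting y' (coefficients h4, h5, h6), each solvable with an off-the-shelf least-squares routine. The function name and the synthetic usage example are illustrative.

import numpy as np

def fit_affine_least_squares(src, dst):
    # src, dst: (K, 2) arrays of corresponding points (x_i, y_i) -> (x_i', y_i').
    # Design matrix has columns [x, y, 1]; solve it once for x' and once for y'.
    A = np.column_stack([src[:, 0], src[:, 1], np.ones(len(src))])
    hx, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)   # h1, h2, h3
    hy, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)   # h4, h5, h6
    return np.vstack([hx, hy])                           # 2 x 3 affine matrix

# Usage example: recover a known affine map from noiseless synthetic correspondences.
true = np.array([[1.01, 0.02, 5.0], [-0.015, 0.99, -3.0]])
pts = np.random.rand(50, 2) * 100
fit = fit_affine_least_squares(pts, pts @ true[:, :2].T + true[:, 2])
# fit matches true up to numerical precision.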
The correspondences reported by motion estimation do not, in general, contain only ‘true’ correspondences. There can be many factors which contribute to ‘false’ correspondences:
Local Motion—if an object moved within the scene, then its motion is not related to the global displacement caused by the camera rotation. The motion vectors related to it should be discarded.
Disappearance/Occlusion—a target block could, in some cases, not exist in the reference image; this can be because it has moved out of the image area or been occluded due to local motion/motion parallax. In this case false motion vectors are usually introduced.
False Motion—though the motion estimation flow is robust, it is not perfect. Sometimes the search yields erroneous motion vectors; these should be discarded.
A RANSAC (random sample consensus) algorithm is employed for the estimation of the transformation from these correspondences. The RANSAC algorithm is discussed in greater detail in at least an academic publication by Fischler, Martin A. and Bolles, Robert C., entitled “RANDOM SAMPLE CONSENSUS: A PARADIGM FOR MODEL FITTING WITH APPLICATIONS TO IMAGE ANALYSIS AND AUTOMATED CARTOGRAPHY”, Communications of the ACM, vol. 24, num. 6, June 1981. In the process, transformation estimates are iteratively built and outliers are rejected. The final transformation is calculated by taking the group of all inliers (which is labeled I) and estimating a transformation by the least squares (linear regression) equations which solve the minimization problem presented above.
In those least squares equations, the summation is over i ∈ I, and N = |I| is the number of inliers.
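As a software illustration of this RANSAC-plus-least-squares flow, one possible implementation is sketched below, reusing fit_affine_least_squares from the sketch above; the number of iterations and the inlier tolerance in pixels are illustrative parameters, not values specified by this document.

import numpy as np

def ransac_affine(src, dst, iters=200, tol=3.0, rng=np.random.default_rng(0)):
    # src, dst: (K, 2) corresponding points. Returns a 2 x 3 affine matrix fitted
    # to the largest consensus set found over `iters` random trials.
    best_inliers = None
    for _ in range(iters):
        sample = rng.choice(len(src), size=3, replace=False)  # 3 points determine an affine map
        model = fit_affine_least_squares(src[sample], dst[sample])
        pred = src @ model[:, :2].T + model[:, 2]
        err = np.linalg.norm(pred - dst, axis=1)
        inliers = err < tol                                   # pixels of tolerated error
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final transformation: least squares over the whole inlier group I, N = |I|.
    return fit_affine_least_squares(src[best_inliers], dst[best_inliers])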
Once an affine transformation has been estimated linking each pair of images, each image is rendered on the panorama canvas by applying the appropriate affine transformation. This is done very quickly using the aforementioned Image Interpolation Accelerator hardware.
IV. Exemplary Image Stitching
The final stage in generating the panoramic image is finding the optimal seam between each pair of consecutive adjacent images, which decides where to locate pixels of each image. This is done by selecting the path of least energy which crosses the overlap region, where the energy of a pixel grows with both the difference between the two images at that pixel and the local amount of detail (gradient magnitude).
The path is found using the dynamic programming algorithm described in greater detail in at least a publication by Milgram, David L., entitled “ADAPTIVE TECHNIQUES FOR PHOTOMOSAICKING,” IEEE Transactions on Computers, Vol. C-26, Issue 11, November 1977, and also described in greater detail in at least another publication by Milgram, David L., entitled “COMPUTER METHODS FOR CREATING PHOTOMOSAICS,” IEEE Transactions on Computers, Vol. C-24, Issue 11, November 1975.
In this way, the path avoids pixels in which there is a significant difference between the two images, and also avoids edges and detailed regions characterized by large gradient magnitude.
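The exact energy expression is not reproduced here; the sketch below uses one plausible choice, weighting the absolute difference between the aligned overlap regions by their gradient magnitudes, and finds the minimal top-to-bottom path with the standard dynamic programming recurrence. The energy formula and function names are assumptions made for illustration.

import numpy as np

def optimal_seam(im1, im2):
    # im1, im2: the two aligned overlap regions (H x W float arrays).
    # Illustrative energy: pixelwise difference weighted by gradient magnitude,
    # so the seam avoids both mismatched pixels and detailed regions.
    gy1, gx1 = np.gradient(im1)
    gy2, gx2 = np.gradient(im2)
    energy = np.abs(im1 - im2) * (1.0 + np.hypot(gx1, gy1) + np.hypot(gx2, gy2))
    H, W = energy.shape
    cost = energy.copy()
    # Dynamic programming: cost[r, c] = energy[r, c] + min of the three cells above it.
    for r in range(1, H):
        above = cost[r - 1]
        shifted = np.stack([np.roll(above, 1), above, np.roll(above, -1)])
        shifted[0, 0] = np.inf       # no neighbor to the upper-left of column 0
        shifted[2, -1] = np.inf      # no neighbor to the upper-right of the last column
        cost[r] += shifted.min(axis=0)
    # Backtrack the minimal path from bottom to top.
    seam = np.zeros(H, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(H - 2, -1, -1):
        c = seam[r + 1]
        lo, hi = max(0, c - 1), min(W, c + 2)
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam  # seam[r]: column at which row r switches from im1 pixels to im2 pixels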
V. Exemplary Panoramic Image Building
A flow chart of exemplary process 500 for building the panorama canvas is shown in the accompanying drawings.
Additionally, a flow chart of another exemplary process for generating a panoramic image from the captured consecutive images is also provided; its actions are described below beginning at block 708.
At block 708, scaling and translation transforms are computed so that all of the captured consecutive images can fit onto a panoramic image canvas. At block 710, a first consecutive image is scaled, translated, and then warped onto the panoramic canvas. At block 712, the computed scaling and translation transformations are performed on the next consecutive image, which is warped onto the panoramic canvas. At block 714, the optimal stitch lines between the previous and next warped images in an overlap region are stitched together in the panoramic canvas.
At decision block 716, a determination is made as to whether a next image is available. If true, the process loops back to block 712 and performs substantially the same actions as listed above. However, if the determination at decision block 716 is negative, then the process moves to block 718, where the panoramic canvas is stored as a single image for subsequent display to a user. The process then returns to performing other actions.
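As an illustration of the scaling and translation computation at block 708, the following sketch derives a single scale-and-translate transform from the bounding box of all warped image corners; the fixed canvas dimensions and the choice of a uniform scale are assumptions made for this example.

import numpy as np

def canvas_fit_transform(image_shapes, transforms, canvas_width, canvas_height):
    # image_shapes: list of (H, W); transforms: list of 2 x 3 affine matrices mapping
    # each image into a common coordinate frame. Returns one scale-and-translate
    # 2 x 3 matrix that brings the union of all warped images inside the canvas.
    corners = []
    for (H, W), t in zip(image_shapes, transforms):
        pts = np.array([[0, 0], [W, 0], [0, H], [W, H]], dtype=float)
        corners.append(pts @ t[:, :2].T + t[:, 2])
    corners = np.vstack(corners)
    mn, mx = corners.min(axis=0), corners.max(axis=0)
    scale = min(canvas_width / (mx[0] - mn[0]), canvas_height / (mx[1] - mn[1]))
    # Scale about the origin, then translate the top-left of the bounding box to (0, 0).
    return np.array([[scale, 0.0, -scale * mn[0]],
                     [0.0, scale, -scale * mn[1]]])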
VI. Exemplary Image Interpolation Acceleration Hardware
The Image Interpolation Acceleration hardware is employed to quickly apply any kind of geometric transformation to an image.
Assume there is a transformation of the image coordinates (x′, y′) = T(x, y), where (x′, y′) are coordinates in the output image (for example, for a cylindrical transformation, (x′, y′) is actually (φ, h)). Assume also that the hardware can evaluate the inverse transformation (x, y) = T⁻¹(x′, y′).
Warping an image means, for each output pixel at (x′, y′):
(x, y) = T⁻¹(x′, y′)
If (x, y) is inside the input image:
OutputPixel(x′, y′) = Interpolate from the input pixels around (x, y)
Otherwise:
OutputPixel(x′, y′) = 0
The actual pixel value may be calculated by the bi-linear interpolation algorithm:
m=floor(x)
fm=x−m
n=floor(y)
fn=y−n
OutputPixel(x′,y′)=(1−fm)*(1−fn)*InputPixel(m,n)+fm*(1−fn)*InputPixel(m+1,n)+(1−fm)*fn*InputPixel(m,n+1)+fm*fn*InputPixel(m+1,n+1)
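A literal, per-pixel software rendering of this pseudocode might look like the following sketch; it assumes a grayscale input and an inverse transform supplied as an ordinary Python callable, and it is of course far slower than the dedicated hardware.

import numpy as np

def warp_image(input_img, inverse_transform, out_shape):
    # For every output pixel, evaluate the inverse transform and bilinearly
    # interpolate the input image, exactly as in the pseudocode above.
    # inverse_transform(xp, yp) -> (x, y); out_shape = (height, width).
    H_in, W_in = input_img.shape
    out = np.zeros(out_shape, dtype=float)
    for yp in range(out_shape[0]):
        for xp in range(out_shape[1]):
            x, y = inverse_transform(xp, yp)
            m, n = int(np.floor(x)), int(np.floor(y))
            if 0 <= m < W_in - 1 and 0 <= n < H_in - 1:
                fm, fn = x - m, y - n
                out[yp, xp] = ((1 - fm) * (1 - fn) * input_img[n, m]
                               + fm * (1 - fn) * input_img[n, m + 1]
                               + (1 - fm) * fn * input_img[n + 1, m]
                               + fm * fn * input_img[n + 1, m + 1])
            # pixels whose source falls outside the input image remain 0
    return out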
Assume that sparse samples of the inverse transformation have been prepared; i.e., there is stored in memory a set of numbers TXi,j and TYi,j so that:
(TXi,j, TYi,j) = T⁻¹(x′ = Bx·i, y′ = By·j)
where Bx and By are the width and height of the sparse grid blocks. In this case, an approximation of the inverse transformation can be given for any output pixel (x′, y′) by interpolating between the stored values, in a manner very similar to the way the pixel values are interpolated.
T⁻¹(x′, y′) is approximated by:
i = floor(x′/Bx)
fi = x′/Bx − i
j = floor(y′/By)
fj = y′/By − j
T⁻¹(x′,y′) ≈ (1−fi)*(1−fj)*(TXi,j, TYi,j) + fi*(1−fj)*(TXi+1,j, TYi+1,j) + (1−fi)*fj*(TXi,j+1, TYi,j+1) + fi*fj*(TXi+1,j+1, TYi+1,j+1)
In this way the hardware can very quickly evaluate any transformation, even the cylindrical transformation which involves evaluation of trigonometric functions.
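For illustration, the sparse-grid approximation described above can be sketched as follows, with Bx = By = 16 as suggested in the text; padding the grid by one extra row and column so that the (i+1, j+1) neighbors always exist is an implementation choice made for this example.

import numpy as np

def build_sparse_grid(inverse_transform, out_w, out_h, Bx=16, By=16):
    # Evaluate the exact inverse transform only at the sparse grid nodes.
    nx, ny = out_w // Bx + 2, out_h // By + 2     # +2 so (i+1, j+1) is always valid
    TX = np.zeros((ny, nx))
    TY = np.zeros((ny, nx))
    for j in range(ny):
        for i in range(nx):
            TX[j, i], TY[j, i] = inverse_transform(Bx * i, By * j)
    return TX, TY

def approx_inverse(TX, TY, xp, yp, Bx=16, By=16):
    # Bilinear interpolation of the sparse samples, mirroring the equations above.
    i, j = int(xp // Bx), int(yp // By)
    fi, fj = xp / Bx - i, yp / By - j
    x = ((1 - fi) * (1 - fj) * TX[j, i] + fi * (1 - fj) * TX[j, i + 1]
         + (1 - fi) * fj * TX[j + 1, i] + fi * fj * TX[j + 1, i + 1])
    y = ((1 - fi) * (1 - fj) * TY[j, i] + fi * (1 - fj) * TY[j, i + 1]
         + (1 - fi) * fj * TY[j + 1, i] + fi * fj * TY[j + 1, i + 1])
    return x, y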
The sparse grid can be pre-calculated in the camera calibration stage or calculated in real time by the CPU. Since there are not many values in the sparse grid (typical values used for Bx and By are 16, so the sparse grid contains only 1/256 as many values as there are pixels in the image), it does not take long to evaluate every element in it. However, for systems which do not have the resources even for this, the look up table method can be utilized to quickly evaluate the trigonometric functions related to the cylindrical transformation.
The backwards transformation from cylindrical coordinates is given by:
(x, y) = T⁻¹(φ, h) = (f·tan(φ), f·h·sec(φ))
This transformation can be approximated quickly if there is a look up table of the tan(.) and sec(.) functions. For example, a look up table of the tan(.) function is a set of values Ti, i = 0, . . . , N−1 such that Ti = tan(φmin + i·Δφ), where Δφ = (φmax − φmin)/N.
The table covers values through [φmin, φmax). To calculate an approximation of tan(φ) for φ within this range, the look up table is used by interpolating linearly between its values:
i = floor((φ − φmin)/Δφ)
fi = (φ − φmin)/Δφ − i
tan(φ) ≈ (1 − fi)*Ti + fi*Ti+1
By using look up tables of 128 values each in the range of [0, π/4], the image warping results are visually indistinguishable from those of the transformation which uses the precise functions.
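A sketch of such a look up table and its linear interpolation follows, using the 128-entry [0, π/4) range mentioned above; the extra guard entry at the end of the table is an implementation convenience assumed for this example.

import numpy as np

# Build a 128-entry look up table of tan() over [0, pi/4), as suggested above.
N = 128
PHI_MIN, PHI_MAX = 0.0, np.pi / 4
STEP = (PHI_MAX - PHI_MIN) / N
TAN_LUT = np.tan(PHI_MIN + STEP * np.arange(N + 1))   # one extra entry so T[i+1] always exists

def tan_from_lut(phi):
    # Linear interpolation between the two nearest table entries.
    i = int((phi - PHI_MIN) // STEP)
    fi = (phi - PHI_MIN) / STEP - i
    return (1 - fi) * TAN_LUT[i] + fi * TAN_LUT[i + 1]

# e.g. abs(tan_from_lut(0.7) - np.tan(0.7)) is on the order of 1e-5.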
VII. Exemplary Motion Estimation Acceleration Hardware
The exemplary motion estimation hardware is capable of quickly locating motion vectors between a target and a reference image by minimizing the MNSAD over some region, as stated above. In some embodiments it is also capable of detecting corners in the target image; corner detection may be performed using any of the well-known algorithms used in the art, and it is useful for specifying the specific points from which the motion estimation should originate.
The motion estimation hardware reads the target and reference images from memory and performs the arithmetic and accounting to produce a list of motion vectors.
It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowcharts, may be implemented by a combination of hardware-based systems and software instructions. The software instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer-implemented process such that the instructions, which execute on the processor, provide steps for implementing some or all of the actions specified in the flowchart block or blocks.
Accordingly, blocks of the flowchart illustration support combinations of means for performing the specified actions, combinations of steps for performing the specified actions and program instruction means for performing the specified actions. It will also be understood that each block of the flowchart illustration, and combinations of blocks in the flowchart illustration, can be implemented by special purpose hardware-based systems, which perform the specified actions or steps, or combinations of special purpose hardware and computer instructions.
The various embodiments have been described above in light of certain mathematical relationships. A person skilled in the art would note that these mathematical relationships are subject to many possible computer implementations, which are all within the scope of the invention. Furthermore, it should be noted that the language of mathematics allows many ways to convey the same relationship. All such variations of the above described equations and relationships are naturally within the scope of the invention.
The above specification, examples, and data provide illustrative embodiments of the present invention. The above specification provides a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
This application is a utility patent application based on a previously filed U.S. Provisional Patent Application, Ser. No. 61/092,601 filed on Aug. 28, 2008, the benefit of which is hereby claimed under 35 U.S.C. §119(e) and incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
6023588 | Ray et al. | Feb 2000 | A |
6075905 | Herman et al. | Jun 2000 | A |
6677981 | Mancuso et al. | Jan 2004 | B1 |
6717608 | Mancuso et al. | Apr 2004 | B1 |
6731305 | Park et al. | May 2004 | B1 |
6771304 | Mancuso et al. | Aug 2004 | B1 |
6785427 | Zhou | Aug 2004 | B1 |
6834128 | Altunbasak et al. | Dec 2004 | B1 |
6885392 | Mancuso et al. | Apr 2005 | B1 |
6930703 | Hubel et al. | Aug 2005 | B1 |
7197192 | Edwards | Mar 2007 | B2 |
7289147 | Webb | Oct 2007 | B2 |
7373017 | Edwards et al. | May 2008 | B2 |
7375745 | Rai et al. | May 2008 | B2 |
7424218 | Baudisch et al. | Sep 2008 | B2 |
7460730 | Pal et al. | Dec 2008 | B2 |
7639897 | Gennetten et al. | Dec 2009 | B2 |
7656429 | Larson | Feb 2010 | B2 |
7711262 | Park et al. | May 2010 | B2 |
7746375 | Ketelaars et al. | Jun 2010 | B2 |
7746404 | Deng et al. | Jun 2010 | B2 |
7860343 | Tico et al. | Dec 2010 | B2 |
7965332 | Chiu et al. | Jun 2011 | B2 |
8279288 | Son et al. | Oct 2012 | B2 |
8350892 | Hayashi | Jan 2013 | B2 |
20010030693 | Fisher et al. | Oct 2001 | A1 |
20030063816 | Chen et al. | Apr 2003 | A1 |
20030103683 | Horie | Jun 2003 | A1 |
20040189849 | Hofer | Sep 2004 | A1 |
20040201755 | Norskog | Oct 2004 | A1 |
20050089244 | Jin et al. | Apr 2005 | A1 |
20060050152 | Rai et al. | Mar 2006 | A1 |
20060182437 | Williams et al. | Aug 2006 | A1 |
20070025723 | Baudisch et al. | Feb 2007 | A1 |
20070081081 | Cheng | Apr 2007 | A1 |
20070237423 | Tico et al. | Oct 2007 | A1 |
20080043093 | Song | Feb 2008 | A1 |
20080056612 | Park et al. | Mar 2008 | A1 |
20080062254 | Edwards et al. | Mar 2008 | A1 |
20080074489 | Zhang et al. | Mar 2008 | A1 |
20080074506 | Oh et al. | Mar 2008 | A1 |
20080158342 | Jeong et al. | Jul 2008 | A1 |
20080159653 | Dunki-Jacobs et al. | Jul 2008 | A1 |
20080170803 | Forutanpour | Jul 2008 | A1 |
20080291288 | Tzur et al. | Nov 2008 | A1 |
20090028462 | Habuka et al. | Jan 2009 | A1 |
20090208062 | Sorek et al. | Aug 2009 | A1 |
20100020190 | Kawakatsu et al. | Jan 2010 | A1 |
20100033553 | Levy | Feb 2010 | A1 |
20100054628 | Levy et al. | Mar 2010 | A1 |
20100265313 | Liu et al. | Oct 2010 | A1 |
20120177253 | Tseng et al. | Jul 2012 | A1 |
20130038680 | Mashiah | Feb 2013 | A1 |
Number | Date | Country |
---|---|---|
2242262 | Oct 2010 | EP |
2008004150 | Jan 2008 | WO |
Entry |
---|
U.S. Appl. No. 12/536,728, filed Aug. 6, 2009; Noam Levy, inventor. |
Milgram, David L., Computer Methods for Creating Photomosaics, IEEE Transactions on Computers vol. C-24 issue 11, Nov. 1975, pp. 1113-1119. |
Milgram, David L., Adaptive Techniques for Photomosaicking, IEEE Transactions on Computers vol. C-26 issue 11, Nov. 1977, pp. 1175-1180. |
Fischler, Martin A. and Bolles, Robert C., Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM, vol. 24 No. 6, Jun. 1981, pp. 381-395. |
Ha, Seong Jong et al., Panorama Mosaic Optimization for Mobile Camera Systems, IEEE Transactions on Consumer Electronics, vol. 53 issue 4, Nov. 2007, pp. 1217-1225. |
International Searching Authority; International Search Report & Written Opinion dated Sep. 18, 2009 for PCT/US2009/053151, 7 pages. |
International Search Report and Written Opinion for International Patent Application No. PCT/US2009/055265 mailed Oct. 15, 2009. |
Search Report for British Patent Application No. GB1205402.9 dated Jul. 24, 2012. |
Official Communication for U.S Appl. No. 12/536,728 mailed Nov. 2, 2012. |
Official Communication for U.S. Appl. No. 12/536,728 mailed Feb. 27, 2013. |
Number | Date | Country | |
---|---|---|---|
20100054628 A1 | Mar 2010 | US |
Number | Date | Country | |
---|---|---|---|
61092601 | Aug 2008 | US |