An embodiment of the invention relates generally to devices having multiple cameras, and in particular, to devices having cameras with different focal lengths and a method of implementing cameras with different focal lengths.
Digital cameras are electronic devices that capture an image that is stored in a digital format. Other electronic devices, such as smart phones, tablets or other portable devices, are often equipped with a camera to enable the capture of images. As the demand for improved functionality of cameras or electronic devices having cameras has increased, multiple cameras having different functionality have been implemented in electronic devices. According to some implementations, a dual camera module in an electronic device may contain two different lenses/sensors. For example, a wide angle (wide) lens may be used with a telephoto (tele) lens to approximate optical zoom effects. The two sensors are operated simultaneously to capture images of the same scene, referred to respectively as a wide image and a tele image. Because the wide and tele lenses have different focal lengths, each provides a different field of view (FOV). The wide image reflects a wider FOV, while the tele image has an FOV that may be approximately one half of that of the wide image, for example, although the ratio of the tele image to the wide image could be some other value. While the two images are separately useful, combining portions of the two images together can be difficult.
Therefore, methods of using two images to generate a single image are beneficial.
A method of generating an image from multiple cameras having different focal lengths is described. The method comprises receiving a wide image and a tele image; aligning the wide image and the tele image to overlap a common field of view; correcting for photometric differences between the wide image and the tele image; selecting a stitching seam for the wide image and the tele image; and joining the wide image and the tele image to generate a composite image, wherein a first portion of the composite image on one side of the stitching seam is from the wide image and a second portion of the composite image on the other side of the stitching seam is from the tele image.
An electronic device is also described. The electronic device comprises a first camera having a first focal length; a second camera having a second focal length that is different than the first focal length; a processor coupled to receive images captured by the first camera and the second camera, wherein the processor: receives a wide image and a tele image; aligns the wide image and the tele image to overlap a common field of view; corrects for photometric differences between the wide image and the tele image; selects a stitching seam for the wide image and the tele image; and joins the wide image and the tele image to generate a composite image, wherein a first portion of the composite image on one side of the stitching seam is from the wide image and a second portion of the composite image on the other side of the stitching seam is from the tele image.
A non-transitory computer-readable storage medium having data stored therein representing instructions executable by a processor to perform a method comprising receiving a wide image and a tele image; aligning the wide image and the tele image to overlap a common field of view; correcting for photometric differences between the wide image and the tele image; selecting a stitching seam for the wide image and the tele image; and joining the wide image and the tele image to generate a composite image, wherein a first portion of the composite image on one side of the seam is from the wide image and a second portion of the composite image on the other side of the seam is from the tele image.
According to different implementations for electronic devices using portions of images from multiple cameras, a technique for fusing and stitching wide and tele images captured with different settings to produce a desired zoom image (i.e. a zoom image produced in response to a selection by a user of the electronic device or an automatic zoom operation) is described. The technique takes advantage of the FOV of the wide image and the high quality of the tele image. If a user zooms the camera to a field of view between that of the wide lens and that of the tele lens, the wide image and the tele image can be fused/stitched to produce a composite image which can provide better resolution and detail in the center part of the wide field of view, and also provide the view of the scene that is out of the field of view of the tele lens. Because the wide and tele images come from different lenses, may pass through different image signal processors (ISPs), and are captured from slightly different viewpoints, the images may have geometric misalignment (e.g., parallax effects or optical distortion), photometric misalignment, and resolution mismatch. Therefore, the tele image is registered with the overlap region in the wide image, the photometric difference is corrected, and a desirable stitching seam is found before the two images are blended together. As will be described in more detail below, various techniques can be implemented to generate a desired zoom image having reduced visible artifacts.
While the specification includes claims defining the features of one or more implementations of the invention that are regarded as novel, it is believed that the circuits and methods will be better understood from a consideration of the description in conjunction with the drawings. While various circuits and methods are disclosed, it is to be understood that the circuits and methods are merely exemplary of the inventive arrangements, which can be embodied in various forms. Therefore, specific structural and functional details disclosed within this specification are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the inventive arrangements in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting, but rather to provide an understandable description of the circuits and methods.
Turning first to
The processor circuit 102 may be coupled to a display 106 for displaying information to a user. The processor circuit 102 may also be coupled to a memory 108 that enables storing information related to data or information associated with achieving a goal. The memory 108 could be implemented as a part of the processor circuit 102, or could be implemented in addition to any cache memory of the processor, as is well known. The memory 108 could include any type of memory, such as a solid state drive (SSD), Flash memory, Read Only Memory (ROM) or any other memory element that provides long term memory, where the memory could be any type of internal memory of the electronic device or external memory accessible by the electronic device. By providing a local memory, user preferences and other information which a user may desire to keep private are not compromised.
A user interface 110 is also provided to enable a user to both input data and receive data. Some activity tracking may require a user's manual input. The user interface could include a touch screen user interface commonly used on a portable communication device, such as a smart phone, smart watch or tablet computer, and other input/output (I/O) elements, such as a speaker and a microphone. The user interface could also comprise devices for inputting or outputting data that could be attached to the mobile device by way of an electrical connector, or by way of a wireless connection, such as a Bluetooth or a Near Field Communication (NFC) connection. A user may also be able to log on to an account associated with an app that tracks a user's progress in achieving a goal.
The processor circuit 102 may also be coupled to other elements that receive input data or provide data, including various sensors 111, an inertial measurement unit (IMU) 112 and a Global Positioning System (GPS) device 113 for activity tracking. For example, an inertial measurement unit (IMU) 112 can provide various information related to the motion or orientation of the device, while GPS 113 provides location information associated with the device. The sensors, which may be a part of or coupled to a mobile device, may include by way of example a light intensity (e.g. ambient light or UV light) sensor, a proximity sensor, an environmental temperature sensor, a humidity sensor, a heart rate detection sensor, a galvanic skin response sensor, a skin temperature sensor, a barometer, a speedometer, an altimeter, a magnetometer, a hall sensor, a gyroscope, WiFi transceiver, or any other sensor that may provide information related to achieving a goal. The processor circuit 102 may receive input data by way of an input/output (I/O) port 114 or a transceiver 116 coupled to an antenna 118.
Turning now to
The processing stages of
The wide and tele images need to be aligned over the overlapping or common region for stitching. A block diagram of a geometric alignment block, such as the geometric alignment block 204, is shown in
A downscale block 502 is adapted to receive the wide luminance image IWY and the tele luminance image ITY at first and second inputs, where a downscaled version of the image having fewer pixels is coupled to a feature point detection and matching block 504 and a line segment detection and matching block 506. The downscale block 502 may be implemented to reduce the complexity of the data to be processed by the remaining elements of the geometric alignment block 204 by reducing the number of pixels that are processed. The feature point detection and matching block 504 enables the alignment of most regions, while the line segment detection and matching block 506 enables the alignment of particular lines or edges of objects. The output of the feature point detection and matching block includes matching feature pairs and a projective model H fitted to the matching feature points which enables global alignment. The outputs of both the feature point detection and matching block 504 and the line segment detection and matching block 506 are provided to a structure guided mesh warping (SGMW) block 508, an output of which is coupled to an interpolation and upscale block 510. A warping block 512 is configured to receive an output of the interpolation and upscale block 510 and the IT image to generate an ÎT image, which is a warped tele image as will be described in more detail below.
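By way of a non-limiting illustration, the projective model H fitted to the matching feature points could be estimated with a direct linear transform (DLT), as in the following minimal Python sketch. The function names `fit_homography` and `apply_homography`, the use of NumPy, and the choice of an unnormalized DLT without outlier rejection are assumptions made for illustration only and are not taken from the description.

```python
import numpy as np

def fit_homography(src, dst):
    """Fit a 3x3 projective model H such that dst ~ H @ src (DLT, >= 4 pairs)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each matching pair contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is non-zero (non-degenerate case)

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice a robust estimator would likely be wrapped around such a fit so that mismatched feature pairs do not bias the global alignment; that step is omitted here.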
The structure guided mesh warping (SGMW) block 508 is sequentially employed to refine the rough alignment achieved by the global homography, as shown and described in reference to
$$E_{SGMW} = E_p + \lambda_1 E_g + \lambda_2 E_s + \lambda_3 E_l \qquad (2)$$
which includes the data term (Ep), global consistency term (Eg), smoothness term (Es), and line alignment term (El).
The data term measures the error between the feature points in the local-warped tele image and the corresponding feature points in the wide image. Let $\{(P_j, \tilde{P}_j);\ j = 1, \ldots, N\}$ denote the feature point matching pairs in the tele image and wide image. With global homography (i.e. where each pixel is moved with the same transformation), the feature point $P_j$ is transformed to
$$E_p = \sum_{j=1}^{N} \left\| \sum_{m=1}^{4} \alpha_{j,m} \hat{V}_{j,m} - \tilde{P}_j \right\|^2 \qquad (3)$$
where N is the number of matching feature pairs.
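As an illustrative sketch only, the data term of Eq. (3) could be evaluated as follows. The sketch assumes that the four coefficients for each feature point are bilinear interpolation weights of the point with respect to the four vertices of an axis-aligned square mesh cell; that interpretation, the corner ordering, and the names `bilinear_weights` and `data_term` are assumptions and are not stated in the description.

```python
import numpy as np

def bilinear_weights(p, cell_origin, cell_size):
    """Weights of point p for the four corners of its axis-aligned cell.

    Assumed corner order: top-left, top-right, bottom-left, bottom-right.
    """
    fx = (p[0] - cell_origin[0]) / cell_size
    fy = (p[1] - cell_origin[1]) / cell_size
    return np.array([(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy])

def data_term(alphas, warped_corners, wide_points):
    """Sum of squared errors between interpolated warped points and wide points.

    alphas: (N, 4) weights, warped_corners: (N, 4, 2), wide_points: (N, 2).
    """
    predicted = np.einsum("nm,nmd->nd", alphas, warped_corners)
    return float(np.sum((predicted - wide_points) ** 2))
```

Because each interpolated point is a linear combination of the unknown vertices, a term of this form is quadratic in those vertices.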
For image regions without matched feature points, minimizing the data term only may distort them. Therefore, the global consistency term imposes a constraint that the local-warped image regions should be consistent with the global-warped result when there are no matching pairs in the neighborhood of these regions. Specifically, the global consistency term Eg encourages target vertices
Eg=ΣiWi∥{circumflex over (V)}i−
where Wi=1 indicates Vi has feature points in the neighborhood; otherwise, Wi=0.
The smoothness term encourages the mesh cells to undergo a similarity transformation to preserve object shape and image structures as much as possible during local warping. This constraint is applied on the triangles formed from mesh vertices. Consider a triplet Δ
where R rotates the vector 90 degrees,
The values ut and vt encode the relative relationship of the three vertices and remain the same under a similarity transformation. Therefore, the smoothness term is formulated as follows:
Es=Σt=1TMt∥{circumflex over (V)}t,1−({circumflex over (V)}t,2+ut({circumflex over (V)}t,3−{circumflex over (V)}t,2)+vtR(
where T is the number of triangles used. By way of example, four triangles may be built for each vertex. The saliency map Mt is computed by the color variance in each mesh cell, which promotes shape invariance more in salient regions than in non-salient regions.
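The invariance of the values ut and vt under a similarity transformation can be checked numerically with the minimal sketch below. It assumes that the first vertex of a triangle is expressed relative to the edge between the other two vertices as V1 = V2 + u(V3 − V2) + vR(V3 − V2), and that R is the particular 90 degree rotation matrix shown; the sign convention of R and the function names are assumptions made for illustration.

```python
import numpy as np

# Assumed convention for the 90 degree rotation R.
R90 = np.array([[0.0, 1.0], [-1.0, 0.0]])

def local_coords(v1, v2, v3):
    """Coordinates (u, v) of v1 in the frame spanned by the edge v3 - v2 and its rotation."""
    e = v3 - v2
    d = v1 - v2
    n = R90 @ e          # orthogonal to e, with the same length
    denom = e @ e
    return (d @ e) / denom, (d @ n) / denom

def smoothness_residual(w1, w2, w3, u, v):
    """Deviation of a warped triangle from the shape encoded by (u, v)."""
    e = w3 - w2
    return w1 - (w2 + u * e + v * (R90 @ e))

def smoothness_term(triangles, coords, saliency):
    """Saliency-weighted sum of squared residuals over all triangles."""
    return float(sum(m * np.sum(smoothness_residual(*tri, u, v) ** 2)
                     for tri, (u, v), m in zip(triangles, coords, saliency)))
```

A rotation, uniform scaling, and translation of the triangle leaves the residual at zero, whereas a shear does not, which is why a term of this kind penalizes shape distortion.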
A performance of structure guided mesh warping with the above defined terms is shown in
The above terms are used to ensure good alignment in image regions with feature points while avoiding severe local distortion. However, line discontinuity may occur if a line crosses the stitching boundary where there are insufficient matching pairs. Human eyes are very sensitive to these broken lines. Therefore, a line alignment term is incorporated to ensure that the corresponding line segments are well aligned. Let $\{(l_k, \tilde{l}_k);\ k = 1, \ldots, K\}$ denote the line segment pairs of the tele and wide images in the vicinity of the stitching boundary. The line alignment term requires that the sum of distances from the points $P_{k,s}$ on the line $l_k$ of the tele image to the corresponding line $\tilde{l}_k$ of the wide image is minimized. The points $P_{k,s}$ are intersections when line segment $l_k$ goes across the mesh, and similarly their global-warped results
$$\tilde{l}_k \cdot \left( \sum_{m=1}^{4} \alpha_{k,s,m} \hat{V}_{k,s,m} \right) = 0 \qquad (8)$$
The line alignment term is the accumulation of Eq (8) for all matching line segments,
where $\tilde{l}_k = [\tilde{a}_k, \tilde{b}_k, \tilde{c}_k]$, and $\tilde{a}_k$, $\tilde{b}_k$, $\tilde{c}_k$ are the coefficients of the line equation $\tilde{a}_k x + \tilde{b}_k y + \tilde{c}_k = 0$, and the normalizer $\sqrt{\tilde{a}_k^2 + \tilde{b}_k^2}$ is used to make sure each line segment contributes equally. Having defined all the energy terms, we can minimize the energy function with a sparse linear solver as it is quadratic. The parameters λ1, λ2, λ3 regularize the weight of each term. Once the unknown vertices $\hat{V}_i$ are obtained, the displacements between $V_i$ and $\hat{V}_i$ can be interpolated to warp the whole tele image.
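The role of the normalizer can be illustrated with the following minimal sketch, which accumulates squared, normalized point-to-line distances over all matching line segments. The squared form of the accumulation and the name `line_alignment_term` are assumptions made for illustration; the points passed in stand for the warped positions of the sampled points on each tele line segment.

```python
import numpy as np

def line_alignment_term(lines, points):
    """Accumulate squared normalized distances from warped points to wide-image lines.

    lines: iterable of (a, b, c) coefficients of a*x + b*y + c = 0.
    points: iterable of (S_k, 2) arrays, one array of points per line.
    """
    total = 0.0
    for (a, b, c), pts in zip(lines, points):
        pts = np.asarray(pts, dtype=float)
        # Dividing by sqrt(a^2 + b^2) makes the value independent of how the
        # line coefficients happen to be scaled.
        dist = (a * pts[:, 0] + b * pts[:, 1] + c) / np.hypot(a, b)
        total += float(np.sum(dist ** 2))
    return total
```

Without the normalizer, multiplying the coefficients of one line by a constant would change its contribution to the energy even though the line itself is unchanged.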
The images in
Turning now to
The wide and tele images are captured by different cameras using potentially different capture settings and may undergo different image processing operations. This may produce visible differences in luminance and chrominance reproduction between the two images. The photometric alignment block 206 corrects for such differences separately in luminance and chrominance channels. The luminance correction is done in two stages: a global correction to account for global luminance mismatch between the wide and tele images, followed by local correction to account for spatially varying luminance mismatches across the two images. In the first stage (i.e. the global luma tone correction block 902), a histogram matching technique may be used to match the global luminance profile of the tele image to that of the wide image. In the second stage (i.e. the local luma tone correction block 904), a spatially-varying gamma correction may be applied by taking the ratios of logs of downsampled averaged versions of the wide and tele image in a blockwise fashion. A series of images in
A first beneficial aspect of the photometric alignment block is in the global luma tone correction block. Since the human eye is most likely to discern a photometric mismatch in smooth regions of the image, the images may be first processed through a spatial frequency filter, and the histograms are computed from the filtered results, which are mostly smoothly-varying regions. According to one implementation of the spatial filter, a low pass filter such as an averaging or Gaussian filter is applied to the image and the result is subtracted from the original image. The difference image represents high frequency components in the image. This difference is compared to a threshold, and only regions below the threshold (i.e. regions with little high frequency content) are considered in the histogram computation. A second beneficial aspect of the photometric alignment block lies in the fact that in both the global and local corrections, greater emphasis is placed on matching pixels in the vicinity of the stitching boundary between the wide and tele images. In the global stage, pixels near the boundary are given greater weight in the histogram calculation. In the local luma tone correction stage, gamma correction may be applied preferentially in the vicinity of the boundary, rather than within the entire tele image.
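The two luminance correction stages can be sketched as follows, by way of example only. The sketch assumes 8-bit luminance values, an averaging filter as the low pass filter, and hypothetical function names, block size, and threshold; the boundary weighting described above is omitted for brevity.

```python
import numpy as np

def box_blur(img, radius=2):
    """Separable averaging (low pass) filter with edge padding."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def smooth_mask(img, radius=2, threshold=8.0):
    """True where the high frequency content (image minus low pass) is below threshold."""
    return np.abs(img - box_blur(img, radius)) < threshold

def match_histogram(tele, wide, tele_mask, wide_mask, levels=256):
    """Global stage: map tele luminance to wide luminance using masked histograms."""
    t_hist = np.bincount(tele[tele_mask].astype(int), minlength=levels)
    w_hist = np.bincount(wide[wide_mask].astype(int), minlength=levels)
    t_cdf = np.cumsum(t_hist) / max(t_hist.sum(), 1)
    w_cdf = np.cumsum(w_hist) / max(w_hist.sum(), 1)
    # For each tele level, pick the first wide level whose CDF reaches the same value.
    lut = np.searchsorted(w_cdf, t_cdf, side="left").clip(0, levels - 1)
    return lut[tele.astype(int)]

def local_gamma(tele, wide, block=8, levels=256):
    """Local stage: per-block gamma = log(mean wide) / log(mean tele) on (0, 1) values."""
    out = np.empty(tele.shape, dtype=float)
    for y in range(0, tele.shape[0], block):
        for x in range(0, tele.shape[1], block):
            t = np.clip(tele[y:y + block, x:x + block] / (levels - 1), 1e-3, 1 - 1e-3)
            m = np.clip(wide[y:y + block, x:x + block].mean() / (levels - 1), 1e-3, 1 - 1e-3)
            gamma = np.log(m) / np.log(t.mean())
            out[y:y + block, x:x + block] = (levels - 1) * t ** gamma
    return out
```

A typical use would apply `match_histogram` first and then `local_gamma` to the globally corrected tele luminance, so that the second stage only has to remove the remaining spatially varying mismatch.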
In general, it is difficult to perfectly align the tele image with the overlap region in the wide image, so a seam finding method is used in the stitching seam search block 208 to search for a seam for optimal stitching, such that the tele and wide images can be well stitched in local areas along the seam. According to one embodiment, the search for a seam can be cast as a graph cut problem. A constructed graph is shown in
In the graph, all of the pixels surrounding this margin are constrained to come from either the tele image or the wide image, which corresponds to infinite edge weights from these pixels to their closest label. The edge weight between two graph nodes in the overlapping region is designed to encourage the seam to pass through texture regions rather than smooth regions and large-scale edges,
where s and t are adjacent pixels, and GWd and GTd are the texture maps of the wide and tele images along the direction d (d depends on the spatial relationship of s and t). This edge weight penalizes texture regions less. The max-flow algorithm is used to seek the minimum cost cut of the graph that separates the overlapping region into different labels. The resulting seam avoids traversing objects of recognizable visual importance, boundaries of geometrically misaligned objects, and boundaries that could not be photometrically aligned.
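A toy version of the graph cut formulation is sketched below, for illustration only. The left column of a small overlap band is tied to the tele label and the right column to the wide label with infinite weights, and a basic Edmonds-Karp max-flow finds the minimum cost cut. The edge weight used here, 1 / (1 + texture(s) + texture(t)), is a hypothetical stand-in chosen only so that textured regions are cheaper to cut; it is not the edge weight of the description, and the function names are likewise assumptions.

```python
from collections import defaultdict, deque
import numpy as np

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum cut."""
    residual = defaultdict(dict)
    for (u, v), c in capacity.items():
        residual[u][v] = residual[u].get(v, 0.0) + c
        residual[v].setdefault(u, 0.0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # nodes still reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow

def seam_labels(texture):
    """Label each pixel of the overlap band: True = tele image, False = wide image."""
    h, w = texture.shape
    cap = {}
    def link(a, b):
        weight = 1.0 / (1.0 + texture[a] + texture[b])  # hypothetical edge weight
        cap[(a, b)] = weight
        cap[(b, a)] = weight
    for y in range(h):
        cap[("tele", (y, 0))] = float("inf")       # left column must be tele
        cap[((y, w - 1), "wide")] = float("inf")   # right column must be wide
        for x in range(w):
            if x + 1 < w:
                link((y, x), (y, x + 1))
            if y + 1 < h:
                link((y, x), (y + 1, x))
    tele_side = min_cut(cap, "tele", "wide")
    return np.array([[(y, x) in tele_side for x in range(w)] for y in range(h)])
```

For a band in which two adjacent columns are strongly textured, the cut falls between those columns. A production implementation would likely rely on an optimized max-flow library rather than this dictionary-based version.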
Blending as a post-processing step aims to make the transition from one image to another more natural and smooth to achieve edge and texture homogeneity in the presence of color and resolution differences. As will be described in more detail below, the blending process can blend the pixels on both sides of the seam adaptively so that a wide transition band is used for smooth regions and texture regions while a narrow band or even no blending is used for edges to avoid visible ghosting artifacts.
Turning now to
$$E = \sum_k G_k \oplus (I_W \oplus G_g) \qquad (10)$$
where Gg is a fixed-size Gaussian blur kernel to mask out the texture regions whose scale is below the size of the kernel, and Gk denotes a set of filters to approximate first-order derivatives in two directions. In order to ensure no blending on salient edges, the edge map is binarized with a threshold T, where
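$$E = \begin{cases} 0.5 & \text{if a line segment } \tilde{l}_k \text{ is present} \\ 1.0 & \text{if } E > T \\ 0.0 & \text{otherwise} \end{cases} \qquad (11)$$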
The purpose of the first case (i.e. where E=0.5) is to blend a little more on regions of line segments to distribute the pixel-level errors in stitching two line segments to the neighborhood. The final blur map B is yielded by subtracting the edge map E from the mask M,
$$B = M - E \qquad (12)$$
By performing a spatial-variant convolution, the adaptive alpha map for blending can be represented by,
$$\alpha(x) = M \oplus_x G_{g,B(x)} \qquad (13)$$
where x denotes the pixel location and Gg,B(x) is a Gaussian blur kernel with a size proportional to B(x). Then the stitching result is
$$I_S = \alpha * I_T + (1 - \alpha) * I_W \qquad (14)$$
where the multiplication symbol * indicates a pixel-wise multiplication.
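Eqs. (12) to (14) can be illustrated with the following minimal sketch, which blurs the mask with a per-pixel Gaussian whose width is proportional to the blur map and then blends the two images pixel-wise. The proportionality constant `max_sigma`, the normalization of the kernel over the image area, the clipping of negative blur values to zero, and the function names are assumptions made for illustration; the brute-force loop is intended for small images only.

```python
import numpy as np

def adaptive_alpha(mask, blur_map, max_sigma=3.0):
    """Spatially variant Gaussian blur of the mask; sigma is proportional to blur_map."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    alpha = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            sigma = max_sigma * max(blur_map[y, x], 0.0)
            if sigma < 1e-6:
                alpha[y, x] = mask[y, x]  # no blending, e.g. on salient edges
                continue
            weights = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
            alpha[y, x] = np.sum(weights * mask) / np.sum(weights)
    return alpha

def blend(tele, wide, alpha):
    """Pixel-wise blend: alpha selects the tele image, (1 - alpha) the wide image."""
    return alpha * tele + (1.0 - alpha) * wide
```

With a mask M that is 1 inside the tele region and an edge map E, the call `adaptive_alpha(M, M - E)` gives a hard transition wherever E equals 1 and a gradual transition elsewhere along the seam.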
The purpose of blurring the wide image first is to mask out the texture regions whose scale is below the size of the blur kernel, and ultimately to increase blending on those texture regions. In order to ensure no blending on salient edges, the edge map is binarized with a threshold. In addition, the edge map in regions of detected line segments is set to a lower value for more blending so as to distribute the pixel-level errors in stitching two line segments to the neighborhood. The final blur map is yielded by subtracting the edge map from the mask. By performing a spatial-variant convolution, the adaptive alpha map is obtained for blending the warped tele image and wide image.
Turning now to
The method may further comprise providing a geometric alignment of the wide image and the tele image using an alignment term to align corresponding line segments of the wide image and the tele image. Providing a geometric alignment may comprise establishing a global consistency term to make local warping consistent with global warping in regions where there are no matching feature points. Correcting for photometric differences may comprise providing global luminance correction using smooth regions only, and providing photometric alignment by matching pixels in the vicinity of a stitching boundary between the wide and tele images. Blending the wide image and the tele image may comprise providing adaptive alpha blending, and more particularly performing a spatial-variant convolution to obtain an adaptive alpha map.
While specific elements of the method are described, it should be understood that additional elements of the method, or additional details related to the elements, could be implemented according to the disclosure of
It can therefore be appreciated that new circuits for and methods of implementing a device having cameras with different focal lengths have been described. It will be appreciated by those skilled in the art that numerous alternatives and equivalents will be seen to exist that incorporate the disclosed invention. As a result, the invention is not to be limited by the foregoing implementations, but only by the following claims.
Number | Date | Country | |
---|---|---|---|
20180352165 A1 | Dec 2018 | US |
Number | Date | Country | |
---|---|---|---|
62515243 | Jun 2017 | US |