1. Technical Field
The non-limiting embodiments disclosed herein relate generally to multimedia systems incorporating cameras and, more particularly, to systems and methods that utilize multiple cameras of similar and dissimilar types that capture images from different viewpoints and operate together or independently to produce high quality images and/or meta-data.
2. Brief Description of Prior Developments
Array cameras and light-field (plenoptic) cameras use microlens arrays to capture 4D light field information. Such cameras require significant computation to produce nominal high quality images even if a disparity map or refocus ability is not desired. In addition, the use of such cameras does not provide the flexibility to trade off output quality, computation load, or power consumption.
The following summary is merely intended to be exemplary. The summary is not intended to limit the scope of the claims.
In accordance with one embodiment, an apparatus comprises a main camera configured to produce a high quality image; at least two auxiliary cameras configured to produce images of lower quality; and electronic circuitry linked to the main camera and the at least two auxiliary cameras, the electronic circuitry comprising a controller having a memory and a processor, the electronic circuitry configured to operate on data pertaining to the high quality image and pertaining to the images of lower quality to produce an enhanced high quality image as output data.
In accordance with another embodiment, a method comprises acquiring data from a main camera, the data pertaining to a high quality image; acquiring data from at least two auxiliary cameras, the data pertaining to at least two images of lower quality; combining the data pertaining to the high quality image and the data pertaining to the at least two images of lower quality; producing metadata pertaining to the acquired data; enhancing the high quality image with the metadata; and outputting the high quality image as image data.
In accordance with another embodiment, a method comprises acquiring data pertaining to a high quality image and data pertaining to at least two images of lower quality; using a dense correspondence algorithm to generate dense correspondence between the data pertaining to the high quality image and the data pertaining to the at least two images of lower quality; linking correspondence points from the dense correspondence generated to disparity values; grouping the disparity values into levels; computing a best fit homography transform of the disparity values for each level; and transforming the disparity values for each level to a high quality image.
The foregoing embodiments and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
Referring to
In one example embodiment, the system 10 comprises a main camera 12 and two or more auxiliary cameras 14a and 14b, the main camera 12 and the auxiliary cameras 14a and 14b being disposed in communication with electronic circuitry in the form of a controller 16. More than two auxiliary cameras 14a and 14b may produce a denser light field. The example embodiments of the system 10 allow high quality image capture to produce optionally computable metadata such as disparity maps, depth maps, and/or occlusion maps. The high quality image is acquired from the main camera 12, while the disparity map (and other maps and/or metadata) is obtained using a combination of the images from the main camera 12 and images from the two or more auxiliary cameras 14a and 14b, which obtain images of lower quality. As used herein, high quality refers to high resolution (e.g., pixel resolution, which is typically about 12 megapixels (MP) to about 18 MP and can be as great as about 24 MP to about 36 MP), larger sensors (35 millimeters, APS-C, or micro 4/3), larger and superior optical lens systems, improved processing, higher ISO range, and the like. As used herein, lower quality refers to lower resolution as compared to the main camera 12 (e.g., cameras that are used in mobile phones have smaller sensors, resolutions of about 8 MP to about 12 MP, smaller lenses, very large depths of field (limited bokeh), and the like). Cameras of lower quality may be pinhole cameras where most parts of the images obtained therefrom are sharp. The example system 10 is more flexible than previous systems and addresses use-cases thereof more efficiently while at the same time requiring less computational power. For example, given a stereo image pair and a corresponding disparity map, one example method of using the system 10 may transfer a disparity map to a new view point from where an overlapping image is available.
The configurations and settings of the main camera 12 and the auxiliary cameras 14a and 14b are optimized such that, in the event that some parameters of certain cameras are varied, the system 10 still operates to produce expected results.
With regard to the two or more auxiliary cameras 14a and 14b, in one embodiment, both may be of the same type (for example, both may be color or both may be monochrome). In another embodiment, the two or more auxiliary cameras 14a and 14b may differ slightly (for example, one may be high resolution and the other may be low resolution (hence more sensitive to light since the pixels can be larger)). In another embodiment, the two or more auxiliary cameras 14a and 14b may be markedly different, where one is color and the other is monochrome or infrared (IR). In still another embodiment, where there are more than two of the auxiliary cameras 14a and 14b in the calibrated set, the auxiliary cameras may comprise a mixture of color, monochrome, IR, and the like.
As shown in
The main camera 12 is configured to acquire the high quality image 26, which in itself serves as a substantial portion of the overall photographic use-case. The auxiliary cameras 14a and 14b are configured to acquire the images 28 (or data pertaining to the images 28), which are combined with the image 26 (or data pertaining to the image 26) from the main camera 12 via the computational photography algorithms defined at least in part by the processor 20 to produce the metadata 34. Such metadata 34 includes, but is not limited to, disparity maps, depth maps, occlusion maps, defocus maps, sparse light fields, and the like. The metadata 34 can be used either automatically (for example, by autonomous processing by the processor 20) to enhance the high quality image 26 from the main camera 12, or it can be subject to user-assisted manipulation. The metadata 34 can also be used to gain additional information pertaining to the scene intended for capture by the main camera 12 and the auxiliary cameras 14a and 14b and hence can be used for efficient continuous image capture from the main camera 12 (for example, efficient autofocus, auto-exposure, and the like).
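By way of a non-limiting illustration, the relationship between two of the metadata types named above (a disparity map and a depth map) may be sketched as follows. The sketch assumes a rectified pinhole stereo pair, for which depth Z = f * B / d, where f is the focal length in pixels, B is the baseline, and d is the disparity in pixels. The function name disparity_to_depth, its parameters, and the sample values are hypothetical and are not taken from the embodiments described herein.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres).

    Assumes a rectified pinhole stereo pair, so Z = f * B / d.
    Pixels with zero or negative disparity have no finite depth
    and are returned as None.
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_length_px * baseline_m / d if d > 0 else None
            for d in row
        ])
    return depth


# Hypothetical values: 800 px focal length, 2 cm baseline.
disparity = [[8.0, 4.0], [2.0, 0.0]]
depth = disparity_to_depth(disparity, focal_length_px=800.0, baseline_m=0.02)
```

In this sketch, larger disparities map to nearer surfaces, and the pixel with zero disparity is left without a depth value rather than being assigned an arbitrary large number.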
The unencumbered communication of intrinsic and extrinsic parameters between the cameras enables the processor 20 to perform accurate and efficient inter-image computations (such as disparity map computation) using the computational photography algorithms. In the system 10, the auxiliary cameras 14a and 14b are strongly calibrated with reference to each other, while the main camera 12 assumes varying parameters (for instance, focal length, optical zoom, optical image stabilization, or the like). As used herein, “strongly calibrated” refers to cameras having known parameters (that is, the intrinsic and extrinsic parameters are known for all operating conditions), and “weakly calibrated” refers to cameras having varying intrinsic and extrinsic parameters. Since the parameters of the main camera 12 are permitted to change during the operation of the system 10, only the approximate intrinsic and extrinsic parameters (between the main camera 12 and the auxiliary cameras 14a and 14b) are determined, which results in weak calibration. This means that the inter-image computations between the main camera 12 and the auxiliary cameras 14a and 14b become less efficient and less accurate. To compensate for this decrease in efficiency and accuracy, the information obtained from the strongly calibrated auxiliary cameras 14a and 14b can be combined with the information from the weakly calibrated main camera 12 to perform computations of increased efficiency and accuracy.
In some example embodiments, the requirement of strong calibration of the auxiliary cameras 14a and 14b relative to each other can be circumvented. However, doing so may lead to loss in computational efficiency and accuracy of the metadata 34. Since the strong calibration is generally only desired on the auxiliary cameras 14a and 14b and not on the main camera 12, such a requirement is readily amenable to cost effective manufacturing.
Referring now to
Referring now to
Referring now to
Referring now to
Referring back to
Furthermore, the system 10 as described herein produces a higher quality color image (as compared to previous systems) which in itself can be accepted as a final image in over 80% of use cases. However, with an optional additional computation, the auxiliary camera images are combined with the main camera image to produce a suitable quality disparity map (comparable to what previous systems are capable of producing) at a lower computational cost.
Moreover, most systems and methods that use array cameras and light-field cameras directly warp each individual disparity value using geometric information. That is, each element of an image is processed according to its image coordinates to produce the corresponding image coordinates in the resulting image.
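For illustration only, direct warping of individual disparity values may be sketched as below. This is a simplified sketch and not a description of any particular prior system: it assumes rectified views, so that the warp reduces to a horizontal shift, and it assumes the new view lies along the same baseline at a distance scaled by a hypothetical baseline_ratio. The function name forward_warp_disparity and the sign convention are likewise assumptions.

```python
def forward_warp_disparity(disparity, baseline_ratio):
    """Forward-warp each disparity value to a new view, one pixel at a time.

    Assumes rectified views, so each pixel moves only horizontally by
    its own (scaled) disparity. When two source pixels land on the same
    target, the larger disparity (nearer surface) is kept. Targets that
    receive no value stay None and mark disoccluded regions.
    """
    height, width = len(disparity), len(disparity[0])
    warped = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            scaled = disparity[y][x] * baseline_ratio
            target_x = x - round(scaled)  # assumed sign convention
            if 0 <= target_x < width:
                current = warped[y][target_x]
                if current is None or scaled > current:
                    warped[y][target_x] = scaled
    return warped


# One image row: a far background (disparity 0) next to a near object (disparity 2).
row = [[0, 0, 2, 2]]
warped = forward_warp_disparity(row, baseline_ratio=1.0)
```

In this sketch every value is moved independently, so the cost grows with the number of pixels, and the near object overwrites the background while leaving holes where the new view sees behind it.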
Additionally, the system 10 as described herein also capitalizes on the fact that many potential applications can be accomplished using a sparse light field.
The example systems as described herein may also provide higher degrees of control over image quality (in comparison to previous systems); zero-computation for nominal high-quality images; computation of disparity maps on an as-needed basis; automatic and semiautomatic image segmentation; occlusion map generation (auxiliary camera sees behind objects); increased blur (e.g., the use of bokeh) based on depth map; de-blurring of out-of-focus parts of an image; parallax views; stereo-3D images; and/or approximations of 3D models of a scene.
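As one non-limiting illustration of increasing blur based on a depth map, a per-pixel blur radius may be derived from depth as sketched below. The sketch relies on the thin-lens property that the blur-circle size is proportional to the absolute difference between the inverse of the focus distance and the inverse of the subject distance. The function name blur_radius_map and the parameters strength_px_m and max_radius_px are hypothetical tuning values, not parameters of the system 10.

```python
def blur_radius_map(depth_m, focus_depth_m, strength_px_m, max_radius_px):
    """Return a per-pixel blur radius (pixels) from a depth map (metres).

    Radius grows with |1/z - 1/z_focus| (thin-lens behaviour) and is
    clamped to max_radius_px. Pixels at the focus depth get radius 0.
    """
    radii = []
    for row in depth_m:
        radii.append([
            min(max_radius_px,
                strength_px_m * abs(1.0 / z - 1.0 / focus_depth_m))
            for z in row
        ])
    return radii


# Hypothetical depths of 1 m, 2 m, and 4 m with focus set at 2 m.
radii = blur_radius_map([[1.0, 2.0, 4.0]], focus_depth_m=2.0,
                        strength_px_m=8.0, max_radius_px=3.0)
```

The resulting radius map could then drive a spatially varying blur applied to the high quality image, which is one way the depth map might be used to exaggerate bokeh.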
In one example embodiment, an apparatus comprises a main camera configured to produce a high quality image; at least two auxiliary cameras configured to produce images of lower quality as compared to the main camera; and electronic circuitry linked to the main camera and the at least two auxiliary cameras, the electronic circuitry comprising a controller having a memory and a processor, the electronic circuitry configured to operate on data pertaining to the high quality image and pertaining to the images of lower quality to produce an enhanced high quality image as output data.
The processor may utilize computational photography algorithms. The computational photography algorithms may utilize dense correspondence and best fit homography techniques. The output data produced may comprise a combination of high quality image data and metadata. The metadata may comprise one or more of disparity maps, depth maps, occlusion maps, defocus maps, and sparse light fields. The main camera may assume varying parameters related to the operation of the main camera. The at least two auxiliary cameras may have intrinsic and extrinsic operating parameters that are known for all operating conditions. The apparatus may comprise a point-and-shoot camera, a mobile camera, a professional camera, a medical imaging device, a camera for use in an automotive, aviation, or marine application, or a security camera.
In another example embodiment, a method comprises acquiring data from a main camera, the data pertaining to a high quality image; acquiring data from at least two auxiliary cameras, the data pertaining to at least two images of lower quality as compared to the high quality image; combining the data pertaining to the high quality image and the data pertaining to the at least two images of lower quality; producing metadata pertaining to the acquired data; enhancing the high quality image with the metadata; and outputting the high quality image as image data.
Producing metadata may comprise using computational photography algorithms embodied in a controller comprising a processor and a memory. Using computational photography algorithms may comprise using a dense correspondence algorithm to generate dense correspondence between the acquired data pertaining to the high quality image and the acquired data pertaining to the at least two images of lower quality. A best fit homography transform may be computed from the dense correspondence generated. Enhancing the high quality image with the metadata may be one of controlled by a processor and controlled by a user.
In another example embodiment, a method comprises acquiring data pertaining to a high quality image and data pertaining to at least two images of lower quality as compared to the high quality image; using a dense correspondence algorithm to generate dense correspondence between the data pertaining to the high quality image and the data pertaining to the at least two images of lower quality; linking correspondence points from the dense correspondence generated to disparity values; grouping the disparity values into levels; computing a best fit homography transform of the disparity values for each level; and transforming the disparity values for each level to a high quality image.
Transforming the disparity values for each level to a high quality image may be an affine transformation. Transforming the disparity values for each level to a high quality image may comprise starting the dense correspondence algorithm from a level that corresponds to zero disparity and proceeding towards the level of highest disparity. Using the dense correspondence algorithm to generate dense correspondence may comprise using electronic circuitry comprising a controller having a memory and a processor. A dense correspondence map established by the data pertaining to a high quality image and the data pertaining to at least two images of lower quality may be used to reduce errors in a disparity map obtained using only the data pertaining to at least two images of lower quality.
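The grouping of disparity values into levels and the computation of a best fit homography for each level may be illustrated by the following minimal sketch. It is not the claimed implementation. It assumes that correspondences are already available as tuples (x, y, u, v), that disparity is taken as x - u, that levels are uniform bins of a hypothetical width level_width, and that the homography is normalized so its last entry equals 1 and is fitted by least squares through the normal equations. All function names are hypothetical.

```python
from collections import defaultdict


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(pairs):
    """Least-squares homography (last entry fixed to 1) mapping (x, y) to (u, v).

    Needs at least four correspondences with no three source points collinear.
    """
    rows, rhs = [], []
    for x, y, u, v in pairs:
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def group_by_level(matches, level_width):
    """Bin correspondences by horizontal disparity (assumed to be x - u)."""
    levels = defaultdict(list)
    for x, y, u, v in matches:
        levels[int((x - u) // level_width)].append((x, y, u, v))
    return dict(levels)


def apply_homography(h, x, y):
    """Map a point through a 3x3 homography given as nested lists."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Synthetic correspondences: one surface at disparity 1, another at disparity 6.
points = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]
matches = ([(x, y, x - 1, y) for x, y in points]
           + [(x, y, x - 6, y) for x, y in points])
levels = group_by_level(matches, level_width=4)
# Process levels in order, from the lowest disparity towards the highest.
homographies = {level: fit_homography(pairs)
                for level, pairs in sorted(levels.items())}
```

In this sketch each level yields a single 3x3 transform, so all disparity values in a level can be carried to the other view with one matrix rather than being warped one value at a time. A production implementation would more likely use a numerically robust solver (for example, a singular value decomposition) and an outlier-tolerant fit.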
In another example embodiment, a non-transitory computer readable storage medium comprises one or more sequences of one or more instructions which, when executed by one or more processors of an apparatus, cause the apparatus to at least use a dense correspondence algorithm to generate dense correspondence between data pertaining to a high quality image and data pertaining to at least two images of lower quality as compared to the high quality image; link correspondence points from the dense correspondence generated to disparity values; group the disparity values into levels; and compute a best fit homography transform of the disparity values for each level. The disparity values for each level may be transformed to a high quality image.
In another example embodiment, an apparatus comprises a first camera configured to produce a high quality image; a second camera configured to produce images of lower quality; and electronic circuitry linked to the first camera and the second camera, the electronic circuitry comprising a controller having a memory and a processor, the electronic circuitry configured to operate on data pertaining to the high quality image and pertaining to the images of lower quality to produce an enhanced high quality image as output data. One of the first camera and the second camera may be strongly calibrated and the other of the first camera and the second camera may be weakly calibrated. In the alternative, the first camera and the second camera may be strongly calibrated relative to each other. When the first and second cameras are strongly calibrated relative to each other, defocus information in the first camera may be used as an additional cue to disambiguate disparity values to further enhance a disparity map.
Any of the foregoing example embodiments may be implemented in software, hardware, application logic, or a combination of software, hardware, and application logic. The software, application logic, and/or hardware may reside in the video player (or other device). If desired, all or part of the software, application logic, and/or hardware may reside at any other suitable location. In an example embodiment, the application logic, software, or an instruction set is maintained on any one of various conventional computer-readable media. A “computer-readable medium” may be any media or means that can contain, store, communicate, propagate, or transport instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. A computer-readable medium may comprise a computer-readable storage medium that may be any media or means that can contain or store the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
It should be understood that the foregoing description is only illustrative. Various alternatives and modifications can be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications, and variances which fall within the scope of the appended claims.