The present invention relates to a camera that includes two sensors, each with its own color pattern. A processor in the image capture device produces an enhanced image using pixel values from an image from each sensor.
Stereo and multi-view imaging has a long and rich history stretching back to the early days of photography. Stereo cameras employ multiple lenses to capture two images, typically from points of view that are horizontally displaced, to represent the scene from two different points of view. The multiple images that result are displayed to a human viewer, to let the viewer experience an impression of 3D. The human visual system then merges information from the pair of different images to achieve the impression of depth.
Stereo cameras can come in any number of configurations. For example, a lens and a sensor unit are attached to a port on a traditional single-view digital camera to enable the camera to capture two images from slightly different points of view, as described in U.S. Pat. No. 7,102,686. In this configuration, the lenses and sensors of each unit are similar and enable the interchangeability of parts. Other cameras containing two or more lenses have been described, such as in U.S. Patent Application Publication 2008/0218611, where a camera has two lenses and sensors and an improved image (with respect to sharpness, for example) is produced.
In another line of teaching, U.S. Pat. No. 6,476,865 describes an image sensing device containing both color and luminance photosites. The color photosites are covered with a transmissive color filter, such as red, green or blue, which permits light energy from only a certain range of the visible spectrum to pass. This arrangement has the advantage of improved dynamic range because the luminance photosites perform well in low light situations, while the color photosites, which accumulate fewer photons in the same light exposure than the luminance photosites, do not clip and perform well in situations with more abundant light. In U.S. Pat. No. 6,373,523, a single-lens CCD camera with two CCDs having mutually different color filter arrays is described. A prism beam splitter is used to split the image into different colors that are physically read by two different color sensor patterns.
Further, there exist in the art many methods for image colorization. Colorization refers to the process of adding chrominance values to grayscale images. Existing methods of color image enhancement have focused upon transferring the “color mood” from one image to another. In these cases, the actual contents of the image can vary greatly between the images, and the images are not simultaneously presented to a viewer. In U.S. Pat. No. 4,984,072, a method of color enhancing regions in images having similar desired hues is described, in which color lookup tables are used in order to convert gray-scale values into unique values of hue, luminance and saturation. This method yields a one-to-one mapping within a region for each gray-scale value as the color lookup table is predetermined by the mapping of a gray-scale value in a region to a hue, luminance and saturation value. The color lookup table is generated from a similar image, resulting in similar colors being applied to the grayscale image. However, it does not enforce any spatial correspondence between the two images, resulting in images with potentially different color values for the same pixel in both images if applied to a stereo pair.
In accordance with the present invention, there is provided an image capture device for producing an enhanced digital image of a scene comprising:
(a) a lens arrangement having a first lens associated with a first digital image sensor and a second lens associated with a second digital image sensor; the first digital image sensor having photosites of a first predetermined color pattern for producing a first digital image of the scene; the second digital image sensor having photosites of a different second predetermined color pattern for producing a second digital image of the scene;
(b) a device for causing the lens arrangement to capture a first digital image from the first digital image sensor and a second digital image from the second digital image sensor at substantially the same time;
(c) a processor for aligning the first and second digital images; and
(d) the processor using values of the second image based on the alignment between the first and second images to operate on the first digital image to produce the enhanced digital image having corrected color values.
An advantage of the present invention is that it provides an effective way for capturing multiple views of a scene with high dynamic range and low noise by using different predetermined color filter patterns.
For convenience of reference, it should be understood that the image or video 132, 142 refers to both still images and videos or collections of images. Further, the images or videos 132, 142 are images that are captured with image sensors 130, 140. The images or videos 132, 142 can also have an associated audio signal. The system of
In some embodiments, the image sensors 130, 140 can also capture and cause a video clip to be stored. The digital data is stored in a RAM buffer memory 322 and subsequently processed by a digital processor 12 controlled by the firmware stored in firmware memory 328, which is flash EPROM memory. The digital processor 12 includes a real-time clock 324, which keeps the date and time even when the system and digital processor 12 are in their low power state.
The digital processor 12 operates on or provides various image sizes selected by the user or by the system. Images are typically stored as rendered sRGB image data that is then JPEG compressed and stored as a JPEG image file in the memory. The JPEG image file will typically use the well-known EXIF (Exchangeable Image File Format) image format. This format includes an EXIF application segment that stores particular image metadata using various TIFF tags. Separate TIFF tags are used, for example, to store the date and time the picture was captured, the lens F/# and other camera settings for the image capture device 30, and to store image captions. In particular, the ImageDescription tag is used to store labels. The real-time clock 324 provides a capture date/time value, which is stored as date/time metadata in each EXIF image file. Videos are typically compressed with H.264 and encoded as MPEG4.
In some embodiments, the geographic location is stored with an image captured by the image sensors 130, 140 by using, for example, a GPS unit 329. The location of the image can be determined by any of a number of methods. For example, the geographic location is determined from the location of nearby cell phone towers or by receiving communications from the well-known Global Positioning Satellites (GPS). The location is preferably stored in units of latitude and longitude. Geographic location from the GPS unit 329 is used in some embodiments to determine regional preferences or behaviors of the display system.
The graphical user interface displayed on the display 90 is controlled by user controls 60. The user controls 60 can include dedicated push buttons (e.g. a telephone keypad) to dial a phone number, a control to set the mode, a joystick controller that includes 4-way control (up, down, left, and right) and a push-button center “OK” switch, or the like. The user controls 60 are used by a user to indicate user preferences 62 or to select the mode of operation or settings for the digital processor 12 and image sensors 130, 140.
The display system can in some embodiments access a wireless modem 350 and the internet 370 to access images for display. The display system is controlled with a general purpose computer 341. In some embodiments, the system accesses a mobile phone network 358 for permitting human communication via the system, or for permitting signals to travel to or from the display system. An audio codec 340 connected to the digital processor 12 receives an audio signal from a microphone 342 and provides an audio signal to a speaker 344. These components are used both for telephone conversations and to record and play back an audio track, along with a video sequence or still image. The speaker 344 can also be used to inform the user of an incoming phone call. This is done using a standard ring tone stored in firmware memory 328, or by using a custom ring-tone downloaded from the mobile phone network 358 and stored in the memory 322. In addition, a vibration device (not shown) is used to provide a quiet (e.g. non audible) notification of an incoming phone call.
The interface between the display system and the general purpose computer 341 is a wireless interface, such as the well-known Bluetooth® wireless interface or the well-known 802.11b wireless interface. The images or videos 132, 142 are received by the display system via an image player 375 such as a DVD player, a network, with a wired or wireless connection, via the mobile phone network 358, or via the internet 370. It should also be noted that the present invention is implemented in a combination of software and hardware and is not limited to devices that are physically connected or located within the same physical location. The digital processor 12 is coupled to a wireless modem 350, which enables the display system to transmit and receive information via an RF channel 250. The wireless modem 350 communicates over a radio frequency (e.g. wireless) link with the mobile phone network 358, such as a 3GSM network. The mobile phone network 358 can communicate with a photo service provider, which can store images. These images are accessed via the Internet 370 by other devices, including the general purpose computer 341. The mobile phone network 358 also connects to a standard telephone network (not shown) in order to provide normal telephone service.
Referring again to
Further, the image processor 70 is applied to the images or videos 132, 142 based on user preferences 62 to produce the enhanced image 69 that is shown on the display 90. The image processor 70 improves the quality of the original images or videos 132, 142 by, for example, removing the hand tremor from a video.
The inventive image capture device has associated with it two or more image sensors that capture images 132, 142 at substantially the same time. The image processor 70 combines those images 132, 142 to produce the enhanced image 69.
In one embodiment, the image sensors 130, 140 each contain a different predetermined color pattern. As is well known, image sensors contain photosites arranged on a regular grid. Typically, a photosite is covered with a filter such as a red filter, a green filter, a blue filter, or a yellow filter that permits transmittance of certain wavelengths of light to enter the photosite. Note that having a photosite with no filter permits it to be sensitive to all wavelengths of light and is called a “luminance” photosite. In some cases, a luminance photosite is covered with a filter to prevent infrared sensitivity while permitting the photosite to maintain sensitivity to the visible spectrum. To produce a full color image where each pixel location has associated with it information about the intensity of light for a set of color primaries of light (typically red, green and blue), an algorithm called demosaicing (or color filter array interpolation) is applied. The predetermined color pattern typically contains a repeating color unit that repeats over the image sensor. For example, the common Bayer Filter Array has a 2×2 color unit containing two green photosites, one red photosite, and one blue photosite. The color pattern of the image sensors 130, 140 is typically fixed at the time of manufacture, and does not change (and is therefore predetermined). The predetermined color pattern is represented by the repeating color unit and its positions within the image sensor such that this repeating color unit is used to tile in a non-overlapping fashion over the image sensor. The same repeating color unit placed in different positions within different image sensors can produce image sensors with different predetermined color patterns. Some image sensors 130, 140 have a small repeating color unit such as the 2×2 Bayer pattern and the 2×2 pattern (red green blue and luminance) of U.S. Pat. No. 6,476,865. Other predetermined color patterns, such as that described in U.S. Pat. No. 6,909,461, have a larger repeating color unit of 2×4 pixels or 4×4 pixels.
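The tiling of a repeating color unit described above can be sketched as follows. This is a minimal illustration, not the sensor layout of any cited patent: the function names, the offset parameters, and the layout of the red, green, blue and luminance unit are assumptions made for the example.

```python
def tile_color_pattern(unit, rows, cols, row_offset=0, col_offset=0):
    """Tile a repeating color unit over a rows x cols photosite grid.

    unit is a list of rows of filter labels. The offsets (hypothetical
    parameters) shift where the unit starts on the sensor.
    """
    unit_h, unit_w = len(unit), len(unit[0])
    return [
        [unit[(r + row_offset) % unit_h][(c + col_offset) % unit_w]
         for c in range(cols)]
        for r in range(rows)
    ]


def filter_set(pattern):
    """Return the sorted set of distinct color filters used by a pattern."""
    return sorted({label for row in pattern for label in row})


# 2x2 Bayer unit: two green, one red, and one blue photosite.
BAYER_UNIT = [["G", "R"],
              ["B", "G"]]
# Assumed layout of a 2x2 unit with a luminance ("L") photosite.
RGBL_UNIT = [["R", "G"],
             ["B", "L"]]

sensor_a = tile_color_pattern(BAYER_UNIT, 4, 4)
# Same repeating unit, placed one column over: a different color pattern.
sensor_b = tile_color_pattern(BAYER_UNIT, 4, 4, col_offset=1)
sensor_c = tile_color_pattern(RGBL_UNIT, 4, 4)
```

Here `sensor_a` and `sensor_b` share the same repeating color unit and the same set of color filters, yet differ at every photosite, which is one way two sensors can have different predetermined color patterns.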
In one embodiment, the enhanced image 69 is produced by combining information from two or more of the images 132, 142 captured by different image sensors 130, 140. In another embodiment, the enhanced image 69 is a full color image produced using information from two or more images 132, 142, wherein each of the images 132 and 142 is a single color image where each pixel location is associated with only a single value corresponding to the intensity of light for a certain spectral description (the value of which is related to the transmittance of the color filter array and other factors, such as the sensitivity of the photosite to different wavelengths of light).
Each of the image sensors 130 and 140 produces a single channel digital image (the image or video 132 and 142, respectively). In this scenario, it is important to notice that the image captured with the image sensor 130 has improved signal to noise ratio because each photosite is sensitive to all wavelengths of light. However, the image from image sensor 130 does not naturally contain color information. On the other hand, the image or video 142 from the image sensor 140 has inferior signal to noise ratio (because some quantity of the light energy never reached the sensitized portion of the photosites because of the color filters), but nevertheless, the image 142 does contain color information.
The image processor 70 inputs both images 132 and 142 and combines information from both images to produce an enhanced image 69. The method implemented by the image processor 70 to produce the enhanced image 69 is illustrated in
In step 101, the left image is received by the image processor 70, and in step 102, the right image is received by the image processor 70. In step 103, the image processor detects point features in the left image, and in step 104, the image processor detects point features in the right image. The point features, often called feature points, are distinctive patterns of lightness and darkness that are identified across views of an object. Preferably, the method of U.S. Pat. No. 6,711,293 is used to identify feature points called SIFT features, although other feature point detectors and feature point descriptions can be used. Next, in step 105, the features are matched across the images to establish a correspondence between feature point locations in the left image and the right image. This matching process is also described in U.S. Pat. No. 6,711,293. Next, in step 106, the image processor 70 identifies high confidence feature point matches. Step 106 is performed by, for example, removing feature point matches that are weak (where the SIFT descriptors between putative matches are less similar than a predetermined threshold), or by enforcing geometric consistency between the matching points, as, for example, is described in Josef Sivic, Andrew Zisserman: Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003: 1470-147. An illustration of the identified feature point matches is shown in
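Steps 105 and 106 can be illustrated with a small sketch. It assumes each feature point is already available as a position paired with a descriptor vector; the toy three-element descriptors, the Euclidean distance measure, the function names and the threshold value are all assumptions for illustration and are not the SIFT procedure of U.S. Pat. No. 6,711,293.

```python
import math


def descriptor_distance(d1, d2):
    """Euclidean distance between two descriptor vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


def match_features(left, right, max_distance):
    """Match each left feature to its most similar right feature (step 105),
    then keep only matches within max_distance (step 106, weak-match removal).

    left and right are lists of ((x, y), descriptor) tuples.
    Returns a list of ((x, y), (m, n)) position pairs.
    """
    matches = []
    for left_pos, left_desc in left:
        best_pos, best_desc = min(
            right, key=lambda feat: descriptor_distance(left_desc, feat[1]))
        if descriptor_distance(left_desc, best_desc) <= max_distance:
            matches.append((left_pos, best_pos))
    return matches


# Toy data: the third left feature has no convincing counterpart.
left_features = [((10, 20), [1.0, 0.0, 0.0]),
                 ((40, 25), [0.0, 1.0, 0.0]),
                 ((70, 60), [0.5, 0.5, 0.5])]
right_features = [((14, 20), [0.9, 0.1, 0.0]),
                  ((44, 25), [0.0, 1.0, 0.1]),
                  ((5, 5), [0.0, 0.0, 1.0])]

high_confidence = match_features(left_features, right_features, max_distance=0.3)
```

With this toy data, the first two left features are matched and the third is discarded as a weak match. Enforcing geometric consistency, the other option named for step 106, is not shown.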
Next, in step 107, the image processor 70 computes an alignment warping function that warps the positions of feature points from one image to be more similar to the corresponding positions of the matching feature points. Essentially, the alignment warping function is able to warp one image (e.g. the right image) in a manner so that objects in the warped version of that image are at roughly the same position as the corresponding objects in the other image (e.g. the left image). The alignment warping function is any of several functions. In one embodiment, the alignment warping function is a linear transformation of coordinate positions. In a general sense, the alignment warping function maps pixel locations from one image to pixel locations in a second image. In many cases an alignment warping function is invertible, so that the alignment warping function also (after inversion) maps pixel locations in the second image to pixel locations in the first image. The alignment warping function is any of several types of warping functions known in the art, such as: translational warping (2 parameters), affine warping (6 parameters), perspective warping (8 parameters), polynomial warping (the number of parameters depends on the polynomial degree) or warping over triangulations (variable number of parameters). In this step, an alignment of the first and second digital images is found.
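As an illustration of an invertible alignment warping function, the following sketch applies and inverts an affine warp with 6 parameters. The parameter ordering (a, b, c, d, e, f) and the function names are assumptions chosen for this example; a translational warp is the special case a = e = 1 and b = d = 0.

```python
def apply_affine(params, x, y):
    """Map (x, y) to (a*x + b*y + c, d*x + e*y + f)."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


def invert_affine(params):
    """Return the 6 parameters of the inverse affine warp.

    Raises ValueError when the linear part is singular, in which case the
    warp is not invertible.
    """
    a, b, c, d, e, f = params
    det = a * e - b * d
    if det == 0:
        raise ValueError("affine warp is not invertible")
    # Inverse of the 2x2 linear part.
    ia, ib = e / det, -b / det
    ic, ie = -d / det, a / det
    # Inverse translation is minus the inverse linear part applied to (c, f).
    return (ia, ib, -(ia * c + ib * f), ic, ie, -(ic * c + ie * f))


# Example warp: scale by 2, then shift by (4, -2).
warp = (2.0, 0.0, 4.0, 0.0, 2.0, -2.0)
m, n = apply_affine(warp, 3, 5)
x, y = apply_affine(invert_affine(warp), m, n)
```

Applying the warp and then its inverse returns the original pixel location, which is the property relied on when mapping locations in the second image back to the first.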
In equation form, let A be the alignment warping function. Then A(x,y)=(m,n) where (x,y) is a pixel location in the first image, and (m,n) is a pixel location in the second image. Then, (x,y)=A⁻¹(m,n). The alignment warping function typically has a number of free parameters, and values for these parameters are determined with well-known methods (such as least square methods) by using the set of high confidence feature matches from the first and the second images. Other alignment warping functions exist in algorithmic form to map a pixel location (x,y) in the first image to the second image, such as, find the nearest feature point in the first image that has a corresponding match in the second image. In the first image, this feature point has pixel location (Xi,Yi) and corresponds to the feature point in the second image with location (Mi, Ni). Then, the pixel at position (x,y) in the first image is determined to map to the position (x−Xi+Mi, y−Yi+Ni) in the second image.
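Both ways of obtaining A can be sketched briefly. The first function fits a translational warp (2 parameters) by least squares, for which the solution is simply the mean offset between matched points. The second implements the nearest-feature-point mapping (x−Xi+Mi, y−Yi+Ni). The function names and the sample matches are assumptions for illustration.

```python
def fit_translation(matches):
    """Least-squares translational warp from ((x, y), (m, n)) matches.

    For a pure translation the least-squares estimate is the mean of the
    per-match offsets.
    """
    count = len(matches)
    tx = sum(second[0] - first[0] for first, second in matches) / count
    ty = sum(second[1] - first[1] for first, second in matches) / count
    return tx, ty


def map_by_nearest_feature(x, y, matches):
    """Map (x, y) in the first image using the offset of the nearest matched
    feature point (Xi, Yi) -> (Mi, Ni)."""
    (xi, yi), (mi, ni) = min(
        matches,
        key=lambda pair: (pair[0][0] - x) ** 2 + (pair[0][1] - y) ** 2)
    return (x - xi + mi, y - yi + ni)


# Hypothetical high confidence matches between the first and second images.
matches = [((10, 20), (14, 21)),
           ((40, 25), (46, 25)),
           ((70, 60), (74, 59))]

tx, ty = fit_translation(matches)
mapped = map_by_nearest_feature(42, 30, matches)
```

The global fit gives one offset for the whole image, while the nearest-feature mapping lets the offset vary from region to region, which can matter when objects at different depths shift by different amounts between the two views.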
Once the alignment warping function A is determined, the image processor 70 performs step 111 to produce corrected color values, producing the enhanced image 69. The enhanced image 69 contains, at each pixel location, a value for each of a set of at least three color primaries (typically, a red, green and blue light intensity value for each pixel location (m,n)). Step 111 uses information from both the left and the right images to produce a multichannel image (the enhanced image 69) where each pixel location contains a value for a set of at least three color primaries. Each input image has only one channel of pixel values, and the pixel value at a given location corresponds to a particular color filter.
Step 111 proceeds by determining the missing color values at a pixel location in a first image by using pixel values from both the first image, and from regions of the second image that, when the alignment warping function A is applied, are spatially close to the pixel location in the first image. For example, consider
$$L_2(2,6) = \frac{g_2(2,5) + g_2(1,6) + g_2(1,6) + g_2(2,7)}{12} + \frac{r_2(1,5) + r_2(1,7) + r_2(3,5) + r_2(3,7)}{12} + \frac{b_2(2,6)}{3}$$
$$r_1(7,3) = L_1(7,3) + \frac{r_2(1,5) + r_2(1,7) + r_2(3,5) + r_2(3,7)}{4} - L_2(2,6)$$
$$g_1(7,3) = L_1(7,3) + \frac{g_2(2,5) + g_2(1,6) + g_2(1,6) + g_2(2,7)}{4} - L_2(2,6)$$
$$b_1(7,3) = L_1(7,3) + b_2(2,6) - L_2(2,6)$$
Similar equations are constructed to determine missing color values for other locations in the first image.
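The arithmetic of the equations above can be checked with a short sketch. It assumes that the luminance value of the first image and the neighboring red, green and blue values at the aligned location in the second image have already been gathered; the function names and the sample pixel values are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def correct_color(l1, green_neighbors, red_neighbors, blue):
    """Return (r1, g1, b1) for a pixel of the first image.

    l1 is the luminance value in the first image. The remaining arguments
    are values from the second image near the aligned location: four green
    neighbors, four red neighbors, and the blue value at that location.
    """
    g_avg = mean(green_neighbors)
    r_avg = mean(red_neighbors)
    # Estimated luminance of the second image: equal thirds of the green
    # average, the red average and the blue value, matching the /12 and /3
    # weights in the equations.
    l2 = (g_avg + r_avg + blue) / 3
    offset = l1 - l2
    return (r_avg + offset, g_avg + offset, blue + offset)


# Made-up sample values.
r1, g1, b1 = correct_color(
    l1=120,
    green_neighbors=[90, 94, 86, 90],
    red_neighbors=[60, 60, 64, 56],
    blue=30,
)
```

One consequence of this construction is that the mean of the three corrected values equals the luminance value from the first image, so the low-noise luminance measurement sets the brightness while the second image supplies the color differences.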
In another embodiment, the image processor 70 produces an enhanced image for each of the image sensors 130, 140 that are present on the image capture device. For example, if the image capture device contains a left image sensor 130 and a right image sensor 140 and captures a left image 132 and a right image 142, then the image processor 70 produces two enhanced images 112, 113 (corresponding to enhanced image 69 of
When the color filters on an image sensor include cyan, magenta, and yellow, they are generally referred to as secondary color filters in the known art. The image sensors 130 and 140 can have predetermined color patterns corresponding to primary and secondary color filters respectively, for example, one of them is primary colors and the other secondary colors. The collection of unique different color filters associated with a predetermined color pattern placed over an image sensor is the set of color filters associated with that image sensor, for example, the Bayer filter pattern's set of color filters is red, green, and blue. The image sensors 130 and 140 can have different sets of color filters corresponding to different color patterns. For example, in
The invention has been described in detail with particular reference to certain preferred embodiments thereof, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.
This application is a divisional of commonly-assigned U.S. patent application Ser. No. 12/913,819, filed Oct. 28, 2010, entitled “Camera With Sensors Having Different Color Patterns” by Andrew C. Gallagher et al, the disclosure of which is incorporated herein in its entirety. Reference is also made to commonly assigned U.S. patent application Ser. No. 12/913,828 filed Oct. 28, 2010, entitled “Combining Images Captured With Different Color Patterns” by Amit Singhal et al, the disclosure of which is incorporated herein.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 12913819 | Oct 2010 | US |
| Child | 13613103 | | US |