The present invention relates to a method and apparatus for transforming a distorted wide angle field-of-view image into a non-distorted, normal perspective image at any orientation, rotation, and magnification within the field-of-view, which is electronically equivalent to a mechanical pan, tilt, zoom, and rotation camera viewing system.
Camera viewing systems are utilized for a large variety of different purposes, including surveillance, inspection, security and remote sensing as well as mainstream applications such as consumer digital imaging and real time video conferencing. The majority of these systems use either a fixed-mount camera with a limited viewing field, or they utilize mechanical pan-and-tilt platforms and mechanized zoom lenses to orient the camera and magnify its image. While a mechanical solution may often be satisfactory when multiple camera orientations and different degrees of image magnification are required, the mechanical platform can be cumbersome, relatively unreliable because of the many moving parts it requires, and it can occupy a significant volume, making such a viewing system difficult to conceal or use in close quarters. As a result, several stationary cameras are often used to provide wide-angle viewing of a workspace.
More recently, camera viewing systems have been developed that perform the electronic equivalent of mechanical pan, tilt, zoom, and rotation functions without the need for moving mechanisms. One method of capturing a video image that can be electronically processed in this manner uses a wide-angle lens such as a fisheye lens. Fisheye lenses permit a large sector of the surrounding space to be imaged all at one time, but they produce a non-linear distorted image as a result. While ordinary rectilinear lenses map incoming light rays to a planar photosensitive surface, fisheye lenses map them to a spherical surface, which is capable of a much wider field of view. In fact, fisheye lenses may even encompass a field of view of 180°. By capturing a larger section of the surrounding space, a fisheye lens camera affords a wider horizontal and vertical viewing angle, provided that the distorted images on the spherical surface can be corrected and transformed in real time.
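To make the contrast concrete, the following sketch (in Python, for purposes of illustration only) compares the image-plane radius of an ideal rectilinear mapping with the equidistant model r = f·θ, one common idealization of a fisheye mapping. Actual fisheye lenses follow a variety of mapping functions, so these formulas are illustrative assumptions rather than a description of any particular lens.

    import math

    def rectilinear_radius(theta_rad, focal_len):
        """Image-plane radius of a ray at angle theta for an ideal rectilinear lens."""
        return focal_len * math.tan(theta_rad)   # diverges as theta approaches 90 degrees

    def equidistant_fisheye_radius(theta_rad, focal_len):
        """Image-plane radius under the common equidistant fisheye model (r = f * theta)."""
        return focal_len * theta_rad             # remains finite even at theta = 90 degrees

    # For a ray 80 degrees off-axis, the rectilinear radius is already about 5.7x
    # the focal length, while the equidistant fisheye radius is about 1.4x, which
    # is why a fisheye lens can place a 180-degree field of view on a finite sensor.
    f = 1.0
    print(rectilinear_radius(math.radians(80), f))          # ~5.67
    print(equidistant_fisheye_radius(math.radians(80), f))  # ~1.40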
The process of transforming distorted images to accurate perspective images is referred to as “dewarping.” Dewarping the image restores the captured scene to proper perspective based upon the orientation of the perspective view. A Digital Pan Tilt Zoom (DPTZ) processor is generally employed to perform the dewarping process. Unfortunately, dewarping can be a computationally intensive process that requires significant processing resources, including a processor having a high data bandwidth and access to a large amount of memory.
In accordance with one aspect of the invention, a method is provided for rendering an image. The method includes capturing a distorted input image using a color filter array to obtain an input image pattern having a single color channel per pixel. The input image is transformed to an input image signal. At least a portion of the input image signal is dewarped to obtain an undistorted image signal by (i) identifying selected coordinate points in the input signal that correspond to coordinate points in the undistorted image signal and (ii) determining a first color channel value for at least one of the selected coordinate points with a color correlation-adjusted interpolation technique using at least one nearest neighbor pixel having a color channel different from the first color channel.
In accordance with another aspect of the invention, an imaging system provides an undistorted view of a selected portion of a lens-distorted optical image. The imaging system includes a lens for obtaining a lens-distorted input optical image and a digital image capture unit for capturing the input optical image to obtain an input image pattern having a single color channel per pixel. The imaging system also includes a processor transforming a selected portion of the input image pattern to produce an undistorted output image. The processor is configured to perform the transformation by dewarping the input image pattern in Bayer space using color correlation-adjusted linear interpolation.
As detailed below, a wide-angle camera viewing system is provided that produces the equivalent of pan, tilt, and zoom functions by efficiently performing real-time distortion correction processes that can be implemented on an embedded processor, ASIC or FPGA.
The principles of image transform described herein can be understood by reference to the illustrative camera viewing system 10 of
Camera 12 includes a photosensor pixel array such as a CCD or CMOS array, for example. A color filter array (CFA), or color filter mosaic (CFM), is arranged over the pixel array to capture color information. Such color filters are needed because typical photosensors detect light intensity with little or no wavelength specificity and therefore cannot separate color information.
One example of a CFA is a Bayer filter, which gives information about the intensity of light in the red, green, and blue (RGB) wavelength regions. When a Bayer pattern is used, every other pixel collects green light information (“green pixels”), while the remaining pixels in alternating rows of the sensor collect red light information (“red pixels”) and blue light information (“blue pixels”), respectively.
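For reference, the pixel labels used later in this description (for example, G44, B45, R54, G55) are consistent with the simple indexing rule sketched below. This is only one possible Bayer arrangement; the assignment of colors to rows and columns varies from sensor to sensor, and the function name and 1-based indexing are illustrative assumptions.

    def bayer_channel(row, col):
        """Return the color channel captured at (row, col) for the Bayer layout
        assumed in this description (1-based indices, e.g. B45, R54, G55)."""
        if row % 2 == 0:                 # even rows alternate G, B
            return 'G' if col % 2 == 0 else 'B'
        else:                            # odd rows alternate R, G (R on even columns)
            return 'R' if col % 2 == 0 else 'G'

    # Sanity checks against the pixel labels used below:
    assert bayer_channel(4, 5) == 'B'    # B45
    assert bayer_channel(5, 4) == 'R'    # R54
    assert bayer_channel(5, 5) == 'G'    # G55
    assert bayer_channel(1, 1) == 'G'    # G11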
It should be noted that instead of a Bayer filter, other types of color filter arrays may be employed. Illustrative examples of such filters include an RGBE filter, a CYYM filter, a CYGM filter, an RGBW filter and the like. For purposes of illustration, however, the following discussion will primarily be presented in terms of a Bayer filter.
As noted above, due to the sampling by the color filter array, each pixel of an image represented in Bayer space is missing the color values for two of the three color channels. The process of restoring these missing color values is called demosaicing. Many different demosaicing algorithms exist; in general, they estimate the missing color information for each pixel position by interpolating the known color information collected by adjacent pixels across the different color planes.
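As a minimal illustration of such an algorithm, the following sketch performs naive bilinear demosaicing: each pixel keeps its own measured sample and fills in its two missing channels with the average of the neighboring pixels that sampled them. It is not the specific demosaicing method used by any particular encoder or processor discussed herein.

    import numpy as np

    def bilinear_demosaic(bayer, channel_of):
        """Naive bilinear demosaicing sketch (illustrative only).

        bayer      : 2-D float array with one color sample per pixel.
        channel_of : function (row, col) -> 'R', 'G' or 'B' describing the CFA
                     layout, using the same 0-based indices as the array.
        """
        h, w = bayer.shape
        idx = {'R': 0, 'G': 1, 'B': 2}
        rgb = np.zeros((h, w, 3), dtype=np.float64)
        for r in range(h):
            for c in range(w):
                sums, counts = np.zeros(3), np.zeros(3)
                # Gather the samples available in the 3x3 neighborhood, per channel.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            ch = idx[channel_of(rr, cc)]
                            sums[ch] += bayer[rr, cc]
                            counts[ch] += 1
                own = idx[channel_of(r, c)]
                for ch in range(3):
                    rgb[r, c, ch] = (bayer[r, c] if ch == own
                                     else sums[ch] / max(counts[ch], 1))
        return rgb

With the layout sketch given earlier, the channel_of argument could be supplied as lambda r, c: bayer_channel(r + 1, c + 1) to translate 0-based array indices into the 1-based pixel labels.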
As noted above, the DPTZ processor 15 shown in
The transform between the desired output image and the captured input image can be modeled by first considering a standard pinhole camera. As illustrated in
The DPTZ processor 15 is used to construct the output image on the virtual image plane from the input image that is received on the image sensor plane. To do this, the virtual image plane is segmented into sample points, which are mapped back onto the image sensor plane. The process of mapping (x,y) sample points in the virtual image plane back onto the image sensor (u,v) coordinates is called "inverse mapping." That is, the inverse mapping process maps the (x,y) output image coordinates in the virtual image plane onto the (u,v) input image coordinates in the image sensor plane. Various algorithms are well known for performing the inverse mapping process.
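One minimal inverse-mapping sketch is given below. It assumes a pinhole virtual camera with focal length f_virtual, a 3x3 rotation matrix view_rot encoding the pan, tilt, and rotation of the desired view, and the equidistant fisheye model r = f·θ; these are illustrative assumptions, and any of the well-known inverse mapping algorithms may be substituted.

    import numpy as np

    def inverse_map(x, y, view_rot, f_virtual, f_fisheye, cx, cy):
        """Map an output-plane sample point (x, y) back to fisheye sensor
        coordinates (u, v).  Illustrative only."""
        # Ray through the virtual image plane, rotated into the camera frame.
        ray = view_rot @ np.array([x, y, f_virtual], dtype=np.float64)
        ray /= np.linalg.norm(ray)

        theta = np.arccos(np.clip(ray[2], -1.0, 1.0))   # angle from the optical axis
        phi = np.arctan2(ray[1], ray[0])                # azimuth around the axis

        r = f_fisheye * theta                           # equidistant fisheye projection
        u = cx + r * np.cos(phi)
        v = cy + r * np.sin(phi)
        return u, v                                     # generally non-integer coordinates

Because the resulting (u,v) coordinates generally fall between pixel centers, the input image must be interpolated at those locations, which is the role of the function f in the equations that follow.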
Conventional dewarping or inverse mapping processes are generally performed in full color space. That is, the inverse mapping is performed after demosaicing has been performed to reconstruct an image that includes three color channels for each pixel. One problem that arises when dewarping or inverse mapping is performed on a demosaiced image is that the DPTZ processor 15 needs to process all three color channels, which requires the processor to have a high data bandwidth and large memory storage.
In order to reduce the computational burden placed on the DPTZ processor 15, the camera viewing system 10 of
This reduction in image quality can be explained with reference to
Ixy=Iuv=f(Iw22, Iw23, Iw32, Iw33) (1)
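The function f in equation (1) is typically a weighted combination of the four pixels surrounding the non-integer point (u,v). A minimal bilinear version is sketched below; the row/column convention and the absence of boundary handling are simplifying assumptions.

    import numpy as np

    def bilinear_sample(img, u, v):
        """Evaluate I(u, v) at a non-integer location by bilinear interpolation of
        the four surrounding pixels, as in equation (1).  img is indexed
        [row, col], with v taken as the row coordinate and u as the column
        coordinate (an assumed convention); no boundary handling for brevity."""
        r0, c0 = int(np.floor(v)), int(np.floor(u))
        dr, dc = v - r0, u - c0
        top = (1 - dc) * img[r0, c0] + dc * img[r0, c0 + 1]
        bot = (1 - dc) * img[r0 + 1, c0] + dc * img[r0 + 1, c0 + 1]
        return (1 - dr) * top + dr * bot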
As previously mentioned, this dewarping process illustrated in
Gxy=Guv=f(G11, G13, G31, G33) (2)
Similar equations can be written for other pixels in the perspective image, such as shown in
Clearly, adjacent pixels of the same color channel are more widely spaced from one another in
Thus, in summary, dewarping an image pattern in Bayer space is computationally less complex than dewarping a full color image pattern, but at the expense of image quality.
As detailed below, the advantages of dewarping a Bayer image pattern can be maintained while achieving a higher image quality by using inter-color correlations between all adjacent pixels (even those pixels that differ in color) when performing interpolation during the dewarping process. In other words, within a small neighborhood on an image, it can be assumed that there is a correlation between the different color channels. For instance, in one color model the ratio between luminance and chrominance at the same position is assumed to be constant within the neighborhood.
G1=Guv=f(G44, G45, G54, G55) (3)
In contrast to equation 2, not all of the values G44, G45, G54 and G55 in equation 3 are known. Specifically, G45 and G54 are unknown; at those pixel positions only the values B45 and R54, respectively, are known. That is, for these two pixels the only color channel information available is different from the color channel information that is needed. Accordingly, it is necessary to estimate the values of G45 and G54. This can be accomplished in a number of different ways, one of which will be presented herein. The illustrated technique examines a window in the neighborhood of each of the pixels G45 and G54. For example, in
The estimation of G45 and G54 within their respective windows, which are needed to interpolate perspective image points (e.g., G1 in
An example of the edge sensing algorithm is illustrated in
In other words, if the difference between B25 and B65 is smaller than the difference between B43 and B47, then the inter-color correlation is assumed to be stronger in the vertical direction than in the horizontal direction. As a consequence, G45 is calculated as the average of its vertical nearest neighbors G35 and G55. On the other hand, if the difference between B43 and B47 is smaller than the difference between B25 and B65, then the inter-color correlation is assumed to be stronger in the horizontal direction, in which case G45 is calculated as the average of the horizontal neighbors G44 and G46. Thus, the pixels used to estimate G45 are selected based on the inter-color correlation strength of its nearest neighbors in different directions. The selected pixels are those distributed in the direction with the stronger inter-color correlation. In window 520 of
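A minimal sketch of this edge-sensing rule for the missing green value at a blue pixel follows. The comments map the array indices onto the B45 example above; practical implementations typically add tie-breaking and image-border handling.

    def estimate_green_at_blue(img, r, c):
        """Edge-sensing estimate of the missing green value at a blue pixel (r, c).

        Compares the blue gradient in the vertical direction with the blue
        gradient in the horizontal direction, and averages the green neighbors
        lying in the direction of the smaller gradient (stronger correlation)."""
        vert_diff = abs(img[r - 2][c] - img[r + 2][c])    # |B25 - B65| for B45
        horiz_diff = abs(img[r][c - 2] - img[r][c + 2])   # |B43 - B47| for B45
        if vert_diff < horiz_diff:
            return (img[r - 1][c] + img[r + 1][c]) / 2.0  # average of G35 and G55
        else:
            return (img[r][c - 1] + img[r][c + 1]) / 2.0  # average of G44 and G46

The analogous comparison at a red pixel such as R54 uses the red samples two pixels away in each direction in place of the blue samples.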
Returning to
Once the green values of the pixels in the designated window (e.g., window 510) are known, other pixel values in the perspective image may be determined from the wide angle image in a similar manner. For instance, as shown in
R2=f(R45, R46, R55, R56) (6)
Since the values of R45, R46 and R55 are unknown, they may be estimated using a color correlation-adjusted linear interpolation technique, such as the edge sensing algorithm, to calculate the missing red channel value at the blue pixel (B45) and at the green pixels (G46 and G55). An illustrative calculation of the red component at pixels B45, G46 and G55 is shown below, based on a popular color correlation model in which the difference between color channels is assumed to be constant within a local window:
R45=G45−¼*((G34−R34)+(G36−R36)+(G54−R54)+(G56−R56))
R46=G46−½*((G36−R36)+(G56−R56))
R55=G55−½*((G54−R54)+(G56−R56)) (7)
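Equation (7) can be read almost directly as code. The sketch below assumes that the green values within the window have already been filled in by the edge-sensing step, and applies the constant color-difference model; the ¼ factor averages the color differences at the four red neighbors of B45, while the ½ factors average the two red neighbors of G46 and G55.

    def reconstruct_red_in_window(G, R):
        """Apply the constant color-difference model of equation (7).

        G and R are dictionaries keyed by the 1-based pixel labels used in the
        text (e.g. G[(4, 5)] is the green value estimated at B45, R[(5, 4)] is
        the red sample R54).  Returns the estimated red values at positions
        45, 46 and 55."""
        diff = lambda p: G[p] - R[p]      # G - R at a pixel where R is known

        R45 = G[(4, 5)] - 0.25 * (diff((3, 4)) + diff((3, 6)) +
                                  diff((5, 4)) + diff((5, 6)))
        R46 = G[(4, 6)] - 0.5 * (diff((3, 6)) + diff((5, 6)))
        R55 = G[(5, 5)] - 0.5 * (diff((5, 4)) + diff((5, 6)))
        return R45, R46, R55

The output value R2 of equation (6) then follows by interpolating the estimates R45, R46 and R55 together with the directly measured sample R56.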
Once again the edge sensing algorithm for red values illustrated above in connection with
The processes described above, including but not limited to those presented in connection with
An imaging system has been described that can efficiently produce the equivalent of pan, tilt, and zoom functions by performing real-time distortion correction on a lens-distorted image. This result is achieved by leveraging, during the dewarping process, color-correlations that exist among neighboring pixels. Among its other advantages, some of which have been noted above, the imaging system can avoid the need for a separate image signal processor that is often otherwise needed to perform the demosaicing process prior to the dewarping process. The extra processor can be eliminated because commercially available encoders that are typically used to compress the image after dewarping may in some cases also be used in the present arrangement to perform the demosaicing process.