This application claims the benefit of priority to utility application Ser. No. 10/132,875, filed in the United States on Apr. 24, 2002, and titled “High-Performance Sensor Fusion Architecture.”
(1) Technical Field
The present invention relates to general techniques for computer vision and object classification. More specifically, the present invention relates to use of constrained disparity with stereoscopic images to generate three-dimensional shape estimates.
(2) Discussion
Matching features in images of a scene taken from two different viewpoints (stereoscopic images) is a major problem in the art of machine vision systems. A variety of constraint-based solutions have been proposed, and have met with varying degrees of success because there is no general solution to the problem and a set of constraints applied to one scene may not be appropriate for other scenes.
In particular, a three-dimensional scene is reduced to two-dimensional images when captured by an imaging device. Thus, the pictures each contain less information than the original scene. In the art of stereoscopic imaging, a pair of two-dimensional images, each taken from a different location, is used to approximate the information from the original three-dimensional scene. However, in attempting to reconstruct the original three-dimensional scene, another problem arises—that of identifying corresponding points in the pair of images. In other words, for any individual pixel in one image, there are many potential corresponding pixels in the other image. Thus, a major difficulty in the art is determining corresponding pixels in the images so that disparities between the images may be determined in order to approximate the original three-dimensional scene.
Accordingly, there exists a need in the art for a fast and reliable system for approximating a three-dimensional scene from stereoscopic images. The present invention provides such a system, using a texture filter to generate a disparity estimate and refining the disparity estimate iteratively using disparity constraints until a final estimate is achieved.
The features of the present invention may be combined in many ways to produce a great variety of specific embodiments, as will be appreciated by those skilled in the art. Furthermore, the means which comprise the apparatus are analogous to the means present in computer program product embodiments and to the acts in the method embodiment.
The present invention teaches a method, an apparatus, and a computer program product for three-dimensional shape estimation using constrained disparity propagation. The invention performs an operation of receiving a stereoscopic pair of images of an area occupied by at least one object. Next, an operation of detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter is performed. Then, an operation of generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images is executed. Next, an operation of using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints is performed. Subsequently, an operation of iteratively using the subsequent estimate as the initial estimate in the act of using the initial estimate to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities is performed. Finally, an operation of generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape is executed.
In a further aspect, the at least one object comprises a vehicle occupant and the area comprises a vehicle occupancy area, the method further comprising an act of processing the final estimate to provide signals to vehicle systems.
In a still further aspect, the invention performs an operation of capturing images from a sensor selected from a group consisting of CMOS vision sensors and CCD vision sensors.
In a yet further aspect, the signals comprise airbag enable and disable signals.
In another aspect, the act of extracting image features further comprises operations of processing the disparity map with at least one of the classification algorithms to produce object class confidence data.
In yet another aspect, the classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained Nonlinear Discriminant Analysis network, and a trained Fuzzy Aggregation Network.
In a further aspect, an operation of data fusion is performed on the object class confidence data to produce a detected object estimate.
It will be appreciated by one of skill in the art that the “operations” of the present invention just discussed have parallels in acts of a method, and in modules or means of an apparatus or computer program product and that various combinations of these features can be made without departing from the spirit and scope of the present invention.
The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the preferred embodiment of the invention in conjunction with reference to the following drawings, where:
The present invention relates to general techniques for computer vision and object classification. More specifically, the present invention relates to use of constrained disparity with stereoscopic images to generate three-dimensional shape estimates. The following description, taken in conjunction with the referenced drawings, is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. Furthermore, it should be noted that unless explicitly stated otherwise, the figures included herein are illustrated diagrammatically and without any specific scale, as they are provided as qualitative illustrations of the concept of the present invention.
In order to provide a working frame of reference, first a glossary of terms used in the description and claims is given as a central resource for the reader. Next, a discussion of various principal embodiments of the present invention is provided. Finally, a discussion is provided to give an understanding of the specific details.
Before describing the specific details of the present invention, a centralized location is provided in which various terms used herein and in the claims are defined. The glossary provided is intended to provide the reader with a general understanding of the intended meaning of the terms, but is not intended to convey the entire scope of each term. Rather, the glossary is intended to supplement the rest of the specification in more clearly explaining the terms used.
Means: The term “means” as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules. Non-limiting examples of “means” include computer program code (source or object code) and “hard-coded” electronics (i.e., computer operations coded into a computer chip). The “means” may be stored in the memory of a computer or on a computer readable medium.
Object: The term object as used herein is generally intended to indicate a physical object within a scene for which a three-dimensional estimate is desired.
Sensor: The term sensor as used herein generally includes any imaging sensor such as, but not limited to, optical sensors such as CCD cameras
The present invention has three principal embodiments. The first is a system for determining operator distraction, typically in the form of a computer system operating software or in the form of a “hard-coded” instruction set. This system may be incorporated into various devices, non-limiting examples of which include vehicular warning systems, three-dimensional modeling systems, and robotic vision systems incorporated in manufacturing plants. Information from the system may also be incorporated/fused with data from other sensors or systems to provide more robust information regarding the object observed. The second principal embodiment is a method, typically in the form of software, operated using a data processing system (computer). The third principal embodiment is a computer program product. The computer program product generally represents computer readable code stored on a computer readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape. Other, non-limiting examples of computer readable media include hard disks, read only memory (ROM), and flash-type memories. These embodiments will be described in more detail below.
A block diagram depicting the components of a computer system used in the present invention is provided in
An illustrative diagram of a computer program product embodying the present invention is depicted in
As shown in
Several choices are available for the selection of a texture filter 304 for recognizing regions of the image characterized by salient features, and the present invention may use any of them as suited for a particular embodiment. In an embodiment, a simple texture filter 304 was used for estimating the mean variance of the rows of a selected region of interest. This choice reflects the necessity of identifying those image blocks that present a large enough contrast along the direction of the disparity search. For a particular N×M region of the image I, where (x,y) are image coordinates and σ² denotes the variance, the following quantity:
is compared against a threshold defining the minimum variance considered sufficient to identify a salient image feature. Once the whole image has been filtered and the regions rich in texture have been identified, the disparity values of the selected regions are estimated by minimizing the following cost function in order to perform the matching between the left and right images (where d(opt) is the optimal value and d is an offset distance):
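The two steps just described, texture filtering followed by block matching, can be sketched as follows. This is only an illustrative sketch and not the cost function of the specification: the function names, the sum-of-absolute-differences mismatch measure, and the convention that a left-image block at column x corresponds to a right-image block at column x − d are all assumptions introduced here.

```python
import numpy as np

def row_variance_texture(block):
    """Mean of the per-row intensity variances of an N x M block.

    Variance is taken along each row, i.e. along x, the direction
    of the disparity search.
    """
    block = np.asarray(block, dtype=float)
    return float(block.var(axis=1).mean())

def is_textured(block, threshold):
    """True when the block has enough contrast along x to be matched."""
    return row_variance_texture(block) > threshold

def best_disparity(left, right, y, x, n, m, d_min, d_max):
    """Search offsets d in [d_min, d_max] for the lowest mismatch.

    The mismatch is a sum of absolute differences (an assumed measure)
    between the left block at (y, x) and the right block at (y, x - d).
    Returns (d_opt, cost), or (None, inf) if no offset fits in the image.
    """
    ref = np.asarray(left)[y:y + n, x:x + m].astype(float)
    right = np.asarray(right)
    best_d, best_cost = None, np.inf
    for d in range(d_min, d_max + 1):
        xs = x - d
        if xs < 0 or xs + m > right.shape[1]:
            continue  # candidate block falls outside the right image
        cand = right[y:y + n, xs:xs + m].astype(float)
        cost = float(np.abs(ref - cand).sum())
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Under these assumptions, a block would be passed to `best_disparity` only if `is_textured` accepts it; the remaining blocks are left for the propagation stage.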
During the disparity estimation act, a neighborhood density map is created. This structure consists of a matrix of the same size as the disparity map, whose entries specify the number of points in an 8-connected neighborhood where a disparity estimate is available. An example of such a structure is depicted in
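A neighborhood density map of the kind described above might be computed as in the following sketch. The function name and the boolean `valid` mask (marking grid points where a disparity estimate is available) are assumptions made for illustration; points outside the grid are treated as having no estimate.

```python
import numpy as np

def neighborhood_density(valid):
    """Count, for every grid point, the 8-connected neighbors
    that already have a disparity estimate.

    valid: 2-D boolean array, True where an estimate is available.
    Returns an integer array of the same shape as the disparity map.
    """
    valid = np.asarray(valid, dtype=bool)
    h, w = valid.shape
    # Zero padding: locations outside the grid count as "no estimate".
    padded = np.pad(valid.astype(int), 1)
    density = np.zeros((h, w), dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # the point itself is not its own neighbor
            density += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return density
```

Sorting the unestimated points by decreasing density is one simple way to start propagation from the denser neighborhoods.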
Once the initialization stage is completed, the disparity information available is propagated starting from the denser neighborhoods. Two types of constraints are enforced during the disparity propagation. The first type of constraint ensures that the order of appearance of a set of image features along the x direction is preserved. This condition, even though it is not always satisfied, is generally true in most situations where the camera's base distance is sufficiently small. An example of allowed and prohibited orders of appearance of image elements is depicted in
$d^{(i)}_{\min} = d^{(i-1)} - \varepsilon$ and (3)
$d^{(i)}_{\max} = d^{(i+1)} + \varepsilon$, where (4)
$\varepsilon = |x_i - x_{i-1}|$. (5)
This type of constraint is very useful for avoiding false matches of regions with similar features.
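As a small worked example of equations (3) through (5), the search interval for element i can be computed from its two neighbors along the x direction. The sketch below assumes that d(i−1) and d(i+1) denote the disparities already estimated for the preceding and following elements, and that x_i and x_{i−1} are their x coordinates; the function name is hypothetical.

```python
def ordering_bounds(d_prev, d_next, x_i, x_prev):
    """Disparity search interval for element i under the ordering constraint.

    d_prev, d_next: disparities of elements i-1 and i+1 (assumed meaning).
    x_i, x_prev:    x coordinates of elements i and i-1.
    Returns (d_min, d_max) per equations (3)-(5).
    """
    eps = abs(x_i - x_prev)   # equation (5)
    d_min = d_prev - eps      # equation (3)
    d_max = d_next + eps      # equation (4)
    return d_min, d_max
```

For instance, with neighboring disparities of 10 and 12 and elements located at x = 20 and x = 24, ε is 4 and the candidate disparities are restricted to the interval from 6 to 16.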
The local smoothness of the disparity map is enforced by the second type of propagation constraint. An example of a 3×3 neighborhood where the disparity of the central element has to be estimated is shown in
$d_{\min} = \min\{d \in N_{ij}\} - \eta$ and (6)
$d_{\max} = \max\{d \in N_{ij}\} + \eta$, where (7)
$N_{ij} = \{p_{m,n}\}$, $m = i-1, \ldots, i+1$, and $n = j-1, \ldots, j+1$. (8)
The concept is that very large local fluctuations of the disparity estimates are more often due to matching errors than to true sharp variations. As a consequence, enforcing a certain degree of smoothness in the disparity map greatly improves the signal-to-noise ratio of the estimates. In an embodiment, the parameter η is forced equal to zero, thus bounding the search interval of possible disparities between the minimum and maximum disparity currently measured in the neighborhood.
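The smoothness bounds of equations (6) through (8) can be illustrated with the short sketch below. It assumes a boolean `valid` mask marking which grid points currently hold an estimate, so that only measured disparities in the 3×3 neighborhood contribute; the function name, the mask, and the handling of image borders are assumptions, not details taken from the specification.

```python
import numpy as np

def smoothness_bounds(disparity, valid, i, j, eta=0):
    """Search interval for grid point (i, j) from its 3x3 neighborhood.

    disparity: 2-D array of current disparity estimates.
    valid:     2-D boolean array, True where an estimate is available.
    eta:       slack added to both ends of the interval (0 in the
               embodiment described above).
    Returns (d_min, d_max), or None when no neighbor has an estimate.
    """
    disparity = np.asarray(disparity)
    valid = np.asarray(valid, dtype=bool)
    # Clip the 3x3 window at the borders of the grid.
    i0, i1 = max(i - 1, 0), min(i + 2, disparity.shape[0])
    j0, j1 = max(j - 1, 0), min(j + 2, disparity.shape[1])
    known = disparity[i0:i1, j0:j1][valid[i0:i1, j0:j1]]
    if known.size == 0:
        return None
    return known.min() - eta, known.max() + eta
```

With η equal to zero, the returned interval is exactly the range between the smallest and largest disparity currently measured around the point.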
Additional constraints on the disparity value propagation, based on the local statistics of the grayscale image, are also enforced. This feature attempts to reduce the number of artifacts due to poor illumination conditions and poorly textured areas of the image, and addresses the issue of propagation of disparity values across object boundaries. In an effort to reduce the artifacts across the boundaries between highly textured objects and poorly textured objects, some local statistics of the regions of interest used to perform the disparity estimation are computed. This is done for the entire frame, during the initialization stage of the algorithm. The iterative propagation technique takes advantage of the computed statistics to enforce an additional constraint on the estimation process. Applying the algorithm to several sample images produced a net improvement in the disparity map quality in the proximity of object boundaries and a sharp reduction in the number of artifacts present in the disparity map.
Because the disparity estimation is carried out in an iterative fashion, the mismatch value for a particular image block and a particular disparity value usually needs to be evaluated several times. Brute-force recomputation of this cost function every time its evaluation is required is computationally inefficient. For this reason, an ad-hoc caching technique is preferred in order to greatly reduce the system response time and provide a considerable increase in the speed of the estimation process. The quantity that is stored in the cache is the mismatch measure for a given disparity value at a particular point of the disparity grid. In a series of simulations, the cache hit rate averaged over 80%, demonstrating the usefulness of the technique.
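One simple way to realize such a cache is a dictionary keyed by grid point and disparity value, as in the sketch below. The class name, its interface, and the hit-rate bookkeeping are assumptions made for illustration; `cost_fn` stands for whatever mismatch measure the embodiment uses.

```python
class MismatchCache:
    """Memoizes the mismatch measure per (row, col, disparity) key."""

    def __init__(self, cost_fn):
        self._cost_fn = cost_fn   # callable(row, col, d) -> mismatch value
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, row, col, d):
        """Return the cached mismatch, computing it only on the first request."""
        key = (row, col, d)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._cost_fn(row, col, d)
        return self._store[key]

    @property
    def hit_rate(self):
        """Fraction of requests served from the cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Each propagation iteration would then call `get` instead of evaluating the cost function directly, so repeated requests for the same grid point and disparity cost only a dictionary lookup.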
The last component of the Disparity Map module 300 is an automatic vertical calibration subroutine (not shown in the figure). This functionality is particularly useful for compensating for hardware calibration tolerances. While an undetected horizontal offset between the two cameras usually causes only limited errors in the disparity evaluation, the presence of even a small vertical offset can be catastrophic. The rapid performance degradation of the matching algorithm when such an offset is present is a very well-known problem that affects all stereo camera-based ranging systems.
The fully automated vertical calibration subroutine is based on the principle that the number of correctly matched image features during the initialization stage is maximized when there is no vertical offset between the left and right images. The algorithm is run during system initialization and periodically thereafter to check the consistency of the estimate.
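The calibration principle can be sketched as a search over candidate vertical shifts that keeps the shift yielding the most matches. The function names, the search range `max_offset`, and the `count_matches` callback (standing in for the number of correctly matched features produced by the initialization stage) are hypothetical and introduced only for this example.

```python
import numpy as np

def shift_rows(image, dy):
    """Shift image content down by dy rows (up if dy < 0), zero filling."""
    image = np.asarray(image)
    out = np.zeros_like(image)
    h = image.shape[0]
    if dy > 0:
        out[dy:] = image[:h - dy]
    elif dy < 0:
        out[:h + dy] = image[-dy:]
    else:
        out[:] = image
    return out

def estimate_vertical_offset(left, right, count_matches, max_offset=3):
    """Return the row shift of the right image that maximizes the match count.

    count_matches: callable(left, shifted_right) -> number of matched
                   features (hypothetical stand-in for the matcher).
    """
    best_dy, best_count = 0, -1
    for dy in range(-max_offset, max_offset + 1):
        count = count_matches(left, shift_rows(right, dy))
        if count > best_count:
            best_dy, best_count = dy, count
    return best_dy
```

The returned shift would then be applied to the right image before disparity estimation, and the search repeated periodically to confirm that the estimate remains consistent.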
(b) System Performance
An example of a stereo image pair is shown in