The method of the present invention is described herein in the context of a vision-based vehicle occupant recognition system, but it should be recognized that the method is equally applicable to other object recognition systems, whether vehicular or non-vehicular. Referring to
The cabin 12 is equipped with a monocular occupant recognition system including the active light source 14, a digital camera (DC) 20 and a digital signal processor (DSP) 22. Active and ambient light reflected from seat 10 and any occupant thereof is detected and imaged by camera 20, which typically includes an imaging lens 20a and a solid-state imaging chip 20b. The imaging chip 20b is a multi-pixel array that is responsive to the impinging light content, and creates a corresponding digital image. The DSP 22 processes images produced by imaging chip 20b, and typically functions to locate objects of interest in the image, such as human occupants or infant car seats. For example, DSP 22 can be programmed to recognize the presence of a human occupant, to classify the occupant, and to determine the position of a recognized occupant relative to an air bag deployment zone.
In general, the present invention is directed to a processing method carried out by DSP 22 for recognizing an imaged object based on its contour—in other words, its silhouette outline. In the realm of human subjects, well-known profiles (i.e., contours) include those of Alfred Hitchcock or John F. Kennedy, for example. Other less famous individuals, and non-human objects as well, are routinely perceived by their contours. According to the invention, the contours of the imaged object are first identified and then characterized for comparison with a library of objects that have been similarly characterized. If a match of sufficiently high confidence is not found, the image is distorted to simulate an incrementally different perspective of the imaged object, and the process of contour identification, characterization and comparison is repeated until a match of sufficiently high confidence is found. The cycle of image distortions allows two-dimensional images obtained from the monocular vision system of
The flow diagram of
Referring to
Returning to
The left assessment path characterizes the enumerated contours using a wavelet transformation. The block 62 computes wavelet coefficients (using a Haar wavelet transform, for example) that characterize the relative proportions of curvature along the enumerated contours, and block 64 compares the wavelet coefficient vectors to a library of vectors accumulated in offline training based on pre-defined contours. Horizontal, vertical or diagonal wavelets may be used, with either normal or over-complete spatial distribution. The calculated wavelet coefficient vectors can be compared to the library vectors using a dot-product calculation or some other measure of separation distance. The coefficient vectors for each contour will match the library vectors to varying degrees, and block 66 stores the highest-ranking matches along with the corresponding library object. In general, the rankings indicate the likelihood of a subset match (for example, JFK-forehead or Hitchcock-jowl), but do not provide a sufficient basis to reliably discriminate a complete object.
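To illustrate the style of characterization performed in the left path, the sketch below computes one level of Haar wavelet coefficients over a sampled curvature signal and ranks it against library vectors by a normalized dot product. The function names, the single-level transform depth, and the use of cosine similarity are illustrative assumptions; the text leaves the wavelet basis, decomposition depth, and distance measure as design choices.

```python
import math

def haar_coeffs(signal):
    """One level of the 1-D Haar transform: averages and differences of
    adjacent samples (signal length assumed even). Illustrative only."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    dif = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg + dif

def cosine_similarity(u, v):
    """Normalized dot product, one possible 'measure of separation distance'."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_against_library(coeffs, library):
    """Score a coefficient vector against every library vector and return
    (label, similarity) pairs sorted best-first, as block 66 would store."""
    scores = [(label, cosine_similarity(coeffs, vec))
              for label, vec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A library accumulated offline would simply map object labels (e.g., a hypothetical "JFK-forehead" entry) to coefficient vectors computed the same way.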
The right assessment path characterizes the enumerated contours by slope sequence. First, the block 68 identifies a series of points along each enumerated contour, and then computes the slopes of lines connecting successive points. This sequence of numerical slope values characterizes the progression of angle changes along the contour. The block 70 then evaluates the slope sequences relative to a library of sequences accumulated in offline training based on pre-defined contours. Preferably, this is achieved by using Hidden Markov Models (HMM) to evaluate both the real-time and offline slope sequences. The result of the HMM sequencing will be a list of candidate features for each of the enumerated contours. Block 72 identifies the candidate features that are common to two or more of the enumerated contours, and computes the distance and angle between them to determine the degree to which their spatial arrangement corresponds to a predefined object or contour. The computed distance and angle essentially represent a confidence metric, which is used to rank the identified candidate features. The overall ranking of block 74 is determined by comparing the rankings of the left and right paths, and using the wavelet-based ranking to boost the HMM ranking of features that are highly ranked by both paths. For example, if candidate features A and B are highly ranked based on the right assessment path, and the left assessment path identified candidate feature B as a close match, block 74 would increase the ranking metric of feature B. Block 74 then evaluates the radius and angle between the centroids of highly ranked features and matches them using a pattern matching technique such as a neural network or support vector machine to create a meta-ranking of the candidate features.
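The slope-sequence computation of block 68 can be sketched as follows. The quantization step is an added assumption: a discrete-observation HMM, as commonly used for sequence scoring, needs the real-valued slopes mapped to a finite symbol alphabet, but the text does not specify how (or whether) the slopes are discretized.

```python
def slope_sequence(points):
    """Slopes of the line segments joining successive sample points
    along a contour (block 68). Vertical segments map to infinity."""
    slopes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        slopes.append((y1 - y0) / (x1 - x0) if x1 != x0 else float("inf"))
    return slopes

def quantize(slopes, bins=(-2.0, -0.5, 0.5, 2.0)):
    """Map each slope to a discrete symbol so the sequence can be scored
    by a discrete-observation HMM. Bin edges are illustrative."""
    return [sum(1 for edge in bins if s > edge) for s in slopes]
```

In use, each enumerated contour would yield one symbol sequence, and block 70 would score that sequence against per-feature HMMs trained offline to produce the list of candidate features.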
After the candidate features have been ranked, the blocks 76 and 78 are executed to determine if an object classification has been achieved. This is done by combining the confidence metrics of the final candidates of block 74, and comparing the combined confidence to a threshold MATCH_THR such as 90%. The threshold MATCH_THR may be a fixed calibrated threshold as indicated or may be subject to variation, by an adaptive function for example. In any event, if the combined confidence metric is sufficiently high, the blocks 80, 82 and 84 are executed to reset a distortion grid index (DGI) to zero, to set MATCH_FOUND to True, and to output the object classification.
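The text says the candidate confidence metrics are "combined" without naming the rule, so the sketch below uses a noisy-OR combination (the probability that at least one candidate is correct, treating the metrics as independent) purely as one plausible choice. Only MATCH_THR and its 90% example value come from the text; everything else is illustrative.

```python
MATCH_THR = 0.90  # example calibrated threshold from the text

def combine_confidences(metrics):
    """Noisy-OR combination of per-candidate confidence metrics.
    The combining rule is an assumption; the text only says the
    metrics are combined and compared to MATCH_THR."""
    p_none = 1.0
    for m in metrics:
        p_none *= 1.0 - m
    return 1.0 - p_none

def match_found(metrics, thr=MATCH_THR):
    """True when the combined confidence reaches the threshold
    (the test performed at blocks 76 and 78)."""
    return combine_confidences(metrics) >= thr
```

Under this rule, two candidates at 0.8 and 0.7 combine to 0.94 and clear the 90% threshold, while two at 0.5 combine to only 0.75 and trigger the distortion-grid path instead.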
If the combined confidence metric determined at block 76 is insufficient to reliably identify an object, the blocks 86, 88 and 90 are executed to warp the image data using a distortion grid, and blocks 30-34 and 60-78 are re-executed to check for a match. Warping the image with a distortion grid effectively changes the perspective of the imaged object (the seat occupant, for example), possibly offering a closer match with the library patterns. Several different kinds of distortion grids can be used to produce different effects.
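Since the text leaves the grid geometry open ("several different kinds of distortion grids can be used"), the following toy sketch shows just one effect a grid-driven warp might produce: a row-dependent circular shear whose magnitude grows with the distortion grid index (DGI), so each increment simulates a slightly different viewing angle. The image representation (a list of pixel rows) and the shear rule are illustrative assumptions.

```python
def warp_with_grid(image, dgi):
    """Shear an image (list of equal-length pixel rows) by an amount
    that grows with the distortion grid index. dgi == 0 leaves the
    image unchanged, matching the reset performed at block 80."""
    h, w = len(image), len(image[0])
    out = []
    for r, row in enumerate(image):
        # Rows nearer the bottom shift further, crudely mimicking
        # a change in viewing perspective.
        shift = (r * dgi) // max(h - 1, 1)
        out.append([row[(c - shift) % w] for c in range(w)])
    return out
```

After each warp, blocks 30-34 and 60-78 would re-run on the distorted image, and the DGI would step through the available grids until a match is found or the grids are exhausted.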
In summary, the present invention provides an improved method of recognizing an imaged object based upon its contours. The contour characterization approximates the human perception of objects by their outlines, and the process of successively warping the image with different distortion grids allows two-dimensional images obtained from a monocular vision system to be analyzed for three-dimensional motion. The method can be used to recognize a specific object (a specific person, for example) or a certain class of objects (missiles and aircraft, for example). While the invention has been described in reference to the illustrated embodiment, it should be understood that various modifications in addition to those mentioned above would occur to persons skilled in the art. Accordingly, it is intended that the invention not be limited to the disclosed embodiment, but that it have the full scope permitted by the language of the following claims.