The present invention is directed to a method for classifying an object using a stereo camera.
Classification of an object using a stereo camera, in which classification is performed based on head size and/or head shape, is known from German Published Patent Application No. 199 32 520.
By contrast, the method according to the present invention for classifying an object using a stereo camera has the advantage over the related art that model-based classification is now performed based on table-stored pixel coordinates of the stereo camera's left and right video sensors and their mutual correspondences. The models are stored for various object shapes and for various distances between the object and the stereo camera system. If, in terms of spatial location, an object to be classified is located between two stored models of this kind, classification is then based on the model that is closest to the object. By using the stored pixel coordinates of the stereo camera's left and right video sensors and their mutual correspondences, it is possible to classify three-dimensional objects solely from grayscale or color images. The main advantage over the related art is that there is no need for resource-intensive and error-prone disparity and depth value estimates. This means the method according to the present invention is significantly simpler. In particular, less sophisticated hardware may be used. Furthermore, classification requires less processing power. Moreover, the classification method allows highly reliable identification of the three-dimensional object. The method according to the present invention may in particular be used for video-based classification of seat occupancy in a motor vehicle. Another application is for identifying workpieces in manufacturing processes.
The basic idea is to make a corresponding model available for each object to be classified. The model is characterized by 3D points and the topological combination thereof (e.g., triangulated surface), 3D points 22 which are visible to the camera system being mapped to corresponding pixel coordinates 24 in left camera image 23 and pixel coordinates 26 in right camera image 25 of the stereo system (see
It is particularly advantageous that for each individual comparison a quality index is determined, the object being classified as a function of this quality index. The quality index may be derived from suitable correlation measurements (e.g., correlation coefficient) in an advantageous manner.
Furthermore, it is advantageous that the models are generated for a shape, e.g., an ellipsoid, for different positions or distances relative to the camera system. For example, as a general rule three different distances from the camera system are sufficient to allow an object on a vehicle seat to be correctly classified. Different orientations of the object may also be adequately taken into account in this way. If necessary, suitable adjustment methods may additionally be used.
As a general rule, known methods for model-based classification of three-dimensional objects using a stereo camera may be divided into three main processing steps.
In a first step, a displacement for selected pixels is estimated from the data of a stereo image pair via disparity estimation and converted directly into depth values, yielding a 3D point cloud. This is the stereo principle.
In a second step, this 3D point cloud is compared with various 3D object models which are represented via an object surface description. Herein, for example, the mean distance between the 3D points and the surface model in question may be defined as the measure of similarity.
In a third step, assignment to a class is performed by selecting the object model having the greatest degree of similarity.
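By way of illustration only, the second and third steps of the known methods may be sketched as follows. The sketch assumes that each object model is a sphere given by a center and a radius, so that the distance from a 3D point to the model surface has a closed form; the function names and the spherical models are hypothetical and are not taken from the related art.

```python
import math


def mean_surface_distance(points, center, radius):
    """Mean absolute distance between 3D points and a sphere surface.

    The sphere stands in for a general surface model (assumption).
    """
    total = 0.0
    for p in points:
        total += abs(math.dist(p, center) - radius)
    return total / len(points)


def classify_point_cloud(points, models):
    """Return the name of the model with the smallest mean distance.

    models maps a class name to a (center, radius) tuple (assumption).
    A smaller mean distance is treated as a greater degree of similarity.
    """
    return min(
        models,
        key=lambda name: mean_surface_distance(points, *models[name]),
    )
```

A point cloud lying on a unit sphere would thus be assigned to the model whose radius is closest to 1.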
To avoid having to determine depth values, the present invention proposes that classification be carried out solely by comparing the measured grayscale or color images (referred to below simply as images) with stored left and right stereo system camera pixel coordinates and their mutual correspondences. The stored pixel coordinates are generated by mapping the surfaces of 3D models representing the objects to be classified into the stereo system's left and right camera images. It is possible to classify objects in various positions and at various distances from the stereo camera system, because the accompanying models representing the particular objects are available for various positions and various distances. For example, if an ellipsoid-shaped object, for which the distance from the stereo camera system may vary, is to be classified, the corresponding model of the ellipsoid is made available for various different distances from the stereo camera system.
In the case of the classification method according to the present invention, first, in a preprocessing step, the models representing the objects to be classified must be made available. If for example the method according to the present invention is to be used to classify seat occupancy in a motor vehicle, this is carried out at the plant. Herein, various shapes to be classified, e.g., a child in a child seat, a child, a small adult, a large adult, or just the head of an adult or child, are used to generate models. The left and right stereo system camera pixel coordinates and their mutual correspondences are suitably stored (e.g., in a look-up table) for these models, which may be at a variety of defined distances from the stereo system. Using a look-up table means the search for the model having the highest degree of concordance with the object detected by the stereo camera system is less resource-intensive.
According to the method of the present invention, a processor 14, which is provided in a stereo camera control unit, then processes the data from video sensors 10 and 12 in order to classify the detected object. To accomplish this, processor 14 accesses a memory 15. Individual models characterized by their pixel coordinates and their mutual correspondences are stored in memory 15, e.g., a database. The model having the greatest degree of concordance with the measured object is sought using processor 14. The output value of processor 14 is the classification result, which is for example sent to a restraining means control unit 16, so that as a function of this classification and other sensor values from a sensor system 18, e.g., a crash sensor system, control unit 16 may trigger restraining means 17 (e.g., airbags, seat belt tighteners and/or roll bars).
$D = x_l - x_r$.
In geometric terms, disparity is D=C/z, where constant C depends on the geometry of the stereo camera. In the present case, distance z from model point 22 to image plane 25 or 23, respectively, is known, as three-dimensional model 21 is situated in a predefined position and orientation relative to the stereo camera.
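The relation D=C/z may be illustrated by the following minimal sketch. It assumes a rectified pinhole stereo pair, for which constant C is the product of the focal length in pixels and the baseline between the two cameras, and it takes the disparity as the left horizontal pixel coordinate minus the right one. The function and parameter names are hypothetical.

```python
def disparity_from_depth(z, focal_px, baseline):
    """Disparity D = C / z with C = focal_px * baseline.

    Assumes a rectified pinhole stereo pair (assumption, not from the
    source); z and baseline share one length unit, focal_px is in pixels.
    """
    if z <= 0:
        raise ValueError("z must be positive")
    return focal_px * baseline / z


def right_x_from_left(x_left, z, focal_px, baseline):
    """Right-image column of a model point whose left-image column is x_left.

    Uses D = x_left - x_right, so x_right = x_left - D.
    """
    return x_left - disparity_from_depth(z, focal_px, baseline)
```

Because z is known for every model point, the corresponding right pixel coordinate follows directly from the left one, without any disparity estimation on the measured images.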
For each three-dimensional model describing a situation to be classified, in a one-time preprocessing step the pixel coordinates and their mutual correspondences for the model points visible to video sensors 10 and 12 are determined and stored in the look-up table of correspondences.
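The one-time preprocessing step may be sketched as follows, by way of example only. The sketch assumes a rectified pinhole stereo pair with the left camera at the origin and the right camera displaced by the baseline along the x axis, and it assumes that each model is supplied as a list of 3D points already known to be visible to both sensors (no occlusion test is performed). All names, including `project_stereo` and `build_lookup_table`, are hypothetical.

```python
def project_stereo(point, focal_px, baseline, cx, cy):
    """Map a 3D model point to its left and right pixel coordinates.

    Left camera at the origin, right camera shifted by `baseline` along x
    (assumed geometry); (cx, cy) is the principal point in pixels.
    """
    x, y, z = point
    v = focal_px * y / z + cy                       # same row in both images
    u_left = focal_px * x / z + cx
    u_right = focal_px * (x - baseline) / z + cx
    return (u_left, v), (u_right, v)


def build_lookup_table(models, focal_px, baseline, cx, cy):
    """Build {model name: [(left pixel, right pixel), ...]}.

    Each list entry is one stored correspondence. Points with z <= 0
    lie behind the cameras and are skipped.
    """
    table = {}
    for name, points in models.items():
        table[name] = [
            project_stereo(p, focal_px, baseline, cx, cy)
            for p in points
            if p[2] > 0
        ]
    return table
```

One table entry per model and per distance, e.g. keys such as `"head_near"` and `"head_far"`, would cover the case in which the same shape is stored for several distances from the stereo camera system.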
Classification is performed via comparison of the grayscale distributions in a defined image area surrounding the corresponding left and right camera image pixel coordinates of the stereo camera detecting the object to be classified. This is also feasible for color value distributions.
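A possible form of this comparison is sketched below. It assumes that images are given as lists of rows of grayscale values, that the image area is a square window centered on the stored pixel coordinates, that the window lies fully inside both images, and that the correlation coefficient serves as the measure of concordance. The names `window`, `correlation_coefficient` and `point_quality` are hypothetical.

```python
import math


def window(image, x, y, half):
    """Flatten the (2*half+1) x (2*half+1) window centered on (x, y).

    No bounds checking is done (assumption: window lies inside the image).
    """
    return [
        image[row][col]
        for row in range(y - half, y + half + 1)
        for col in range(x - half, x + half + 1)
    ]


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat window carries no structure to correlate
    return cov / math.sqrt(var_a * var_b)


def point_quality(left_image, right_image, left_px, right_px, half=1):
    """Concordance of the windows around one stored correspondence.

    left_px and right_px are integer (x, y) pixel coordinates.
    """
    return correlation_coefficient(
        window(left_image, left_px[0], left_px[1], half),
        window(right_image, right_px[0], right_px[1], half),
    )
```

If the stored correspondence matches the measured scene, both windows show the same surface patch and the value approaches 1; for a model that does not fit, the windows show unrelated image content and the value is lower.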
For each three-dimensional model, the comparison supplies a quality index indicating the degree of concordance between the three-dimensional model and the measured left and right camera images. The three-dimensional model having the most favorable quality index, i.e., the model that best describes the measured values, yields the classification result.
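The selection of the classification result may be sketched as follows. The sketch assumes that a quality value is already available for each stored correspondence of a model, that the quality index of a model is the mean of these values, and that a higher value is more favorable; the source does not prescribe this particular aggregation. The names are hypothetical.

```python
def model_quality(point_qualities):
    """Quality index of one model: mean of its per-point quality values.

    Averaging is an assumption made for this sketch.
    """
    return sum(point_qualities) / len(point_qualities)


def classify(point_qualities_by_model):
    """Return (best model name, {model name: quality index}).

    Models without any quality values are ignored.
    """
    scores = {
        name: model_quality(values)
        for name, values in point_qualities_by_model.items()
        if values
    }
    if not scores:
        raise ValueError("no model has any quality values")
    best = max(scores, key=scores.get)
    return best, scores
```

For instance, `classify({"child_seat": [0.2, 0.4], "adult": [0.9, 0.7]})` selects `"adult"`, since its mean quality is the higher of the two.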
The quality index may be ascertained using signal processing methods, e.g., a correlation method. If a corresponding three-dimensional model is not generated for every possible position and orientation of the measured object, differences between the position and orientation of the three-dimensional models and those of the measured object may be calculated using iterative adjustment methods, for example.
The classification method may be divided into offline preprocessing and actual online classification. This allows the online processing time to be significantly reduced. In principle, it is also feasible for preprocessing to take place online, i.e., while the device is in operation. However, this would increase the processing time and as a general rule would not have any advantages.
During offline processing, the left and right camera pixel coordinates and their correspondences are determined for each three-dimensional model and stored in a look-up table.
An option for determining the quality index for a model is described below by way of an example, with reference to
These areas are shown by way of an example in left and right image 45 and 46. Images 45 and 46 are sent to a block 47 so that the quality may be determined via comparison of the measurement windows, e.g., using correlation methods. The output value is then point quality 48. The method shown by way of an example in
Number | Date | Country | Kind |
---|---|---|---|
102004007049.0 | Feb 2004 | DE | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2004/053350 | 12/8/2004 | WO | 00 | 5/13/2009 |