This application claims priority of European Patent Application Serial Number 07 015 282.2, filed on Aug. 3, 2007, titled METHOD AND APPARATUS FOR EVALUATING AN IMAGE, which application is incorporated in its entirety by reference in this application.
1. Field of the Invention
This invention relates to a system for evaluating an image. In particular, this invention relates to a system for evaluating an image that may be employed for object recognition in various environments such as, for example, in a driver assistance system onboard a vehicle or in a surveillance system.
2. Related Art
Nowadays, vehicles provide a plurality of driver assistance functions to assist the driver in controlling the vehicle and/or to enhance driving safety. Examples of such driver assistance functions include parking aids, collision prediction functions and safety features, such as airbags or seat belt retractors, that may be actuated according to control logic. Some of these driver assistance functions may rely on, or at least harness, information on the surroundings of the vehicle in the form of image data that is automatically evaluated to, e.g., detect approaching obstacles. In some driver assistance functions, not only the presence of an object in proximity to the vehicle, but also its “type” or “class”, such as vehicle or pedestrian, may be automatically determined so that appropriate action may be taken based on the determined object class. This may be achieved by capturing an image having a field of view that corresponds to a portion of the vehicle surroundings and evaluating the image data representing the image to detect objects and to determine their respective object class, based on, e.g., characteristic geometrical features and sizes of objects represented by the image data, which may be compared to reference data. Such a conventional approach to image evaluation frequently has shortcomings. For example, when the image data is directly compared to reference data, the reliability of object classification may depend on the distance of the object relative to the vehicle in which the driver assistance function is installed. For example, a lorry at a large distance from the vehicle may be incorrectly identified as a car at a shorter distance from the vehicle, or vice versa, due to the larger lateral dimensions of the lorry.
Similar problems exist in other situations in which an automatic identification of objects in an image is desirable, such as surveillance camera systems installed in public areas or private property.
Therefore, a need exists in the art for an improved system for evaluating an image. In particular, there is a need for an improved system for evaluating an image, which provides results that are less prone to errors caused by a variation in distance of an object relative to a camera that captures the image to be evaluated.
According to one implementation, a method for evaluating an image is provided. Image data representing the image is retrieved. Distance information on a distance of an object relative to an image plane of the image is retrieved. At least part of the object is represented by the image data. At least a portion of the image data is resampled, based both on the distance information and on a pre-determined reference distance to generate resampled image data. The portion of the image data to be resampled represents at least part of the object.
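By way of illustration and not limitation, the resampling summarized above may be sketched in Python as follows; the function name, the bounding-box convention and the simple nearest-neighbour scheme are assumptions made for this sketch rather than features of the described method.

    import numpy as np

    def resample_object_region(image, region, distance, reference_distance):
        # Resample the image portion showing an object so that the object
        # appears approximately as it would if located at reference_distance.
        # region = (row0, row1, col0, col1) is an assumed bounding-box format.
        r0, r1, c0, c1 = region
        portion = image[r0:r1 + 1, c0:c1 + 1]
        scale = distance / reference_distance      # > 1: enlarge, < 1: shrink
        out_h = max(1, int(round(portion.shape[0] * scale)))
        out_w = max(1, int(round(portion.shape[1] * scale)))
        # Nearest-neighbour resampling, chosen only for brevity.
        rows = (np.arange(out_h) / scale).astype(int).clip(0, portion.shape[0] - 1)
        cols = (np.arange(out_w) / scale).astype(int).clip(0, portion.shape[1] - 1)
        return portion[np.ix_(rows, cols)]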
According to another implementation, an apparatus for evaluating an image is provided. The apparatus may include a processing device. The processing device may include a first input for receiving image data representing the image, and a second input for receiving distance information on a distance of an object relative to an image plane of the image. At least part of the object is represented by the image. The processing device is configured for resampling at least a portion of the image data based both on the distance information and on a pre-determined reference distance to generate resampled image data. The portion of the image data to be resampled represents at least part of the object.
According to another implementation, a driver assistance system is provided. The driver assistance system may include an image evaluating apparatus and an assistance device configured for receiving an image evaluation result from the image evaluating apparatus.
According to another implementation, a method for evaluating an image is provided. The image to be evaluated is captured. A three-dimensional image is also captured. The three-dimensional image includes depth information. A field of view of the three-dimensional image overlaps with a field of view of the image to be evaluated. At least a portion of the captured image is resampled based on the three-dimensional image.
According to another implementation, an apparatus for evaluating an image is provided. The apparatus may include a camera device for capturing an image, a three-dimensional camera device configured for capturing a three-dimensional image, and a processing device coupled to the camera device and to the three-dimensional camera device. The three-dimensional image captured by the three-dimensional camera includes depth information. A field of view of the three-dimensional image overlaps with a field of view of the image to be evaluated. The processing device is configured for receiving image data representing the image to be evaluated from the camera device, and for receiving additional image data from the three-dimensional camera device. The additional image data represent the three-dimensional image. The processing device is also configured for resampling at least a portion of the image data based on the additional image data.
Other devices, apparatus, systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The invention may be better understood by referring to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the figures, like reference numerals designate corresponding parts throughout the different views.
[Brief description of the drawings; figure numbering not reproduced: a schematic representation of an example of a 2D image; a schematic representation illustrating a resampling of portions of that image; and FIGS. 8(a), 8(b) and 8(c), schematic representations of portions of the 2D image.]
Hereinafter, examples of implementations of the invention will be explained with reference to the drawings. It is to be understood that the following description is given only for the purpose of better explaining the invention and is not to be taken in a limiting sense. It is also to be understood that, unless specifically noted otherwise, the features of the various implementations described below may be combined with each other.
The first input 116 of the processing device 112 may be coupled to a two-dimensional (2D) camera 128 that captures the image to be evaluated and provides the image data representing the image to the processing device 112. The 2D camera 128 may be configured, e.g., as a CMOS or CCD camera and may include additional circuitry to process the image data prior to outputting the image data to the processing device 112. For example, the image data may be filtered or suitably encoded before being output to the processing device 112.
The second input 120 of the processing device 112 may be coupled to a three-dimensional (3D) camera device 132. The 3D camera device 132 may include a 3D camera 136 and an object identification device 140 coupled to the 3D camera 136. The 3D camera 136 captures additional (3D) image data. This additional image data represents a three-dimensional image including depth information for a plurality of viewing directions, i.e., information on the distance of the closest obstacle located along the line of sight in each of the plurality of viewing directions. The object identification device 140 receives the additional image data representing the three-dimensional image from the 3D camera 136 and determines the lateral positions of objects within the field of view of the 3D camera 136 and their respective distances based on the depth information. The object identification device 140 may be configured to perform a segmentation algorithm, in which adjacent pixels that have comparable distances from the 3D camera are assigned to one object. Additional logical functions may be incorporated into the object identification device 140. For example, if only vehicles are to be identified in the image data, then only regions of pixels in the additional image data that have shapes similar to a rectangular or trapezoidal shape may be identified, so that objects that do not have a shape typically found for a vehicle are not taken into account when evaluating the image data. The object identification device 140 may identify the lateral positions of all objects of interest in the additional image data, i.e., the coordinates of the regions in which the objects are located, and may determine a distance of the respective objects relative to the 3D camera 136. This data, also referred to as the “object list” in the following, is then provided to the processing device 112.
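By way of example only, the segmentation and the generation of the object list might be approximated by a simple region-growing over the depth values, as sketched below; the threshold values and the choice of the closest point as the object distance are illustrative assumptions (a weighted average of the depth values could be used instead).

    import numpy as np
    from collections import deque

    def build_object_list(depth, max_depth_step=0.5, min_pixels=20):
        # Group adjacent pixels with comparable depth values into objects and
        # return an "object list" of bounding boxes and distances.
        h, w = depth.shape
        visited = np.zeros((h, w), dtype=bool)
        objects = []
        for r0 in range(h):
            for c0 in range(w):
                if visited[r0, c0]:
                    continue
                queue, members = deque([(r0, c0)]), [(r0, c0)]
                visited[r0, c0] = True
                while queue:
                    r, c = queue.popleft()
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if 0 <= nr < h and 0 <= nc < w and not visited[nr, nc] \
                                and abs(depth[nr, nc] - depth[r, c]) <= max_depth_step:
                            visited[nr, nc] = True
                            queue.append((nr, nc))
                            members.append((nr, nc))
                if len(members) >= min_pixels:
                    rows = [m[0] for m in members]
                    cols = [m[1] for m in members]
                    objects.append({
                        "bbox": (min(rows), max(rows), min(cols), max(cols)),
                        "distance": float(min(depth[m] for m in members)),  # closest point
                    })
        return objects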
The 2D camera 128 and the 3D camera 136 of the 3D camera device 132 may be arranged and configured such that a field of view of the 2D camera 128 overlaps with a field of view of the 3D camera 136. In one implementation, the fields of view essentially coincide. For simplicity, it will be assumed that the 2D camera 128 and the 3D camera 136 are arranged sufficiently close to one another that the depth information captured by the 3D camera 136 also provides a good approximation for the distance of the respective object from the image plane of the 2D camera 128. It will be appreciated that, in other implementations, the 2D camera 128 and the 3D camera 136 may also be arranged remotely from each other, in which case a distance of an object relative to the image plane of the 2D camera 128 may be derived from the depth information captured by the 3D camera 136, when the position of the 3D camera 136 relative to the 2D camera 128 is known.
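Purely as an illustration of the conversion mentioned above, a point measured in the coordinate frame of the 3D camera 136 may be transformed into the frame of the 2D camera 128 when the relative rotation R and translation t of the two cameras are known; the distance from the image plane is then the z-coordinate, assuming the optical axis of the 2D camera is its z-axis. All names and values below are illustrative assumptions.

    import numpy as np

    def distance_to_2d_image_plane(point_in_3d_cam, R_3d_to_2d, t_3d_to_2d):
        # Transform a point from the 3D camera frame into the 2D camera frame
        # and return its distance from the 2D camera's image plane (z-coordinate).
        p = R_3d_to_2d @ np.asarray(point_in_3d_cam, dtype=float) + np.asarray(t_3d_to_2d, dtype=float)
        return float(p[2])

    # Example: cameras mounted 0.2 m apart along the x-axis with parallel optical axes.
    R = np.eye(3)
    t = np.array([0.2, 0.0, 0.0])
    print(distance_to_2d_image_plane([1.0, 0.5, 12.0], R, t))   # -> 12.0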
The processing device 112 receives the object list from the 3D camera device 132, which includes distance information for at least one object, and usually plural objects, that are represented in the image captured by the 2D camera 128. As will be explained in more detail below, the processing device 112 resamples at least a portion of the image data based on this distance information and a pre-determined reference distance.
The apparatus 104 may be coupled to the assistance device 108 via a bus 144 to provide a result of the image evaluation to the assistance device 108. The assistance device 108 may include a control device 148, as well as an output unit or warning device 152 and an occupant and/or pedestrian protection device 156 coupled to the control device 148. Based on the signal received from the apparatus 104 via the bus 144, the control device 148 actuates one or both of the warning device 152 and the protection device 156. The warning device 152 may be configured for providing at least one of optical, acoustical or tactile output signals based on a result of an image evaluation performed by the apparatus 104. The occupant and/or pedestrian protection device 156 may also be configured to be actuated based on a result of an image evaluation performed by the apparatus 104. For example, the protection device 156 may include a passenger airbag that is activated when a collision with a vehicle is predicted to occur based on the result of the image evaluation, and/or a pedestrian airbag that is activated when a collision with a pedestrian is predicted to occur.
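As a purely hypothetical illustration of such control logic (the field names, the time-to-collision threshold and the specific actions are assumptions, not features of the described system), the control device 148 might map an image evaluation result onto actuator commands along the following lines.

    def actuate_assistance(evaluation_result, ttc_threshold_s=2.0):
        # evaluation_result is assumed to be a list of dicts with an object
        # class and a predicted time to collision for each detected object.
        actions = []
        for obj in evaluation_result:
            if obj["time_to_collision"] < ttc_threshold_s:
                actions.append("warn_driver")                        # warning device 152
                if obj["object_class"] == "pedestrian":
                    actions.append("activate_pedestrian_airbag")     # protection device 156
                elif obj["object_class"] == "vehicle":
                    actions.append("activate_passenger_airbag")
        return actions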
For reasons of simplicity, the method 200 has been explained above with reference to a case in which only one object of interest is represented by the image data. When plural objects of interest are visible in the image, the steps 204-206 may be performed for each of the objects, or for a subset of the objects that may be selected depending on the object types of interest, for example by discarding objects that do not have a roughly rectangular or trapezoidal boundary. It will be appreciated that the distance information retrieved at step 204 may vary for different objects, and that the resampling performed at step 208 may correspondingly vary in accordance with the different distances relative to the image plane. When the image data represent several objects, steps 204-210 may be performed successively for all objects, or step 204 may first be performed for each of the objects, then step 206 for each of the objects, and so forth.
The further analysis of the resampled image data at step 210 may, e.g., include comparing the resampled image data to reference data to classify the object. The further analysis of the resampled image data may also include utilizing the resampled image data, e.g., to build up a database of imaged objects, to train image recognition algorithms, or the like.
In one implementation, the analyzing at step 210 includes classifying the object, i.e., assigning the object to one of a plurality of object types or classes. For example, the storage device 124 described above may have stored thereon reference data for a plurality of object types, and the resampled image data may be compared to this reference data to determine the object type or class of the object.
The reference data stored in the storage device 124 may have various forms depending on the specific implementation of the analyzing process in step 210. For example, the analyzing performed at step 210 may be based on a learning algorithm that is trained to recognize specific object types. In this case, the reference data may be a set of parameters that control operation of the learning algorithm and have been trained through the use of images of reference objects located at the reference distance from the image plane. In another implementation, the analyzing process may include determining whether the object represented by the resampled image data has specific geometrical properties, colors, color patterns, or sizes, which may be specified by the reference data. In another implementation, the analyzing process may include a bit-wise comparison of the resampled image data with a plurality of images of reference objects of various object types taken when the reference objects are located approximately at the reference distance from the image plane.
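A very simple form of such a comparison with reference data is sketched below; the use of a mean absolute pixel difference and the cropping to a common size are illustrative assumptions standing in for the trained classifier or bit-wise comparison described above.

    import numpy as np

    def classify_by_reference(resampled, reference_templates):
        # reference_templates: dict mapping an object class to a reference image
        # captured with the reference object at the reference distance.
        best_class, best_score = None, float("inf")
        for object_class, template in reference_templates.items():
            h = min(resampled.shape[0], template.shape[0])
            w = min(resampled.shape[1], template.shape[1])
            score = np.abs(resampled[:h, :w].astype(float)
                           - template[:h, :w].astype(float)).mean()
            if score < best_score:
                best_class, best_score = object_class, score
        return best_class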
Irrespective of the specific implementation of the analyzing step 210, the reference data may be generated based on an image of at least one of the reference objects located at a distance from the image plane that is approximately equal to the reference distance. The analyzing step 210 is then well adapted to classify the object based on the resampled image data, which has been obtained by a distance-dependent resampling.
A result of the analyzing step 210 may be output to a driver assistance system such as the assistance device 108 described above.
The distance information retrieved at step 204, based on which the portion of the image is resampled at step 208, may be obtained in any suitable way. In the apparatus 104 described above, the distance information is provided to the processing device 112 by the 3D camera device 132 in the form of the object list.
As noted above, additional logical functions may be employed to identify objects in the additional image data, e.g., by evaluating the shape and/or symmetry of the pixels having comparable depth values. For example, only regions of pixels in the additional image data that have an approximately rectangular or trapezoidal shape may be selected for further processing if vehicles are to be identified in the image data. In this manner, evaluating the image data may be restricted to the relevant portions of the image data, thereby enhancing processing speed.
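Such a shape-based pre-selection might, for instance, be approximated by checking how well a segmented pixel region fills its bounding box and whether its aspect ratio is plausible for a vehicle; the thresholds below are assumptions chosen only for illustration.

    import numpy as np

    def looks_vehicle_like(mask, min_fill=0.6, aspect_range=(0.3, 3.0)):
        # mask: boolean array marking the pixels of one segmented region.
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            return False
        height = rows.max() - rows.min() + 1
        width = cols.max() - cols.min() + 1
        fill = rows.size / float(height * width)    # 1.0 for a perfect rectangle
        aspect = width / float(height)
        return fill >= min_fill and aspect_range[0] <= aspect <= aspect_range[1]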
By utilizing the additional image data representing a three-dimensional image, the portion of the image data representing the object may be conveniently identified, and the distance of the object relative to the image plane may also be determined from the additional image data. In this manner, the image may be evaluated by using both the image data and the additional image data, i.e., by combining the information of a two-dimensional (2D) image and a three-dimensional (3D) image. In this context, the term “depth information” generally refers to information on distances of objects located along a plurality of viewing directions represented by pixels of the three-dimensional image.
At step 308, a portion of the image data is selected based on the additional image data. The object list generated at step 306 includes information on the pixels or pixel regions in the additional image data that represent an object. The portion of the image data is selected by identifying the pixels in the image data that correspond to the pixels or pixel regions in the additional image data specified by the object list. If the 2D image and the 3D image have identical resolution and an identical field of view, there is a one-to-one correspondence between a pixel in the image data and a pixel in the additional image data. If, however, the 3D image has a lower resolution than the 2D image, several pixels of the image data correspond to one pixel of the additional image data.
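Assuming identical fields of view and purely different resolutions, the pixel correspondence described above reduces to a scaling of coordinates, as the following sketch illustrates; the function name and the bounding-box convention are assumptions.

    def map_3d_region_to_2d(region_3d, shape_3d, shape_2d):
        # region_3d = (row0, row1, col0, col1) in 3D-image pixel coordinates;
        # shape_3d and shape_2d are the (height, width) of the two images.
        r0, r1, c0, c1 = region_3d
        sy = shape_2d[0] / float(shape_3d[0])
        sx = shape_2d[1] / float(shape_3d[1])
        return (int(r0 * sy), int((r1 + 1) * sy) - 1,
                int(c0 * sx), int((c1 + 1) * sx) - 1)

    # Example: a 4 x 4 pixel region of a 48 x 64 depth image mapped into a 480 x 640 image.
    print(map_3d_region_to_2d((10, 13, 20, 23), (48, 64), (480, 640)))   # -> (100, 139, 200, 239)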
At step 310, the portion of the image data that has been selected at step 308 is resampled based on the distance information contained in the object list and the pre-determined reference distance to generate resampled image data, as has been explained with reference to step 208 of the method 200 described above.
When several objects having various distances from the image plane are identified in the additional image data, each of the portions of the image data that represents one of the objects is resampled based on the respective distance information and the pre-determined reference distance.
Such a resampling of several object portions will be further illustrated by the example described below.
The example is based on a schematic 2D image 400 and on an image 450 that illustrates a resampling of portions of the 2D image 400.
The resampling of a portion of the image data representing an object based on a 3D image will be explained in more detail with reference to the following example.
Based on the additional image data 700, a segmentation algorithm identifies portions 702, 712 and 722 and assigns them to different objects of an object list. For each of the objects, a distance value is determined, e.g., as the lowest distance value in one of the images 704, 714 and 724, respectively, or as a weighted average of the distance values in the respective image 704, 714 or 724.
It is to be understood that variations of this example, while not shown here, are likewise possible.
Based on the lateral positions of the portions 702, 712 and 722 in the additional image data 700, the corresponding portions of the image data 600 representing the 2D image are determined, and each of these portions is resampled based on the distance determined for the respective object.
In one implementation, a portion of the image data representing an object is upsampled when the object is located at a distance d from the image plane that is larger than the pre-determined reference distance dref, the upsampling factor being
sfup = d / dref,    (1)
and the portion of the image data is downsampled when the object is located at a distance d from the image plane that is smaller than the pre-determined reference distance dref, the downsampling factor being
sfdown = dref / d.    (2)
In one implementation, in order to determine an upsampling factor or downsampling factor, the fractions on the right-hand sides of Equations (1) and (2) are approximated by a rational number having a reasonably small numerator and denominator, or the right-hand sides may be approximated by an integer.
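By way of illustration only, Equations (1) and (2), together with the rational approximation mentioned above, might be implemented as follows; the limit placed on the denominator is an assumed, illustrative choice.

    from fractions import Fraction

    def resampling_factor(d, d_ref, max_denominator=8):
        # Return ("up", sf_up) or ("down", sf_down) according to Equations (1)
        # and (2), approximated by a rational number with a small denominator.
        if d > d_ref:
            return "up", Fraction(d / d_ref).limit_denominator(max_denominator)
        return "down", Fraction(d_ref / d).limit_denominator(max_denominator)

    print(resampling_factor(30.0, 20.0))   # -> ('up', Fraction(3, 2))
    print(resampling_factor(10.0, 20.0))   # -> ('down', Fraction(2, 1))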
In other implementations the upsampling and downsampling factors sfup and sfdown, respectively, may be determined in other ways. For example, the focal length of the 2D camera may be taken into account to model the variations of image size with object distance, and the resampling factors may be determined by dividing the image size in pixels that would have been obtained for an object located at the reference distance from the image plane by the image size in pixels obtained for the actual object distance.
Returning to the example above, FIGS. 8(a), 8(b) and 8(c) schematically illustrate resampled image data obtained by resampling the portions 602 and 622 of the image data 600.
FIG. 8(b) shows the image 614 of the vehicle 560.
FIG. 8(c) shows resampled image data 822 obtained by upsampling the portion 622 of the image data 600.
As may be seen from FIGS. 8(a), 8(b) and 8(c), after the resampling, the objects are represented at sizes that approximately correspond to the sizes they would have if located at the reference distance from the image plane, which facilitates the subsequent comparison with the reference data.
Upsampling and downsampling of portions of the image data 600 may also be performed in other ways than the ones described above. For example, in downsampling, filters may be employed that model the changing resolution as a vehicle is located further away from the image plane. Thereby, the level of detail that may still be recognized in the resampled image data may be controlled more accurately. Upsampling may also be performed by using interpolating functions to interpolate, e.g., pixel color values when adding more pixels. Upsampling may also be performed by capturing a new image of the field of view in which the portion to be upsampled is located, i.e., by zooming into this field of view using the 2D camera to capture a new, higher resolution image.
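The following minimal sketches show two of the many possible resampling variants mentioned above for a single-channel image stored as a NumPy array: block averaging by an integer factor, which crudely imitates the reduced resolution of a more distant object, and pixel repetition for upsampling; both are illustrative simplifications.

    import numpy as np

    def downsample_block_average(portion, factor):
        # Integer-factor downsampling by averaging factor x factor pixel blocks.
        h = (portion.shape[0] // factor) * factor
        w = (portion.shape[1] // factor) * factor
        trimmed = portion[:h, :w].astype(float)
        return trimmed.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

    def upsample_repeat(portion, factor):
        # Integer-factor upsampling by pixel repetition; an interpolating filter
        # could be applied afterwards to smooth the result.
        return np.repeat(np.repeat(portion, factor, axis=0), factor, axis=1)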
At step 908, an object is selected from the object list, and its distance relative to the image plane is retrieved. At step 910, a portion of the image data representing the 2D image is determined that contains at least part of the object. The determining step at step 910 may again include matching the 2D and 3D images, e.g., by mapping pixels of the 3D image onto corresponding pixels of the 2D image.
At step 912, the distance d retrieved from the object list is compared to the reference distance dref. If d is larger than dref, at step 914, the portion of the image data is upsampled by an upsampling factor sfup that may be determined, e.g., as explained with reference to Equation (1) above. If d is less than or equal to dref, at step 916, the portion of the image data is downsampled by a downsampling factor sfdown that may be determined, e.g., as explained with reference to Equation (2) above.
At step 918, the object is then classified based on the resampled image data. Object classification may be performed as explained with reference to step 312 above.
At step 920, a new object is selected from the object list and its distance information is retrieved, and the steps at 910-918 are repeated.
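Taken together, the per-object loop of steps 908 through 918 might be sketched as follows; the integer rounding of the resampling factors and the pluggable classify callable are simplifications assumed for this illustration.

    import numpy as np

    def evaluate_objects(image, object_list, d_ref, classify):
        # object_list entries are assumed to carry a bounding box in 2D-image
        # coordinates and a distance relative to the image plane.
        results = []
        for obj in object_list:
            r0, r1, c0, c1 = obj["bbox"]
            portion = image[r0:r1 + 1, c0:c1 + 1]
            d = obj["distance"]
            if d > d_ref:                               # farther than reference: enlarge
                f = max(1, int(round(d / d_ref)))
                portion = np.repeat(np.repeat(portion, f, axis=0), f, axis=1)
            else:                                       # closer than reference: shrink
                f = max(1, int(round(d_ref / d)))
                portion = portion[::f, ::f]
            results.append((obj, classify(portion)))
        return results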
The method 900 may be repeated at regular time intervals. For example, when the apparatus 104 is installed onboard a vehicle as part of a driver assistance system, the image evaluation may be repeated while the vehicle is in motion so that objects in the surroundings of the vehicle are monitored continuously.
It is to be understood that the configuration of the apparatus 104 for evaluating an image described above is only one example, and that other configurations may be employed; a further implementation is described below.
The apparatus 1004 includes a processing device 1012, which has a first input 1016 to receive image data representing the image to be evaluated and a second input 1020 to receive distance information on a distance of an object that is represented by the image relative to an image plane. The processing device 1012 is further coupled to a storage device 1024 that has stored thereon reference data for object classification.
The apparatus 1004 further comprises a 3D camera device 1030 that includes a 3D camera 1034, e.g., a stereo camera, an object identification device 1040 and an image processor 1038. The object identification device 1040 is coupled to the 3D camera 1034 to identify objects in a 3D image taken by the 3D camera 1034, e.g., in the two images taken by a stereo camera, and their position relative to an image plane of the 3D camera 1034, and to provide this information to the processing device 1012 at the second input 1020. The image processor 1038 is coupled to the 3D camera 1034 to generate image data representing a 2D image based on the 3D image taken by the 3D camera 1034. For example, when the 3D camera 1034 is a stereo camera, the image processor 1038 may generate a 2D image by merging data from the two images captured by the stereo camera, or the 2D image may be set to be identical to one of the two images captured by the stereo camera. The image data representing the 2D image are provided to the processing device 1012 at the first input 1016.
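If the 3D camera 1034 is a stereo camera, the roles of the image processor 1038 and of the depth measurement might be sketched as below, assuming OpenCV block matching for the stereo correspondence; the matcher parameters and the use of the left image as the 2D image are illustrative assumptions.

    import cv2
    import numpy as np

    def stereo_to_2d_and_depth(left_gray, right_gray, focal_px, baseline_m):
        # The left image of the stereo pair serves as the 2D image to be
        # evaluated; a block-matching disparity map provides the depth information.
        matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
        disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        depth = np.full(disparity.shape, np.inf, dtype=np.float32)
        valid = disparity > 0
        depth[valid] = focal_px * baseline_m / disparity[valid]   # z = f * B / disparity
        return left_gray, depth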
The processing device 1012 receives the distance information at the second input 1020 and the image data at the first input 1016, and resamples a portion of the image data based on the distance information and a pre-determined reference distance. The processing device 1012 may operate according to any one of the methods explained above.
According to another aspect of the invention, a data storage medium is provided which has stored thereon instructions which, when executed by a processor of an electronic computing device, direct the computing device to perform the method according to any of the implementations described above. The electronic computing device may be configured as a universal processor that has inputs for receiving the image data and the additional image data. The electronic computing device may also comprise a processor, a CMOS or CCD camera and a PMD camera, the processor retrieving the image data from the CMOS or CCD camera and the additional image data from the PMD camera.
It is to be understood that the above description of implementations is illustrative rather than limiting, and that various modifications may be implemented in other implementations. For example, while the object identification device 140 of the apparatus 104 and the object identification device 1040 of the apparatus 1004 have been shown to be provided by the 3D camera devices 132 and 1030, respectively, the object identification device 140 or 1040 may also be formed integrally with the processing device 112 or 1012, respectively, i.e., the object list may be generated by the processing device 112 or 1012.
It is also to be understood that the various physical entities, such as the 2D camera, the 3D camera, the processing device, the object identification device, and the storage device of the apparatus, may be implemented by any suitable hardware, software or combination thereof. For example, the 2D camera may be a CMOS camera, a CCD camera, or any other camera or combination of optical components that provides image data. Similarly, the 3D camera may be configured as a PMD camera, a stereo camera, or any other device that is suitable for capturing depth information. The processing device may be a special purpose circuit or a general purpose processor that is suitably programmed.
Further, various components of the apparatus described above may be combined into a single unit or distributed over several units in other implementations.
While implementations of the invention have been described with reference to applications in driver assistance systems, the invention is not limited to this application and may be readily used for any application where images are to be evaluated. For example, implementations of the invention may also be employed in evaluating images captured in security-related applications such as in the surveillance of public areas, or in image analysis for biological, medical or other scientific applications.
It will be understood, and is appreciated by persons skilled in the art, that one or more processes, sub-processes, or process steps described above may be performed by hardware, by software, or by a combination of both.
The foregoing description of implementations has been presented for purposes of illustration and description. It is not exhaustive and does not limit the claimed inventions to the precise form disclosed. Modifications and variations are possible in light of the above description or may be acquired from practicing the invention. The claims and their equivalents define the scope of the invention.
Foreign Application Priority Data: EP 07 015 282.2, August 2007, EP (regional).