The present invention relates to an operation support system. In particular, the present invention relates to a technology for estimating, from a result of shooting by a camera fitted to a mobile object, a three-dimensional object area, i.e. an area where a three-dimensional object appears. The present invention also relates to a vehicle employing such an operation support system.
A three-dimensional object standing on a road surface can be an obstacle to a vehicle, and a driver's overlooking it may lead to a collision accident. Such collision accidents are particularly likely to occur in a driver's blind spots. Thus there has been proposed a technique according to which a vehicle is fitted with a camera for monitoring areas that tend to be the driver's blind spots so that an image obtained from the camera is displayed on a display device disposed near the driver's seat. There has also been developed a technology for converting a camera image obtained from a camera into a bird's eye view image for display. A bird's eye view image is an image of a vehicle as viewed from up in the sky, and displaying it makes it easier for the driver to get a sense of the distance to a three-dimensional object.
There has also been developed a technique for detecting a three-dimensional object around a vehicle by using an image processing technology and a sensor. Such a technique is advantageous, because the capability of detecting three-dimensional objects around a vehicle makes it possible, for example, to show the presence of a three-dimensional object on a display device and output an alarm according to a detection result of the three-dimensional object.
There has been proposed a technique of using a stereo camera to detect three-dimensional objects around a vehicle. However, use of a stereo camera itself, which is composed of two cameras, invites a cost increase. Also, the positions and angles of the two cameras need to be adjusted with high accuracy, and this makes it troublesome to introduce the technique.
In view of the above, there has been disclosed a technique for detecting a three-dimensional object around a vehicle by use of a monocular camera, for example, in Patent Document 1. According to this technique, camera motion parameters are obtained by a method of least squares by use of information of five or more feature points on a road surface, and based on the thus obtained camera motion parameters, bird's eye view images of adjacent frames are superimposed on each other, to thereby detect a three-dimensional object that appears to be rising up from the road surface in an image.
Patent Document 1: JP-A-2003-44996
Recognition of feature points on a road surface is essential in order to detect a three-dimensional object with the technique disclosed in Patent Document 1. A large number of feature points extracted from an image obtained from a camera include feature points on a road surface, but inconveniently, no method has been proposed for determining whether each feature point extracted from the image is one on the road surface or one on a three-dimensional object. As a result, it is impossible, with the technique disclosed in Patent Document 1, to estimate a three-dimensional object area in an image with desired accuracy. Furthermore, a complicated operation is required to obtain the camera motion parameters by the method of least squares using information of five or more feature points on a road surface, and this hinders the realization of a simple system structure.
In view of the foregoing, an object of the present invention is to provide an operation support system and a method for estimating a three-dimensional object area capable of estimating, with desired accuracy, a three-dimensional object area based on an image obtained from a camera. Another object of the present invention is to provide a vehicle employing such a system and a method.
According to one aspect of the present invention, an operation support system is provided with a camera fitted to a mobile object to shoot surroundings of the mobile object, and which estimates, based on camera images on a camera coordinate plane obtained from the camera, a three-dimensional object area in an image based on the camera image. Here, the operation support system comprises: an image acquisition portion which acquires first and second camera images shot by the camera at first and second time points, respectively, while the mobile object is moving, the first and second time points being different from each other; a movement vector detection portion which extracts n feature points (where n is an integer of 2 or more) from the first camera image, and which also detects movement vectors, on the camera coordinate plane, of the feature points between the first and second camera images; a bird's eye conversion portion which projects the camera images, and the feature points and the movement vectors on the camera coordinate plane onto a bird's eye view coordinate plane which is parallel to ground to thereby convert the first and second camera images into first and second bird's eye view images, respectively, and detect positions of the feature points on the first bird's eye view image and movement vectors of the feature points on the bird's eye view coordinate plane between the first and second bird's eye view images; a determination portion which determines, by use of a restraint condition for a ground feature point located on the ground to satisfy, whether or not a target feature point on the first bird's eye view image is the ground feature point; a movement information estimation portion which estimates movement information of the mobile object between the first and second time points based on positions on the first bird's eye view image, and movement vectors on the bird's eye view coordinate plane, of two or more feature points which are each judged as the 
ground feature point; and a three-dimensional object area estimation portion which estimates the three-dimensional object area based on the first and second bird's eye view images and the movement information.
This makes it possible to detect a ground feature point, and thus desirably accurate estimation of a three-dimensional object area can be expected.
Specifically, for example, the restraint condition may define a relationship which should be satisfied by a rotation angle and a parallel movement amount of the mobile object between the first and second time points and a position on the first bird's eye view image, and a movement vector on the bird's eye view coordinate plane, of the ground feature point.
For further example, the determination portion may extract, as target feature points, two or more feature points from among the n feature points on the first bird's eye view image, and determine whether or not the target feature points are each the ground feature point by determining whether or not the target feature points satisfy the restraint condition.
Further specifically, for example, the determination portion may extract, as target feature points, two or more feature points from among the n feature points on the first bird's eye view image, obtain two or more estimation values of the rotation angle and two or more estimation values of the parallel movement amount by applying the two or more target feature points to the relationship, on an assumption that the two or more target feature points are each the ground feature point, and determine whether or not the target feature points are each the ground feature point, based on a variation among the estimation values of the rotation angle and a variation among the estimation values of the parallel movement amount.
For further example, the movement information may include information which indicates the rotation angle and the parallel movement amount of the mobile object.
Specifically, for example, the three-dimensional object area estimation portion may correct, based on the movement information, displacement between the first and second bird's eye view images attributable to the mobile object moving between the first and second time points, and estimate the three-dimensional object area based on a comparison result between the first and second bird's eye view images after the displacement is corrected.
Specifically, for example, the three-dimensional object area which is estimated may correspond to an area where a three-dimensional object appears in the first camera image, in the second camera image, in the first bird's eye view image, or in the second bird's eye view image.
According to another aspect of the present invention, a vehicle is provided with any one of the above-described operation support systems.
According to another aspect of the present invention, a three-dimensional object area estimation method is a method for estimating, based on a camera image on a camera coordinate plane obtained from a camera fitted to a mobile object to shoot surroundings of the mobile object, a three-dimensional object area in an image based on the camera image. Here, the three-dimensional object area estimation method comprises: an image acquisition step for acquiring first and second camera images shot by the camera at first and second time points, respectively, while the mobile object is moving, the first and second time points being different from each other; a movement vector detection step for extracting n feature points (where n is an integer of 2 or more) from the first camera image, and detecting movement vectors of the feature points on the camera coordinate plane between the first and second camera images; a bird's eye conversion step for projecting the camera images, and the feature points and the movement vectors on the camera coordinate plane onto a bird's eye view coordinate plane which is parallel to ground, to thereby convert the first and second camera images into first and second bird's eye view images, respectively, and detect positions of the feature points on the first bird's eye view image and movement vectors of the feature points on the bird's eye view coordinate plane between the first and second bird's eye view images; a determination step for determining, by use of a restraint condition for a ground feature point located on the ground to satisfy, whether or not a target feature point on the first bird's eye view image is the ground feature point; a movement information estimation step for estimating movement information of the mobile object between the first and second time points based on positions on the first bird's eye view image, and movement vectors on the bird's eye view coordinate plane, of two or more feature points which are each judged as the 
ground feature point; and a three-dimensional object area estimation step for estimating the three-dimensional object area based on the first and second bird's eye view images and the movement information.
According to the present invention, it is possible to estimate a three-dimensional object area with desired accuracy based on an image obtained from a camera.
The significance and benefits of the invention will be clear from the following description of its embodiments. It should however be understood that these embodiments are merely examples of how the invention is implemented, and that the meanings of the terms used to describe the invention and its features are not limited to the specific ones in which they are used in the description of the embodiments.
Hereinafter, embodiments of the present invention will be described specifically with reference to the drawings. Among different drawings referred to in the course of description, the same parts are identified by the same reference signs, and in principle no overlapping description of the same parts will be repeated. Prior to the descriptions of practical examples (Examples 1 to 3), the features common to them, or referred to in their descriptions, will be described first.
An image obtained by the shooting performed by the camera 1 is called a camera image. A camera image as represented by the raw output signal of the camera 1 is often under the influence of lens distortion. Accordingly, the image processing device 2 performs lens distortion correction on the raw camera image, and then generates a display image based on the camera image that has undergone the lens distortion correction. In the following description, a camera image refers to one that has undergone lens distortion correction. Depending on the characteristics of the camera 1, however, lens distortion correction may be omitted.
The camera 1 shoots the surroundings of the vehicle 100. The camera 1 is installed on the vehicle 100 so as to have a field of view, in particular, rearward of the vehicle 100. The field of view of the camera 1 covers the road surface located rearward of the vehicle 100. In the following description, it is assumed that the ground lies on the horizontal plane, and that a “height” denotes one relative to the ground. Moreover, in the embodiment under discussion, the ground is synonymous with a road surface.
Used as the camera 1 is a camera employing a solid-state image-sensing device such as a CCD (charge-coupled device) or CMOS (complementary metal oxide semiconductor) image sensor. The image processing device 2 is formed with, for example, an integrated circuit. The display device 3 is formed with a liquid crystal display panel, etc. A display device included in a car navigation system or the like may be shared as the display device 3 in the operation support system. The image processing device 2 may be integrated into a car navigation system as a part thereof. The image processing device 2 and the display device 3 are installed, for example, near the driver's seat in the vehicle 100.
The image processing device 2, by use of coordinate conversion, converts a camera image into an image as seen from the point of view of a virtual camera, and thereby generates a bird's eye view image. The coordinate conversion for generating a bird's eye view image from a camera image is called “bird's eye conversion”.
A plane perpendicular to the direction of the optical axis of the camera 1 is taken as a camera coordinate plane.
On the other hand, a plane parallel to the ground is taken as a bird's eye view coordinate plane.
A bird's eye view image is a result of a camera image, which is defined on the camera coordinate plane, being projected onto the bird's eye view coordinate plane, and the bird's eye conversion for carrying out such projection can be achieved by one of known methods of coordinate conversion. For example, perspective projection conversion may be used, in which case a bird's eye view image can be generated by converting, according to formula (A-1) below, the coordinates (xbu, ybu) of each pixel on a camera image into coordinates (xau, yau) on the bird's eye view image. Here, the symbols f, h, and H represent, respectively, the focal length of the camera 1, the height at which the camera 1 is arranged, and the height at which the above-mentioned virtual camera is arranged. It is here assumed that the image processing device 2 previously knows the values of f, h, H, and θA.
In practice, beforehand, according to formula (A-1), a table is created which shows the correspondence between the coordinates (xbu, ybu) of each pixel on the camera image and the coordinates (xau, yau) of each pixel on the bird's eye view image, and the table data is stored in an unillustrated memory to form a lookup table (hereinafter referred to as the “bird's eye conversion LUT”). In actual operation, by use of the bird's eye conversion LUT, a camera image is converted into a bird's eye view image. Needless to say, a bird's eye view image may instead be generated by performing coordinate conversion calculation based on formula (A-1) each time a camera image is obtained.
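By way of illustration, the lookup-table scheme can be sketched as follows. This is a sketch under assumptions, not the original disclosure: a generic 3×3 homography `h_matrix` stands in for formula (A-1), whose actual coefficients depend on f, h, H, and θA, and nearest-neighbour sampling is used for brevity.

```python
import numpy as np

def build_birdseye_lut(h_matrix, src_shape, dst_shape):
    """Precompute, for every pixel (x_au, y_au) of the bird's eye view
    image, the source pixel (x_bu, y_bu) of the camera image.
    h_matrix: 3x3 homography standing in for formula (A-1)."""
    ys, xs = np.mgrid[0:dst_shape[0], 0:dst_shape[1]]
    ones = np.ones_like(xs)
    pts = np.stack([xs, ys, ones]).reshape(3, -1).astype(float)
    src = h_matrix @ pts
    src /= src[2]                       # perspective divide
    x_bu = np.clip(np.round(src[0]).astype(int), 0, src_shape[1] - 1)
    y_bu = np.clip(np.round(src[1]).astype(int), 0, src_shape[0] - 1)
    return y_bu.reshape(dst_shape), x_bu.reshape(dst_shape)

def apply_lut(camera_img, lut):
    """Convert a camera image into a bird's eye view image by a single
    gather through the precomputed table (the 'bird's eye conversion LUT')."""
    y_bu, x_bu = lut
    return camera_img[y_bu, x_bu]
```

The table is built once; per frame, the conversion reduces to one indexed read per output pixel, which is the point of the LUT approach described above.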
The operation and structure of the operation support system will now be described in detail by way of practical examples.
First, Example 1 will be described.
The image processing device 2 is provided with a function of estimating a three-dimensional object area within an image. A three-dimensional object area denotes an area in which a three-dimensional object appears. A three-dimensional object is an object with height, such as a person. Any object without height, such as a road surface forming the ground, is not a three-dimensional object. A three-dimensional object can be an obstacle to the traveling of the vehicle 100.
In bird's eye conversion, coordinate conversion is so performed that the bird's eye view image has continuity on the ground surface. Accordingly, when two bird's eye view images are obtained by shooting a single three-dimensional object from different viewpoints, in principle, whereas the image of the road surface coincides between the two bird's eye view images, the image of the three-dimensional object does not (see, for example, JP-A-2006-268076). This characteristic is utilized in this example to estimate a three-dimensional object area.
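The characteristic just described suggests a simple comparison once viewpoint displacement has been corrected: road-surface pixels coincide between the two bird's eye view images while pixels belonging to a three-dimensional object do not. A minimal difference-and-threshold sketch of that idea (an illustration only, not the exact processing of this example):

```python
import numpy as np

def estimate_3d_area(bird1, bird2, threshold=0.1):
    """Given two displacement-corrected bird's eye view images, mark the
    pixels where they disagree; on the road surface the images coincide,
    so surviving pixels are candidates for the three-dimensional object area."""
    diff = np.abs(bird1.astype(float) - bird2.astype(float))
    return diff > threshold             # boolean mask of candidate pixels
```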
The procedure by which the image processing device 2 estimates a three-dimensional object area will now be described with reference to a flow chart.
Estimating a three-dimensional object area requires a plurality of camera images shot at different time points. Accordingly, in step S11, the image processing device 2 acquires a plurality of camera images shot at different time points. Here, it is assumed that the thus acquired camera images include one shot at time point t1 (hereinafter called the camera image at time point t1) and one shot at time point t2 (hereinafter called the camera image at time point t2). The camera images at time points t1 and t2 will now be referred to as the camera images I1 and I2, respectively. It is also assumed that time point t1 comes before time point t2. More precisely, for example, time point t1 is the midpoint of the exposure period of the camera image I1, and time point t2 is the midpoint of the exposure period of the camera image I2. It is further assumed that, during the period between time points t1 and t2, the vehicle 100 moves. Accordingly, the viewpoint of the camera 1 differs between time points t1 and t2.
After the camera images I1 and I2 are acquired, in step S12, a plurality of (for example, a thousand) feature points are extracted from the camera image I1. A feature point is a point which is distinguishable from surrounding points and easy to track. Such a feature point can be extracted automatically by use of a well-known feature point extractor (unillustrated) which detects a pixel where density greatly varies in the horizontal and vertical directions. Examples of such a feature point extractor include the Harris corner detector and the SUSAN corner detector. Expected examples of feature points to be extracted include an intersection or an end point of white lines drawn on a road surface, a stain or a crack on a road surface, and an end portion of, or a stain on, a three-dimensional object.
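As an illustration of the kind of corner detection mentioned above, the following is a minimal Harris-response sketch. It is not the detector of the original text: a 3×3 box window replaces the Gaussian window of a real detector, and `top_features` is a hypothetical helper name.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Minimal Harris corner response for a 2-D grayscale array: high
    where intensity varies strongly in both image directions."""
    iy, ix = np.gradient(img.astype(float))
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box(a):  # 3x3 box smoothing of the structure tensor entries
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace      # Harris corner measure

def top_features(img, n=100):
    """Return the (row, col) positions of the n strongest responses."""
    r = harris_response(img)
    idx = np.argsort(r.ravel())[::-1][:n]
    return np.stack(np.unravel_index(idx, img.shape), axis=1)
```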
In step S13 following step S12, the camera image I1 and the camera image I2 are compared with each other to obtain the movement vectors of the feature points extracted in step S12. The movement vectors obtained here are each a movement vector, on the camera coordinate plane, between the camera image I1 and the camera image I2 (in other words, between time points t1 and t2). The movement vector of a feature point between two images shows in which direction, and by how much, the feature point has moved between the two images. Here, the movement vectors are obtained by use of the well-known hierarchical (pyramidal) Lucas-Kanade algorithm, which is capable of dealing with a large amount of movement. Needless to say, a block matching method or a gradient method may instead be used to obtain the movement vectors. A movement vector is also generally called an optical flow vector or a motion vector.
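The core of the Lucas-Kanade step can be sketched as a single-window, single-level solve of the 2×2 normal equations; the hierarchical variant of the text repeats this coarse-to-fine over an image pyramid, which this sketch omits.

```python
import numpy as np

def lk_flow(img1, img2, pt, win=7):
    """Single-level Lucas-Kanade sketch: estimate the displacement of the
    window centred at pt=(row, col) by solving A [dx dy]^T = b, where A is
    the structure tensor and b collects gradient-times-temporal-difference."""
    r, c = pt
    h = win // 2
    iy, ix = np.gradient(img1.astype(float))
    it = img2.astype(float) - img1.astype(float)
    sl = (slice(r - h, r + h + 1), slice(c - h, c + h + 1))
    gx, gy, gt = ix[sl].ravel(), iy[sl].ravel(), it[sl].ravel()
    a = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])
    b = -np.array([gx @ gt, gy @ gt])
    dx, dy = np.linalg.solve(a, b)      # movement vector, x and y components
    return dx, dy
```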
In step S14 following step S13, the camera images acquired in step S11 are converted into bird's eye view images according to the bird's eye conversion LUT based on formula (A-1) above. The bird's eye view images based on the camera images I1 and I2 are called the bird's eye view images at time points t1 and t2, respectively, and the bird's eye view images at time points t1 and t2 will now be referred to as the bird's eye view images TI1 and TI2, respectively. The bird's eye view images TI1 and TI2 correspond to images resulting from projecting the camera images I1 and I2, respectively, onto the bird's eye view coordinate plane.
In step S15 following step S14, the feature points extracted from the camera image I1 in step S12 and the movement vectors calculated in step S13 are mapped (in other words, projected) onto the bird's eye view coordinate plane. This mapping is also performed according to the bird's eye conversion LUT based on the above-described formula (A-1) (or according to formula (A-1) itself). By this mapping, the feature points on the camera image I1 are mapped onto the bird's eye view image TI1, and thereby the positions of the feature points (that is, the coordinate values (xau, yau) of the feature points) on the bird's eye view image TI1 are obtained; also, the movement vectors on the camera coordinate plane are mapped onto the bird's eye view coordinate plane, and thereby the movement vectors of the feature points on the bird's eye view coordinate plane are obtained. Needless to say, the movement vectors obtained here are each a movement vector between the bird's eye view image TI1 and the bird's eye view image TI2 (in other words, between time points t1 and t2).
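The mapping of step S15 can be sketched as follows. Since a projective mapping does not preserve vectors, a movement vector is mapped by projecting both of its end points; a generic 3×3 homography again stands in for formula (A-1).

```python
import numpy as np

def to_birdseye(h_matrix, p):
    """Project a point on the camera coordinate plane onto the bird's eye
    view coordinate plane (h_matrix stands in for formula (A-1))."""
    v = h_matrix @ np.array([p[0], p[1], 1.0])
    return v[:2] / v[2]

def map_feature(h_matrix, p, vec):
    """Map a feature point and its movement vector onto the bird's eye view
    coordinate plane: the point directly, the vector via its two end points."""
    q1 = to_birdseye(h_matrix, p)
    q2 = to_birdseye(h_matrix, (p[0] + vec[0], p[1] + vec[1]))
    return q1, q2 - q1
```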
[Figure description omitted: images 210 and 210a, together with FIGS. 10(a) and 10(b), illustrate examples of the feature points and their movement vectors on the camera image and on the bird's eye view image; in these examples, the movement vectors of feature points located on the ground surface are uniform on the bird's eye view coordinate plane, whereas those of feature points located on a three-dimensional object are not.]
As mentioned above, if two given target feature points are located on the ground surface, the movement vectors of the two feature points on the bird's eye view coordinate plane are uniform. However, if the feature points are located on a three-dimensional object, such uniformity between the movement vectors is, in principle, broken. In step S16 following step S15, this characteristic is used to extract feature points located on the ground surface. Hereinafter, a feature point located on the ground surface will be referred to as a ground feature point, and a feature point located on a three-dimensional object will be referred to as a three-dimensional object feature point. In the real space, ground feature points are located at zero height (or practically zero height), and feature points which are not classified as ground feature points are all three-dimensional object feature points.
The principle of the processing performed in step S16 will now be described. Note that, in the description below, a movement vector means a movement vector on the bird's eye view coordinate plane between time points t1 and t2, unless otherwise stated.
Now, attention is focused on a target ground feature point, and the coordinate values (xau, yau) of the target ground feature point on the bird's eye view images TI1 and TI2 are represented by (x1, y1) and (x2, y2), respectively. The movement vector of the target ground feature point is represented by (fx, fy), where fx and fy are a horizontal component (that is to say, Xau axis component) and a vertical component (that is to say, Yau axis component), respectively, of the movement vector.
Furthermore, it is assumed that the vehicle 100 is moving while making a turn between time points t1 and t2. The rotation angle of the vehicle 100 (and hence of the camera 1) between time points t1 and t2 is represented by θ, and the parallel movement amount of the vehicle 100 on the bird's eye view coordinate plane between time points t1 and t2 is represented by (Tx, Ty).
On the other hand, a three-dimensional orthogonal coordinate system having its origin set at the optical center of the camera 1 will be considered.
The above-described rotation angle θ and the parallel movement amount (Tx, Ty) satisfy the relationship represented by the following formula (B-2).
The camera images I1 and I2 are normally two temporally adjacent frames which are serially acquired. Thus, when the vehicle 100 moves at a low speed or when the frame rate is sufficiently high, it is possible to consider that cos θ≈1 and sin θ≈θ. By applying these approximate values to formula (B-2), the following formula (B-3) is obtained.
Furthermore, substituting the above formula (B-1) into formula (B-3) and modifying the resulting formula gives formula (B-4).
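The bodies of formulae (B-1) through (B-4) are not reproduced in this text. The following is a standard reconstruction consistent with the surrounding derivation, in particular with the statement that fx and fy are expressed through the values y1 and (−x1); the sign convention of θ is an assumption.

```latex
\begin{aligned}
&\text{(B-1)}\quad f_x = x_2 - x_1, \qquad f_y = y_2 - y_1\\[2pt]
&\text{(B-2)}\quad x_2 = x_1\cos\theta + y_1\sin\theta + T_x, \qquad
                   y_2 = -x_1\sin\theta + y_1\cos\theta + T_y\\[2pt]
&\text{(B-3)}\quad x_2 \approx x_1 + \theta\,y_1 + T_x, \qquad
                   y_2 \approx -\theta\,x_1 + y_1 + T_y
                   \quad(\cos\theta\approx 1,\ \sin\theta\approx\theta)\\[2pt]
&\text{(B-4)}\quad f_x = T_x + \theta\,y_1, \qquad
                   f_y = T_y + \theta\,(-x_1)
\end{aligned}
```

Under this reconstruction, each term of (B-4) follows by substituting (B-1) into (B-3), matching the derivation described in the text.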
Here, the values fx, fy, y1, and (−x1) are obtained from the result of the processing performed in step S15.
Thus, attention will be focused on two different ground feature points on the bird's eye view image TI1, and the two ground feature points will be referred to as the first and second ground feature points. Assume that the coordinate values (xau, yau) of the first and second ground feature points on the bird's eye view image TI1 are (x11, y11) and (x12, y12), respectively. In addition, assume that the movement vectors of the first and second ground feature points are represented by (fx1, fy1) and (fx2, fy2), respectively. Then, the following formulae (B-5a) and (B-5b) are obtained from the above formula (B-4). Furthermore, formula (B-6) is obtained from a difference between formulae (B-5a) and (B-5b), and moreover, formulae (B-7a) and (B-7b) are obtained from formula (B-6).
Two values of the rotation angle θ are calculated by use of formulae (B-7a) and (B-7b), and two values of the parallel movement amount (Tx, Ty) are calculated by substituting the two calculated values of the rotation angle θ respectively into formulae (B-5a) and (B-5b). If the values (x11, y11) and (x12, y12) and the values (fx1, fy1) and (fx2, fy2) are truly related to the ground feature points, the two calculated values of the rotation angle θ are completely or substantially equal to each other, and simultaneously, the two calculated values of the parallel movement amount (Tx, Ty) are completely or substantially equal to each other. Thus, by checking the agreement of the values with respect to given two feature points on the bird's eye view image TI1, it is possible to determine whether or not the two feature points are ground feature points.
The specific procedure of the processing performed in step S16 will now be described.
First, in step S31, two feature points are chosen from among the plurality of feature points formed on the bird's eye view image TI1 by the mapping in step S15.
The two chosen feature points are taken as the target feature points. The coordinate values of the two target feature points on the bird's eye view image TI1 are represented by (xL1, yL1) and (xL2, yL2), and the movement vectors of the two target feature points on the bird's eye view coordinate plane are represented by VEC1 = (fLx1, fLy1) and VEC2 = (fLx2, fLy2), respectively.
In the following step S32, it is judged whether or not the movement vectors VEC1 and VEC2 of the two target feature points are similar to each other. The similarity of the movement vectors is assessed in terms of both magnitude and direction. The magnitudes of the vectors VEC1 and VEC2 are denoted by |VEC1| and |VEC2|, respectively. If VEC1 and VEC2 are judged to be similar in both magnitude and direction, the process proceeds to step S33; otherwise, the process returns to step S31, where two feature points are chosen anew.
In step S33, the feature point information of the target feature points is substituted into formulae (B-7a) and (B-7b). Here, the feature point information of the target feature points is, for example, information representing the coordinate values (for example, (xL1, yL1)) and the movement vectors of the target feature points. That is, in step S33, xL1, yL1, xL2, yL2, fLx1, fLy1, fLx2, and fLy2 are substituted for x11, y11, x12, y12, fx1, fy1, fx2, and fy2, respectively, in formulae (B-7a) and (B-7b). Then, θ obtained from formula (B-7a) and θ obtained from formula (B-7b) as a result of this substitution are denoted by θ1 and θ2, respectively. θ1 and θ2 can be called estimated values of the rotation angle θ obtained on the assumption that the two target feature points are ground feature points.
In step S34 following step S33, it is judged whether or not the absolute value Δθ (= |θ1 − θ2|) of the difference between θ1 and θ2 is larger than a predetermined positive reference angle θTH; if the relationship Δθ > θTH holds, it is presumed that at least one of the two target feature points is not a ground feature point, and the process returns to step S31, where different feature points are chosen. On the other hand, if the relationship Δθ > θTH does not hold, the process proceeds to step S35, where θ1 and θ2 are substituted for θ in formulae (B-5a) and (B-5b), respectively, to obtain the parallel movement amount (Tx, Ty). In doing so, the feature point information of the target feature points is substituted into formulae (B-5a) and (B-5b). That is, xL1, yL1, xL2, yL2, fLx1, fLy1, fLx2, and fLy2 are substituted for x11, y11, x12, y12, fx1, fy1, fx2, and fy2, respectively, in formulae (B-5a) and (B-5b). The (Tx, Ty) obtained from formula (B-5a) is represented by (Tx1, Ty1), and the (Tx, Ty) obtained from formula (B-5b) is represented by (Tx2, Ty2). (Tx1, Ty1) and (Tx2, Ty2) can be called estimated values of the parallel movement amount (Tx, Ty) obtained on the assumption that the two target feature points are ground feature points.
Thereafter, in step S36, ΔL = (Tx1 − Tx2)² + (Ty1 − Ty2)² is calculated, and it is judged whether or not ΔL is larger than a predetermined positive threshold value LTH. If the relationship ΔL > LTH holds, it is presumed that at least one of the two target feature points is not a ground feature point, and the process returns to step S31, where different feature points are chosen. On the other hand, if the relationship ΔL > LTH does not hold, the process proceeds to step S37, where it is determined that the two currently-chosen target feature points are ground feature points.
When two ground feature points are detected through step S37, the process ends, and the vehicle movement information, which indicates the rotation angle θ and the parallel movement amount (Tx, Ty), is generated based on θ1, θ2, (Tx1, Ty1), and (Tx2, Ty2) obtained for the detected ground feature points.
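The loop of steps S31 through S37 amounts to repeatedly sampling pairs of feature points until one pair passes the agreement tests, in the manner of a consensus search. A sketch, where `check` stands for a predicate implementing the tests of steps S32 through S36 (a hypothetical callable, not a name from the original text):

```python
import random

def find_ground_pair(points, vectors, check, max_trials=200):
    """Repeatedly choose two feature points (step S31) and accept them as
    ground feature points when `check` passes (steps S32-S36 -> step S37).
    Returns the indices of the accepted pair, or None if none is found."""
    idx = list(range(len(points)))
    for _ in range(max_trials):
        i, j = random.sample(idx, 2)
        if check(points[i], vectors[i], points[j], vectors[j]):
            return i, j                 # step S37: ground feature points found
    return None
```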
With the method described above, two feature points are chosen as the target feature points, but three or more feature points may be chosen as the target feature points. For example, in a case in which four feature points are chosen as the target feature points in step S31, the four target feature points are divided into first and second groups each consisting of two target feature points; estimated values θ1 and θ2 of the rotation angle are obtained by applying the feature point information of the first group to formulae (B-7a) and (B-7b), and estimated values θ3 and θ4 are likewise obtained by applying the feature point information of the second group to formulae (B-7a) and (B-7b). Then, according to formula (C-1), Δθ1-4 is calculated as a total amount of the differences among the estimated values θi and θj with respect to (i, j) = (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4).
If Δθ1-4 is larger than a predetermined positive threshold value, the process returns to step S31, where feature points are chosen anew. If not, on the other hand, the parallel movement amount (Tx, Ty) is calculated with attention focused on each of the first and second groups. First, with attention focused on the first group, the feature point information of the target feature points of the first group is substituted into formulae (B-5a) and (B-5b), and simultaneously, θ1 and θ2 are substituted for θ of formula (B-5a) and θ of formula (B-5b), respectively, to thereby obtain parallel movement amounts (Tx, Ty). The (Tx, Ty) obtained from formula (B-5a) is represented by (Tx1, Ty1), and the (Tx, Ty) obtained from formula (B-5b) is represented by (Tx2, Ty2). Next, with attention focused on the second group, the feature point information of the target feature points of the second group is substituted into formulae (B-5a) and (B-5b), and simultaneously, θ3 and θ4 are substituted for θ of formula (B-5a) and θ of formula (B-5b), respectively, to thereby obtain parallel movement amounts (Tx, Ty). The (Tx, Ty) obtained from formula (B-5a) is represented by (Tx3, Ty3), and the (Tx, Ty) obtained from formula (B-5b) is represented by (Tx4, Ty4). Then, according to formula (C-2) below, ΔL1-4 is calculated. That is, ΔL1-4 is obtained as a total amount of {(Txi−Txj)2+(Tyi−Tyj)2} with respect to (i, j)=(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4).
If ΔL1-4 is larger than the predetermined positive threshold value, the process returns to step S31, where feature points are chosen anew. If not, on the other hand, the process proceeds to step S37, where it is determined that the four target feature points are ground feature points, and by use of θi, Txi and Tyi (here, i=1, 2, 3, 4) based on the ground feature point information of the four ground feature points, the vehicle movement information is generated according to θ=(θ1+θ2+θ3+θ4)/4, Tx=(Tx1+Tx2+Tx3+Tx4)/4 and Ty=(Ty1+Ty2+Ty3+Ty4)/4.
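The averaging that produces the vehicle movement information generalizes to any number of accepted estimates; a minimal sketch:

```python
def vehicle_motion_from_estimates(thetas, ts):
    """Combine per-ground-feature-point estimates into the vehicle movement
    information by averaging, as in theta = (theta1 + ... + theta4) / 4,
    Tx = (Tx1 + ... + Tx4) / 4, Ty = (Ty1 + ... + Ty4) / 4."""
    n = len(thetas)
    theta = sum(thetas) / n
    tx = sum(t[0] for t in ts) / n
    ty = sum(t[1] for t in ts) / n
    return theta, (tx, ty)
```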
A detailed discussion will now be given of the method of extracting ground feature points described above. The above-described formulae (B-5a), (B-5b), (B-7a), and (B-7b) are restraint formulae prescribing the relationship that the rotation angle θ and the parallel movement amount (Tx, Ty), the coordinate values (x11, y11) and (x12, y12) of the ground feature points, and the movement vectors (fx1, fy1) and (fx2, fy2) should satisfy. In other words, these formulae represent a restraint condition which ground feature points should satisfy. With the above-described method, two or more feature points are extracted, from among a group of feature points on the bird's eye view image TI1, as target feature points, and then it is judged whether or not the two or more target feature points (hereinafter also collectively referred to as the target-feature-point group) satisfy the above restraint condition. Only when the restraint condition is satisfied is it determined that the target feature points are ground feature points.
In practice, by applying the feature point information of the target feature points to the restraint formulae on the assumption that the target feature points are ground feature points, two or more estimated values (such as θ1 and θ2) of the rotation angle are obtained, and simultaneously two or more estimated values (such as (Tx1, Ty1) and (Tx2, Ty2)) of the parallel movement amount are obtained. Then, an indicator (such as Δθ and Δθ1-4 described above) indicating a variation among the estimated values of the rotation angle and an indicator (such as ΔL and ΔL1-4 described above) indicating a variation among the estimated values of the parallel movement amount are calculated, and based on the degrees of the variations, it is judged whether or not the restraint condition is satisfied. Only when the variations among the estimated values of both the rotation angle and the parallel movement amount are comparatively small, it is determined that the restraint condition is satisfied, and the process proceeds to step S37 in
Incidentally, in the above method, ground feature points are extracted and vehicle movement information is generated on the assumption that the vehicle 100 is rotating (that is, turning) while moving. When the vehicle 100 is moving straight, a rotation angle θ of zero degrees is accordingly obtained. The straight moving state can be taken as a rotation state at the rotation angle θ of zero degrees.
Refer to
More specifically, geometric conversion by use of the rotation angle θ and the parallel movement amount (Tx, Ty) is applied to the bird's eye view image TI1 to generate a reference image TS1. This geometric conversion is performed according to the following formula (D-1), which corresponds to formula (B-3) described above. Pixels located at coordinate values (xau, yau) on the bird's eye view image TI1 are converted by the geometric conversion to pixels located at coordinate values (xau′, yau′), and the pixels resulting from the conversion form the reference image TS1. The reference image TS1 corresponds to an image resulting from rotating the bird's eye view image TI1 by the rotation angle θ and also parallelly moving it by the parallel movement amount (Tx, Ty) on the bird's eye view coordinate plane (in practice, the approximate value of θ≈0 is used).
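Since formula (D-1) itself is not reproduced in this excerpt, the sketch below assumes the standard planar rigid-motion form — rotation by θ followed by parallel movement by (Tx, Ty) — applied to one coordinate pair:

```python
import math


def transform_point(x, y, theta, tx, ty):
    """Map coordinate values (x, y) on the bird's eye view image TI1 to
    (x', y') on the reference image TS1: rotation by theta followed by
    parallel movement by (Tx, Ty). The sign and ordering convention of
    formula (D-1) is assumed here, not taken from the source."""
    xp = math.cos(theta) * x - math.sin(theta) * y + tx
    yp = math.sin(theta) * x + math.cos(theta) * y + ty
    return xp, yp
```

Applying this mapping to every pixel of TI1 (with θ and (Tx, Ty) from the vehicle movement information) yields the reference image TS1.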
Images 401, 402, 403, and 404 of
For example, the differential image DI can be generated by commonly used frame subtraction. That is, a difference value between the pixel values of each pair of pixels located at the same coordinate values on the reference image TS1 and the bird's eye view image TI2 is obtained, and an image having those difference values as the pixel values of its pixels is the differential image DI. In
In step S18, furthermore, each pixel value of the differential image DI is binarized to generate a binarized differential image. Specifically, the pixel value (that is, the above-described difference value) of each pixel of the differential image DI is compared with a predetermined threshold value, and a pixel value of 1 is given to a pixel whose pixel value is larger than the threshold value (hereinafter, such a pixel will be referred to as distinctive pixel), while a pixel value of 0 is given to a pixel whose pixel value is not larger than the threshold value (hereinafter, such a pixel will be referred to as non-distinctive pixel). The image 420 of
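The frame subtraction and binarization of step S18 can be sketched as follows, with images represented as nested lists of gray-level pixel values (a simplification of the actual pixel format):

```python
def differential_image(ref, cur):
    """Frame subtraction: absolute difference between the pixel values of the
    reference image TS1 (ref) and the bird's eye view image TI2 (cur) at the
    same coordinate values."""
    return [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(ref, cur)]


def binarize(diff, thresh):
    """Give a pixel value of 1 to distinctive pixels (difference value larger
    than the threshold) and 0 to non-distinctive pixels."""
    return [[1 if v > thresh else 0 for v in row] for row in diff]
```

The result of `binarize` corresponds to the binarized differential image from which the three-dimensional object area is extracted.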
The position and size of the thus extracted three-dimensional object area are treated as the position and size of the three-dimensional object area on the bird's eye view image TI2. The area outside the three-dimensional object area is estimated as a ground area in which an object without height, such as the road surface, appears. Then, for example, as shown in
It is also possible to estimate the position and size of the three-dimensional object area on the bird's eye view image TI1, the camera image I1, or the camera image I2 based on the position and size of the three-dimensional object area on the bird's eye view image TI2. Application, to the three-dimensional object area on the bird's eye view image TI2, of the inverse conversion of the geometric conversion used to obtain the reference image TS1 from the bird's eye view image TI1 determines the position and size of the three-dimensional object area on the bird's eye view image TI1. Application, to the three-dimensional object areas on the bird's eye view images TI1 and TI2, of the inverse conversion of the geometric conversion (the bird's eye conversion described above) used to obtain the bird's eye view images TI1 and TI2 from the camera images I1 and I2 determines the positions and sizes of the three-dimensional object areas on the camera images I1 and I2.
According to the above-discussed example, a ground feature point is accurately extracted by simple operational processing, and this makes it possible to accurately estimate vehicle movement information and a three-dimensional object area with a low operational load. Accurate identification of a three-dimensional object area leads to desirable operation support.
Example 2 will be described next. In Example 1, the differential image DI is generated by obtaining the difference in pixel value between the reference image TS1 and the bird's eye view image TI2 with respect to each pixel. This method, however, is prone to be negatively affected by local noise. In Example 2, a differential image generating method and a three-dimensional object area estimation method less prone to be negatively affected by local noise will be discussed. Example 2 corresponds to an example resulting from partially modifying Example 1, and, unless inconsistent, any feature in Example 1 is applicable to Example 2. The operation performed until the bird's eye view images TI1 and TI2 and the reference image TS1 are obtained via the processing in steps S11 to S17 and part of the processing in step S18 in
In Example 2, the bird's eye view image TI2 and the reference image TS1 are each treated as an operation target image (image with respect to which an operation is performed). And, as shown in
A small block at a block position (m, n) in the bird's eye view image TI2 and a small block at the block position (m, n) in the reference image TS1 are made to correspond to each other. When the bird's eye view image TI2 and the reference image TS1 are superimposed on the same bird's eye view coordinate plane, some areas at their edges do not overlap with each other (see
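A minimal sketch of the block division, assuming the operation target images have already been cropped to their overlapping area and that the image dimensions are multiples of the block size (the block dimensions are illustrative):

```python
def split_into_blocks(img, bh, bw):
    """Partition an image (nested lists, H x W with H and W assumed to be
    multiples of bh and bw) into small blocks. Returns a dict mapping each
    block position (m, n) to its bh x bw sub-image, so that blocks at the
    same (m, n) in TI2 and TS1 can be made to correspond to each other."""
    h, w = len(img), len(img[0])
    blocks = {}
    for m in range(h // bh):
        for n in range(w // bw):
            blocks[(m, n)] = [row[n * bw:(n + 1) * bw]
                              for row in img[m * bh:(m + 1) * bh]]
    return blocks
```

Running this on both operation target images with the same block size yields the paired small blocks compared in the generation methods below.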
After setting the small blocks in the above-described manner, a differential image is generated in the following manner. As examples of the method of generating the differential image, first to third generation methods will now be described one by one.
[First Generation Method]
A first generation method will be described. In the first generation method, a color space histogram is obtained for each small block. Then, the color space histograms are compared between the bird's eye view image TI2 and the reference image TS1 to thereby calculate a difference degree ε1. For example, first to Qth divisions are provided in an RGB color space by dividing the RGB color space into Q pieces, and which division each pixel is to belong to is determined by mapping each pixel onto the RGB color space based on its color information (Q is an integer of 2 or more). The color space histograms may be obtained based on a color space other than the RGB color space (for example, an HSV color space). The difference degree ε1 is calculated for each block position, but here the calculation method will be described with respect to a target block position on which attention is focused.
a) shows a color space histogram hA of a small block at the target block position in the bird's eye view image TI2.
Such a difference degree ε1 is obtained with respect to each block position, block positions where the difference degree ε1 is larger than a predetermined positive threshold value are identified, and small blocks at the identified block positions in the bird's eye view image TI2 are set as component blocks. Small blocks in the bird's eye view image TI2 other than the component blocks are called non-component blocks. Then, by giving each pixel in the component blocks a pixel value of 1 and each pixel in the non-component blocks a pixel value of 0, a differential image as a binarized image is obtained. An example of the thus obtained differential image is shown in
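A sketch of the first generation method. Two assumptions are made where the excerpt gives no detail: the Q divisions are formed by splitting each RGB channel into q ranges (so Q = q³ in total), and the difference degree ε1 is taken as the total absolute bin-count difference between the two histograms — one plausible comparison, not necessarily the patent's formula:

```python
def rgb_histogram(block, q):
    """Color space histogram of a small block: each pixel (an (r, g, b)
    triple, 0-255 per channel) is classified into one of q**3 divisions of
    the RGB color space, and the occupancy of each division is counted."""
    hist = [0] * (q ** 3)
    for row in block:
        for (r, g, b) in row:
            idx = (r * q // 256) * q * q + (g * q // 256) * q + (b * q // 256)
            hist[idx] += 1
    return hist


def difference_degree(ha, hb):
    """Assumed form of epsilon-1: total absolute difference between
    corresponding bins of the two color space histograms."""
    return sum(abs(a - b) for a, b in zip(ha, hb))
```

Comparing the histograms of the small blocks at the same block position in TI2 and TS1 with `difference_degree` then gives ε1 for that block position.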
[Second Generation Method]
A second generation method will be described. In the second generation method, an edge intensity histogram is obtained for each small block. Then, the edge intensity histograms are compared between the bird's eye view image TI2 and the reference image TS1 to thereby calculate a difference degree ε2.
Specifically, by applying edge extraction processing to each pixel in the bird's eye view image TI2 and the reference image TS1 by use of any edge extraction filter such as a Laplacian filter, a first edge extraction image based on the bird's eye view image TI2 and a second edge extraction image based on the reference image TS1 are generated. As is publicly known, pixel values of pixels forming an edge extraction image indicate edge intensity. First to Qth divisions are provided which are different from each other in edge intensity, and the pixels in the edge extraction images are each classified into one of the first to Qth divisions (Q is an integer of 2 or more) according to their pixel values (that is, edge intensity).
The difference degree ε2 is calculated for each block position, but here, the calculation method will be described with respect to a target block position on which attention is focused.
Such a difference degree ε2 is obtained for each block position, block positions where the difference degree ε2 is larger than a predetermined positive threshold value are identified, and small blocks at the identified block positions in the bird's eye view image TI2 are set as component blocks. Small blocks in the bird's eye view image TI2 other than the component blocks are called non-component blocks. Then, by giving each pixel in the component blocks a pixel value of 1 and each pixel in the non-component blocks a pixel value of 0, a differential image as a binarized image is obtained.
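A sketch of the edge-intensity classification used in the second generation method, assuming a 4-neighbour Laplacian as the edge extraction filter and Q equal-width intensity divisions (ε2 itself would then be computed from the two histograms as in the first method):

```python
def laplacian(img):
    """Edge extraction: absolute 4-neighbour Laplacian response at each
    interior pixel of a gray-level image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                            + img[y][x + 1] - 4 * img[y][x])
    return out


def intensity_histogram(edge_img, q, max_val):
    """Classify each pixel of an edge extraction image into one of q
    equal-width edge-intensity divisions and count occupancy."""
    hist = [0] * q
    for row in edge_img:
        for v in row:
            hist[min(v * q // (max_val + 1), q - 1)] += 1
    return hist
```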
[Third Generation Method]
A third generation method will be described. In the third generation method, an edge direction histogram is obtained for each small block. Then, the edge direction histograms are compared between the bird's eye view image TI2 and the reference image TS1 to thereby calculate a difference degree ε3.
Specifically, by applying edge extraction processing to each pixel in the bird's eye view image TI2 and the reference image TS1 by use of any edge extraction filter such as a Laplacian filter, a large number of edges are extracted from the bird's eye view image TI2 and the reference image TS1, and the edge directions of the extracted edges are detected. An edge is a location in an image where brightness changes sharply, and an edge direction is the direction of that sharp change in brightness. First to Qth divisions are provided which are different from each other in edge direction, and the extracted edges are each classified into one of the first to Qth divisions (Q is an integer of 2 or more) according to their edge directions.
The difference degree ε3 is calculated for each block position, but here, the calculation method will be described with respect to a target block position on which attention is focused.
Such a difference degree ε3 is obtained for each block position, block positions where the difference degree ε3 is larger than a predetermined positive threshold value are identified, and small blocks at the identified block positions in the bird's eye view image TI2 are set as component blocks. Small blocks in the bird's eye view image TI2 other than the component blocks are called non-component blocks. Then, by giving each pixel in the component blocks a pixel value of 1 and each pixel in the non-component blocks a pixel value of 0, a differential image as a binarized image is obtained.
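A Laplacian filter is isotropic and does not by itself yield a direction, so the sketch below instead derives edge directions from central-difference gradients — an assumed substitute for whatever direction detection the patent uses — and quantizes them into Q angular divisions over [0, π):

```python
import math


def direction_histogram(img, q, thresh):
    """Edge direction histogram of a gray-level image: every interior pixel
    whose gradient magnitude exceeds thresh is treated as an extracted edge,
    and its gradient direction (sign ignored, so the range is [0, pi)) is
    classified into one of q angular divisions."""
    h, w = len(img), len(img[0])
    hist = [0] * q
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if gx * gx + gy * gy > thresh * thresh:
                ang = math.atan2(gy, gx) % math.pi
                hist[min(int(ang / math.pi * q), q - 1)] += 1
    return hist
```

As with the other methods, comparing the histograms of corresponding small blocks in TI2 and TS1 gives the difference degree ε3 for each block position.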
[Estimation of Three-Dimensional Object Area]
b) shows the image 403 shown in
However, it is desirable that a three-dimensional object area be finally identified by executing, with each component block regarded as a candidate for a component of a three-dimensional object area, area combining processing for forming a combination area by combining a group of neighboring component blocks, and elimination processing for eliminating component blocks spatially isolated from other component blocks and small-sized combination areas. For example, it is judged whether or not a component block and another component block (or a combination area) are adjacent to each other, and if they are found to be adjacent, they are combined to form a new combination area. This processing is repeated until no new combination is performed. Then, the sizes of the thus obtained combination areas are checked, and any combination area of a predetermined size or smaller, as well as any uncombined component block, is eliminated. A finally remaining combination area, or an area (for example, a rectangular area) surrounding it, is estimated as a three-dimensional object area on the bird's eye view image TI2. As a result, a three-dimensional object area as indicated by the broken line frame 431 in
Next, Example 3 will be described. In Example 3, a description will be given of an example of a functional block diagram of an operation support system corresponding to the practical examples described above.
An image acquisition portion 11 acquires one camera image after another based on an output signal of the camera 1. The image data of each camera image is fed from the image acquisition portion 11 to a movement detection portion (movement vector detection portion) 12 and to a bird's eye conversion portion 13. The movement detection portion 12 executes processing of step S12 and processing of step S13 shown in
<<Modifications and Variations>>
The specific values given in the descriptions above are merely examples, which, needless to say, may be modified to any other values. In connection with the examples described above, modified examples or supplementary explanations applicable to them will be given below in Notes 1 to 5. Unless inconsistent, any part of the contents of these notes may be combined with any other.
[Note 1]
Although a method for obtaining a bird's eye view image from a camera image by perspective projection conversion is described, it is also possible to obtain a bird's eye view image from a camera image, instead, by planar projection conversion. In this case, a homography matrix (planar projection matrix) for converting the coordinates of the individual pixels on a camera image into the coordinates of the individual pixels on a bird's eye view image is determined by camera calibration performed prior to actual use. The homography matrix is determined by a known method. Then, in a case in which the operation shown in
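A sketch of how a homography matrix, once determined by camera calibration, converts the coordinates of an individual pixel on a camera image into coordinates on a bird's eye view image; the matrix values in the test are illustrative only:

```python
def apply_homography(hmat, x, y):
    """Map camera-image coordinates (x, y) to bird's eye view coordinates
    using a 3x3 homography matrix hmat (row-major nested lists), with the
    usual homogeneous-coordinate normalization by the third component."""
    xh = hmat[0][0] * x + hmat[0][1] * y + hmat[0][2]
    yh = hmat[1][0] * x + hmat[1][1] * y + hmat[1][2]
    wh = hmat[2][0] * x + hmat[2][1] * y + hmat[2][2]
    return xh / wh, yh / wh
```

Applying this mapping to every pixel of a camera image yields the bird's eye view image under planar projection conversion.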
[Note 2]
Although the above examples deal with cases where the camera 1 is installed in a rear part of the vehicle 100 so as to have a field of view rearward of the vehicle 100, it is also possible to install the camera 1, instead, in a front or side part of the vehicle 100 so as to have a field of view frontward or sideward of the vehicle 100. Even with the camera 1 so installed, it is possible to perform processing similar to that described above, including processing for estimating a three-dimensional object area.
[Note 3]
In the embodiments described above, a display image based on a camera image obtained from a single camera is displayed on the display device 3. Instead, it is also possible to install a plurality of cameras (not shown) on the vehicle 100 and generate a display image based on a plurality of camera images obtained from the plurality of cameras. For example, it is possible to fit one or more additional cameras to the vehicle 100 in addition to the camera 1. In this case, it is possible to merge images based on camera images obtained from the additional cameras with an image (for example, the image 440 shown in
[Note 4]
In the embodiments described above, an automobile is dealt with as an example of a vehicle. It is, however, also possible to apply the present invention to vehicles that are not classified into automobiles, and even to mobile objects that are not classified into vehicles. For example, a mobile object that is not classified into vehicles has no wheel and moves by use of a mechanism other than a wheel. For example, it is possible to apply the present invention to, as a mobile object, a robot (unillustrated) that moves around inside a factory by remote control.
[Note 5]
The functions of the image processing device 2 shown in
Number | Date | Country | Kind |
---|---|---|---|
2007-300537 | Nov 2007 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2008/067150 | 9/24/2008 | WO | 00 | 5/18/2010 |