The present invention is directed to a system and method for detecting a passing vehicle, and more particularly, to a system and method for detecting a passing vehicle from dynamic background using robust information fusion.
Machine-vision-based obstacle detection and tracking is an important component of autonomous vehicle systems. In a typical driving scene, the obstacles include vehicles, pedestrians and any other objects that are either moving or rising above the road plane. The purpose of obstacle detection is to separate such objects from the driving scene. This information is required by a number of automotive applications, e.g., adaptive cruise control, forward collision avoidance and lane departure warning. By fusing the results of detecting and tracking individual objects, it is possible to achieve sufficient perception of the driving environment.
In a monocular vision system designed for driver assistance, a single camera is mounted inside the ego-vehicle to capture an image sequence of the forward road scene. Various vehicle detection methods have been developed to detect vehicles in the central field of view, and such methods can also be used for passing vehicle detection. In passing vehicle detection, vehicles that pass the ego-vehicle on the left or right and enter the field of view at a higher speed are detected. Passing vehicle detection plays a substantial role in understanding the driving environment. Because an overtaking vehicle can create a potentially unsafe driving situation, it is important to monitor and detect vehicles passing by.
Since passing vehicles need to be detected early, while they are entering the view and are only partially visible, appearance information cannot be fully relied upon. However, a vehicle passing by generates characteristic optical flow. Hence, motion information becomes an important cue in detecting passing vehicles. Several known obstacle detection methods using optical flow have been used to detect passing vehicles. In these methods, a predicted flow field calculated from camera parameters and vehicle velocity is compared with the actual image flow calculated by motion estimation. An obstacle is declared if the actual flow does not match the predicted flow. These methods work well if neither strong noise nor illumination change is present. However, in practical situations, structured noise and strong illumination changes are quite common; they cause spurious image features and unreliable flow estimates. There is a need for a method of detecting passing vehicles that is capable of robust motion estimation.
The present invention is directed to a system and method for detecting a passing vehicle. A video sequence comprising a plurality of image frames is received. Image intensity is measured and image motion is estimated in each image frame. A hypothesis model describing background dynamics is formulated. The measured image intensity and the motion estimates are used to determine whether the background dynamics have been violated in a given image frame. If the background dynamics have been violated, motion coherency is used to determine whether the violation is caused by a passing vehicle.
Preferred embodiments of the present invention will be described below in more detail, wherein like reference numerals indicate like elements, with reference to the accompanying drawings:
The present invention is directed to a system and method for detecting passing vehicles from dynamic background.
In addition, a coherency test is performed to substantiate that the scene change is due to a passing vehicle and not some other condition such as noise, illumination changes or other background movement. Once a passing vehicle is identified at a sufficient confidence level, its presence is signaled via an output device 106. The output device 106 provides an output signal that communicates the presence of the passing vehicle to the user. The output signal may be an audible signal or another type of warning signal. The output device 106 may also include a display for viewing the detected vehicles. The display shows the images taken by the camera 102, enhanced to indicate the vehicles that have been detected and are being tracked. These images can be stored in database 108.
The present invention is directed to detecting the event of a vehicle entering the view and triggering a warning in real time. In particular, a robust motion estimation scheme using variable bandwidth density fusion is used to detect passing vehicles, as described in further detail hereinafter. Vehicle passing is a sporadic event that changes the scene configuration from time to time.
In accordance with the present invention, three issues are addressed in order to detect a passing vehicle: modeling the dynamics of the road scene and vehicle passing, deriving a decision rule for passing vehicle detection and estimating relevant features and statistical quantities involved in hypothesis testing. A high level flow diagram of the method is illustrated in
In the absence of passing vehicles, the visible road scene, i.e., the background, moves consistently in the field of view as the camera moves along with the ego-vehicle. Given the vehicle velocity and camera calibration, the image motion and image intensity of the background scene are predictable over time. In other words, the background scene follows a dynamic model defined by the camera parameters and camera motion. Denote the image intensity at time instance t by I(x, t) and the motion vector by v(x, t), where x is the spatial coordinate of an image pixel. The hypothesis of the dynamic background is described as follows:
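$$H_{\text{road}}:\quad \begin{aligned} I(\mathbf{x} + \mathbf{v}(\mathbf{x}, t)\,\delta t,\; t - \delta t) &= I(\mathbf{x}, t) + n \\ \mathbf{v}(\mathbf{x}, t) &= h(\mathbf{x}, V_0(t), \theta) \end{aligned} \qquad (1)$$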
The true image motion v(x, t) is determined by the vehicle speed V0(t) and the camera parameters θ. Under the brightness constancy condition, image intensity is predictable from previous frames given the true motion. Nevertheless, brightness constancy is frequently violated in practice due to changing illumination. In addition, intensity is also affected by various kinds of image noise. Therefore, a noise term n is adopted to account for the perturbation of intensity. These hypotheses on scene dynamics impose useful domain-specific constraints.
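The following Python sketch illustrates how the road hypothesis of equation (1) could be tested on a pair of frames. The motion model h is not specified in the text, so `predicted_road_flow` is a hypothetical stand-in that assumes a flat road and a camera translating along its optical axis; the focal length `focal`, camera height `cam_height` and principal point `cx`, `cy` play the role of θ. `background_residual` evaluates the noise term n following the sign convention of equation (1) as written, with nearest-neighbour sampling for simplicity.

```python
import numpy as np

def predicted_road_flow(shape, speed, focal, cam_height, cx, cy):
    """Hypothetical h(x, V0, theta): flat road, camera translating along its optical axis."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    x = cols - cx                      # horizontal offset from the principal point
    y = rows - cy                      # vertical offset, positive below the horizon
    below = y > 0                      # only pixels below the horizon can lie on the road
    # Road depth Z = f * H / y, so the image velocity (x, y) * V / Z = (x, y) * y * V / (f * H).
    scale = np.where(below, y * speed / (focal * cam_height), 0.0)
    flow = np.zeros(tuple(shape) + (2,))
    flow[..., 0] = x * scale
    flow[..., 1] = y * scale
    return flow

def background_residual(prev, curr, flow, dt=1.0):
    """Noise term n = I(x + v*dt, t - dt) - I(x, t) from equation (1)."""
    h, w = curr.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup in the previous frame, clipped to the image border.
    src_c = np.clip(np.rint(cols + flow[..., 0] * dt), 0, w - 1).astype(int)
    src_r = np.clip(np.rint(rows + flow[..., 1] * dt), 0, h - 1).astype(int)
    return prev[src_r, src_c] - curr
```

A large residual over the analysis window would indicate that intensity no longer fits the road model; a practical version would use bilinear interpolation and the calibrated camera model in place of these assumptions.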
When a passing vehicle enters the view, the dynamics of the background scene are violated. From equation (1), violations of the background dynamics can be detected through hypothesis testing on image intensity and image motion. However, a violation can also occur under conditions other than vehicle passing, such as strong illumination changes and structured noise. To validate that a violation is indeed caused by a passing vehicle, it is necessary to exploit the domain-specific constraints introduced by passing vehicles as well.
Considering the diversity of vehicle appearance and velocity, the dynamics of passing vehicles are characterized by emphasizing the coherency present in vehicle motion. As illustrated in
In contrast, such coherency is lacking in the case where the violation of scene or background dynamics is a consequence of irregular causes such as sudden illumination changes, structured noise and shadows. Therefore, the hypothesis made on passing vehicles helps to further distinguish events with coherent motion from irregular causes, also referred to as outliers. Referring now to
The event of a vehicle passing is described as a series of state transitions of SASB starting with RR and ending with VV. As shown in
$$H_{\text{vehicle}}:\quad \rho = \{RR \rightarrow VR \rightarrow \cdots \rightarrow VV\} \qquad (2)$$
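As an illustration only, the Python sketch below checks a per-frame sequence of S_A S_B state labels against the form of hypothesis (2). The label strings and the admissible transition set `ADMISSIBLE` are hypothetical; the text states only that the sequence starts with RR, passes through VR and ends with VV, and that the admissible transitions are part of the decision tree solution.

```python
from itertools import groupby

# Hypothetical admissible transitions; in the text these are learned with the decision tree.
ADMISSIBLE = {("RR", "VR"), ("VR", "VV")}

def matches_passing_hypothesis(states, admissible=ADMISSIBLE):
    """Check per-frame S_A S_B labels against H_vehicle: RR -> VR -> ... -> VV."""
    # Collapse runs of identical labels: a state may persist over several frames.
    path = [state for state, _ in groupby(states)]
    if len(path) < 3 or path[0] != "RR" or path[1] != "VR" or path[-1] != "VV":
        return False
    # Every step between consecutive distinct states must be an admissible transition.
    return all((a, b) in admissible for a, b in zip(path, path[1:]))
```

Intermediate states elided by the "…" in (2) would be accepted by adding the corresponding pairs to `ADMISSIBLE`.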
In dealing with passing vehicle detection, different contexts are encountered in the analysis window, e.g., road scenes, outliers and vehicles. Decision trees classify these contexts by sorting them through a series of hypothesis tests represented in tree form.
$$(\text{background dynamics is violated}) \wedge (\text{coherency is satisfied}) \qquad (3)$$
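A minimal sketch of how rule (3) might be arranged as a decision cascade for one analysis window follows. It assumes the background-dynamics test is split into a motion test and a residual test, with placeholder thresholds `tau_motion` and `tau_residual` standing in for τmotion and τresidual, and it takes the outcome of the coherency test as a boolean. The mean absolute residual is an assumed test statistic, and the threshold values are illustrative only.

```python
import numpy as np

def classify_window(v_hat, v_road, residual, coherent,
                    tau_motion=1.0, tau_residual=10.0):
    """Hypothetical decision cascade for one analysis window: 'road', 'vehicle' or 'outlier'."""
    # Motion test: the observed dominant motion agrees with the predicted road motion.
    if np.linalg.norm(np.asarray(v_hat) - np.asarray(v_road)) < tau_motion:
        return "road"
    # Residual test: motion may be unreliable, but intensity still fits equation (1).
    if np.mean(np.abs(residual)) < tau_residual:
        return "road"
    # Background dynamics violated: rule (3) additionally requires coherent motion.
    return "vehicle" if coherent else "outlier"
```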
The true motion v(x, t) of the road scene is given in equation (1). If the observed image motion is estimated, then the hypothesis testing on background dynamics is expressed as follows:
Although further testing is performed to classify instances of violations, it is important to have a reliable motion estimate v(x, t) that faithfully reflects the context, so that the initial classification is accurate. The present invention employs a robust motion estimation algorithm using Variable Bandwidth Density Fusion (VBDF) and spatial-temporal filtering.
When motion estimation is not reliable, the residual test helps to identify background scenes. In some instances, the presence of background cannot be identified by motion but can easily be identified by testing the image residual. The thresholds τmotion and τresidual, as well as the admissible state transitions ρ, are part of the decision tree solution. There are generally two ways to determine them: offline learning and online learning. Online decision tree learning enables the system to adapt to gradual changes in scene dynamics. Taking τresidual as an example, online learning can be achieved by modeling the residual data {R(x, T), R(x, T−1), R(x, T−2), . . . } computed online. Nonparametric density estimation and mode finding techniques can be used to cluster the data, obtain a Gaussian mixture model and update the model over time. The mixture model learned online is then used to predict the context from new observations R(x, T+1).
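The Python sketch below shows one way the online modeling of residual data could be realized. It is an assumption-laden illustration rather than the method of the invention: a sliding window of scalar residuals is clustered by 1-D Gaussian-kernel mean shift (the mode finding step), each cluster becomes one Gaussian component, and a new residual is assigned to the most likely component. The class name, window length and bandwidth are hypothetical.

```python
import numpy as np
from collections import deque

class OnlineResidualModel:
    """Hypothetical online Gaussian mixture over recent residuals R(x, T), R(x, T-1), ..."""

    def __init__(self, window=200, bandwidth=1.0):
        self.data = deque(maxlen=window)   # sliding window of recent residuals
        self.h = bandwidth                 # mean shift kernel bandwidth (assumed)
        self.components = []               # list of (weight, mean, std)

    def _mode(self, x, pts):
        # 1-D mean shift with a Gaussian kernel: climb to a local density maximum.
        for _ in range(100):
            w = np.exp(-0.5 * ((pts - x) / self.h) ** 2)
            new = np.sum(w * pts) / np.sum(w)
            if abs(new - x) < 1e-6:
                break
            x = new
        return x

    def update(self, residual):
        self.data.append(float(residual))
        pts = np.array(self.data)
        modes = np.array([self._mode(p, pts) for p in pts])
        # Points whose trajectories converge to nearby modes form one cluster.
        labels = np.full(len(pts), -1)
        centers = []
        for i, m in enumerate(modes):
            for k, c in enumerate(centers):
                if abs(m - c) < self.h / 2:
                    labels[i] = k
                    break
            else:
                centers.append(m)
                labels[i] = len(centers) - 1
        self.components = []
        for k in range(len(centers)):
            member = pts[labels == k]
            std = max(member.std(), 1e-3)  # floor avoids zero variance for tiny clusters
            self.components.append((len(member) / len(pts), member.mean(), std))

    def predict(self, residual):
        """Index of the component most likely to have produced a new residual (needs one update)."""
        # Weighted Gaussian likelihood, up to the common 1/sqrt(2*pi) factor.
        scores = [w / s * np.exp(-0.5 * ((residual - m) / s) ** 2)
                  for w, m, s in self.components]
        return int(np.argmax(scores))
```

Refitting the whole window on every update costs O(n²) kernel evaluations per mean shift iteration, so an incremental update would be preferable in a real-time system.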
The coherency test is performed on instances where the background dynamics are violated. The purpose of this test is to further rule out outliers caused by structured noise and sudden illumination changes. From the hypothesis formulated on passing vehicles in equation (2), the decision rule is expressed as:
As described above, the present invention uses a robust motion estimation algorithm to determine whether the background dynamics are violated. If brightness constancy is assumed, the motion vector for a given image location is computed by solving the linear equation
$$\nabla_{\mathbf{x}} I(\mathbf{x}, t) \cdot \mathbf{v} = -\nabla_t I(\mathbf{x}, t) \qquad (6)$$
The biased least squares solution is as follows:
$$\hat{\mathbf{v}} = (A^{T} A + \beta I)^{-1} A^{T} \mathbf{b} \qquad (7)$$
where A is a matrix defined by the spatial image gradients ∇xI in a local region, and b is a vector composed of the temporal image gradients ∇tI, negated as in equation (6). To describe the uncertainty of the motion estimate, its covariance is defined as:
where N is the number of pixels in the local region and $\hat{\sigma}^2$ is the estimated variance of image noise. Unreliable flow estimates are associated with covariance matrices that have a large trace. This information is important for robust fusion.
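A NumPy sketch of the biased least-squares estimate (7) for one local region follows. Because the covariance formula is not reproduced here, the code assumes the common form C = σ̂² (AᵀA + βI)⁻¹, with σ̂² estimated from the fit residual over N − 2 degrees of freedom; both choices are labeled as assumptions in the comments. The value of `beta` is illustrative.

```python
import numpy as np

def local_flow(Ix, Iy, It, beta=1e-3):
    """Biased least-squares flow (7) for one local region, with an assumed covariance."""
    A = np.column_stack([Ix.ravel(), Iy.ravel()])  # spatial gradients, one row per pixel
    b = -It.ravel()                                # negated temporal gradients, per (6)
    N = A.shape[0]
    M = A.T @ A + beta * np.eye(2)                 # biased normal matrix A^T A + beta I
    v_hat = np.linalg.solve(M, A.T @ b)
    # Assumption: noise variance from the fit residual, with 2 unknowns removed from N.
    sigma2 = np.sum((A @ v_hat - b) ** 2) / max(N - 2, 1)
    # Assumption: C = sigma^2 (A^T A + beta I)^-1, the usual least-squares covariance.
    C = sigma2 * np.linalg.inv(M)
    return v_hat, C
```

In a region with only horizontal gradients (the aperture problem), AᵀA is nearly singular in the vertical direction and the trace of C becomes large, which is the kind of unreliable estimate the fusion step is meant to down-weight.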
A multiscale hierarchical framework for computing $\hat{\mathbf{v}}$ and its covariance C is described in D. Comaniciu, "Nonparametric information fusion for motion estimation", CVPR 2003, Vol. 1, pp. 59-66, which is incorporated by reference. For every image frame, the initial motion vector is estimated at different spatial locations inside the analysis window. As a result, a sequence of motion estimates with covariances $\{\mathbf{v}_{x,t}, C_{x,t}\}$ in space and in time is obtained.
The initial motion estimates are sensitive to structured noise and illumination changes, which introduce outliers in the motion estimates. To overcome these outliers, joint spatial-temporal filtering is performed on the initial motion estimates through a technique called Variable Bandwidth Density Fusion (VBDF). VBDF is a fusion technique that is able to locate the most significant mode of the data and is therefore robust against outliers. Given the initial motion estimates $\mathbf{v}_{x,t}$ and covariances $C_{x,t}$ across multiple spatial and temporal locations $x = \{x_1, \ldots, x_n\}$, $t = \{T, T-1, \ldots, T-M\}$, VBDF is applied to obtain the dominant motion in the analysis window of the T-th frame.
VBDF is implemented through the following mean shift procedure. First, a pointwise density estimator is defined by a mixture function:
Here $a_{x,t}$ defines a weighting scheme on the data set, and $K(\mathbf{v}; \mathbf{v}_i, C_i)$ is the Gaussian kernel with center $\mathbf{v}_i$ and bandwidth $C_i$. The variable bandwidth mean shift vector at location $\mathbf{v}$ is given by:
The iterative computation of the mean shift vector recovers a trajectory starting from $\mathbf{v}$ and converging to a local maximum, i.e., a mode of the density estimate $f(\mathbf{v}; \{\mathbf{v}_{x,t}, C_{x,t}\})$.
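$$\begin{aligned} \mathbf{v}_0 &= \mathbf{v} \\ \mathbf{v}_{j+1} &= \mathbf{v}_j + \mathbf{m}(\mathbf{v}_j), \quad j \ge 0 \\ \mathbf{v}_j &\rightarrow \operatorname{mode}(\mathbf{v}; \{\mathbf{v}_{x,t}, C_{x,t}\}) \quad \text{as } j \rightarrow \infty \end{aligned} \qquad (11)$$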
To treat $f(\mathbf{v}; \{\mathbf{v}_{x,t}, C_{x,t}\})$ with multiple modes, a series of analysis bandwidths $C^l_{x,t} = C_{x,t} + \alpha_l I$ ($\alpha_0 > \alpha_1 > \cdots > 0$) is used, which leads to multiple smoothed density estimates $f(\mathbf{v}; \{\mathbf{v}_{x,t}, C^l_{x,t}\})$. The number of modes in the density estimate decreases as larger analysis bandwidths are adopted. At the initial scale, $\alpha_0$ is set large such that the density $f(\mathbf{v}; \{\mathbf{v}_{x,t}, C^0_{x,t}\})$ has only one mode, $\operatorname{mode}_0 = \operatorname{mode}(\mathbf{v}; \{\mathbf{v}_{x,t}, C^0_{x,t}\})$, which is invariant to the starting point $\mathbf{v}$ in VBDF. The mode point is then propagated across scales. At each scale, VBDF uses the mode point found at the previous scale as the initial point to locate the mode for the current scale.
$$\operatorname{mode}_l = \operatorname{mode}(\operatorname{mode}_{l-1}; \{\mathbf{v}_{x,t}, C^l_{x,t}\}), \quad l = 1, 2, \ldots \qquad (12)$$
The mode point converges to the most significant sample estimate as $\alpha_l$ decreases. The convergent point defines the dominant motion $\hat{\mathbf{v}}_T$ inside the analysis window of frame T.
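The multiscale procedure can be sketched in NumPy as below. Since equations (9) and (10) are not reproduced in this text, the mean shift step assumes the variable-bandwidth form from the cited Comaniciu paper: with weights ω_i(v) proportional to a_i |C_i|^(-1/2) exp(−½ D²(v, v_i, C_i)), the next iterate is v + m(v) = H(v) Σ ω_i(v) C_i⁻¹ v_i, where H(v)⁻¹ = Σ ω_i(v) C_i⁻¹. The α schedule and the uniform default weights a_{x,t} are hypothetical choices.

```python
import numpy as np

def vbdf_mean_shift(v, pts, covs, weights, tol=1e-6, max_iter=500):
    """Iterate v_{j+1} = v_j + m(v_j) until it reaches a mode of the mixture density."""
    inv_covs = np.linalg.inv(covs)                   # C_i^{-1}, shape (n, d, d)
    scale = weights / np.sqrt(np.linalg.det(covs))   # a_i |C_i|^{-1/2}
    for _ in range(max_iter):
        diff = v - pts
        d2 = np.einsum("ni,nij,nj->n", diff, inv_covs, diff)  # squared Mahalanobis distances
        w = scale * np.exp(-0.5 * (d2 - d2.min()))            # shifted for numerical stability
        w /= w.sum()                                          # omega_i(v)
        h_inv = np.einsum("n,nij->ij", w, inv_covs)           # H(v)^{-1}
        rhs = np.einsum("n,nij,nj->i", w, inv_covs, pts)
        nxt = np.linalg.solve(h_inv, rhs)                     # v + m(v)
        if np.linalg.norm(nxt - v) < tol:
            return nxt
        v = nxt
    return v

def vbdf(pts, covs, weights=None, alphas=(100.0, 10.0, 1.0, 0.1, 0.01)):
    """Track the dominant mode from a large analysis bandwidth down to a small one."""
    pts = np.asarray(pts, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n, d = pts.shape
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    mode = pts.mean(axis=0)  # start point; alpha_0 is meant to leave a single mode
    for alpha in alphas:
        # C^l = C + alpha_l I; the previous scale's mode seeds the current scale.
        mode = vbdf_mean_shift(mode, pts, covs + alpha * np.eye(d), weights)
    return mode
```

With a few tightly clustered, low-covariance estimates and some scattered high-covariance outliers, the returned point lies at the cluster of consistent estimates rather than at their overall mean.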
Having described embodiments for a method for detecting passing vehicles, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 60/545,781, filed on Feb. 19, 2004, which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60545781 | Feb 2004 | US