The invention relates to a vehicle based tracking system where a host vehicle is equipped with a camera system which tracks objects of interest, such as other vehicles, over time.
A fundamental task of camera based Advanced Driver Assistance Systems (ADAS) applications, e.g. Forward Collision Warning (FCW), Pedestrian Detection (PED), or Traffic Sign Recognition (TSR), is the temporal tracking of objects of interest, such as vehicles or pedestrians.
In this application, the methodology embodies optical tracking, which can be regarded as continuous position estimation of an image region (covering the projection) of a particular object of interest in successive camera frames, i.e. images. For efficiency and complexity reasons, this image region is generally assumed to be box-shaped (rectangular).
In general, two tracking concepts are used in ADAS applications. The first approach is a tracking-by-detection method, where the position of the object of interest is detected from scratch in every frame based on a pre-trained model capturing the representative semantics of the target object's class. Tracking of particular class instances is then performed via assignment of detections in successive frames. While relatively robust against sudden changes in lighting conditions, this concept may suffer from mis-assignments, missing detections and volatile box trajectories, where the latter may render sensitive objectives such as Time To Contact (TTC) estimation impossible.
An alternative approach is to model the actual appearance of an individual object instance in a template (template tracking) that is to be redetected in every frame, as discussed by Carlo Tomasi and Takeo Kanade in the paper “Detection and tracking of point features”, International Journal of Computer Vision, 1991. Knowledge of the actual object appearance, as opposed to general information on the object class semantics, allows for a more accurate and dense trajectory determination, which benefits the TTC estimation task. Approaches differ in the complexity of the template model (e.g. constant or highly adaptive, as in the paper “Incremental Learning for Robust Visual Tracking” by David Ross, Jongwoo Lim, Ruei-Sung Lin and Ming-Hsuan Yang, International Journal of Computer Vision, 2007) as well as of the motion model (e.g. 2D shift or affine, as described in the paper “Lucas-Kanade 20 Years On: A Unifying Framework” by Simon Baker and Iain Matthews, International Journal of Computer Vision, 2004). Due to the missing semantic information, this concept suffers from model violations, as static models may lose track quite early and dynamic/adaptive models tend to drift away from the original object of interest. Thus, template and motion model need to be chosen appropriately for the respective application, and a robust and strict self-evaluation is further needed in order to detect track loss.
In general, hybrid systems are used to combine the advantages of both concepts. Tracking vehicles and bikes from a moving platform for the purpose of collision warning imposes multiple requirements on the tracker design: it must have an adequate model of the object's motion dynamics, be sufficiently dynamic to allow for tracking and prediction of characteristic motion, and be sufficiently filtered to enable smooth TTC estimation.
The capability of tracking algorithms for TTC estimation is highly dependent on their precision with respect to the estimated bounding box of the object of interest, as TTC is in principle derived from the ratio of the bounding box width to the change of that width over time. While purely detection based approaches tend to give results that are stable in the long term but volatile, even when restrictive filters are employed, template matching allows for a subpixel-precise estimation of the bounding box position and extent if suitably designed for the designated application.
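As a rough illustration of how TTC follows from the box width and its change over time, the Python sketch below uses the approximation TTC ≈ w / (dw/dt) with a simple backward difference over one frame interval. The function name and the difference scheme are assumptions for illustration only; a real system would use filtered width estimates rather than raw frame-to-frame values.

```python
import math

def ttc_from_widths(w_prev: float, w_curr: float, dt: float) -> float:
    """Approximate time to contact (seconds) from two successive box widths.

    Uses TTC ~ w / (dw/dt) with a backward difference for dw/dt.
    Returns math.inf when the box is not growing (object not approaching).
    """
    dw = w_curr - w_prev
    if dw <= 0.0:
        return math.inf
    return w_curr * dt / dw

# A box growing from 100 px to 105 px over 50 ms
print(ttc_from_widths(100.0, 105.0, 0.05))  # about 1.05 s
```

Because only the ratio of width to width change enters, any unknown scale factor on the width cancels, so the absolute object size is not needed.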
It is an object of the invention to provide a methodology with improved accuracy and robustness.
Further, there should be an adequate template representation, and the method should be sufficiently dynamic to cope with small appearance changes as well as sufficiently filtered to minimize drift. There should be stringent self-evaluation to detect drift and track loss, and the method should allow objects to be tracked with the bounding box partially out of the image.
In one aspect, there is provided a method of tracking an object of interest between temporally successive images taken from a vehicle based camera system, comprising:
a) from an initial image (1), determining an initial patch (2) with a respective boundary box (3) encapsulating an identified object of interest;
b) using a search template in Kanade Lucas Tomasi (KLT) methodology to track said object of interest in a temporally successive image from said camera system, so as to determine therein a new patch (11) having a respective new boundary box (12), or portion thereof, with respect to said object of interest; characterised by:
c) performing a check on the robustness of the tracking step in step b) by analysing one or more parameters output from step b).
The method may include after step a) formulating a patch template (7) of fixed size from said initial patch.
In step b) the search template may be the patch template or may be derived or amended therefrom.
Said parameters may be one or more of the following: one or more of the co-ordinates, height, width, area or scale of the new bounding box.
Step c) may comprise determining the area, height or width of the newly computed boundary box, and determining if this value is less or more than a threshold.
Step c) may comprise determining with respect to the initial bounding box and newly determined boundary box, the absolute or relative change for one or more of the following: one or more co-ordinates of the initial and new boundary boxes; the width, height or area of the boundary boxes.
The method may include determining if said change is higher or lower than a threshold.
Step c) may comprise determining the amount or percentage of the newly determined boundary box which lies outside the image.
The method may include providing an updated patch template based on the newly determined patch.
The method may include providing an updated search template or revising the search template based on the updated patch template.
Step c) may comprise determining the degree of texture or variance within said newly determined patch, said updated patch template, or said new search template.
Step c) may include determining, for one or more pixels of said newly determined patch, updated patch template or updated search template, the gradient between a pixel parameter and that for one or more neighboring pixels.
Step c) may comprise determining the degree of similarity or difference between any of the following: initial patch, search template or initial patch template, and any of the following: newly computed patch, updated patch template or the updated search template.
Determining the degree of similarity or difference may comprise comparing texture or variance.
The method may comprise determining the degree of similarity or difference between said initial patch template or search template with said newly updated patch template.
The method may include determining the number or proportion of pixels of the updated patch template whose pixel parameter differs from that of the corresponding pixels in said initial patch template or search template by more than a predetermined value.
Said pixel parameters of texture or variance may relate to one of the following: brightness, grey scale, hue or saturation.
Further features and advantages will appear more clearly on a reading of the following detailed description of the preferred embodiment, which is given by way of non-limiting example only and with reference to the accompanying drawings.
The present invention is now described by way of example with reference to the accompanying drawings in which:
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the various described embodiments. However, it will be apparent to one of ordinary skill in the art that the various described embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
‘One or more’ includes a function being performed by one element, a function being performed by more than one element, e.g., in a distributed fashion, several functions being performed by one element, several functions being performed by several elements, or any combination of the above.
It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the various described embodiments. The first contact and the second contact are both contacts, but they are not the same contact.
The terminology used in the description of the various described embodiments herein is for describing embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.
A very well-known tracking technique is the Kanade Lucas Tomasi (KLT) tracking method which is described in the first reference paper above.
The invention relates to an improved technique, based for example on this methodology, which provides more robust results. Thus, the methodology of the tracking system according to aspects may be based on the well-known KLT tracker methodology, improved to provide increased robustness.
Brief Summary of KLT
Step 1 Initial Patch Extraction
The bounding box 3 at this initial time point t1 within the image has a width w_t1 and a height h_t1 and, in the example, reference coordinates X_t1 and Y_t1 relative to the image frame. This reference may be taken from the center of the bounding box or from a corner point, e.g. the bottom left corner.
The patch 2 bounded by bounding box 3 comprises a number of pixels N1, designated with reference numeral 5.
A patch template may be formed from the patch; this is described in more detail in step 2 below. This can be done by resampling the patch (the interior of the bounding box), e.g. by bilinear interpolation, to a fixed-size patch template (at what may be called the patch template grid resolution). The patch template is thus effectively a resized copy of the first patch, forming the pattern to be redetected in later images. The patch template may be used as the search template for the next image/frame. Alternatively, the search template may be left unchanged, i.e. the same as before. In another alternative, the new search template may be formed from the old search template updated with patch data, i.e. with the patch template derived from the patch. So the patch template is effectively the patch resized to the same size as the search template, and the new search template would be the old search template refined with the patch template.
Step 2 Forming a Template
The patch which is found and bounded by the bounding box can then be converted to a patch template 7 of fixed size, so as to provide, for example, a new search template. In the examples, the patch template of fixed size (in this example 31×31 pixels) is thus effectively punched out of the image. The patch template 7 may be obtained from the patch 2 using known techniques such as bilinear interpolation, and may form the basis of a search template used to search for the object of interest in the next frame, i.e. the next image taken by the camera at time point t2, so that the object of interest can be tracked. The KLT technique searches the next image, using a search template (e.g. based on the previously determined patch template 7), for a portion of the image that corresponds to the search template.
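The resampling of a box to a fixed-size patch template can be sketched in Python with NumPy as follows. It samples the image at positions (x + s·u, y + s·v), i.e. the same parametrisation as the tracking model used later, with (x, y) the box origin and s the box width divided by the template size. The function name, the integer-pixel-centre convention and the border clamping are assumptions of this sketch.

```python
import numpy as np

def extract_patch_template(image, x, y, s, size=31):
    """Bilinearly resample a box of a grey-value image to a size x size template.

    Returns P with P[v, u] = I(x + s*u, y + s*v) for u, v = 0..size-1.
    Pixel centres are assumed at integer coordinates; samples outside the
    image are clamped to the nearest border pixel.
    """
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape
    idx = np.arange(size, dtype=np.float64)
    gx, gy = np.meshgrid(x + s * idx, y + s * idx)  # gx: column coord, gy: row coord
    gx = np.clip(gx, 0.0, cols - 1.0)
    gy = np.clip(gy, 0.0, rows - 1.0)
    x0 = np.floor(gx).astype(int)
    y0 = np.floor(gy).astype(int)
    x1 = np.minimum(x0 + 1, cols - 1)
    y1 = np.minimum(y0 + 1, rows - 1)
    ax = gx - x0  # horizontal interpolation weight
    ay = gy - y0  # vertical interpolation weight
    top = (1.0 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1.0 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1.0 - ay) * top + ay * bottom
```

For a 31×31 template and a box of width w, one would call it with s = w / 31.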
So in this stage effectively a fixed size patch template 7 is provided from the initial image at time t1 representing, from the image, a specified initial interest region encapsulating the object of interest to be tracked. The converting of the patch 2 within the bounding box 3 to a patch template 7 of fixed size may use known techniques as mentioned.
In aspects, the search template may be updated in every successive frame in which matching was successful, as a weighted mean of the old search template and the newly determined patch template grabbed from the final matched position in the current frame. This keeps rigid structures stable while smoothing out dynamic parts. In addition, e.g. the intensity (grey value) variance of each patch template pixel is computed and stored for each frame at the template update step (the template variance; see the accompanying drawings).
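The weighted-mean template update together with a per-pixel variance can be sketched as follows (Python/NumPy). The exponential weighting with a single blending weight `alpha` is an assumption; the text only states that a weighted mean is used and that a per-pixel variance is stored. The function name is hypothetical.

```python
import numpy as np

def update_search_template(T, var, P, alpha=0.1):
    """Blend the newly matched patch template P into the search template T.

    T_new = (1 - alpha) * T + alpha * P  (a weighted mean), and the per-pixel
    grey-value variance is updated with the matching exponentially weighted
    formula. alpha is a hypothetical blending weight.
    """
    T = np.asarray(T, dtype=np.float64)
    P = np.asarray(P, dtype=np.float64)
    diff = P - T
    T_new = T + alpha * diff
    var_new = (1.0 - alpha) * (np.asarray(var, dtype=np.float64) + alpha * diff ** 2)
    return T_new, var_new
```

Pixels on rigid structures change little from frame to frame and keep a small variance, whereas dynamic parts are smoothed and accumulate a larger variance.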
Step 3 Tracking
In the next step, a successive image is taken, i.e. at a later time t2. Images in such systems are typically taken (or processed) at small time intervals. The image at the later time point is then used along with a search template, which may be based on, or updated from, the patch template 7 determined previously (i.e. at the preceding time point), to determine a new patch 11 with a respective bounding box 12 which encapsulates the object of interest. The object of interest may have moved, so the bounding box may be shifted in x and y co-ordinates and/or become larger or smaller depending on whether the object of interest is heading towards or away from the host vehicle.
In other words, the new image at the later time point t2 is, conceptually, scanned with the search template at different locations and magnifications to find the match at which the difference in brightness of corresponding pixels (e.g. the sum of squared differences, in line with the least-squares formulation below) is at a minimum. This is the function of the KLT tracking methodology. In this way, the shift of the bounding box from time t1 to time t2 within the image can be determined, as well as the scaling of the bounding box (i.e. the change in its height and width).
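The "scanning" view can be made concrete with a brute-force search. The Python/NumPy sketch below (hypothetical function name) only tries integer shifts around a start position and ignores scale; it minimizes the sum of squared grey-value differences. It illustrates the matching criterion only: KLT itself does not enumerate candidates but reaches the minimum by iterative, gradient-based least squares.

```python
import numpy as np

def exhaustive_shift_search(image, template, x0, y0, radius=3):
    """Find the integer shift (dx, dy), |dx|, |dy| <= radius, around (x0, y0)
    minimising the sum of squared differences (SSD) between the template and
    the image window whose top-left corner is at (x0 + dx, y0 + dy).

    Returns (ssd, dx, dy), or None if no window fits inside the image.
    """
    img = np.asarray(image, dtype=np.float64)
    tpl = np.asarray(template, dtype=np.float64)
    th, tw = tpl.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or y + th > img.shape[0] or x + tw > img.shape[1]:
                continue  # window would leave the image
            ssd = float(np.sum((img[y:y + th, x:x + tw] - tpl) ** 2))
            if best is None or ssd < best[0]:
                best = (ssd, dx, dy)
    return best
```

The cost grows with the square of the search radius and would grow further with a scale dimension, which is one reason for the iterative gradient-based estimation used by KLT.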
So, in other words, the search template is compared with the later image (the next image taken by the camera on the host/ego vehicle) so as to determine a new patch with a respective bounding box. To recap, the method can effectively be considered as moving the search template over the new image at different locations relative to the image to determine the best-matching patch/bounding box size and position, such that the difference in the parameter (e.g. brightness) of corresponding pixels between the patch (or the patch template derived therefrom) and the search template is at a minimum. This may entail converting the newly found patch to a patch template of the same size, in terms of pixels, as the search template used to find it. The actual matching process, i.e. the estimation of the horizontal and vertical displacement as well as the scale change, may be done by non-linear weighted least squares optimization based on the brightness (or other parameter) constancy constraint. Such a technique may specifically be the KLT technique.
The output parameters of the KLT technique are also shown in the accompanying drawings.
It may be assumed in basic embodiments that the bounding boxes of successive images (frames) have the same shape, which simplifies the methodology; in other words, the relative change in height between frames (successive time points) equals the relative change in width. Where this is the case, the change in scale of the bounding box may be determined in addition to, or instead of, the change in width. Thus, important outputs of the KLT technique are the shifts in the X and Y co-ordinates, referred to as Δx and Δy, and the change in scale Δs.
It is to be noted that portions of the newly computed boundary box may lie outside the image. In other words, if the tracked object is a vehicle which has moved partially outside the field of view, only a portion of the boundary box will match with a portion of the image. This can cause problems, which are solved by aspects of the invention described hereinafter.
As mentioned, the search template used may be updated by reformulating it based on the newly found patch i.e. from a patch template derived therefrom. For example, the search template may be updated each time from the last found patch/patch template derived therefrom, or be modified based on two or more patches (patch templates thereof) at different time points e.g. the latest patch/patch template and the previous search template.
So, to repeat the definitions: a patch refers to the tracked region of interest, which is found using a search template. A patch template is a template which is derived from the patch and has a fixed size, normally the same size as the search template. The search template is the fixed-size template used to search for the object of interest in each image. The search template may be a fixed template or the most recently formulated patch template. Alternatively, the search template may be updated based on the previous (old) search template, refined by the latest found patch data (i.e. by the latest patch template).
Methodology According to Examples of the Invention
In aspects of the invention, checks are performed, e.g. between successive images (frames), to determine the robustness of the results and to determine whether the tracking results (e.g. between one frame/time point and the next) should be disregarded or flagged as possibly inaccurate. Thus, aspects of the invention can be considered to provide a “sanity” check of the results obtained within the KLT cycle. These checks use, e.g., parameters which are output from the KLT method.
For example, there may be track losses and anomalies caused by e.g. drift or temporary occlusions (e.g. by wipers). To make the method robust and to detect whether the tracking is realistic, robustness checking steps may be performed on the tracking result of each frame for self-evaluation.
According to examples, robustness checks may use data computed already which is output from e.g. the KLT methodology with respect to the parameters of the initial and subsequent bounding boxes (at times t1 and t2 for example) and parameters of the pixels in the relevant patches of the bounding boxes. In other words a robustness check is performed using as input data, parameters provided by the KLT methodology, in particular changes in parameters of the initial bounding box 3 and the subsequent bounding box 12.
In summary, as mentioned, these may be shifts in the x and/or y position of the bounding boxes and/or scale changes. Alternatively or additionally, other parameters or data (e.g. computed at various stages of the KLT method) may be used for the robustness check, such as data pertaining to the pixels within the determined initial and subsequent bounding boxes (the patches themselves). Additionally, the check may comprise comparing the newly found patch (generally via the patch template derived therefrom) with the search template which was used to search for it. In other words, the method may incorporate one or more of the following steps, based on the estimated bounding box and on the structure of the found patch, or of the patch template derived therefrom, extracted at the final estimated track pose in the current frame.
The following are examples of robustness checks which use data computed by the KLT algorithm.
a) Tracks (i.e. tracking results) which result in too small bounding boxes may be discarded. So, where the newly computed bounding box 12 of the found patch is too small, the track (tracking step) may be discarded. Thus, the absolute size of the bounding box, or its size relative to the image, may be compared with a threshold and, if lower, the track disregarded, i.e. ignored. So, the check involves determining the absolute or relative size of the new boundary box.
b) Tracks with unusual (e.g. too large) changes of the bounding box width, height or area between successive images may be discarded (rejected). Here, the changes in the absolute or relative width, height or area (i.e. scale) of the bounding boxes are computed, and where the change exceeds a threshold the track may be disregarded. Preferably, relative changes of bounding box parameters between successive time points are determined; thus, the ratio of the width, height or area of the bounding box between the two respective time points is computed. If this ratio is greater than an upper threshold or smaller than a lower threshold, the track may be disregarded. Thus, for example, the value of Δs output from the KLT method may be analyzed as an absolute or relative value, and the track disregarded if this value exceeds a certain threshold or is less than a certain threshold.
c) In some cases, the KLT process will match only a portion of the search template to a patch in the new image, while the rest of the newly computed boundary box with respect to the object of interest lies outside the image. If a proportion of the area of the newly computed boundary box 12 lies outside the image, the track may be disregarded. So, the amount or percentage of the boundary box 12 which lies outside the image may be determined and, if greater than a threshold, the track may be disregarded.
d) In one aspect, there may be a check for sufficient texture in the found patch, the patch template derived therefrom, or the revised search template (derived from this patch template). As mentioned, the search template may be updated by reformulating it from the patch template 7 of the newly found patch 11, so that, for example, the search template is updated each time from the last found patch/patch template thereof. A check can be made on the newly found patch, the patch template derived therefrom, or the new (refined) search template (based on the old search template and the newly determined patch template) to see whether there is sufficient contrast therein, in other words whether there is sufficient variation in the level of the parameter (e.g. brightness of pixels within the patch/templates). For example, if the object of interest is found based on the rear of a plain white van or lorry without a logo, then there is little contrast in patches falling within the edges of the back of the lorry, i.e. without the darker border region outside the rear of the lorry. If the newly found patch (or the revised template derived therefrom) is just a white rectangle with no contrast, e.g. no darker border regions, it is not known how much of this area covers the back of the van, i.e. how close the van is to the host vehicle; in such circumstances it may be preferable to discard the track. Thus, contrast checks such as gradient checks may be performed on the new patch, the patch template derived therefrom, or the updated search template derived from the computed patch template. This may comprise, for example, checking whether there is a sufficient number or proportion of patch, patch template or revised search template pixels with a sufficiently large gradient to a neighboring pixel and, if not, disregarding the track. The term “determining contrast of a patch or template” should thus be interpreted as including such methodology.
With respect to this, there may be a check to see whether there are sufficient detectables in the newly found patch or the patch template derived therefrom, or indeed in any revised template based on the old template and the new template. The variance is computed over the template (as it is updated for every new image). It may then be checked, pixelwise, whether the difference between the new template and the old template is lower than the old template variance at the specific pixel position. This will be explained in more detail hereinafter.
This check may determine that the track is lost (and is to be discarded) if there are more than τ_lost lost pixels, or more than a given proportion of lost pixels, in the patch or, preferably, in the patch template derived therefrom. Here, “detectables” denotes pixels whose absolute image gradient exceeds a certain threshold. An overall low gradient will impede successful patch/patch template matching.
e) There may be a check for sufficient similarity between the newly found patch 11, the patch template derived therefrom or the revised search template, and the (old) search template used to find the patch. Preferably, this is done by comparing the search template used to find the patch with the patch template derived from the patch, as these templates have the same size (same number of pixels). Thus, there may be a check that a sufficient number of patch template pixels have a grey value difference to their search template counterparts that passes, e.g., a 2-sigma check based on the standard deviation given by the template variance. In one example, the basic idea of the test is to keep track over time of the local per-pixel grey value variance σ²(u, v) of the final (most recent) patch template $\hat{P}_t$, grabbed based on the final estimated pose parameters $\hat{s}_t, \hat{x}_t, \hat{y}_t$, and to perform a 2σ (95% quantile) check on the local patch-template-to-search-template difference:
$$\lvert T_t(u,v) - \hat{P}_t(u,v) \rvert < \max\big(2\sigma(u,v),\ \sigma_{\min}\big) \tag{11}$$
A minimum value $\sigma_{\min}$ may be used as a lower bound in the test, to account for the low sample size and to model non-Gaussian parts. Note that the similarity check may be performed only for pixels marked as detectables, since in general changes in the high-gradient object outline, rather than in homogeneous patch template regions such as road pixels, indicate the redetection of an object. The term “determining the similarity between patch template and search template” should be interpreted as including such methodology. This therefore includes the notion of determining the differences in a parameter (e.g. brightness) between equivalent/corresponding regions of the newly found patch/template derived therefrom and the search template used to find the patch. This would involve looking, e.g., at the differences in the appropriate parameter (e.g. brightness/grey values) between corresponding pixels of the patch template derived from the found patch and the search template (used to search for and find the patch).
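A minimal Python/NumPy sketch of checks a) to e) follows. It assumes grey-value templates stored as 2-D arrays, a top-left box reference point, central-difference gradients from `np.gradient`, and made-up threshold values; all function names are hypothetical. Each function returns a pass flag, a fraction or a count, and combining them into a decision is left to the caller.

```python
import numpy as np

# All thresholds below are hypothetical placeholders, not values from the text.

def box_large_enough(width, height, min_side_px=8.0):
    """a) Reject boxes that are too small (poor sampling density)."""
    return min(width, height) >= min_side_px

def scale_change_plausible(width_prev, width_curr, max_ratio=1.2):
    """b) The relative width change between successive frames must stay small."""
    ratio = width_curr / width_prev
    return 1.0 / max_ratio <= ratio <= max_ratio

def out_of_image_fraction(x, y, width, height, image_width, image_height):
    """c) Fraction of the box area outside the image; (x, y) is the top-left corner."""
    inside_w = max(0.0, min(x + width, image_width) - max(x, 0.0))
    inside_h = max(0.0, min(y + height, image_height) - max(y, 0.0))
    return 1.0 - (inside_w * inside_h) / (width * height)

def detectable_mask(template, grad_threshold=10.0):
    """d) 'Detectables': pixels whose absolute grey-value gradient exceeds a threshold."""
    gy, gx = np.gradient(np.asarray(template, dtype=np.float64))
    return np.hypot(gx, gy) > grad_threshold

def similarity_failures(search_tpl, patch_tpl, sigma, mask, sigma_min=2.0):
    """e) Count detectable pixels violating |T - P| < max(2*sigma, sigma_min), cf. eq. (11)."""
    bound = np.maximum(2.0 * np.asarray(sigma, dtype=np.float64), sigma_min)
    diff = np.abs(np.asarray(search_tpl, dtype=np.float64)
                  - np.asarray(patch_tpl, dtype=np.float64))
    return int(np.count_nonzero((diff >= bound) & mask))
```

Here `sigma` is the per-pixel standard deviation, i.e. the square root of the stored template variance.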
Where such techniques are used in advanced methodology, the individual patch/patch template/search template pixel weights may possess higher values for center pixels and decrease towards the patch/patch template borders in order to favor object structures in the major interest region and decrease the influence of background structures visible in the bounding box. The weights may be further employed to allow for tracking of objects partially out of image.
Patch/patch template pixels exceeding the image borders may be filled by the respective template pixel value and added to the adjustment with a diminished weight in order to keep the adjustment stable despite the reduced number of observations.
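The centre weighting and the out-of-image handling can be sketched as follows (Python/NumPy). The Gaussian shape of the centre weighting and its width `sigma_frac` are assumptions, since the text only states that weights decrease towards the border; the reduction factor 0.01 follows the value given later for the out-of-image handling. `inside` is a hypothetical boolean mask marking template pixels whose sample position falls inside the image.

```python
import numpy as np

def centre_weights(size, sigma_frac=0.35):
    """Per-pixel weights w_uv: Gaussian profile, 1.0 at the centre, decaying to the border."""
    r = np.arange(size, dtype=np.float64) - (size - 1) / 2.0
    g = np.exp(-0.5 * (r / (sigma_frac * size)) ** 2)
    return np.outer(g, g)

def handle_out_of_image(patch_tpl, search_tpl, weights, inside, factor=0.01):
    """Fill out-of-image patch template pixels from the search template and
    reduce their weight, so they stabilise rather than drive the adjustment."""
    filled = np.where(inside, patch_tpl, search_tpl)
    w = np.where(inside, weights, factor * weights)
    return filled, w
```

Filled pixels contribute zero difference but keep a small weight, so the adjustment stays well conditioned even when a large part of the box has left the image.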
For further stabilization, the scale parameter update may be damped assuming small changes via a Bayesian prior.
A Kalman filter may be used to predict an initial position and scale estimate in every frame based on the previous result. The filter model is preferably specifically designed to support TTC estimation capabilities. The vertical object position may be simply low pass filtered to allow for the highly dynamic changes caused by ego pitch movement. Horizontal position and box width are filtered using a horizontal planar world model assuming constant acceleration in longitudinal and lateral direction. The world model is connected to the image measurements via pinhole camera model assumption. The unknown object world width is artificially set to one. Although this results in an incorrectly scaled world system, it doesn't affect TTC estimate as the scale vanishes when computing the ratio of widths and width change. As a further novelty, lateral and longitudinal updates are decoupled by canceling the mixed terms in the state transition matrix to minimize the influence of lateral errors on the TTC relevant longitudinal estimate.
As the actual world scale is unknown, we refrain from compensating the ego velocity. This, however, is of minor harm, as TTC computation only requires relative speed information. Ego yaw rate, on the other hand, is compensated by computing the resulting pixel movement of the box center position using the pinhole model. This yaw-rate displacement is subtracted from the total observed displacement when updating the Kalman filter state.
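The yaw compensation can be sketched with the pinhole model as below (Python). The sign convention (a positive yaw angle turns the optical axis towards increasing image x), the use of the principal point `cx`, the function name, and modelling only the horizontal displacement are assumptions of this sketch.

```python
import math

def yaw_induced_shift(x_px, cx, focal_px, yaw_rate, dt):
    """Horizontal pixel displacement of the box centre caused only by ego yaw.

    Pinhole model: a point at image column x_px has bearing
    phi = atan((x_px - cx) / f). Rotating the camera by theta = yaw_rate * dt
    about its vertical axis changes the bearing to phi - theta
    (assumed sign convention).
    """
    theta = yaw_rate * dt
    phi = math.atan((x_px - cx) / focal_px)
    x_new = cx + focal_px * math.tan(phi - theta)
    return x_new - x_px
```

The observed horizontal displacement would then be reduced by this value before the Kalman filter update; near the image centre the shift is close to -f·θ.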
Methodology of KLT with Robustness Check
Tracking Assumption
The principal idea of the sKLT (scale-KLT) tracking procedure is based on the object tracking approach of Kanade, Lucas and Tomasi (KLT). A box-shaped object is to be tracked over an image sequence $I_t$, assuming the visual appearance of the tracked object to stay constant over time. More specifically, for every time step t we assume that, when the current object bounding box content is resampled (by bilinear interpolation) to a fixed-size U×V patch template $P_t$, this template will match a constant object (search) template T pixelwise. Thus,
$$T(u,v) \approx P_t(u,v) = I_t(x_t + s_t u,\; y_t + s_t v) \tag{1}$$
holds for all pixels $(u, v)$ of the patch, where $(x_t, y_t)$ denotes the current bounding box position and $s_t$ the box scale with respect to the image coordinate system.
Linear Tracking Model (Functional Model)
The estimation of the unknown pose parameters $s_t, x_t, y_t$ of the bounding box is performed by solving the non-linear optimization task (1) in a least squares sense. A first-order Taylor expansion of the right-hand side of (1) around the approximate pose values (predictions) $s_t^{(0)}, x_t^{(0)}, y_t^{(0)}$ yields a model that is linear in the updates $\Delta s, \Delta x, \Delta y$, with coefficients given by the gradient images. These parameters may be used for the checks for robustness of the track.
Using the difference image $\Delta T^{(0)}(u,v) = T(u,v) - I_t\big(x_t^{(0)} + s_t^{(0)} u,\; y_t^{(0)} + s_t^{(0)} v\big)$, the final linearized model (6) follows from stacking all template pixels in a common equation system $\Delta T^{(0)} = A^{(0)} \Delta p$,
where the short notation comprises the vectorized difference image $\Delta T^{(0)}$, the Jacobian $A^{(0)}$ and the parameter update vector $\Delta p$.
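For reference, the linearization can be written out explicitly. The following is a reconstruction consistent with (1), not a reproduction of the original equations; $I_x$ and $I_y$ denote the horizontal and vertical gradient images of $I_t$, evaluated at the predicted sample position $\big(x_t^{(0)} + s_t^{(0)}u,\ y_t^{(0)} + s_t^{(0)}v\big)$, a notation introduced here.

$$I_t(x_t + s_t u,\; y_t + s_t v) \approx I_t\big(x_t^{(0)} + s_t^{(0)} u,\; y_t^{(0)} + s_t^{(0)} v\big) + \big(u\, I_x + v\, I_y\big)\,\Delta s + I_x\,\Delta x + I_y\,\Delta y$$

$$\Delta T^{(0)}(u,v) \approx \big(u\, I_x + v\, I_y\big)\,\Delta s + I_x\,\Delta x + I_y\,\Delta y = a_{u,v}^{(0)}\,\Delta p, \qquad a_{u,v}^{(0)} = \big[\,u\, I_x + v\, I_y,\ \ I_x,\ \ I_y\,\big], \quad \Delta p = [\Delta s,\ \Delta x,\ \Delta y]^{\mathsf T}$$

Stacking the rows $a_{u,v}^{(0)}$ of all $M = UV$ template pixels gives the $M \times 3$ Jacobian $A^{(0)}$ of the system $\Delta T^{(0)} = A^{(0)} \Delta p$ referred to as (6). The scale column follows from $\partial I_t(x + s u,\, y + s v)/\partial s = u\, I_x + v\, I_y$.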
Iterative Parameter Estimation
Adjustment
We search for the parameters optimizing (1) by iteratively solving (6) in a weighted least squares sense, i.e.
yielding the new parameter state p(v+1)=p(v)+Δp(v+1) for every iteration. The diagonal weighting matrix W=diag ([: : : ; wu; v; : : :] T) induces prior knowledge assuming the tracked object to be located in the center of the box, thus down-weighting the pixels from the center to the box border. Note that due to the diagonal structure of W the elements of the normal equation matrix N and the right-side h can be computed by
where $a_{u,v}^{(\nu)}$ denotes the row of the Jacobian $A^{(\nu)}$ corresponding to pixel $(u, v)$.
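The accumulation of $N$ and $h$ and the solution of one adjustment step can be sketched as follows. The sketch assumes the standard weighted least-squares solution $\Delta p = N^{-1} h$ with $N = \sum w_{u,v}\, a_{u,v}^{\mathsf T} a_{u,v}$ and $h = \sum w_{u,v}\, a_{u,v}^{\mathsf T}\, \Delta T_{u,v}$, presumably what (7) and (8) express, the Jacobian row layout $[u I_{t,x} + v I_{t,y}, I_{t,x}, I_{t,y}]$ for the parameter order $(s, x, y)$, and a Gaussian center weighting. The weighting profile, its width and all function names are hypothetical; the text only states that the weights decrease from the box center toward the border.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def jacobian_row(u: float, v: float, gx: float, gy: float) -> Vector:
    """Row a_{u,v} = [u*gx + v*gy, gx, gy] for the parameters (s, x, y)."""
    return [u * gx + v * gy, gx, gy]


def center_weight(u: float, v: float, U: int, V: int, rel_sigma: float = 0.5) -> float:
    """Hypothetical Gaussian weight decreasing from the box center to the border."""
    du = (u - (U - 1) / 2.0) / (rel_sigma * U)
    dv = (v - (V - 1) / 2.0) / (rel_sigma * V)
    return math.exp(-0.5 * (du * du + dv * dv))


def normal_equations(rows: Sequence[Vector], residuals: Sequence[float],
                     weights: Sequence[float]) -> Tuple[Matrix, Vector]:
    """N = sum w * a^T a and h = sum w * a^T dT, exploiting the diagonal W."""
    N = [[0.0] * 3 for _ in range(3)]
    h = [0.0] * 3
    for a, dT, w in zip(rows, residuals, weights):
        for i in range(3):
            h[i] += w * a[i] * dT
            for j in range(3):
                N[i][j] += w * a[i] * a[j]
    return N, h


def _det3(m: Matrix) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(N: Matrix, h: Vector) -> Vector:
    """Solve N * dp = h for the 3x3 case with Cramer's rule (N must be regular)."""
    d = _det3(N)
    dp = []
    for col in range(3):
        m = [row[:] for row in N]
        for r in range(3):
            m[r][col] = h[r]
        dp.append(_det3(m) / d)
    return dp
```

One iteration then updates the state as `p = [pi + di for pi, di in zip(p, solve3(N, h))]`, after which the difference image and Jacobian rows are recomputed at the new pose.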
Out-of-Image Handling
Patch template pixels $(u, v)$ beyond the image borders are set to the respective pixel value of the (e.g. search) template, $T(u,v)$, and added to the normal equation system (8) with a diminished weight $w^*_{u,v} = 0.01\,w_{u,v}$ to improve adjustment stability.
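The out-of-image rule can be sketched per pixel as follows, assuming that `None` marks a sample outside the image and that the Jacobian row of such a pixel is supplied by the caller, which the text does not specify; the function name is hypothetical. Because the substituted value equals $T(u,v)$, the residual of such a pixel is zero: it adds $w^*_{u,v}\, a_{u,v}^{\mathsf T} a_{u,v}$ to $N$ but nothing to $h$, so it acts as a weak regularizer rather than pulling the update in any direction.

```python
from typing import Optional, Tuple

OUT_OF_IMAGE_FACTOR = 0.01  # w*_{u,v} = 0.01 * w_{u,v}, as stated in the text


def pixel_observation(sample: Optional[float], template_value: float,
                      weight: float) -> Tuple[float, float]:
    """Return (residual dT, weight) for one patch pixel.

    sample is I_t at the predicted position, or None outside the image.
    Out-of-image pixels take the template value T(u, v), so their residual
    is zero, and they enter the normal equations with a reduced weight.
    """
    if sample is None:
        return 0.0, OUT_OF_IMAGE_FACTOR * weight  # T(u, v) - T(u, v) == 0
    return template_value - sample, weight
```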
Scale Parameter Damping and Inversion Stabilizing
In order to stabilize the inversion in (7), we add a small value $d$ to the diagonal of the normal equation matrix, $N^{(\nu)} \leftarrow N^{(\nu)} + d\,I_3$, which reduces the occurrence of rank defects.
Further, we assume the scale change to be small, i.e. $s_t \approx s_{t-1}$. This prior is easily introduced in a Bayesian way by damping the parameter update $\Delta s$, i.e. by adding a constant damping weight $w_s$ to the respective diagonal position of $N^{(\nu)}$. In combination, the normal equation matrix actually used in (7) reads
The parameter $w_s$ is chosen such that it can compete with the terms in (6), whose magnitude grows with the number of observations $M = UV$ and with a per-observation contribution bounded by the number of gray values $K$ (owing to the dependency of the Jacobian on the image gradients). Thus, the parameter is defined as $w_s = \alpha_s M K$, leaving a damping adjustment parameter $\alpha_s$ that is chosen around the value $\alpha_s = 1$.
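Both stabilizing terms can be added to $N^{(\nu)}$ as sketched below. The sketch assumes the parameter order $(s, x, y)$, so that the scale entry sits at position $(0, 0)$, and $K = 256$ gray values for 8-bit images; the value of $d$ and the function name are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def damped_normal_matrix(N: Matrix, d: float, alpha_s: float,
                         U: int, V: int, K: int = 256) -> Matrix:
    """Return N + d * I_3 + w_s * e_s e_s^T with w_s = alpha_s * M * K.

    M = U * V is the number of observations (patch pixels), K the number of
    gray values. The scale parameter is assumed to be the first entry.
    """
    M = U * V
    w_s = alpha_s * M * K
    damped = [row[:] for row in N]  # do not modify the caller's matrix
    for i in range(3):
        damped[i][i] += d           # inversion stabilizing
    damped[0][0] += w_s             # prior s_t ~ s_{t-1}: damp the scale update
    return damped
```

For example, with $\alpha_s = 1$, a $32 \times 32$ patch and $K = 256$, the damping weight is $w_s = 1 \cdot 1024 \cdot 256 = 262144$.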
Robustness Check
1. As mentioned in the aspects above, the KLT methodology is integrated with a robustness check. The robustness check may be designed to discard/ignore the track or to keep it, depending on its result. Alternatively, the robustness check may divide the outcome into three or more categories: a three-class evaluation of the matching success may classify the results as valid, bad or lost, where the intermediate class bad denotes poor but not completely failed matches.
The test yields the result lost if any of the following conditions is met: the bounding box is too small (poor sampling density); the scale change is too large (we expect small scale changes); too many patch (or derived patch template) pixels are out of the image; there are insufficient detectables in the patch/patch template; or the similarity check fails for more than τ_lost pixels. Otherwise, the result is bad if the similarity check fails for more than a defined number τ_bad of pixels, and valid in all remaining cases.
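The three-class decision can be sketched as follows. All threshold values, field names and the way the scale change is measured (here via the ratio $s_t / s_{t-1}$) are hypothetical placeholders; the text only names the criteria and the two pixel-count thresholds τ_lost and τ_bad.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckInput:
    box_width: float              # current box width in pixels
    scale_ratio: float            # s_t / s_{t-1}
    out_of_image_fraction: float  # share of patch pixels outside the image
    detectables: int              # number of detectable pixels in the patch
    similarity_failures: int      # detectable pixels failing the similarity check


@dataclass(frozen=True)
class Thresholds:                 # all default values are hypothetical
    min_box_width: float = 10.0
    max_scale_change: float = 0.2
    max_out_of_image: float = 0.5
    min_detectables: int = 20
    tau_lost: int = 100
    tau_bad: int = 30


def robustness_check(c: CheckInput, th: Thresholds = Thresholds()) -> str:
    """Classify the matching result as 'valid', 'bad' or 'lost'."""
    lost = (c.box_width < th.min_box_width                      # poor sampling density
            or abs(c.scale_ratio - 1.0) > th.max_scale_change   # scale change too high
            or c.out_of_image_fraction > th.max_out_of_image    # too many pixels outside
            or c.detectables < th.min_detectables               # too little structure
            or c.similarity_failures > th.tau_lost)             # similarity check failed
    if lost:
        return "lost"
    if c.similarity_failures > th.tau_bad:
        return "bad"
    return "valid"
```

Freezing `Thresholds` makes the shared default argument safe, since it cannot be mutated between calls.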
2. Examples of Detectables for Robustness Check
The term detectables denotes pixels whose absolute image gradient exceeds a certain threshold. An overall low gradient impedes successful patch/patch template matching.
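A detectable mask can be sketched as below, assuming central-difference gradients and the Euclidean gradient magnitude. The text only specifies a threshold on the absolute image gradient, so the gradient operator, the treatment of border pixels (marked non-detectable here) and the function name are assumptions.

```python
import math
from typing import List

Patch = List[List[float]]


def detectable_mask(patch: Patch, threshold: float) -> List[List[bool]]:
    """True where the gradient magnitude (central differences) exceeds threshold."""
    rows, cols = len(patch), len(patch[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (patch[r][c + 1] - patch[r][c - 1]) / 2.0
            gy = (patch[r + 1][c] - patch[r - 1][c]) / 2.0
            mask[r][c] = math.hypot(gx, gy) > threshold
    return mask
```

On a patch containing a vertical step edge, only the interior pixels next to the edge are marked, which mirrors the intent of concentrating the check on the object outline.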
3. Example of Robustness Check: Similarity Check
The basic idea of the test is to keep track over time of the local per-pixel gray value variance $\sigma^2(u, v)$ of the final patch template $\hat P_t$, grabbed based on the final estimated pose parameters $\hat s_t$, $\hat x_t$, $\hat y_t$, and to perform a $2\sigma$ (95% quantile) check on the local difference between the patch template and the old (e.g. search) template:
$$\big|T_t(u,v) - \hat P_t(u,v)\big| < \max\big(2\sigma(u,v),\, \sigma_{\min}\big) \tag{11}$$
We use $\sigma_{\min}$ as a minimum floor for the test, to account for the low sample size and to model non-Gaussian components. Note that the similarity check is only performed for pixels marked as detectable, since in general it is changes in the high-gradient object outline, rather than in homogeneous patch/patch template regions such as road pixels, that indicate whether the object has been redetected.
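The similarity check (11) can be sketched as follows. The text does not say how the variance is tracked over time; this sketch uses Welford's running mean and variance per pixel as one possible choice, stores patches as flat lists, and uses hypothetical names throughout.

```python
import math
from typing import List, Sequence


class PixelVariance:
    """Running per-pixel mean and variance (Welford's algorithm).

    One possible way to track sigma^2(u, v) over time; the text does not
    prescribe the estimator. Patches are flat lists of gray values.
    """

    def __init__(self, n_pixels: int) -> None:
        self.count = 0
        self.mean: List[float] = [0.0] * n_pixels
        self.m2: List[float] = [0.0] * n_pixels

    def update(self, patch: Sequence[float]) -> None:
        self.count += 1
        for i, value in enumerate(patch):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def sigma(self, i: int) -> float:
        """Sample standard deviation of pixel i (0 before two updates)."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2[i] / (self.count - 1))


def similarity_failures(template: Sequence[float], patch: Sequence[float],
                        detectable: Sequence[bool], stats: PixelVariance,
                        sigma_min: float) -> int:
    """Count detectable pixels violating (11): |T - P_hat| < max(2*sigma, sigma_min)."""
    failures = 0
    for i, (t, p, is_detectable) in enumerate(zip(template, patch, detectable)):
        if is_detectable and abs(t - p) >= max(2.0 * stats.sigma(i), sigma_min):
            failures += 1
    return failures
```

The returned count is the quantity compared against τ_lost and τ_bad in the three-class robustness check.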
Template Adaption
As the appearance of the tracked object may vary over time due to lighting conditions, distance, etc., the search template $T$ is adapted over time by means of low-pass filtering. If the sanity check passes, the new template $T_{t+1}$ is interpolated from the old search template $T_t$ and the patch template grabbed based on the final estimated pose parameters $\hat s_t$, $\hat x_t$, $\hat y_t$:
$$T_{t+1}(u,v) = \gamma\,T_t(u,v) + (1-\gamma)\,I_t(\hat x_t + \hat s_t u,\; \hat y_t + \hat s_t v) \tag{12}$$
where the adaption parameter $\gamma \in [0, 1]$ controls the adaption rate; larger values of $\gamma$ yield slower adaption.
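The update (12) is a per-pixel exponential smoothing and can be sketched as below, assuming patches stored as flat lists and a boolean `check_passed` derived from the robustness check (for example only for the result valid, which the text does not state explicitly). The function name is hypothetical.

```python
from typing import List, Sequence


def adapt_template(template: Sequence[float], patch: Sequence[float],
                   gamma: float, check_passed: bool) -> List[float]:
    """Low-pass template update (12): T_{t+1} = gamma*T_t + (1-gamma)*P_hat_t."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not check_passed:
        return list(template)  # keep the old template if the check fails
    return [gamma * t + (1.0 - gamma) * p for t, p in zip(template, patch)]
```

With $\gamma = 1$ the template never changes (constant template tracking); with $\gamma = 0$ it is replaced by the latest patch, which is the configuration most prone to drift.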
Template Tracking
We utilize a Kalman filter to stabilize the tracking result based on a filter model designed to support TTC estimation.
Filter Model
Assume the following notation: bounding box horizontal position $x$, bounding box width $w$, lateral 3D object position $\tilde X$ and longitudinal 3D object position $\tilde Z$ in the camera system, 3D object width $\tilde W$, and focal length $f$, represented in the rectified normal case of the image geometry, such that the following relation holds:
Since the actual object width $\tilde W$ is unknown and 3D coordinates are not of direct interest in this application, we define $W = 1$ and perform all computations in the resulting scaled system:
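Assuming the standard pinhole relations with $x$ measured from the principal point, which the text does not spell out, relation (13) and the scaled system would take the following form. The scaled symbols $X = \tilde X/\tilde W$ and $Z = \tilde Z/\tilde W$ are our notation for the coordinates in which $W = 1$.

$$
x = f\,\frac{\tilde X}{\tilde Z},\qquad w = f\,\frac{\tilde W}{\tilde Z}
$$

$$
X = \frac{\tilde X}{\tilde W},\quad Z = \frac{\tilde Z}{\tilde W}
\quad\Longrightarrow\quad
x = f\,\frac{X}{Z},\qquad w = \frac{f}{Z},\qquad Z = \frac{f}{w},\qquad X = \frac{x}{w}
$$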
System Model
With the time difference $\Delta t = t_{\text{new}} - t_{\text{old}}$, the filter model reads
Using the substitutions for $\tilde X$ and $\tilde Z$, the Jacobian with respect to the parameters is given by
Note that setting the mixed terms in the last row to zero decouples $x$ and $w$; this appears to yield more stable results, especially with respect to TTC estimation, as lateral influence is suppressed.
Observation Model
The presented formulation of the system model allows for a rather simple integration of the observations $w_{\text{obs}}$ and $x_{\text{obs}}$ obtained by template tracking:
$$w = w_{\text{obs}} \tag{22}$$
$$x = x_{\text{obs}} \tag{23}$$
with the Jacobian
Filtering of Vertical Image Position
The vertical object position is simply low-pass filtered to account for the high-frequency changes caused by ego pitch motion.
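The text does not specify the low-pass filter; a first-order exponential smoother is one simple choice and is sketched below with a hypothetical coefficient `alpha` and function name.

```python
from typing import Iterable, List


def low_pass(values: Iterable[float], alpha: float) -> List[float]:
    """First-order IIR low-pass y_k = alpha*y_{k-1} + (1-alpha)*z_k,
    initialized with the first observation."""
    out: List[float] = []
    for z in values:
        out.append(z if not out else alpha * out[-1] + (1.0 - alpha) * z)
    return out
```

Larger values of `alpha` suppress pitch-induced jitter more strongly, at the price of a slower response to genuine vertical motion.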
Egomotion Compensation
As the actual world scale is unknown, we refrain from compensating the ego velocity. This is of minor harm, however, as TTC computation only requires relative speed information. The ego yaw rate $\beta$ is compensated by computing the resulting pixel displacement $\Delta x_{\text{ego}}$ of the box center position $x$ using the pinhole model:
The displacement $\Delta x_{\text{ego}}$ is subtracted from the total observed displacement when updating the filter parameters.
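Because the formula itself is not reproduced here, the following is only a sketch of one pinhole-based compensation. It assumes $x$ is measured from the principal point and $f$ is given in pixels, that the camera rotates by $\beta\,\Delta t$ between frames, and a sign convention in which positive yaw shifts image content toward smaller $x$; the function names are hypothetical. Differentiating $x = f \tan\theta$ gives the small-angle approximation $\Delta x_{\text{ego}} \approx -(f + x^2/f)\,\beta\,\Delta t$.

```python
import math


def ego_yaw_pixel_shift(x: float, f: float, yaw_rate: float, dt: float) -> float:
    """Horizontal pixel shift of image column x caused by ego yaw over dt.

    x is measured from the principal point and f is the focal length, both
    in pixels; yaw_rate is in rad/s. Assumed sign convention: a positive yaw
    angle moves image content toward smaller x.
    """
    viewing_angle = math.atan2(x, f)          # pinhole: x = f * tan(angle)
    return f * math.tan(viewing_angle - yaw_rate * dt) - x


def compensated_displacement(observed_dx: float, x: float, f: float,
                             yaw_rate: float, dt: float) -> float:
    """Remove the ego-yaw part from the observed box-center displacement."""
    return observed_dx - ego_yaw_pixel_shift(x, f, yaw_rate, dt)
```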
TTC Estimation
Constant Velocity TTC
Assuming constant velocity ($Z'' = 0$), TTC estimation is based on the formula
From substitution of Z using relation (13) and with the derivative
the unknown width $\tilde W$ vanishes, and the TTC can be represented entirely in terms of the Kalman state parameters.
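Under the scaled pinhole relation $Z = f/w$ (our reading of relation (13), an assumption), one obtains $Z' = -f\,w'/w^2$ and therefore $t_{\text{TTC}} = -Z/Z' = w/w'$, in which both $f$ and $\tilde W$ cancel. A minimal sketch, assuming the Kalman state provides the box width $w$ and its rate $w'$, with a hypothetical function name:

```python
import math


def ttc_constant_velocity(w: float, w_dot: float) -> float:
    """Constant-velocity TTC from box width w and its rate w_dot.

    t_TTC = -Z / Z' with Z = f / w gives t_TTC = w / w_dot; f and the
    unknown object width cancel. A non-growing box means no predicted contact.
    """
    if w_dot <= 0.0:
        return math.inf
    return w / w_dot
```

For example, a box 50 px wide that grows by 10 px/s yields a TTC of 5 s, independent of the focal length and of the real object width.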
Constant Acceleration TTC
The constant acceleration assumption leads to a slightly more complicated procedure, as it requires solving the corresponding equation of motion for $t_{\text{TTC}}$.
Assuming $\tilde Z'' \neq 0$ allows reshaping to
where the transformation follows from cancelling out the unknown object width $\tilde W$. The solution results from applying the pq-formula
and can again be represented in terms of the Kalman state parameters using relation (13).
Of the two solution candidates, the smaller non-negative value is chosen, if one exists.
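Assuming the equation to be solved is the constant-acceleration motion $Z + Z' t + \tfrac{1}{2} Z'' t^2 = 0$ (the original equations are not reproduced here), dividing by $Z''/2$ gives $t^2 + p\,t + q = 0$ with $p = 2Z'/Z''$ and $q = 2Z/Z''$, so that $t_{1,2} = -p/2 \pm \sqrt{p^2/4 - q}$. Since $Z$, $Z'$ and $Z''$ all scale with $\tilde W$, the ratios are independent of it. The sketch below takes the distance state in the scaled system; falling back to the constant-velocity solution when $Z'' = 0$ and returning infinity when no non-negative root exists are our own choices.

```python
import math


def ttc_constant_acceleration(Z: float, Z_dot: float, Z_ddot: float) -> float:
    """Smallest non-negative root of Z + Z_dot*t + 0.5*Z_ddot*t**2 = 0.

    Z, Z_dot, Z_ddot may be given in the scaled system (W = 1): the unknown
    width only scales all three and cancels. Returns math.inf if no
    non-negative root exists.
    """
    if Z_ddot == 0.0:                       # degenerate case: constant velocity
        if Z_dot == 0.0:
            return math.inf
        t = -Z / Z_dot
        return t if t >= 0.0 else math.inf
    p = 2.0 * Z_dot / Z_ddot                # pq-formula coefficients
    q = 2.0 * Z / Z_ddot
    disc = (p / 2.0) ** 2 - q
    if disc < 0.0:                          # e.g. braking object that stops in time
        return math.inf
    root = math.sqrt(disc)
    candidates = [t for t in (-p / 2.0 - root, -p / 2.0 + root) if t >= 0.0]
    return min(candidates) if candidates else math.inf
```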
Distance to Front Bumper Integration
The TTC computation described in the previous sections yields the time to contact between object and camera. This is a good approximation of the vehicle contact time if the distance from the camera to the front bumper is small (such as for flat-nose trucks). Otherwise, the bumper-to-camera distance induces a distinct bias, especially when the distance to the object is small. To correct this bias, the actual width of the object $\tilde W$ needs to be known in order to transfer the front-bumper-to-camera distance $d_{\text{bump}}$ into the 3D space of the Kalman filter.
For the constant velocity assumption, the bias correction is given by
For the constant acceleration assumption, the offset must be considered in the square root part of (35).
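Under the same assumptions as above (scaled system with $Z = \tilde Z / \tilde W$), the bumper offset maps to $d_{\text{bump}}/\tilde W$ and simply reduces the distance, $Z \to Z - d_{\text{bump}}/\tilde W$. For constant velocity this gives $t = (Z - d_{\text{bump}}/\tilde W)/(-Z')$; for constant acceleration only $q = 2Z/Z''$ changes, which is consistent with the offset appearing only under the square root. A hedged sketch with hypothetical names, where `W_true` is an externally supplied estimate of the real object width:

```python
import math


def _smallest_nonnegative_root(Z: float, Z_dot: float, Z_ddot: float) -> float:
    """Smallest t >= 0 with Z + Z_dot*t + 0.5*Z_ddot*t**2 = 0, else inf."""
    if Z_ddot == 0.0:
        if Z_dot == 0.0:
            return math.inf
        t = -Z / Z_dot
        return t if t >= 0.0 else math.inf
    p, q = 2.0 * Z_dot / Z_ddot, 2.0 * Z / Z_ddot
    disc = p * p / 4.0 - q          # the bumper offset only enters via q
    if disc < 0.0:
        return math.inf
    roots = (-p / 2.0 - math.sqrt(disc), -p / 2.0 + math.sqrt(disc))
    valid = [t for t in roots if t >= 0.0]
    return min(valid) if valid else math.inf


def ttc_to_bumper(Z: float, Z_dot: float, Z_ddot: float,
                  d_bump: float, W_true: float) -> float:
    """TTC to the front bumper; Z, Z_dot, Z_ddot are in the scaled (W = 1) system,
    d_bump and W_true in meters. Use Z_ddot = 0 for constant velocity."""
    d_scaled = d_bump / W_true      # transfer the bumper distance into the scaled space
    return _smallest_nonnegative_root(Z - d_scaled, Z_dot, Z_ddot)
```

For instance, with a scaled distance of 20, a closing rate of 2 per second, a 2 m bumper offset and a 2 m wide object, the camera TTC of 10 s shrinks to 9.5 s at the bumper.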
While this invention has been described in terms of the preferred embodiments thereof, it is not intended to be so limited, but rather only to the extent set forth in the claims that follow.
Number | Date | Country | Kind |
---|---|---|---|
17182718 | Jul 2017 | EP | regional |
Number | Date | Country |
---|---|---|
20190026905 A1 | Jan 2019 | US |