This application claims priority to European Patent Application Number 21163577.6, filed Mar. 19, 2021, the disclosure of which is hereby incorporated by reference in its entirety herein.
Kalman filters are conventionally used in perception systems to provide, by predictions, stable (linear) tracking of objects across scenes based on observations from one or more sources. To this end, Kalman filters are implemented by an algorithm programmed in accordance with an internal model. In order to balance their internal model predictions with external observations, Kalman filters require estimates of the observation noise of each of the sources. These sources are typically constituted by sensors, such as object detectors, connected to the Kalman filter like peripheral devices. Large noise estimations may result in the tracker giving more credence to predictions issued from its own internal state, while low noise estimations may give more credence to the observations.
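As a minimal sketch of the balance described above, the following Python snippet implements a one-dimensional, constant-position Kalman filter. The class name `ScalarKalman`, the constant-position model and the numeric values are illustrative assumptions and are not taken from the disclosure. The sketch only shows that the observation noise variance `r` controls how much credence the filter gives to a new observation compared with its own prediction.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional constant-position Kalman filter (illustrative sketch)."""
    x: float  # current state estimate
    p: float  # variance of the estimate (its precision, in the document's terms)
    q: float  # process noise variance: uncertainty of the internal model
    r: float  # observation noise variance of the sensor

    def predict(self) -> None:
        # Constant-position model: the state is carried over, its uncertainty grows by q.
        self.p += self.q

    def update(self, z: float) -> float:
        # Kalman gain: the fraction of the innovation (z - x) accepted into the state.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return k


if __name__ == "__main__":
    trusting = ScalarKalman(x=0.0, p=1.0, q=0.0, r=0.01)   # low noise estimate
    skeptical = ScalarKalman(x=0.0, p=1.0, q=0.0, r=100.0)  # high noise estimate
    for f in (trusting, skeptical):
        f.predict()
        f.update(10.0)
    print(round(trusting.x, 2), round(skeptical.x, 2))  # prints 9.9 0.1
```

With the same prior, a small `r` pulls the estimate almost all the way to the observation, whereas a large `r` leaves it close to the prediction.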
Noise estimations are usually the result of tuning and experimentation, and setting them is considered a form of “black art.” Trackers based on Kalman filters must be manually tuned by experts, a process which is labor-intensive and error-prone. The tuning process must be repeated whenever the sensor or object detector is modified, further slowing down the development cycle and increasing the risk of introducing errors. Modern vehicles are equipped with multiple sensors of each type, so the process must often be repeated for each of them.
Modern machine-learning-based object detectors often report a confidence value with each detected object, which correlates with how likely the detection is to be correct/accurate. These confidence values could be used by the tracker to give higher or lower credence to each detection, but there is currently no straightforward way to do this. Furthermore, while confidence values might be used to filter or reweight the outputs of a single detector, they are even more difficult to apply when combining the outputs of multiple detectors. This is because confidences reported by different object detectors do not necessarily lie in the same ranges or associate the same magnitude of errors with each value. Making the confidences of multiple object detectors comparable thus requires specifically designing the detectors to report comparable confidence values, reducing the reusability of existing algorithms.
Accordingly, there may be a need to improve existing detectors in order to at least partially overcome the aforementioned issues and drawbacks, in particular to simplify the calibration process of trackers and to improve the reliability of object observations, especially when several sensors are used with the tracker.
The present disclosure relates to the field of trackers which are typically used for pattern recognition, e.g., in the automotive sector, and to each of which at least one object detector or sensor is connected for providing noisy observation data. The tracker of the present disclosure is provided with a Kalman filter to improve tracking of a time-varying object across a scene. Based on such equipment, the present disclosure makes use of high-quality reference data in order to automatically collect error or noise statistics of at least one sensor and use the statistics to calibrate the Kalman filter of the tracker. More specifically, the present disclosure relates to a method for automatically determining and using such noise statistics, as well as a non-transitory computer-readable medium and a tracker for implementing the method.
To address the above concerns with existing detectors, the present disclosure suggests an automatic way to derive the noise calibration for Kalman-based object detectors. More specifically, it suggests a method for automatically determining and using noise statistics of noisy observation data provided by at least one sensor connected to a tracker, in order to improve tracking of a time-varying object across a scene using a Kalman algorithm. From the noisy observation data and from a corrected previous estimation, the Kalman algorithm provides an estimation of dynamic parameters of the object together with a precision of that estimation. The method comprises receiving reference data from a data source that is external to the Kalman algorithm and connected to the tracker, deriving from said reference data the noise statistics reflecting errors made by the sensor, and using said noise statistics as setting parameters of the Kalman algorithm.
The present disclosure allows for more accurate tracking of objects and improves tracker performance. In addition, if the tracker comprises several sensors, i.e., several detectors, it allows these detectors to be fused as ‘black boxes’ without any assumptions about their internal functioning. This makes confidence-based sensor fusion possible with off-the-shelf software components.
In one embodiment, the reference data relates to the scene from which the noisy observation data has been obtained by the sensor, and the noise statistics are derived from the reference data by comparing said reference data with the noisy observation data provided by the sensor.
Preferably, reference data comprises annotated observation data received from the data source, and the reference data is used to calibrate the tracker.
According to one embodiment, noise statistics comprise a precision of the sensor which defines the probability that the noisy observation data is correct.
Preferably, the precision of the sensor is used to determine whether the related noisy observation data provided by said sensor is to be considered by the tracker for tracking said object or is to be discarded.
In one embodiment, the noisy observation data relates to a plurality of measured variables of the object from which at least one pair of measured variables is identifiable, and said noise statistics comprise at least one of an expected variance of each of the measured variables and an expected covariance between each pair of measured variables.
Preferably, the measured variables relate to at least one of dimensional data of the object, positioning data of the object within the scene, distance data between the object and the sensor, an absolute or relative speed data of the object, and an absolute or relative acceleration data of the object.
According to a preferred embodiment, the expected covariance between each pair of measured variables is provided in a covariance matrix or vector form and is used as an input of the Kalman algorithm for defining an observation noise of the noisy observation data.
In one embodiment, the sensor reports a confidence value related to the probability that the noisy observation data is correct or accurate, and the method further comprises using said confidence value as a weighting parameter applied to said noisy observation data via an observation noise that is input into the Kalman algorithm.
In a further embodiment, the sensor reports a confidence value related to the probability that the noisy observation data is correct or accurate, and the method further comprises comparing said confidence value with a plurality of predetermined confidence intervals, each of which is associated with an error parameter, and including in said noise statistics the error parameter associated with the confidence interval that contains the confidence value.
Preferably, the error parameter is determined during a preprocessing phase in which the sensor is run over a set of reference data to collect a plurality of confidence values, said reference data is separated into subsets depending on said confidence values, the noisy observation data provided by the sensor is compared, for each subset, with the related reference data, and an error covariance and the precision of the sensor are computed as the data forming said error parameter.
According to one embodiment, the sensor reports a confidence value quantifying a probability that the noisy observation data is correct or accurate, and the method further comprises using a regression model, which establishes a relationship between an error and the confidence value, to determine the error corresponding to the reported confidence value, and including in said noise statistics the error provided by the regression model for the reported confidence value.
Preferably, the regression model is determined on the basis of a scatter plot obtained from a plurality of detections of the object by the sensor, wherein each point of the scatter plot is derived from a confidence value and an error determined at each detection.
The present disclosure further relates to a non-transitory computer-readable medium comprising program instructions for causing a processor to execute the method according to any of the method embodiments or according to any possible combination of its embodiments.
The present disclosure also relates to a tracker for implementing the aforementioned method according to any of its embodiments or according to any possible combination of its embodiments, said tracker comprising a processing unit for storing and running the Kalman algorithm and steps of the method, a first communication interface for connecting at least one sensor providing noisy observation data, and a second communication interface for receiving reference data from an external data source.
Other embodiments and advantages are disclosed hereafter in the detailed description.
The disclosure and the embodiments suggested in the present disclosure are to be taken as non-limitative examples and are better understood with reference to the attached Figures in which:
As schematically shown in
Without going into details, the Kalman algorithm 10 is based on a two-step process as schematically depicted in
At its output, the Kalman algorithm 10 provides output data 13 regarding the observed object, in particular dynamic parameters defining the object 40 or related to it at an instant t. The output data 13 includes an estimation 14 and a precision 15 of the estimation 14. In
The Kalman algorithm 10 further takes as input a corrected previous estimation 13′ at the instant t−1. Accordingly, the corrected previous estimation 13′ is also identified in
Modern machine-learning-based object sensors, most notably neural networks, often report a confidence value 22 with each detected object 40. The confidence value 22, especially each confidence value 22a, 22b reported by each of the sensors 20a, 20b (
It is to be noted that the term precision is used here in accordance with its statistical definition. Accordingly, the precision is to be regarded as the fraction of relevant instances among the retrieved instances. In other words, it is the probability that an observation issued from the sensor 20 is correct. There are two possible outcomes regarding the output issued by a sensor 20 from measurements applied to an object 40. Either the observations output from the sensor 20 include some errors, e.g., regarding the location or the dimensions of the detected object 40, or the sensor 20 has detected a false object, since no object 40 actually exists in the scene. In the first case, the term “true positive” is used, while in the second case the term “false positive” is used.
The concept of “false positive” and “false negative” is also shown in connection with
From the foregoing, the precision may be regarded as being the ratio of “true positives” to the total of “true positives” and “false positives”. In other words, if the “true positives” are noted TP and the “false positives” are noted FP, as shown in the example of
In the Kalman filter shown in
In its main approach, the method of the present disclosure is generally described on the basis of
It is to be noted that the reference data 51 can typically be received at the tracker 100, e.g., via the second communication interface 105, and that deriving and using the noise statistics 30 can be carried out by the tracker 100, in particular by the processing unit 102, e.g., by the Kalman filter 101, which may be considered part of the processing unit 102. It is also to be pointed out that the noise statistics 30 are not to be confused with the confidence value 22 reported by the sensor 20. Indeed, the confidence value 22 relates to data output by the sensor, as shown in
The data source 50, from which the reference data 51 is provided, is typically an external data source 50, in particular a data source external to the Kalman algorithm 10, so that the reference data does not come from the Kalman algorithm 10. Accordingly, the data source 50 is external to the Kalman filter 101 and preferably external to the processing unit 102. More preferably, the data source is external to the tracker 100. Therefore, the tracker 100 may typically be connected to the data source 50, e.g., via a wired or wireless connection through the second communication interface 105. In a preferred embodiment, the data source 50 is a remote database or server to which the tracker 100 can connect, e.g., via a wide area network such as the Internet.
Since the noise statistics 30 both reflect errors made by the sensor 20 and are derived from the reference data 51, the reference data 51 can be regarded as ground truth data, i.e., data reflecting the absolute truth. The nature and/or format of the reference data 51 may depend on the nature and/or format of the noisy observation data 21, so that both are of the same type and can be properly compared. Therefore, and in accordance with a preferred embodiment, the reference data relates to the scene from which the noisy observation data 21 has been obtained by the sensor 20. In other words, the reference data 51 may be regarded as a description of the scene in the same domain as the outputs of the sensor 20, i.e., the noisy observation data 21. For example, if the sensor outputs data regarding a plurality of bounding boxes, each of which may be defined through several object parameters such as center, dimensions and/or rotation, then the reference data 51 can also relate to bounding boxes, in particular to the same object parameters.
Preferably, deriving the noise statistics 30 from the reference data 51 is obtained by comparing the reference data 51 with the noisy observation data 21 provided by the sensor 20. According to the above example, the noise statistics 30 may be computed by comparing the detected bounding boxes with respect to the reference bounding boxes.
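The comparison can be sketched in Python as follows. The sketch assumes boxes given as hypothetical `(center_x, center_y, width, height)` tuples and a simple greedy nearest-center matching with a distance gate `max_dist`; both choices are assumptions made for illustration, and a real system might use another association method (e.g., an optimal assignment). Matched detections yield residual vectors (detection minus reference), and detections that match no reference object are counted as false positives.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height): assumed layout


def match_and_residuals(detections: List[Box], references: List[Box],
                        max_dist: float = 2.0) -> Tuple[List[Tuple[float, ...]], int, int]:
    """Greedy nearest-center matching of detections to reference boxes (hypothetical helper).

    Returns the residuals (detection minus reference) of matched detections,
    the number of true positives and the number of false positives.
    """
    unused = list(range(len(references)))
    residuals: List[Tuple[float, ...]] = []
    false_positives = 0
    for det in detections:
        best: Optional[int] = None
        best_d = max_dist
        for j in unused:
            ref = references[j]
            d = math.hypot(det[0] - ref[0], det[1] - ref[1])
            if d <= best_d:  # closest unused reference within the distance gate
                best, best_d = j, d
        if best is None:
            false_positives += 1  # no reference object nearby: the sensor saw a false object
            continue
        unused.remove(best)  # each reference object is matched at most once
        ref = references[best]
        residuals.append(tuple(dv - rv for dv, rv in zip(det, ref)))
    return residuals, len(residuals), false_positives
```

The residuals feed the variance and covariance estimates, while the true-positive and false-positive counts feed the precision of the sensor.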
Recently, the advent of machine learning for sensory perception has spurred a surge in the collection and annotation of sensor data used to train machine learning models. The annotated data offers a unique opportunity to automatically measure the error statistics 30 of object sensors 20 and to use them, for example, to calibrate trackers 100. Accordingly, in one embodiment the reference data 51 includes annotated observation data received from the data source 50. Annotated data, also referred to as “labeled data”, is data in which the desired output has been marked manually. For example, annotated data can be bounding boxes manually drawn by a human around the objects in the scene. Accordingly, annotated data may typically be geometrical data (dimensions, distances, angles, segments, curves, coordinates, etc.). Occasionally, annotated data relates to dynamical data (speeds, accelerations, etc.) or semantic data (e.g., keywords). In any case, annotated data is particularly useful in machine learning methods, where data pairs (each comprising an observation and an annotation) are provided to the algorithm, which seeks a mapping between the inputs and the desired output. Usually, the annotated data is in the same format as the noisy observation data 21 output from the sensor 20. The only difference may relate to the confidence value, which is typically omitted from the annotated data since annotations are treated as “absolute truth”.
According to one embodiment, the present disclosure makes use of the reference data 51, preferably high-quality reference data 51, in order to automatically calibrate the tracker 100. Such a calibration may be carried out automatically during an initialization phase, also called a calibration phase.
To perform such determinations, noisy observation data 21 are compared to reference data 51 which may typically refer to annotated data relating to the same scene as that on which the sensor 20 is used for capturing the noisy observation data 21. Such a scene may be used for validation purposes during the development of the sensor 20. As schematically shown in
The case where the detection made by the sensor 20 provides a high confidence HC is shown through two examples in which there is a slight offset between the noisy observation data 21 and the reference data 51. On the other hand, the right side of
As illustrated in
Preferably, the precision P of the sensor 20 is used to determine whether the related noisy observation data 21 provided by the sensor 20 is to be considered by the tracker 100 for tracking the object 40 or is to be discarded.
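A minimal sketch of this decision follows, assuming that the precision is estimated from true-positive and false-positive counts collected against the reference data 51, and that the tracker is configured with a maximum tolerated false-positive rate. The function names and the threshold parameter are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Precision P = TP / (TP + FP): probability that a reported detection is a real object."""
    if tp + fp == 0:
        raise ValueError("precision is undefined without any detection")
    return tp / (tp + fp)


def accept_observation(p: float, max_false_positive_rate: float) -> bool:
    """Keep a detection only if its expected false-positive rate (1 - P) is tolerable."""
    return 1.0 - p <= max_false_positive_rate


if __name__ == "__main__":
    p = precision(tp=90, fp=10)
    print(p, accept_observation(p, 0.2), accept_observation(0.6, 0.2))  # prints 0.9 True False
```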
According to one embodiment, the noisy observation data 21 relates to a plurality of measured variables 41 of the object 40 from which at least one pair of measured variables 41 is identifiable, and the noise statistics 30 include at least one of an expected variance of each of the measured variables 41 and an expected covariance between each pair of measured variables 41.
Such an embodiment is illustrated in
According to a general approach, the measured variables 41 may relate to at least one of dimensional data of the object 40, positioning data of the object 40 within the scene, or distance data between the object 40 and the sensor 20. Additionally, or alternatively, the measured variables 41 may relate to at least one of an absolute or relative speed data of the object 40 or an absolute or relative acceleration data of the object 40.
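For N matched detections with error (residual) vectors e^(n) = (e_1^(n), ..., e_m^(n)) over m measured variables 41, the expected variances and covariances can be written as below, using the mean of squared deviations as the variance definition. The symbols e, N, m and σ are introduced here for illustration only; vk denotes the noise covariance matrix named in the text.

```latex
$$\bar{e}_i = \frac{1}{N}\sum_{n=1}^{N} e_i^{(n)}, \qquad
\sigma_i^2 = \frac{1}{N}\sum_{n=1}^{N}\bigl(e_i^{(n)} - \bar{e}_i\bigr)^2$$

$$\sigma_{ij} = \frac{1}{N}\sum_{n=1}^{N}\bigl(e_i^{(n)} - \bar{e}_i\bigr)\bigl(e_j^{(n)} - \bar{e}_j\bigr), \qquad
v_k = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} & \sigma_{m2} & \cdots & \sigma_m^2
\end{pmatrix}$$
```

The diagonal of vk holds the expected variances (σ_ii = σ_i^2), and the matrix is symmetric since σ_ij = σ_ji.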
The variance is a measure of the dispersion of the values of a sample. It is determined relative to the mean of the measured values, in particular as the mean of the squares of the deviations from that mean. The covariance provides the degree of dependence between two variables, i.e., between a pair of measured variables 41. The noise covariance is the error covariance, in other words the precision of the estimation, which may be represented through an error covariance matrix, or a noise covariance matrix identified by vk in
According to one embodiment, the expected covariance between each pair of measured variables 41 is provided in the form of a covariance matrix vk or a vector and is used as an input of the Kalman algorithm 10 for defining an observation noise of the noisy observation data 21. In other words, the observation noise is a noise statistic 30 that can be used as a setting parameter of the Kalman algorithm 10. As shown in
According to another embodiment, the sensor 20 reports a confidence value 22, depicted in
In this case, the confidence value 22 is used indirectly to weight the noisy observation data 21. Such an operation may be carried out by any routine within the processing unit 102 or possibly within the Kalman filter 101. Noisy observation data 21 with a high confidence value 22 may cause a low noise estimation to be provided to the Kalman algorithm 10. Accordingly, the tracker 100 may internally give a higher weight to that detection, because the noise estimation largely determines the weight given to a detection.
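One possible way to let the confidence value act through the observation noise is sketched below. The inverse scaling `R = base_variance / confidence` is purely an illustrative assumption: the disclosure does not prescribe this mapping, and calibrated alternatives are described in the following embodiments. The sketch only shows that a lower noise estimate yields a larger Kalman gain, i.e., more weight for the detection.

```python
def confidence_scaled_noise(base_variance: float, confidence: float,
                            floor: float = 1e-3) -> float:
    """Hypothetical mapping: a higher confidence gives a lower observation-noise variance.

    R = base_variance / confidence is an illustrative assumption, not a calibrated law.
    """
    c = min(max(confidence, floor), 1.0)  # clamp the confidence into [floor, 1]
    return base_variance / c


def kalman_gain(prior_variance: float, obs_variance: float) -> float:
    """Scalar Kalman gain: weight of the observation relative to the prediction."""
    return prior_variance / (prior_variance + obs_variance)


if __name__ == "__main__":
    for conf in (0.9, 0.2):
        r = confidence_scaled_noise(base_variance=1.0, confidence=conf)
        print(conf, round(kalman_gain(1.0, r), 3))
```

With a unit prior variance, the gain drops from roughly 0.47 at confidence 0.9 to roughly 0.17 at confidence 0.2.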
According to another embodiment schematically shown in
Again, such operations may be carried out by any routine within the processing unit 102 or possibly within the Kalman filter 101. Furthermore, it is to be understood from the aforementioned operations that the confidence value 22 is assigned to one confidence interval 23 among a plurality of predetermined confidence intervals 23 which, in
Preferably, the error parameter 25 is determined during a preprocessing phase in which the sensor 20 is run over a set of reference data 51 to collect a plurality of confidence values 22, and the reference data 51 is separated into subsets depending on the confidence values 22. For each subset, the noisy observation data 21 provided by the sensor 20 is compared with the related reference data 51, and an error covariance and the precision of the sensor are computed as the data forming the error parameter 25.
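The preprocessing phase can be sketched in Python as below. The sketch assumes calibration samples given as `(confidence, residual)` pairs, in which the residual is `None` when the detection matched no reference object (a false positive), and confidence intervals defined by a sorted list of interval boundaries `edges`. The function names and data layout are hypothetical; the covariance uses the mean of squared deviations (division by N).

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Residual = Tuple[float, ...]
Sample = Tuple[float, Optional[Residual]]  # (confidence, residual or None for a false positive)


def covariance(residuals: Sequence[Residual]) -> List[List[float]]:
    """Covariance matrix of residual vectors, using the mean of squared deviations (1/N)."""
    n, m = len(residuals), len(residuals[0])
    mean = [sum(r[i] for r in residuals) / n for i in range(m)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in residuals) / n
             for j in range(m)] for i in range(m)]


def build_calibration(samples: Sequence[Sample], edges: Sequence[float]) -> Dict[int, dict]:
    """Split samples by confidence interval and compute one error parameter per interval.

    Interval k holds confidences c with edges[k-1] <= c < edges[k] (edges sorted ascending).
    """
    groups: Dict[int, List[Sample]] = {}
    for conf, res in samples:
        groups.setdefault(bisect_right(edges, conf), []).append((conf, res))
    table: Dict[int, dict] = {}
    for idx, group in groups.items():
        matched = [res for _, res in group if res is not None]
        table[idx] = {
            "precision": len(matched) / len(group),  # TP / (TP + FP) within this interval
            "covariance": covariance(matched) if matched else None,
        }
    return table


def lookup(table: Dict[int, dict], edges: Sequence[float], confidence: float) -> Optional[dict]:
    """Error parameter of the interval containing the reported confidence (None if unseen)."""
    return table.get(bisect_right(edges, confidence))
```

At run time, `lookup` returns the precision and error covariance of the interval containing the reported confidence; an interval that received no calibration samples yields `None`.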
Also, as mentioned earlier, such operations may be carried out by any routine within the processing unit 102 or possibly within the Kalman filter 101. The computed error covariance (noise covariance) is preferably in the form of a matrix, as shown in
According to another embodiment, the sensor 20 reports a confidence value 22 quantifying a probability that the noisy observation data 21 is correct or accurate, and the method further includes using a regression model, which establishes a relationship between an error and the confidence value 22, to determine the error corresponding to the reported confidence value 22, and including in the noise statistics 30 the error provided by the regression model for the reported confidence value 22.
As mentioned before, such operations may be carried out by any routine within the processing unit 102 or possibly within the Kalman filter 101. In addition, the aforementioned regression model can, for example, be used instead of a table providing a mapping between a first column listing the confidence values 22 and a second column listing the related errors mapped to these confidence values 22. Using the data of such a table, it is also possible to draw a scatter plot on a Cartesian graph in which the values of the two columns are reported on the two axes of the graph.
Accordingly, the regression model may be determined on the basis of a scatter plot obtained from a plurality of detections (i.e., noisy observation data 21) of the object 40 by the sensor 20, wherein each point of the scatter plot is derived from a confidence value 22 and an error determined at each detection.
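The disclosure does not fix the form of the regression model. As an assumption for illustration, the following sketch fits a straight line `error = a * confidence + b` to the scatter plot by ordinary least squares and clamps predictions at zero; the function names are hypothetical, and a different functional form (e.g., a monotone or piecewise model) may fit real sensors better.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of error = a * confidence + b over (confidence, error) points."""
    n = len(points)
    mean_c = sum(c for c, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    s_cc = sum((c - mean_c) ** 2 for c, _ in points)
    if s_cc == 0.0:
        raise ValueError("at least two distinct confidence values are required")
    s_ce = sum((c - mean_c) * (e - mean_e) for c, e in points)
    a = s_ce / s_cc
    return a, mean_e - a * mean_c


def predicted_error(model: Tuple[float, float], confidence: float) -> float:
    """Error expected for a reported confidence; clamped at zero since errors are non-negative."""
    a, b = model
    return max(a * confidence + b, 0.0)
```

Each scatter-plot point pairs the confidence of one detection with the error measured for it against the reference data 51, for instance a position error or its square.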
As suggested in one embodiment, the noise statistics 30 may be regarded as confidence-dependent noise statistics if the confidence values 22 reported by the sensors 20 are used. These noise statistics 30, or confidence-dependent noise statistics, have two main applications when provided as inputs to a tracker 100, especially to the Kalman algorithm 10 of the tracker 100. The first application is to use the confidence values 22 to improve the tracking performance of the Kalman tracker 100, and the second application is to use the confidence values 22 to fuse multiple sensors 20. Indeed, it is sometimes worthwhile to subdivide the covariance estimation across different axes. For example, a LIDAR-based object sensor 20 may accurately predict the orientation of an object 40 such as a vehicle, but the same sensor 20 may make many more mistakes on a pedestrian.
As first use case aiming to use confidence information for better tracking,
Indeed, as a first way, the expected covariance matrix vk can be fed into the tracker 100 as an “observation noise covariance” parameter. When measured correctly, this parameter ensures that the tracker 100 optimally weights its internal state against the external observations provided through the noisy observation data 21. The lower the predicted noise of a particular variable in an observation, the higher the weight assigned to the observation. In contrast to typical applications of Kalman filters, in which all observations are assigned similar errors or the errors are determined heuristically each time, in the present disclosure they are predicted separately for every detection based on actual measurements.
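A minimal sketch of feeding a per-detection covariance matrix vk into the update step follows. It assumes a two-dimensional position state that is observed directly (observation matrix equal to the identity) and uses plain-Python 2x2 matrix helpers; the function names are hypothetical. The gain K = P (P + R)^-1 shrinks when the detection's covariance R is large, so an uncertain detection moves the track less.

```python
from typing import List, Tuple

Mat = List[List[float]]
Vec = Tuple[float, float]


def inv2(m: Mat) -> Mat:
    """Inverse of a 2x2 matrix (assumed non-singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul2(a: Mat, b: Mat) -> Mat:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def update_position(x: Vec, p: Mat, z: Vec, r: Mat) -> Tuple[Vec, Mat]:
    """Kalman update with H = I and a per-detection observation noise covariance r (vk)."""
    s = [[p[i][j] + r[i][j] for j in range(2)] for i in range(2)]  # innovation covariance
    k = matmul2(p, inv2(s))                                         # Kalman gain P S^-1
    innov = (z[0] - x[0], z[1] - x[1])
    x_new = (x[0] + k[0][0] * innov[0] + k[0][1] * innov[1],
             x[1] + k[1][0] * innov[0] + k[1][1] * innov[1])
    i_minus_k = [[(1.0 if i == j else 0.0) - k[i][j] for j in range(2)] for i in range(2)]
    return x_new, matmul2(i_minus_k, p)  # updated state and covariance (I - K) P
```

Off-diagonal entries of `r` let a detection express correlated errors between the two measured variables, as the covariance matrix vk does.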
As a second way, the expected precision P can be used to determine whether the noisy observation data 21 may be incorporated into a track, especially into tracking data 110 output by the tracker 100. It can be determined on the basis of the maximum rate of false-positives FP that the tracker allows according to predefined setting parameters.
Thus, by providing a separate noise covariance matrix vk for each noisy observation data 21, the tracker 100 naturally uses higher uncertainty estimates when incorporating the more uncertain detections, i.e., the more uncertain noisy observation data 21. These uncertainty estimates are also carried over into the track noise estimates maintained by the tracker 100, resulting in better decisions on when a track is no longer reliable. The noise covariance matrix vk may be obtained automatically during a calibration phase and thus requires no manual tuning of parameters.
The second use case, mentioned above, aims to use confidence information for fusing multiple object sensors 20. Such a case is schematically illustrated through
Accordingly, fusing together the outputs of multiple processes, even when applied to the same noisy observation data 21, can provide superior performance. However, there are significant difficulties when trying to fuse the confidence values 22 provided by separate object detectors, such as the sensors 20a, 20b, since there is no standard way to report confidence values 22, and thus no reliable way to compare the confidences of different sensors 20a, 20b. Indeed, the reported confidence values 22 may differ in that: a) the confidences lie in completely different ranges; b) the confidences lie in the same range but are based on different scales, e.g., a confidence value of 0.6 might denote a high confidence for the sensor 20a, whereas the same value (0.6) might denote a low confidence for the other sensor 20b; or c) the scales are exactly the same, but the sensors 20a, 20b themselves have different prediction qualities, e.g., a low-confidence observation provided by a “good” sensor 20 can be comparable to a high-confidence observation issued from a “bad” sensor 20.
Therefore, comparing confidence values 22 of multiple sensors 20 may require designing the sensors to output the same “type” of confidences. However, doing this can require careful engineering and may prohibit using off-the-shelf software components.
Advantageously, the present disclosure also efficiently solves the problem of comparing confidences of different sensors 20 by transforming the confidence values 22 issued by these sensors 20 into a common coordinate system in which they can easily be compared. Indeed, for each of the sensors 20, it is possible to determine the expected noise covariance (noise covariance matrix vk) and the precision P of the sensor 20 in accordance with the previously disclosed embodiments. More specifically, the expected noise covariance vk and precision P can be determined as a function of the confidence value 22 of the related detector (e.g., the sensor 20), based on its precomputed calibration data. Instead of using the confidence value 22 for assessing the noisy observation data 21, each observation (noisy observation data) is now associated with its expected precision P and expected noise covariance matrix vk, as schematically depicted in
By replacing the confidence value 22 with a noise term (noise statistics 30) comprising the expected noise (noise covariance matrix vk) and precision P of the sensor 20, we transform something that only has meaning in the context of a single sensor 20 to something that has a universal meaning, especially a meaning for a plurality of sensors 20 and in particular a meaning for any sensor 20. As a result, the uncertainties reported by sensors 20 can be directly compared to each other, without requiring any modification to the sensors 20 themselves. Furthermore, the new values comprised in the noise statistics 30 have an interpretable “real world” meaning. Advantageously, this makes it much simpler to make design decisions and incorporate constraints, e.g., specify maximum tolerances on the predictions of specific variables of interest.
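The fusion idea can be sketched as follows. The sketch assumes hypothetical per-sensor calibration functions that map a raw confidence to a precision and an error variance, a single scalar measured variable, and inverse-variance weighting as the fusion rule. Neither the calibration functions nor the fusion rule are specified by the disclosure; they stand in for the precomputed calibration data and for whatever combination step the tracker actually uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-sensor calibration: raw confidence -> (precision P, error variance).
Calibration = Callable[[float], Tuple[float, float]]


def fuse(observations: List[Tuple[str, float, float]],
         calibrations: Dict[str, Calibration],
         min_precision: float = 0.5) -> Tuple[float, float]:
    """Inverse-variance fusion of scalar observations from sensors with incomparable confidences.

    Each observation is (sensor_id, measured_value, raw_confidence). The raw confidence is
    replaced by the sensor's calibrated (precision, variance), which is comparable across sensors.
    """
    weight_sum, weighted_values = 0.0, 0.0
    for sensor_id, value, conf in observations:
        prec, var = calibrations[sensor_id](conf)
        if prec < min_precision:
            continue  # too likely to be a false positive for this particular sensor
        weight_sum += 1.0 / var
        weighted_values += value / var
    if weight_sum == 0.0:
        raise ValueError("no observation passed the precision gate")
    return weighted_values / weight_sum, 1.0 / weight_sum  # fused value and its variance


if __name__ == "__main__":
    calib = {
        "a": lambda c: (0.95, 0.25) if c >= 0.5 else (0.6, 1.0),  # confidences in [0, 1]
        "b": lambda c: (0.9, 1.0) if c >= 80.0 else (0.4, 4.0),   # confidences in [0, 100]
    }
    print(fuse([("a", 10.0, 0.6), ("b", 12.0, 90.0), ("b", 30.0, 60.0)], calib))  # prints (10.4, 0.2)
```

Here the raw values 0.6 and 90 are never compared directly: each is first translated into a precision and a variance through its own sensor's calibration, and the low-precision third observation is discarded.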
The present disclosure also relates to a non-transitory computer-readable medium 60 comprising program instructions for causing a processor, such as the processing unit 102 of
The term “non-transitory” does not exclude legitimate tangible temporary storage media such as flash drives or any rewritable storage media. Generally speaking, a computer-accessible medium may include any tangible or non-transitory storage media or memory media such as electronic, magnetic, or optical media. Such a medium may be a disk, a CD/DVD-ROM or any other medium which may be coupled to the processing unit 102, e.g., via a dedicated communication interface or one of the already disclosed communication interfaces, or which may be located within another device (e.g., in a permanent manner), such as the processing unit 102 for instance.
The terms “tangible” and “non-transitory,” as used herein, are intended to describe a computer-readable storage medium (or “memory”) excluding propagating electromagnetic signals but are not intended to otherwise limit the type of physical computer-readable storage device that is encompassed by the phrase computer-readable medium or memory. For instance, the terms “non-transitory computer-readable medium” or “tangible memory” are intended to encompass types of storage devices that do not necessarily store information permanently, including for example, random access memory (RAM) and flash memory. Program instructions and data stored on a tangible computer-accessible storage medium in non-transitory form may further be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals.
The present disclosure further relates to a tracker 100 for implementing the aforementioned method according to any of its embodiments or according to any possible combination of its embodiments. As shown in
The aforementioned steps of the method encompass all the steps required to carry out any embodiment of the method or any possible combination of its embodiments. The estimated state of the object 40 is provided through the tracking data 110 output by the tracker 100, e.g., via a third communication interface 103. It is to be noted that the number of communication interfaces may differ; for example, a single communication interface may replace several communication interfaces, especially among the communication interfaces 103, 104, 105. Furthermore, the processing unit 102 may comprise the Kalman filter 101, as depicted in
Finally, the present disclosure also relates to a vehicle 70, preferably a motor vehicle 70, comprising at least the aforementioned tracker 100, as schematically depicted in
Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of embodiments of the disclosure disclosed in the present description.
Number | Date | Country | Kind |
---|---|---|---|
21163577.6 | Mar 2021 | EP | regional |