The present application claims the benefit under 35 U.S.C. § 119 of European Patent Application No. EP 21 16 9324.7 filed on Apr. 20, 2021, which is expressly incorporated herein by reference in its entirety.
The present invention relates to the evaluation of radar, ultrasound or audio classifiers that are used to interpret radar, ultrasound or audio spectra, e.g., for the purpose of automatically steering a vehicle through road traffic.
Automatic steering of a vehicle through road traffic requires capturing the environment of the vehicle and taking action in case a collision with an object in the environment of the vehicle is imminent. Safe automated driving also requires obtaining a representation of the environment of the vehicle and localizing objects.
Capturing objects by means of radar is independent of the lighting conditions in the traffic scene. For example, even at night, objects may be detected at a large distance without blinding oncoming traffic with high-beam illumination. Also, radar measurements immediately yield the distance to an object and the speed of that object. This information is important for determining whether a collision with an object is possible. However, the radar data does not allow a direct determination of the type of an object.
German Patent Application No. DE 10 2018 222 672 A1 describes a method for determining the spatial orientation of an object based on a measurement signal that comprises the response of the object to electromagnetic interrogation radiation. In particular, this response may comprise a reflection of the interrogation radiation. A classifier, and/or a regressor, is used to determine the sought spatial orientation.
The present invention provides a method for measuring the performance of a classifier for radar, ultrasound or audio spectra. Such a spectrum comprises the dependence of at least one measurement quantity that has been derived from a radar, ultrasound or audio signal on spatial coordinates. For example, reflected radar or ultrasound radiation may give rise to a radar or ultrasound signal. For example, spatial coordinates may comprise a range and one or more angles. In accordance with an example embodiment of the present invention, the classifier is configured to map a radar, ultrasound or audio spectrum to a set of classification scores with respect to classes of a given classification. For example, such classification scores may represent confidences with which the classifier attributes the spectrum to the respective classes. The classes may, for example, represent types of objects, such as other traffic participants or obstacles, or overall assessments of situations, such as a risk level.
In accordance with an example embodiment of the present invention, the method uses a given set of test radar, ultrasound or audio spectra. These spectra form part of, and/or define, a common distribution or manifold. Basically, this distribution or manifold may be understood to define what the test radar, ultrasound or audio spectra have in common. For example, spectra that have been acquired on many different cars form part of a manifold that generically comprises spectra that have been acquired on cars. A spectrum that has been acquired on another car is likely to form part of this same manifold. But a spectrum that has been acquired on a house is not likely to form part of this manifold.
For example, the test spectra may belong to the same distribution or manifold as training spectra that were used to train the given classifier. For training and subsequent testing of a classifier, it is a frequent practice to partition one large set of spectra that belong to a common distribution or manifold into training data for training the classifier, test data for testing the performance of the classifier, and optionally also validation data.
In accordance with an example embodiment of the present invention, at least one evaluation spectrum is obtained. This evaluation spectrum may be a modification of at least one test spectrum with substantially the same semantic content as this at least one test spectrum. In particular, a modification may be chosen such that, compared with the original test spectrum, the evaluation spectrum is moved towards a spectrum that is no longer part of the common distribution or manifold. Figuratively speaking, the evaluation spectrum is then moved from within the common distribution or manifold towards a “boundary” of this common distribution or manifold. This “boundary” is not to be confused with a decision boundary between classes. That is, if the evaluation spectrum is becoming more and more out-of-distribution, this does not yet make it similar to another class. Rather, the evaluation spectrum is moving away from all data of all classes.
But the evaluation spectrum may also be chosen from the start to be a spectrum that does not form part of the common distribution or manifold. For example, the spectrum may comprise only noise, or it may represent a situation that is different from all situations represented by the test spectra in some aspect. For example, a spectrum that represents a house is not part of a common distribution or manifold of spectra that represent cars.
In accordance with an example embodiment of the present invention, the at least one evaluation spectrum is mapped to a set of evaluation classification scores by the given classifier. Based on these evaluation classification scores, and/or on a further outcome produced by the given classifier during the processing of the evaluation spectrum, the sought performance is determined. The further outcome may, for example, be a latent representation that is created in the classifier by a stack of convolutional layers and is to be processed into the set of evaluation classification scores by a classification layer, such as a fully connected layer.
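To make the two kinds of outcome concrete, the following minimal sketch shows a hypothetical toy classifier that returns both a latent representation and a set of classification scores. It assumes that a spectrum is stored as a two-dimensional list with range bins as rows and angle bins as columns; the crude per-row mean/max features are merely a placeholder for the stack of convolutional layers, and the weighted sum followed by a softmax plays the role of the fully connected classification layer. None of these names come from the original description.

```python
import math
from typing import List, Tuple

Spectrum = List[List[float]]  # rows: range bins, columns: angle bins (assumed layout)


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into classification scores that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyClassifier:
    """Hypothetical stand-in for the given classifier: a feature extractor
    followed by a fully connected classification layer."""

    def __init__(self, weights: List[List[float]], biases: List[float]):
        self.weights = weights  # one weight row per class, one column per feature
        self.biases = biases    # one bias per class

    def features(self, spectrum: Spectrum) -> List[float]:
        # Placeholder for the convolutional stack: the mean and the maximum of
        # each range row serve as a very crude latent representation.
        feats: List[float] = []
        for row in spectrum:
            feats.append(sum(row) / len(row))
            feats.append(max(row))
        return feats

    def __call__(self, spectrum: Spectrum) -> Tuple[List[float], List[float]]:
        latent = self.features(spectrum)
        # Fully connected layer: one logit per class.
        logits = [sum(w * x for w, x in zip(w_row, latent)) + b
                  for w_row, b in zip(self.weights, self.biases)]
        return latent, softmax(logits)
```

Either the returned scores or the returned latent vector could then serve as the basis for determining the sought performance.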
An advantage of measuring the performance of the given classifier in this manner is that the development process of the classifier is facilitated. Usually, a trained classifier is tested in many real situations and has to pass a certain list of tests before it is deemed to be safe for use, e.g., in an at least partially automated vehicle in road traffic. This may involve acquiring many radar, ultrasound or audio spectra on test drives, feeding all these spectra into the classifier under test, and then rating whether an action that the vehicle would perform based on the results obtained from the classifier would be appropriate in the traffic situation at hand. If the classifier fails this test, it is sent back for further development. This process is very time-consuming and expensive. In particular, obtaining spectra may also involve obtaining labels for them, since it may be hard to evaluate measured data without any labels.
The present method does not need any more labels than are already available in the original training or test dataset. The evaluation spectrum provides its own "ground truth" for rating the outcome of the classifier. Therefore, the present method is usable as a "pre-test" for a classifier that may be performed on a computer before said testing under real conditions. If a given classifier fails this "pre-test", it is highly improbable that it will pass the test under real conditions, so it may be sent back for further development right away. The situation is in some way analogous to university studies: students are given high workloads in difficult subjects already in the first semesters as a "pre-test" of whether they will be able to successfully complete the course of studies. If a student is dismissed already in the first semester for lack of performance in the "pre-test" subjects, this is less expensive for all involved than failing the student at the end of ten or more semesters.
One unwanted behavior of classifiers that may be detected in this manner is over-confidence. In many applications, it is desired that the classifier output a "one-hot" classification vector that has a nonzero component for only one class. The downside of this is that, even when a perturbation starts to move the spectrum away from the common distribution or manifold, the confidence outputted by the classifier remains on a high plateau and drops abruptly only when the class boundary is crossed. For example, this unwanted behavior of classifiers may be induced by training the classifier according to a cross-entropy loss in combination with hard ground truth labels (i.e., one-hot maximum confidence 1 for exactly one class).
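Why hard one-hot labels favor over-confidence can be illustrated with a small numerical sketch: with a cross-entropy loss against a hard label, the loss of a correctly classified example never reaches zero, so it can always be reduced further by enlarging the logit margin, which drives the confidence towards 1. The two-class setting and the margin values below are arbitrary illustrative choices.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against a hard one-hot label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def max_confidence(logits):
    """Largest softmax score, i.e. the confidence of the predicted class."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


# Two-class example: grow the logit of the labelled class 0 and observe that
# the loss keeps shrinking while the confidence saturates towards 1.
for margin in [1.0, 2.0, 4.0, 8.0]:
    logits = [margin, 0.0]
    print(f"margin={margin:4.1f}  loss={cross_entropy(logits, 0):.4f}  "
          f"confidence={max_confidence(logits):.4f}")
```

Because the gradient of this loss always points towards a larger margin, training with hard labels keeps pushing the classifier onto the high-confidence plateau described above.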
The present method specifically measures a capability that is very important for the use of machine-learning classifiers in automated driving systems, namely the power to generalize. As every human driver knows from his or her own experience with driving classes, this power is indispensable. A human driver typically spends only on the order of some tens of hours behind the wheel and covers less than 1000 km before being licensed. After being licensed, the driver is expected to handle every situation at least well enough that no harm to others is caused, even if the situation is totally unseen and has not been part of the training. For example, if the training has been done during spring and summer, the freshly licensed driver will have zero experience with conditions in autumn and winter, but will nonetheless be expected to drive safely under these conditions.
For the monitoring of a vehicle environment based on radar, ultrasound and/or audio measurements, the power to generalize is even more important. In particular, it is inherent to such measurements that even slight changes in the viewpoint and/or perspective may drastically change the spectrum, so that even two spectra acquired in immediate succession may differ. Therefore, even when the traffic situation itself remains more or less constant, a wide variety of different spectra may occur. Also, radar, ultrasound or audio spectra are hard to interpret for humans, so it is advantageous to have a quantitative and objective measure for the performance of the classifier.
Recognizing modifications of test spectra as spectra that basically have the same semantic content is one side of the generalization coin. The other side is how the classifier handles spectra that are clearly out of the common distribution or manifold of the test spectra. In this case, the spectrum does not really fit into any one of the available classes. For example, if the classifier is trained to classify vehicles into passenger cars, trucks, buses, cycles and motorcycles, none of these classes really matches a spectrum that represents a house. Yet, it is a frequent occurrence that a classifier just chooses any one of the available classes and assigns a high score to this class, as if the classifier had landed on a Web form that allows only choosing exactly one class and also grays out the "Proceed" button until this one choice has been made. The desired behavior for a clearly out-of-distribution spectrum is that such a spectrum is assigned uniformly low scores with respect to all available classes.
Modifications of test spectra may be used to assess how robust a decision of the classifier is against such modifications. Ideally, if the classifier has learned to assess the pure semantic content of the spectrum, the classification scores should not change if the inputted spectrum is modified in a way that does not change the semantic content. Therefore, in a particularly advantageous embodiment, the determining is based at least in part on a comparison between an outcome of the classifier for the evaluation spectrum and an outcome that the classifier has outputted or should output for
In particular, the outcome that is used for the comparison may comprise:
In a particularly advantageous embodiment of the present invention, the obtaining of the at least one evaluation spectrum may comprise:
That is, an evaluation spectrum is not limited to being created from one single test spectrum only. Rather, several test spectra may serve as building materials for creating one evaluation spectrum.
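The description does not fix how several test spectra are combined. Purely as an assumed example, the sketch below forms one evaluation spectrum as a normalized weighted sum of several equally shaped test spectra, e.g., of spectra showing objects of the same class; the function name `blend_spectra` and this combination rule are hypothetical.

```python
from typing import List

Spectrum = List[List[float]]  # rows: range bins, columns: angle bins (assumed layout)


def blend_spectra(spectra: List[Spectrum], weights: List[float]) -> Spectrum:
    """Combine several test spectra of the same shape into one evaluation
    spectrum as a normalized weighted sum (hypothetical combination rule)."""
    if not spectra or len(spectra) != len(weights):
        raise ValueError("need at least one spectrum and one weight per spectrum")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize so that the weights sum to 1
    rows, cols = len(spectra[0]), len(spectra[0][0])
    return [[sum(w * s[r][c] for w, s in zip(norm, spectra))
             for c in range(cols)]
            for r in range(rows)]
```

For example, `blend_spectra([[[0.0, 2.0]], [[2.0, 4.0]]], [1.0, 1.0])` averages the two single-row spectra bin by bin.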
In particular, the perturbation may specifically be chosen to be a perturbation that is likely to occur during the acquisition of a radar, ultrasound or audio signal with at least one sensor, and/or during the signal processing that derives the at least one measurement quantity from said signal. If the outcome of the classifier does not change too much in response to these perturbations being applied, it is very probable that these perturbations will also not impair the operation of the classifier too much when it is tested under real conditions.
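As an illustration only, the sketch below implements two perturbations that are assumed here to be plausible during acquisition and processing: additive zero-mean Gaussian noise as a stand-in for receiver noise, and a shift by a few range cells as a crude model of a slightly different object distance. Both perturbation models and all function names are assumptions and are not taken from the list of examples in the description.

```python
import random
from typing import List

Spectrum = List[List[float]]  # rows: range bins, columns: angle bins (assumed layout)


def add_gaussian_noise(spectrum: Spectrum, sigma: float, rng: random.Random) -> Spectrum:
    """Add zero-mean Gaussian noise with standard deviation sigma to every bin
    (assumed model of receiver noise)."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in spectrum]


def shift_range(spectrum: Spectrum, bins: int) -> Spectrum:
    """Shift the spectrum by `bins` range cells (positive: farther away),
    padding vacated cells with zeros (assumed model of a distance change)."""
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    shifted = []
    for r in range(n_rows):
        source = r - bins  # row of the original spectrum that lands in row r
        if 0 <= source < n_rows:
            shifted.append(list(spectrum[source]))
        else:
            shifted.append([0.0] * n_cols)
    return shifted
```

Passing an explicitly seeded `random.Random` keeps the evaluation spectra reproducible between runs.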
Examples for such perturbations include:
In particular, the sought performance may be determined as a function of a strength of the applied perturbation. This results in a more objective rating of the resiliency against such perturbations. It is to be expected that there is some quantitative limit to the resiliency against most, if not all, disturbances.
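One way to obtain performance as a function of perturbation strength is to sweep the strength and record, for each value, how many perturbed test spectra are still assigned to their ground truth class. The minimal sketch below assumes Gaussian noise with standard deviation `sigma` as the perturbation and top-1 accuracy as the performance; the classifier is any callable that maps a spectrum to classification scores.

```python
import random
from typing import Callable, Dict, List, Sequence

Spectrum = List[List[float]]
Classifier = Callable[[Spectrum], List[float]]  # spectrum -> classification scores


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy_vs_strength(classifier: Classifier, spectra: List[Spectrum],
                         labels: List[int], strengths: List[float],
                         seed: int = 0) -> Dict[float, float]:
    """For each noise strength sigma, perturb every test spectrum with Gaussian
    noise and record the fraction still assigned to its ground truth class."""
    rng = random.Random(seed)
    curve: Dict[float, float] = {}
    for sigma in strengths:
        correct = 0
        for spectrum, label in zip(spectra, labels):
            noisy = [[v + rng.gauss(0.0, sigma) for v in row] for row in spectrum]
            correct += argmax(classifier(noisy)) == label
        curve[sigma] = correct / len(spectra)
    return curve
```

A single figure of merit could then be read off the curve, for example the largest strength at which the accuracy stays above a chosen threshold.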
As discussed above, a classifier that has been trained to recognize the semantic content of spectra is expected not to change its outcome too much in response to realistic perturbations that basically leave the semantic content intact. Therefore, advantageously, the smaller a difference determined during said comparison with outcomes for test spectra is, the better the sought performance is determined to be.
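The comparison can be sketched as follows, assuming that the outcomes are score vectors that sum to 1 and choosing the total variation distance as the difference measure. The mapping to a performance value (one minus the mean distance, so that smaller differences give better performance) is likewise an illustrative assumption.

```python
from typing import List


def total_variation(p: List[float], q: List[float]) -> float:
    """Half the L1 distance between two score vectors: 0 for identical
    outcomes, 1 for score vectors with disjoint support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def robustness_score(test_scores: List[List[float]],
                     eval_scores: List[List[float]]) -> float:
    """Performance in [0, 1] from the mean difference between the outcomes for
    test spectra and for their modified counterparts; higher is better."""
    diffs = [total_variation(p, q) for p, q in zip(test_scores, eval_scores)]
    return 1.0 - sum(diffs) / len(diffs)
```

Other divergences, such as the Kullback-Leibler divergence, would fit the same pattern but are unbounded, which makes a fixed mapping to [0, 1] less convenient.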
As discussed above, it is also important that out-of-distribution spectra are recognized as such, rather than being classified into one of the in-distribution classes with high confidence. Therefore, in a further advantageous embodiment, the sought performance is determined based at least in part on a distinguishing performance of the classifier in distinguishing between spectra that do not form part of the common distribution or manifold and spectra that form part of the common distribution or manifold.
Such distinguishing performance may, for example, be measured using an integral of a receiver operating characteristic curve, and/or using a mean maximum confidence. The latter, evaluated on out-of-distribution data, is best (i.e., lowest) when the classifier assigns low confidences to such unknown data.
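Both measures can be sketched in a few lines. The sketch assumes that the maximum classification score of each spectrum serves as the indicator of "in-distribution"; the area under the ROC curve is then computed as the probability that a randomly chosen in-distribution spectrum receives a higher maximum score than a randomly chosen out-of-distribution spectrum (ties counting one half), which equals the integral of the ROC curve.

```python
from typing import List


def auroc(in_dist: List[float], out_dist: List[float]) -> float:
    """Area under the ROC curve for separating in-distribution from
    out-of-distribution spectra, based on their maximum confidences."""
    wins = 0.0
    for a in in_dist:
        for b in out_dist:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties contribute one half
    return wins / (len(in_dist) * len(out_dist))


def mean_max_confidence(score_vectors: List[List[float]]) -> float:
    """Mean of the largest classification score per spectrum; on
    out-of-distribution spectra, lower values are better."""
    return sum(max(s) for s in score_vectors) / len(score_vectors)
```

The pairwise loop is quadratic in the number of spectra; for large evaluation sets, a rank-based computation or a library routine would be preferable.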
In a further advantageous embodiment of the present invention, the sought performance may be determined based at least in part on the uniformity of the evaluation classification scores outputted by the classifier for an evaluation spectrum that does not form part of the common distribution or manifold. As discussed above, if the classifier does not really know what to do with an out-of-distribution spectrum, it should not assign this spectrum to one particular class with a high confidence because no class really fits and there is no reason why it should be assigned to said one class and not another.
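Uniformity may, for example, be quantified by the Shannon entropy of the score vector, normalized by its maximum value log K for K classes. This choice is an assumption, not a measure prescribed by the description; the sketch also assumes that the scores sum to 1 and that K is at least 2.

```python
import math
from typing import List


def normalized_entropy(scores: List[float]) -> float:
    """Entropy of a score vector divided by log(K): 1 for perfectly uniform
    scores, 0 for a one-hot vector."""
    k = len(scores)
    h = -sum(p * math.log(p) for p in scores if p > 0.0)  # 0 * log 0 taken as 0
    return h / math.log(k)
```

A value close to 1 for an out-of-distribution evaluation spectrum indicates the desired behavior; equivalently, the Kullback-Leibler divergence from the uniform distribution, log K minus the entropy, should be close to 0.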
When designing a classifier, some design choices regarding the architecture, such as sizes, numbers or types of layers, may be expressed as hyperparameters. Also, further hyperparameters may affect the behavior of the training. Examples of these hyperparameters comprise a learning rate and a relative weighting of different terms that together make up an objective function (loss function). The method for measuring the performance of the classifier described above may advantageously be used for automatically optimizing these hyperparameters. The present invention therefore also provides a method for training a classifier for radar, ultrasound or audio spectra.
This method starts with setting at least one hyperparameter that affects the architecture of the classifier, and/or the behavior of the training of this classifier. Training spectra are provided, and at least some of these training spectra are labelled with ground truth classification scores. The classifier is trained in an at least partially supervised manner with the objective that, when given the labelled training spectra, it maps them to the ground truth classification scores.
After the classifier has been trained, its performance is measured with the method described above. At least one hyperparameter is optimized with the objective that, when the classifier is trained and its performance is measured again, this performance is likely to improve. In this manner, the additional degree of freedom provided by the hyperparameter is exploited to further improve the final performance of the classifier.
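The optimization loop may be realized with any hyperparameter search strategy. The sketch below uses plain random search over a log-uniformly sampled learning rate, a technique chosen here for brevity; `train` and `measure` are hypothetical placeholders for the training step and for the performance measurement described above.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def random_search_learning_rate(train: Callable[[float], Model],
                                measure: Callable[[Model], float],
                                n_trials: int = 20,
                                low: float = 1e-5, high: float = 1e-1,
                                seed: int = 0) -> Tuple[float, float]:
    """Random search over a log-uniformly sampled learning rate.

    `train` trains a classifier with the given learning rate and `measure`
    returns its performance (higher is better); both are supplied by the
    caller. Returns the best learning rate found and its performance."""
    rng = random.Random(seed)
    best_lr, best_perf = low, -math.inf
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))
        perf = measure(train(lr))  # retrain, then measure the performance again
        if perf > best_perf:
            best_lr, best_perf = lr, perf
    return best_lr, best_perf
```

More sample-efficient strategies, such as Bayesian optimization, could replace the random sampling without changing the surrounding train-measure-update structure.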
The present invention also provides a further method that covers the complete sequence of actions up to and including the actuation of a vehicle.
In accordance with an example embodiment of the present invention, this method starts with providing a classifier for radar, ultrasound or audio spectra. This classifier is trained with the method described above.
Using at least one radar, ultrasound or audio sensor carried by a vehicle, at least one radar, ultrasound or audio spectrum is acquired. Using the trained classifier, the at least one radar, ultrasound or audio spectrum is mapped to classification scores. Based at least in part on these classification scores, an actuation signal is determined. The vehicle is then actuated with this actuation signal.
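The sequence from acquisition to actuation can be sketched as a single control step. The rule used to derive the actuation signal (full braking if the summed score of safety-critical classes exceeds a threshold) and the `ActuationSignal` interface are hypothetical simplifications; a real system would combine the classification scores with further information such as distance and relative speed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Spectrum = List[List[float]]


@dataclass
class ActuationSignal:
    brake: float  # requested braking in [0, 1] (hypothetical interface)


def determine_actuation(scores: Sequence[float], class_names: Sequence[str],
                        critical_classes: Sequence[str],
                        threshold: float = 0.5) -> ActuationSignal:
    """Request full braking if the summed score of safety-critical classes
    exceeds the threshold, otherwise no braking (illustrative rule only)."""
    critical = sum(s for s, name in zip(scores, class_names) if name in critical_classes)
    return ActuationSignal(brake=1.0 if critical > threshold else 0.0)


def control_step(acquire: Callable[[], Spectrum],
                 classifier: Callable[[Spectrum], List[float]],
                 class_names: Sequence[str],
                 critical_classes: Sequence[str]) -> ActuationSignal:
    spectrum = acquire()           # acquire a spectrum with the vehicle's sensor
    scores = classifier(spectrum)  # map the spectrum to classification scores
    return determine_actuation(scores, class_names, critical_classes)
```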
In this manner, when a traffic situation is sensed by the sensor, the probability is increased that the action caused by the actuating of the vehicle is appropriate in this traffic situation because the classifier sufficiently generalizes from its training to this particular traffic situation.
The methods described above may be wholly or partially computer-implemented, and thus embodied in software. The present invention therefore also relates to a computer program, comprising machine-readable instructions that, when executed by one or more computers, cause the one or more computers to perform a method described above. In this respect, control units for vehicles and other embedded systems that may run executable program code are to be understood to be computers as well. A non-transitory storage medium, and/or a download product, may comprise the computer program. A download product is an electronic product that may be sold online and transferred over a network for immediate fulfilment. One or more computers may be equipped with said computer program, and/or with said non-transitory storage medium and/or download product.
Below, the present invention and its preferred embodiments are illustrated using the figures without any intention to limit the scope of the present invention.
In step 110, a set of test radar, ultrasound or audio spectra 2 is provided.
In step 120, at least one evaluation spectrum 4 is obtained. This evaluation spectrum 4 is a modification of at least one test spectrum 2 with substantially the same semantic content as this at least one test spectrum 2, and/or it does not form part of the common distribution or manifold.
In particular, according to block 121, the obtaining 120 of the evaluation spectrum 4 may comprise applying at least one perturbation to at least one test spectrum. This produces a perturbed spectrum. From this at least one perturbed spectrum, the evaluation spectrum 4 may then be determined according to block 122.
According to block 121a, the perturbation may be specifically chosen to be a perturbation that is likely to occur during the acquisition of a radar, ultrasound or audio signal with at least one sensor 10, and/or during the signal processing that derives the at least one measurement quantity of the radar, ultrasound or audio spectrum from said signal.
In step 130, the given classifier 1 maps the at least one evaluation spectrum 4 to a set of evaluation classification scores 5.
In step 140, the sought performance 6 is determined based on the set of evaluation classification scores 5, and/or on a further outcome produced by the given classifier 1 during the processing of the evaluation spectrum 4.
In particular, according to block 141, this determining 140 may be based at least in part on a comparison between an outcome of the classifier 1 for the evaluation spectrum 4 and an outcome that the classifier 1 has outputted or should output for
One example of an outcome that the classifier 1 “should output” for a test spectrum 2 is a ground truth label associated with this test spectrum 2.
According to block 142, the sought performance 6 may be determined as a function of a strength of an applied perturbation.
According to block 143, the sought performance 6 may be determined based at least in part on a distinguishing performance of the classifier 1 in distinguishing between spectra that do not form part of the common distribution or manifold and spectra that form part of the common distribution or manifold.
According to block 144, the sought performance 6 may be determined based at least in part on the uniformity of the evaluation classification scores 5 outputted by the classifier 1 for an evaluation spectrum 4 that does not form part of the common distribution or manifold.
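Steps 110 to 140 can be tied together in one function. The following is a hypothetical orchestration under simplifying assumptions: each evaluation spectrum is produced by a caller-supplied perturbation of one test spectrum, block 141 is represented by the fraction of evaluation spectra that keep the predicted class of their test spectrum, and block 144 by the mean normalized entropy of the scores for out-of-distribution evaluation spectra. Other choices of the comparison and of the uniformity measure are equally possible.

```python
import math
from typing import Callable, Dict, List

Spectrum = List[List[float]]
Classifier = Callable[[Spectrum], List[float]]  # spectrum -> classification scores


def predicted_class(scores: List[float]) -> int:
    """Index of the highest classification score."""
    return max(range(len(scores)), key=scores.__getitem__)


def normalized_entropy(scores: List[float]) -> float:
    """Entropy of the scores divided by log(K): 1 for uniform, 0 for one-hot."""
    return -sum(p * math.log(p) for p in scores if p > 0.0) / math.log(len(scores))


def measure_performance(classifier: Classifier,
                        test_spectra: List[Spectrum],
                        perturb: Callable[[Spectrum], Spectrum],
                        ood_spectra: List[Spectrum]) -> Dict[str, float]:
    """Hypothetical orchestration of steps 110 to 140 of method 100."""
    # Step 120 (blocks 121, 122): one perturbed evaluation spectrum per test spectrum.
    evaluation = [perturb(s) for s in test_spectra]
    # Step 130: classification scores for the test and the evaluation spectra.
    test_scores = [classifier(s) for s in test_spectra]
    eval_scores = [classifier(s) for s in evaluation]
    # Block 141: fraction of evaluation spectra keeping the class of their test spectrum.
    agreement = sum(predicted_class(p) == predicted_class(q)
                    for p, q in zip(test_scores, eval_scores)) / len(test_scores)
    # Block 144: mean uniformity of the scores for out-of-distribution spectra.
    uniformity = sum(normalized_entropy(classifier(s)) for s in ood_spectra) / len(ood_spectra)
    return {"agreement": agreement, "ood_uniformity": uniformity}
```

Returning the criteria separately, rather than as one weighted sum, leaves the choice of how to combine them to the subsequent use, e.g., the hyperparameter optimization of method 200.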
In step 210, at least one hyperparameter 7 is set. This hyperparameter 7 affects the architecture of the classifier 1, and/or the behavior of the training of this classifier 1.
In step 220, training spectra 2a that are labelled with ground truth classification scores 2b are provided.
In step 230, the classifier 1 is trained with the objective that, when given the training spectra 2a, it maps them to the ground truth classification scores 2b. The trained classifier is labelled with the reference sign 1*.
In step 240, the performance 6 of the trained classifier 1* is measured with the method 100 described above.
In step 250, the at least one hyperparameter 7 is optimized with the objective that, when the classifier 1 is trained in step 230 and its performance 6 is measured in step 240 again, this performance 6 is likely to improve. This optimization may be terminated according to any suitable termination criterion. The finally obtained optimized value of the hyperparameter 7 is labelled with the reference sign 7*.
In step 310, a classifier 1 for radar, ultrasound or audio spectra 2 is provided.
In step 320, the classifier 1 is trained with the method 200 described above.
In step 330, at least one radar, ultrasound or audio spectrum 2 is acquired using at least one radar, ultrasound or audio sensor 10 that is carried by a vehicle 50.
In step 340, the at least one radar, ultrasound or audio spectrum 2 is mapped to classification scores 3 using the trained classifier 1*.
In step 350, based at least in part on the classification scores 3, an actuation signal 350a is determined.
In step 360, the vehicle 50 is actuated with the actuation signal 350a.
Number | Date | Country | Kind |
---|---|---|---|
21 16 9324.7 | Apr 2021 | EP | regional |