This application claims priority to European Patent Application Number EP22176916.9, filed Jun. 2, 2022, which in turn claims priority to European Patent Application Number EP21180296.2, filed Jun. 18, 2021, the disclosures of which are incorporated by reference in their entirety.
In the field of driver assistance systems and autonomous driving, radar sensors are often used to perceive information about the vehicle's environment. One category of problems to be solved is to determine which parts of the environment of the ego-vehicle are occupied (for example in terms of an occupancy grid) or where the ego-vehicle can safely drive (in terms of an underdrivability classification). For such purpose(s), it may be relevant to decide which portions of the environment are occupied or whether a detected object in front of the ego-vehicle is underdrivable (like a bridge) or not (like the end of a traffic jam).
Often, the capability of automotive radars is limited with regard to measurement resolution and accuracy, for example for the distance and/or elevation angle of objects (from which the object height can be determined). Because of such limitation(s), advanced methods may be desired to resolve the uncertainty in occupancy grid detection and/or in classifying between underdrivable and non-underdrivable objects.
Nowadays, machine learning techniques are widely used to “learn” the parameters of a model for such an occupancy grid determination or classification. One factor for developing machine learning methods is the availability of ground truth data for a given problem.
Generally, machine learning methods are trained. A possible way of training a machine learning method includes providing ground truths (for example, describing the ideal or wished output of the machine learning method) to the machine learning method during training.
There are some standard but expensive and/or time-consuming methods to generate ground truth data, including manual labeling of the data by a human expert and/or using a second sensor (e.g., Lidar/camera). The second sensor can be available along with the radar on the same vehicle, and extrinsic calibration and/or temporal sync can be performed between the sensors. Furthermore, additional methods can be used to determine an occupancy grid and/or underdrivability with the second sensor.
Thus, there is a need for improved methods for providing ground truth data.
The present disclosure relates to methods and systems for generating ground truth data, and in particular for employing future knowledge when generating ground truth data, e.g., for radar-based machine learning on grid output.
Further, the present disclosure provides a computer implemented method, a computer system, and a non-transitory computer readable medium according to the independent claims. Example implementations are given in the dependent claims, the description and the drawings.
In one aspect, the present disclosure is directed at a computer-implemented method for generating ground truth data, with the method including the following steps carried out by computer hardware components: for a plurality of points in time, acquiring sensor data for the respective point in time; and for at least a subset of the plurality of points in time, determining ground truth data of the respective point in time based on the sensor data of at least one present and/or past point of time and at least one future point of time.
In other words, sensor data from future point(s) in time may be used to determine ground truth data of a present point in time. It will be understood that this is possible, for example, by recording sensor data for a plurality of points in time and then by, for each of the plurality of points in time, determining ground truth data for the respective point in time based on several of the plurality of points in time (e.g., including a point in time which is after the respective point in time).
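By way of a non-limiting illustration, the selection of recorded sensor data for one point in time may be sketched as follows. The function name `frames_for_ground_truth` and the window parameters `n_past` and `n_future` are hypothetical and are not taken from the present disclosure; the sketch only assumes that the full recording is available offline.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def frames_for_ground_truth(
    recorded: Sequence[Frame], t: int, n_past: int, n_future: int
) -> List[Frame]:
    """Return the recorded frames used to build ground truth for index t.

    The window covers up to n_past frames before t, the frame at t itself,
    and up to n_future frames after t. It is clipped at the ends of the
    recording, so early and late frames simply see fewer neighbours.
    (n_past and n_future are illustrative parameters, not from the source.)
    """
    if not 0 <= t < len(recorded):
        raise IndexError("t is outside the recording")
    start = max(0, t - n_past)
    stop = min(len(recorded), t + n_future + 1)
    return list(recorded[start:stop])


# Offline use: the whole drive is recorded first, so "future" frames exist.
recording = ["scan0", "scan1", "scan2", "scan3", "scan4"]
window = frames_for_ground_truth(recording, t=2, n_past=1, n_future=2)
```

In this sketch, the ground truth for index 2 would be derived from one past frame, the present frame, and two future frames.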
Ground truth data may represent information that is assumed to be known to be real or true, and it is usually provided by direct observation and measurement (e.g., by empirical evidence) as opposed to information provided by inference.
According to various aspects, the present point of time, past point of time and/or future point of time are relative to the respective point in time.
According to various aspects, the sensor data comprises at least one of radar data or lidar data.
According to various aspects, the computer-implemented method further includes training a machine-learning model (e.g., an artificial neural network) based on the ground truth data. Alternatively, the ground truth data may be used for any other purpose where ground truth data may be required, for example for evaluation. Ground truth data may refer to data that represents the real situation; for example, when training a machine-learning model, the ground truth data can represent the desired output of the machine-learning model.
According to various aspects, the machine-learning model comprises a step for determining an occupancy grid; and/or the machine-learning model comprises a step for underdrivability classification. “Underdrivability classification” may provide a classification (for example, of cells of a map) into “underdrivable” (e.g., suited for a specific vehicle to drive under or underneath) and “non-underdrivable” (e.g., not suited for a specific vehicle to drive under or underneath). An example of a “non-underdrivable” cell may be a cell which includes a bridge which is too low for the vehicle to drive under or a tunnel which is too low for the vehicle to drive through.
According to various aspects, the ground truth data is determined based on at least two maps. It has been found that by using at least two maps, an efficiency of the method may be increased due to the more versatile data available in at least two maps (as compared to a single map).
According to various aspects, the at least two maps include a limited-range map based on scans below a pre-determined range threshold (here, “range” means the distance between sensor and object). Thus, the limited-range map may (e.g., only) include scans with a limited range. For the limited-range map, scans (or data which includes the scans) above the pre-determined range threshold may not be used (or may be discarded) when determining the limited-range map. For example, scans from sensors which provide scans of a long range may be limited to those below a specific range (so that only scans with a range below the pre-determined range-threshold are used for determining the limited-range map). Alternatively, sensors which only can measure up to the pre-determined range threshold may be used (so that no scans have to be discarded, because all scans provided by the sensor are per-se below the pre-determined range threshold).
According to various aspects, the at least two maps include a full-range map based on scans irrespective of a range of the scans. Thus, the full-range map may include scans with a full range.
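As a non-limiting sketch, the input to the two maps may be separated as shown below. The `Detection` type, the function `split_by_range`, and the example threshold of 40 m are assumptions made for illustration only; the disclosure does not prescribe a particular threshold value or data layout.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical radar detection in the sensor frame."""
    range_m: float      # distance between sensor and object, in metres
    azimuth_rad: float  # horizontal angle in the sensor frame


def split_by_range(
    detections: Iterable[Detection], range_threshold_m: float
) -> Tuple[List[Detection], List[Detection]]:
    """Return (full_range, limited_range) detection lists.

    full_range keeps every detection irrespective of its range;
    limited_range keeps only detections below the range threshold.
    """
    full_range = list(detections)
    limited_range = [d for d in full_range if d.range_m < range_threshold_m]
    return full_range, limited_range


# Illustrative values only; 40 m is an assumed threshold.
scan = [Detection(8.0, 0.0), Detection(35.0, 0.1), Detection(90.0, -0.05)]
full, limited = split_by_range(scan, range_threshold_m=40.0)
```

Each list may then be fed to the same mapping method to obtain the full-range map and the limited-range map, respectively.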
According to various aspects, a cell is labelled as non-underdrivable or underdrivable based on a probability of the cell of the full-range map and a probability of the corresponding cell of the limited-range map. A probability of each cell may indicate a probability that an object is present in that cell. In other words, the probability can be related to occupancy.
According to various aspects, the cell is labelled as non-underdrivable if the probability of the cell in the limited-range map is above a first pre-determined threshold. The first pre-determined threshold may, for example, be set to a value that is at least equal to the default probability for a cell before a detection is made. The default probability for a cell before a detection is made may be 0.5. The first pre-determined threshold may, for example, be set to 0.5 or to 0.7. Thus, it may be ensured that the probability value exceeds this threshold in order to be sure about the occupancy of the cell (e.g., “that there is an object”).
According to various aspects, the cell is labelled as underdrivable if the probability of the cell in the full-range map is above a second pre-determined threshold and the probability of the cell in the limited-range map is equal to a value representing no occupation in the cell, for example 0.5 (which means that there is no occupancy). The second pre-determined threshold may, for example, be set to 0.5 or 0.7. For example, if the second pre-determined threshold in the full-range map is exceeded in combination with the probability of 0.5 for the limited-range map, it may be determined that there is an object and that the object is underdrivable.
In another aspect, the present disclosure is directed at a computer system, with said computer system including a plurality of computer hardware components configured to carry out several or all steps of the computer-implemented method described herein. The computer system can be part of a vehicle.
In another aspect, the present disclosure is directed at a computer system, with said computer system including a plurality of computer hardware components configured to use the machine-learning model trained according to the computer-implemented method as described herein. According to various aspects, the computer system can include or be part of an advanced driver-assistance system.
The computer system may include a plurality of computer hardware components (for example a processor, for example a processing unit or processing network, at least one memory, for example a memory unit or memory network, and at least one non-transitory data storage). It will be understood that further computer hardware components may be provided and used for carrying out steps of the computer-implemented method in the computer system. The non-transitory data storage and/or the memory unit may include a computer program for instructing the computer to perform several or all steps or aspects of the computer-implemented method described herein, for example using the processing unit and the at least one memory unit. The computer system may further include a sensor configured to acquire the sensor data.
In another aspect, the present disclosure is directed at a non-transitory computer-readable medium including instructions for carrying out several or all steps or aspects of the computer-implemented method described herein. The computer-readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid-state drive (SSD); a read-only memory (ROM), such as a flash memory; or the like. Furthermore, the computer-readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection. The computer-readable medium may, for example, be an online data repository or a cloud storage.
The present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer-implemented method described herein.
Example implementations and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically:
Employing machine learning methods, for example artificial neural networks, on low-level radar data for object detection and environment classification may provide superior results compared to traditional methods working on conventionally filtered radar detections, as shown by RaDOR.Net (in European Patent Application No. 20187674.5, now European Published Patent Application EP 3 943 968, published Jan. 26, 2022, which is incorporated herein in its entirety for all purposes). The low-level radar data may, for example, include radar data arranged in a cube, which can be sparse as all beamvectors below a CFAR (constant false alarm rate) level may be suppressed. In some cases, missing antenna elements in the beamvector may be interpolated, and calibration may be applied—e.g., with the bin-values being scaled according to the radar equation.
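The suppression of beamvectors below a CFAR level may, purely as an illustration, be sketched as follows. The dictionary-based cube layout, the use of mean squared magnitude as the power measure, and the function name `sparsify_cube` are assumptions; the CFAR level per bin is taken as given rather than computed.

```python
from typing import Dict, List, Tuple

BinIndex = Tuple[int, int]   # (range bin, Doppler bin); assumed layout
Beamvector = List[complex]   # one complex value per antenna element


def sparsify_cube(
    cube: Dict[BinIndex, Beamvector],
    cfar_level: Dict[BinIndex, float],
) -> Dict[BinIndex, Beamvector]:
    """Keep only beamvectors whose mean power reaches the CFAR level of their bin.

    Mean squared magnitude over the antenna elements is an assumed power
    measure; the source does not specify one.
    """
    kept: Dict[BinIndex, Beamvector] = {}
    for index, beamvector in cube.items():
        if not beamvector:
            continue  # nothing measured in this bin
        power = sum(abs(v) ** 2 for v in beamvector) / len(beamvector)
        if power >= cfar_level[index]:
            kept[index] = beamvector
    return kept


# Illustrative values: one strong bin and one weak bin.
cube = {(0, 0): [1 + 0j, 1j], (0, 1): [0.1 + 0j, 0.1j]}
levels = {(0, 0): 0.5, (0, 1): 0.5}
sparse_cube = sparsify_cube(cube, levels)  # only bin (0, 0) survives
```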
The superior results may be explained by the fact that the radar data contains plenty of information that is removed due to detection filtering and by the ability of the machine learning method to filter this large amount of data in a sophisticated way.
In addition to rich and genuine input sensor data, the preparation of ground truth (GT) data may be relatively important. The GT data can represent the desired output of the machine learning method while not forcing the machine learning method to create an output that fails to actually be represented by the input sensor data.
For example, creating the GT data (manually or automatically) based on a stronger reference (e.g., Lidar) may yield a detailed and precise GT but may overstrain the machine learning method by requesting an output it cannot actually see from the input sensor position or due to the different kind of data acquisition of reference and input sensor (e.g., Lidar and Radar). This may negatively affect the system output.
According to various implementations, the GT data may be determined without using an additional reference sensor. Example applications are determining an Occupancy Grid (OCG) via a machine learning method or underdrivability classification using a machine learning method. The training pipeline may employ a traditional OCG method on conventionally filtered radar detections to automatically create the GT for the network to train. The relatively naïve procedure of presenting the respective OCG frame output to the network at training would apparently limit the network to output OCG data resembling the quality of the utilized OCG method.
Due to the radar filtering, this method may react only to relatively “strong” signals and may thus delay the time until distant oncoming structures are identified. The machine learning method, in contrast, may have the capability to identify relatively “weak” signals in the radar data (for example, the low-level radar data) and thus to detect these oncoming structures earlier, provided it was taught to do so using appropriate GT that includes these more-distant structures.
According to various implementations, this appropriate GT may be created by feeding the method additional sensor data from “future timestamps” when creating the GT for a current timestamp. This results in more complete ground truth data that is still based on data of the input sensor only and that also incorporates distant and high structures, as they lead to “strong” signals in these additional future frames.
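The effect of the additional future frames on a single grid cell may be illustrated by the following sketch. It uses a common log-odds occupancy update, which is an assumption here since the disclosure does not specify the update rule of the OCG method; the function name `occupancy_from_observations` and the value `p_hit=0.7` are likewise illustrative.

```python
import math
from typing import Iterable, Optional


def occupancy_from_observations(
    observations: Iterable[Optional[bool]],
    p_hit: float = 0.7,
    p_miss: float = 0.4,
) -> float:
    """Fuse per-frame observations of one cell into an occupancy probability.

    True  = filtered detection in the cell,
    False = cell observed as free,
    None  = cell not observed in that frame (no update).
    Starts from the default probability 0.5 (log-odds 0). The log-odds
    update and the values of p_hit and p_miss are assumptions.
    """
    log_odds = 0.0
    for obs in observations:
        if obs is None:
            continue
        p = p_hit if obs else p_miss
        log_odds += math.log(p / (1.0 - p))
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


# A distant structure yields no filtered detection up to the current frame,
# but "strong" detections in two later frames.
up_to_now = [None, None, None]
future = [True, True]
p_without_future = occupancy_from_observations(up_to_now)
p_with_future = occupancy_from_observations(up_to_now + future)
```

In this sketch, the cell stays at the default probability when only past and present frames are used, and becomes clearly occupied once the future frames are included.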
According to various implementations, lower-level radar data may be used with an OCG method or traditional underdrivability classification for GT creation.
According to various implementations, a combination with an additional sensor (e.g., Lidar) may be provided.
According to various implementations, the methods as described herein may be used for alternative network output (e.g., multiclass SemSeg instead of OCG). SemSeg stands for semantic segmentation, where each data point is assigned a higher-level meaning, such as a sidewalk or a road. OCG, in contrast, may show whether the particular data point represents an occupied region or a free space, but assigns no higher-level meaning.
According to various implementations, the methods as described herein may be used for a radar-based automatic ground truth annotation system for underdrivability classification. For example, the method may be for automatically generating ground truth data for the classification problem of under- and non-underdrivability with a radar sensor.
With the automatic ground truth generation as described herein, GT may be established with the used radar itself, an offline system to generate GT data for an online system may be provided, no manual labeling may be needed, no additional sensor may be needed, no additional installation of sensor hardware may be required, no extrinsic calibration/temporal sync may be required for any additional sensors (while calibration and/or synchronization may still be required for the radar itself), no additional software may be needed, and/or fast testing of new radars may be possible (for example, the radar may just need to be installed and driving may start).
According to various implementations, the limited elevation field of view (FoV) may be leveraged to label regions as under- or non-underdrivable.
Due to the limited elevation FoV, underdrivable objects may not be observable at close ranges, whereas non-underdrivable objects are also observed at close ranges.
In order to be able to generate labels for high ranges as well, not only data from the past to the present may be used, but also data from the future path of the ego vehicle may be considered.
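The geometric reason may be illustrated with a simple worked example. The sketch assumes a horizontally aligned boresight and an elevation FoV that is symmetric about it; the function name `min_visible_distance` and all numeric values are illustrative and not taken from the disclosure.

```python
import math


def min_visible_distance(
    object_height_m: float, sensor_height_m: float, elevation_fov_deg: float
) -> float:
    """Horizontal distance below which an overhead object leaves the elevation FoV.

    Assumes a horizontal boresight and a FoV symmetric about it, so the
    upper FoV edge rises at an angle of elevation_fov_deg / 2.
    """
    half_fov = math.radians(elevation_fov_deg) / 2.0
    height_above_sensor = object_height_m - sensor_height_m
    if height_above_sensor <= 0.0:
        # Not above the sensor: the upper FoV edge never cuts it off.
        return 0.0
    return height_above_sensor / math.tan(half_fov)


# Assumed values: bridge underside at 5.0 m, sensor mounted at 0.5 m,
# elevation FoV of 10 degrees.
bridge_limit_m = min_visible_distance(5.0, 0.5, 10.0)
# An obstacle at sensor height stays inside the FoV down to zero distance.
obstacle_limit_m = min_visible_distance(0.5, 0.5, 10.0)
```

Under these assumed values, the bridge drops out of the FoV at a distance of roughly 51 m, whereas the obstacle at sensor height remains observable at any distance.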
Furthermore, the information where the ego vehicle (equipped with the radar sensor) drives may be considered during the labeling process.
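According to various implementations, in order to automatically generate ground truth data, two different occupancy grid maps may be created: a full-range map (FRom), based on scans irrespective of their range, and a limited-range map (LRom), based on scans below a pre-determined range threshold.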
Labeling may be possible in regions which are considered by the mapping process.
In some cases, cells within the mask region 316 may be automatically labeled, but other remaining cells may be set as “unknown” and may be ignored during training of the machine-learning model.
An example label logic based on FRom, LRom and the mask may be:
The default probability for the occupancy grid maps may be, for example, 0.5.
By the above logics, a cell which is occupied according to the limited-range map is labelled as “non-underdrivable” (since objects which can be detected from a short distance “usually” are non-underdrivable). If an object is not present according to the limited-range map, but it is present according to the full-range map, the cell may be labelled as “underdrivable” (since objects which can be detected from a large distance, but not from a shorter distance, “usually” are underdrivable).
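A minimal sketch of such a label logic over whole grids is given below. It follows the thresholds described herein (a cell probability above a threshold in the limited-range map, or above a threshold in the full-range map combined with the default probability in the limited-range map). The function name `label_grid`, the single shared threshold of 0.7, the tolerance used for the comparison with the default probability, and the choice to label in-mask cells that meet neither condition as “unknown” are assumptions.

```python
from typing import List

UNKNOWN = "unknown"
UNDERDRIVABLE = "underdrivable"
NON_UNDERDRIVABLE = "non-underdrivable"


def label_grid(
    fr_map: List[List[float]],
    lr_map: List[List[float]],
    mask: List[List[bool]],
    threshold: float = 0.7,
    default: float = 0.5,
    tol: float = 1e-6,
) -> List[List[str]]:
    """Label each cell from the full-range map, limited-range map and mask."""
    labels = []
    for row_f, row_l, row_m in zip(fr_map, lr_map, mask):
        out = []
        for p_full, p_limited, in_mask in zip(row_f, row_l, row_m):
            if not in_mask:
                out.append(UNKNOWN)  # outside the mask: ignored in training
            elif p_limited > threshold:
                out.append(NON_UNDERDRIVABLE)  # seen at short range
            elif p_full > threshold and abs(p_limited - default) <= tol:
                out.append(UNDERDRIVABLE)  # seen only from far away
            else:
                out.append(UNKNOWN)  # assumption: no decision possible
        labels.append(out)
    return labels


# Illustrative 2x3 grids.
fr_map = [[0.9, 0.9, 0.5], [0.9, 0.5, 0.5]]
lr_map = [[0.9, 0.5, 0.5], [0.5, 0.5, 0.5]]
mask = [[True, True, True], [False, True, True]]
labels = label_grid(fr_map, lr_map, mask)
```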
The labeling approach according to various implementations may allow ground truth to be generated for cells in a world-centric grid with respect to the classification as under- or non-underdrivable. For the grid maps, a world-centric coordinate system and the detections from up to all scans may be used (including the future, and hence, the approach may be referred to as “omniscient”). The labels may then also be available for high ranges where the underdrivable objects are clearly observable within the FoV. This fact may allow machine learning methods to be trained that classify under- and non-underdrivable regions based on radar sensor information like elevation information or RCS (radar cross section) measurements for very high ranges.
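How detections from different scans may end up in the same cell of a world-centric grid can be sketched as follows. The sketch assumes a two-dimensional grid whose origin coincides with the world origin, a sensor located at the vehicle origin, and a known ego pose per scan; the function name `detection_to_cell` and the cell size of 0.5 m are hypothetical.

```python
import math
from typing import Tuple


def detection_to_cell(
    range_m: float,
    azimuth_rad: float,
    ego_x_m: float,
    ego_y_m: float,
    ego_yaw_rad: float,
    cell_size_m: float = 0.5,
) -> Tuple[int, int]:
    """Map a polar detection in the vehicle frame to a world-grid cell index.

    Assumes the sensor sits at the vehicle origin and the grid origin
    coincides with the world origin (both are simplifications).
    """
    # Polar to Cartesian in the vehicle frame (x forward, y left).
    x_v = range_m * math.cos(azimuth_rad)
    y_v = range_m * math.sin(azimuth_rad)
    # Rotate by the ego yaw and translate by the ego position.
    x_w = ego_x_m + x_v * math.cos(ego_yaw_rad) - y_v * math.sin(ego_yaw_rad)
    y_w = ego_y_m + x_v * math.sin(ego_yaw_rad) + y_v * math.cos(ego_yaw_rad)
    return math.floor(x_w / cell_size_m), math.floor(y_w / cell_size_m)


# The same static object at world position (50.25, 0.25), seen from two
# ego positions along the drive (illustrative values).
cell_far = detection_to_cell(
    math.hypot(50.25, 0.25), math.atan2(0.25, 50.25), 0.0, 0.0, 0.0
)
cell_near = detection_to_cell(
    math.hypot(30.25, 0.25), math.atan2(0.25, 30.25), 20.0, 0.0, 0.0
)
```

Because both detections map to the same world cell, evidence from early (distant) and later (closer) scans may be accumulated in one place.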
The processor 502 may carry out instructions provided in the memory 504. The non-transitory data storage 506 may store a computer program, including the instructions that may be transferred to the memory 504 and then executed by the processor 502. The sensor 508 may be used for determining the sensor data for the respective points in time.
The processor 502, the memory 504, and the non-transitory data storage 506 may be coupled with each other, e.g., via an electrical connection 510, such as, e.g., a cable or a computer bus or via any other suitable electrical connection to exchange electrical signals. The sensor 508 may be coupled to the computer system 500, for example via an external interface, or may be provided as part(s) of the computer system 500 (e.g., internal to the computer system, for example coupled via the electrical connection 510).
The terms “coupling” or “connection” are intended to include a direct “coupling” (for example via a physical link) or direct “connection” as well as an indirect “coupling” or indirect “connection” (for example via a logical link), respectively.
It will be understood that what has been described for one of the methods above may analogously hold true for the computer system 500.
Number | Date | Country | Kind |
---|---|---|---|
21180296.2 | Jun 2021 | EP | regional |
22176916.9 | Jun 2022 | EP | regional |