This application claims priority to European Patent Application Number 21150055.8, filed Jan. 4, 2021, the disclosure of which is hereby incorporated by reference in its entirety herein.
Object detection is an essential pre-requisite for various tasks, in particular in autonomously driving vehicles. Radar is being increasingly implemented for object detection.
Accordingly, there is a need to provide efficient and reliable object detection using radar.
The present disclosure provides a computer implemented method, a computer system, and a non-transitory computer readable medium according to the independent claims. Embodiments are given in the subclaims, the description and the drawings.
In one aspect, the present disclosure is directed at a computer implemented method for detection of objects in a vicinity of a vehicle, the method comprising the following steps performed (in other words: carried out) by computer hardware components: acquiring radar data from a radar sensor; determining a radar data cube based on the radar data; providing the radar data cube to a plurality of layers (for example convolution(al) layers or inner products) of a neural network; resampling the output of the plurality of layers into a vehicle coordinate system; and detecting an object based on the resampled output.
In other words, radar data may be acquired and may be processed in various domains, and the data in the various domains may be used for object detection.
According to another aspect, the computer implemented method further comprises the following steps carried out by the computer hardware components: fusing data from a plurality of radar sensors; and detecting the object further based on the fused data.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: acquiring camera data from a camera; wherein the object is detected further based on the camera data.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: acquiring lidar data from a lidar sensor; wherein the object is detected further based on the lidar data.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining an angle of arrival based on the radar data cube.
According to another aspect, the angle of arrival is detected (in other words: determined) using an artificial neural network with a plurality of layers, for example a plurality of fully connected layers.
Using the artificial neural network may make the result independent of a calibration to the specific sensor arrangement and of possible sensor misalignment.
According to another aspect, the artificial neural network further comprises a dropout layer. The dropout layer may increase robustness of the processing.
According to another aspect, the object is detected further based on a regression subnet. The regression subnet may combine data from the camera, the lidar sensor, and the various radar domains.
According to another aspect, the regression subnet comprises at least one of a u-shaped network and a LSTM.
According to another aspect, the regression subnet comprises an ego-motion compensation module. The ego-motion compensation may ensure that the data that is combined in the regression subnet is provided in the same coordinate system, even if data from various time steps is combined.
According to another aspect, the ego-motion compensation module carries out ego-motion compensation of an output of a recurrent network of a previous time step, and inputs the result of the ego-motion compensation into a recurrent network of a present time step.
According to another aspect, the ego-motion compensation module carries out an interpolation, wherein the interpolation comprises a nearest neighbor interpolation and further comprises recording a residual part of a movement. This may avoid a drift (due to an accumulation of location errors) in the location over time.
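For illustration only, the nearest neighbor interpolation with a recorded residual may be sketched as follows. The sketch assumes a Cartesian feature grid, a pure translation of the vehicle, columns along x and rows along y, and a stationary scene that appears to move opposite to the ego motion; the names `shift_zero_fill`, `compensate_ego_motion` and `cell_size` are hypothetical and not taken from the disclosure.

```python
import numpy as np

def shift_zero_fill(grid, dy, dx):
    """Shift a 2D grid by whole cells; cells shifted in from outside are zero."""
    out = np.zeros_like(grid)
    h, w = grid.shape
    if abs(dy) >= h or abs(dx) >= w:
        return out  # everything moved out of the grid
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = grid[src_y, src_x]
    return out

def compensate_ego_motion(state, motion_xy, residual_xy, cell_size):
    """Move the previous state into the present grid and return the unapplied remainder.

    state: (H, W) output of the previous time step (assumed layout).
    motion_xy: ego displacement (x, y) since the previous time step.
    residual_xy: part of earlier movements that was smaller than a cell.
    """
    total = np.asarray(motion_xy, dtype=float) + np.asarray(residual_xy, dtype=float)
    cells = np.rint(total / cell_size).astype(int)   # nearest neighbor: whole-cell shift
    new_residual = total - cells * cell_size         # residual part of the movement
    # Assumption: the scene moves opposite to the ego vehicle in the grid.
    shifted = shift_zero_fill(state, -int(cells[1]), -int(cells[0]))
    return shifted, new_residual
```

In this sketch, a movement of 0.4 cells per time step would never shift the grid if the residual were discarded; with the residual carried forward, the grid is shifted by two cells after five time steps, which is one way the drift mentioned above may be avoided.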
In another aspect, the present disclosure is directed at a computer implemented method for compressing radar data, the method comprising the following steps carried out by computer hardware components: acquiring radar data comprising a plurality of Doppler bins; determining which of the plurality of Doppler bins represent stationary objects; and determining compressed radar data based on the determined Doppler bins which represent stationary objects.
The compressed radar data may be provided to later processing, for example to further layers of an artificial network. It will be understood that depending on the specific needs, the artificial neural network may be trained specifically. By providing the compressed data to the further processing, the further processing may be carried out efficiently. The further processing may include any processing as described herein.
According to another aspect, the radar data comprises a data cube or a thresholded data cube.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining an angle of arrival based on the radar data.
According to another aspect, the angle of arrival is determined using an artificial network with a plurality of layers.
According to another aspect, determining compressed radar data comprises: combining the determined Doppler bins which represent stationary objects into a range angle image; and subtracting the range angle image from a beamvector.
According to another aspect, the compressed radar data comprises a plurality of images.
According to another aspect, the images represent at least one of an energy map or a velocity map.
According to another aspect, the compressed radar data comprises a velocity map; and the velocity map is determined based on a ramp vector.
According to another aspect, the ramp vector comprises a plurality of monotonically increasing entries.
According to another aspect, the compressed radar data comprises a velocity map and an energy map; and the energy map is determined based on the velocity map.
According to another aspect, the compressed radar data comprises at least one velocity map; and the at least one velocity map is determined based on at least one of a moving learnable Gaussian kernel or a learnable sinusoid kernel.
According to another aspect, the compressed radar data is determined further based on Doppler bins neighboring to the Doppler bins which represent stationary objects.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of trained features using an encoder layer. The compressed data may comprise the trained features.
According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a deviation map based on the velocity map. The compressed data may comprise the deviation map.
Various aspects provide a machine learning pipeline for low level sensor data (e.g. raw radar data).
The methods as described herein may create vehicle detections from neural networks.
In another aspect, the present disclosure is directed at a computer system, said computer system comprising a plurality of computer hardware components configured to carry out several or all steps of the computer implemented method described herein. The computer system may be part of a vehicle.
The computer system may comprise a plurality of computer hardware components (for example a processor, for example processing unit or processing network, at least one memory, for example memory unit or memory network, and at least one non-transitory data storage). It will be understood that further computer hardware components may be provided and used for carrying out steps of the computer implemented method in the computer system. The non-transitory data storage and/or the memory unit may comprise a computer program for instructing the computer to perform several or all steps or aspects of the computer implemented method described herein, for example using the processing unit and the at least one memory unit.
In another aspect, the present disclosure is directed at a vehicle comprising the computer system and the radar sensor.
In another aspect, the present disclosure is directed at a non-transitory computer readable medium comprising instructions for carrying out several or all steps or aspects of the computer implemented method described herein. The computer readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid state drive (SSD); a read only memory (ROM), such as a flash memory; or the like. Furthermore, the computer readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection. The computer readable medium may, for example, be an online data repository or a cloud storage.
The present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer implemented method described herein.
Exemplary embodiments and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically:
Radar signal processing in its core has many similarities to image processing. An FFT (fast Fourier transform) may be used to generate a range Doppler map, in which after sidelobe suppression peaks above the local noise level are identified using a variable threshold. These beam-vectors may be subsequently filtered and finally super resolution angle finding methods may be applied. The present disclosure relates to methods and systems for compressing radar data.
A 2D FFT (fast Fourier transform) may decompose the input signal for each antenna into frequency components and thus into range and Doppler. Targets may appear as peaks in an integrated range Doppler map, and peaks above a certain energy level may be processed with FFT/DOA estimation methods and may be used to detect the direction of arrival of targets with a specific range Doppler value. The energy (res) may be extracted, and rescaling to output unit systems, looktype compensation, detection point cloud generation, and tracking and object hypothesis generation may be provided, as described in more detail herein.
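For illustration only, the 2D FFT and the integration over antennas may be sketched as follows. The input layout (antennas, chirps, samples), the absence of windowing, and the function name `range_doppler_map` are assumptions of this sketch and not part of the disclosure.

```python
import numpy as np

def range_doppler_map(adc):
    """Compute a per-antenna range Doppler cube and an integrated map.

    adc: array of shape (antennas, chirps, samples) (assumed layout).
    Returns (cube, power), where cube has the same shape as adc and
    power has shape (doppler_bins, range_bins).
    """
    cube = np.fft.fft(adc, axis=2)                 # samples within a chirp -> range
    cube = np.fft.fft(cube, axis=1)                # chirp to chirp -> Doppler
    cube = np.fft.fftshift(cube, axes=1)           # put zero Doppler in the middle
    power = np.sum(np.abs(cube) ** 2, axis=0)      # integrate energy over antennas
    return cube, power
```

Peaks of `power` may then be compared against a noise threshold, while the complex values of `cube` at a peak form the beamvector used for angle finding.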
Looktype compensated Doppler and range values as well as RCS (radar cross-section) values may be generated for the remaining peaks, which form a detection list. The detection list may be transformed to Cartesian coordinates and the measurements may be processed by a tracker which filters mistakes and creates temporally stable tracks from flickering detections as well as bounding boxes. Deep (artificial neural) networks may be used in image processing, language processing and other fields. Deep architectures have proven to deliver results superior to previous hand-crafted algorithms, feature engineering and classification. The question is, how to
Looktypes may be provided to handle the sampling problem: the sampling of a signal needs to obey the Nyquist sampling theorem, and violation of the Nyquist sampling theorem may result in ambiguities in the reconstructed signal. The problem may be resolved for the range by low pass filtering to remove frequencies higher than f_sampling/2. Regarding Doppler, using different resolutions for different scans (looktypes) yields different ambiguous results, and temporal fusion methods (e.g. tracking) may then be used to resolve the ambiguities by using at least two detections of the same object. In this context, range looktypes refer to different resolutions in range, so that it is also possible to get a finer resolution (every n-th frame). For example, the data cube may include different resolutions, for example four different resolutions (in other words: four looktypes).
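For illustration only, the way two looktypes with different unambiguous Doppler spans may resolve an ambiguity can be sketched as follows. The sketch replaces the temporal fusion (e.g. tracking) mentioned above by a simple matching of unfolded hypotheses from two measurements of the same object; the function names and the numeric spans are assumptions of this sketch.

```python
import numpy as np

def wrap(velocity, span):
    """Fold a true velocity into the unambiguous interval [-span/2, span/2)."""
    return (velocity + span / 2.0) % span - span / 2.0

def resolve_ambiguity(v1, span1, v2, span2, max_folds=3):
    """Pick the pair of unfolded hypotheses from two looktypes that agree best."""
    k = np.arange(-max_folds, max_folds + 1)
    cand1 = v1 + k * span1                      # hypotheses for looktype 1
    cand2 = v2 + k * span2                      # hypotheses for looktype 2
    diff = np.abs(cand1[:, None] - cand2[None, :])
    i, j = np.unravel_index(np.argmin(diff), diff.shape)
    return 0.5 * (cand1[i] + cand2[j])
```

With hypothetical spans of 20 and 24 (in arbitrary speed units), a true speed of 27 is measured as 7 and 3 respectively, and only the hypothesis 27 is consistent with both measurements within the searched number of folds.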
According to various embodiments, a network, which may be referred to as RaDOR.Net (Radar Deep Object Recognition network), may be provided to solve these problems.
Various embodiments may allow for superior performance compared to traditional methods and may provide data driven performance gains.
In a full end-to-end pipeline, only input from the CDC domain 118 may be used.
As an input, a 3D Compressed Data Cube (CDC) 112 may be used. This cube may be sparse as all beamvectors below CFAR (constant false alarm rate) level may be suppressed. Missing antenna elements in the beamvector may be interpolated. Calibration may be applied, and the bin-values may be scaled according to the radar equation. According to various embodiments, uncompressed data cubes may be used, and a ML (machine learning) based bin suppression method may be utilized.
Data from the CDC domain 118 may be used in a CDC domain subnet 126. In the CDC domain subnet 126, an angle finding network is applied on each beamvector of the range Doppler map. This network may be an MLP (multilayer perceptron) which may share the same parameters across all beamvectors, or may be more complex, as described in more detail below. The CDC domain subnet 126 may create a range, angle, Doppler cube which may be subsequently processed with convolution layers to filter the input.
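For illustration only, an angle finding MLP whose parameters are shared across all beamvectors may be sketched as follows. The two-layer structure, the ReLU activation, the stacking of real and imaginary parts, and the untrained weights passed in as plain arrays are assumptions of this sketch; the disclosure does not fix these details.

```python
import numpy as np

def angle_finding_mlp(cdc, w1, b1, w2, b2):
    """Apply one MLP to every beamvector of a range Doppler cube.

    cdc: complex array (range, doppler, antennas) (assumed layout).
    w1: (2 * antennas, hidden), w2: (hidden, angle_bins).
    Returns a real (range, angle, doppler) cube.
    """
    x = np.concatenate([cdc.real, cdc.imag], axis=-1)   # (R, D, 2A)
    hidden = np.maximum(x @ w1 + b1, 0.0)               # same weights for every cell
    angles = hidden @ w2 + b2                            # (R, D, angle_bins)
    return np.transpose(angles, (0, 2, 1))               # reorder to range, angle, Doppler
```

Because the matrix products act only on the last axis, identical beamvectors at different range Doppler positions yield identical angle spectra, which is what parameter sharing means here.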
Data from the polar domain 120 may be provided to the polar domain subnet 128. In the polar domain subnet 128, the Doppler bins may be ego-motion compensated depending on the angle bins (different projections of the ego-speed).
The Doppler component may be compressed into multiple feature maps using an encoder subnetwork. A resampling layer may map different looktypes to a common representation and a coordinate conversion resampling layer may convert the feature planes from polar coordinates to a Cartesian feature plane output. In order to alleviate problems with interpolation artifacts, both conversions may be combined into one step.
The idea behind this transformation is to process further information in a feature space where object shape is invariant to translation (e.g. the same convolutions may be applied at different spatial locations in this space).
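For illustration only, the conversion of a polar feature plane into a Cartesian feature plane may be sketched as a nearest neighbor lookup. The sketch covers only the coordinate conversion, not the looktype resampling, and assumes uniformly spaced bin centers, a sensor at the origin looking along the x axis, and the hypothetical function name `polar_to_cartesian`.

```python
import numpy as np

def polar_to_cartesian(polar, ranges, angles, xs, ys):
    """Resample a (range, angle) plane onto a Cartesian grid.

    ranges, angles: uniformly spaced bin centers (angles in radians).
    xs, ys: Cartesian cell centers. Cells outside the field of view are zero.
    Returns an array of shape (len(ys), len(xs)).
    """
    xx, yy = np.meshgrid(xs, ys)
    r = np.hypot(xx, yy)
    a = np.arctan2(yy, xx)
    ri = np.rint((r - ranges[0]) / (ranges[1] - ranges[0])).astype(int)
    ai = np.rint((a - angles[0]) / (angles[1] - angles[0])).astype(int)
    valid = (ri >= 0) & (ri < len(ranges)) & (ai >= 0) & (ai < len(angles))
    out = np.zeros(xx.shape, dtype=polar.dtype)
    out[valid] = polar[ri[valid], ai[valid]]     # nearest polar bin for each cell
    return out
```

The index arrays `ri` and `ai` depend only on the grids, so in practice they could be computed once and reused as a fixed lookup table for every feature plane.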
A looktype transformer 130, a polar to vehicle coordinate system transformer 132 and an ego motion transformer 134 may be provided.
Data from the VCS sensor domain 122 may be provided to the VCS sensor domain subnet 136. In the VCS sensor domain subnet 136, a max pooling may be applied to fuse the results of different radars, in order to generate feature planes combining the observation feature planes from all radars. In further embodiments, other fusion methods like gated fusion may be applied. A sensor fusion module 138 may be provided.
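For illustration only, the max pooling fusion over several radars may be sketched as an element-wise maximum. The sketch assumes that all feature planes have already been resampled to the same vehicle coordinate grid; the function name `fuse_max` is hypothetical.

```python
import numpy as np

def fuse_max(feature_planes):
    """Fuse same-shaped feature planes from several radars by element-wise maximum."""
    return np.max(np.stack(feature_planes, axis=0), axis=0)
```

The element-wise maximum does not depend on the order or the number of radars, so the same fused representation may be used when a sensor is added or missing.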
Data from the VCS fused domain 124 may be provided to the VCS fused domain subnet 140.
The various domain subnets 126, 128, 136, 140 may be referred to as radar subnet 116.
In a regression subnetwork 142, data from the camera subnetwork 106, the lidar subnetwork 110, and the radar subnetwork 116 may be received.
The regression subnetwork 142 may include a U-shaped network 144, a LSTM (long short-term memory) 146, and an ego-motion transformer 148, and may provide an output 150.
The U-shaped network 144 may be used to detect objects. Ego motion compensated recurrent networks like LSTMs 146 may be used to combine multiple timesteps into one result.
RaDOR.Net according to various embodiments may connect data cube input from one radar or from multiple radars and object output in an end to end fashion.
The processor 202 may carry out instructions provided in the memory 204. The non-transitory data storage 206 may store a computer program, including the instructions that may be transferred to the memory 204 and then executed by the processor 202.
The processor 202, the memory 204, and the non-transitory data storage 206 may be coupled with each other, e.g. via an electrical connection 214, such as e.g. a cable or a computer bus or via any other suitable electrical connection to exchange electrical signals. The at least one camera 208, the at least one lidar sensor 210, and/or the at least one radar sensor 212 may be coupled to the computer system 200, for example via an external interface, or may be provided as parts of the computer system (in other words: internal to the computer system, for example coupled via the electrical connection 214).
The terms “coupling” or “connection” are intended to include a direct “coupling” (for example via a physical link) or direct “connection” as well as an indirect “coupling” or indirect “connection” (for example via a logical link), respectively.
It will be understood that what has been described for one of the methods above may analogously hold true for the computer system 200.
The input data may be a data cube or a thresholded data cube (which may also be called compressed data cube or CDC). The CDC may contain only those beam vectors (and their neighborhood), where a sum over all antennas exceeds a noise threshold (for example CFAR).
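For illustration only, the thresholding that yields a sparse cube may be sketched as follows. The sketch uses a single fixed threshold instead of a CFAR threshold and does not keep the neighborhood of the retained beamvectors; both simplifications, as well as the function name, are assumptions of this sketch.

```python
import numpy as np

def threshold_cube(cube, threshold):
    """Suppress beamvectors whose energy summed over all antennas is too low.

    cube: complex array (range, doppler, antennas) (assumed layout).
    Returns the thresholded cube and the boolean mask of kept cells.
    """
    energy = np.sum(np.abs(cube) ** 2, axis=-1)      # sum over antennas
    mask = energy > threshold
    return np.where(mask[..., None], cube, 0), mask
```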
According to various embodiments, an end to end differentiable CDC compression may be provided.
Radar signal preprocessing may be driven by a quick reduction of data and calculation time. Deep learning methods may be provided for data compression (in other words: reduction).
In deep learning, raw data may be connected with high level results using an appropriate architecture. Then, each layer may be trained using backpropagation, finally adapting the network parameters to generate appropriate intermediate and final results.
The embodiments described in the following may provide a modification of the RaDOR.Net as described above and may extend RaDOR.Net with a new front design with a data bottleneck (in other words: with data reduction or data compression). According to various embodiments, data cubes may be compressed. A framework may be provided to generate small, interpretable and visualizable intermediate layers, which are end to end differentiable, allowing network components to be used before and after compression.
In the radar processing chain, a downmixed and preprocessed time domain signal is arranged as a 2D signal for each antenna and then processed with range and range-rate FFTs to generate 2D planes for each antenna with complex responses for reflections in different range and range-rate bins. The planes are then integrated into a single 2D plane, and a running average CFAR threshold is used to select bins with sufficient energy indicating targets. Beamforming is then used to find the direction of arrival for targets, and the maxima in the range Doppler bins are used to generate detections, together with additional processing steps that translate bins to range and angle and generate additional data needed for the output. The final detections are sent out. They make up about 1/100000 of the original cube data. In this data bottleneck, not only is a lot of data lost, but a detection is also usually defined as a representative point of an energy maximum. This definition might not be in line with the needs of the later processing steps. Deep learning may provide a concept to connect low level data with high level output, optimizing the different processing steps towards the final needs.
According to various embodiments, data may be reduced, and information may be propagated, such that it fits to the final needs. According to various embodiments, a representation mechanism is provided which is extensible, interpretable and differentiable, which is not only an interesting property for engineering, but features like differentiability are also essential for a deep learning architecture, as processing steps before the compression may be tailored towards the needs of later processing steps.
Differentiability may be provided by only using elementary mathematical operations like addition, subtraction, multiplication, and division.
According to various embodiments, a first processing pipeline may be provided which has the defined properties and is efficient, so that the whole processing may be implemented on an embedded processor, not even requiring a machine learning accelerator.
The ra cube 802 may be a three-dimensional cube with the dimensions range, angle and Doppler. The block 804 may represent 1D convolutions over the Doppler dimension. These convolutions may be shared over all range and angle bins, so that on every range angle position the same 1D convolution over the Doppler dimension is applied. This may refine and sharpen the Doppler spectrum given for each range angle bin.
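For illustration only, a 1D convolution over the Doppler dimension that is shared over all range and angle bins may be sketched as follows. The single kernel, the zero padding, and the use of `np.convolve` (a true convolution, which flips the kernel, whereas neural network layers usually compute a cross-correlation) are assumptions of this sketch.

```python
import numpy as np

def doppler_conv1d(ra_cube, kernel):
    """Apply the same 1D kernel along the Doppler axis of every range angle cell.

    ra_cube: real array (range, angle, doppler); kernel: 1D array of odd length
    not longer than the Doppler dimension. The output has the shape of ra_cube.
    """
    return np.apply_along_axis(
        lambda spectrum: np.convolve(spectrum, kernel, mode="same"), -1, ra_cube
    )
```

In a trained network, several such kernels with learned coefficients would typically be stacked; the sketch shows only that the kernel does not depend on the range or angle index.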
First, it may be identified which Doppler bins may represent stationary objects (for example based on a stationary map 806), and these bins are then combined into a range angle image from the cube and the entry is subtracted from the beamvector.
The radar sensor may measure the range and the Doppler (or range-rate, which may be the radial speed of objects, which is the change of range over time) on each of its receiving antennas. The measured range and Doppler data may be output in a discretized manner, e.g. in the form of bins. Each bin may cover the information of a defined interval.
A range angle image may be an image or a matrix whose cells are indexed by range and angle. In other words, a range angle image may contain a value for each pair of range and angle indices.
A beamvector as used herein may be understood as a doppler spectrum.
The meaning of combining bins into a range angle image from the cube and subtracting the entry from the beamvector may thus be that the Doppler bins corresponding to stationary objects are subtracted or zeroed out for the calculation of the velocity map and the moving energy.
The remaining energy is summed up to generate the energy 812 for moving targets. Furthermore, a weighted multiplication with a velocity ramp vector 808 may be provided, wherein each ramp entry is weighted with the corresponding energy for the ramp value and divided by the sum of all energy entries. This entry may calculate the expectation value for the speed (for example in a velocity map 810), thus the mean Doppler speed after subtracting the stationary bins.
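For illustration only, the computation of the stationary energy, the moving energy and the velocity map may be sketched as follows. The array layouts, the small constant `eps` that avoids a division by zero, and the function name are assumptions of this sketch. Only multiplication, addition and division are used once the mask of stationary bins is given.

```python
import numpy as np

def compress_doppler(ra_energy, stat_bin, ramp, eps=1e-12):
    """Compress the Doppler dimension of a range angle cube into three maps.

    ra_energy: non-negative array (range, angle, doppler).
    stat_bin: integer array (range, angle), index of the stationary Doppler bin.
    ramp: (doppler,) velocity assigned to each Doppler bin, monotonically increasing.
    Returns (stat_energy, vel_energy, vel_map), each of shape (range, angle).
    """
    bins = np.arange(ra_energy.shape[-1])
    is_stat = bins[None, None, :] == stat_bin[..., None]    # mask of stationary bins
    stat_energy = np.sum(ra_energy * is_stat, axis=-1)      # range angle image
    moving = ra_energy * ~is_stat                            # stationary bins zeroed out
    vel_energy = np.sum(moving, axis=-1)                     # energy of moving targets
    # Energy-weighted mean of the ramp: expectation value of the speed.
    vel_map = np.sum(moving * ramp, axis=-1) / (vel_energy + eps)
    return stat_energy, vel_energy, vel_map
```

As a worked example, for a Doppler spectrum (0, 3, 10, 1, 0) with the stationary bin at index 2 and a ramp (-2, -1, 0, 1, 2), the stationary energy is 10, the moving energy is 4, and the velocity map entry is (3 * (-1) + 1 * 1) / 4 = -0.5.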
The stat offset block 814 is related to the ego motion compensation. Carrying out ego motion compensation on the ra cube 802 may be expensive, as the cube may be big and an index shift on the Doppler dimension with a different shift for each range and angle may be necessary, which may result in an expensive memory reordering. According to various embodiments, ego motion compensation may instead be realized by adding the range and angle dependent speed offset to the velocity map. The speed offset may be the absolute speed that a target with zero relative speed (e.g., a stationary target in relative speed) may have.
The vel energy 812 as a sum over energy may be shift independent.
In the stat map 806, the bins which represent stationary information may be identified.
Both the stat offset 814 and the stat map 806 may be defined using the ego speed and both may be range angle dependent. The stat offset 814 may consist of a speed value representing the offset between relative and absolute speed. The stat map 806 may consist of a bin index to identify which Doppler bin corresponds to zero absolute speed.
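For illustration only, the stat offset and the stat map may be derived from the ego speed as sketched below. The sketch assumes a forward-looking sensor aligned with the driving direction, straight driving, no Doppler ambiguity, and a sign convention in which a stationary target at azimuth angle theta has the relative radial speed -v_ego * cos(theta); under these assumptions both maps depend only on the angle and are merely repeated over range. None of these conventions is taken from the disclosure.

```python
import numpy as np

def stationary_maps(v_ego, angles, n_range, ramp):
    """Compute the speed offset and the stationary bin index per range angle cell.

    angles: (n_angle,) azimuth bin centers in radians.
    ramp: (doppler,) relative speed assigned to each Doppler bin.
    Returns (stat_offset, stat_map), each of shape (n_range, n_angle).
    """
    rel_stat = -v_ego * np.cos(angles)               # relative speed of stationary targets
    offset = -rel_stat                                # absolute = relative + offset
    stat_bin = np.argmin(np.abs(ramp[None, :] - rel_stat[:, None]), axis=1)
    shape = (n_range, len(angles))
    return np.broadcast_to(offset, shape).copy(), np.broadcast_to(stat_bin, shape).copy()
```

Adding `stat_offset` to a velocity map expressed in relative speed then yields absolute speed without reordering the cube, and `stat_map` may serve as the bin index used to identify the stationary Doppler bins.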
It will be understood that the scheme illustrated in
The scheme illustrated in
The former Doppler compression layer was a set of dense layers on the Doppler dimension to reduce the Doppler bins to a reduced feature size. This set of dense layers on the Doppler dimension is shared over all range angle bins.
According to various embodiments, the number of Doppler bins may be reduced to a significantly smaller feature number.
The former features were fully learned by a dense layer and are therefore not interpretable, but able to learn an abstract encoding. The maps 806, 810, 812 are interpretable and meaningful, and have proved to perform well; however, they are human designed and fixed. According to various embodiments, in order to get the best of both worlds, the Enc map 914 may be provided as shown in
In the system illustrated in
Another velocity map may be created using learnable sinusoid kernels 1016. The learnable sinusoid kernels 1016 may consist of sinusoids with three learnable parameters (amplitude, frequency, and phase). The stationary bin index 1004 may be used to shift the phase so that the sinusoids have a unified representation for different ego speeds as well. The learnable sinusoid kernels 1016 may create the second velocity maps 1018.
A kernel in this context may be a vector with the length of the Doppler dimension. The values of the kernel may represent either a Gaussian or a Sinusoid function defined by the learnable parameters mentioned above. For each range-angle cell in the ra cube, the inner products of the range-angle cell's Doppler dimension vector and all of the kernel vectors may be calculated. So, for each kernel, one velocity map (or the stationary map) may be obtained, and in the end, there may be as many maps as there are kernels.
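For illustration only, Gaussian and sinusoid kernels that are shifted by the stationary bin index, and the inner product with the Doppler vector of one range-angle cell, may be sketched as follows. The parameterization (center and width for the Gaussian; amplitude, frequency in cycles per Doppler span and phase for the sinusoid) and the function names are assumptions of this sketch, and the parameters are plain arrays rather than trained values.

```python
import numpy as np

def gaussian_kernels(n_doppler, centers, widths, stat_bin):
    """One Gaussian kernel per (center, width), centered relative to the stationary bin."""
    bins = np.arange(n_doppler)
    mu = stat_bin + centers
    return np.exp(-0.5 * ((bins[None, :] - mu[:, None]) / widths[:, None]) ** 2)

def sinusoid_kernels(n_doppler, amplitudes, frequencies, phases, stat_bin):
    """One sinusoid kernel per (amplitude, frequency, phase), phase-shifted by the stationary bin."""
    bins = np.arange(n_doppler)
    arg = 2.0 * np.pi * frequencies[:, None] * (bins[None, :] - stat_bin) / n_doppler
    return amplitudes[:, None] * np.sin(arg + phases[:, None])

def kernel_responses(doppler_vector, kernels):
    """Inner product of one cell's Doppler vector with every kernel: one value per map."""
    return kernels @ doppler_vector
```

Because both kernel types are expressed relative to the stationary bin, the same target speed relative to the ground yields the same responses for different ego speeds in this sketch; wrap-around at the edges of the Doppler dimension is not handled.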
According to various embodiments, the radar data may include or may be a data cube or a thresholded data cube.
According to various embodiments, the method may further include determining an angle of arrival based on the radar data.
According to various embodiments, the angle of arrival may be determined using an artificial network with a plurality of layers.
According to various embodiments, determining compressed radar data may include: combining the determined Doppler bins which represent stationary objects into a range angle image; and subtracting the range angle image from a beamvector.
According to various embodiments, the compressed radar data may include or may be a plurality of images.
According to various embodiments, the images may represent at least one of an energy map or a velocity map.
According to various embodiments, the compressed radar data may include or may be a velocity map, and the velocity map may be determined based on a ramp vector.
According to various embodiments, the ramp vector may include a plurality of monotonically increasing entries.
According to various embodiments, the compressed radar data may include or may be a velocity map and an energy map, and the energy map may be determined based on the velocity map.
According to various embodiments, the compressed radar data comprises at least one velocity map; and the at least one velocity map is determined based on at least one of a moving learnable Gaussian kernel or a learnable sinusoid kernel.
According to various embodiments, the compressed radar data may be determined further based on Doppler bins neighboring to the Doppler bins which represent stationary objects.
According to various embodiments, the method may further include determining a plurality of trained features using an encoder layer.
According to various embodiments, the method may further include determining a deviation map based on the velocity map.
Each of the steps 1102, 1104, 1106 and the further steps described above may be performed by computer hardware components, for example by a computer system 200 as illustrated in
Number | Date | Country | Kind |
---|---|---|---|
21150055.8 | Jan 2021 | EP | regional |