The present invention is related to a method, a computer program, and an apparatus for determining a driving context of a vehicle. The invention is further related to a driver assistance system, which makes use of such a method or apparatus for determining a driving context of a vehicle, and to an autonomous or semi-autonomous vehicle comprising such a driver assistance system.
The driving strategies deployed by highly autonomous driving systems are dependent on the driving context, i.e. different driving strategies are used when the ego-car is driving on a motorway, in a city, or when it is trying to park. Accordingly, in order to enable a highly autonomous driving system to select an optimal driving strategy, it first needs to be aware of the context in which the vehicle is driving.
Occupancy grids are widely used to map indoor spaces for autonomous navigation by self-driving agents. In this context, convolutional neural networks have been trained on 2D range data for the semantic labelling of places in an unseen environment, as described in the article by R. Goeddel et al.: “Learning semantic place labels from occupancy grids using CNNs” [1]. This approach allows a robot to use Lidar for space classification with convolutional neural networks: occupancy grids created from Lidar scans are converted to grey images, which are then used to train the convolutional neural networks. Using the trained networks, the robot is able to distinguish between three classes: room, corridor, and doorway. This output is further used to create a localization space map. However, this indoor mapping solution does not transfer to outdoor autonomous driving, where the traffic scene has a more complex structure. In particular, driving through an outdoor environment implies interaction with dynamic objects, which the method presented in [1] does not take into consideration.
The construction of occupancy grids from the interaction of a robot with its surrounding environment has also been reported. The use of recurrent neural networks for tracking and classifying the surroundings of a robot placed in a dynamic and partially observable environment is described in the article by P. Ondruska et al.: “End-to-End Tracking and Semantic Segmentation Using Recurrent Neural Networks” [2]. A recurrent neural network filters the input stream of raw laser measurements in order to infer the locations of objects together with their identities, in both visible and occluded areas. The algorithm takes inspiration from Deep Tracking, a deep learning system that leverages deep neural networks for end-to-end tracking. Raw sensory data are used to construct an occupancy grid, in which the visible pixels are labelled for the supervised training of the classifier. The training data were recorded from a stationary position of the robot, resulting in low data variability.
An approach that applies neural networks to occupancy data is described in the article by S. Hoermann et al.: “Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling” [3]. In this approach, an environment modelled with a Bayesian filtering technique is processed by a deep neural network in order to obtain a long-term prediction of the driving situation for intelligent vehicles. The algorithm predicts future static and dynamic objects using a convolutional neural network trained on occupancy grids.
It is an object of the present invention to provide an improved solution for determining a driving context of a vehicle, which is suitable for application to real-world complex and dynamic scenes, such as autonomous driving.
This object is achieved by a method for determining a driving context of a vehicle according to claim 1, by a computer program code according to claim 10, and by an apparatus for determining a driving context of a vehicle according to claim 11. The dependent claims include advantageous further developments and improvements of the present principles as described below.
According to a first aspect, a method for determining a driving context of a vehicle comprises:
Similarly, a computer program code comprises instructions, which, when executed by at least one processor, cause the at least one processor to determine a driving context of a vehicle by performing the steps of:
The term computer has to be understood broadly. In particular, it also includes electronic control units and other processor-based data processing devices.
The computer program code can, for example, be made available for electronic retrieval or stored on a computer-readable storage medium.
According to a further aspect, an apparatus for determining a driving context of a vehicle comprises:
The proposed solution leverages the power of deep neural architectures in order to learn a grid-based representation of the traffic scene. Using occupancy grids instead of raw image data makes it possible to cope with common uncertainties in autonomous driving scenes, such as changes in sensor calibration, pose, timing, and latency. The occupancy grids are computed in real time while the autonomous car is moving and allow classifying the environment in which the car is currently located. Each occupancy grid can be used for classification immediately, without the need to first accumulate information over time. The described solution achieves a high classification accuracy. Furthermore, the algorithm is very efficient, which makes it suitable for real-time applications, and it can be implemented on low-performance processors.
In one advantageous embodiment, the convolutional neural network constructs a grid representation of the driving environment by converting the occupancy grid into an image representation, where the grid cells of the occupancy grid are coded as image pixels. The colors of the pixels can be used to represent states of the grid cells. For example, a first color can indicate an obstacle, whereas a second color indicates free space. A third color may be used to designate an unknown state. In addition, the pixel intensity with respect to a specific color code may be used to represent the occupancy confidence. The image representation is well-suited for subsequent processing by the convolutional neural network.
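The grid-to-image conversion can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each cell is assumed to store Dempster-Shafer masses for "occupied" and "free" (the remainder being "unknown"), and the grey levels follow the white (obstacle), medium grey (free), black (unknown) coding of the exemplary embodiment, with each cell blended linearly toward black as its confidence drops. The function names are hypothetical.

```python
def masses_to_pixel(m_occ: float, m_free: float) -> int:
    """Map one cell's masses to an 8-bit grey level.

    Assumed coding: 255 = occupied, 128 = free, 0 = unknown. Partial belief
    blends linearly between these, so higher confidence means an intensity
    closer to the class colour.
    """
    if m_occ < 0 or m_free < 0 or m_occ + m_free > 1 + 1e-9:
        raise ValueError("masses must be non-negative and sum to at most 1")
    return round(255 * m_occ + 128 * m_free)


def grid_to_image(grid):
    """Convert a 2D grid of (m_occ, m_free) pairs into rows of grey levels."""
    return [[masses_to_pixel(m_o, m_f) for (m_o, m_f) in row] for row in grid]
```

Because a single grey channel is used, a weakly believed obstacle and a strongly believed free cell can end up with similar grey levels; encoding the states in separate colour channels would avoid that ambiguity.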
In one advantageous embodiment, the occupancy grid is constructed using the Dempster-Shafer theory. The Dempster-Shafer theory, also known as the theory of evidence or the theory of belief functions, is well understood and often used as a method of sensor fusion.
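For illustration, Dempster's rule of combination on the two-hypothesis frame commonly used for occupancy grids, {occupied, free}, is sketched below. Each cell carries a mass for "occupied", for "free", and for the whole frame ("unknown"); the product terms that support contradictory hypotheses form the conflict K, which is discarded, and the remaining masses are renormalized by 1 − K. The frame and the `Mass` type are assumptions of this sketch.

```python
from typing import NamedTuple


class Mass(NamedTuple):
    occ: float   # m({occupied})
    free: float  # m({free})
    unk: float   # m({occupied, free}), i.e. ignorance


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two independent sources."""
    # Conflict: one source supports 'occupied' while the other supports 'free'.
    conflict = m1.occ * m2.free + m1.free * m2.occ
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    occ = (m1.occ * m2.occ + m1.occ * m2.unk + m1.unk * m2.occ) / norm
    free = (m1.free * m2.free + m1.free * m2.unk + m1.unk * m2.free) / norm
    unk = (m1.unk * m2.unk) / norm
    return Mass(occ, free, unk)
```

As a usage example, fusing two consistent "occupied" readings with masses 0.6 and 0.5 (the rest unknown) yields an occupied mass of 0.8, leaving 0.2 unknown.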
In one advantageous embodiment, the occupancy information of the grid cells of the occupancy grid is gradually decreased over time. The content of the grid layer is thus degraded over time, while it is continuously updated in real time with each sensory measurement. In this way, interactions with dynamic objects are taken into account, which is important in an outdoor environment.
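One common way to degrade evidence over time, assumed here rather than taken from the source, is Shafer's discounting: the "occupied" and "free" masses of a cell are scaled by a retention factor, and the removed mass is returned to "unknown". The retention factor of 0.9 per update cycle and the function names are hypothetical.

```python
from typing import NamedTuple


class Mass(NamedTuple):
    occ: float   # m({occupied})
    free: float  # m({free})
    unk: float   # m({occupied, free})


def decay(m: Mass, factor: float = 0.9) -> Mass:
    """Discount one cell's evidence; the lost mass moves to 'unknown'.

    `factor` in (0, 1] is a hypothetical per-cycle retention rate.
    """
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    occ, free = factor * m.occ, factor * m.free
    return Mass(occ, free, 1.0 - occ - free)


def decay_grid(grid, factor: float = 0.9):
    """Apply the discount to every cell of a 2D grid of Mass values."""
    return [[decay(cell, factor) for cell in row] for row in grid]
```

Applied once per cycle, a cell that is no longer observed drifts geometrically back to "unknown", so stale evidence left by objects that have since moved fades out, while fresh measurements fused into the cell restore its belief.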
In one advantageous embodiment, the convolutional neural network consists of a first convolutional layer with 48 kernels and a second convolutional layer with 96 kernels. The convolution kernel size is 9×9 for the first convolutional layer and 5×5 for the second. The resulting small activation maps help achieve the real-time performance required by highly autonomous driving systems.
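The effect of these kernel sizes on the activation maps can be checked with simple arithmetic. The sketch assumes the 40 × 40-cell grid that follows from the 10 m × 10 m area at 0.25 m resolution used in the experiments, convolutions with stride 1 and no padding, and 2 × 2 max-pooling with stride 2 after each convolution; stride, padding, and pooling window are not stated in the source and are assumptions.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output side length of a convolution or pooling window over a square map."""
    return (size + 2 * pad - kernel) // stride + 1


def feature_shapes(grid_cells: int = 40):
    """(stage, channels, side) after each stage of the assumed two-layer network."""
    shapes = []
    side = grid_cells
    for channels, kernel in ((48, 9), (96, 5)):
        side = conv_out(side, kernel)          # stride 1, no padding (assumed)
        shapes.append(("conv", channels, side))
        side = conv_out(side, 2, stride=2)     # 2x2 max-pool, stride 2 (assumed)
        shapes.append(("pool", channels, side))
    return shapes


def flattened_size(grid_cells: int = 40) -> int:
    """Number of values entering the first fully connected layer."""
    _, channels, side = feature_shapes(grid_cells)[-1]
    return channels * side * side
```

Under these assumptions the maps shrink from 40 × 40 to 32, 16, 12, and finally 6 × 6, so 96 × 6 × 6 = 3,456 values feed the first fully connected layer.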
In one advantageous embodiment, the convolutional neural network comprises three fully connected layers linked to a final Softmax activation function for calculating driving context probabilities. In this way the number of layers is reduced to a necessary minimum, which helps to keep the architecture of the convolutional neural network simple.
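The final stage can be illustrated with a numerically stable Softmax over three logits, one per driving context, followed by selection of the most probable context. The logits would come from the last fully connected layer; the class order and the function names are assumptions of this sketch.

```python
import math

# Assumed output order of the last fully connected layer.
CONTEXTS = ("inner city", "motorway", "parking lot")


def softmax(logits):
    """Softmax with the maximum subtracted first to avoid overflow in exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable context and the probability of every context."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CONTEXTS[best], dict(zip(CONTEXTS, probs))
```

Subtracting the maximum logit does not change the result, since the common factor cancels in the ratio, but it keeps the exponentials in a safe range.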
In one advantageous embodiment, the sensor data are at least one of Sonar data, Lidar data, and Radar data. These types of data are typically available in autonomous or semi-autonomous vehicles. They are well-suited for detecting obstacles and thus for determining an occupancy grid.
In one advantageous embodiment, the driving context is one of inner city, motorway, and parking lot. Highly autonomous driving systems typically deploy different driving strategies when the ego-car is driving on a motorway, driving in the inner city, or when it is trying to park. As such, it is useful if at least these three contexts can be identified.
Advantageously, a driver assistance system comprises an apparatus according to the invention or is configured to perform a method according to the invention for selecting a driving strategy. Such a driver assistance system is favorably used in an autonomous or semi-autonomous vehicle. In this way it is ensured that during autonomous driving optimal driving strategies are selected.
Further features of the present invention will become apparent from the following description and the appended claims in conjunction with the figures.
The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure.
All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a combination of circuit elements that performs that function or software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
The occupancy grid fusion unit 22 and the convolutional neural network 23 may be controlled by a controller 24. A user interface 27 may be provided for enabling a user to modify settings of the occupancy grid fusion unit 22, the convolutional neural network 23, or the controller 24. The occupancy grid fusion unit 22, the convolutional neural network 23, and the controller 24 can be embodied as dedicated hardware units. Of course, they may likewise be fully or partially combined into a single unit or implemented as software running on a processor.
A block diagram of a second embodiment of an apparatus 30 for determining a driving context of a vehicle is illustrated in
The processing device 31 as used herein may include one or more processing units, such as microprocessors, digital signal processors, or a combination thereof.
The local storage unit 25 and the memory device 32 may include volatile and/or non-volatile memory regions and storage devices such as hard disk drives, optical drives, and/or solid-state memories.
In the following, a more detailed description of the present approach towards a deep learning system for driving context determination shall be given with reference to
Occupancy grids are often used for environment perception and navigation, applications which require techniques for data fusion and obstacles avoidance. In the present case, the grids are constructed using the Dempster-Shafer theory, also known as the theory of evidence or the theory of belief functions. A pedagogical example of the Dempster-Shafer approach is illustrated in
The basic idea behind occupancy grids is the division of the environment into 2D cells, where each cell represents the probability, or belief, of occupation. In the present approach, Sonar, Lidar, and Radar sensory data are used to model the uncertainty of obstacle measurements and to derive the occupancy belief. A belief is assigned to every cell that intersects the ray of a range measurement. This information is then accumulated over time and fused into a single grid. The content of the grid layer is degraded over time by gradually decreasing the occupancy information of every grid cell, while the grid content is continuously updated in real time with each sensory measurement.
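A simple way to assign belief to the cells crossed by a range measurement is an inverse sensor model evaluated along the ray, traced here with Bresenham's line algorithm: cells before the return receive "free" evidence and the end cell receives "occupied" evidence. The mass values `m_hit` and `m_pass` are hypothetical, and the source does not specify which sensor model is used for each sensor type; the resulting per-cell masses would then be fused into the grid.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int):
    """Integer grid cells on the segment from (x0, y0) to (x1, y1), inclusive."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return cells
        e2 = 2 * err
        if e2 >= dy:   # step in x
            err += dy
            x0 += sx
        if e2 <= dx:   # step in y
            err += dx
            y0 += sy


def ray_masses(sensor_cell, hit_cell, m_hit: float = 0.7, m_pass: float = 0.4):
    """Hypothetical inverse sensor model for one range return.

    Returns {cell: (m_occ, m_free, m_unknown)} for every cell on the ray.
    """
    path = bresenham(*sensor_cell, *hit_cell)
    masses = {cell: (0.0, m_pass, 1.0 - m_pass) for cell in path[:-1]}
    masses[path[-1]] = (m_hit, 0.0, 1.0 - m_hit)   # cell containing the return
    return masses
```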
The occupancy grid computed with the above-described method is the input to a convolutional neural network, which constructs a grid representation of the driving environment. The grid map is first converted into an image representation, in which each grid cell is coded as an image pixel. White pixels represent obstacles, free space is coded as medium grey, and unknown states are represented in black. The higher the pixel intensity with respect to a specific colour code, the higher the occupancy confidence.
The system architecture has been developed for deployment within a highly autonomous driving software platform. Therefore, small activation maps have been designed in order to achieve real-time performance. The convolutional neural network consists of two convolutional layers with 48 and 96 kernels, respectively. The convolution kernel size has been reduced to 9×9 for the first layer and to 5×5 for the second. Each convolution is followed by a rectified linear unit (ReLU), a normalization layer, and a pooling operation. The network also contains three fully connected layers linked to a final Softmax activation function, which calculates the driving context probabilities.
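To make the per-layer processing concrete, the sketch below applies one convolution ("valid", stride 1), a ReLU, and non-overlapping 2 × 2 max-pooling to a single-channel image in plain Python. It is illustrative only: the normalization layer is omitted, the pooling window is assumed, and a real layer would apply 48 or 96 learned kernels across all input channels.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation with stride 1 (what CNN layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Rectified linear unit: negative activations are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size: int = 2):
    """Non-overlapping max-pooling; incomplete trailing windows are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def conv_block(image, kernel, pool: int = 2):
    """Convolution -> ReLU -> max-pooling (normalization omitted)."""
    return max_pool(relu(conv2d_valid(image, kernel)), pool)
```

For example, running the diagonal-detecting kernel [[1, -1], [-1, 1]] over a 5 × 5 identity image produces 2 on the diagonal of the 4 × 4 convolution output, which survives ReLU and pooling as [[2, 0], [0, 2]].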
To train and validate the described approach, a dataset was created using sensory data recorded from a test car equipped with Sonar, Lidar, and Radar sensors. The test car was driven in various inner city areas, on motorways, and inside parking lots. The recordings were made during daytime and include both crowded and light traffic conditions. Each occupancy grid was computed as a 2D array covering an area of 10 m × 10 m with a resolution of 0.25 m, i.e. 40 × 40 cells. The ego-vehicle is always located at the centre of the occupancy grid.
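The grid geometry can be expressed as a small lookup from ego-centred coordinates to cell indices. This is a sketch: the axis conventions (x to columns, y to rows) and the function name are assumptions. With an even cell count, the ego origin falls on the corner shared by the four central cells.

```python
import math

GRID_SIZE_M = 10.0    # side length of the square grid, in metres
RESOLUTION_M = 0.25   # cell edge length, in metres
N_CELLS = round(GRID_SIZE_M / RESOLUTION_M)   # 40 cells per side


def to_cell(x_m: float, y_m: float):
    """Map a point in ego coordinates (metres, ego at the origin) to (row, col).

    Returns None for points outside the grid. Axis conventions are assumed.
    """
    col = math.floor(x_m / RESOLUTION_M + N_CELLS / 2)
    row = math.floor(y_m / RESOLUTION_M + N_CELLS / 2)
    if 0 <= row < N_CELLS and 0 <= col < N_CELLS:
        return row, col
    return None
```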
The system was trained and validated on 6,000 data samples. The recorded dataset was manually annotated into three classes: inner city, motorway, and parking lot. Of all samples, 80% were used for training, 15% for validation, and 5% for testing.
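For 6,000 samples, the 80/15/5 split gives 4,800 training, 900 validation, and 300 test samples. The sketch below reproduces these counts with a seeded random shuffle; whether the original split was random or stratified by class is not stated, so the shuffling strategy is an assumption.

```python
import random


def split_dataset(samples, fractions=(0.80, 0.15, 0.05), seed: int = 0):
    """Shuffle once (assumed strategy) and cut into train/validation/test."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder, so no sample is lost to rounding
    return train, val, test
```

With only three classes and a smaller parking lot class, a stratified split (shuffling each class separately) would keep the class proportions equal across the three subsets.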
The training of the system was performed using the NVIDIA Deep Learning GPU Training System (DIGITS), which can be used to rapidly train deep neural networks for image classification, segmentation, and object detection tasks. The classification model was trained from scratch using the dataset described above, a learning rate $\alpha$ of 0.0001, and stochastic gradient descent (SGD) as solver. SGD updates the network weights $W$ using a linear combination of the previous weight update $V_t$ and the negative gradient $\nabla L(W)$. The weight of the previous update is called the momentum $\mu$, and the learning rate $\alpha$ is the weight of the negative gradient. The following rules were used to calculate the updated value $V_{t+1}$ and the updated weight $W_{t+1}$ at iteration $t+1$:
$$V_{t+1} = \mu V_t - \alpha \nabla L(W_t) \qquad (1)$$
$$W_{t+1} = W_t + V_{t+1} \qquad (2)$$
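The update rule of Eqs. (1) and (2) can be sketched for a flat list of weights as follows. The learning rate 0.0001 is taken from the text; the momentum value 0.9 is an assumption, since the source does not state it. The toy objective L(w) = ½‖w‖², whose gradient is w itself, is only used to show the rule converging.

```python
def sgd_momentum_step(w, v, grad, lr: float = 1e-4, momentum: float = 0.9):
    """One SGD step with momentum, following Eqs. (1) and (2).

    V_{t+1} = mu * V_t - alpha * grad L(W_t)
    W_{t+1} = W_t + V_{t+1}
    """
    v_next = [momentum * vi - lr * gi for vi, gi in zip(v, grad)]
    w_next = [wi + vi for wi, vi in zip(w, v_next)]
    return w_next, v_next


def minimise_quadratic(w0, steps: int, lr: float, momentum: float):
    """Toy demo on L(w) = 0.5 * ||w||^2, whose gradient is w."""
    w, v = list(w0), [0.0] * len(w0)
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, w, lr, momentum)
    return w
```

Starting from w = 1 with lr = 0.1 and momentum 0.9, the first two steps give w = 0.9 and w = 0.72; the accumulated velocity makes the second step larger than plain gradient descent would.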
The driving context classification accuracy of the system was evaluated, and an overall accuracy of 0.95 was achieved. The classification performance is summarized in the confusion matrix in the following table, which shows slight differences in per-class performance. The inner city class has a higher detection accuracy, since its occupancy grids have a more distinctive structure. Conversely, a lower accuracy was obtained for the parking lot class, mainly due to a smaller number of training samples.
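The overall and per-class figures discussed above can be read off a confusion matrix as sketched below. The sketch assumes that rows index the true class and columns the predicted class, and it interprets per-class detection accuracy as recall; both are assumptions about the table's layout.

```python
def accuracy(cm):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_recall(cm):
    """Fraction of each true class that was classified correctly.

    Assumes rows = true class, columns = predicted class.
    """
    return [cm[i][i] / sum(cm[i]) if sum(cm[i]) else float("nan")
            for i in range(len(cm))]
```

Recall is sensitive to class size only through the training data: a class with few training samples, such as the parking lot class here, tends to show a lower diagonal share even when the overall accuracy is high.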
Apart from its high classification accuracy, another advantage of the system is the speed of the algorithm, which makes it suitable for real-time applications. The algorithm runs on a single occupancy grid sample, without the need to accumulate grid data over time. The architecture is simple: the number of layers is reduced to the necessary minimum while retaining high accuracy. Performance tests have shown that the driving context can be classified in approximately 100 ms on an NVIDIA Quadro K1100M GPU (Graphics Processing Unit) with 384 CUDA (Compute Unified Device Architecture) cores, which by current standards is considered a low-performance GPU.
The obtained classification results can be further used not only to select different autonomous driving strategies, but also to generate testing scenarios for highly autonomous driving. By adding driving context-related information, specific test cases may be generated for testing autonomous driving functionalities.
A couple of visual samples from the collected test drive data, accompanied by computed occupancy grids and activations of the first layer of the convolutional neural network are shown in
Number | Date | Country | Kind |
---|---|---|---|
19465505.6 | Feb 2019 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/052185 | 1/29/2020 | WO | 00 |