The present disclosure relates to building a neural network that is part of a lane segment or lanelet classification system for an autonomous vehicle. The present disclosure is also directed towards training the neural network of the lanelet classification system.
An autonomous driving system for a vehicle is a complex system that includes many different aspects. For example, an autonomous driving system may include multiple sensors to gather perception data with respect to the vehicle's surrounding environment. The autonomous driving system may build a lane graph structure composed of a plurality of lane segments that are homogeneous in driving-related characteristics. Some examples of driving-related characteristics include, but are not limited to, the type of paint markings at an edge of the lane, the number of adjacent lanes, and/or the presence of specific prescriptive markings such as turn arrows. The lane segments, which are also referred to as lanelets, are based on the perception data collected by the sensors as well as map data. A lanelet represents a single interconnectable lane segment and is classified based on one or more lane attributes. The lane attributes represent one or more permitted maneuvers associated with a subject lanelet such as, but not limited to, straight, turn left, turn right, bi-directional, split, parking, straight-plus-left, straight-plus-right, and non-drivable.
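For illustration only, a lanelet record of the kind described above might be sketched as follows; the field names are hypothetical and not part of the disclosure:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of a lanelet record; field names are illustrative.
@dataclass
class Lanelet:
    lanelet_id: int
    # Permitted-maneuver attributes, e.g. "straight", "turn-left"
    lane_attributes: List[str] = field(default_factory=list)
    # Spatial relationships to other lanelets, stored as lanelet IDs
    upstream: List[int] = field(default_factory=list)
    downstream: List[int] = field(default_factory=list)
    left: Optional[int] = None
    right: Optional[int] = None

# A lanelet permitting both straight travel and a right turn:
segment = Lanelet(lanelet_id=7, lane_attributes=["straight-plus-right"], downstream=[8])
```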
A lanelet classification model classifies each lanelet into one or more specific lane attributes based on one or more machine learning techniques. Specifically, before classifying the lanelets, the lanelet classification model is first trained using training data obtained from the perception data. The sensors collect the perception data during real-life driving events. However, labeling the training data is a manual process that is performed by one or more individuals, and may be cumbersome and time consuming. Furthermore, the data samples that are part of the training data may be unbalanced and dominated by only a few specific driving scenarios. For example, a majority of the data samples may be collected during highway driving and may not include many examples of driving in a city or urban area.
Thus, while autonomous driving systems achieve their intended purpose, there is a need in the art for an improved approach to train a lanelet classification model.
According to several aspects, a lanelet classification system for an autonomous vehicle is disclosed and includes one or more controllers including a classifier having a neural network that classifies lanelets of a lane graph structure based on one or more lane attributes. The one or more controllers execute instructions to build the neural network by determining a higher dimension feature for a plurality of local lanelets and a subject lanelet, where a spatial relationship exists between the subject lanelet and the local lanelets. The one or more controllers compute an attention score for each of the local lanelets based on the higher dimension feature, where the attention score indicates a weight value that the subject lanelet has on a particular local lanelet. The one or more controllers determine a normalized shared attention mechanism applicable to all of the local lanelets based on the attention score. The one or more controllers compute a transformed feature vector of the local lanelet based on the higher dimension feature and the normalized shared attention mechanism. Finally, the one or more controllers fuse the transformed feature vector for each of the local lanelets together to determine a single fused feature vector, where the single fused feature vector is input to build a subsequent layer of the neural network.
In another aspect, the spatial relationship indicates an upstream, downstream, left, and right relationship between the subject lanelet and the local lanelets.
In still another aspect, the higher dimension feature is determined based on:

zi(l)=W(l)hi(l)

where zi(l) is the higher dimension feature, hi(l) represents a set of node features that represent a summarization of local attributes of the local lanelets and the subject lanelet, and W(l) represents a weight matrix.
In an aspect, the attention score is computed based on a nonlinear activation function.
In another aspect, the normalized shared attention mechanism is determined based on a normalization function that sums to 1.
In still another aspect, the transformed feature vector is determined based on:

h*i(l)=σ(Σj∈N(i)aij(l)zj(l))

where h*i(l) represents the transformed feature vector, zj(l) is the higher dimension feature, N(i) represents a neighborhood of the particular local lanelet i, aij(l) represents the normalized shared attention mechanism, and σ represents a nonlinear transform function.
In an aspect, the single fused feature vector is determined based on:
where hi(l+1) represents the single fused feature vector, D(l) represents a vector having the same length as the single fused feature vector hi(l+1), and h*i(l) represents the transformed feature vector.
In another aspect, the neural network of the classifier is a graph attention network.
In still another aspect, a lanelet classification system for an autonomous vehicle is disclosed, and includes one or more controllers including a classifier having a neural network that classifies lanelets of a lane graph structure based on one or more lane attributes. The one or more controllers execute instructions to receive simulated data, where the simulated data is a combination of map data and simulated perception data. The one or more controllers combine the simulated data with manual annotations that label the simulated perception data together to create a ground truth data set. The one or more controllers determine training data by mapping labels of one or more groups of labeled ground truth data points that are part of the ground truth data set to one or more groups of perturbed lane edge points that have been displaced from an original group of labeled ground truth data points to another group of labeled ground truth data points that are part of the ground truth data set. The one or more controllers train the neural network to classify each lanelet of the lane graph structure based on the training data.
In an aspect, the one or more controllers execute instructions to determine the neural network of the classifier is completely trained. In response to determining the neural network of the classifier is completely trained, the one or more controllers evaluate perception data generated by a plurality of sensors and the map data.
In another aspect, the one or more controllers execute instructions to identify an amount of overlap between a particular group of perturbed lane edge points and a particular group of labeled ground truth data points by calculating an intersection-over-union evaluation metric.
In still another aspect, the one or more controllers execute instructions to calculate the intersection-over-union evaluation metric based on:

IOU(SEGj,GTi)=|{P∈SEGj}∩{P∈GTi}|/|{P∈SEGj}∪{P∈GTi}|

where IOU is the intersection-over-union evaluation metric, SEGj represents a particular group of perturbed lane edge points, GTi represents a particular group of labeled ground truth data points, and P represents lane edge points expressed as unique identification (ID) numbers.
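Because each lane edge point P is expressed as a unique ID number, the metric above reduces to set intersection over union. A minimal sketch, assuming the point groups are given as collections of ID numbers:

```python
def id_based_iou(seg_points, gt_points):
    """Intersection-over-union of two groups of lane edge points,
    where each point is represented by its unique ID number."""
    seg, gt = set(seg_points), set(gt_points)
    union = seg | gt
    if not union:
        return 0.0
    return len(seg & gt) / len(union)

# A perturbed segment sharing 3 of its 4 IDs with a ground truth group:
print(id_based_iou([101, 102, 103, 104], [102, 103, 104, 105]))  # 0.6
```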
In an aspect, the one or more controllers execute instructions to introduce noise to the simulated data to create noisy lanelet training samples.
In another aspect, the noise is modeled based on a variance profile of an error in a lane edge of the lane graph structure and a covariance.
In still another aspect, the error in the lane edge is modeled as a Gaussian Process, and wherein a kernel function models a spatial correlation of the error between two lane edge points.
In an aspect, a Matern 3/2 kernel function determines the variance profile.
In another aspect, the covariance is determined based on:

Σ=SCS

where Σ is the covariance, C is a correlation matrix, and S is a diagonal scale matrix.
In another aspect, the one or more controllers execute instructions to introduce occlusion features to the simulated data, wherein the occlusion features represent an occluded region of a roadway that the autonomous vehicle is traveling along.
In an aspect, a lanelet classification system for an autonomous vehicle is disclosed, and includes a plurality of sensors collecting perception data indicative of an environment surrounding the autonomous vehicle and one or more controllers in electronic communication with the plurality of sensors. The one or more controllers include a classifier having a neural network that classifies lanelets of a lane graph structure based on one or more lane attributes. The one or more controllers execute instructions to receive simulated data, wherein the simulated data is a combination of map data and simulated perception data. The one or more controllers combine the simulated data with manual annotations that label the simulated perception data together to create a ground truth data set. The one or more controllers introduce noise and occlusion features to the simulated data to create noisy lanelet training samples. The one or more controllers determine training data by mapping labels of one or more groups of labeled ground truth data points that are part of the ground truth data set to one or more groups of perturbed lane edge points of the noisy lanelet training samples that have been displaced from an original group of labeled ground truth data points to another group of labeled ground truth data points that are part of the ground truth data set. The one or more controllers train the neural network to classify each lanelet of the lane graph structure based on the training data and the noisy lanelet training samples. The one or more controllers determine the neural network of the classifier is completely trained. Finally, in response to determining the neural network of the classifier is completely trained, the one or more controllers evaluate perception data generated by the plurality of sensors and the map data.
In another aspect, the one or more controllers execute instructions to identify an amount of overlap between a particular group of perturbed lane edge points and a particular group of labeled ground truth data points by calculating an intersection-over-union evaluation metric.
Further areas of applicability will become apparent from the description provided herein. It should be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses.
Referring to
The lanelet classification system 10 includes one or more controllers 20 in electronic communication with a plurality of sensors 22 configured to collect perception data 24 indicative of an environment surrounding the autonomous vehicle 12. In the non-limiting embodiment as shown in
As explained below, the classifier 68 first builds a neural network 120 that classifies lanelets 92 that are part of a lane graph structure 90 (shown in
The annotation block 56 of the ground truth generation module 42 receives the simulated data 54 from the simulator 50 in combination with manual annotations 59 that label the simulated perception data 48, where the manual annotations 59 represent labels that are created by an individual. In other words, a person such as a subject matter expert creates the manual annotations 59. The annotation block 56 combines the simulated data 54 with the manual annotations 59 to create a labeled training data set, which is referred to as the ground truth data set 60. The ground truth data set 60 is transmitted to the ground truth mapper 58, which is shown in
Referring to
The identifier-based intersection-over-union block 84 receives the one or more groups of labeled ground truth data points GTi that have been extracted from the ground truth data set 60. The identifier-based intersection-over-union block 84 identifies an amount of overlap between a particular group of perturbed lane edge points SEGj and a particular group of labeled ground truth data points GTi when calculating an intersection-over-union evaluation metric. The intersection-over-union evaluation metric is expressed in Equation 1:
where P represents the lane edge points 94 that are expressed as the unique ID numbers and IOU is the intersection-over-union evaluation metric that describes an extent of overlap between a particular group of perturbed lane edge points SEGj and a particular group of labeled ground truth data points GTi.
The identifier-based intersection-over-union block 84 transmits the amount of overlap between the particular group of perturbed lane edge points SEGj and the particular group of labeled ground truth data points GTi for each lanelet 92 of the lane graph structure 90 (
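One plausible way the ground truth mapper 58 could use these overlap scores is to let each group of perturbed lane edge points inherit the label of the ground truth group it overlaps most. The sketch below is illustrative only; the function name, threshold, and data layout are assumptions, not taken from the disclosure:

```python
def map_labels(perturbed_groups, gt_groups, iou_threshold=0.5):
    """Assign each group of perturbed lane edge points the label of the
    ground-truth group it overlaps most, measured by ID-based IOU.
    Groups below the threshold are left unlabeled (None)."""
    def iou(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    labels = {}
    for seg_id, seg_points in perturbed_groups.items():
        best_label, best_iou = None, iou_threshold
        for gt_label, gt_points in gt_groups.items():
            score = iou(seg_points, gt_points)
            if score >= best_iou:
                best_label, best_iou = gt_label, score
        labels[seg_id] = best_label
    return labels

gt = {"straight": [1, 2, 3, 4], "turn-left": [5, 6, 7, 8]}
perturbed = {"seg-a": [2, 3, 4, 9], "seg-b": [6, 7, 8]}
print(map_labels(perturbed, gt))  # {'seg-a': 'straight', 'seg-b': 'turn-left'}
```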
Referring back to
In one embodiment, the noise 72 is introduced by a noise vector injected into each training iteration of simulated data 54 from the simulator 50. In one non-limiting embodiment, the noise 72 is implemented as jitter. Referring to
The noise 72 is modeled based on a variance profile var(d) of an error in the lane edge 96 of the lane graph structure 90 (
where ρ, a0, and a1 represent constants. The variance profile var(d) expresses the variance of the error in the lane edge 96 created by the noise 72 as a function of distance.
An error ei at the lane edge point coordinate pi is expressed in Equation 6 as:
where wi represents the zero mean Gaussian noise and is expressed in Equation 7 as:
The covariance Σ is determined based on a correlation matrix C and a diagonal scale matrix S, and is expressed in Equation 8 as:

Σ=SCS

where the correlation matrix C is expressed in Equation 9 and the diagonal scale matrix S is represented in Equation 10.
where pi, pj represent the ith and jth lane edge point coordinates and ∥pi∥ represents the distance to the point pi.
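The noise model above can be sketched as follows. This is illustrative only: the Matern 3/2 kernel supplies the spatial correlation between lane edge points, while the simple distance-dependent variance profile (a0 + a1·∥pi∥) and all constants are assumptions standing in for the profile described in the disclosure:

```python
import numpy as np

def matern32(d, rho=20.0):
    """Matern 3/2 correlation as a function of separation distance d."""
    s = np.sqrt(3.0) * d / rho
    return (1.0 + s) * np.exp(-s)

def lane_edge_noise(points, a0=0.01, a1=1e-4, rho=20.0, rng=None):
    """Sample zero-mean, spatially correlated noise for lane edge points.

    points: (n, 2) array of lane edge point coordinates p_i.
    The correlation matrix C comes from the Matern 3/2 kernel evaluated on
    pairwise distances, the diagonal scale matrix S holds the standard
    deviations from an assumed variance profile var(||p_i||) = a0 + a1*||p_i||,
    and the covariance is Sigma = S @ C @ S (Equation 8).
    """
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)          # ||p_i - p_j||
    C = matern32(dists, rho)                       # correlation matrix (Eq. 9)
    S = np.diag(np.sqrt(a0 + a1 * np.linalg.norm(points, axis=1)))  # Eq. 10
    Sigma = S @ C @ S                              # covariance (Eq. 8)
    return rng.multivariate_normal(np.zeros(len(points)), Sigma)

pts = np.stack([np.linspace(0, 50, 26), np.zeros(26)], axis=1)
w = lane_edge_noise(pts, rng=0)  # correlated perturbations, one per point
```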
Referring to
As seen in
Referring to
Referring to both
The classifier 68 then computes an attention score eij(l) for each of the local lanelets 92B based on the higher dimension feature zi(l), where the attention score eij(l) indicates a weight value that the subject lanelet 92A has on a particular local lanelet 92B. The attention score eij(l) is computed based on a nonlinear activation function. In one embodiment, the nonlinear activation function is a Leaky Rectified Linear Unit function, and the attention score eij(l) is expressed in Equation 12 as:

eij(l)=LeakyReLU(a(l)T[zi(l)∥zj(l)])
where zi(l), zj(l) represent the higher dimension features for the subject lanelet 92A and the particular local lanelet 92B, respectively, and a(l) represents a shared learnable attention vector.
The classifier 68 then determines a normalized shared attention mechanism aij(l) that is applicable to all of the local lanelets 92B that are part of a specific orientation group based on the attention score eij(l) by a normalization function. In one embodiment, the normalization function is a softmax function that turns a vector of k real values into a vector of k real values that sum to 1. The normalized shared attention mechanism aij(l) is determined based on Equation 13 as:

aij(l)=exp(eij(l))/Σk∈N(i)exp(eik(l))

where N(i) represents a neighborhood of the particular local lanelet i of the specific orientation group.
The classifier 68 then computes a transformed feature vector h*i(l) of the subject lanelet 92A based on the higher dimension feature zi(l) corresponding to the subject lanelet 92A and the particular local lanelet 92B that are part of the specific orientation group and the normalized shared attention mechanism aij(l). The transformed feature vector h*i(l) is expressed in Equation 14 as:

h*i(l)=σ(Σj∈N(i)aij(l)zj(l))

where σ represents a nonlinear transform function.
The classifier 68 then fuses the transformed feature vector h*i(l) for the subject lanelet 92A and each local lanelet 92B together to determine a single fused feature vector hi(l+1), where the single fused feature vector hi(l+1) is input to build a subsequent layer of the neural network 120 (
where D(l) represents a vector of trainable parameters having the same length as the single fused feature vector hi(l+1).
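Equations 11 through 15 can be sketched together as one graph-attention layer over the lanelet graph. The NumPy sketch below is illustrative only: the nonlinear transform σ is assumed to be tanh, and the fusion of Equation 15 is assumed to add the trainable vector D(l) to the aggregated feature, since the disclosure states only that D(l) matches the output length:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gat_layer(H, neighbors, W, a, D):
    """One graph-attention layer over lanelet node features.

    H: (n, f) node features h_i^(l); neighbors[i]: indices of lanelets
    with a spatial relationship to lanelet i (including i itself).
    W: (f', f) weight matrix; a: (2*f',) attention vector; D: (f',) vector.
    """
    Z = H @ W.T                                    # Eq. 11: z_i = W h_i
    n, fp = Z.shape
    H_out = np.zeros((n, fp))
    for i in range(n):
        nbrs = neighbors[i]
        # Eq. 12: attention score per neighbor via LeakyReLU over [z_i || z_j]
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]])) for j in nbrs])
        alpha = softmax(e)                         # Eq. 13: normalize over N(i)
        agg = np.tanh((alpha[:, None] * Z[nbrs]).sum(axis=0))  # Eq. 14
        H_out[i] = agg + D                         # Eq. 15 (fusion; assumed additive)
    return H_out

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                        # 4 lanelets, 3 raw features
neighbors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
out = gat_layer(H, neighbors, rng.normal(size=(5, 3)), rng.normal(size=10), rng.normal(size=5))
```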
Referring back to
In block 204, the annotation block 56 of the ground truth generation module 42 combines the simulated data 54 with the manual annotations 59 to create a labeled training data set, which is referred to as the ground truth data set 60 (
In block 206, the occlusion and noise block 64 introduces the noise 72 to the simulated data 54 received from the simulator 50. As mentioned above, the noise 72 is modeled to represent variation and error that are observed with real-world map and perception data. Additionally, or in the alternative, the occlusion and noise block 64 of the training module 44 determines the occlusion features 74 based on a mask 110 (
In block 208, the segmentation block 66 receives the simulated data 54 that has been injected with the noise 72 and occlusion features 74 and identifies the lanelets 92 of the lane graph structure 90 (
In block 210, the ground truth mapper 58 determines the training data 52 by mapping the one or more groups of labeled ground truth data points GTi to the one or more groups of perturbed lane edge points SEGj of the noisy lanelet training samples 62 that have been displaced from the original group of labeled ground truth data points GTi to another group of labeled ground truth data points GTi that are part of the ground truth data set 60. As mentioned above, the noisy lanelet training samples 62 are displaced by the noise 72 and the occlusion features 74 introduced into the simulated data 54 at the occlusion and noise block 64. The method 200 may then proceed to block 212.
In block 212, the classifier 68 trains the neural network 120 to classify each lanelet 92 of the lane graph structure 90 (
In decision block 214, in response to determining the neural network 120 of the classifier is completely trained, method 200 may proceed to block 216. Otherwise, the method 200 may proceed back to block 212.
In block 216, in response to determining the neural network 120 of the classifier 68 is trained, the classifier 68 evaluates the perception data 24 generated by the plurality of sensors 22 (
Referring generally to the figures, the disclosed lanelet classification system provides various technical effects and benefits. Specifically, the lanelet classification system is trained based on a limited number of annotated data samples combined with a virtually unlimited number of simulation examples. This reduces the number of real-world data samples that must be collected and manually labeled, which may otherwise become cumbersome and time consuming for an individual to complete. The disclosed lanelet classification system also utilizes noise and occlusion features, which mimic real-world variations within the map and perception data. The disclosure also provides an approach for building a neural network that classifies the lanelets.
The controllers may refer to, or be part of an electronic circuit, a combinational logic circuit, a field programmable gate array (FPGA), a processor (shared, dedicated, or group) that executes code, or a combination of some or all of the above, such as in a system-on-chip. Additionally, the controllers may be microprocessor-based such as a computer having at least one processor, memory (RAM and/or ROM), and associated input and output buses. The processor may operate under the control of an operating system that resides in memory. The operating system may manage computer resources so that computer program code embodied as one or more computer software applications, such as an application residing in memory, may have instructions executed by the processor. In an alternative embodiment, the processor may execute the application directly, in which case the operating system may be omitted.
The description of the present disclosure is merely exemplary in nature and variations that do not depart from the gist of the present disclosure are intended to be within the scope of the present disclosure. Such variations are not to be regarded as a departure from the spirit and scope of the present disclosure.