The present invention relates to the prediction of the movement of a vehicle on the basis of observations of the surroundings of this vehicle.
Vehicles that operate at least partially automatically in road traffic will neither suddenly replace human-driven vehicles nor be isolated from human-driven traffic on separate routes. Rather, these vehicles will have to move safely in mixed traffic with human-controlled foreign objects, which also include pedestrians and cyclists as more vulnerable road users. With human-controlled foreign objects, there is always uncertainty as to which movement action they will perform next. A control system for at least partially automated driving therefore depends on deducing, at least in part, the future behavior of foreign objects from observation of their previous behavior.
DE 10 2018 210 280 A1 describes a method for predicting the trajectories of foreign objects in the surroundings of an ego vehicle. This prediction is based on a determination of the near target to which the movement of each of the foreign objects leads and the basic rules according to which this movement takes place.
In the context of the invention, a method was developed for predicting the movement of at least one traffic-relevant object on the basis of observations of the surroundings of this object.
A traffic-relevant object is any moving object whose movement may make it necessary for at least one other road user to change their behavior. In particular, these can be objects whose trajectory could intersect a planned or currently tracked trajectory of the other road user, such as an ego vehicle. This can, for example, cause the other road user to change their own trajectory in order to avoid a collision with the object.
Traffic-relevant objects can be, for example, motor vehicles or non-motorized vehicles such as bicycles or soapbox carts. Pedestrians and animals are also traffic-relevant objects.
However, an object does not necessarily have to be human, animal or automatically controlled in order to be relevant to traffic. For example, a dustbin blown by the wind can also be a traffic-relevant object.
As part of the method, an observation oτ0 of the object's surroundings at a time τ0 is mapped by a trained encoder network to a representation zτ0 with reduced dimensionality. This time τ0 can also represent a time interval and, for example, be a reference point of this time interval, such as its beginning, middle or end. The observation can be recorded, for example, with a sensor carried by the object, but also with another sensor in whose detection range the object is located. It is only important that a movement of the object affects the future observations.
Observations o can comprise, for example, images of the object's surroundings. These images can comprise, for example, camera images, video images, radar images, lidar images, thermal images and/or ultrasound images.
By means of an action aτ0 performed by the object at the time τ0 and the representation zτ0, using at least one trained prediction network, a prediction {circumflex over (z)}τ1 of the representation zτ1 to which an observation oτ1 of the surroundings at a later time τ1 is mapped by the encoder network, and/or a prediction âτ1 of the action aτ1 performed by the object at this later time τ1, is determined.
The prediction {circumflex over (z)}τ1 and/or âτ1 is used to determine a prediction {circumflex over (d)}τ1 for the dynamic state dτ1 of the object at the time τ1. In particular, this dynamic state dτ1 can comprise, for example, the position xτ1 of the object at time τ1, and/or the speed vτ1 of the object at time τ1, and/or the orientation θτ1 of the object at time τ1. Accordingly, the prediction {circumflex over (d)}τ1 can, in particular, comprise predictions {circumflex over (x)}τ1, {circumflex over (v)}τ1 and/or {circumflex over (θ)}τ1, for example. Alternatively or in combination with this, the dynamic state dτ1 can also comprise the longitudinal and lateral velocity and acceleration, or the path curvature. All these variables can be relevant for downstream components and can therefore be derived from the current state dτ0 and the action aτ1 (or the prediction âτ1) using a model. This information can be used, for example, to update a forecast for the trajectory of the object.
It was recognized that determining a prediction based on a representation zτ0 improves the accuracy of the prediction and also makes this prediction robust against disturbances in the observations, such as noise. For example, if the encoder network is trained in tandem with a decoder network so that an observation o processed by the encoder network into a representation z is reconstructed as well as possible after processing this representation z by the decoder network, the information from the observation o is squeezed through the “bottleneck” of the significantly lower dimensionality of the representation z. The encoder network is therefore forced to make a selection as to which information from the observation o is particularly important in relation to the respective application. For example, noise is not part of the information that is absolutely necessary for the reconstruction of the original observation o and is therefore suppressed in the representation.
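As an illustration of such an encoder-decoder arrangement, the following is a minimal sketch in PyTorch. All layer types, layer sizes and the representation dimensionality are assumptions chosen for the example, not specifications from this description; the only essential property is the low-dimensional "bottleneck".

```python
# Minimal encoder/decoder sketch (illustrative; all sizes are assumptions).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an observation o (here assumed to be a 1x64x64 image) to a low-dimensional z."""
    def __init__(self, z_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, z_dim),  # the "bottleneck"
        )

    def forward(self, o: torch.Tensor) -> torch.Tensor:
        return self.net(o)

class Decoder(nn.Module):
    """Reconstructs an observation from the representation z."""
    def __init__(self, z_dim: int = 32):
        super().__init__()
        self.fc = nn.Linear(z_dim, 32 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self.fc(z).view(-1, 32, 16, 16)
        return self.net(h)
```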
Furthermore, dividing the overall task into determining the predictions {circumflex over (z)}τ1 and/or âτ1 on the one hand, and the further processing into the predictions {circumflex over (x)}τ1, {circumflex over (v)}τ1 or {circumflex over (θ)}τ1 for the position xτ1, the speed vτ1 or the orientation θτ1 of the object at the time τ1 on the other hand, makes it possible to outsource those aspects of the object's kinematics that can be explained by a model to this further processing. When training the encoder network and the prediction network(s), only those aspects of the kinematics that cannot already be explained by other means then play a role. In this way, the accuracy of the predictions {circumflex over (x)}τ1, {circumflex over (v)}τ1 or {circumflex over (θ)}τ1 obtained overall is improved, and their determination is also made more robust.
For example, in a particularly advantageous embodiment, the prediction {circumflex over (d)}τ1 for the dynamic state dτ1 of the object, such as the prediction {circumflex over (x)}τ1, and/or the prediction {circumflex over (v)}τ1, and/or the prediction {circumflex over (θ)}τ1, can be determined from the prediction {circumflex over (z)}τ1 and/or âτ1 using a predetermined kinematic model of the object. In this way, any existing prior knowledge about the respective object can be used. For example, vehicles have certain minimum turning circles that limit how quickly they can change direction.
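As an illustration, the following is a minimal sketch of such a predetermined kinematic model, assuming a simple unicycle model and a predicted action consisting of a longitudinal acceleration and a yaw rate; both assumptions are made for the example only.

```python
import math

def kinematic_step(x, y, v, theta, a_hat, dt=0.1):
    """One unicycle-model update (a sketch; the actual model is application-specific).

    x, y  : position of the object at time tau0
    v     : speed at tau0
    theta : orientation at tau0
    a_hat : predicted action (longitudinal acceleration, yaw rate) -- an assumption
    Returns the predicted dynamic state d_hat at tau1 = tau0 + dt.
    """
    accel, yaw_rate = a_hat
    v_new = v + accel * dt
    theta_new = theta + yaw_rate * dt
    x_new = x + v_new * math.cos(theta_new) * dt
    y_new = y + v_new * math.sin(theta_new) * dt
    return x_new, y_new, v_new, theta_new
```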
The situation is somewhat analogous to the two ways in which a 3D model of a given object can be obtained for production with a 3D printer. The first way is to obtain a design description of the object by measuring certain shapes on the object and to manufacture the object on the basis of this design description. This promises good manufacturing accuracy and requires comparatively little computing time. The second way is to photograph the geometry of the object, for which no design description is available, and to reconstruct this geometry using photogrammetry. This requires a lot of computing time and promises less precise production than manufacturing according to a design description, but it works universally with any object. For the production task as a whole, it is therefore advantageous to produce those parts of the object for which a design description exists using this design description, and only to supplement the missing parts with photogrammetry.
In a further advantageous embodiment, a first prediction network is used to determine the prediction {circumflex over (z)}τ1, and a second prediction network is used to determine the prediction âτ1 from the prediction {circumflex over (z)}τ1. The development from time τ0 to time τ1 then takes place entirely in the space of lower-dimensional representations z. Only then is the result translated into an action a. In this way, each of the prediction networks can specialize in its task, which further improves the accuracy of the prediction âτ1 ultimately obtained.
In the interest of further improving accuracy, the second prediction network can optionally also use the action aτ0 and/or the representation zτ0 to determine the prediction âτ1.
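A minimal sketch of such a pair of prediction networks, again in PyTorch with assumed dimensionalities and layer sizes, could look as follows; the `use_context` switch corresponds to the optional additional use of aτ0 and zτ0 by the second prediction network.

```python
import torch
import torch.nn as nn

Z_DIM, A_DIM = 32, 2  # assumed dimensionalities of the representation z and the action a

class LatentPredictor(nn.Module):
    """First prediction network: (z_tau0, a_tau0) -> z_hat_tau1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM + A_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, Z_DIM))

    def forward(self, z0: torch.Tensor, a0: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z0, a0], dim=-1))

class ActionPredictor(nn.Module):
    """Second prediction network: z_hat_tau1 (optionally plus z_tau0, a_tau0) -> a_hat_tau1."""
    def __init__(self, use_context: bool = True):
        super().__init__()
        self.use_context = use_context
        in_dim = Z_DIM + (Z_DIM + A_DIM if use_context else 0)
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, A_DIM))

    def forward(self, z1_hat, z0=None, a0=None):
        parts = [z1_hat] + ([z0, a0] if self.use_context else [])
        return self.net(torch.cat(parts, dim=-1))
```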
In a further advantageous embodiment
As explained above, in at least partially automated driving, the predictions {circumflex over (d)}τ1 for the dynamic state dτ1, {circumflex over (x)}τ1 for the position xτ1, {circumflex over (v)}τ1 for the speed vτ1 and/or {circumflex over (θ)}τ1 for the orientation θτ1 of the object at the time τ1 can be used in particular to check whether the trajectory of the object possibly intersects the trajectory of a vehicle to be guided.
Therefore, in a particularly advantageous embodiment, a control signal for a vehicle is determined from at least one prediction {circumflex over (d)}τ1, and/or {circumflex over (x)}τ1, and/or {circumflex over (v)}τ1, and/or {circumflex over (θ)}τ1. This vehicle is controlled with the control signal. In this way, in situations in which a collision between the vehicle and the object is imminent, this collision can be avoided with a higher probability, for example by braking the vehicle and/or diverting it onto an evasive course. At the same time, in situations where there is no objective threat of a collision, there is a lower probability of an evasive or braking maneuver being carried out. Such unprovoked maneuvers could greatly irritate the occupants of an automated vehicle, for example, and would also come as a complete surprise to a human driver of a following vehicle. This driver could therefore possibly react too late and rear-end the vehicle.
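Purely as an illustration, a control signal could be derived from the predicted object positions with a simple distance check against the planned ego trajectory; the safety radius and the string encoding of the signal are assumptions for this example.

```python
def control_signal(ego_trajectory, object_trajectory, safety_radius=2.0):
    """Return a braking command if any predicted object position comes closer
    to the planned ego trajectory than safety_radius (illustrative sketch)."""
    for (ex, ey), (ox, oy) in zip(ego_trajectory, object_trajectory):
        if ((ex - ox) ** 2 + (ey - oy) ** 2) ** 0.5 < safety_radius:
            return "BRAKE"
    return "KEEP_COURSE"
```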
The invention also provides a method for training an arrangement comprising an encoder network and one or more prediction networks for use in the method described above.
As part of this method, an encoder network and one or more prediction networks are provided. Furthermore, a time series of observations o of the surroundings of the object whose movement is to be predicted is provided. The observations o are mapped to representations z using the trained encoder network.
Using at least one representation zτ0, which relates to an observation oτ0 of the surroundings of the object at a time τ0, predictions {circumflex over (z)}τ1 and âτ1 are determined using the previously described method according to one of claims 1 through 5. Parameters that characterize the behavior of the prediction network(s) are optimized with the aim that the prediction {circumflex over (z)}τ1 matches the representation zτ1 to which the observation oτ1 at the time τ1 is actually mapped as closely as possible, and/or that the prediction âτ1 matches the action aτ1 actually performed by the object at the time τ1 as closely as possible.
In this way, the prediction network(s) can be trained in a "self-supervised" manner. This means that the training only has to draw on information that results from observing the object itself. It is not necessary to "label" these observations with target predictions.
Both optimization targets can, for example, each contribute a term to a cost function L (“loss function”) for the optimization:
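The specific form of this cost function is not fixed by the method; a minimal sketch, assuming squared-norm penalty terms with freely selectable weighting factors α and β, could read:

```latex
L = \alpha \,\bigl\|\hat{z}_{\tau_1} - z_{\tau_1}\bigr\|^{2}
  + \beta  \,\bigl\|\hat{a}_{\tau_1} - a_{\tau_1}\bigr\|^{2}
```

Here the first term rewards the prediction {circumflex over (z)}τ1 matching the representation zτ1 actually obtained from the observation oτ1, and the second term rewards the prediction âτ1 matching the action aτ1 actually performed.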
However, the action aτ1 performed by the object at the time τ1 does not necessarily have to be derived from the observation of the object itself. If a target action aτ1* is available that the object should perform at the time τ1, this can be used instead of aτ1.
The aspect of self-supervision can be further strengthened in another advantageous embodiment. In this embodiment, a reconstruction âτ0 of the action aτ0 performed by the object at the earlier time τ0 is additionally determined from the representations zτ1 and zτ0. The parameters that characterize the behavior of the prediction network(s) are then additionally optimized with the aim that the reconstruction âτ0 matches the action aτ0 as closely as possible. For example, the above loss function can be extended for this purpose:
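A minimal sketch of such an extension, again assuming a squared-norm penalty with a freely selectable weight γ, could read:

```latex
L = \alpha \,\bigl\|\hat{z}_{\tau_1} - z_{\tau_1}\bigr\|^{2}
  + \beta  \,\bigl\|\hat{a}_{\tau_1} - a_{\tau_1}\bigr\|^{2}
  + \gamma \,\bigl\|\hat{a}_{\tau_0} - a_{\tau_0}\bigr\|^{2}
```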
In particular, the reconstruction âτ0 can be determined using a trainable reconstruction network, for example. The parameters that characterize this reconstruction network can also be optimized with the aim that the reconstruction âτ0 matches the action aτ0 as closely as possible. The reconstruction âτ0 is, so to speak, a "prediction" of the past on the basis of the observations o accumulated up to the present.
If target predictions ("ground truth") dτ1*, xτ1*, vτ1* or θτ1* associated with the predictions {circumflex over (d)}τ1, {circumflex over (x)}τ1, {circumflex over (v)}τ1 or {circumflex over (θ)}τ1 are available, a deviation ∥{circumflex over (d)}τ1−dτ1*∥, ∥{circumflex over (x)}τ1−xτ1*∥, ∥{circumflex over (v)}τ1−vτ1*∥ or ∥{circumflex over (θ)}τ1−θτ1*∥ can also be included in the loss function L. This means that the parameters that characterize the behavior of the prediction network(s) are additionally optimized with the aim of minimizing the respective deviations from the target predictions dτ1*, xτ1*, vτ1* or θτ1*. These target predictions can be obtained by measurement, for example. The deviation can be determined using the Huber distance, for example.
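For reference, the Huber distance penalizes a deviation r quadratically as long as it is small and only linearly once it exceeds a threshold δ, which makes the optimization robust against outliers:

```latex
H_{\delta}(r) =
\begin{cases}
  \tfrac{1}{2}\,r^{2}, & |r| \le \delta,\\
  \delta\bigl(|r| - \tfrac{1}{2}\,\delta\bigr), & |r| > \delta,
\end{cases}
```

where r stands for a componentwise deviation such as {circumflex over (x)}τ1−xτ1*.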
The encoder network can be obtained in a fully trained state. However, it can also be trained or re-trained as part of the training procedure. In particular, this training can be aimed at the same goals as the training of the prediction networks. For example, the encoder network can be trained together with the prediction networks. Thus, in a particularly advantageous embodiment, parameters that characterize the behavior of the encoder network are also optimized together with the parameters that characterize the behavior of the prediction network.
However, the encoder network can also be trained in an encoder-decoder arrangement with a decoder network, for example. For this purpose, in a further advantageous embodiment, an encoder network to be trained and a decoder network to be trained are provided, wherein the decoder network is designed to map representations z to observations o. Training observations o# are processed into representations z# with the encoder network.
Observations o## are reconstructed from the representations z# using the decoder network. Parameters that characterize the behavior of the encoder network and the decoder network are optimized with the aim that the reconstructed observations o## match the training observations o# as closely as possible.
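A minimal sketch of this encoder-decoder training, reusing the Encoder and Decoder classes sketched earlier and assuming a data loader `observation_loader` that yields batches of training observations o#:

```python
import torch

encoder, decoder = Encoder(), Decoder()  # classes from the sketch above
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
mse = torch.nn.MSELoss()

for o_train in observation_loader:  # assumed loader of training observations o#
    z = encoder(o_train)            # o#  -> representation z#
    o_rec = decoder(z)              # z#  -> reconstructed observation o##
    loss = mse(o_rec, o_train)      # make o## match o# as closely as possible
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```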
The method can in particular be fully or partly computer-implemented. The invention therefore also relates to a computer program comprising machine-readable instructions which, when executed on one or more computers, cause the computer or computers to perform one of the methods described above. In this sense, control units for vehicles and embedded systems for technical devices that are likewise capable of executing machine-readable instructions are also to be regarded as computers.
The invention furthermore also relates to a machine-readable data carrier and/or to a download product comprising said computer program. A download product is a digital product that can be transmitted via a data network, i.e. can be downloaded by a user of the data network, and can be offered for sale in an online shop for immediate download, for example.
A computer can moreover be equipped with the computer program, with the machine-readable data carrier or with the download product.
Further measures improving the invention are shown in more detail below, together with the description of the preferred exemplary embodiments of the invention, with reference to the figures.
The figures show:
In step 110, an observation oτ0 of the surroundings 2 of the object 1 at a time τ0 is mapped by a trained encoder network 3 to a representation zτ0 with reduced dimensionality.
In step 120, based on an action aτ0 performed by the object 1 at the time τ0 and the representation zτ0, at least one trained prediction network 4, 5 is used to determine a prediction {circumflex over (z)}τ1 of the representation zτ1 to which an observation oτ1 of the surroundings 2 at a later time τ1 is mapped, and/or a prediction âτ1 of the action aτ1 performed by the object 1 at this later time τ1.
Here, the prediction {circumflex over (z)}τ1 can be determined according to block 121 with a first prediction network 4, and the prediction âτ1 can be determined from the prediction {circumflex over (z)}τ1 according to block 122 with a second prediction network 5. According to block 122a, the second prediction network 5 may additionally utilize the action aτ0 and/or the representation zτ0 to determine the prediction âτ1.
In step 130, the prediction {circumflex over (z)}τ1 and/or âτ1 is used to determine a prediction {circumflex over (d)}τ1 for the dynamic state dτ1 of the object 1 at the time τ1. In particular, this prediction {circumflex over (d)}τ1 can comprise, for example, a prediction {circumflex over (x)}τ1 for the position xτ1 of the object 1 at time τ1, and/or a prediction {circumflex over (v)}τ1 for the speed vτ1 of the object 1 at time τ1, and/or a prediction {circumflex over (θ)}τ1 for the orientation θτ1 of the object 1 at time τ1. According to block 131, these predictions can be determined from the prediction {circumflex over (z)}τ1 and/or âτ1 using a predetermined kinematic model of the object 1.
In step 140,
In step 150, a control signal 150a for a vehicle 50 is determined from at least one prediction {circumflex over (d)}τ1, {circumflex over (x)}τ1, {circumflex over (v)}τ1 and/or {circumflex over (θ)}τ1. In step 160, this vehicle 50 is controlled with the control signal 150a.
In step 210, an encoder network 3 and one or more prediction networks 4, 5 to be trained are provided.
Within box 210, an example of how the encoder network 3 can be obtained in a trained or at least pre-trained state is shown in detail.
According to block 211, an encoder network 3 to be trained and a decoder network 7 to be trained can be provided. The decoder network 7 is designed to map representations z to observations o.
According to block 212, training observations o# can be processed into representations z# with the encoder network 3. Observations o## can then be reconstructed from these representations z# with the decoder network 7 according to block 213.
According to block 214, parameters 3a, 7a, which characterize the behavior of the encoder network 3 and the decoder network 7, can be optimized with the aim that the reconstructed observations o## match the training observations o# as well as possible.
In step 220, a time series of observations o of the surroundings 2 of the object 1 is provided.
In step 230, the observations o are mapped to representations z using the trained encoder network 3.
In step 240, predictions {circumflex over (z)}τ1 and âτ1 are determined based on at least one representation zτ0, which refers to an observation oτ0 of the surroundings 2 of the object 1 at a time τ0, using the method 100 described above.
In step 250, parameters 4a, 5a that characterize the behavior of the prediction network(s) 4, 5 are optimized with the aim that the prediction {circumflex over (z)}τ1 matches the representation zτ1 to which the observation oτ1 at the time τ1 is actually mapped as closely as possible, and/or that the prediction âτ1 matches the action aτ1 actually performed by the object 1 at the time τ1 as closely as possible.
The fully trained state of the parameters 4a, 5a is indicated by the reference signs 4a*, 5a*.
According to block 241, a reconstruction âτ0 of the action aτ0 performed by the object 1 at the earlier time τ0 can be determined from the representations zτ1 and zτ0. The parameters 4a, 5a, which characterize the behavior of the prediction network(s) 4, 5, can then be additionally optimized according to block 251 with the aim that the reconstruction âτ0 matches the action aτ0 as closely as possible.
In particular, the reconstruction âτ0 can be determined with a trainable reconstruction network 6, for example according to block 241a. The parameters that characterize the behavior of the reconstruction network 6 can then also be optimized according to block 241b with the aim that the reconstruction âτ0 matches the action aτ0 as closely as possible.
According to block 252, the parameters characterizing the behavior of the prediction network(s) 4, 5 may furthermore additionally be optimized with the aim that deviations ∥{circumflex over (d)}τ1−dτ1*∥, ∥{circumflex over (x)}τ1−xτ1*∥, ∥{circumflex over (v)}τ1−vτ1*∥ or ∥{circumflex over (θ)}τ1−θτ1*∥ of the predictions {circumflex over (d)}τ1, {circumflex over (x)}τ1, {circumflex over (v)}τ1 or {circumflex over (θ)}τ1 from associated target predictions dτ1*, xτ1*, vτ1* or θτ1* are minimized. The target predictions can be obtained, for example, from measurements of the variables dτ1, xτ1, vτ1 or θτ1.
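To pull the training targets together, the following is a minimal sketch of one combined optimization step, covering the prediction targets, the reconstruction target of block 251 and the ground-truth deviations of block 252. All weighting factors, the unicycle-style kinematic model and the network interfaces are assumptions for this example.

```python
import torch
import torch.nn.functional as F

def kinematic_model(d0: torch.Tensor, a1: torch.Tensor, dt: float = 0.1) -> torch.Tensor:
    """Differentiable unicycle update (an assumption); d = (x, y, v, theta), a = (accel, yaw rate)."""
    x, y, v, theta = d0.unbind(dim=-1)
    accel, yaw_rate = a1.unbind(dim=-1)
    v1 = v + accel * dt
    th1 = theta + yaw_rate * dt
    x1 = x + v1 * torch.cos(th1) * dt
    y1 = y + v1 * torch.sin(th1) * dt
    return torch.stack([x1, y1, v1, th1], dim=-1)

def training_step(enc, pred_z, pred_a, recon, optimizer,
                  o0, a0, d0, o1, a1, d1_star,
                  alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """One combined optimization step; the loss weights are free choices.

    enc             : encoder network (3)
    pred_z, pred_a  : first and second prediction networks (4, 5)
    recon           : reconstruction network (6)
    d1_star         : measured ("ground truth") dynamic state at tau1
    """
    z0, z1 = enc(o0), enc(o1)
    z1_hat = pred_z(z0, a0)               # block 121: predict the next representation
    a1_hat = pred_a(z1_hat, z0, a0)       # blocks 122/122a: predict the next action
    a0_hat = recon(z0, z1)                # block 241: reconstruct the past action
    d1_hat = kinematic_model(d0, a1_hat)  # block 131: dynamic state via kinematic model
    loss = (alpha * F.mse_loss(z1_hat, z1)                # prediction target for z
            + beta * F.mse_loss(a1_hat, a1)               # prediction target for a
            + gamma * F.mse_loss(a0_hat, a0)              # block 251: reconstruction target
            + delta * F.smooth_l1_loss(d1_hat, d1_star))  # block 252: Huber-type deviation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```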
According to block 253, together with the parameters 4a, 5a, which characterize the behavior of the prediction networks 4, 5, parameters 3a, which characterize the behavior of the encoder network 3, can also be optimized.
Priority application: DE 10 2021 206 014.5, filed June 2021 (national).
International filing: PCT/EP2022/064936, filed Jun. 1, 2022 (WO).