The present application claims the benefit under 35 U.S.C. § 119 of EP Patent Application No. EP 19174326.9 filed on May 14, 2019, which is expressly incorporated herein by reference in its entirety.
Exemplary embodiments relate to a method of controlling a vehicle or robot, wherein the method comprises the following steps: determining a first control sequence for controlling the vehicle or robot, determining a second control sequence for controlling the vehicle or robot depending on the first control sequence, a current state of the vehicle or robot, and a model characterizing a dynamic behavior of the vehicle or robot, controlling the vehicle or robot depending on the second control sequence.
Further exemplary embodiments relate to an apparatus for controlling a vehicle or robot.
Further exemplary embodiments relate to a method, particularly a computer-implemented method, of training a conditional variational autoencoder, CVAE.
A method of controlling a vehicle is described in Williams, G., Wagener, N., Goldfain, B., Drews, P., Rehg, J. M., Boots, B., & Theodorou, E. A. (2017, May), “Information theoretic mpc for model-based reinforcement learning”, in International Conference on Robotics and Automation (ICRA), “[reference 1]”, which is incorporated by reference in its entirety herein. More specifically, section IV. B. (“MPC Algorithm”) and Algorithm 2: MPPI of [reference 1] describe steps of determining a first control sequence, determining a second control sequence depending on the first control sequence, a current state of the vehicle, and a model characterizing a dynamic behavior of a vehicle, controlling the vehicle depending on the second control sequence. As an example, the controlling step is represented by the function “SendToActuators (u0)” of Algorithm 2 of [reference 1], the determination of the second control sequence is represented by the preceding for-loop updating control sequence vector ut, and the determination of the first control sequence (for a subsequent control cycle) is—at least to some extent, i.e. as far as the last element of control sequence vector ut is concerned—represented by the “Initialize( )” function of the last line of Algorithm 2 of [reference 1].
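For orientation, the following Python sketch condenses the control-cycle structure just described (update the control sequence, send its first element to the actuators, shift the sequence backward, and initialize its last element). It is a schematic reading of the description above and not a reproduction of Algorithm 2 of [reference 1]; all identifiers (e.g., rollout_cost, lam, sigma) and the Gaussian perturbation scheme are illustrative assumptions.

```python
import numpy as np

def mppi_control_cycle(u, x0, rollout_cost, K, lam, sigma, send_to_actuators):
    """One MPPI-style control cycle over a control sequence u of shape (T, m)."""
    eps = np.random.normal(0.0, sigma, size=(K,) + u.shape)  # control perturbations
    S = np.array([rollout_cost(x0, u + eps[k]) for k in range(K)])
    w = np.exp(-(S - S.min()) / lam)                         # importance-sampling weights
    w /= w.sum()
    u = u + np.tensordot(w, eps, axes=1)                     # for-loop update of the sequence
    send_to_actuators(u[0])                                  # "SendToActuators(u0)"
    u = np.concatenate([u[1:], np.zeros_like(u[:1])])        # shift; "Initialize()" last element
    return u                                                 # warm start for the next cycle
```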
Exemplary preferred embodiments of the present invention include a method of controlling a vehicle or robot, wherein the method includes the following steps: determining a first control sequence, determining a second control sequence for controlling the vehicle or robot depending on the first control sequence, a current state of the vehicle or robot, and on a model characterizing a dynamic behavior of the vehicle or robot, controlling the vehicle or robot depending on the second control sequence, wherein the determining of the first control sequence comprises: providing a first candidate control sequence, determining a first accumulated trajectory cost associated with the first candidate control sequence, providing a second candidate control sequence, determining a second accumulated trajectory cost associated with the second candidate control sequence, comparing the first accumulated trajectory cost with the second accumulated trajectory cost, and, depending on the comparison, using the first candidate control sequence as the first control sequence or using a weighted sum of the first candidate control sequence and the second candidate control sequence as the first control sequence. This makes it possible to flexibly determine the second control sequence for vehicle or robot control, whereby an additional degree of freedom is provided by the second candidate control sequence, as compared to the conventional approach of [reference 1].
Preferably, the example method in accordance with the present invention is a computer-implemented method.
While the description herein primarily refers to exemplary embodiments of vehicle control, the example embodiments of the present invention may also be applied to control of robots and/or components of robots, particularly movable components of, e.g., stationary robots such as robotic arms and/or grapplers of robotized automation systems, and the like.
According to further preferred embodiments of the present invention, the first and/or second accumulated trajectory cost may be determined depending on at least one of the following elements: a) state-based costs associated with a current state of the vehicle, b) a control effort associated with a respective control sequence. This way, costs related to the first and/or second candidate control sequences may be determined, making it possible to assess which candidate control sequence may be preferable for determining the second control sequence for control of the vehicle.
According to further preferred embodiments of the present invention, instead of the weighted sum, other ways of combining the first candidate control sequence and the second candidate control sequence are also usable for determining the second control sequence.
Further preferred embodiments of the present invention use aspects of a model predictive control (MPC) technique, due to using the model characterizing the dynamic behavior of the vehicle. According to further preferred embodiments, aspects of the model as described in [reference 1], equation (1) (cf. section III., A. of [reference 1]), and section IV. may be used.
According to further preferred embodiments of the present invention, the weighted sum is determined according to the equation
u*=(Ŝ·u*,1+S*·û)/(S*+Ŝ),
wherein u*,1 represents the first candidate control sequence, wherein û represents the second candidate control sequence, wherein S* represents the first accumulated trajectory cost, wherein Ŝ represents the second accumulated trajectory cost, and wherein u* represents the weighted sum.
According to further preferred embodiments of the present invention, one or more control cycles are used for controlling the vehicle, wherein at least one of the control cycles, preferably all control cycles, include the steps of determining the first control sequence, determining the second control sequence, and controlling the vehicle depending on the second control sequence, wherein the step of providing the first candidate control sequence includes using an initial control sequence as the first candidate control sequence or determining the first candidate control sequence based on the second control sequence of a preceding control cycle.
According to further preferred embodiments of the present invention, the step of providing the second candidate control sequence includes using a, preferably trained, first (preferably artificial) neural network that is configured to receive first input parameters and to output the second candidate control sequence depending on the first input parameters.
According to further preferred embodiments of the present invention, the first neural network is a decoder of a conditional variational autoencoder, CVAE, wherein the CVAE further includes an encoder including a second neural network, wherein the encoder is configured to receive second input parameters, the second input parameters characterizing potential trajectories of the vehicle (e.g., obtained by simulation during a training process) and/or conditions (e.g., presence of obstacles, a predetermined global path) for the vehicle, and to map the second input parameters to a normal distribution q(z|X, C) with a mean μ and a variance Σ in a latent space z, wherein X represents the potential trajectories of the vehicle, and wherein C represents the conditions for the vehicle.
According to further preferred embodiments of the present invention, the first neural network and/or the second neural network includes a) four layers, preferably four fully connected layers, and/or b) rectified linear units, ReLUs, for implementing an activation function.
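As a non-limiting illustration, such an encoder/decoder pair with fully connected layers and ReLU activations may be sketched in Python (PyTorch) as follows; the layer widths, the latent dimension, and all identifiers are exemplary assumptions and not a definitive implementation of the embodiments:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """NN2: maps trajectories X and conditions C to mean/variance of q(z|X, C)."""
    def __init__(self, x_dim, c_dim, h_dim, z_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(x_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
        )
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|X, C)
        self.logvar = nn.Linear(h_dim, z_dim)   # log of the diagonal variance

    def forward(self, x, c):
        h = self.body(torch.cat([x, c], dim=-1))
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """NN1: maps a latent sample z and conditions C to a candidate control sequence."""
    def __init__(self, z_dim, c_dim, h_dim, u_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, u_dim),            # flattened control sequence (u-hat)
        )

    def forward(self, z, c):
        return self.body(torch.cat([z, c], dim=-1))
```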
According to further preferred embodiments of the present invention, the method further includes: training the CVAE by applying at least one of: a) a domain-specific loss function floss(X, C) depending on the potential trajectories X of the vehicle and/or the conditions C for the vehicle, b) a Kullback-Leibler (KL)-divergence in the latent space z, particularly according to ℒ=KL[q(z|X, C)∥p(z|C)]+floss(X, C), wherein ℒ is a resulting loss function, wherein KL[q(z|X, C)∥p(z|C)] is the Kullback-Leibler divergence in the latent space z, wherein q(z|X, C) is the normal distribution, and wherein p(z|C) characterizes a desired latent space distribution.
According to further preferred embodiments of the present invention, the training is performed at a first point in time, wherein the steps of determining the first control sequence, determining the second control sequence and controlling the vehicle depending on the second control sequence are performed at a second point in time after the first point in time.
According to further preferred embodiments of the present invention, it is also possible to (further) train the CVAE during a control process of the vehicle, e.g., at the second point in time.
According to further preferred embodiments of the present invention, the training is performed by a first entity, and the steps of determining the first control sequence, determining the second control sequence, and controlling the vehicle depending on the second control sequence are performed by the first entity and/or a second entity. As an example, according to further preferred embodiments of the present invention, an apparatus for performing the method according to the embodiments may both perform the training of the CVAE and the control of the vehicle. As a further example, according to further preferred embodiments of the present invention, a further device may (“only”) perform the training of the CVAE, and the control of the vehicle may be performed by the apparatus according to the embodiments, based on the previously trained CVAE.
Further preferred embodiments of the present invention relate to an apparatus for controlling a vehicle, wherein the apparatus is configured to perform the method according to the embodiments of the present invention, wherein preferably the apparatus includes at least one of the following elements: a) a calculating unit, b) a memory unit associated with the at least one calculating unit for at least temporarily storing a computer program and/or data (e.g., data of the neural network(s) such as, e.g., weights of the trained CVAE), wherein the computer program is preferably configured to at least temporarily control an operation of the apparatus, c) a control output interface for providing control output to the vehicle, d) an input interface configured to receive at least one of the following elements: d1) sensor data, preferably characterizing a position of the vehicle and/or an orientation of the vehicle, d2) position information, which may, e.g., be provided by a further device, d3) map information.
Further preferred embodiments of the present invention relate to a vehicle including an apparatus according to the embodiments, wherein preferably the vehicle is a land vehicle, particularly at least one of: a car, an autonomously driving car, a robot, an intralogistics robot, a cleaning robot, particularly a home cleaning robot, a robotic lawn mower.
Further preferred embodiments of the present invention relate to a computer program including instructions, which, when the program is executed by a computer, cause the computer to carry out the method according to the embodiments.
Further preferred embodiments of the present invention relate to a computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to the embodiments.
Further preferred embodiments of the present invention relate to a data carrier signal carrying the computer program according to the embodiments.
Further preferred embodiments of the present invention relate to a use of the method according to the embodiments and/or of the apparatus according to the embodiments and/or of the vehicle according to the embodiments and/or of the computer program according to the embodiments for a) optimizing a trajectory for a vehicle and/or b) obstacle avoidance.
Further preferred embodiments of the present invention relate to a method, particularly a computer-implemented method, of training a conditional variational autoencoder, CVAE, wherein the CVAE comprises a first neural network as a decoder and a second neural network as an encoder, wherein the decoder is configurable to receive first input parameters and to output a candidate control sequence for a method of controlling a vehicle depending on the first input parameters, particularly a method according to the embodiments, wherein the encoder is configurable to receive second input parameters, the second input parameters characterizing potential trajectories of the vehicle and/or conditions for the vehicle, and to map the second input parameters to a normal distribution q(z|X, C) with a mean μ and a variance Σ in a latent space z, wherein X represents the potential trajectories of the vehicle, and wherein C represents the conditions for the vehicle, wherein the training comprises using at least one of: a) a domain-specific loss function floss(X, C) depending on the potential trajectories X of the vehicle and/or the conditions C for the vehicle, b) a Kullback-Leibler divergence in the latent space z, particularly according to ℒ=KL[q(z|X, C)∥p(z|C)]+floss(X, C), wherein ℒ is a resulting loss function, wherein KL[q(z|X, C)∥p(z|C)] is the Kullback-Leibler divergence in the latent space z, wherein q(z|X, C) is the normal distribution, and wherein p(z|C) characterizes a desired latent space distribution.
According to further preferred embodiments of the present invention, the apparatus according to the embodiments may perform the training of the CVAE.
Some exemplary embodiments are described below with reference to the figures.
Block B3 represents a simultaneous localization and mapping (SLAM) block, which is configured to provide position information pi characterizing a current position and/or orientation of the vehicle 100′.
According to further preferred embodiments, for determining the position and/or orientation of the vehicle 100, 100′, the SLAM block B3 may receive sensor data sd from the vehicle 100′, the sensor data, e.g., comprising information from at least one of the following elements: position sensor(s), wheel speed sensors, and the like.
According to further preferred embodiments, based on the initial control sequence ics and the position information pi, the block B1 and/or apparatus 200 may perform motion control for, particularly optimize the trajectory of, the vehicle 100.
In other words, according to further preferred embodiments, apparatus 200 may, e.g., use a motion control (or trajectory optimization) algorithm configured to generate (robot) motion for control of the vehicle 100, 100′, advantageously considering the reference path P1 as, e.g., computed by the global path planner B2, the estimated position pi of the vehicle 100′ and, e.g., a map of the environment E.
In the following, further preferred embodiments of the present invention are explained with reference to the figures.
The apparatus 200 comprises at least one calculating unit 202 and at least one memory unit 204 associated with the at least one calculating unit 202 for at least temporarily storing a computer program PRG and/or data DAT.
According to further preferred embodiments of the present invention, the at least one calculating unit 202 is configured to execute the computer program PRG, wherein the computer program PRG is, e.g., configured to at least temporarily control an operation of the apparatus 200.
According to further preferred embodiments of the present invention, the at least one calculating unit 202 may comprise at least one of the following elements: a microprocessor, a microcontroller, a digital signal processor (DSP), a programmable logic element (e.g., FPGA, field programmable gate array), an ASIC (application specific integrated circuit), hardware circuitry, a tensor processor. According to further preferred embodiments of the present invention, any combination of two or more of these elements is also possible.
According to further preferred embodiments of the present invention, the memory unit 204 comprises at least one of the following elements: a volatile memory 204a, particularly a random-access memory (RAM), a non-volatile memory 204b, particularly a Flash-EEPROM. Preferably, the computer program PRG is at least temporarily stored in the non-volatile memory 204b. Data DAT, which may, e.g., be used for executing the method according to the embodiments, may at least temporarily be stored in the RAM 204a.
According to further preferred embodiments of the present invention, an optional computer-readable storage medium SM comprising instructions, e.g., in the form of a further computer program PRG′, may be provided, wherein the further computer program PRG′, when executed by a computer, i.e., by the calculating unit 202, may cause the computer 202 to carry out the method according to the embodiments. As an example, the storage medium SM may comprise or represent a digital storage medium such as a semiconductor memory device (e.g., solid state drive, SSD) and/or a magnetic storage medium such as a disk or hard disk drive (HDD) and/or an optical storage medium such as a compact disc (CD) or DVD (digital versatile disc) or the like.
According to further preferred embodiments of the present invention, the apparatus 200 may comprise an optional data interface 205, preferably for bidirectional data exchange with an external device (not shown). As an example, by means of the data interface 205, a data carrier signal DCS may be received, e.g., from the external device, for example via a wired or a wireless data transmission medium, e.g., over a (virtual) private computer network and/or a public computer network such as, e.g., the Internet. According to further preferred embodiments, the data carrier signal DCS may represent or carry the computer program PRG according to the embodiments, or at least a part thereof.
According to further preferred embodiments of the present invention, the apparatus 200 may comprise a control output interface 206 for providing control output co, e.g., in the form of one or more output signals, to the vehicle 100.
As an example, according to further preferred embodiments, the control output co, mathematically represented by the vector ut or by at least its first element ut=0=u0, may be output, e.g., at an end of a control cycle, to one or more corresponding actuators (not shown) of the vehicle 100.
According to further preferred embodiments of the present invention, the apparatus 200 may comprise an input interface 207 configured to receive at least one of the following elements: d1) sensor data sd (e.g., as provided by the vehicle 100′), d2) position information pi (e.g., as provided by the SLAM block B3), d3) map information.
According to further preferred embodiments, the input interface 207 may also be configured to receive the initial control sequence ics, e.g., from the global planner block B2.
According to further preferred embodiments, the apparatus 200 as exemplarily explained above may, e.g., be used to implement the motion control block B1 and/or to perform the method according to the embodiments.
Further preferred embodiments relate to a method of controlling the vehicle 100, 100′, wherein the method comprises the following steps: determining 300 a first control sequence cs1 for controlling the vehicle 100, 100′, determining a second control sequence cs2 for controlling the vehicle 100, 100′ depending on the first control sequence cs1, on a current state cst of the vehicle 100, 100′, and on a model M characterizing a dynamic behavior of the vehicle 100, 100′, and controlling the vehicle 100, 100′ depending on the second control sequence cs2.
As an example, the current state cst of the vehicle 100, 100′ may, e.g., be represented by an n-dimensional state vector xt∈ℝn, and the model M characterizing a dynamic behavior of the vehicle 100, 100′ may be represented by a, preferably fully-connected, multi-layer, neural network, which may, e.g., be trained as disclosed in section IV. MPC WITH NEURAL NETWORK DYNAMICS, A. Learning Neural Network Models, of [reference 1].
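Purely as an illustration of such a model, a small fully connected network mapping a state-control pair to a next-state prediction might look as follows; the layer sizes and Tanh activations are assumptions, and the training procedure itself is described in [reference 1]:

```python
import torch.nn as nn

def make_dynamics_model(n_state, n_control, hidden=64):
    """Fully connected multi-layer model approximating x_{t+1} = F(x_t, u_t)."""
    return nn.Sequential(
        nn.Linear(n_state + n_control, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, n_state),   # predicted next state (or state increment)
    )
```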
According to further preferred embodiments, the determining 300 of the first control sequence cs1 comprises: providing 301 a first candidate control sequence ccs1, determining a first accumulated trajectory cost atc1 associated with the first candidate control sequence ccs1, providing 303 a second candidate control sequence ccs2, and determining a second accumulated trajectory cost atc2 associated with the second candidate control sequence ccs2.
According to further preferred embodiments, the step 305 of comparing may comprise determining whether the second accumulated trajectory cost atc2 is smaller than the first accumulated trajectory cost atc1. If this is the case, step 307 may be executed, determining the first control sequence cs1 as the weighted sum ws. This way, a positive influence of the second candidate control sequence ccs2 on trajectory cost may be taken into account, which may not be given by using 306 the first candidate control sequence ccs1 as the first control sequence cs1. Note that step 306 basically corresponds to the fact that the conventional algorithm 2 (“MPPI”) of [reference 1] initializes (last line of algorithm 2 of [reference 1]) the control sequence uT-1, particularly unconditionally, after updating the control sequence (preceding for-loop of algorithm 2 of [reference 1]). By contrast, preferred embodiments conditionally take into consideration the second candidate control sequence ccs2, based on the comparison 305, thus making it possible to overcome at least some constraints associated with the conventional technique according to [reference 1].
According to further preferred embodiments, the weighted sum ws is determined according to the equation
u*=(Ŝ·u*,1+S*·û)/(S*+Ŝ),
wherein u*,1 represents the first candidate control sequence ccs1, wherein û represents the second candidate control sequence ccs2, wherein S* represents the first accumulated trajectory cost atc1, wherein Ŝ represents the second accumulated trajectory cost atc2, and wherein u* represents the weighted sum ws.
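An exemplary implementation of the comparison 305 and of the conditional use 306, 307 of the weighted sum ws, consistent with the equation above, is given by the following sketch (the variable names mirror the reference signs used herein):

```python
import numpy as np

def select_first_control_sequence(u_star1, u_hat, S_star, S_hat):
    """Return cs1: either ccs1 itself or the cost-weighted sum of ccs1 and ccs2."""
    if S_hat < S_star:  # comparison 305: the second candidate has lower accumulated cost
        return (S_hat * u_star1 + S_star * u_hat) / (S_star + S_hat)  # step 307: ws
    return u_star1      # step 306: keep the first candidate ccs1
```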
According to further preferred embodiments, the first and/or second accumulated trajectory cost atc1, atc2 may be determined depending on at least one of the following elements: a) state-based costs associated with a current state of the vehicle 100, 100′, b) a control effort associated with a respective (candidate) control sequence ccs1, ccs2. This way, costs related to the first and/or second candidate control sequences may be determined, making it possible to assess which candidate control sequence may be preferable for determining the second control sequence cs2 for control of the vehicle 100, 100′.
According to further preferred embodiments, instead of the weighted sum ws, other ways of combining the first candidate control sequence ccs1 and the second candidate control sequence ccs2 are also usable for determining the second control sequence cs2. This may also be done in step 307 according to further preferred embodiments, i.e. alternatively to determining the weighted sum ws.
According to further preferred embodiments, one or more control cycles cc may be used for controlling the vehicle 100, 100′, wherein at least one of the control cycles cc, preferably all control cycles cc, comprise the steps of determining 300 the first control sequence cs1, determining the second control sequence cs2, and controlling the vehicle 100, 100′ depending on the second control sequence cs2, wherein the step of providing the first candidate control sequence ccs1 comprises using an initial control sequence ics as the first candidate control sequence ccs1 or determining the first candidate control sequence ccs1 based on the second control sequence cs2 of a preceding control cycle cc.
According to further preferred embodiments, the step 303 of providing the second candidate control sequence ccs2 comprises using a, preferably trained, first neural network NN1 that is configured to receive first input parameters ip1 and to output the second candidate control sequence ccs2 depending on the first input parameters ip1.
According to further preferred embodiments, the first neural network NN1 is a decoder 402 of a conditional variational autoencoder, CVAE, 400, wherein the CVAE 400 further comprises an encoder 401 comprising a second neural network NN2, wherein the encoder 401 is configured to receive second input parameters ip2, the second input parameters ip2 characterizing potential trajectories of the vehicle 100, 100′ and/or conditions for the vehicle 100, 100′, and to map the second input parameters ip2 to a normal distribution q(z|X, C) with a mean μ and a variance Σ in a latent space z, wherein X represents the potential trajectories of the vehicle, and wherein C represents the conditions for the vehicle.
According to further preferred embodiments, the first neural network NN1 and/or the second neural network NN2 comprises a) four layers, preferably four fully connected layers, and/or b) rectified linear units, ReLUs, for implementing an activation function. As an example, a transfer function of a ReLU may be f(x)=max(0, x), wherein max( ) is the maximum function.
According to further preferred embodiments, the method further comprises: training 10 the CVAE 400 by applying at least one of: a) a domain-specific loss function floss(X, C) depending on the potential trajectories X of the vehicle and/or the conditions C for the vehicle, b) a Kullback-Leibler divergence in the latent space z, particularly according to ℒ=KL[q(z|X, C)∥p(z|C)]+floss(X, C), wherein ℒ is a resulting loss function, wherein KL[q(z|X, C)∥p(z|C)] is the Kullback-Leibler divergence in the latent space z, wherein q(z|X, C) is the normal distribution, and wherein p(z|C) characterizes a desired latent space distribution.
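A hedged training-step sketch corresponding to this loss is given below; it assumes a standard-normal prior p(z|C)=N(0, I), the reparameterization trick for sampling from q(z|X, C), and a caller-supplied task-specific loss f_loss, all of which are illustrative choices rather than requirements of the embodiments:

```python
import torch

def train_step(encoder, decoder, optimizer, X, C, f_loss):
    """One training step minimizing L = KL[q(z|X,C) || p(z|C)] + floss(X, C)."""
    mu, logvar = encoder(X, C)                                # parameters of q(z|X, C)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
    X_hat = decoder(z, C)                                     # decoded trajectories/controls
    # Closed-form KL divergence between N(mu, Sigma) and the assumed prior N(0, I):
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1).mean()
    loss = kl + f_loss(X_hat, X, C)                           # domain-specific floss term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```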
According to further preferred embodiments, the training 10 is performed at a first point in time, wherein the steps of determining the first control sequence cs1, determining the second control sequence cs2, and controlling the vehicle 100, 100′ depending on the second control sequence cs2 are performed at a second point in time after the first point in time.
According to further preferred embodiments, the training 10 may be performed by a first entity, and the steps of determining the first control sequence cs1, determining the second control sequence cs2, and controlling the vehicle 100, 100′ depending on the second control sequence cs2 may be performed by the first entity and/or a second entity.
According to further preferred embodiments, the CVAE 400 may be configured to learn a distribution of the input data ip2, particularly also based on given conditions C such as, e.g., the global path P1 and/or a presence of obstacles in the environment E.
In other words, according to further embodiments, the CVAE 400 may be trained to imitate a distribution of observed data X(i)∈𝒳 conditioned on C∈𝒞 using an unobserved, latent representation z∈𝒵, i.e., the distribution p(X|C)=∫p(X|z, C) p(z|C)dz of the states.
Further preferred embodiments relate to a method of training a conditional variational autoencoder, CVAE, 400, wherein the CVAE 400 comprises the first neural network NN1 as a decoder 402 and the second neural network NN2 as an encoder 401, wherein the decoder 402 is configurable to receive first input parameters ip1 and to output a candidate control sequence ccs2 for a method of controlling a vehicle 100, 100′ depending on the first input parameters ip1, particularly a method according to the embodiments, wherein the encoder 401 is configurable to receive second input parameters ip2, the second input parameters ip2 characterizing potential trajectories pt of the vehicle and/or conditions for the vehicle, and to map the second input parameters ip2 to the normal distribution q(z|X, C) with the mean μ and the variance Σ in the latent space z, wherein the training comprises using at least one of: a) the domain-specific loss function floss(X, C), b) the Kullback-Leibler divergence in the latent space z, particularly according to ℒ=KL[q(z|X, C)∥p(z|C)]+floss(X, C) as explained above.
In the following paragraphs, further preferred embodiments and exemplary aspects of implementation related to a control of the vehicle 100 are explained.
In the following paragraphs, algorithm 1 as presented above is explained on a per-line basis. Line 1 defines input parameters, wherein K represents a number of (trajectory) samples to be processed, wherein tf represents a planning horizon, wherein U represents a control set (also cf. the control input sequence explained above with reference to the control output co), wherein Σ represents a covariance, e.g., of the control perturbations, wherein ϕ represents a terminal cost, wherein c represents a (state-based) cost function, and wherein F represents the model characterizing the dynamic behavior of the vehicle (also cf. the model M).
Line 2 starts a while-loop effecting a repetition of the execution of lines 3 to 23 until a predetermined exit condition is met, which can be considered as a repetition of several control cycles, similar to the control cycles cc of
Line 3 represents execution of a function “StateEstimation( )” which serves to determine, e.g., a current position and/or orientation of the vehicle 100 in the environment E.
Line 4 of table 1 invokes a further function, InformControls(u*,1, σ, φ0, v0, K, tf, U, Σ, ϕ, c, F), which represents an implementation example for the step of providing 300 the first control sequence cs1, cf. table 2 below.
Lines 5 to 14 of table 1 comprise a loop L5, 14 over K trajectory samples, wherein the model F is evaluated, cf. line 10 of table 1, and associated costs for each of the trajectory samples are determined, cf. line 11 of table 1. According to further preferred embodiments, on this basis, the first control sequence cs1, represented by vector u* of line 4, may be updated in lines 16 to 18, which represent a further loop L16, 18.
More specifically, according to further preferred embodiments, loop L5, 14 comprises the following elements. In line 6, state vector x is updated, e.g., depending on information x0 as obtained by the state estimation, cf. line 3 of table 1.
In line 15, an importance sampling step is performed using the function ImportanceSamplingWeights(Sk, λ), which yields a weight vector wk based on the accumulated trajectory cost Sk and the parameter λ, which may be a hyper-parameter of algorithm 1 (similar to, e.g., Σ, ϕ) according to further preferred embodiments. Further details regarding an exemplary implementation of the function ImportanceSamplingWeights( ) are provided further below with reference to table 3.
In a further loop L16, 18 comprising lines 16 to 18 of table 1, the first control sequence cs1, represented by vector u* (also cf. line 4 of table 1), is updated, wherein, according to further preferred embodiments, the updated control sequence u* obtained according to line 17 of table 1 may, e.g., correspond to the second control sequence cs2 as explained above.
In line 19, the first element u0* of the updated control sequence u* (corresponding with the second control sequence cs2) is applied for controlling the vehicle 100.
The further loop L20, 22 comprising lines 20 to 22 of table 1 may be used to “shift backward” the elements of the (updated) control sequence u* along the discrete time index t.
After this, in line 23, the value ut-1* of the control sequence may be initialized (as this may be undefined due to the backward shifting of lines 20 to 22 of table 1). The updated control sequence obtained at the end of the loop, cf. line 24, may be used within a subsequent control cycle according to the loop of lines 2, 24 of table 1, which may, according to further preferred embodiments, e.g., correspond with a control cycle cc as explained above.
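Combining the per-line description above with the control-cycle sketch given earlier (after the discussion of [reference 1]), the overall loop of table 1 may be summarized as follows; this is an assumption-laden sketch, since the original table is only paraphrased here, and all callables are placeholders:

```python
def informed_it_mpc(u, state_estimation, inform_controls, mppi_cycle, exit_condition):
    """Control loop of table 1 (lines 2 to 24), reusing the cycle sketched earlier."""
    while not exit_condition():       # line 2: repeat control cycles
        x0 = state_estimation()       # line 3: StateEstimation()
        u = inform_controls(u, x0)    # line 4: InformControls(...) yields the informed u*
        u = mppi_cycle(u, x0)         # lines 5-23: sample K trajectories, update u*,
                                      # send u0* to the actuators, shift, initialize
    return u                          # line 24: final/next control sequence
```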
In the following paragraphs, an implementation example for the step of providing 301 a candidate control sequence, i.e., for the function InformControls( ) of line 4 of table 1, is disclosed with reference to an algorithm (“[algorithm 2]”) as presented in table 2 below.
The input to the function of algorithm 2, cf. line 1 of table 2, has already been explained above with reference to line 4 of table 1. In line 2 of table 2, the function CVAEDecoder( ) is executed, which uses the (trained) decoder 402 of the CVAE 400, e.g., as explained above, to provide the second candidate control sequence ccs2 (represented by û).
The loop in lines 3 to 8 and the lines 9, 10 of table 2 determine accumulated trajectory costs Ŝ, S* for both the first candidate control sequence ccs1 (represented by expression u*,1 of table 2) and the second candidate control sequence ccs2 (represented by expression û of table 2), and line 11 comprises a comparison as an exemplary implementation of the comparing step 305, wherein, depending on the comparison, either the first candidate control sequence ccs1 or the weighted sum ws is provided as the first control sequence cs1.
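A sketch of the InformControls( ) function along the lines of this description is given below; the latent sampling with p(z|C)=N(0, I), the rollout helper, and all parameter names are assumptions, while the comparison and the weighted sum follow the equation given above:

```python
import numpy as np

def inform_controls(u_star1, x0, decode, condition, F, c, phi, tf, z_dim):
    """Sketch of InformControls() of table 2 (parameter names are illustrative)."""
    z = np.random.normal(size=z_dim)          # latent sample, assuming p(z|C) = N(0, I)
    u_hat = decode(z, condition)              # line 2: CVAEDecoder() yields ccs2 (u-hat)

    def accumulated_cost(u):                  # lines 3-10: roll out one candidate
        x, S = x0, 0.0
        for t in range(tf):
            x = F(x, u[t])                    # evaluate the dynamics model
            S += c(x)                         # accumulate state-based costs
        return S + phi(x)                     # add terminal cost

    S_hat, S_star = accumulated_cost(u_hat), accumulated_cost(u_star1)
    if S_hat < S_star:                        # line 11: comparison (step 305)
        return (S_hat * u_star1 + S_star * u_hat) / (S_star + S_hat)  # weighted sum ws
    return u_star1                            # otherwise keep the first candidate ccs1
```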
In the following paragraphs, an implementation example for the function ImportanceSamplingWeights( ) as explained above with respect to line 15 of table 1 is disclosed with reference to an algorithm “[algorithm 3]” as presented in table 3 below.
Further details related to the exemplary importance sampling procedure according to further preferred embodiments as illustrated above by table 3 may be taken from [reference 1], section III. “C. Importance Sampling”.
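Following section III. C. of [reference 1], the importance-sampling weights may, e.g., be computed as a normalized exponential of the negatively scaled costs; the following sketch shows this standard form, with the baseline subtraction as a numerical-stability measure:

```python
import numpy as np

def importance_sampling_weights(S, lam):
    """Sketch of ImportanceSamplingWeights(): normalized exp(-(S_k - min S) / lambda)."""
    w = np.exp(-(S - S.min()) / lam)   # subtract the minimum cost for numerical stability
    return w / w.sum()                 # normalized weights w_k summing to one
```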
It is emphasized that the above described algorithms of tables 1, 2, 3 are examples for an implementation of aspects of the method according to further preferred embodiments, which are not limiting.
Further preferred embodiments relate to a vehicle 100, 100′ comprising an apparatus 200 according to the embodiments, wherein preferably the vehicle 100, 100′ is a land vehicle, particularly at least one of: a car, an autonomously driving car, a robot, an intralogistics robot, a cleaning robot, particularly a home cleaning robot, a robotic lawn mower.
Further preferred embodiments relate to a use of the method according to the embodiments and/or of the apparatus 200 according to the embodiments and/or of the vehicle 100, 100′ according to the embodiments and/or of the computer program PRG according to the embodiments for a) optimizing a trajectory for a vehicle 100, 100′ and/or b) obstacle avoidance. As an example, using the method according to preferred embodiments may yield an optimized path P2.
At least some preferred embodiments explained above make it possible to improve the conventional information theoretic model predictive control (IT-MPC) technique presented in [reference 1]. The IT-MPC of [reference 1] can be interpreted as a method for (locally) generating robot motion by considering a stochastic nonlinear system in dynamic environments. The generated trajectories P3 minimize a defined cost function (e.g., closeness to a reference path, path clearance).
Finding solutions to optimal control problems for stochastic nonlinear systems in dynamic environments remains a challenging task. Recently, sampling-based model predictive control (MPC) has proven to be a useful tool for solving stochastic problems in complex domains with highly nonlinear dynamic systems. These conventional MPC methods sample from a prior distribution to generate trajectories, strongly conditioning the solution of the problem on this prior and thus influencing the performance and efficiency of a controller implementing such a conventional MPC method. According to further aspects, for multi-modal and/or highly dynamic settings, sampling around the predicted controls may not perform well, since it constrains the sampling distribution to a specific state space cost area.
Various preferred embodiments as explained above address these aspects by learning a sampling distribution, e.g., by means of the CVAE 400, and by using the learned distribution to inform the sampling-based MPC procedure.
By applying the learned distributions in an informing fashion, e.g., in the form of step 303 as explained above, the sampling may be guided towards lower cost regions of the state space, instead of constraining it to the vicinity of the previously predicted controls.
Preferred embodiments of the present invention, which, e.g., apply environmentally and/or task aware learned distributions (e.g., in the form of the trained decoder 402 of the CVAE 400) enable an increase in the performance of motion control in terms of path quality and planning efficiency, particularly when compared to conventional techniques using conventional trajectory sampling schemes.
According to Applicant's analysis, which is based on tests of the method according to preferred embodiments in simulated environments, the methods according to preferred embodiments generate better behaviors regarding motion control of the vehicle 100, 100′ for different tasks, i.e., path tracking and obstacle avoidance. For path tracking, the approach based on the method according to preferred embodiments has been compared to the conventional IT-MPC as disclosed by [reference 1], where it has been found that the approach based on the method according to preferred embodiments generates lower cost solutions while being more successful and also faster in accomplishing a designed task. In terms of obstacle avoidance, the approach based on the method according to preferred embodiments has been compared to IT-MPC according to [reference 1] and to a conventional technique based on “Dynamic Windows”, cf., e.g., [4] Dieter Fox, Wolfram Burgard, and Sebastian Thrun, “The dynamic window approach to collision avoidance”, IEEE Robotics & Automation Magazine, 4(1):23-33, 1997. Also in this case, the approach based on the method according to preferred embodiments generates better solutions in terms of cost, time to finish the task, and number of successful operations.
According to further preferred embodiments, the method explained above, e.g., with respect to the algorithms of tables 1, 2, 3, may be considered an informed variant of a sampling-based MPC technique.
According to further preferred embodiments, the principle according to the embodiments may be used to extend the conventional IT-MPC as disclosed by [reference 1], for example by informing the controller 200 (FIG. 3) with samples u* (cf. line 4 of table 1) lying on a lower cost state space area. Advantageously, such samples may be generated by the CVAE 400.
According to further preferred embodiments, the CVAE 400 may learn a sampling distribution, e.g., from an offline generated dataset (e.g., generated by simulation), for example by using an ad hoc task-based loss function, e.g., ℒ=KL[q(z|X, C)∥p(z|C)]+floss(X, C) as explained above.
According to further preferred embodiments, particularly for achieving even better learning of the input distribution, the CVAE parameters may be optimized based on a task-specific loss function floss.
According to further preferred embodiments, the function InformControls( ), cf. table 2 above, may be used to generate a new mean from which the algorithm of, e.g., table 1 may, preferably randomly, draw (cf. line 4 of table 1) new controls. As mentioned above, advantageously, the function InformControls( ) may use the CVAE 400, particularly its trained decoder 402.
According to further preferred embodiments, an exemplary cost function c(x) (e.g., for use in line 11 of table 1) may be chosen as a weighted sum of task-specific cost terms, wherein a first summand, weighted by a first weight w1>0, represents a task of reaching a sub-goal P1h selected from the global path P1.
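As only the first summand is specified above, the following sketch of such a cost function is largely an assumption: the planar-position slice of the state, the obstacle-clearance term, and the weight w2 are hypothetical additions for illustration, while the w1-weighted sub-goal term mirrors the text:

```python
import numpy as np

def make_cost(subgoal, obstacles, w1=1.0, w2=1.0):
    """Build an exemplary state cost c(x) as a weighted sum of task terms."""
    def c(x):
        pos = x[:2]                                     # assume planar position in the state
        cost = w1 * np.linalg.norm(pos - subgoal)       # reach sub-goal P1h on global path P1
        for o in obstacles:                             # hypothetical clearance penalty
            cost += w2 * np.exp(-np.linalg.norm(pos - o))
        return cost
    return c
```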
According to further preferred embodiments, for the training 10 of the CVAE 400, an optimization method such as, e.g., the Adam method may be used.
Other Publications:
Seong Hyeon Park et al., “Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture”, Oct. 22, 2018, pp. 1-7; https://doi.org/10.48550/arXiv.1802.06338.
Williams, G., et al., “Information Theoretic MPC for Model-Based Reinforcement Learning”, in International Conference on Robotics and Automation (ICRA), 2017, pp. 1-8.
Dieter Fox, et al., “The Dynamic Window Approach to Collision Avoidance”, IEEE Robotics & Automation Magazine, 1997, pp. 23-33.
Diederik P. Kingma, et al., “Adam: A Method for Stochastic Optimization”, 2015, pp. 1-15; https://arxiv.org/abs/1412.6980.
Drews, Paul, et al., “Aggressive Deep Driving: Model Predictive Control With a CNN Cost Model”, Cornell University, 2017, pp. 1-11.
Williams, Grady, et al., “Autonomous Racing With AutoRally Vehicles and Differential Games”, Cornell University, 2017, pp. 1-8.