REINFORCEMENT LEARNING ALGORITHM-BASED PREDICTIVE CONTROL METHOD FOR LATERAL AND LONGITUDINAL COUPLED VEHICLE FORMATION

Information

  • Patent Application
  • Publication Number
    20240083428
  • Date Filed
    July 13, 2023
  • Date Published
    March 14, 2024
Abstract
A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation includes S1, combining a 3-DOF vehicle dynamics model that takes into account a nonlinear magic formula tire model with a lane keeping model and establishing a vehicle formation model; S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework; S3, using a reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller, and applying the optimal control strategy to the target following vehicle. The present application completes the lateral and longitudinal coupled modeling of vehicle formation and considers the nonlinear characteristics of tires. In addition, the present application also transforms the global optimization problem of vehicle formation into a local optimization problem of each following vehicle.
Description
CROSS REFERENCE TO THE RELATED APPLICATIONS

This application is based upon and claims priority to Chinese Patent Application No. 202211087600.7, filed on Sep. 7, 2022, the entire contents of which are incorporated herein by reference.


TECHNICAL FIELD

The present application belongs to the field of automobile control technology, specifically to a reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation.


BACKGROUND

In the past few decades, the exponential growth in the number of vehicles has brought great challenges to energy security and traffic safety. According to the National Highway Traffic Safety Administration, about 84% of traffic accidents are attributed to human errors. Vehicle formation can significantly reduce traffic accidents caused by driver fatigue and mishandling, thereby improving road safety. In addition, vehicle formation (especially heavy truck formation) can reduce air drag between vehicles, resulting in lower emissions and fuel consumption and increased road capacity. These potential benefits have led to an increased interest among scholars in vehicle formation control.


The research on vehicle formation mainly includes longitudinal control and lateral control, which involve:


The goal of longitudinal control is to track the desired speed of the vehicle and to maintain the desired spacing between neighboring vehicles.


The task of lateral control is to steer the vehicle within the designated lane.


The existing research on vehicle formation control mostly adopts the decoupling control method, where separate lateral and longitudinal controllers are designed to realize lateral lane keeping and longitudinal speed tracking respectively. Specifically, the lateral control methods mainly include PID control, fuzzy control, H∞ robust control, etc., and the longitudinal control methods mainly include sliding mode control, adaptive control, model predictive control, etc.


In summary, although the decoupling strategy is somewhat effective in vehicle formation control, the vehicle system is a non-linear, multi-variable, and strongly coupled system. Under conditions of higher acceleration, greater lateral and longitudinal forces, or lower road adhesion coefficients, the lateral and longitudinal coupling effects become particularly significant. Under such conditions, the tracking performance of the decoupling control may degrade, leading to a significant reduction in control accuracy, which may in turn lead to vehicle collisions and failure to achieve formation.


In addition, most of the current research on vehicle formation is based on the assumption of linear tires, which is only valid within a limited operating range of the vehicle. When the vehicle moves out of this range (for example, during obstacle avoidance), the nonlinear and coupled lateral-longitudinal effects of the vehicle system become more pronounced, resulting in a mismatch between the decoupling control and the emergency control of the vehicle.


SUMMARY

In light of the challenges raised in the background above, the purpose of the present application is to provide a reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation.


In order to achieve the aforementioned objectives, the present application provides the following technical solution:


A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation, including:

    • S1, combining a 3-degree of freedom (DOF) vehicle dynamics model that takes into account a nonlinear magic formula tire model with a lane keeping model, in order to establish a vehicle formation model






$$x_i(k+1) = f\bigl(x_i(k),\, u_i(k)\bigr);$$

in the formula, $x_i(k)$ is a state quantity and $u_i(k)$ is an input quantity;

    • S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework






$$J_i\bigl(x_i(k), U_i(k)\bigr) = \sum_{l=0}^{T_p-1} \bigl\|x_i(k+l) - r_i(k+l)\bigr\|_{Q_i}^2 + \bigl\|x_i(k+l) - \hat{x}_i(k+l)\bigr\|_{F_i}^2 + \bigl\|x_i(k+l) - \hat{x}_{i-1}(k+l)\bigr\|_{G_i}^2 + \bigl\|u_i(k+l)\bigr\|_{R_i}^2$$

in the formula, $k$ is the current moment, $k+l$ is the $l$-th moment in the prediction horizon, $x_i(\cdot)$ is a predicted state, $r_i(\cdot)$ is an ideal state, $\hat{x}_{i-1}(\cdot)$ and $\hat{x}_i(\cdot)$ represent assumed trajectory states of the vehicles, $\hat{x}_{i-1}(\cdot)$ is obtained through inter-vehicle communication, $T_p$ is the prediction horizon, and $Q_i, F_i, G_i, R_i$ are the weight matrices;

    • S3, using a reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller and applying the optimal control strategy to the target following vehicle.


Preferably, the lateral force of the tire in the nonlinear magic formula tire model is calculated by the following magic formula:






$$F_{iy} = D \sin\bigl(C \arctan\bigl(B\alpha - E(B\alpha - \arctan B\alpha)\bigr)\bigr)$$

    • in the formula, α is a cornering angle of the tire, and B, C, D, E are simulation parameters.
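For illustration, a minimal Python sketch of this magic formula follows; the function name and the numeric values of B, C, D, E are hypothetical placeholders (the application obtains these parameters by fitting, as described in the detailed description), not values from the application.

```python
import numpy as np

def magic_formula_lateral_force(alpha, B=10.0, C=1.9, D=4000.0, E=0.97):
    """Lateral tire force by the magic formula:
    F_y = D*sin(C*arctan(B*alpha - E*(B*alpha - arctan(B*alpha)))).
    alpha is the tire cornering angle in radians; B, C, D, E are
    fitted parameters (placeholder values here)."""
    Ba = B * alpha
    return D * np.sin(C * np.arctan(Ba - E * (Ba - np.arctan(Ba))))
```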


Preferably, the 3-DOF vehicle dynamics model is expressed as:










$$\begin{cases} m_i\bigl(\dot{v}_{ix} - v_{iy}\dot{\varphi}_i\bigr) = F_{ix} \\ m_i\bigl(\dot{v}_{iy} + v_{ix}\dot{\varphi}_i\bigr) = F_{iyf} + F_{iyr} \\ I_{iz}\ddot{\varphi}_i = a_i F_{iyf}\cos\delta_i - b_i F_{iyr} \end{cases};$$




each parameter in the formula is the parameter of the $i$th vehicle, and

    • $v_{ix}, v_{iy}, \dot{\varphi}_i$ are the longitudinal speed, lateral speed, and yaw rate, $F_{ix}$ is the longitudinal force, $F_{iyf}$ and $F_{iyr}$ are the front and rear wheel lateral forces, $m_i$ is the vehicle mass, $I_{iz}$ is the moment of inertia of the vehicle around the $z$ axis, $\delta_i$ is the front wheel angle, and $a_i$ and $b_i$ are the distances from the center of mass to the front and rear axles, respectively.
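As a minimal sketch (assuming the lateral forces have already been evaluated, e.g., by the magic formula above), the 3-DOF dynamics can be coded as follows; the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def vehicle_3dof_derivatives(v_x, v_y, phi_dot, F_x, F_yf, F_yr, delta, m, Iz, a, b):
    """Continuous-time 3-DOF dynamics of the i-th vehicle, per the model above.
    Returns (v_x_dot, v_y_dot, phi_ddot)."""
    v_x_dot = v_y * phi_dot + F_x / m              # longitudinal dynamics
    v_y_dot = -v_x * phi_dot + (F_yf + F_yr) / m   # lateral dynamics
    phi_ddot = (a * F_yf * np.cos(delta) - b * F_yr) / Iz  # yaw dynamics
    return v_x_dot, v_y_dot, phi_ddot
```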


Preferably, the lane keeping model is expressed as:










$$\begin{cases} \dot{e}_{ip} = v_{ix} - v_{0x} \\ \dot{e}_{iy} = v_{ix}e_{i\varphi} - v_{iy} - L\dot{\varphi}_i \\ \dot{e}_{i\varphi} = \dot{\varphi}_{i,\mathrm{des}} - \dot{\varphi}_i \end{cases};$$






    • in the formula, $\dot{\varphi}_{i,\mathrm{des}}$ is the expected heading angular speed, $L$ is a preview distance, $e_{ip}$ is a longitudinal spacing error, $e_{iy}$ is a lateral position error between the vehicle and the lane line, and $e_{i\varphi}$ is a heading angle error between the vehicle heading angle and the road tangent;

    • combining the 3-DOF vehicle dynamics model with the lane keeping model to derive the vehicle formation model:













$$\begin{cases} \dot{v}_{ix} = v_{iy}\dot{\varphi}_i + \dfrac{1}{m_i}F_{ix} \\ \dot{v}_{iy} = -v_{ix}\dot{\varphi}_i + \dfrac{F_{iyf} + F_{iyr}}{m_i} \\ \ddot{\varphi}_i = \dfrac{1}{I_{iz}}\bigl(a_i F_{iyf} - b_i F_{iyr}\bigr) \\ \dot{e}_{ip} = v_{ix} - v_{0x} \\ \dot{e}_{iy} = v_{ix}e_{i\varphi} - v_{iy} - L\dot{\varphi}_i \\ \dot{e}_{i\varphi} = \dot{\varphi}_{i,\mathrm{des}} - \dot{\varphi}_i \end{cases};$$




By discretizing the model above and using the state variable $x_i = [v_{ix}\; v_{iy}\; \dot{\varphi}_i\; e_{ip}\; e_{iy}\; e_{i\varphi}]^T$ and the control variable $u_i = [F_{ix}\; \delta_i]^T$, $x_i(k+1) = f(x_i(k), u_i(k))$ can be obtained.
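A sketch of the discretized model is given below. Forward-Euler integration with sampling time Ts is an assumption (the application only states that the model is discretized), and `lateral_forces` is a caller-supplied stand-in for the magic-formula evaluation.

```python
import numpy as np

def formation_model_step(x, u, Ts, m, Iz, a, b, L_prev, v0x, phi_dot_des, lateral_forces):
    """One step x(k+1) = f(x(k), u(k)) of the formation model by forward Euler.
    x = [v_x, v_y, phi_dot, e_p, e_y, e_phi]; u = [F_x, delta];
    lateral_forces(v_x, v_y, phi_dot, delta) -> (F_yf, F_yr)."""
    v_x, v_y, phi_dot, e_p, e_y, e_phi = x
    F_x, delta = u
    F_yf, F_yr = lateral_forces(v_x, v_y, phi_dot, delta)
    dx = np.array([
        v_y * phi_dot + F_x / m,               # dv_x/dt
        -v_x * phi_dot + (F_yf + F_yr) / m,    # dv_y/dt
        (a * F_yf - b * F_yr) / Iz,            # d(phi_dot)/dt
        v_x - v0x,                             # de_p/dt
        v_x * e_phi - v_y - L_prev * phi_dot,  # de_y/dt
        phi_dot_des - phi_dot,                 # de_phi/dt
    ])
    return np.asarray(x, dtype=float) + Ts * dx
```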


Preferably, when utilizing the reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller:

    • constructing and training an actor strategy function neural network to optimize the strategy parameters; specifically, when optimizing the strategy parameters, the actor strategy function neural network uses a network composed of Tp radial basis functions to approximate the Tp-step optimal strategy, and takes the state s as input and the action a as output;
    • constructing and training a critic value function neural network to evaluate the pros and cons of the current control strategy optimized by the actor strategy function neural network; specifically, when assessing the current control strategy, the critic value function neural network likewise uses a network consisting of Tp radial basis functions, and takes the state s and the action a as input and the state-action value q(s,a) as output;
    • obtaining the optimal control strategy through the alternating convergence of the actor strategy function neural network and the critic value function neural network.


Preferably, basis vectors ϕ(x) and ψ(x) in the actor strategy function neural network and the critic value function neural network are both radial basis functions, and










$$\phi(x) = \psi(x) = \Bigl(\exp\bigl(-\|x - x_1\|^2/\kappa^2\bigr),\; \exp\bigl(-\|x - x_2\|^2/\kappa^2\bigr),\; \ldots,\; \exp\bigl(-\|x - x_M\|^2/\kappa^2\bigr)\Bigr)^{T};$$











    • in the formula, $\kappa$ is set to 1, $\{x_i,\; i = 1, 2, \ldots, M\}$ are the centers of the radial basis functions, and $M$ is the number of hidden-layer nodes.
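A minimal sketch of this shared feature map, assuming the centers are stacked in an (M, d) array:

```python
import numpy as np

def rbf_features(x, centers, kappa=1.0):
    """phi(x) = psi(x): M-vector of exp(-||x - x_j||^2 / kappa^2),
    with kappa = 1 as in the application."""
    diffs = np.asarray(centers, dtype=float) - np.asarray(x, dtype=float)
    return np.exp(-np.sum(diffs ** 2, axis=1) / kappa ** 2)
```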





Preferably, the centers of the radial basis functions are obtained by normalization:








$$x_{\mathrm{normalize}} = \frac{x_{\mathrm{collect}} - x_{\min}}{x_{\max} - x_{\min}};$$






    • in the formula, $x_{\mathrm{normalize}}$ is the normalized data, $x_{\mathrm{collect}}$ is the collected data, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the collected data, respectively;

    • the collected data is generated by applying randomly chosen inputs within the control input range, and the input data and output data of the vehicle formation model are collected by simulation.
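A sketch of this data collection and normalization step, under assumptions: `simulate_step` stands in for one step of the vehicle formation model, and uniform random sampling of the admissible inputs is one plausible reading of "randomly chosen inputs within the control input range".

```python
import numpy as np

def collect_and_normalize(simulate_step, u_low, u_high, x0, n_samples, seed=0):
    """Drive the model with random admissible inputs, log (state, input)
    samples, and min-max normalize them to serve as candidate RBF centers.
    u_low, u_high: per-component input bounds (1-D arrays)."""
    rng = np.random.default_rng(seed)
    x, rows = np.asarray(x0, dtype=float), []
    for _ in range(n_samples):
        u = rng.uniform(u_low, u_high)        # random input in control range
        rows.append(np.concatenate([x, u]))
        x = simulate_step(x, u)               # model response by simulation
    data = np.array(rows)
    x_min, x_max = data.min(axis=0), data.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (data - x_min) / span              # x_normalize per the formula above
```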





Specifically, during the alternating convergence process to obtain the optimal control strategy:

    • initializing an actor strategy function neural network weight θ and a critic value function neural network weight ω;
    • obtaining action a by the actor strategy function neural network according to the current state s of the target following vehicle, and acting action a on the target following vehicle to obtain new state sc and instant reward r;
    • obtaining new action ac by the actor strategy function neural network according to the new state sc;
    • evaluating and scoring action a and action ac by the critic value function neural network to obtain $q(s,a)$ and $q(s_c,a_c)$, and then calculating the error $\mathrm{TD}_{\mathrm{error}} = y_i - q(s,a)$, with $y_i = r + \gamma q(s_c,a_c)$, between the predicted value $q(s,a)$ and the expected value $y_i$ of the critic value function neural network according to the Bellman equation;
    • using a gradient descent method to iteratively update the weight θ and the weight ω with the loss functions $L(\theta) = q(s,a)$ and $L(\omega) = \tfrac{1}{2}\mathrm{TD}_{\mathrm{error}}^2$, thereby obtaining the optimal control strategy $U_i^*$.


Preferably, in the predictive time domain, the optimal control strategy Ui* is applied to the target following vehicle through the local predictive controller.


Compared with the existing technology, the present application has the following beneficial effects:

    • (1) Compared with the traditional decoupling control method, the present application is more in line with the nonlinear, multi-variable, and strongly coupled characteristics of the vehicle system: the lateral and longitudinal coupled modeling of the vehicle formation is completed, and the nonlinear characteristics of the tires are considered. In addition, under the framework of distributed model predictive control, a local predictive controller is designed for each following vehicle. This approach transforms the global optimization problem of vehicle formation into a local optimization problem for each following vehicle, which avoids the burden of centralized control.
    • (2) The reinforcement learning actor-critic algorithm is used to optimize the policy; specifically, alternating critic training (policy evaluation) and actor training (policy improvement) are used to train the actor network to approximate the optimal control strategy. Compared with common policy-based learning methods, the present application can realize a single-step update of the policy, thereby reducing the number of iterative learning steps.
    • (3) The present application uses the hidden layer space composed of radial basis functions to map high-dimensional input data, thereby transforming an originally linearly inseparable problem into a linearly separable one. This allows a linear optimization strategy to be used to adjust the weights, resulting in a fast network learning speed and effectively improving the efficiency of solving optimization problems.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic diagram of the 3-DOF vehicle dynamics model of the present application;



FIG. 2 is a map reflecting the relationship curve between the tire cornering angle and the tire lateral force;



FIG. 3 is a control flow chart of the local predictive controller of the present application;



FIG. 4 is a structure diagram of the reinforcement learning algorithm of the present application;



FIG. 5 is a network structure diagram composed of the radial basis function of the present application.





DETAILED DESCRIPTION OF THE EMBODIMENTS

The technical solution of the embodiments of the present application will be described clearly and completely below, in combination with the accompanying drawings. Obviously, the described embodiments are only part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.


A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation, including:


S1, combining the 3-DOF vehicle dynamics model that takes into account the nonlinear magic formula tire model with the lane keeping model, in order to establish the vehicle formation model;

    • among them:
    • as shown in FIG. 1, the 3-DOF vehicle dynamics model is expressed as










$$\begin{cases} m_i\bigl(\dot{v}_{ix} - v_{iy}\dot{\varphi}_i\bigr) = F_{ix} \\ m_i\bigl(\dot{v}_{iy} + v_{ix}\dot{\varphi}_i\bigr) = F_{iyf} + F_{iyr} \\ I_{iz}\ddot{\varphi}_i = a_i F_{iyf}\cos\delta_i - b_i F_{iyr} \end{cases};$$






    • each parameter in the formula is the parameter of the $i$th vehicle, and
    • $v_{ix}, v_{iy}, \dot{\varphi}_i$ are the longitudinal speed, lateral speed, and yaw rate, $F_{ix}$ is the longitudinal force, $F_{iyf}$ and $F_{iyr}$ are the front and rear wheel lateral forces, $m_i$ is the vehicle mass, $I_{iz}$ is the moment of inertia of the vehicle around the $z$ axis, $\delta_i$ is the front wheel angle, and $a_i$ and $b_i$ are the distances from the center of mass to the front and rear axles, respectively.





The lateral force of the tire in the nonlinear magic formula tire model is calculated by the following magic formula:






$$F_{iy} = D \sin\bigl(C \arctan\bigl(B\alpha - E(B\alpha - \arctan B\alpha)\bigr)\bigr)$$

    • in the formula, α is a cornering angle of the tire, and B, C, D, E are simulation parameters.


Specifically, when considering the nonlinear magic formula tire model, the longitudinal loads of the front and rear tires are calculated first, the relationship curve between the tire sideslip angle and the tire lateral force is then obtained by the interpolation method with reference to FIG. 2, and the parameters B, C, D, E are finally obtained by fitting the magic formula of the tire above.
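A sketch of this fitting step using SciPy's curve_fit follows; the sampled slip angles and lateral-force values below are hypothetical placeholders for the interpolated curve of FIG. 2, and the initial guess p0 is also an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def magic_formula(alpha, B, C, D, E):
    Ba = B * alpha
    return D * np.sin(C * np.arctan(Ba - E * (Ba - np.arctan(Ba))))

# Stand-in for the slip-angle / lateral-force curve read from FIG. 2.
alpha_data = np.linspace(-0.15, 0.15, 31)                     # slip angles [rad]
Fy_data = magic_formula(alpha_data, 10.0, 1.9, 4000.0, 0.97)  # placeholder data

# Fit B, C, D, E to the curve.
(B, C, D, E), _ = curve_fit(magic_formula, alpha_data, Fy_data,
                            p0=[8.0, 1.5, 3000.0, 1.0])
```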


The lane keeping model is expressed as:










$$\begin{cases} \dot{e}_{ip} = v_{ix} - v_{0x} \\ \dot{e}_{iy} = v_{ix}e_{i\varphi} - v_{iy} - L\dot{\varphi}_i \\ \dot{e}_{i\varphi} = \dot{\varphi}_{i,\mathrm{des}} - \dot{\varphi}_i \end{cases};$$




in the formula, $\dot{\varphi}_{i,\mathrm{des}}$ is the expected heading angular speed, $L$ is a preview distance, $e_{ip}$ is a longitudinal spacing error, $e_{iy}$ is a lateral position error between the vehicle and the lane line, and $e_{i\varphi}$ is a heading angle error between the vehicle heading angle and the road tangent.


In summary, combining the above 3-DOF vehicle dynamics model with the lane keeping model, the vehicle formation model can be obtained as follows:










$$\begin{cases} \dot{v}_{ix} = v_{iy}\dot{\varphi}_i + \dfrac{1}{m_i}F_{ix} \\ \dot{v}_{iy} = -v_{ix}\dot{\varphi}_i + \dfrac{F_{iyf} + F_{iyr}}{m_i} \\ \ddot{\varphi}_i = \dfrac{1}{I_{iz}}\bigl(a_i F_{iyf} - b_i F_{iyr}\bigr) \\ \dot{e}_{ip} = v_{ix} - v_{0x} \\ \dot{e}_{iy} = v_{ix}e_{i\varphi} - v_{iy} - L\dot{\varphi}_i \\ \dot{e}_{i\varphi} = \dot{\varphi}_{i,\mathrm{des}} - \dot{\varphi}_i \end{cases};$$






    • taking the state quantity $x_i = [v_{ix}\; v_{iy}\; \dot{\varphi}_i\; e_{ip}\; e_{iy}\; e_{i\varphi}]^T$, the control quantity $u_i = [F_{ix}\; \delta_i]^T$, and the sampling time $T_s$, the discrete form of the vehicle formation model can be obtained after discretization of the above vehicle formation model: $x_i(k+1) = f(x_i(k), u_i(k))$; in the formula, $x_i(k)$ is the state variable and $u_i(k)$ is the input variable.

    • S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework









$$J_i\bigl(x_i(k), U_i(k)\bigr) = \sum_{l=0}^{T_p-1} \bigl\|x_i(k+l) - r_i(k+l)\bigr\|_{Q_i}^2 + \bigl\|x_i(k+l) - \hat{x}_i(k+l)\bigr\|_{F_i}^2 + \bigl\|x_i(k+l) - \hat{x}_{i-1}(k+l)\bigr\|_{G_i}^2 + \bigl\|u_i(k+l)\bigr\|_{R_i}^2$$


in the formula, $k$ is the current moment, $k+l$ is the $l$-th moment in the prediction horizon, $x_i(\cdot)$ is a predicted state, $r_i(\cdot)$ is an ideal state, $\hat{x}_{i-1}(\cdot)$ and $\hat{x}_i(\cdot)$ represent assumed trajectory states of the vehicles, $\hat{x}_{i-1}(\cdot)$ is obtained through inter-vehicle communication, $T_p$ is the prediction horizon, and $Q_i, F_i, G_i, R_i$ are the weight matrices; the assumed input of each vehicle is defined as follows:












$$\hat{u}_i(k+j \mid k+1) = \begin{cases} u_i^*(k+j \mid k), & j = 0, \ldots, T_p - 2 \\ 0, & j = T_p - 1 \end{cases};$$




The assumed trajectory can be calculated from the assumed input:










$$\begin{cases} \hat{x}_i(k+j+1 \mid k+1) = f\bigl(\hat{x}_i(k+j \mid k+1),\; \hat{u}_i(k+j \mid k+1)\bigr) \\ \hat{x}_i(k+1 \mid k+1) = x_i^*(k+1 \mid k) \end{cases}$$
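A sketch of these three ingredients of the local controller (the assumed input, the assumed trajectory, and the stage cost of $J_i$) is given below; the shift-and-append-zero reading of the assumed input and all function names are assumptions for illustration.

```python
import numpy as np

def assumed_input(U_prev_opt):
    """Assumed input at k+1: shift the previous optimal input sequence by
    one step and append zero for the final step (shift reading of the
    definition above). U_prev_opt has shape (Tp, input_dim)."""
    return np.vstack([U_prev_opt[1:], np.zeros((1, U_prev_opt.shape[1]))])

def assumed_trajectory(x_star_k1, U_hat, step):
    """x_hat(k+j+1|k+1) = f(x_hat(k+j|k+1), u_hat(k+j|k+1)), initialized at
    x_hat(k+1|k+1) = x*(k+1|k); step(x, u) is the formation model f."""
    traj = [np.asarray(x_star_k1, dtype=float)]
    for u in U_hat:
        traj.append(step(traj[-1], u))
    return np.array(traj)

def stage_cost(x, r, x_hat_self, x_hat_pred, u, Q, F, G, R):
    """One summand of J_i: tracking, self-consistency, predecessor
    coupling, and control effort, each a weighted squared norm."""
    def quad(v, W):
        return float(v @ W @ v)
    return (quad(x - r, Q) + quad(x - x_hat_self, F)
            + quad(x - x_hat_pred, G) + quad(u, R))
```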






    • S3, using a reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller and applying the optimal control strategy to the target following vehicle through the local predictive controller.





(31) Constructing the actor strategy function neural network and the critic value function neural network.


Specifically, combined with the structure shown in FIG. 4, the actor strategy function neural network and the critic value function neural network are set as follows:


The actor strategy function neural network uses a network consisting of Tp radial basis functions to approximate the Tp-step optimal strategy;

    • the critic value function neural network likewise performs its evaluation with a network consisting of Tp radial basis functions;
    • the network structure composed of Tp radial basis functions is shown in FIG. 5; when approximating the optimal strategy, the actor strategy function neural network takes state s as input and action a as output; in the evaluation, the critic value function neural network takes state s and action a as input and state-action value q(s,a) as output.


Preferably, the basis vectors ϕ(x) and ψ(x) in the actor strategy function neural network and the critic value function neural network are radial basis functions, and










$$\phi(x) = \psi(x) = \Bigl(\exp\bigl(-\|x - x_1\|^2/\kappa^2\bigr),\; \exp\bigl(-\|x - x_2\|^2/\kappa^2\bigr),\; \ldots,\; \exp\bigl(-\|x - x_M\|^2/\kappa^2\bigr)\Bigr)^{T}








In the formula, $\kappa$ is set to 1, $\{x_i,\; i = 1, 2, \ldots, M\}$ are the centers of the radial basis functions, and $M$ is the number of hidden-layer nodes.


The centers of the radial basis functions are obtained by normalization:








$$x_{\mathrm{normalize}} = \frac{x_{\mathrm{collect}} - x_{\min}}{x_{\max} - x_{\min}};$$






    • in the formula, $x_{\mathrm{normalize}}$ is the normalized data, $x_{\mathrm{collect}}$ is the collected data, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the collected data, respectively; specifically, the collected data is generated by applying randomly chosen inputs within the control input range, and the input data and output data of the vehicle formation model are collected by simulation.





(32) Training the actor strategy function neural network and the critic value function neural network.


initializing the actor strategy function neural network weight θ and the critic value function neural network weight ω;


obtaining action a by the actor strategy function neural network according to the current state s of the target following vehicle, and acting action a on the target following vehicle to obtain new state sc and instant reward r;

    • obtaining new action ac by the actor strategy function neural network according to the new state sc;
    • evaluating and scoring action a and action ac by the critic value function neural network to obtain $q(s,a)$ and $q(s_c,a_c)$, and then calculating the error $\mathrm{TD}_{\mathrm{error}} = y_i - q(s,a)$, with $y_i = r + \gamma q(s_c,a_c)$, between the predicted value $q(s,a)$ and the expected value $y_i$ of the critic value function neural network according to the Bellman equation;


in order to minimize the value function obtained from the action output of the actor strategy function neural network, the value function $q(s,a)$ is taken as the loss function $L(\theta)$ of the actor strategy function neural network, that is, $L(\theta) = q(s,a)$, and the weight θ is iteratively updated by the gradient descent method; in order to make the score of the critic value function neural network more accurate, the loss function of the critic value function neural network is taken as $L(\omega) = \tfrac{1}{2}\mathrm{TD}_{\mathrm{error}}^2$, and the weight ω is iteratively updated by the gradient descent method; specifically, when the number of iterations or the accuracy meets the preset conditions, the optimal control strategy $U_i^*$ is obtained.
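A minimal sketch of one such alternating update follows, assuming linear-in-RBF-feature actor and critic networks, caller-supplied reward and model-step functions, and finite-difference differentiation of q with respect to the action (the application does not specify these details).

```python
import numpy as np

def actor(theta, phi_s):
    """Linear-in-features actor: a = theta^T phi(s)."""
    return theta.T @ phi_s

def critic(omega, psi_sa):
    """Linear-in-features critic: q(s, a) = omega^T psi(s, a)."""
    return float(omega @ psi_sa)

def train_step(theta, omega, phi, psi, env_step, reward, s,
               gamma=0.95, lr_actor=1e-3, lr_critic=1e-2):
    """One alternating actor-critic update (policy evaluation + improvement)."""
    a = actor(theta, phi(s))                       # action from current policy
    s_c = env_step(s, a)                           # new state from the model
    r = reward(s, a, s_c)                          # instant reward
    a_c = actor(theta, phi(s_c))                   # new action at the new state
    td_error = (r + gamma * critic(omega, psi(s_c, a_c))) - critic(omega, psi(s, a))
    # Critic: semi-gradient descent on L(omega) = 0.5 * td_error**2.
    omega = omega + lr_critic * td_error * psi(s, a)
    # Actor: descend L(theta) = q(s, a); dq/da by finite differences,
    # with da/dtheta = phi(s) for the linear actor.
    eps = 1e-4
    a = np.atleast_1d(a)
    q_base = critic(omega, psi(s, a))
    dq_da = np.array([(critic(omega, psi(s, a + eps * e)) - q_base) / eps
                      for e in np.eye(a.size)])
    theta = theta - lr_actor * np.outer(phi(s), dq_da)
    return theta, omega, s_c
```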


(33) Using the above actor strategy function neural network and critic value function neural network to solve the optimal control strategy of the local predictive controller, and applying the optimal control strategy $U_i^*$ solved in the prediction time domain to the target following vehicle through the local predictive controller.


Although the embodiments of the present application have been presented and described, those of ordinary skill in the art will understand that these embodiments can be varied, modified, replaced, and amended without departing from the principles and spirit of the present application. Therefore, the scope of the present application is defined by the accompanying claims and their equivalents.

Claims
  • 1. A reinforcement learning algorithm-based predictive control method for a lateral and longitudinal coupled vehicle formation, comprising: S1, combining a 3-degree of freedom (DOF) vehicle dynamics model with a lane keeping model to establish a vehicle formation model xi(k+1)=f(xi(k),ui(k)), wherein the 3-DOF vehicle dynamics model takes into account a nonlinear magic formula tire model; S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework; and S3, using a reinforcement learning algorithm to solve an optimal control strategy of the local predictive controller, and applying the optimal control strategy to a target following vehicle.
  • 2. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein a lateral force of a tire in the nonlinear magic formula tire model is calculated by the following magic formula: Fiy=D sin(C arctan(Bα−E(Bα−arctan Bα))), wherein α is a cornering angle of the tire, and B, C, D, E are simulation parameters.
  • 3. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein the 3-DOF vehicle dynamics model is expressed by the following formula:
  • 4. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 3, wherein the lane keeping model is expressed as:
  • 5. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein when utilizing the reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller: constructing and training an actor strategy function neural network to optimize strategy parameters; constructing and training a critic value function neural network to evaluate pros and cons of a current control strategy optimized by the actor strategy function neural network; and obtaining the optimal control strategy according to an alternating convergence of the actor strategy function neural network and the critic value function neural network.
  • 6. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 5, wherein when optimizing the strategy parameters, the actor strategy function neural network uses a network composed of Tp radial basis functions to approximate a Tp-step optimal strategy and takes a state s as a first input and an action a as a first output; when assessing the pros and the cons of the current control strategy, the critic value function neural network is evaluated by the network composed of the Tp radial basis functions and takes the state s and the action a as a second input and a predicted value q(s,a) as a second output.
  • 7. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 6, wherein basis vectors ϕ(x) and ψ(x) in the actor strategy function neural network and the critic value function neural network are the radial basis functions, and
  • 8. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 7, wherein the center of the radial basis functions is obtained by a normalization:
  • 9. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 8, wherein the optimal control strategy is obtained through the alternating convergence: initializing an actor strategy function neural network weight θ and a critic value function neural network weight ω; obtaining the action a by the actor strategy function neural network according to a current state s of each target following vehicle, and acting the action a on the each target following vehicle to obtain a state sc and an instant reward r; obtaining an action ac by the actor strategy function neural network according to the state sc; evaluating and scoring the action a and the action ac by the critic value function neural network to obtain the predicted value q(s,a) and a predicted value q(sc,ac), and then calculating an error TDerror: TDerror=yi−q(s,a), yi=r+γq(sc,ac), between the predicted value q(s,a) and an expected value yi of the critic value function neural network according to a Bellman equation; and using a gradient descent method to iteratively update the actor strategy function neural network weight θ and the critic value function neural network weight ω to obtain the optimal control strategy Ui*, with L(θ)=q(s,a), L(ω)=½TDerror2.
  • 10. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 8, wherein in the prediction time domain, the optimal control strategy Ui* is applied to the each target following vehicle through the local predictive controller.
Priority Claims (1)
Number Date Country Kind
202211087600.7 Sep 2022 CN national