This application is based upon and claims priority to Chinese Patent Application No. 202211087600.7, filed on Sep. 7, 2022, the entire contents of which are incorporated herein by reference.
The present application belongs to the field of automobile control technology, specifically to a reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation.
In the past few decades, the exponential growth in the number of vehicles has brought great challenges to energy security and traffic safety. According to the National Highway Traffic Safety Administration, about 84% of traffic accidents are attributed to human error. Vehicle formation can significantly reduce traffic accidents caused by driver fatigue and mishandling, thereby improving road safety. In addition, vehicle formation (especially heavy truck formation) can reduce the air drag between vehicles, resulting in lower emissions, reduced fuel consumption, and increased road capacity. These potential benefits have led to increased interest among scholars in vehicle formation control.
Research on vehicle formation mainly involves longitudinal control and lateral control:
The goal of longitudinal control is to track the desired speed of the vehicle and to maintain the desired spacing between neighboring vehicles.
The task of lateral control is to steer the vehicle within the designated lane.
The existing research on vehicle formation control mostly adopts the decoupling control method, where separate lateral and longitudinal controllers are designed to realize lateral lane keeping and longitudinal speed tracking respectively. Specifically, the lateral control methods mainly include PID control, fuzzy control, H∞ robust control, etc., and the longitudinal control methods mainly include sliding mode control, adaptive control, model predictive control, etc.
In summary, although the decoupling strategy is somewhat effective in vehicle formation control, the vehicle system is a nonlinear, multi-variable, and strongly coupled system. Under conditions of higher acceleration, greater lateral and longitudinal forces, or lower road adhesion coefficients, the lateral and longitudinal coupling effects become particularly significant. Under such conditions, the tracking performance of decoupling control may degrade, leading to a significant reduction in control accuracy, which may in turn lead to vehicle collisions and failure to achieve formation.
In addition, most current research on vehicle formation is based on the assumption of linear tires, which is only applicable within a limited operating range of the vehicle. When the vehicle moves outside this range (for example, during obstacle avoidance), the nonlinearity and the lateral-longitudinal coupling effects of the vehicle system become more pronounced, resulting in a mismatch between decoupling control and the emergency control of the vehicle.
In light of the challenges raised in the background above, the purpose of the present application is to provide a reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation.
In order to achieve the aforementioned objectives, the present application provides the following technical solution:
A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation, including:
xi(k+1)=f(xi(k),ui(k));
in the formula, xi(k) is a state quantity and ui(k) is an input quantity;
Ji(xi(k),Ui(k)) = Σ_{l=0}^{Tp} [‖xi(k+l) − ri(k+l)‖²Qi + ‖xi(k+l) − x̂i(k+l)‖²Fi + ‖xi(k+l) − x̂i−1(k+l)‖²Gi + ‖ui(k+l)‖²Ri];
in the formula, k is the current moment, k+l is the l-th moment in the prediction time domain, xi(•) is a prediction state, ri(•) is an ideal state, x̂i−1(•) and x̂i(•) represent assumed trajectory states of the vehicles, x̂i−1(•) is obtained through inter-vehicle communication, Tp is the prediction time domain, and Qi, Fi, Gi, Ri are the weight matrices;
Preferably, the lateral force of the tire in the nonlinear magic formula tire model is calculated by the following magic formula:
Fiy = D sin(C arctan(Bα − E(Bα − arctan Bα)))
in the formula, B, C, D, and E are the stiffness, shape, peak, and curvature factors of the magic formula, respectively, and α is the tire sideslip angle;
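For illustration, the following is a minimal Python sketch of this magic formula; the coefficient values chosen for B, C, D, and E are hypothetical placeholders rather than the calibrated coefficients of the present application.

```python
import numpy as np

def magic_formula_lateral_force(alpha, B=10.0, C=1.9, D=4000.0, E=0.97):
    """Tire lateral force Fiy from the magic formula.

    alpha      : tire sideslip angle (rad)
    B, C, D, E : stiffness, shape, peak, and curvature factors
                 (illustrative placeholder values)
    """
    Ba = B * alpha
    return D * np.sin(C * np.arctan(Ba - E * (Ba - np.arctan(Ba))))

# Example: evaluate the lateral force over a sweep of sideslip angles
alphas = np.linspace(-0.2, 0.2, 5)   # rad
forces = [magic_formula_lateral_force(a) for a in alphas]
```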
Preferably, the 3-DOF vehicle dynamics model is expressed as:
each parameter in the formula is the corresponding parameter of the ith vehicle, and
Preferably, the lane keeping model is expressed as:
By discretizing the above model and using the state variable xi=[vix viy φ̇i eip eiy eiφ]T and the control variable ui=[Fix δi]T, xi(k+1)=f(xi(k),ui(k)) can be obtained.
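As a hedged illustration of this discretization, the sketch below advances the state xi one step with forward Euler, reusing magic_formula_lateral_force from the earlier sketch. Because the exact 3-DOF and lane keeping equations are not reproduced in this text, a standard single-track (bicycle) formulation is assumed, and the parameters m, Iz, lf, lr, the leader speed v_lead, and the error dynamics are illustrative assumptions.

```python
import numpy as np

def step_vehicle(x, u, dt=0.01, m=1500.0, Iz=2500.0, lf=1.2, lr=1.4,
                 v_lead=20.0, dphi_des=0.0):
    """One forward-Euler step of an assumed 3-DOF single-track model
    combined with the lane keeping error states.

    x = [vix, viy, dphi_i, eip, eiy, eiphi]  (state quantities)
    u = [Fix, delta_i]   (longitudinal force, steering angle)
    All numeric parameters are illustrative placeholders.
    """
    vx, vy, dphi, ep, ey, ephi = x
    Fx, delta = u

    # Sideslip angles of the front and rear axles
    alpha_f = delta - np.arctan2(vy + lf * dphi, vx)
    alpha_r = -np.arctan2(vy - lr * dphi, vx)

    # Lateral tire forces from the magic formula sketched earlier
    Fyf = magic_formula_lateral_force(alpha_f)
    Fyr = magic_formula_lateral_force(alpha_r)

    # Assumed 3-DOF dynamics: longitudinal, lateral, yaw
    dvx = vy * dphi + (Fx - Fyf * np.sin(delta)) / m
    dvy = -vx * dphi + (Fyf * np.cos(delta) + Fyr) / m
    ddphi = (lf * Fyf * np.cos(delta) - lr * Fyr) / Iz

    # Assumed lane keeping / spacing error dynamics (small angles)
    dep = v_lead - vx          # longitudinal spacing error rate
    dey = vy + vx * ephi       # lateral position error rate
    dephi = dphi - dphi_des    # heading angle error rate

    return x + dt * np.array([dvx, dvy, ddphi, dep, dey, dephi])

# Example: one step from a straight-driving state
x0 = np.array([20.0, 0.0, 0.0, 0.0, 0.1, 0.0])
x1 = step_vehicle(x0, u=np.array([500.0, 0.01]))
```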
Preferably, when utilizing the reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller:
Preferably, the basis vectors ϕ(x) and ψ(x) in the actor strategy function neural network and the critic value function neural network are both radial basis functions, and
Preferably, the center of the radial basis function is obtained by normalization:
Specifically, during the alternating convergence process to obtain the optimal control strategy:
Preferably, in the prediction time domain, the optimal control strategy Ui* is applied to the target following vehicle through the local predictive controller.
Compared with the existing technology, the present application has the following beneficial effects:
The technical solution of the embodiments of the present application will be described clearly and completely below in combination with the accompanying drawings. Obviously, the described embodiments are only part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation, including:
S1, combining the 3-DOF vehicle dynamics model that takes into account the nonlinear magic formula tire model with the lane keeping model, in order to establish the vehicle formation model;
The lateral force of the tire in the nonlinear magic formula tire model is calculated by the following magic formula:
Fiy = D sin(C arctan(Bα − E(Bα − arctan Bα)))
Specifically, when the nonlinear magic formula tire model is considered, the longitudinal loads of the front and rear tires are calculated first, and then the relationship curve between the tire sideslip angle and the tire lateral force is obtained by using the interpolation method with reference to
The lane keeping model is expressed as:
in the formula, φ̇i,des is the expected heading angular speed, L is a preview distance, eip is the longitudinal spacing error, eiy is the lateral position error between the vehicle and the lane line, and eiφ is the heading angle error between the vehicle heading angle and the road tangent.
In summary, by combining the above 3-DOF vehicle dynamics model with the lane keeping model, the vehicle formation model can be obtained as follows:
Ji(xi(k),Ui(k)) = Σ_{l=0}^{Tp} [‖xi(k+l) − ri(k+l)‖²Qi + ‖xi(k+l) − x̂i(k+l)‖²Fi + ‖xi(k+l) − x̂i−1(k+l)‖²Gi + ‖ui(k+l)‖²Ri];
in the formula, k is the current moment, k+l is the l-th moment in the prediction time domain, xi(•) is a prediction state, ri(•) is an ideal state, x̂i−1(•) and x̂i(•) represent assumed trajectory states of the vehicles, x̂i−1(•) is obtained through inter-vehicle communication, Tp is the prediction time domain, and Qi, Fi, Gi, Ri are the weight matrices; where the assumed input of each vehicle is defined as follows:
The assumed trajectory can be calculated from the assumed input:
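To make this construction concrete, the sketch below evaluates the distributed cost Ji over the Tp-step horizon and builds the assumed input by a shift-and-hold rule (shift the previous optimal sequence one step and repeat its last element). The shift-and-hold rule is an assumption based on common distributed MPC practice, since the exact definition is not reproduced in this text.

```python
import numpy as np

def weighted_norm_sq(v, W):
    """Weighted squared norm v^T W v used in each term of Ji."""
    return float(v @ W @ v)

def platoon_cost(x_pred, r, x_hat_self, x_hat_front, u_seq, Q, F, G, R):
    """Distributed cost Ji(xi(k), Ui(k)) over the prediction horizon.

    x_pred      : (Tp+1, n) predicted states xi(k+l)
    r           : (Tp+1, n) ideal states ri(k+l)
    x_hat_self  : (Tp+1, n) this vehicle's assumed trajectory
    x_hat_front : (Tp+1, n) preceding vehicle's assumed trajectory,
                  received through inter-vehicle communication
    u_seq       : (Tp, m)   control inputs ui(k+l)
    """
    Tp = u_seq.shape[0]
    J = 0.0
    for l in range(Tp + 1):
        J += weighted_norm_sq(x_pred[l] - r[l], Q)
        J += weighted_norm_sq(x_pred[l] - x_hat_self[l], F)
        J += weighted_norm_sq(x_pred[l] - x_hat_front[l], G)
        if l < Tp:
            J += weighted_norm_sq(u_seq[l], R)
    return J

def assumed_input(u_opt_prev):
    """Shift-and-hold: drop the first input of the previous optimal
    sequence and repeat the last one (an assumed standard rule)."""
    return np.vstack([u_opt_prev[1:], u_opt_prev[-1:]])
```

The assumed trajectory then follows by propagating the vehicle model (for example, the step_vehicle sketch above) under assumed_input(u_opt_prev).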
(31) Constructing the actor strategy function neural network and the critic value function neural network.
Specifically, in combination with the structure shown in
The actor strategy function neural network uses a network consisting of Tp radial basis functions to approximate the Tp-step optimal strategy;
Preferably, the basis vectors ϕ(x) and ψ(x) in the actor strategy function neural network and the critic value function neural network are radial basis functions, and
in the formula, {xi, i=1, 2, . . . , M} are the centers of the radial basis functions, and M is the number of hidden layer nodes.
The center of the radial basis function is obtained by normalization:
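The following is a minimal sketch of a Gaussian radial basis layer with normalized centers; the Gaussian form, the width σ, and the uniform spread of centers over the normalized state range are assumptions, since the text only states that the centers are obtained by normalization.

```python
import numpy as np

def rbf_features(x, centers, sigma=1.0):
    """Gaussian radial basis vector, usable for both phi(x) and psi(x).

    x       : (n,)   input state
    centers : (M, n) radial basis function centers
    """
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def normalized_centers(M, x_min, x_max):
    """Spread M centers uniformly over the normalized state range,
    an assumed form of the normalization step."""
    grid = np.linspace(0.0, 1.0, M)[:, None]   # (M, 1) points in [0, 1]
    return x_min + grid * (x_max - x_min)      # mapped back to state range

# Example with the 6-dimensional state of the vehicle formation model
x_min = np.array([0.0, -5.0, -1.0, -10.0, -2.0, -0.5])
x_max = np.array([40.0, 5.0, 1.0, 10.0, 2.0, 0.5])
centers = normalized_centers(M=50, x_min=x_min, x_max=x_max)
phi = rbf_features(np.zeros(6), centers)
```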
(32) Training the actor strategy function neural network and the critic value function neural network.
initializing the actor strategy function neural network weight θ and the critic value function neural network weight ω;
obtaining action a by the actor strategy function neural network according to the current state s of the target following vehicle, and applying action a to the target following vehicle to obtain the new state s′ and the instant reward r;
in order to minimize the value function obtained from the action output of the actor strategy function neural network, taking the value function q(s,a) as the loss function L(θ) of the actor strategy function neural network, that is, L(θ)=q(s,a), and using the gradient descent method to iteratively update the weight θ; in order to make the score of the critic value function neural network more accurate, taking L(ω)=½TDerror² as the loss function of the critic value function neural network, and updating the weight ω iteratively by the gradient descent method. Specifically, when the number of iterations or the accuracy meets the preset conditions, the optimal control strategy Ui* is obtained.
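These updates can be sketched as follows with linear-in-feature approximators, where the actor outputs a = θᵀϕ(s) and the critic scores q(s,a) = ωᵀψ([s,a]); the learning rates, discount factor, TD target, and the scalar action are illustrative assumptions.

```python
import numpy as np

def rbf_with_grad(z, centers, sigma=1.0):
    """Gaussian RBF features and their gradient with respect to z."""
    diff = centers - z                                            # (M, d)
    feats = np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))
    grads = feats[:, None] * diff / sigma ** 2                    # d feats / d z
    return feats, grads

def actor_critic_step(s, s_next, reward, theta, omega,
                      phi_centers, psi_centers,
                      alpha_a=1e-3, alpha_c=1e-2, gamma=0.95, sigma=1.0):
    """One training iteration (illustrative, scalar action).

    Actor:  a = theta^T phi(s)
    Critic: q(s, a) = omega^T psi([s, a])
    """
    phi_s, _ = rbf_with_grad(s, phi_centers, sigma)
    a = float(theta @ phi_s)                       # action from the actor

    z = np.append(s, a)                            # critic input [s, a]
    psi_z, dpsi_dz = rbf_with_grad(z, psi_centers, sigma)
    q = float(omega @ psi_z)

    # Critic update: gradient descent on L(omega) = 1/2 * TD_error^2
    # (the same action is assumed at s_next for the TD target, a simplification)
    psi_next, _ = rbf_with_grad(np.append(s_next, a), psi_centers, sigma)
    td_error = reward + gamma * float(omega @ psi_next) - q
    omega = omega + alpha_c * td_error * psi_z     # semi-gradient step

    # Actor update: gradient descent on L(theta) = q(s, a)
    dq_da = float(omega @ dpsi_dz[:, -1])          # last input dim is the action
    theta = theta - alpha_a * dq_da * phi_s

    return theta, omega, td_error
```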
(33) Using the above actor strategy function neural network and critic value function neural network to solve the optimal control strategy of the local predictive controller; the optimal control strategy Ui* solved in the prediction time domain is applied to the target following vehicle through the local predictive controller.
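A short sketch of this receding-horizon application follows: at each step only the first element of the solved Tp-step sequence is applied, and the horizon then shifts. solve_with_actor_critic is a hypothetical handle to the trained solver, and step_vehicle is the model sketch given earlier.

```python
def run_formation_control(x0, num_steps, solve_with_actor_critic, step_vehicle):
    """Receding-horizon loop applying the optimal strategy Ui*."""
    x = x0
    for _ in range(num_steps):
        U_opt = solve_with_actor_critic(x)   # Tp-step input sequence Ui*
        x = step_vehicle(x, U_opt[0])        # apply only the first input
    return x
```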
Although the embodiments of the present application have been presented and described, it is understandable to those of ordinary skill in the art that these embodiments can be varied, modified, replaced, and amended without departing from the principles and spirit of the present application. Therefore, the scope of the present application is defined by the accompanying claims and their equivalents.