The present disclosure generally relates to the field of hierarchical heterogeneous planning and scheduling technology and, more particularly, relates to a method, a device, and a storage medium for decentralized optimal control for large-scale multi-agent systems.
In recent years, large-scale multi-agent systems (LS-MAS) have attracted significant interest from both the research and industrial communities due to their capability of upgrading conventional multi-agent system performance through diversity gain. For instance, the tracking control problem in LS-MAS has been studied. However, it is extremely difficult to apply conventional control directly to LS-MAS due to three challenges. The first challenge is the notorious “curse of dimensionality”: since conventional cooperative control requires each agent to know the other agents' states, the computational complexity of distributed control increases exponentially with the number of agents. The second challenge is the lack of a realistic, reliable communication network that can support timely information exchange among the agents of an LS-MAS. Due to practical limitations on communication capability, conventional distributed cooperative control techniques are extremely difficult to apply. The last challenge is that constraints arising from physical system limitations and the practical environment may complicate LS-MAS optimal control design. Therefore, there is a need to overcome these challenges simultaneously and thereby provide an intelligent, reliable, and applicable control for LS-MAS.
One aspect or embodiment of the present disclosure provides a method for decentralized optimal control for a large-scale multi-agent system. The large-scale multi-agent system includes multiple agents, and each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. The method includes initializing errors to obtain an initialized error of the actor NN, an initialized error of the critic NN, and an initialized error of the mass NN; initializing error thresholds to obtain an initialized error threshold of the actor NN, an initialized error threshold of the critic NN, and an initialized error threshold of the mass NN; if the initialized error of the actor NN is greater than or equal to the initialized error threshold of the actor NN, if the initialized error of the critic NN is greater than or equal to the initialized error threshold of the critic NN, and if the initialized error of the mass NN is greater than or equal to the initialized error threshold of the mass NN: calculating NN weights of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN weights, respectively; and calculating NN errors of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN errors, respectively.
Another aspect or embodiment of the present disclosure provides a device for decentralized optimal control for a large-scale multi-agent system. The large-scale multi-agent system includes multiple agents, and each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. The device includes a memory, configured to store program instructions for performing a method for decentralized optimal control for the large-scale multi-agent system; and a processor, coupled with the memory and, when executing the program instructions, configured for: initializing errors to obtain an initialized error of the actor NN, an initialized error of the critic NN, and an initialized error of the mass NN; initializing error thresholds to obtain an initialized error threshold of the actor NN, an initialized error threshold of the critic NN, and an initialized error threshold of the mass NN; if the initialized error of the actor NN is greater than or equal to the initialized error threshold of the actor NN, if the initialized error of the critic NN is greater than or equal to the initialized error threshold of the critic NN, and if the initialized error of the mass NN is greater than or equal to the initialized error threshold of the mass NN: calculating NN weights of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN weights, respectively; and calculating NN errors of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN errors, respectively.
Another aspect or embodiment of the present disclosure provides a non-transitory computer-readable storage medium, containing program instructions for, when being executed by a processor, performing a method for decentralized optimal control for a large-scale multi-agent system. The large-scale multi-agent system includes multiple agents, and each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. The method includes initializing errors to obtain an initialized error of the actor NN, an initialized error of the critic NN, and an initialized error of the mass NN; initializing error thresholds to obtain an initialized error threshold of the actor NN, an initialized error threshold of the critic NN, and an initialized error threshold of the mass NN; if the initialized error of the actor NN is greater than or equal to the initialized error threshold of the actor NN, if the initialized error of the critic NN is greater than or equal to the initialized error threshold of the critic NN, and if the initialized error of the mass NN is greater than or equal to the initialized error threshold of the mass NN: calculating NN weights of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN weights, respectively; and calculating NN errors of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN errors, respectively.
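For illustration only, the threshold-gated iteration described above (update each NN while its error remains at or above its threshold) may be sketched as follows; the function names, the learning rate, and the stand-ins for the gradient and residual computations are assumptions, not the disclosure's exact update laws:

```python
import numpy as np

def bacm_loop(networks, thresholds, lr=0.01, max_iters=1000):
    """networks: dict name -> {'W': weights, 'grad': callable, 'residual': callable}."""
    errors = {name: np.inf for name in networks}          # initialized NN errors
    for _ in range(max_iters):
        # Stop once every NN error has fallen below its threshold.
        if all(errors[n] < thresholds[n] for n in networks):
            break
        for name, net in networks.items():
            net['W'] = net['W'] - lr * net['grad'](net['W'])   # weight update
            errors[name] = net['residual'](net['W'])           # recomputed NN error
    return networks, errors
```

In this sketch the actor, critic, and mass networks would each supply their own gradient and residual callables, corresponding to the update laws derived later in the disclosure.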
Other aspects or embodiments of the present disclosure may be understood by those skilled in the art in light of the description, the claims, and the drawings of the present disclosure.
The following drawings are merely examples for illustrative purposes according to various disclosed embodiments and are not intended to limit the scope of the present disclosure.
Reference may be made in detail to exemplary embodiments of the disclosure, which may be illustrated in the accompanying drawings. Wherever possible, the same reference numbers may be used throughout the accompanying drawings to refer to the same or similar parts.
Mean field game theory (MFG) may be adopted to address the “curse of dimensionality” in LS-MAS. In MFG, individual agents may use a probability density function (PDF) (i.e., “mass”) of all agents to observe the behavior of the entire population without requiring their states and control inputs. Then, the infinite-player non-cooperative game may be shifted into a two-player game between a single agent and the entire population. Meanwhile, practical physical system limitations as well as a complex environment may introduce constraints into the control design for LS-MAS. For example, both state and density constraints may be considered in MFG based control for LS-MAS. To better integrate those constraints into the MFG-based LS-MAS optimal control problem formulation, barrier functions may be adopted for handling the individual agent state constraint and the mass function's density constraint. With the barrier function and MFG, the constrained LS-MAS optimal control problem may be formulated. However, to obtain the optimal control, a pair of forward and backward partial differential equations (PDEs), called the Fokker-Planck-Kolmogorov (FPK) equation and the Hamilton-Jacobi-Bellman (HJB) equation, may need to be solved. It is extremely difficult and even impossible to directly solve these PDEs since the two PDEs are closely coupled with each other. To address such difficulty, adaptive dynamic programming and reinforcement learning techniques may be adopted. Furthermore, a barrier-actor-critic-mass (BACM) learning algorithm may be developed with a mass NN (neural network) for learning behaviors of the large population via estimating the solution of the FPK equation with barrier function, a critic NN for obtaining the optimal cost function by learning the solution of the HJB equation with barrier function, and an actor NN for solving the decentralized optimal tracking control based on the information provided by the mass NN and the critic NN.
The key contributions of such a configuration may be the following: the boundary and density constraints may be integrated into conventional MFG based LS-MAS optimization through a barrier function based system transformation; and the barrier-actor-critic-mass algorithm may be developed to solve the constrained HJB and FPK equations simultaneously and further obtain the optimal control for LS-MAS in real time.
According to various embodiments of the present disclosure, LS-MAS tracking optimal control is described hereinafter. N may represent the number of homogeneous agents moving in an l-dimensional configuration space, which is enclosed by upper and lower boundaries. Each agent i may be controlled by a stochastic differential equation with its state constrained as follows:
where f(xi) and g(xi) may be nonlinear functions, xi may be the agent state, which includes the position and velocity of the agent, ui may be a control input, Bi may be a standard Brownian motion representing the process noise, and v may be a non-negative parameter.
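For illustration, one agent's stochastic dynamics may be simulated with an Euler-Maruyama scheme; the controlled-diffusion form dx = (f(x) + g(x)u)dt + √(2v)dB assumed below is consistent with, but not explicitly stated in, the description above:

```python
import numpy as np

def simulate_agent(f, g, u, x0, v=0.02, dt=1e-3, steps=1000, rng=None):
    """Euler-Maruyama integration of one agent's SDE (assumed form)."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for k in range(steps):
        drift = f(x) + g(x) * u(x, k * dt)                  # controlled drift
        noise = np.sqrt(2.0 * v * dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise                          # Euler-Maruyama step
        traj.append(x.copy())
    return np.array(traj)
```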
A predefined time-varying trajectory xr(t) may be given to all agents, where t is time. The objective of each individual agent may be to track the reference trajectory by minimizing the tracking error, which is defined as follows:
Moreover, the tracking error dynamics may be derived as follows:
The optimal objective of each agent may be to track the reference trajectory by minimizing the following cost function:
where m({tilde over (x)}i, t) may denote the probability density function (mass) of the population's tracking error at time t. Also, C({tilde over (x)}i, m) may be the mean field coupling function, which represents the interaction between agent i and the whole population of other agents. Since the PDF has the same dimension as each agent's state, the mean field coupling function can greatly reduce the computational complexity. Moreover, L({tilde over (x)}i, ui)=∥{tilde over (x)}i∥Q2+∥ui∥R2, where Q and R are weighting matrices with compatible dimensions.
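For illustration, the running cost plus the mean field coupling term may be evaluated as follows; scalar Q and R (set to 1, as in the later simulation section) and the function name are illustrative assumptions:

```python
import numpy as np

def running_cost(x_err, u, coupling, Q=1.0, R=1.0):
    """Per-agent running cost: ||x~||_Q^2 + ||u||_R^2 + C(x~, m)."""
    track = Q * float(np.dot(x_err, x_err))     # quadratic tracking term
    effort = R * float(np.dot(u, u))            # quadratic control-effort term
    return track + effort + coupling            # plus mean field coupling value
```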
Next, a barrier-function based system transformation may be applied to the original system to ensure both the tracking error state and density constraints. Let a barrier function B(.) be defined on the constrained interval (l{tilde over (x)},i, u{tilde over (x)},i); then the tracking error state {tilde over (x)}i of the system may be represented as follows:
Similarly, a barrier function may be generated for ensuring the density constraint as follows:
In one embodiment of the present disclosure, the barrier functions B(.) may take finite values when the arguments are within the above-defined region and approach infinity as the state and density approach the boundary of the defined region, respectively.
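For illustration, a logarithmic barrier has exactly the property stated above: finite inside the constrained interval and divergent at its boundaries. The specific form below is an assumption; the disclosure does not fix a particular barrier:

```python
import numpy as np

def log_barrier(z, lower, upper):
    """Finite on (lower, upper); infinite at or beyond either boundary."""
    z = np.asarray(z, dtype=float)
    if np.any(z <= lower) or np.any(z >= upper):
        return np.inf                             # at/outside the boundary
    return float(-np.sum(np.log(z - lower) + np.log(upper - z)))
```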
The dynamics of the transformed state si may be obtained by using the following chain rule:
F(si) may be Lipschitz, and there may exist a constant af such that for si∈Ω, ∥F(si)∥≤af∥si∥, where Ω may be a compact set containing the origin. In addition, G(si) may be bounded on Ω, i.e., there may exist a constant ag such that ∥G(si)∥≤ag. Moreover, the system in equation (1) may be controllable over the compact set Ω.
Next, a new cost function of the transformed state may be represented as follows:
Then, a Hamiltonian may be defined as follows:
Next, the following HJB equation may be obtained by substituting the optimal evaluation function into the Hamiltonian:
Then, the optimal control for each agent may be derived as follows:
To obtain the HJB equation in equation (12), the practical probability density function (PDF) (i.e., mass function p) may be required. The mass function may be obtained by solving the FPK equation, where the FPK equation with density constraint may be obtained as follows:
Next, the FPK equation with the optimal cost function may be obtained as follows:
According to various embodiments of the present disclosure, to obtain the optimal control policy, the coupled HJB-FPK equations may need to be solved in real time. However, the HJB and FPK equations may be multi-dimensional nonlinear PDEs whose solutions may be difficult to obtain under state and density constraints. Therefore, in the present disclosure, the barrier-actor-critic-mass based NNs may be developed to learn the solution of the coupled HJB-FPK equations.
According to various embodiments of the present disclosure, a method for decentralized optimal control for a large-scale multi-agent system is described hereinafter.
The large-scale multi-agent system includes multiple agents; and each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. Referring to
In one embodiment, the method further includes, if the initialized error of the actor NN is less than the initialized error threshold of the actor NN, obtaining previously calculated NN weights of the actor NN; or if the initialized error of the critic NN is less than the initialized error threshold of the critic NN, obtaining previously calculated NN weights of the critic NN; or if the initialized error of the mass NN is less than the initialized error threshold of the mass NN, obtaining previously calculated NN weights of the mass NN.
In one embodiment, the method further includes using the previously calculated NN weights of the actor NN to calculate a control; and executing the calculated control.
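For illustration, computing a control from previously calculated actor NN weights may be sketched as follows; the tanh activation and the weight shapes are assumptions, not the disclosure's exact network:

```python
import numpy as np

def control_from_weights(W_hat, s):
    """Approximated control u_hat = W_hat^T phi_u(s) from stored actor weights."""
    phi = np.tanh(np.asarray(s, dtype=float))   # assumed actor activation phi_u(s)
    return W_hat.T @ phi                        # control to be executed
```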
In one embodiment, the method further includes, before initializing the errors, initializing a state and a density of the agent, where the state of the agent includes a position and a velocity; and calculating an error of the agent using the state of the agent and a predefined trajectory.
In one embodiment, the method further includes, before initializing the errors and after calculating the error of the agent, performing a barrier-function based system transformation on the error and the density of the agent to obtain a transformed error state and a transformed density state, respectively.
In one embodiment, the transformed error state and the transformed density state are configured to calculate corresponding NN weights and errors.
In one embodiment, the method further includes, before initializing the errors, randomly initializing the NN weights of the actor NN, the critic NN, and the mass NN.
In one embodiment, the critic NN is configured to estimate a cost function; and the mass NN is configured to estimate a probability density function.
In one embodiment, the agent includes an unmanned aerial vehicle.
In one embodiment, referring to
According to various embodiments of the present disclosure, the barrier-actor-critic-mass algorithm is described hereinafter. Referring to
According to various embodiments of the present disclosure, critic learning is described in the following. The optimal value function may be represented as follows:
where WV,i may be the ideal critic NN weight and ϕV,i may be the critic NN activation function. In addition, εV,i may represent the reconstruction error of the critic NN. Next, the optimal cost function may be approximated as follows:
where ŴV,i may be the approximated critic NN weight.
By substituting equation (17) into equation (12), a residual error used to tune the weight of the critic NN may be obtained as follows:
Next, the equation (18) may be simplified as follows:
By substituting the optimal cost function from equation (16) into equation (12), the following may be obtained:
where H=WV,iTHW and εHJBi may be an error caused by the reconstruction error.
After the simplification, the equation (21) may be written as follows:
The approximation error of the coupling function may be derived as follows:
By substituting equation (23) into equation (22), the following may be obtained:
Next, by substituting equation (24) into equation (19), the following may be obtained:
Next, the critic NN weight approximation error and HJB equation approximation error may be respectively defined as follows:
By substituting equation (26) and equation (27) into equation (25), the following may be obtained:
Next, the update law for the critic NN may be obtained by using gradient descent along with the HJB approximation error as follows:
where αV,i may be the critic NN learning rate.
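The critic update described above may be sketched generically as gradient descent on a squared residual. A minimal sketch, assuming a residual that is linear in the weights, e = ϕᵀŴ − y; the disclosure's exact HJB residual from equations (18)-(28) is not reproduced:

```python
import numpy as np

def critic_update(W_hat, phi, target, lr=2e-6):
    """One gradient-descent step on the squared residual e^2 / 2."""
    e = float(phi @ W_hat - target)   # residual error
    grad = e * phi                    # gradient of e^2 / 2 w.r.t. W_hat
    return W_hat - lr * grad, e
```

The default learning rate mirrors the αV,i = 2×10−6 used in the later simulation section.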
According to various embodiments of the present disclosure, mass learning is described in the following. The mass function may be represented as follows:
Then, the mass distribution may be estimated as follows:
and T may be a constant historical window.
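For illustration, an empirical estimate of the mass (density) may be formed from tracking-error samples gathered over the historical window using a Gaussian kernel; this estimator is a stand-in assumption, not the disclosure's mass NN:

```python
import numpy as np

def empirical_mass(samples, query, bandwidth=0.1):
    """Gaussian-kernel density estimate at `query` from windowed samples."""
    samples = np.asarray(samples, dtype=float)   # errors seen in the window T
    diffs = (query - samples) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return float(kernel.mean() / bandwidth)      # density estimate at query
```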
The residual error for the mass NN may be defined by substituting equation (31) into equation (15) as follows:
Equation (32) may be simplified as follows:
Next, by substituting the mass function from equation (30) into equation (15), the following may be obtained:
The mass NN weight approximation error and FPK equation approximation error may be defined as follows:
Next, by substituting equation (36) into equation (33), the following may be obtained:
Then, by applying gradient descent along with the FPK estimation error, the update law for the mass NN may be generated as follows:
where αρ,i may be the mass NN learning rate.
According to various embodiments of the present disclosure, actor learning is described in the following. The optimal control may be represented as follows:
where Wu,i and ϕu,i may be the ideal actor NN weight and activation function, respectively. εu,i may be the reconstruction error of the actor NN.
Then, the optimal control may be estimated as follows:
where Ŵu,i may be the approximated actor NN weight.
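The actor update law of this section follows the same gradient-descent pattern as the critic update, driving the actor output ŴᵀϕU toward a target control. A minimal sketch with matrix-valued weights; the linear-in-weights residual is an assumption, and the disclosure's exact residual is not reproduced:

```python
import numpy as np

def actor_update(W_hat, phi, u_target, lr=2e-4):
    """One gradient-descent step on the squared actor residual ||e||^2 / 2."""
    e = W_hat.T @ phi - u_target          # residual control error
    grad = np.outer(phi, e)               # gradient of ||e||^2 / 2 w.r.t. W_hat
    return W_hat - lr * grad, e
```

The default learning rate mirrors the αu,i = 2×10−4 used in the later simulation section.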
The residual error after substituting equation (42) into equation (13) may be represented as follows:
Furthermore, the update law for the actor NN may be designed as follows:
The designed BACM algorithm has been implemented in a large-scale multi-UAV (unmanned aerial vehicle) system to address the decentralized mean field based optimal tracking control problem. In one embodiment, a total of 3000 agents (e.g., UAVs) may be deployed with system dynamics under physical limitations and an uncertain environment. A reference trajectory may have been given ahead of the mission planning. The goal of each agent may be to track the reference trajectory while avoiding obstacles during the mission. Therefore, the movements of all agents may be limited to a fixed area with specific boundary and density constraints. The initial positions of all agents may be generated randomly following a normal distribution with mean 0.5 and variance 0.16. The initial velocities of all agents may be set to zero. In one embodiment, the reference trajectory may be given as follows:
In one embodiment, the agent intrinsic dynamics may be given as follows:
The non-negative parameter v may be selected as 0.02. The mean field cost function may be selected as C(si, m)=∥si−E(ρ)∥, which represents the difference between the current tracking error of agent i and the current average tracking error of the whole population, where E(ρ) denotes the population average. In addition, the state and density constraints may be considered as follows:
where l{tilde over (x)},i and u{tilde over (x)},i may be the lower and upper bound of the state constraint, respectively.
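The initialization described in the simulation setup above (3000 agents, positions drawn from a normal distribution with mean 0.5 and variance 0.16, zero initial velocities) may be sketched as follows; the function name and seed are illustrative:

```python
import numpy as np

def initial_conditions(n_agents=3000, seed=0):
    """Sample initial positions ~ N(0.5, 0.16) and zero velocities."""
    rng = np.random.default_rng(seed)
    positions = rng.normal(loc=0.5, scale=np.sqrt(0.16), size=n_agents)
    velocities = np.zeros(n_agents)
    return positions, velocities
```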
Furthermore, the lower and upper bound of the density constraint may be defined as follows:
The barrier function-based system transformation may have been employed for the state constraint. The new dynamics of the transformed system may be given as follows:
In one embodiment, the coefficients to evaluate the cost of actions and tracking errors may be selected as R=1, and Q=1. The learning rate of the neural network may be defined as αu,i=2×10−4, αV,i=2×10−6, αρ,i=1×10−3. Furthermore, the thresholds may be defined as δu=1×10−3, δFPK=1×10−3, and δHJB=1×10−4.
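For reference, the simulation parameters listed above may be collected in a single configuration; the key names are illustrative:

```python
# Hyperparameters from the simulation study above; key names are illustrative.
HYPERPARAMS = {
    "R": 1.0, "Q": 1.0,        # action / tracking-error cost weights
    "lr_actor": 2e-4,          # alpha_u,i
    "lr_critic": 2e-6,         # alpha_V,i
    "lr_mass": 1e-3,           # alpha_rho,i
    "delta_u": 1e-3,           # actor error threshold
    "delta_FPK": 1e-3,         # mass (FPK) error threshold
    "delta_HJB": 1e-4,         # critic (HJB) error threshold
    "v": 0.02,                 # non-negative diffusion parameter
}
```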
According to various embodiments of the present disclosure, the overall performance schematic of the developed BACM based decentralized optimal tracking control is shown in
The tracking errors of all agents have been analyzed in various embodiments of the present disclosure.
According to various embodiments of the present disclosure, the performance of the neural networks may be demonstrated by analyzing the HJB equation error along with the FPK equation error of the agents.
According to various embodiments of the present disclosure, the BACM framework may have been developed based on mean field game theory. The decentralized optimal control for LS-MAS may have been obtained by solving the coupled HJB-FPK equations under the state and density constraints that are ensured through appropriate barrier functions. Three neural networks may be employed to solve the barrier function based mean field game, where the actor NN is for learning the optimal control, the critic NN is for estimating the optimal cost function, and the mass NN is for approximating the LS-MAS's probability density function (i.e., mass). Furthermore, a series of numerical simulations may have demonstrated the effectiveness of the developed method in embodiments of the present disclosure.
Various embodiments of the present disclosure further provide a device for decentralized optimal control for a large-scale multi-agent system. The large-scale multi-agent system includes multiple agents. Each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. The device includes a memory, configured to store program instructions for performing a method for decentralized optimal control for the large-scale multi-agent system; and a processor, coupled with the memory and, when executing the program instructions, configured for: initializing errors to obtain an initialized error of the actor NN, an initialized error of the critic NN, and an initialized error of the mass NN; initializing error thresholds to obtain an initialized error threshold of the actor NN, an initialized error threshold of the critic NN, and an initialized error threshold of the mass NN; if the initialized error of the actor NN is greater than or equal to the initialized error threshold of the actor NN, if the initialized error of the critic NN is greater than or equal to the initialized error threshold of the critic NN, and if the initialized error of the mass NN is greater than or equal to the initialized error threshold of the mass NN: calculating NN weights of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN weights, respectively; and calculating NN errors of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN errors, respectively.
Various embodiments of the present disclosure further provide a non-transitory computer-readable storage medium, containing program instructions for, when being executed by a processor, performing a method for decentralized optimal control for a large-scale multi-agent system. The large-scale multi-agent system includes multiple agents. Each agent includes three neural networks (NNs) including an actor NN, a critic NN, and a mass NN. The method includes initializing errors to obtain an initialized error of the actor NN, an initialized error of the critic NN, and an initialized error of the mass NN; initializing error thresholds to obtain an initialized error threshold of the actor NN, an initialized error threshold of the critic NN, and an initialized error threshold of the mass NN; if the initialized error of the actor NN is greater than or equal to the initialized error threshold of the actor NN, if the initialized error of the critic NN is greater than or equal to the initialized error threshold of the critic NN, and if the initialized error of the mass NN is greater than or equal to the initialized error threshold of the mass NN: calculating NN weights of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN weights, respectively; and calculating NN errors of the actor NN, the critic NN, and the mass NN, respectively; and updating the actor NN, the critic NN, and the mass NN using corresponding calculated NN errors, respectively.
The embodiments disclosed herein may be exemplary only. Other applications, advantages, alterations, modifications, or equivalents of the disclosed embodiments may be obvious to those skilled in the art and are intended to be encompassed within the scope of the present disclosure.
The present disclosure was made with Government support under Contract No. FA8750-22-C-1000, awarded by the United States Air Force Research Laboratory. The U.S. Government has certain rights in the present disclosure.