Social crowd navigation for mobile robots among human pedestrians is a challenging task due to complex human-robot and human-human interactions, as well as the uncertainty and unpredictability of human behavior. Moreover, robots' behavior in crowd navigation is not only required to be safe, robust, and efficient like other robotic applications but also needs to ensure social compatibility in motion with human pedestrians.
In one aspect, a computer-implemented method for controlling navigation of a plurality of mobile robots is provided. The method includes: (i) defining a potential game for the plurality of robots with input including a predicted trajectory of each of a plurality of humans in an environment in which the plurality of robots are configured to move and output including a potential function, (ii) iteratively executing processing of a two-player game algorithm until a pre-defined condition is satisfied, wherein, in the two-player game algorithm, the plurality of humans collectively represent a first player having a first cost function associated with the summed predicted trajectories of the plurality of humans, and the plurality of mobile robots collectively represent a second player having a second cost function including the potential function of the potential game, (iii) in response to the pre-defined condition being satisfied, generating control instructions for each robot of the plurality of robots based on output from the two-player game algorithm after a last iteration, and (iv) controlling each of the plurality of robots to move in accordance with the respective control instructions.
In another aspect, a computer-implemented method for controlling navigation of a plurality of mobile robots is provided. The method includes, by a controller, iteratively executing processing of a two-player game algorithm until a pre-defined condition is satisfied. The iteratively executing processing includes: (i) transmitting, by the controller to each of the plurality of robots, a predicted trajectory of each of a plurality of humans in an environment in which the plurality of mobile robots are configured to move, (ii) receiving, from each of the plurality of robots, a respective current best-response trajectory that is current to the respective iteration, (iii) transmitting, to each of the plurality of robots, the received current best-response trajectories of each other robot of the plurality of robots, and (iv) receiving, from each of the plurality of robots, a convergence signal in response to the transmitting. The method also includes, in response to the pre-defined condition being satisfied, generating, by the controller, control instructions for each robot of the plurality of robots based on output from the two-player game algorithm after a last iteration, and controlling each of the plurality of robots to move in accordance with the respective control instructions.
In yet another aspect, controllers and control systems programmed to implement the above-described methods are also provided.
The present disclosure provides a control framework for the coordination of multiple robots as they navigate through crowded environments. More specifically, an interaction-aware control framework to coordinate robots among human pedestrians using game-theoretic model predictive control (MPC) and deep learning-based human trajectory prediction is provided. In at least some embodiments, a social long short-term memory (LSTM) model, which predicts the future trajectories of human pedestrians, is combined with an MPC formulation to derive the optimal control actions for the robots. It is recognized that this framework for the navigation and control of mobile robots is not restricted to the use of social LSTM models, and any other learning-based model/interactive model may be used for the prediction of human movement in the coupled prediction-and-planning methods described herein.
The goal of social multi-robot navigation is to navigate each robot from an initial position to its goal while interacting with the human pedestrians to avoid collisions and minimize any discomfort during motion.
Game theory is an effective tool for formulating and analyzing different types of interactions among agents. In at least some exemplary embodiments, a potential game between the robots is combined with a two-player game between the robots and the humans. Centralized and distributed MPC algorithms based on an iterative best-response approach are provided.
Turning now to the Figures,
Each robot 102 includes one or more sensors 204 configured to collect sensor data related to the movement of the respective robot 102 and the environment around robot 102. In the exemplary embodiment, each robot 102 also includes one or more control components 206. Control components 206 include physical components of robots 102 that enable and execute movement of robots 102. Physical components 206 may include, but are not limited to including, wheels, belts, gears, rollers, motors, mechanical actuators, electrical actuators, electro-mechanical actuators, and the like.
Additionally, each robot 102 includes a computing assembly 208. Assembly 208 includes requisite processing, memory, and communication components enabling robot 102 to transmit and receive messages and data, to make local computations (including any computation described herein), to instruct control components 206, and the like. In some embodiments, assembly 208 includes a local social LSTM module 230, similar to social LSTM module 220 of controller 202, and a local MPC module 232, similar to MPC module 222 of controller 202. That is, where various computational functionality is herein described as being centrally performed at controller 202 using social LSTM module 220 or MPC module 222, such functionality may also be performed by one or more local social LSTM modules 230 or local MPC modules 232, respectively, at respective one or more robots 102. For example, in some embodiments, one robot 102 may compute navigation/control instructions for all robots 102 within an environment and then may transmit or broadcast the instructions to the other robots 102. As another example, each robot 102 is equipped with its own assembly 208 and communicates with other robots 102 through peer-to-peer communication or via controller 202. That is, it is contemplated that, in some embodiments, controller 202 may be omitted.
Although three robots 102 are depicted in the figures, it should be understood that the systems and methods of the present disclosure are applicable to groups of robots 102 of any number, including two robots as well as more than three robots.
It should be understood that the use of this LSTM model for predicting human trajectories is for exemplary purposes. The present disclosure is not limited to these models, nor even to these classes of models. In alternative embodiments, human trajectories may be predicted by applying constant velocity models, Kalman filters, intention-enhanced optimal reciprocal collision avoidance, social generative adversarial networks (GAN), social-NCE, sparse Gaussian processes, or other learning-based model(s).
In the exemplary embodiment, the social LSTM model is a recursive model configured to predict trajectories of pedestrians, such as humans 104, over a prediction period (also referred to as a prediction horizon) comprising a plurality of time steps, given observations of one or more previous actions of all humans 104 in the prediction set. The social LSTM model is applied to generate or compute simultaneous trajectory predictions for all pedestrians (humans 104) in the prediction set.
Specifically, the respective predicted motion of each human 104 for a single time step is computed, for each time step within the prediction horizon. At each time step within the prediction horizon, each human 104's predicted and actual positions in previous time steps are input to the social LSTM model. Notably, however, the motion of robots 102 is computed (e.g., using controller 202, as described further herein) at each time step and is therefore objectively known (e.g., predicted without error). In this way, the recursive, “single-step” social LSTM model enables coupled prediction and planning over the prediction/control horizon—that is, accounting for the mutual dependence between humans' and robots' future motion.
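The recursive single-step prediction described above can be sketched as follows. This is a minimal illustration only: `social_lstm_step` stands in for a trained single-step prediction model and `planned_robot_positions` for the MPC-planned robot motion; both names are assumptions, not part of any specific library.

```python
def predict_humans_recursively(history, planned_robot_positions,
                               social_lstm_step, horizon):
    """history: list of dicts {agent_id: (x, y)} for the observed steps."""
    predictions = []  # one dict {human_id: (x, y)} per future step
    window = list(history)
    for k in range(horizon):
        # The model sees every agent's most recent positions ...
        next_positions = social_lstm_step(window)
        # ... but only the human predictions are kept; the robots' next
        # positions come from the plan, since they are known exactly.
        step = {aid: pos for aid, pos in next_positions.items()
                if aid not in planned_robot_positions[k]}
        predictions.append(step)
        # Slide the observation window forward, feeding predicted human
        # positions and planned robot positions back in as "observations".
        window = window[1:] + [{**step, **planned_robot_positions[k]}]
    return predictions
```

The feedback of planned robot positions into the sliding window is what couples prediction and planning over the horizon.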
In
The MPC model for control of the motion of robots 102 (e.g., robot 102 depicted in
Individual objectives and shared objectives, or any other constraints, may be weighted in different applications of the MPC model to prioritize different objectives. For example, in some formulations, the weight or significance of reaching the individual objective increases as the distance to the objective decreases. In some formulations, to prevent collision between a robot and any human pedestrian, the MPC model imposes a minimum threshold distance between the robot and any human. The minimum threshold distance is, in some embodiments, speed-dependent, such that the robot stays further away from any human as its speed increases (leading to less human discomfort).
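The speed-dependent minimum threshold distance can be illustrated with a short sketch; the linear form dmin + ρ·speed shown here is one assumed instance of a speed-dependent threshold, not the only possibility.

```python
import math

def safe_distance(d_min, rho, velocity):
    """Required robot-human separation; grows linearly with robot speed."""
    speed = math.hypot(velocity[0], velocity[1])
    return d_min + rho * speed

def violates_separation(robot_pos, human_pos, d_min, rho, velocity):
    """True when the robot is closer to the human than the threshold allows."""
    dist = math.hypot(robot_pos[0] - human_pos[0],
                      robot_pos[1] - human_pos[1])
    return dist < safe_distance(d_min, rho, velocity)
```

A separation that is acceptable at rest can thus violate the constraint at speed, which is what pushes a fast-moving robot to keep a wider berth.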
Individual objectives include minimal distance to goal, minimal time to goal, minimal collisions, maximum human comfort, minimal human discomfort, minimal robot “jerk”, or minimal robot “freeze” or downtime (or any combination thereof). Shared objectives include “flocking” together or remaining within a threshold radius of one another, or maintaining another group-level formation than a flock, such as a particular shape or a minimum threshold distance between the robots.
In
In some embodiments, the game theory application takes a centralized MPC approach, in which controller 202 is programmed to centrally compute the solution to a centralized MPC function for each of robots 102, and generate control instructions for robots 102 based thereon.
In particular, the navigation of robots 102 relative to one another is characterized as a potential game with each robot 102 as a player (see
Taken with the potential game described above, the interaction between humans 104 and robots 102 is considered as a two-player game, with the group of humans 104 as one player and the group of robots 102 as the other player (see
In the centralized MPC approach, given this two-player game, controller 202 performs an iterative, centralized computational process, including the following steps conducted at each time step within an overall control horizon. One embodiment of these steps, in accordance with one exemplary centralized MPC process 600, is shown in
Controller 202 iterates (608) these steps until a pre-defined condition is satisfied (610). That condition includes one of the following: the difference in the potential function at two consecutive iterations is less than a threshold value (which may be referred to as a “convergence condition”); or a maximum number of iterations is reached. Upon the condition being satisfied (610), controller 202 accepts, for each robot 102, the computed proposed trajectory of the last iteration as the best-response trajectory to be executed. Controller 202 generates (612) control instructions 402 (see
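The centralized iterative best-response loop just described can be sketched as follows. `solve_best_response` and `potential` are assumed stand-ins: the first solves one robot's MPC holding the others' trajectories fixed, the second evaluates the potential function of the game.

```python
def centralized_best_response(trajs, human_preds, solve_best_response,
                              potential, tol=1e-3, max_iters=50):
    """trajs: dict {robot_id: trajectory}; returns the accepted trajectories."""
    prev_potential = potential(trajs, human_preds)
    for _ in range(max_iters):  # stop at the maximum iteration count
        for rid in trajs:
            # Each robot's best response is computed in turn, with the
            # other robots' current trajectories held fixed.
            trajs[rid] = solve_best_response(rid, trajs, human_preds)
        cur_potential = potential(trajs, human_preds)
        # Convergence condition: potential change below the threshold.
        if abs(cur_potential - prev_potential) < tol:
            break
        prev_potential = cur_potential
    return trajs
```

In a potential game each best-response step can only decrease the potential, which is why monitoring its change between consecutive iterations serves as the stopping rule.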
In some embodiments, the game theory application takes a decentralized or distributed MPC approach, in which each robot 102 and controller 202 individually collaborate and exchange data to compute the solution to the individual MPC model for each respective robot 102. More specifically, each robot 102 locally solves its individual MPC model (e.g., using local MPC module 232), given the current best-response trajectories of other robots 102.
In the distributed MPC approach, given the above-presented two-player game, robots 102 and controller 202 together perform an iterative, distributed computational process, including the following steps conducted at each time step. One embodiment of these steps, in accordance with one exemplary distributed MPC process 700, is shown in
Controller 202 computes (702) predicted trajectories 404 for humans 104 (e.g., using the social LSTM model) over a prediction horizon and transmits the predicted trajectories 404 to robots 102 (see also
Each robot 102, in parallel, uses the predicted (human) trajectories (404) from controller 202 and the (estimated or actual) previous best-response trajectories of the other robots 102 as inputs to solve (704) the local MPC model. If the proposed solution (that is, a computed proposed trajectory) at that iteration improves the cost for the respective robot 102 relative to the cost of the collective previous best-response trajectory of all robots 102, that robot 102 will accept (706) the computed proposed trajectory as the current best-response trajectory for itself. “Current,” in this context, refers to a current computation relative to an in-progress iteration. If not, the robot 102 accepts the previous best-response trajectory (or the initially estimated best-response trajectory, at initiation of process 700) as the current best-response trajectory for itself. Each robot 102 transmits its respective current best-response trajectory back to controller 202. Controller 202 may then distribute, to all robots 102, the current best-response trajectories received from all robots 102.
Each robot 102, in parallel, uses the received best-response trajectories from all robots 102, to again solve the local MPC model and determine whether the collective current best-response trajectory for all robots 102 improves the cost for the respective robot 102 relative to a previously computed best-response trajectory for all robots 102. If so, the robot 102 transmits a positive convergence signal to controller 202; if not, the robot 102 transmits a negative convergence signal to controller 202.
Controller 202 and robots 102 iterate (708) these steps until a pre-defined condition is satisfied (710). That condition includes one of the following: a positive convergence signal is received at controller 202 from all robots 102; or a maximum number of iterations is reached. Upon the condition being satisfied (710), controller 202 generates (712) control instructions (e.g., control instructions 402, as shown in
In some embodiments, it is contemplated that one robot 102 may function as a controller in the distributed MPC process. For example, controller 202 may be omitted, and the functions ascribed to controller 202 in the above description may be performed by a designated one of robots 102.
Method 800 further includes generating 806, in response to the condition being satisfied, control instructions for each robot of the plurality of robots based on output from the two-player game algorithm after a last iteration, and controlling 808 each of the plurality of robots to move in accordance with the respective control instructions.
Method 800 may include additional, fewer, or alternative steps, including those described elsewhere herein.
For example, in some embodiments, controlling 808 includes transmitting the control instructions, respectively, to each robot of the plurality of robots, the control instructions including computer-executable commands causing each robot to process and execute the respective commands.
In some embodiments, generating 806 includes generating control instructions defining a best-response trajectory for each respective robot. In some such embodiments, method 800 further includes computing, in the last iteration, a proposed trajectory of each of the plurality of mobile robots, and accepting the proposed trajectory as the best-response trajectory to be executed by each of the plurality of mobile robots.
In some embodiments, executing processing 804 includes optimizing a centralized MPC function for each of the plurality of robots to obtain a proposed trajectory for each of the plurality of robots, and evaluating the potential function using the proposed trajectory for each of the plurality of robots and the predicted trajectories of the plurality of humans.
In certain embodiments, the condition is satisfied when a difference in the output from the two-player game algorithm between a previous iteration and the last iteration is less than a threshold value. The condition may be a convergence condition. In some embodiments, the condition is satisfied when the last iteration corresponds to a maximum number of iterations.
In still further embodiments, method 800 includes computing the predicted trajectory of each of the plurality of humans by executing a social long short-term memory (LSTM) model.
Method 900 also includes generating 912, in response to the condition being satisfied, control instructions for each robot of the plurality of robots based on output from the two-player game algorithm after a last iteration, and controlling 914 each of the plurality of robots to move in accordance with the respective control instructions.
Method 900 may include additional, fewer, or alternative steps, including those described elsewhere herein.
For example, in some embodiments, controlling 914 includes transmitting, by the controller, the control instructions, respectively, to each robot of the plurality of robots, the control instructions including computer-executable commands causing each robot to process and execute the respective commands.
In some embodiments, generating 912 includes generating, by the controller, control instructions defining an actual best-response trajectory for each respective robot. In some such embodiments, method 900 further includes, in the last iteration, receiving, by the controller, a positive convergence signal from each of the plurality of mobile robots, and accepting the current best-response trajectories of the plurality of robots as the actual best-response trajectories.
In certain embodiments, the executing processing 902 includes solving, by each robot of the plurality of robots in parallel, a local MPC function to obtain the current best-response trajectory that is current to the respective iteration.
In further embodiments, the executing processing 902 includes, in response to receiving the current best-response trajectories of each other robot of the plurality of robots, solving, by each robot, a local MPC function to determine whether the current best-response trajectories of the plurality of robots improve an outcome relative to previously computed best-response trajectories of the plurality of robots. In some such embodiments, method 900 also includes, when the solving indicates an improved outcome, transmitting, by each of the plurality of robots, the convergence signal including a positive convergence signal to the controller.
In some embodiments, the condition is satisfied when the convergence signal includes a positive convergence signal. In some embodiments, the condition is satisfied when the last iteration corresponds to a maximum number of iterations.
In certain embodiments, method 900 further includes computing the predicted trajectory of each of the plurality of humans by executing a social long short-term memory (LSTM) model.
In the example embodiment, memory device 1004 includes one or more devices that enable information, such as executable instructions or other data (e.g., sensor data), to be stored and retrieved. Moreover, memory device 1004 includes one or more computer readable media, such as, without limitation, dynamic random access memory (DRAM), static random access memory (SRAM), a solid state disk, or a hard disk. In the example embodiment, memory device 1004 stores, without limitation, application source code, application object code, configuration data, additional input events, application states, assertion statements, validation results, or any other type of data. The computing device 1000, in the example embodiment, may also include a communication interface 1006 that is coupled to processor 1002 via system bus 1008.
In the example embodiment, processor 1002 may be programmed by encoding an operation using one or more executable instructions and providing the executable instructions in memory device 1004. In the example embodiment, processor 1002 is programmed to perform functions such as those described herein with respect to
In operation, a computer executes computer-executable instructions embodied in one or more computer-executable components stored on one or more computer-readable media to implement aspects of the disclosure described or illustrated herein. The order of execution or performance of the operations in embodiments of the disclosure illustrated and described herein is not essential, unless otherwise specified. That is, the operations may be performed in any order, unless otherwise specified, and embodiments of the disclosure may include additional or fewer operations than those disclosed herein. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure.
The present disclosure is directed to a game-theoretical approach to controlling navigation of multiple robots within an environment, accounting for robot objectives and human objectives (or preferences). As described herein, each game theory model or algorithm applies (i) a potential game algorithm relative to each robot, and (ii) a two-player game algorithm accounting for robots and humans (as groups). These algorithms utilize input data regarding the position of humans and robots at each time step within an overall prediction horizon or control horizon (which may include a plurality of time steps). The above description and below examples provide exemplary models to generate positional input data for humans and robots, namely, a single-step social LSTM model and an MPC model, respectively. Exemplary single-step LSTM models and MPC models are provided below. However, it should be understood that any exemplary models, algorithms, formulae, and equations presented herein should be taken as illustrative; other models, algorithms, formulae, and equations may be applied.
As is recognized, social crowd navigation for mobile robots among human pedestrians remains a challenging task due to complex human-robot and human-human interactions, as well as the uncertainty and unpredictability of human behavior. Moreover, robots' behavior in crowd navigation is not only required to be safe, robust, and efficient like other robotic applications but also needs to ensure social compatibility in motion with human pedestrians. This is why social crowd navigation has attracted considerable attention in recent years.
There have been several applications where a team of autonomous systems must operate while interacting with humans, such as connected and automated driving in mixed traffic, robotic assistance devices, etc. Multi-robot navigation in crowds is a typical application of cyber-physical-human systems that is significantly more challenging than single robots, as it involves multi-agent coordination and control for human-robot interaction.
The navigation problem for robots in crowds can be approached through two main strategies: decoupled prediction and planning, and coupled prediction and planning. In the decoupled approach, human agents are considered dynamic obstacles and, thus, mobility-based interactions are ignored. As a result, the “freezing robot problem” and the “reciprocal dance problem” can occur and cause discomfort to pedestrians nearby. In contrast, mutual interactions between human and robot agents are embedded in the coupled approach, by appropriately incorporating human motion prediction into the problem of robot navigation. This approach has been observed to somewhat mitigate the freezing robot and reciprocal dance problems, thereby making it a recent trend in crowd navigation research.
Several recent studies in crowd navigation have developed reinforcement learning (RL) methods, in which a control policy for a robot is approximated by deep neural networks, such as collision avoidance deep RL (CADRL), long short-term memory (LSTM)-RL, socially aware RL (SARL), as well as graph-based techniques. However, the efficacy of RL policies is often degraded if the robot encounters new scenarios that are different from the training scenario.
On the other hand, optimization-based approaches such as model predictive control (MPC) do not depend on offline training and can generalize well with appropriate parameter tuning. In MPC, the trajectories of robots over a finite control horizon are optimized, given certain prediction models of human trajectories. Various human prediction models have been used together with MPC in the literature, such as constant velocity, Kalman filters, intention-enhanced optimal reciprocal collision avoidance, social generative adversarial networks (GAN), and LSTM. Due to the complexity of human behavior, recent data-driven methods such as Social-LSTM, Social-GAN, Social-NCE, or sparse Gaussian processes have demonstrated more accurate trajectory prediction compared to domain knowledge-based models. Thus, combining machine learning techniques with MPC has the potential to enhance the efficacy of overall navigation.
While there are increasing studies on social navigation for an individual robot in crowds, the problem of multi-robot cooperative navigation among human pedestrians has not been fully explored. Other work has focused on multi-robot collision avoidance while modeling the humans as dynamic obstacles with constant velocities; in such work, therefore, the human-robot interaction was not considered.
The present disclosure proposes an interaction-aware control framework to coordinate robots among human pedestrians using game-theoretic MPC and deep learning-based human trajectory prediction. First, an MPC problem is formulated to find the optimal control actions for the robots in a receding horizon fashion, and Social-LSTM, a state-of-the-art trajectory prediction model, is integrated into the MPC formulation. To solve the resulting problem, the concept of the potential game for multi-robot coordination is utilized, combined with a two-player game for human-robot interaction to develop two algorithms for centralized and distributed MPC. The effectiveness of the proposed framework is demonstrated with simulations of a crowd navigation scenario, in which the robots exhibit coordinated flocking behavior while simultaneously navigating toward their respective goals.
The disclosure considers an environment ℰ⊂ℝ2 with N agents, including M∈ℤ+ robots and N−M human pedestrians, illustrated in the Figures. Denote by ℛ={1, . . . , M}, ℋ={M+1, . . . , N}, and 𝒜=ℛ∪ℋ the sets of robots, pedestrians, and all agents, respectively.
At time step k∈ℤ≥0, let si,k=[si,kx, si,ky]T∈ℝ2, vi,k=[vi,kx, vi,ky]T∈ℝ2, and ai,k=[ai,kx, ai,ky]T∈ℝ2 be the vectors corresponding to the position, velocity, and acceleration of robot i∈ℛ in Cartesian coordinates, respectively, where each vector consists of two components for the x- and y-axes. Let xi,k=[si,kT, vi,kT]T and ui,k=ai,k be the vectors of states and control actions for robot i at time step k, respectively, while sj,k∈ℝ2 denotes the position of pedestrian j∈ℋ at time step k. For simplicity of notation, we use Xi,t1:t2 to denote the concatenation of an arbitrary vector X from time step t1 to time step t2. We denote the state and action spaces for robot i as 𝒳i and 𝒰i, i.e., xi,k∈𝒳i and ui,k∈𝒰i, respectively.
Assumption 1. We assume that 𝒳i and 𝒰i are nonempty, compact, and connected sets for all i∈ℛ.
The goal of social multi-robot navigation is to navigate each robot i from its initial position siinit∈ℰ to its goal sigoal∈ℰ while interacting with the human pedestrians to avoid collisions and minimize any discomfort during motion. Consider the following assumption to facilitate positioning and communication between the robots for coordination.
Assumption 2. The robots' and pedestrians' real-time positions can be obtained by a positioning system and stored by a central coordinator (e.g., a computer). The robots and the coordinator can exchange information through wireless communication.
Model Predictive Control (MPC) with Social-LSTM
The proposed framework of the present disclosure combines a Social-LSTM model, which predicts the future trajectories of human pedestrians, with an MPC formulation to derive the optimal control actions for the robots. Next, the Social-LSTM model and the MPC formulation are formalized.
Let t∈ℤ≥0 be the current time step, H∈ℤ+ be the control/prediction horizon length, and 𝒯t={t, t+1, . . . , t+H−1} be the set of time steps in the next control horizon. The human prediction model aims at predicting the trajectories of the pedestrians over a prediction horizon of length H, given observations over L∈ℤ+ previous time steps of all agents' trajectories. Social-LSTM was developed for jointly predicting the multi-step trajectories of multiple agents. This disclosure considers recursive prediction for the pedestrians' positions using the single-step Social-LSTM model, denoted by ϕ(⋅), which maps the agents' observed position histories to the pedestrians' predicted positions at the next time step, as follows:
The prediction model of the present disclosure (EQ. 1) differs from at least one known Social-LSTM model. First, the known Social-LSTM model performs simultaneous prediction for all N agents, i.e., both robots and humans. In the present disclosure, by contrast, the positions of the human pedestrians are extracted from the joint predictions. Additionally, the Social-LSTM model is considered with single-step prediction, where predictions are made recursively throughout the control horizon at each time step, instead of a multi-step prediction for the entire control horizon. The recursive technique yields a coupled prediction and planning approach, while using the multi-step prediction model leads to a decoupled prediction and planning approach that ignores the mutual dependence between humans' and robots' future behavior. Further, at each time step of the control horizon, the humans' predicted positions in previous time steps are used as the inputs of the Social-LSTM model, while the predicted positions of the robots are discarded, because the robots' future positions depend on the solution of the MPC problem.
Remark 1. Though Social-LSTM is utilized as the human motion prediction model, other deep learning models can alternatively be used.
Henceforth, for ease of notation, we omit the time subscript of a variable if it refers to all time steps within the control horizon. For example, we use ui, xi, and sj, ∀i∈ℛ, j∈ℋ, instead of ui,t:t+H−1, xi,t+1:t+H, and sj,t+1:t+H. It is also considered that the dynamics of each robot are governed by a discrete-time double-integrator model:
We formulate the local MPC problem for robot i at time t:
Note that any constraints, e.g., collision avoidance constraints, can be included in the objective function by penalty functions with sufficiently large weights.
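The double-integrator dynamics referenced above can be illustrated with a short rollout sketch. The particular discretization and the step size `tau` are assumed here for illustration.

```python
def double_integrator_step(pos, vel, acc, tau):
    """One step: position integrates velocity, velocity integrates acceleration."""
    new_pos = (pos[0] + vel[0] * tau + 0.5 * acc[0] * tau**2,
               pos[1] + vel[1] * tau + 0.5 * acc[1] * tau**2)
    new_vel = (vel[0] + acc[0] * tau, vel[1] + acc[1] * tau)
    return new_pos, new_vel

def rollout(pos, vel, accelerations, tau):
    """Apply a sequence of control actions (accelerations) over the horizon."""
    traj = [pos]
    for acc in accelerations:
        pos, vel = double_integrator_step(pos, vel, acc, tau)
        traj.append(pos)
    return traj
```

In the MPC problem, the accelerations are the decision variables, and a rollout of this form yields the state trajectory against which the objectives and constraints are evaluated.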
Assumption 3. We assume that the individual objectives Ji and shared objectives Jij, for all i, j∈ℛ, are continuous and differentiable functions everywhere over the state and action spaces.
In this section, the MPC formulation (EQ. 4) is illustrated by considering the scenario illustrated in are subject to the following bound constraints:
We formulate individual and shared objectives by weighted sums of several features. For robot i, we consider minimization of the tracking error with respect to a desired reference trajectory
To prevent collision between a robot and any human pedestrian, we impose the following constraint that the distance between the robot and each pedestrian is greater than a safe, speed-dependent distance, ∀i∈ℛ, j∈ℋ, where dmin∈ℝ+ is a minimum allowed distance and ρ∈ℝ+ is a scaling factor. A speed-dependent term is included in the above constraint to ensure that the robot stays farther away from the humans while moving at high speed, leading to less human discomfort. Similarly, we consider an inter-robot collision avoidance constraint
We incorporate the collision avoidance constraint into the objective function as a soft constraint by using a smooth max penalty function as follows:
The smoothed max penalty function is defined as
where μ ∈ ℝ+ is a parameter that adjusts the smoothness of the penalty function. Additionally, the shared objective between robots i and j includes the following flocking control objective:
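The speed-dependent collision constraint and the smoothed max penalty can be sketched as follows. This is a non-authoritative sketch: the softplus form of the smooth max and the values of dmin, ρ, and μ are assumptions rather than the disclosure's exact expressions.

```python
import numpy as np

def constraint_residual(p_robot, v_robot, p_human, d_min=0.4, rho=0.3):
    """Residual h > 0 means the speed-dependent safety constraint
    ||p_robot - p_human|| >= d_min + rho * ||v_robot|| is violated.
    The d_min and rho values are illustrative, not from the disclosure."""
    safe_dist = d_min + rho * np.linalg.norm(v_robot)
    return safe_dist - np.linalg.norm(p_robot - p_human)

def smooth_max_penalty(h, mu=0.05):
    """Softplus smoothing of max(0, h); mu adjusts the smoothness.
    One common smooth-max form, assumed rather than quoted."""
    return mu * np.log1p(np.exp(h / mu))
```

The penalty is near zero when the constraint is satisfied with margin (h well below 0) and grows approximately linearly in the violation h once the constraint is breached, so a large weight on it approximates the hard constraint.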
Hence, the individual objective can be given by:
Centralized vs. Distributed Model Predictive Control
The following examples illustrate exemplary approaches to applying game theory to the control of navigation of multiple robots within an environment including human pedestrians.
1. Multi-Robot Coordination as a Potential Game: Given the MPC Problem (EQ. 4) for each robot, we define a game for multi-robot navigation as follows.
Definition 1. At each time step t, with control/prediction horizon length H, we define an M-player game for multi-robot navigation as 𝒢t := {ℳ, {𝒰iH}, {Ci,t}}, where ℳ := {1, . . . , M} is the set of robots and 𝒰 = 𝒰1H × 𝒰2H × . . . × 𝒰MH, while 𝒰iH is the set of all action sequences of robot i in 𝒢t over a horizon H (also referred to as a strategy set). Let ui ∈ 𝒰iH be a valid strategy for robot i, and let a joint strategy adopted by the other players be denoted by u−i := [ujT]j≠i ∈ 𝒰−iH. Then, the cumulative cost for each robot i over 𝒢t is denoted by Ci,t(ui, u−i) := Ji(⋅) + J−i(⋅), where J−i(⋅) collects the shared objectives Jij(⋅) with the other robots, and can be evaluated by the objective function in EQ. 4.
The following definition of a continuous exact potential game has been presented.
Definition 2. If, for every robot i ∈ ℳ, 𝒰iH is a nonempty, compact, and connected set (extending Assumption 1) and Ci,t(⋅) is a continuous and differentiable function (Assumption 3), then the game 𝒢t is a continuous exact potential game if and only if there exists a function Ft: 𝒰 → ℝ that is continuous and differentiable on 𝒰 and satisfies

Ft(ui, u−i) − Ft(u′i, u−i) = Ci,t(ui, u−i) − Ci,t(u′i, u−i), ∀ui, u′i ∈ 𝒰iH, ∀u−i ∈ 𝒰−iH, ∀i ∈ ℳ.
Remark 2. The conditions in Definition 2 hold for the multi-robot cooperative navigation presented in Section IV: as given in EQ. 5, the domains of the states and inputs are nonempty, compact, and connected, while the individual and shared objective functions shown in EQ. 14 and EQ. 15 are everywhere continuous and differentiable.
Lemma 1. 𝒢t is a continuous exact potential game if Jij(xi, xj) = Jji(xj, xi), and the potential function at time step t is computed by:
In order to satisfy Lemma 1 for the scenario presented in Section IV, we choose ωijcoll = ωjicoll and ωijfloc = ωjifloc, ∀i, j ∈ ℳ, i ≠ j.
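Under the symmetry conditions above, the potential function can be assembled by summing the individual objectives and counting each symmetric shared objective once, as in this sketch (the dictionary-based interface is purely illustrative, not the disclosure's notation):

```python
def potential(J_ind, J_shared):
    """Potential for the exact potential game: the sum of the individual
    objectives plus each symmetric shared objective counted once (i < j).

    J_ind: dict i -> Ji; J_shared: dict (i, j) -> Jij with Jij == Jji.
    This standard construction is a sketch of the role of the potential
    function; the disclosure's equation is not reproduced verbatim."""
    total = sum(J_ind.values())
    total += sum(v for (i, j), v in J_shared.items() if i < j)
    return total
```

With this construction, a change in robot i's strategy changes the potential by exactly the change in Ci,t, which is the exact-potential property of Definition 2.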
2. Human-Robot Interaction as a Two-Player Game: Because a Nash equilibrium can be found by optimizing the potential function of the game at each time step, the multi-robot navigation algorithm can be implemented in a centralized manner (using only a central coordinator) by solving the centralized MPC problem:
Definition 3. At each time step t, we define a two-player game for human-robot interaction as a tuple comprising the two players (the group of robots and the group of humans), their strategy sets (the joint robot and human trajectories over the next control horizon), and their cost functions {Ft(⋅), It(⋅)}.
In this game, the group of robots and the group of humans are considered two players; each player's strategy is defined by its trajectories over the next control horizon. The cost function for the robots is the objective function in EQ. 18, while the cost function for the human pedestrians is computed by the deviation from the predicted trajectories given by EQ. 1. In other words, it is assumed that the best strategy for the human pedestrians is to follow the trajectories given by the output of the Social-LSTM model (EQ. 1). The prediction model is expected to encode the fundamentals of pedestrian motion, such as collision avoidance, path following, and comfort.
In the IBR approach, a single agent updates its strategy at each iteration based on its best response to the others' strategies. Therefore, given the human-robot interaction game (Definition 3), EQ. 18 can be solved by sequentially computing the Social-LSTM prediction and optimizing the MPC objective function. All the steps may be performed by the coordinator until the difference in the potential function evaluated at two consecutive iterations is smaller than a threshold ξ ∈ ℝ+, or a maximum number of iterations jmax ∈ ℕ is reached. The details are provided in Algorithm 1. If the algorithm converges, the converged point is a Nash equilibrium; however, convergence is not guaranteed, and the algorithm may even cycle between strategies.
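The coordinator loop of Algorithm 1 can be sketched as follows; predict_humans, solve_mpc, and potential_fn are hypothetical placeholders standing in for the Social-LSTM prediction (EQ. 1), the centralized MPC solve, and the potential function (EQ. 18), respectively.

```python
def iterated_best_response(predict_humans, solve_mpc, potential_fn,
                           u_init, xi=1e-3, j_max=20):
    """Centralized IBR sketch for the human-robot two-player game.

    predict_humans(u): human trajectories given robot trajectories (EQ. 1 role)
    solve_mpc(s_hat):  robots' best-response trajectories given predictions
    potential_fn(u, s_hat): potential function value (EQ. 18 role)
    All callables are illustrative assumptions, not the disclosure's API.
    """
    u = u_init
    s_hat = predict_humans(u)
    f_prev = potential_fn(u, s_hat)
    j = 0
    for j in range(1, j_max + 1):
        s_hat = predict_humans(u)   # player 1: humans follow predicted motion
        u = solve_mpc(s_hat)        # player 2: robots' best response
        f_curr = potential_fn(u, s_hat)
        if abs(f_curr - f_prev) < xi:   # potential change below threshold xi
            break
        f_prev = f_curr
    return u, s_hat, j
```

The loop stops either when the potential function changes by less than ξ between consecutive iterations or when jmax iterations have been performed, mirroring the termination condition described above.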
The present disclosure also provides an IBR-based distributed MPC algorithm in which the coordinator and the robots collaborate and communicate to solve the problem. Each robot repeatedly solves its local MPC problem (EQ. 4) in parallel, given the current best-response trajectories of the other robots. It is contemplated that a distributed technique may make the system less vulnerable to any single point of failure, and may decrease the computational overhead for the coordinator. The robots' current best-response trajectories can be exchanged with the other robots through communication between the coordinator and the robots. Though the IBR approach advances towards a Nash equilibrium in a continuous exact potential game, convergence may not be achieved within a finite number of iterations. Hence, to improve the convergence of the IBR algorithm, we can modify the algorithm based on the concept of ϵ-Nash equilibrium, defined as follows:
Definition 4. A strategy (u*1, u*2, . . . , u*M) is an ϵ-Nash equilibrium if and only if ∃ϵ ∈ ℝ+ such that, ∀i ∈ ℳ,

Ci,t(u*i, u*−i) ≤ Ci,t(ui, u*−i) + ϵ, ∀ui ∈ 𝒰iH.
Combining the IBR approach for the multi-robot navigation game and the human-robot interaction game, we derive Algorithm 2 for solving the distributed MPC problem. It starts with an initial guess of the best-response trajectories for the robots. At each iteration j, the coordinator predicts the human pedestrians' trajectories over the next control horizon with the Social-LSTM model (EQ. 1), where the inputs utilize the robots' trajectories from the previous iteration, and transmits the predictions to all the robots. Next, each robot solves its local MPC problem in parallel, given the other robots' best-response trajectories, and decides to accept or reject the new solution based on whether or not it improves the cost function by at least ϵ. Then, each robot i transmits its current best-response trajectory xi(j) to the coordinator and receives x−i(j). The algorithm terminates if the coordinator detects that the condition Ci,t(ui(j), u−i(j)) ≤ Ci,t(ui(j−1), u−i(j)) + ϵ holds for all i ∈ ℳ, or if j = jmax.
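One coordinator round of Algorithm 2 can be sketched as below; the robots list and its callable interface are hypothetical stand-ins for the per-robot local MPC solvers and cost evaluations, not the disclosure's actual interfaces.

```python
def distributed_ibr_round(robots, u_prev, s_hat, eps=1e-2):
    """One coordinator round of the distributed IBR sketch (Algorithm 2 role).

    robots: list of callables; robots[i](s_hat, u_others) returns
            (u_i_candidate, cost_fn), where cost_fn(u_i, u_others)
            evaluates Ci,t. These interfaces are illustrative assumptions.
    Returns the updated strategies and whether every robot signalled
    convergence (the eps-Nash termination test).
    """
    u_new = list(u_prev)
    converged = True
    for i, solve in enumerate(robots):
        u_others = [u for k, u in enumerate(u_prev) if k != i]
        u_cand, cost = solve(s_hat, u_others)
        # Accept only if the candidate improves the cost by at least eps.
        if cost(u_cand, u_others) <= cost(u_prev[i], u_others) - eps:
            u_new[i] = u_cand
            converged = False
        # Otherwise keep the previous best response; the robot signals
        # convergence for this round.
    return u_new, converged
```

Each robot updates against the strategies from the previous round (a parallel, Jacobi-style sweep), and the coordinator terminates once a full round passes with no robot improving its cost by more than ϵ, or once jmax rounds have elapsed.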
Simulated crowd navigation experiments were conducted in Python with the CrowdNav environment, where human pedestrians are simulated using ORCA. The Trajnet++ benchmark was used for training Social-LSTM models on the ETH dataset. CasADi and the IPOPT solver were used for formulating and solving the nonlinear MPC problems, respectively. The framework parameters were chosen as given in Table III. The simulations were executed on an MSI computer with an Intel Core i9 CPU, 64 GB RAM, and a GeForce RTX 3080 Ti GPU.
In some simulations, due to the flocking objective, the robots move from their origins to form a flock with other robots. The flock of robots can navigate among the humans without any collisions. The robots depart from the flock when they are close to their goals, and all the robots eventually reach their goals (e.g., within 16.4 seconds (s) for a simulation with 3 robots and 7 pedestrians in a circular crossing scenario).
To further assess the effectiveness of the proposed framework, performance was analyzed on the following metrics:
Success rate: % of simulations in which all agents reach their destinations.
Average travel time: Time (s) for all robots to reach their destinations (in successful simulations).
Collision rate: % of simulations in which the minimum distance between the robots and the pedestrians is less than 0.8 meters (m) (violation of personal space).
Discomfort rate: % of simulations wherein a robot's projected path intersects with a human's projected path. The projected path is defined as a line segment from the agent's current position along the direction of the agent's velocity, with length proportional to speed. The proportionality factor is chosen as 1.2 s.
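The discomfort test above can be sketched with a standard 2D segment-intersection check; the orientation-test implementation is a common computational-geometry technique, not quoted from the disclosure.

```python
import numpy as np

def projected_path(p, v, tau=1.2):
    """Segment from position p along velocity v; length is speed * tau (1.2 s)."""
    p = np.asarray(p, float)
    return p, p + tau * np.asarray(v, float)

def segments_intersect(a1, a2, b1, b2):
    """Proper 2D segment intersection via orientation (cross-product) tests;
    collinear touching cases are ignored in this sketch."""
    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])
    d1, d2 = cross(b1, b2, a1), cross(b1, b2, a2)
    d3, d4 = cross(a1, a2, b1), cross(a1, a2, b2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def discomfort(p_robot, v_robot, p_human, v_human):
    """True when the robot's and human's projected paths cross."""
    return segments_intersect(*projected_path(p_robot, v_robot),
                              *projected_path(p_human, v_human))
```

A simulation is counted toward the discomfort rate if this test is true for any robot-human pair at any time step.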
The success rate and average travel time quantify the efficiency, while the collision rate and discomfort rate are related to social conformity of the navigation algorithms. Those validation metrics are computed by averaging 1,000 simulations with randomized initial conditions, including 500 circular crossing simulations and 500 perpendicular crossing simulations, with different origins and destinations of the human pedestrians. The metrics for centralized MPC and distributed MPC are compared in Table I, with different numbers of human pedestrians. It can be observed that the performance deteriorates as the crowd density increases. The centralized and distributed MPC algorithms perform similarly, with the centralized approach taking a narrow lead across most metrics.
Furthermore, the benefits of flocking control in reducing personal space violations and discomfort to humans were evaluated. The results for CMPC with and without the flocking control objective are tallied in Table II. The collision and discomfort rates can be reduced by including the flocking objective; still, the robots generally take longer to reach their goals, and the number of successful simulations decreases. The results indicate that moving in a flock can indirectly improve social conformity, but there is a higher chance of "freezing robots" when the flock cannot find a safe yet efficient trajectory to avoid deadlock. This suggests that a high-level reference trajectory generator can be combined with the proposed framework to avoid such situations.
Embodiments of the multi-robot navigation control systems and methods described herein facilitate improving resulting robot movement through an environment containing humans and robots. Specifically, these methods significantly reduce robot “freeze” and jerky motions, and enable the weighting of various objectives depending on the application, such as fastest time, minimized collisions or interactions with humans, and the like. It has been established that applying these game-theoretic methods may improve the navigation and related motion of mobile robots relative to other methods.
Some embodiments involve the use of one or more electronic processing or computing devices. As used herein, the terms "processor" and "computer" and related terms, e.g., "processing device," and "computing device" are not limited to just those integrated circuits referred to in the art as a computer, but broadly refer to a processor, a processing device or system, a general purpose central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a microcomputer, a programmable logic controller (PLC), a reduced instruction set computer (RISC) processor, a field programmable gate array (FPGA), a digital signal processor (DSP), an application specific integrated circuit (ASIC), and other programmable circuits or processing devices capable of executing the functions described herein, and these terms are used interchangeably herein. These processing devices are generally "configured" to execute functions by programming or being programmed, or by the provisioning of instructions for execution. The above examples are not intended to limit in any way the definition or meaning of the terms processor, processing device, and related terms.
The various aspects illustrated by logical blocks, modules, circuits, processes, algorithms, and algorithm steps described above may be implemented as electronic hardware, software, or combinations of both. Certain disclosed components, blocks, modules, circuits, and steps are described in terms of their functionality, illustrating the interchangeability of their implementation in electronic hardware or software. The implementation of such functionality varies among different applications given varying system architectures and design constraints. Although such implementations may vary from application to application, they do not constitute a departure from the scope of this disclosure.
Aspects of embodiments implemented in software may be implemented in program code, application software, application programming interfaces (APIs), firmware, middleware, microcode, hardware description languages (HDLs), or any combination thereof. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to, or integrated with, another code segment or an electronic hardware by passing or receiving information, data, arguments, parameters, memory contents, or memory locations. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the claimed features or this disclosure. Thus, the operation and behavior of the systems and methods were described without reference to the specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description herein.
When implemented in software, the disclosed functions may be embodied, or stored, as one or more instructions or code on or in memory. In the embodiments described herein, memory includes non-transitory computer-readable media, which may include, but is not limited to, media such as flash memory, a random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and non-volatile RAM (NVRAM). As used herein, the term “non-transitory computer-readable media” is intended to be representative of any tangible, computer-readable media, including, without limitation, non-transitory computer storage devices, including, without limitation, volatile and non-volatile media, and removable and non-removable media such as a firmware, physical and virtual storage, CD-ROM, DVD, and any other digital source such as a network, a server, cloud system, or the Internet, as well as yet to be developed digital means, with the sole exception being a transitory propagating signal. The methods described herein may be embodied as executable instructions, e.g., “software” and “firmware,” in a non-transitory computer-readable medium. As used herein, the terms “software” and “firmware” are interchangeable and include any computer program stored in memory for execution by personal computers, workstations, clients, and servers. Such instructions, when executed by a processor, configure the processor to perform at least a portion of the disclosed methods.
As used herein, an element or step recited in the singular and preceded by the word "a" or "an" should be understood as not excluding plural elements or steps unless such exclusion is explicitly recited. Furthermore, references to "one embodiment" of the disclosure or an "exemplary" or "example" embodiment are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Likewise, limitations associated with "one embodiment" or "an embodiment" should not be interpreted as limiting to all embodiments unless explicitly recited.
Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is generally intended, within the context presented, to disclose that an item, term, etc. may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Likewise, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, is generally intended, within the context presented, to disclose at least one of X, at least one of Y, and at least one of Z.
The disclosed systems and methods are not limited to the specific embodiments described herein. Rather, components of the systems or steps of the methods may be utilized independently and separately from other described components or steps.
This written description uses examples to disclose the various embodiments, and also to enable a person having ordinary skill in the art to practice the various embodiments, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the various embodiments is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if the examples have structural elements that do not differ from the literal language of the claims, or the examples include equivalent structural elements with insubstantial differences from the literal language of the claims.
This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/588,372, filed Oct. 6, 2023, the entire contents of which are hereby incorporated by reference herein.
| Number | Date | Country |
|---|---|---|
| 63588372 | Oct 2023 | US |