Embodiments of the invention relate to systems of autonomous distributed agents that perform predetermined tasks.
Conventionally, systems of autonomous distributed agents may perform predetermined tasks through reinforcement learning. However, such systems lack efficiency as a result of errors in communication between the agents, and as a result of errors associated with the detectors of the agents.
There exists a need for a system and method for controlling autonomous agents to perform a predetermined task using reinforcement learning, which takes into account errors in communication and detection.
Aspects of the present disclosure provide a system and method for controlling autonomous agents to perform a predetermined task using reinforcement learning, which takes into account errors in communication and detection.
An aspect of the present disclosure is drawn to a system for performing a predetermined function within a total area of operation, wherein the system includes a first autonomous agent, a second autonomous agent, and a third autonomous agent. The first autonomous agent includes a first agent detector, a first agent communication component and a first agent controller. The first agent detector is operable to detect a first agent parameter within a first agent area and to generate a first agent parameter signal based on the detected first agent parameter. The first agent controller is operable to instruct the first autonomous agent to perform an initial first agent task and to perform a subsequent first agent task. The second autonomous agent includes a second agent detector, a second agent communication component and a second agent controller. The second agent detector is operable to detect a second agent parameter within a second agent area and to generate a second agent parameter signal based on the detected second agent parameter. The second agent communication component is operable to transmit the second agent parameter signal to the first agent communication component. The second agent controller is operable to instruct the second autonomous agent to perform an initial second agent task and to perform a subsequent second agent task. The third autonomous agent includes a third agent detector, a third agent communication component and a third agent controller. The third agent detector is operable to detect a third agent parameter within a third agent area and to generate a third agent parameter signal based on the detected third agent parameter. The third agent communication component is operable to transmit the third agent parameter signal to the first agent communication component and to the second agent communication component.
The third agent controller is operable to instruct the third autonomous agent to perform an initial third agent task and to perform a subsequent third agent task. The first agent communication component is operable to transmit the first agent parameter signal to the second agent communication component and to the third agent communication component. The second agent communication component is further operable to transmit the second agent parameter signal to the third agent communication component. The first agent controller is operable to instruct the first autonomous agent to perform the subsequent first agent task based on the first agent parameter signal, the second agent parameter signal, the third agent parameter signal and a predetermined reward function using reinforcement learning and a first Kalman consensus filter. The second agent controller is operable to instruct the second autonomous agent to perform the subsequent second agent task based on the first agent parameter signal, the second agent parameter signal, the third agent parameter signal and the predetermined reward function using reinforcement learning and a second Kalman consensus filter. The third agent controller is operable to instruct the third autonomous agent to perform the subsequent third agent task based on the first agent parameter signal, the second agent parameter signal, the third agent parameter signal and the predetermined reward function using reinforcement learning and a third Kalman consensus filter. The first agent area is less than and within the total area of operation. The second agent area is less than and within the total area of operation.
The accompanying drawings, which are incorporated in and form a part of the specification, illustrate example embodiments and, together with the description, serve to explain the principles of the invention. In the drawings:
The present disclosure describes a system and method for automatically selecting appropriate multi-agent behaviors in mixed cooperative-competitive control tasks. Uniquely, in a system in accordance with the present disclosure, the agents only need to share local state information; this enables the multi-agent reinforcement learning (MARL) problem to be solved (i.e., to “converge”) even with imperfect communication between distal agents. The output is a set of local agent policies defined as $\pi_i(\acute{\omega}_i^t, \alpha_i)$ for each agent $i$ as a learned function of a local transmission parameter ($\omega_i^t$) and a partially observed local state ($s_i^t$). An implementation of distributed consensus deep reinforcement learning, further described below, is used to accomplish this function.
In principle, Multi-Agent Reinforcement Learning (MARL) provides an attractive and flexible framework for distributed control of teams of autonomous systems. However, theoretical and practical limitations of the associated algorithms, originally designed for single-agent control tasks, undermine their optimality in multi-agent settings. In accordance with the present disclosure, a class of distributed control problems is considered in which a set of agents may leverage local, range-limited communications to achieve a shared goal. The present disclosure builds on conventional work in mobilized ad-hoc networks incorporating modern techniques from deep reinforcement learning. Further, the present disclosure is not restricted to a conventional purely cooperative case. On the contrary, the present disclosure is empirically shown to be feasible in partially competitive settings as well. Motivated by the distributed consensus literature, the agent control policies operate on the output of a linear combination of neighbors' transmission parameters. This approach preserves the global average, may be easily computed for each agent using only locally available information and guarantees asymptotic average consensus under mild assumptions on the stochastic communication graph.
The networked MARL problem may be formalized using notation similar to that disclosed in “Fully decentralized multi-agent reinforcement learning with networked agents” by Zhang et al., CoRR 2018. Let $\{G = (N, \varepsilon_t)\}_{t \ge 0}$ be an undirected time-varying communications graph between N agents in the network, with $(i, j) \in \varepsilon_t$ denoting that agents $i$ and $j$ communicate at time $t$. Further, let $d_t(i)$ denote the degree of node $i$ and $N_t(i)$ be the neighbors of $i$ at time $t$. On each timestep, each agent $i$ makes a partial observation $s_i^t$ of the global state $S$ via an observation function $O_i(S)$. The agent computes a local transmission parameter, $\omega_i^t$, to be broadcast to $N_t(i)$. Finally, the agent selects an available action $\alpha_i \in A_i$ according to its local policy $\pi_i(\acute{\omega}_i^t, \alpha_i)$. Cases are restricted to those where the local parameter $\acute{\omega}_i^t$ is some function $F$ of only the current transmission parameter and local state, $\acute{\omega}_i^t = F(s_i^t, \omega_i^t)$. As a baseline, learned policies are assessed using only the local current state, $F(s_i^t, \omega_i^t) = s_i^t$, and an exponential moving average of the local state, $F(s_i^t, \omega_i^t) = \beta s_i^t + (1-\beta) s_i^{t-1}$.
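By way of non-limiting illustration, the two baseline choices of the function F described above may be sketched as follows. The function names, the list-based state representation and the value of β are hypothetical, and the sketch assumes that the second term of the moving average refers to the local state of the previous timestep.

```python
from typing import List

Vector = List[float]


def f_local_state(s_t: Vector, omega_t: Vector) -> Vector:
    """Baseline 1: ignore the transmission parameter and use the local state."""
    return list(s_t)


def f_moving_average(s_t: Vector, s_prev: Vector, beta: float) -> Vector:
    """Baseline 2: blend the current and previous local state with weight beta."""
    return [beta * cur + (1.0 - beta) * prev for cur, prev in zip(s_t, s_prev)]


# Hypothetical two-component local state at two consecutive timesteps.
blended = f_moving_average([1.0, 2.0], [3.0, 4.0], beta=0.5)
```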
This consensus update is computed by $\omega_i^{t+1} = \sum_{j \in N} c_t(i,j)\,\acute{\omega}_j^t$ (Equation 1) using message weights $c_t$. A natural choice for consensus update weights $c_t$ may be found in known time-varying Metropolis weights. The present disclosure defines the weight on each edge $c_t(i, j)$ as proportional to the degree of the incident nodes, with self-connections weighted such that the weights at each node form a convex combination.
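By way of non-limiting illustration, one consensus iteration of the kind described above may be sketched as follows. The sketch assumes the commonly published form of the Metropolis weights, in which the weight on an edge is 1/(1 + max(d(i), d(j))) and the self-weight takes up the remainder so that each row is a convex combination; the present text does not state the exact weight formula, so this form, the function names and the three-node path graph are all assumptions made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def metropolis_weights(neighbors: Graph) -> Dict[int, Dict[int, float]]:
    """Build per-node weight rows; each row sums to 1 (a convex combination)."""
    weights: Dict[int, Dict[int, float]] = {}
    for i, nbrs in neighbors.items():
        row = {j: 1.0 / (1 + max(len(nbrs), len(neighbors[j]))) for j in nbrs}
        row[i] = 1.0 - sum(row.values())  # self-connection takes the remainder
        weights[i] = row
    return weights


def consensus_step(weights: Dict[int, Dict[int, float]],
                   params: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Each node replaces its parameter with a weighted sum over itself and its neighbors."""
    updated = {}
    for i, row in weights.items():
        dim = len(params[i])
        updated[i] = [sum(c * params[j][k] for j, c in row.items()) for k in range(dim)]
    return updated


# Hypothetical path graph 0 - 1 - 2 with scalar parameters 0, 3 and 6.
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
weights = metropolis_weights(neighbors)
params = {0: [0.0], 1: [3.0], 2: [6.0]}
for _ in range(50):
    params = consensus_step(weights, params)
```

In this small example the symmetric weights leave the network-wide average unchanged at every step, and repeated iteration drives every node toward that average.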
Distributed Kalman consensus filters are then used in which consensus estimates are integrated over each agent's local Kalman state estimate using the distributed average consensus condition defined above in Equation 1.
The control policies for each agent may be trained independently using known deep deterministic policy gradients. For each task, a cumulative return is reported for each consensus function. Via the Kalman consensus filter condition, agents propagate expected positions and velocities of all entities, $\omega_i^t = \{\hat{x}_j^t, \hat{v}_j^t\}_{j \in N}$, inferred from position observations and a simple linear kinematics model with constant velocity.
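By way of non-limiting illustration, the local estimation of a position and a velocity from position observations under a constant-velocity kinematics model may be sketched with a one-dimensional Kalman filter as follows. The class name, the noise values q and r, the initial covariance and the noise-free demonstration data are hypothetical; the consensus step that combines estimates across agents is not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConstantVelocityKalman:
    pos: float = 0.0
    vel: float = 0.0
    # Covariance stored row-major as [p00, p01, p10, p11].
    p: List[float] = field(default_factory=lambda: [1.0, 0.0, 0.0, 1.0])
    q: float = 1e-3   # assumed process noise added to each diagonal entry
    r: float = 0.25   # assumed variance of the position observation

    def predict(self, dt: float) -> None:
        """Propagate state and covariance with x' = x + v*dt, v' = v."""
        p00, p01, p10, p11 = self.p
        self.pos += dt * self.vel
        self.p = [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q,
                  p01 + dt * p11,
                  p10 + dt * p11,
                  p11 + self.q]

    def update(self, z: float) -> None:
        """Correct the state with a position-only observation z."""
        p00, p01, p10, p11 = self.p
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain for position and velocity
        y = z - self.pos                 # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.p = [(1 - k0) * p00, (1 - k0) * p01, p10 - k1 * p00, p11 - k1 * p01]


kf = ConstantVelocityKalman()
true_velocity = 2.0
for step in range(1, 51):
    kf.predict(dt=1.0)
    kf.update(z=true_velocity * step)  # noise-free observation for a repeatable demo
```

Although only positions are observed, the velocity estimate is recovered through the off-diagonal covariance terms that the prediction step builds up.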
An example embodiment of a system and method enabling a plurality of autonomous agents to execute a predetermined task within an area of operation in accordance with aspects of the present disclosure will now be described with reference to
As shown in the figure, area of operation 100 includes a target 102, a target 104, an agent 106, an agent 108, an agent 110, an agent 112 and an agent 114.
Each of agents 106, 108, 110, 112 and 114 is an autonomous agent. Each agent has been programmed to autonomously perform tasks to execute an overall group task. In this example embodiment, the group task is to position the group of agents such that a continuous communication link is established between the agents, such that target 102 is detected by at least one agent and such that target 104 is detected by at least one agent.
As shown in
Agent 106 is positioned such that agent 108, agent 110, agent 112 and target 102 are in local detection area 202. Agent 108 is positioned such that agent 106, agent 110, agent 112, agent 114 and target 104 are in local detection area 204. Agent 110 is positioned such that agent 106, agent 108 and target 104 are in local detection area 206. Agent 112 is positioned such that agent 106, agent 108 and agent 114 are in local detection area 208. Agent 114 is positioned such that agent 108, agent 112 and target 104 are in local detection area 210.
In this example embodiment, any one agent is able to bi-directionally communicate with another agent that is within the agent's local area of detection. In particular, agent 106 is operable to bi-directionally communicate with agent 108 via a communication channel 212, to bi-directionally communicate with agent 110 via a communication channel 214 and to bi-directionally communicate with agent 112 via a communication channel 216. Agent 108 is additionally operable to bi-directionally communicate with agent 110 via a communication channel 218, to bi-directionally communicate with agent 112 via a communication channel 220 and to bi-directionally communicate with agent 114 via a communication channel 222. Agent 112 is additionally operable to bi-directionally communicate with agent 114 via a communication channel 224.
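By way of non-limiting illustration, the communication channels and local detections of this example embodiment may be represented as an undirected graph, from which the neighbors and degree of each agent, and whether the group task is met, can be derived. The dictionary and function names are hypothetical; the channel endpoints and detected targets are taken from the description above.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Channel number -> the two agents it connects.
CHANNELS: Dict[int, Tuple[int, int]] = {
    212: (106, 108), 214: (106, 110), 216: (106, 112),
    218: (108, 110), 220: (108, 112), 222: (108, 114), 224: (112, 114),
}
# Agent -> targets inside its local detection area.
DETECTED_TARGETS: Dict[int, Set[int]] = {
    106: {102}, 108: {104}, 110: {104}, 112: set(), 114: {104},
}


def build_neighbors(channels: Dict[int, Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Turn the channel list into a neighbor set per agent."""
    neighbors: Dict[int, Set[int]] = {}
    for a, b in channels.values():
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def is_connected(neighbors: Dict[int, Set[int]]) -> bool:
    """Breadth-first search: True if every agent is reachable from any one agent."""
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(neighbors)


def group_task_satisfied(neighbors: Dict[int, Set[int]],
                         detected: Dict[int, Set[int]],
                         targets: Set[int]) -> bool:
    """Continuous communication link plus every target seen by at least one agent."""
    covered = set().union(*detected.values())
    return is_connected(neighbors) and targets <= covered


NEIGHBORS = build_neighbors(CHANNELS)
DEGREE = {agent: len(nbrs) for agent, nbrs in NEIGHBORS.items()}
TASK_MET = group_task_satisfied(NEIGHBORS, DETECTED_TARGETS, {102, 104})
```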
As shown in the figure, method 300 starts (S302) and parameters are detected (S304). For example, in this embodiment, an agent is able to detect its position, its velocity and its acceleration. Further, each agent is able to detect any other agents or any targets that are within its local detection area. This will be described in greater detail with reference to
As shown in
In agent 106, communication component 404 is arranged to communicate with controller 406 via a communication channel 420 and to bi-directionally communicate with communication component 412 of agent 110 via a communication channel 124. Further, detector 402 is arranged to communicate with controller 406 via a communication channel 418. Still further, controller 406 is arranged to communicate with performing component 408 via a communication channel 422.
In agent 110, communication component 412 is arranged to communicate with controller 414 via a communication channel 426 and to bi-directionally communicate with communication component 404 of agent 106 via communication channel 124. Further, detector 410 is arranged to communicate with controller 414 via a communication channel 424. Still further, controller 414 is arranged to communicate with performing component 416 via a communication channel 428.
In this example, detector 402, communication component 404, controller 406 and performing component 408 are illustrated as individual devices. However, in some embodiments, at least two of detector 402, communication component 404, controller 406 and performing component 408 may be combined as a unitary device. Further, in some embodiments, at least one of detector 402, communication component 404, controller 406 and performing component 408 may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such tangible computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. Non-limiting examples of tangible computer-readable media include physical storage and/or memory media such as RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. For information transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer may properly view the connection as a computer-readable medium. Thus, any such connection may be properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
In this example, detector 410, communication component 412, controller 414 and performing component 416 are illustrated as individual devices. However, in some embodiments, at least two of detector 410, communication component 412, controller 414 and performing component 416 may be combined as a unitary device. Further, in some embodiments, at least one of detector 410, communication component 412, controller 414 and performing component 416 may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
Detector 402 may be any known device or system that is operable to detect a parameter, non-limiting examples of which include position, velocity, acceleration, angular velocity, angular acceleration, sound, temperature, vibrations, pressure, biometrics, contents of surrounding atmosphere and combinations thereof.
As shown in the figure, detector 402 includes a plurality of parameter detectors, a sample of which are indicated as 1st parameter detector 502, 2nd parameter detector 504 and nth parameter detector 506.
In this example, the plurality of parameter detectors is illustrated as individual devices. However, in some embodiments, at least two of the plurality of parameter detectors may be combined as a unitary device. Further, in some embodiments, at least one of the plurality of parameter detectors may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
The parameter detectors may each be a known parameter detector that is able to detect a known parameter. For example, each parameter detector may be a known type of detector that is able to detect at least one of electric fields, electro-magnetic fields, position, velocity, acceleration, angular velocity, angular acceleration, geodetic position, sound, temperature, vibrations, pressure, biometrics, contents of surrounding atmosphere, a change in electric fields, a change in electro-magnetic fields, a change in velocity, a change in acceleration, a change in angular velocity, a change in angular acceleration, a change in geodetic position, a change in sound, a change in temperature, a change in vibrations, a change in pressure, a change in biometrics, a change in contents of surrounding atmosphere and combinations thereof. For purposes of discussion, let: 1st parameter detector 502 be able to detect the position of agent 106, p106; 2nd parameter detector 504 be able to detect the velocity of agent 106, v106; and nth parameter detector 506 be able to detect the acceleration of agent 106, a106.
In some non-limiting example embodiments, at least one of the parameter detectors of detector 402 may detect a respective parameter as an amplitude at an instant of time. In some non-limiting example embodiments, at least one of the parameter detectors of detector 402 may detect a respective parameter as a function over a period of time.
Each of the parameter detectors of detector 402 is able to generate a respective detected signal based on the detected parameter. Each of these detected signals may be provided to Kalman filter component 606 via communication channel 418, as information on the state of the environment.
Detector 410 of agent 110 operates in a manner similar to detector 402. It should be noted that, although not shown or discussed for purposes of brevity, each of agents 108, 112 and 114 has a respective detector that operates in a manner similar to detector 402.
Controller 406 may be any device or system that is operable to instruct agent 106 to perform tasks, as will be described in greater detail below.
Controller 414 of agent 110 operates in a manner similar to controller 406. It should be noted that, although not shown or discussed for purposes of brevity, each of agents 108, 112 and 114 has a respective controller that operates in a manner similar to controller 406.
Communication component 404 may be any device or system that is operable to communicate with another agent by known communication methods.
Communication component 412 of agent 110 operates in a manner similar to communication component 404. It should be noted that, although not shown or discussed for purposes of brevity, each of agents 108, 112 and 114 has a respective communication component that operates in a manner similar to communication component 404.
Performing component 408 may be any device or system that is operable to perform a predetermined task, a non-limiting example of which includes moving to a position, with a velocity and an acceleration.
Performing component 416 of agent 110 operates in a manner similar to performing component 408. It should be noted that, although not shown or discussed for purposes of brevity, each of agents 108, 112 and 114 has a respective performing component that operates in a manner similar to performing component 408.
Communication channels 418, 420, 422, 424, 426 and 428 may be any known type of communication channels, non-limiting examples of which include wired and wireless communication channels.
For purposes of this example, let the parameters being detected for an agent be the position, velocity, and acceleration of the agent. For example, for agent 106 in
Returning to
Returning to
For example, as shown in
Returning to
Returning to
Further, a controller within each agent will determine whether the current state of the agent is within a reward function. For example, as shown in
As shown in the figure, controller 406 includes a Kalman consensus filter (KCF) 602 and a task controller 604. KCF 602 includes a distributed average consensus (DAC) component 608 and a Kalman filter component 606.
DAC component 608 is arranged to communicate with communication component 404 (not shown) via communication channel 420. Kalman filter component 606 is arranged to communicate with detector 402 via communication channel 418. KCF 602 is arranged to communicate with task controller 604 via a communication channel 610. Task controller 604 is additionally arranged to communicate with performing component 408 (not shown) via communication channel 422.
In this example, KCF 602 and task controller 604 are illustrated as individual devices. However, in some embodiments, KCF 602 and task controller 604 may be combined as a unitary device. Further, in some embodiments, KCF 602 and task controller 604 may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
In this example, DAC component 608 and Kalman filter component 606 are illustrated as individual devices. However, in some embodiments, DAC component 608 and Kalman filter component 606 may be combined as a unitary device. Further, in some embodiments, DAC component 608 and Kalman filter component 606 may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
KCF 602 may be any device or system that is operable to output a parameter consensus based on parameter data provided by a plurality of agents, wherein the parameter consensus describes a state of the area of operation 100 at a current time, as will be described in more detail below.
Kalman filter component 606 may be any device or system that is operable to generate a local state signal associated with a parameter provided by a single agent, or as detected by agent 106, as will be described in more detail below.
DAC component 608 may be any device or system that is operable to generate a parameter consensus based on parameter data received from at least two different agents, as will be described in greater detail below.
Task controller 604 may be any device or system that is operable to generate a task instruction for performing component 408 based on parameters detected by detector 402 and parameters detected by and received from other agents.
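By way of non-limiting illustration, the flow of data between Kalman filter component 606, DAC component 608 and task controller 604 may be sketched as follows. This is a structural sketch only: the fixed smoothing gain stands in for a computed Kalman gain, the uniform average stands in for a weighted consensus, and the class names, the threshold policy and the numeric values are all hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


class KalmanFilterComponent:
    """Stand-in for component 606: smooths the agent's own detector signal."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain            # hypothetical fixed gain, not a computed Kalman gain
        self.estimate: Vector = []

    def local_state(self, measurement: Vector) -> Vector:
        if not self.estimate:
            self.estimate = list(measurement)
        else:
            self.estimate = [e + self.gain * (m - e)
                             for e, m in zip(self.estimate, measurement)]
        return self.estimate


class DACComponent:
    """Stand-in for component 608: averages the local and received estimates."""

    def consensus(self, own: Vector, received: Dict[int, Vector]) -> Vector:
        rows = [own] + list(received.values())
        return [sum(col) / len(rows) for col in zip(*rows)]


class Controller:
    """Stand-in for controller 406: filter, then consensus, then task selection."""

    def __init__(self, policy: Callable[[Vector], str]) -> None:
        self.kalman = KalmanFilterComponent()
        self.dac = DACComponent()
        self.policy = policy        # plays the role of task controller 604

    def step(self, measurement: Vector, received: Dict[int, Vector]) -> str:
        local = self.kalman.local_state(measurement)
        agreed = self.dac.consensus(local, received)
        return self.policy(agreed)


# Hypothetical policy: keep moving until the agreed value reaches 5.0.
controller = Controller(lambda state: "advance" if state[0] < 5.0 else "hold")
instruction = controller.step([2.0], {108: [4.0], 110: [6.0]})
```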
In this example, for purposes of discussion only, agents within area of operation 100 do not make any determinations at time t1. More specifically, for purposes of discussion, at time t1, each agent is merely detecting parameters and exchanging the data associated with the detected parameters.
As shown in
Agent column 702 lists the agents in area of operation 100 at time t1 as shown in
Detected parameters column 704 indicates what parameters an agent detects. In this example embodiment, the detectable parameters include: the position, pi, of an agent i; the velocity, vi, of agent i; the acceleration, ai, of agent i; and the position, pt, of a detectable target t. For example, the first entry in column 704, row 714 of table 700 is “p106, v106, a106 (t1),” which means that the detected information includes data of the position of agent 106, p106, the velocity of agent 106, v106, and the acceleration of agent 106, a106, at time t1.
Detected agents/target column 706 indicates which agents and/or targets are detected by each agent, respectively.
Transmits column 708 indicates what information is transmitted by a respective agent. For example, the first entry in column 708, row 714 of table 700 is “p106, v106, a106 (t1)(*1),” which means that the transmitted information includes data of the position of agent 106, p106, the velocity of agent 106, v106, and the acceleration of agent 106, a106, at time t1, wherein the information is provided with an identifier of (*1) for ease of future reference.
Receives column 710 indicates: the information that is received by a respective agent; the channel error parameter, γ, associated with the channel by which a respective agent receives the information; the time, t, the information is received; the detector error parameter, δ, associated with the source of the information; and an identifier, (*n), identifying the information. For example, the first entry in column 710, row 714 of table 700 is “p108, v108, a108 (δ108)(t1)(γ212)(*3),” which means that the received information includes data of the position of agent 108, p108, the velocity of agent 108, v108, and the acceleration of agent 108, a108, at time t1, wherein the information was received with detector error associated with the detector error parameter of agent 108, δ108, wherein the information was additionally received with channel error associated with communication channel 212 from agent 108, γ212, and wherein the information is provided with an identifier of (*3) for ease of future reference.
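By way of non-limiting illustration, an entry of receives column 710 may be represented as a small record that carries the quantities, the originating agent (for δ), the channel (for γ), the time and the identifier. The class and field names are hypothetical; the printed label follows the entry format quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReceivedRecord:
    quantities: Tuple[str, ...]  # e.g. ("p108", "v108", "a108")
    source_agent: int            # agent whose detectors produced the data (delta)
    channel: int                 # channel the data arrived on (gamma)
    time: str                    # time label, e.g. "t1"
    ident: int                   # identifier n of the (*n) tag

    def label(self) -> str:
        """Render the record in the style of a column 710 table entry."""
        names = ", ".join(self.quantities)
        return (f"{names} (δ{self.source_agent})({self.time})"
                f"(γ{self.channel})(*{self.ident})")


record = ReceivedRecord(("p108", "v108", "a108"), 108, 212, "t1", 3)
entry = record.label()
```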
It should be noted that many errors may be associated with any system. For purposes of discussion only, a channel error represented by channel error parameter, γ, and a detector error represented by detector error parameter, δ, are discussed herein. In some cases, error may be known or calculated. In accordance with aspects of the present disclosure, errors, including channel error and detector error, are accounted for with a Kalman consensus filter, whether or not such errors are known.
Determines column 712 indicates what each respective agent determines. For purposes of discussion, in this example, at time t1, each agent may detect parameters, may transmit information and may receive information. However, a determination is performed at a subsequent time, as will be described in further detail below.
Row 714 is for agent 106. As indicated in row 714, column 704, agent 106 detects its own position, p106, its own velocity, v106, and its acceleration, a106, at time t1. Further, agent 106 detects the position, p102, of target 102 at time t1.
As indicated in row 714, column 706, agent 106 additionally detects agents 108, 110, 112 and target 102. Returning to
Returning to
As indicated in row 714, column 710, agent 106 receives data (*3), data (*4), data (*5), data (*6) and data (*7). Data (*3) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 212 at time t1, and having a detector error parameter δ108. Communication channel 212 will have a channel error parameter γ212 associated with transmission errors, including Gaussian noise and channel interference. A detector error parameter, δ, accounts for detector errors of the originating agent, such as for example errors associated with the output of an accelerometer of the originating agent when detecting the acceleration of the originating agent. Detector error parameter δ108 accounts for detector errors in the detectors of agent 108, whereas channel error parameter γ212 accounts for channel errors in communication channel 212.
As will be described in more detail below, KCF 602 addresses errors that are introduced into the system, which include those errors that are attributed to the communication channel and the detector error parameter δ.
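By way of non-limiting illustration, the two error sources discussed above may be modeled as independent additive noise terms applied in sequence, first at the originating detector and then on the communication channel. Treating δ and γ as standard deviations of zero-mean Gaussian noise is an assumption made only for this sketch, as are the function name and the numeric values.

```python
import random
from statistics import mean


def received_value(true_value: float, detector_sigma: float,
                   channel_sigma: float, rng: random.Random) -> float:
    """Apply detector noise at the source, then channel noise in transit."""
    detected = true_value + rng.gauss(0.0, detector_sigma)
    return detected + rng.gauss(0.0, channel_sigma)


# Hypothetical case: a true position of 10.0 reported many times over one channel.
rng = random.Random(0)
samples = [received_value(10.0, 0.1, 0.2, rng) for _ in range(10000)]
sample_mean = mean(samples)
```

Because both noise terms are zero-mean in this model, averaging many received values recovers the true value, which is the property a filtering stage can exploit.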
As further indicated in row 714, column 710, data (*4) is the position data of agent 110, p110, the velocity data of agent 110, v110, and the acceleration data of agent 110, a110, as received from agent 110 via communication channel 214 at time t1, and having a channel error parameter γ214 and a detector error parameter δ110. Detector error parameter δ110 accounts for detector errors in the detectors of agent 110, whereas channel error parameter γ214 accounts for channel errors in communication channel 214.
Data (*5) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 216 at time t1, and having a channel error parameter γ216 and a detector error parameter δ112. Detector error parameter δ112 accounts for detector errors in the detectors of agent 112, whereas channel error parameter γ216 accounts for channel errors in communication channel 216.
Data (*6) is the position data of target 104, p104, as received from agent 108 via communication channel 212, having a channel error parameter γ212 and detector error parameter δ108. As will be described below, agent 108 detects the position of target 104 because target 104 is within local detection area 204 (please see
Data (*7) is the position data of target 104, p104, as received from agent 110 via communication channel 214, having a channel error parameter γ214 and detector error parameter δ110. In this case, as shown in
As indicated in row 716, column 704, agent 108 detects its own position, p108, its own velocity, v108, and its acceleration, a108, at time t1. Further, agent 108 detects the position, p104, of target 104 at time t1.
As indicated in row 716, column 706, agent 108 additionally detects agents 106, 110, 112, 114 and target 104. Returning to
Returning to
As indicated in row 716, column 710, agent 108 receives data (*10), data (*11), data (*12), data (*13), data (*14), data (*15) and data (*16). Data (*10) is the position data of agent 106, p106, the velocity data of agent 106, v106, and the acceleration data of agent 106, a106, as received from agent 106 via communication channel 212 at time t1, having a channel error parameter γ212 and having detector error parameter δ106. In this example, detector error parameter δ106 accounts for detector errors in the detectors of agent 106. Further, in this example, channel error parameter γ212 accounts for channel errors in communication channel 212.
Data (*11) is the position data of agent 110, p110, the velocity data of agent 110, v110, and the acceleration data of agent 110, a110, as received from agent 110 via communication channel 218 at time t1, having a channel error parameter γ218 and having detector error parameter δ110. In this example, channel error parameter γ218 accounts for channel errors in communication channel 218.
Data (*12) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 220 at time t1, having a channel error parameter γ220 and having detector error parameter δ112. In this example, channel error parameter γ220 accounts for channel errors in communication channel 220.
Data (*13) is the position data of agent 114, p114, the velocity data of agent 114, v114, and the acceleration data of agent 114, a114, as received from agent 114 via communication channel 222 at time t1, having a channel error parameter γ222 and having a detector error parameter δ114. In this example, detector error parameter δ114 accounts for detector errors in the detectors of agent 114, whereas channel error parameter γ222 accounts for channel errors in communication channel 222.
Data (*14) is the position data of target 102, p102, as received from agent 106 via communication channel 212 with detector error parameter δ106 and with channel error parameter γ212.
Data (*15) is the position data of target 104, p104, as received from agent 110 via communication channel 218 with detector error parameter δ110 and with channel error parameter γ218.
Data (*16) is the position data of target 104, p104, as received from agent 114 via communication channel 222 with detector error parameter δ114 and with channel error parameter γ222.
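By way of non-limiting illustration, the target position data that a given agent receives from its neighbors may be derived from the channel layout and the local detections of this example embodiment. The dictionary and function names are hypothetical; the channel endpoints and detected targets are taken from the description above.

```python
from typing import Dict, List, Set, Tuple

# Channel number -> the two agents it connects.
CHANNELS: Dict[int, Tuple[int, int]] = {
    212: (106, 108), 214: (106, 110), 216: (106, 112),
    218: (108, 110), 220: (108, 112), 222: (108, 114), 224: (112, 114),
}
# Agent -> targets inside its local detection area.
DETECTED_TARGETS: Dict[int, Set[int]] = {
    106: {102}, 108: {104}, 110: {104}, 112: set(), 114: {104},
}


def relayed_targets(agent: int) -> List[Tuple[int, int, int]]:
    """List (target, source agent, channel) for each target position relayed to `agent`."""
    relayed = []
    for channel, (a, b) in sorted(CHANNELS.items()):
        if agent in (a, b):
            source = b if agent == a else a
            for target in sorted(DETECTED_TARGETS[source]):
                relayed.append((target, source, channel))
    return relayed


to_agent_108 = relayed_targets(108)
to_agent_106 = relayed_targets(106)
```

For agent 108 the result lists target 102 from agent 106 over channel 212 and target 104 from agents 110 and 114 over channels 218 and 222, in line with data (*14), (*15) and (*16).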
As indicated in row 718, column 704, agent 110 detects its own position, p110, its own velocity, v110, and its acceleration, a110, at time t1. Further, agent 110 detects the position, p104, of target 104 at time t1.
As indicated in row 718, column 706, agent 110 additionally detects agents 106, 108 and target 104. Returning to
Returning to
As indicated in row 718, column 710, agent 110 receives data (*19), data (*20), data (*21) and data (*22). Data (*19) is the position data of agent 106, p106, the velocity data of agent 106, v106, and the acceleration data of agent 106, a106, as received from agent 106 via communication channel 214 at time t1, having a channel error parameter γ214 and having a detector error parameter δ106. In this example, δ106, accounts for detector errors in the detectors of agent 106.
Data (*20) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 110 via communication channel 218 at time t1, having a channel error parameter γ218 and having detector error parameter δ108.
Data (*21) is the position data of target 102, p102, as received from agent 106 via communication channel 214 at time t1, having a channel error parameter γ214 and having detector error parameter δ106.
Data (*22) is the position data of target 104, p104, as received from agent 108 via communication channel 218 at time t1, having a channel error parameter γ218 and having detector error parameter δ108.
As indicated in row 720, column 704, agent 112 detects its own position, p112, its own velocity, v112, and its acceleration, a112, at time t1.
As indicated in row 720, column 706, agent 112 additionally detects agents 106, 108 and 114. Returning to
Returning to
As indicated in row 720, column 710, agent 112 receives data (*24), data (*25), data (*26), data (*27), data (*28) and data (*29). Data (*24) is the position data of agent 106, p106, the velocity data of agent 106, v106, and the acceleration data of agent 106, a106, as received from agent 106 via communication channel 216 at time t1, having a channel error parameter γ216 and having detector error parameter δ106.
Data (*25) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 220 at time t1, having a channel error parameter γ220 and having detector error parameter δ108.
Data (*26) is the position data of agent 114, p114, the velocity data of agent 114, v114, and the acceleration data of agent 114, a114, as received from agent 114 via communication channel 224 at time t1, having a channel error parameter γ224 and having detector error parameter δ114.
Data (*27) is the position data of target 102, p102, as received from agent 106 via communication channel 216 at time t1, having a channel error parameter γ216 and having detector error parameter δ106.
Data (*28) is the position data of target 104, p104, as received from agent 108 via communication channel 220 at time t1, having a channel error parameter γ220 and having detector error parameter δ108.
Data (*29) is the position data of target 104, p104, as received from agent 114 via communication channel 224 at time t1, having a channel error parameter γ224 and having detector error parameter δ114.
As indicated in row 722, column 704, agent 114 detects its own position, p114, its own velocity, v114, and its acceleration, a114, at time t1. Further, agent 114 detects the position, p104, of target 104 at time t1.
As indicated in row 722, column 706, agent 114 additionally detects agents 108 and 112 and target 104. Returning to
Returning to
As indicated in row 722, column 710, agent 114 receives data (*32), data (*33) and data (*34). Data (*32) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 222 at time t1, having a channel error parameter γ222 and having detector error parameter δ108.
Data (*33) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 224 at time t1, having a channel error parameter γ224 and having detector error parameter δ112.
Data (*34) is the position data of target 104, p104, as received from agent 108 via communication channel 222 at time t1, having a channel error parameter γ222 and having detector error parameter δ108.
Please consider the following items of note with respect to table 700 of
Among the data received by agent 106, it receives position data (*6 and *7) of target 104 from two different agents. These two different data of the position of target 104 will be addressed by KCF 602 at time t2, as will be described in greater detail below.
Similarly, among the data received by agent 108, it receives position data (*15 and *16) of target 104 from two different agents. These two different data of the position of target 104 will be addressed by the KCF (not shown) of agent 108 at time t2.
Further, among the data received by agent 112, it receives position data (*28 and *29) of target 104 from two different agents. These two different data of the position of target 104 will be addressed by the KCF (not shown) of agent 112 at time t2.
With respect to the operation of a controller of an agent, the operation of controller 406 will be described as an example with reference to
As shown in the figure, graph 800 includes a y-axis 802, an x-axis 804, a plurality of data points 806, 808, 810, 812 and 814 and a function 816. Graph 800 is provided for discussion purposes, wherein y-axis 802 corresponds to a position on some predetermined axis from a predetermined origin and x-axis 804 corresponds to the time at which a position of target 104 is detected by agent 108.
Kalman filter component 606 keeps track of the estimated position of target 104 using data provided by agent 108, together with the uncertainty of each estimate from agent 108. As data point 806 is the first data point, it has the largest uncertainty, which is represented by data point 806 being the largest circle. The uncertainty of the estimate from agent 108 decreases over time as a result of processing by Kalman filter component 606, which is represented by the size of data points 808, 810, 812 and 814 decreasing over time. The estimate is updated based on previous estimated positions of target 104 from agent 108. Accordingly, each step in time provides a more accurate estimate of the actual position of target 104 from agent 108. Function 816 is merely provided to show a general relationship of the estimated position of target 104 as a function of time.
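The shrinking uncertainty described above can be sketched with a one-dimensional Kalman filter. This is a minimal sketch under stated assumptions, not the disclosed Kalman filter component 606: the function names, the static-position model, the measurement variance and the five sample measurements are all hypothetical.

```python
def kalman_step(estimate, variance, measurement, meas_variance, process_variance=0.0):
    """One predict/update cycle of a scalar Kalman filter (static-position model)."""
    # Predict: the position is assumed constant, so only the uncertainty may grow.
    variance = variance + process_variance
    # Update: blend the prediction and the new measurement by their uncertainties.
    gain = variance / (variance + meas_variance)
    estimate = estimate + gain * (measurement - estimate)
    variance = (1.0 - gain) * variance
    return estimate, variance


def track(measurements, meas_variance):
    """Return the (estimate, variance) pair after each measurement."""
    # Assumption: the first measurement initializes the filter.
    estimate, variance = measurements[0], meas_variance
    history = [(estimate, variance)]
    for z in measurements[1:]:
        estimate, variance = kalman_step(estimate, variance, z, meas_variance)
        history.append((estimate, variance))
    return history


# Five hypothetical position measurements, one per data point 806 to 814.
history = track([10.4, 9.7, 10.1, 9.9, 10.0], meas_variance=1.0)
variances = [v for _, v in history]
```

In this sketch the variance falls at every step, which corresponds to the circles of data points 806 through 814 becoming smaller over time.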
As mentioned, graph 800 is provided merely for discussion purposes. Kalman filter component 606 uses input for any detected parameter to determine the state of the detected environment. As such, in the example embodiment for the system discussed with reference to
As shown in the figure, graph 900 includes a y-axis 902, an x-axis 904, a plurality of data points 806, 808, 810, 812 and 814 (each provided as a dotted line), function 816 (provided as a dotted line), a plurality of data points 906, 908, 910, 912 and 914, and a function 916.
Kalman filter component 606 keeps track of the estimated position of target 104 using data provided by agent 110, together with the uncertainty of each estimate from agent 110. As data point 906 is the first data point, it has the largest uncertainty, which is represented by data point 906 being the largest circle. The uncertainty of the estimate from agent 110 decreases over time as a result of processing by Kalman filter component 606, which is represented by the size of data points 908, 910, 912 and 914 decreasing over time. The estimate is updated based on previous estimated positions of target 104 from agent 110. Accordingly, each step in time provides a more accurate estimate of the actual position of target 104 from agent 110. Function 916 is merely provided to show a general relationship of the estimated position of target 104 as a function of time.
As shown in graph 900, the estimated position of target 104 using data provided by agent 108 as illustrated by function 816 is slightly different from the estimated position of target 104 using data provided from agent 110 as illustrated by function 916. These differences are the result of the differences between channel error parameters and detector error parameters of the data. From the perspective of agent 106, it might not be clear which of the two distinct position estimates of target 104, one from agent 108 as represented by function 816 and one from agent 110 as represented by function 916, more accurately reflects the actual position of target 104. KCF 602 addresses this issue, as will be discussed with additional reference to
As shown in the figure, graph 1000 includes a y-axis 1002, an x-axis 1004, a plurality of data points 806, 808, 810, 812 and 814 (each provided as a dotted line), function 816 (provided as a dotted line), a plurality of data points 906, 908, 910, 912 and 914 (each provided as a dashed line), function 916 (provided as a dashed line), a plurality of data points 1006, 1008, 1010, 1012 and 1014, and a function 1016.
Returning to
In an example embodiment, DAC component 608 averages data of the position of target 104 that was received from agent 108 and data of the position of target 104 that was received from agent 110 to determine the estimated position of target 104. Such an average would be an average consensus derived from the distributed agents 108 and 110.
In other example embodiments, DAC component 608 uses predetermined weighting factors. In particular, a respective weighting factor to be multiplied by a received data may be based on many predetermined factors.
In some embodiments, a predetermined weighting factor may be based on the number of instances that a parameter is measured. For example, if agent 108 were to provide more instances of a measured position of target 104 as compared to the number of instances that agent 110 may provide the measured position of target 104, then the predetermined weighting factor for agent 108 may be larger than the predetermined weighting factor for agent 110.
In some embodiments, a predetermined weighting factor may be based on the distance from which a parameter is measured. For example, if agent 108 were further from the measured position of target 104 as compared to the distance between agent 110 and the measured position of target 104, then the predetermined weighting factor for agent 108 may be smaller than the predetermined weighting factor for agent 110.
In some embodiments, a predetermined weighting factor may be based on the agents themselves. For example, if agent 108 has a position measuring device that is of lesser quality or precision as compared to the position measuring device of agent 110, then the predetermined weighting factor for agent 108 may be smaller than the predetermined weighting factor for agent 110.
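The plain average and the weighted variants discussed above can be sketched as a single weighted-average routine. This is an illustrative sketch only: the function name, the scalar positions and the example weights are assumptions, and the source does not specify how the predetermined weighting factors are chosen or normalized.

```python
def weighted_consensus(estimates, weights):
    """Weighted average of position estimates reported by different agents."""
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * x for w, x in zip(weights, estimates)) / total


# Hypothetical scalar positions of target 104 reported by agents 108 and 110.
from_agent_108, from_agent_110 = 4.0, 6.0

# Equal weights reproduce the plain average consensus.
plain = weighted_consensus([from_agent_108, from_agent_110], [1.0, 1.0])

# Assumed weights: agent 110 is closer or has the better detector, so it counts more.
weighted = weighted_consensus([from_agent_108, from_agent_110], [1.0, 3.0])
```

With equal weights the result is the midpoint of the two reports; with the larger weight on agent 110 the result moves toward the report of agent 110.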
Returning to
In particular, in this example, KCF 602 uses data associated with data point 806 from agent 108 and data associated with data point 906 from agent 110 to estimate the position of target 104 as indicated by data point 1006, which in this case lies between data point 806 and data point 906. Similarly, KCF 602 uses: data associated with data point 808 from agent 108 and data associated with data point 908 from agent 110 to estimate the position of target 104 as indicated by data point 1008, which in this case lies between data point 808 and data point 908; data associated with data point 810 from agent 108 and data associated with data point 910 from agent 110 to estimate the position of target 104 as indicated by data point 1010, which in this case lies between data point 810 and data point 910; data associated with data point 812 from agent 108 and data associated with data point 912 from agent 110 to estimate the position of target 104 as indicated by data point 1012, which in this case lies between data point 812 and data point 912; and data associated with data point 814 from agent 108 and data associated with data point 914 from agent 110 to estimate the position of target 104 as indicated by data point 1014, which in this case lies between data point 814 and data point 914.
The estimated position of target 104 from KCF 602 is illustrated by function 1016, which lies between the estimated position of target 104 using data provided from agent 108 as illustrated by function 816 and the estimated position of target 104 using data provided from agent 110 as illustrated by function 916.
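One way to see why each fused data point falls between the two source data points is to combine the two per-agent estimates at each time step using weights inversely proportional to their variances. This is a sketch of inverse-variance weighting, a standard fusion rule swapped in for illustration; the source does not state that KCF 602 uses this rule, and the function names and sample values are hypothetical.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance fusion of two estimates of the same quantity."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var


def fuse_tracks(track_a, track_b):
    """Fuse two tracks of (estimate, variance) pairs, one time step at a time."""
    return [fuse(ea, va, eb, vb) for (ea, va), (eb, vb) in zip(track_a, track_b)]


# Hypothetical tracks standing in for functions 816 (agent 108) and 916 (agent 110).
track_108 = [(4.0, 1.0), (4.2, 0.5), (4.3, 0.25)]
track_110 = [(6.0, 3.0), (5.6, 1.5), (5.1, 0.75)]
fused_track = fuse_tracks(track_108, track_110)
```

Because both weights are positive, every fused estimate is a convex combination of the two inputs and therefore lies between them, and the fused variance is smaller than either input variance.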
Returning to
As shown in the figure, task controller 604 includes an actor component 1102 and a critic component 1104.
Actor component 1102 is arranged to communicate with KCF 602 (not shown) via communication channel 610, to communicate with critic component 1104 via a communication channel 1106 and to communicate with performing component 408 (not shown) via communication channel 422. Critic component 1104 is additionally arranged to communicate with KCF 602 (not shown) via communication channel 610.
In this example, actor component 1102 and critic component 1104 are illustrated as individual devices. However, in some embodiments, actor component 1102 and critic component 1104 may be combined as a unitary device. Further, in some embodiments, at least one of actor component 1102 and critic component 1104 may be implemented as a computer having tangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
Actor component 1102 takes in the current environment state from KCF 602 via communication channel 610 and determines the best action for agent 106 to take to accomplish the predetermined task so as to reach a goal state. In this example, the current environment includes the position, velocity and acceleration of all agents and the position of each target within area of operation 100. The determined action is output as a task instruction on communication channel 422. In this example, a task instruction may take the form of an instruction for performing component 408 to move agent 106 to a new position (for example, as will be discussed below with reference to
Critic component 1104 plays the evaluation role by taking in the current environment state from KCF 602 via communication channel 610 and then outputting a score, based on a predetermined reward function, which represents how good the task instruction is based on the current environment state. The score is output to actor component 1102 via communication channel 1106. The scores provided by critic component 1104, over time, train actor component 1102 in order to eventually have all agents arrive at the goal state.
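The interplay between an actor and a critic can be sketched with a tabular one-step actor-critic on a toy problem. This is a minimal sketch under stated assumptions and not the disclosed task controller 604: the five-position line world, the reward values, the learning rates and all names are hypothetical, and the temporal-difference error is used here as the critic's score.

```python
import math
import random

N_STATES = 5         # positions 0..4 on a line; position 4 is the goal state
ACTIONS = (-1, +1)   # move left or move right


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=1000, alpha_actor=0.1, alpha_critic=0.2, gamma=0.95, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences
    values = [0.0] * N_STATES                      # critic: state-value estimates
    for _ in range(episodes):
        state = 0
        for _ in range(50):                        # cap on steps per episode
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            done = nxt == N_STATES - 1
            reward = 1.0 if done else -0.01        # assumed reward function
            # Critic: score the action with the temporal-difference error.
            target = reward + (0.0 if done else gamma * values[nxt])
            score = target - values[state]
            values[state] += alpha_critic * score
            # Actor: shift preferences toward actions the critic scored well.
            for b in range(len(ACTIONS)):
                grad = (1.0 if b == action else 0.0) - probs[b]
                prefs[state][b] += alpha_actor * score * grad
            if done:
                break
            state = nxt
    return prefs, values


prefs, values = train()
prob_right = [softmax(prefs[s])[1] for s in range(N_STATES - 1)]
```

After training, `prob_right` holds, for each non-goal position, the probability that the actor moves toward the goal, which gives a simple way to inspect how the critic's scores have shaped the actor over time.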
Returning to
For purposes of discussion, let actor component 1102 generate a task instruction such that performing component 408 will move agent 106 to a new position, as indicated in
Returning to
As shown in
Agent 106 is positioned such that agent 108 and agent 110 are in local detection area 1302. Agent 108 is positioned such that agent 106, agent 110, agent 112, agent 114 and target 104 are in local detection area 1304. Agent 110 is positioned such that agent 108, agent 114 and target 104 are in local detection area 1306. Agent 112 is positioned such that agent 106, agent 108 and target 102 are in local detection area 1308. Agent 114 is positioned such that agent 108, agent 110 and target 104 are in local detection area 1310.
Agent 106 is operable to bi-directionally communicate with agent 108 via a communication channel 1314 and to bi-directionally communicate with agent 112 via a communication channel 1312. Agent 108 is additionally operable to bi-directionally communicate with agent 110 via a communication channel 1316, to bi-directionally communicate with agent 112 via a communication channel 1318 and to bi-directionally communicate with agent 114 via a communication channel 1320. Agent 110 is additionally operable to bi-directionally communicate with agent 114 via a communication channel 1322.
As shown in
The columns of
As shown in
As indicated in row 1414, column 1406, agent 106 detects agents 108, 112 and target 102. Returning to
Returning to
As indicated in row 1414, column 1410, agent 106 receives data (*37), data (*38), data (*39), data (*40) and data (*41). Data (*37) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 1314 at time t2, having a channel error parameter γ1314 and having detector error parameter δ108.
Data (*38) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 1312 at time t2, having a channel error parameter γ1312 and having detector error parameter δ112.
Data (*39) is the position data of target 102, p102, as received from agent 112 via communication channel 1312, having channel error parameter γ1312 and detector error parameter δ112.
Data (*40) is the combination of data (*10)-(*16), as received from agent 108 via communication channel 1314. Data (*40) is therefore, returning to
It should be noted that data (*40) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*41) is the combination of data (*24)-(*29), as received from agent 112 via communication channel 1312. Data (*41) is therefore, returning to
Similar to data (*40) discussed above, data (*41) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Column 1412 indicates the information that is determined by a respective agent. It should be noted that the information that is determined at time tn is based on information that is received by the agent at a previous time tn−1. For example, the first entry in column 1412, row 1414 of table 1400 is “S106(*1),” which means that agent 106 determines the state of agent 106, S106, based on the data (*1), which is identified in column 708, row 714, of table 700 of
As further indicated in row 1414, column 1412, agent 106 additionally determines: the state of agent 108, S108, based on data (*3); the state of agent 110, S110, based on data (*4); the state of agent 112, S112, based on data (*5); the state of target 102, S102, based on data (*2); and the state of target 104, S104, based on data (*6) and data (*7), which agent 106 had previously received at time t1, as discussed above with reference to
With respect to determining the state of target 104, S104, please return to column 710, row 714 of
Returning to
As indicated in row 1416, column 1406, agent 108 additionally detects agents 106, 110, 112 and 114. Returning to
Returning to
As indicated in row 1416, column 1410, agent 108 receives data (*43), data (*44), data (*45), data (*46), data (*47), data (*48), data (*49), data (*50), data (*51), data (*52), data (*53) and data (*54).
Data (*43) is the position data of agent 106, p106, the velocity data of agent 106, v106, and the acceleration data of agent 106, a106, as received from agent 106 via communication channel 1314 at time t2, having channel error parameter γ1314 and having detector error parameter δ106.
Data (*44) is the position data of agent 110, p110, the velocity data of agent 110, v110, and the acceleration data of agent 110, a110, as received from agent 110 via communication channel 1316 at time t2, having channel error parameter γ1316 and having detector error parameter δ110.
Data (*45) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 1318 at time t2, having a channel error parameter γ1318 and having detector error parameter δ112.
Data (*46) is the position data of agent 114, p114, the velocity data of agent 114, v114, and the acceleration data of agent 114, a114, as received from agent 114 via communication channel 1320 at time t2, having a channel error parameter γ1320 and having detector error parameter δ114.
Data (*47) is the position data of target 102, p102, as received from agent 106 via communication channel 1314, having channel error parameter γ1314 and having detector error parameter δ106.
Data (*48) is the position data of target 104, p104, as received from agent 110 via communication channel 1316, having channel error parameter γ1316 and having detector error parameter δ110.
Data (*49) is the position data of target 102, p102, as received from agent 112 via communication channel 1318, having channel error parameter γ1318 and having detector error parameter δ112.
Data (*50) is the position data of target 104, p104, as received from agent 114 via communication channel 1320, having channel error parameter γ1320 and having detector error parameter δ114.
Data (*51) is the combination of data (*3)-(*7), as received from agent 106 via communication channel 1314. Data (*51) is therefore, returning to
As discussed above with reference to data (*41), data (*51) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*52) is the combination of data (*19)-(*22), as received from agent 110 via communication channel 1316. Data (*52) is therefore, returning to
Similar to data (*51) discussed above, data (*52) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*53) is the combination of data (*24)-(*29), as received from agent 112 via communication channel 1318. Data (*53) is therefore, returning to
Similar to data (*52) discussed above, data (*53) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*54) is the combination of data (*32)-(*34), as received from agent 114 via communication channel 1320. Data (*54) is therefore, returning to
Similar to data (*53) discussed above, data (*54) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
As indicated in row 1416, column 1412, agent 108 determines: the state of agent 106, S106, based on data (*10); the state of agent 110, S110, based on data (*11); the state of agent 112, S112, based on data (*12); the state of agent 114, S114, based on data (*13); the state of target 102, S102, based on data (*14); and the state of target 104, S104, based on data (*9), which agent 108 detected itself as discussed above with reference to
With respect to determining the state of target 104, S104, please return to column 710, row 716 of
As shown in
As indicated in row 1418, column 1406, agent 110 additionally detects agents 108 and 114 and target 104. Returning to
Returning to
As indicated in row 1418, column 1410, agent 110 receives data (*57), data (*58), data (*59), data (*60) and data (*61).
Data (*57) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 1316 at time t2, having channel error parameter γ1316 and having detector error parameter δ108.
Data (*58) is the position data of agent 114, p114, the velocity data of agent 114, v114, and the acceleration data of agent 114, a114, as received from agent 114 via communication channel 1322 at time t2, having channel error parameter γ1322 and having detector error parameter δ114.
Data (*59) is the position data of target 104, p104, as received from agent 114 via communication channel 1322, having channel error parameter γ1322 and having detector error parameter δ114.
Data (*60) is the combination of data (*10)-(*16), as received from agent 108 via communication channel 1316. Data (*60) is therefore, returning to
As discussed above with reference to data (*54), data (*60) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*61) is the combination of data (*32)-(*34), as received from agent 114 via communication channel 1322. Data (*61) is therefore, returning to
Similar to data (*60) discussed above, data (*61) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
As indicated in row 1418, column 1412, agent 110 determines: the state of agent 106, S106, based on data (*19); the state of agent 108, S108, based on data (*20); the state of target 102, S102, based on data (*21); and the state of target 104, S104, based on data (*18), which agent 110 detected itself as discussed above with reference to
As shown in
As indicated in row 1420, column 1406, agent 112 additionally detects agents 106 and 108 and target 102. Returning to
Returning to
As indicated in row 1420, column 1410, agent 112 receives data (*64), data (*65), data (*66), data (*67) and data (*68).
Data (*64) is the position data of agent 106, p106, the velocity data of agent 106, v106, and the acceleration data of agent 106, a106, as received from agent 106 via communication channel 1312 at time t2, having channel error parameter γ1312 and having detector error parameter δ106.
Data (*65) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 1318 at time t2, having channel error parameter γ1318 and having detector error parameter δ108.
Data (*66) is the position data of target 102, p102, as received from agent 106 via communication channel 1312, having channel error parameter γ1312 and having detector error parameter δ106.
Data (*67) is the combination of data (*3)-(*7), as received from agent 106 via communication channel 1312. Data (*67) is therefore, returning to
Similar to data (*61) discussed above, data (*67) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*68) is the combination of data (*10)-(*16), as received from agent 108 via communication channel 1318. Data (*68) is therefore, returning to
As discussed above with reference to data (*67), data (*68) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
As indicated in row 1420, column 1412, agent 112 determines: the state of agent 106, S106, based on data (*24); the state of agent 108, S108, based on data (*25); the state of agent 114, S114, based on data (*26); the state of target 102, S102, based on data (*27); and the state of target 104, S104, based on data (*28) and data (*29), which agent 112 had previously received at time t1, as discussed above with reference to
With respect to determining the state of target 104, S104, please return to column 710, row 720 of
As shown in
As indicated in row 1422, column 1406, agent 114 additionally detects agents 108 and 110 and target 104. Returning to
Returning to
As indicated in row 1422, column 1410, agent 114 receives data (*71), data (*72), data (*73), data (*74) and data (*75).
Data (*71) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 1320 at time t2, having channel error parameter γ1320 and having detector error parameter δ108.
Data (*72) is the position data of agent 110, p110, the velocity data of agent 110, v110, and the acceleration data of agent 110, a110, as received from agent 110 via communication channel 1322 at time t2, having channel error parameter γ1322 and having detector error parameter δ110.
Data (*73) is the position data of target 104, p104, as received from agent 108 via communication channel 1320, having channel error parameter γ1320 and having detector error parameter δ108.
Data (*74) is the combination of data (*10)-(*16), as received from agent 108 via communication channel 1320. Data (*74) is therefore, returning to
As discussed above with reference to data (*68), data (*74) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*75) is the combination of data (*19)-(*22), as received from agent 110 via communication channel 1322. Data (*75) is therefore, returning to
Similar to data (*74) discussed above, data (*75) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
As indicated in row 1422, column 1412, agent 114 determines: the state of agent 108, S108, based on data (*32); the state of agent 112, S112, based on data (*33); and the state of target 104, S104, based on data (*31), which agent 114 detected itself as discussed above with reference to
Please consider the following items of note with respect to table 1400 of
Among the data received by agent 106, data (*40) and data (*41) will be used by DAC component 608, at time t3, which will be discussed in more detail below with reference to
Further, similar to that discussed above with reference to
Still further, as mentioned previously with reference to
As shown in
As shown in
In
In
As shown in
As shown in
The columns of
As shown in
As indicated in row 1714, column 1706, agent 106 detects agents 108, 110 and 112. Returning to
Returning to
As indicated in row 1714, column 1710, agent 106 receives data (*77), data (*78), data (*79), data (*80), data (*81), data (*82), data (*83), data (*84) and data (*85). Data (*77) is the position data of agent 108, p108, the velocity data of agent 108, v108, and the acceleration data of agent 108, a108, as received from agent 108 via communication channel 1612 at time t3, having a channel error parameter γ1612 and having detector error parameter δ108.
Data (*78) is the position data of agent 110, p110, the velocity data of agent 110, v110, and the acceleration data of agent 110, a110, as received from agent 110 via communication channel 1606 at time t3, having a channel error parameter γ1606 and having detector error parameter δ110.
Data (*79) is the position data of agent 112, p112, the velocity data of agent 112, v112, and the acceleration data of agent 112, a112, as received from agent 112 via communication channel 1616 at time t3, having a channel error parameter γ1616 and having detector error parameter δ112.
Data (*80) is the position data of target 104, p104, as received from agent 108 via communication channel 1612, having channel error parameter γ1612 and detector error parameter δ108.
Data (*81) is the position data of target 104, p104, as received from agent 110 via communication channel 1606, having channel error parameter γ1606 and detector error parameter δ110.
Data (*82) is the position data of target 102, p102, as received from agent 112 via communication channel 1616, having channel error parameter γ1616 and detector error parameter δ112.
Data (*83) is the combination of data (*43)-(*54), as received from agent 108 via communication channel 1612. Data (*83) is therefore, returning to
It should be noted that data (*83) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Data (*84) is the combination of data (*57)-(*61), as received from agent 110 via communication channel 1606. Data (*84) is therefore, returning to
Similar to data (*83) discussed above, data (*84) will have stacked errors associated with the communication channels. In particular, as discussed above and shown in
Column 1712 indicates the information that is determined by a respective agent. It should be noted that the information that is determined at time tn is based on information that is received by the agent at a previous time tn−1. For example, the first entry in column 1712, row 1714 of table 1700 is “S106(*1, *76),” which means that agent 106 determines the state of agent 106, S106, based on the data (*1), which is identified in column 708, row 714, of table 700 of
As further indicated in row 1714, column 1712, agent 106 additionally determines: the state of agent 108, S108, based on data (*3), data (*37) and data (*25); the state of agent 110, S110, based on data (*4); the state of agent 112, S112, based on data (*5), data (*38) and data (*12); the state of agent 114, S114, based on data (*13) and data (*26); the state of target 102, S102, based on data (*39) and data (*27); and the state of target 104, S104, based on data (*6), data (*7), data (*15), data (*16), data (*28) and data (*29), which agent 106 had previously received at time t2, as discussed above with reference to
With respect to determining the state of agent 108, S108, in a manner similar to the determining of the state of target 104, S104, discussed above with reference to
For the purpose of brevity, the remainder of the information listed in
Please consider the following items of note with respect to table 1700 of
In this example, as shown in column 1712 of table 1700 of
Returning to
The above-discussed embodiment is provided merely for purposes of describing aspects of the present disclosure. Other multi-agent tasks may be performed in accordance with aspects of the present disclosure.
As shown in the figure, an area of operation 1800 includes a plurality of agents, a sample of which is indicated as agent 1802, a corresponding local area of operation for each respective agent, a sample of which is indicated as dotted circle 1804, and a number of connections, a sample of which is indicated as connection 1806.
In the Graph Connect task of
In this task, agents must localize one another and congregate; however, once they are connected they are also incentivized to spread as far as possible without losing their established links.
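By way of non-limiting illustration, one possible test of whether the agents have formed a single connected graph is sketched below. The sketch assumes that two agents are linked whenever they lie within a common link range of one another; the function name, the link-range parameter and the coordinates are hypothetical and are not taken from the disclosure.

```python
import math
from collections import deque


def is_connected(positions, link_range):
    """Return True if every agent can reach every other agent over links.

    Assumption (not from the disclosure): a link exists between two agents
    when their Euclidean distance is at most link_range.
    """
    count = len(positions)
    if count == 0:
        return True
    visited = {0}
    queue = deque([0])
    while queue:
        current = queue.popleft()
        for other in range(count):
            if other in visited:
                continue
            if math.dist(positions[current], positions[other]) <= link_range:
                visited.add(other)
                queue.append(other)
    # The graph is connected only if the search reached every agent.
    return len(visited) == count


# Hypothetical layouts: a chain of three agents, then two distant agents.
print(is_connected([(0, 0), (1, 0), (2, 0)], 1.0))
print(is_connected([(0, 0), (3, 0)], 1.0))
```

A check of this kind would allow a spreading incentive to be applied only while the established links are retained.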
As shown in the figure, an area of operation 1900 includes a plurality of agents, a sample of which is indicated as agent 1902, a corresponding local area of operation for each respective agent, a sample of which is indicated as dotted circle 1904, a number of connections, a sample of which is indicated as connection 1906, a star 1908 and a star 1910.
In the Ad-Hoc Link task of
where δ(la, lb) is the weight of the shortest path connecting la, lb.
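By way of non-limiting illustration, the weight of a shortest path such as δ(la, lb) may be computed with Dijkstra's algorithm, as sketched below. The disclosure does not state which algorithm is used; the graph representation, node names and edge weights in the sketch are hypothetical, and the edge weights are assumed to be non-negative.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Return the total weight of the lightest path from source to target.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    Returns float("inf") when no path exists.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        distance, node = heapq.heappop(heap)
        if node == target:
            return distance
        if distance > best.get(node, float("inf")):
            continue  # Stale queue entry; a lighter path was already found.
        for neighbor, weight in graph.get(node, {}).items():
            candidate = distance + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")


# Hypothetical link graph between two endpoints via one relay agent.
links = {
    "la": {"agent": 1.0, "lb": 4.0},
    "agent": {"la": 1.0, "lb": 2.0},
    "lb": {"la": 4.0, "agent": 2.0},
}
print(shortest_path_weight(links, "la", "lb"))
```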
In the Predator-Prey task of
In a system and method in accordance with aspects of the present disclosure, the agents are required to share only local state information; this enables the multi-agent reinforcement learning (MARL) problem to be solved (i.e., to “converge”) even with imperfect communication between distal agents. This represents a significantly more realistic and applicable scenario for multi-agent systems acting outside of a laboratory environment. Thus, a system and method in accordance with aspects of the present disclosure make a significant contribution to the usability of such techniques in various physical implementations.
A distributed consensus approach outperforms simpler methods in three exemplar tasks, for example as discussed above with reference to
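By way of non-limiting illustration, one well-known form of distributed consensus is average consensus with Metropolis weights, sketched below. The disclosure does not specify the consensus rule, so this particular rule, the function name and the example values are assumptions made only for illustration. Each agent holds a local scalar estimate and repeatedly moves it toward the estimates of its direct neighbors, using only locally shared information.

```python
def consensus_step(values, neighbors):
    """Apply one round of average consensus with Metropolis weights.

    values maps each agent to its current scalar estimate.
    neighbors maps each agent to the list of agents it is linked to;
    links are assumed to be symmetric.
    """
    degree = {agent: len(neighbors[agent]) for agent in values}
    updated = {}
    for agent, estimate in values.items():
        correction = 0.0
        for other in neighbors[agent]:
            # Metropolis weight for the link between agent and other.
            weight = 1.0 / (1 + max(degree[agent], degree[other]))
            correction += weight * (values[other] - estimate)
        updated[agent] = estimate + correction
    return updated


# Hypothetical chain of three agents with differing initial estimates.
estimates = {"a": 0.0, "b": 3.0, "c": 6.0}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
for _ in range(200):
    estimates = consensus_step(estimates, links)
print(estimates)
```

With symmetric links, each round preserves the average of the estimates, so on a connected graph the agents settle toward the common mean without any agent observing the whole system.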
A system and method in accordance with aspects of the present disclosure require only local information from each actor and do not require cooperative behavior during a multi-agent task.
It should be noted that the chosen tasks are irrelevant, in accordance with aspects of the present disclosure. A local reward policy given local state observations is sufficient to utilize this technique for any task in which actors must effect actions and receive reward for doing so. Further, actors may also communicate their action choices along with a local state. Still further, the specific level of communications reliability may be considered an independent variable, and should not change aspects of the present disclosure.
The foregoing description of various preferred embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The example embodiments, as described above, were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.
The United States Government has ownership rights in this invention. Licensing inquiries may be directed to Office of Research and Technical Applications, Naval Information Warfare Center, Pacific, Code 72120, San Diego, Calif., 92152; telephone (619) 553-5118; email: ssc_pac_t2@navy.mil. Reference Navy Case No. 108875.
Number | Date | Country | |
---|---|---|---|
20200380401 A1 | Dec 2020 | US |