The present invention provides methods and apparatus for a planner having adversarial reasoning. Exemplary embodiments of the invention provide an efficient way to generate plan iterations after identifying and resolving conflicts. While embodiments of the invention are shown and described in conjunction with illustrative examples, planner types, and implementations, it is understood that the invention is applicable to planners in general for which it is desirable to generate multi-agent plans.
In one aspect of the invention, a method for generating a plan using adversarial reasoning comprises creating a first plan for a first agent and a second plan for a second agent, wherein the first and second plans are independent, identifying a conflict between the first and second plans, replanning to address the identified conflict by planning a contingency branch for the first plan that resolves the conflict in favor of the first agent, splicing the contingency branch into the first plan, and outputting the first plan in a format to enable a user to see the first plan using a user interface.
The foregoing features of this invention, as well as the invention itself, may be more fully understood from the following description of the drawings in which:
In general, the present invention provides methods and apparatus for an adversarial reasoning system, RAPSODI (Rapid Adversarial Planning with Strategic Opponent-Driven Intelligence). In an exemplary embodiment, the RAPSODI system includes a multi-agent reasoning module and a fast single-agent planner module. The multi-agent reasoning module refines and expands plans for two or more adversaries by making calls to a planning service provided by the fast single-agent planner. In one embodiment, the RAPSODI system employs an iterative plan critic process that results in a contingency plan for each agent, based on a best model of their capabilities, assets, and intents. The process iterates as many times as the user wants and as long as conflicts can be found. With each iteration, agents get “smarter” in the sense that their plans are expanded to handle more possible conflicts with other agents.
Before describing the invention in detail, some introductory material is provided. Adversarial reasoning is a subset of multi-agent reasoning, but agents in adversarial problems are generally not just self-interested; they are actively hostile. Adversarial reasoning aims to predict what the enemy is likely to do and then use that prediction to decide the best ways an agent can achieve its own objectives, which may include subverting the enemy's goals. Ideally, an adversarial planner should be able to suggest not only confrontational, lethal options, but also ways to avoid confrontation and to mislead the enemy.
The gamemaster module 102 refines and expands plans for two or more adversaries by constructing single-agent planning subproblems and sending them to the fast single-agent planner 104. The single-agent planner 104 provides a planning service and can be located on a different machine in the network. The gamemaster module 102 may also connect to more than one instance of the planner at a time in order to process different parts of a problem in parallel.
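By way of illustration only, a simplified Python sketch of how a gamemaster-style process might farm single-agent subproblems out to several planner services in parallel is given below. The host names, port numbers, JSON line protocol, and helper names are hypothetical assumptions for the sketch and are not part of the described system.

# Illustrative sketch only: farming single-agent subproblems out to several
# planner services in parallel.  Hosts, ports, the JSON line protocol, and
# all helper names are hypothetical.
import json
import socket
from concurrent.futures import ThreadPoolExecutor

PLANNER_SERVICES = [("planner-host-1", 9000), ("planner-host-2", 9000)]  # assumed

def call_planner(service, subproblem):
    """Send one single-agent planning subproblem and read back a plan."""
    host, port = service
    with socket.create_connection((host, port)) as sock:
        sock.sendall((json.dumps(subproblem) + "\n").encode())
        reply = sock.makefile().readline()     # one JSON reply per request (assumed)
    return json.loads(reply)

def plan_in_parallel(subproblems):
    """Run different parts of the problem on different planner instances."""
    with ThreadPoolExecutor(max_workers=len(PLANNER_SERVICES)) as pool:
        futures = [pool.submit(call_planner,
                               PLANNER_SERVICES[i % len(PLANNER_SERVICES)], sp)
                   for i, sp in enumerate(subproblems)]
        return [f.result() for f in futures]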
Exemplary embodiments of the inventive system approach adversarial reasoning as a competition between the plans of two or more opponents, where the plans for adversaries are based on a best model of their capabilities, assets, and intents. The gamemaster module 102 embodies an iterative plan critic process that finds specific conflicts between the plans and adds contingency branches to repair the conflicts in favor of one of the agents. The system 100 can iterate as long as the user wants and for as long as conflicts are found. With each iteration, the agents get “smarter” in the sense that their plans are expanded to handle more possible conflicts with other agents. The iteratively improving “anytime” nature of this design is ideal for a decision support application in which users direct and focus the search.
While the inventive RAPSODI system is described in conjunction with a gamemaster reasoner and a single-agent planner that are deterministic (actions have deterministic effects, and agents know the state of the world without making observations), it is understood that the inventive system is not limited to deterministic embodiments. Although a probabilistic planner may be a better match to the real world, the computational intractability of that type of planner led us to explore a deterministic approach. Deterministic planning for a single agent is already PSPACE-complete, exponential in the number of propositions and actions, and for multiple agents the complexity goes up by another factor. Probabilistic planning, even in the simplest case of single-agent planning with full observability, is undecidable at worst. For example, stochastic games, which extend Markov Decision Processes to multiple agents, are undecidable. The complexity of these approaches increases with the size of the state space and the length of the time horizon. Tractable approaches to probabilistic planning do exist, but they must compromise by using strategies to reduce the search space and limit the time horizon.
Early Artificial Intelligence approaches to adversarial planning, drawn from game theory, dealt with deterministic, turn-taking, two-player, zero-sum games of perfect information. The minimax algorithm generates the entire search space before nodes can be evaluated, which is not practical in most real-world problems. Since then, game-theory algorithms have been developed to prune the search space and relax the assumptions in various ways. The inventive plan-critic algorithm can be viewed as a relaxation of most of the assumptions of minimax.
The known Course of Action Development and Evaluation Tool (CADET) employs a simple Action-Reaction-Counteraction (ARC) procedure during plan creation. As each action is added to a friendly plan, a likely enemy reaction is looked up from a knowledge base, and then a friendly counteraction is added. This is the current state of the art. ARC is a good way to deal with the complexity of adversarial planning, but a simple action-reaction lookup does not necessarily produce a strategic response to the best estimates of the enemy's goals and intent.
Texas A&M's Anticipatory Planning Support System (APSS) iteratively expands actions in a friendly plan by searching for enemy reactions and friendly counteractions, using an agent-based approach. Agents select actions to expand based on some importance criteria, and use a genetic simulator to generate different options at the most promising branches along the frontier of the search. A meta-process prevents the combinatorial search from exhausting computing resources.
Consider a problem with two agents: RED and BLUE. In general, our implementation handles any number of agents with any mix of collaborative or adversarial intents. The problem is kept very simple in order to illustrate some features of our approach. RED is a land combat unit of two squads whose goal is to gain control of (“clear”) a building. BLUE is an opposing land combat unit of two platoons that has the same goal. Initially, BLUE knows that two RED squads are in the area, but has not yet considered the possibility that they might want to enter the building as well.
Some details of our Adversarial Planning Domain Description Language (APDDL), and excerpts of the input files used to specify this problem, are given below. For now it is sufficient to point out that actions are defined in terms of required preconditions and post-action effects, and that APDDL includes agent-specific considerations.
An example is now presented illustrating the general stages of the plan-critic algorithm. The process begins when the gamemaster module 102 tasks the planner to build a complete plan for each adversary. This means that a commander 108 doing course-of-action planning has specified the goals, capabilities, and intent of each of the opposing forces (RED and BLUE in this case) in the input files, which the system will use to plan. The gamemaster module 102 formulates the single-agent planning tasks using applicable parts of the adversarial problem specification. The algorithm builds a model of each agent's behavior incrementally by searching for conflicts and integrating their resolutions, a process that approaches minimax in the limit.
Once an initial plan is made for each agent, the gamemaster module 102 begins the plan-critic iteration process illustrated in the figures.
Because of the iterative, human-in-the-loop nature of our inventive processing, the system offers the user a chance to monitor its progress and to influence its operation at each iteration. This is desirable in many situations in which the system acts as a decision-support system for a user, functioning like an automated war-gaming assistant, as discussed in further detail below.
Pseudo-code for this iterative-refinement plan-critic adversarial reasoning algorithm used in the gamemaster module is set forth below.
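By way of illustration only, a condensed Python sketch of this iterative loop is given here; all identifiers (initial_plan, generate_conflicts, generate_resolutions, resolve, splice, choose) are hypothetical stand-ins for the operations detailed in Steps 1-4 below, and the sketch is not the exemplary pseudo-code itself.

# Condensed, illustrative sketch of the plan-critic loop.  The helper
# functions are hypothetical stand-ins for Steps 1-4 described below.
def plan_critic(agents, problem, max_iterations, choose):
    # Task the single-agent planner for an initial, independent plan per agent.
    plans = {a: initial_plan(problem, a) for a in agents}
    for _ in range(max_iterations):
        progress = False
        for player in agents:
            # Step 1: find conflicts between this player's plan and every other plan.
            conflicts = generate_conflicts(plans, player)
            if not conflicts:
                continue
            conflict = choose(conflicts)            # user or decision engine
            # Step 2: enumerate facts whose change would resolve the conflict.
            resolutions = generate_resolutions(plans, player, conflict)
            resolution = choose(resolutions)
            # Step 3: plan to achieve the chosen resolution in time.
            partial_plan, splice_time = resolve(problem, plans, player,
                                                conflict, resolution)
            if partial_plan is None:
                continue
            # Step 4: splice the partial plan in as a contingency branch.
            plans[player] = splice(plans[player], partial_plan, splice_time)
            progress = True
        if not progress:
            break                                   # no more conflicts found
    return plans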
The above algorithm for adversarial reasoning finds conflicts between each player's plan and every other agent's plan, finds a way to resolve one of the conflicts, and splices the resolution into the player's plan as a contingency branch. In summary, each agent takes the following steps:
Step 1: Finding a Conflict
As mentioned above, a conflict means an action in one plan interferes with an action in another plan. The planning community has a similar concept for conflicts within a single-agent plan, called mutual exclusions (MUTEX is a common abbreviation). One difference between our concept of a conflict and a mutex is that conflicts are anti-symmetric.
Definition 1. Subversion: Given an action a1 scheduled to be performed during some time interval [t11; t12] and an action a2 scheduled for the interval [t21; t22], then there is a conflict between a1 and a2 if the following conditions hold:
In the example above, a2 subverts a1. Note that we assume that the preconditions for an action must hold throughout the duration of the action, and that the effects of an action are applied only at the end of the action. This is less expressive in characterizing real-world problems than the full PDDL language allows, but for our purposes it is a simplifying assumption that can be made in suitable situations.
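For illustration, a small Python sketch of this subversion test is given below, under one plausible reading of Definition 1 that is consistent with the stated assumptions (preconditions must hold for an action's entire duration; effects are applied only at an action's end). The Action structure and its field names are hypothetical.

# Illustrative subversion test under the assumptions stated above; the Action
# structure and its field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    start: float
    end: float
    preconditions: frozenset = field(default_factory=frozenset)  # facts required throughout
    add_effects: frozenset = field(default_factory=frozenset)    # facts made true at end
    del_effects: frozenset = field(default_factory=frozenset)    # facts made false at end

def subverts(a2: Action, a1: Action) -> bool:
    """True if a2's effects falsify a precondition of a1 while a1 is executing.

    Note the anti-symmetry: a2 may subvert a1 without a1 subverting a2.
    """
    effects_land_during_a1 = a1.start <= a2.end < a1.end
    deletes_needed_fact = bool(a2.del_effects & a1.preconditions)
    return effects_land_during_a1 and deletes_needed_fact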
Pseudocode for an exemplary generateConflicts method is given below:
The startQ and endQ are priority queues of actions, sorted by earliest start time and earliest end time, respectively. The startQ additionally gives priority to the player whose name is passed in to the method and in whose favor conflicts are to be resolved. Actions are selected for processing from the start queue, and moving them to the endQ schedules them for execution. Either pop(queue) or action←queue removes the action at the top of the list. Conflicts are discovered by checking the conflict conditions mentioned above between actions in the chosen player's plan and the opponents' actions.
The procedure is to simulate the plans of each player forward, starting from the root, recording every conflict between a single Course Of Action (COA) from each player's plan. A COA is a single path through a contingency plan, choosing a branch at each decision node. The simulation starts by initializing the world with the initial conditions, applying the earliest action by each player, and then sequentially updating the world by interleaving actions in temporal order. Actions scheduled to execute over the interval [t0; t1] are allowed to execute successfully if and only if their preconditions hold over the interval [t0; t1). This is called the “serial simulation” (serial_sim in the pseudocode), because the algorithm effectively serializes the actions from among all agents (i.e., merges them into a single temporally ordered list) and simulates which actions would fail due to subversion and which actions would successfully be applied to the state.
In line 10, subverters of an action are found by analyzing stateHistory to find actions that deleted required preconditions of the action. In line 15, stateHistory supports an action if all of the action's preconditions are true in the state. Later, in getNextAction at line 6 or 11, when an action is applied to stateHistory, the action's effects are made true in the state. Note that after the first conflict is found (e.g., another player's action deletes the add effect of the priority player's action), the state of that fact is uncertain. Therefore the method returns when the first conflicted action in the player's plan is found (line 5 exits the while loop). Multiple conflicts may be returned for that player's action because it may conflict with the actions of more than one other player.
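A simplified Python sketch of the serial-simulation conflict search described above is given here for illustration; it is not the line-numbered listing referred to in this description. It reuses the hypothetical Action structure from the earlier sketch, takes each plan as a flat list of actions along one COA, and omits the detailed bookkeeping of the exemplary pseudocode.

# Illustrative serial-simulation conflict search; a simplification of the
# method described above, using the hypothetical Action structure sketched earlier.
import heapq
from itertools import count

def generate_conflicts(plans, player, initial_state):
    """Serially simulate all plans and return conflicts against `player`'s plan."""
    state = set(initial_state)
    last_deleter = {}              # fact -> (action, owner) that most recently deleted it
    tie = count()
    end_q = []                     # started actions, ordered by end time
    # All actions from every plan, ordered by start time, the chosen player first on ties.
    start_q = sorted(((a, owner) for owner, plan in plans.items() for a in plan),
                     key=lambda item: (item[0].start, item[1] != player))
    conflicts = []
    for action, owner in start_q:
        # Apply the effects of every action that has already finished executing.
        while end_q and end_q[0][0] <= action.start:
            _, _, finished, f_owner = heapq.heappop(end_q)
            if finished.preconditions <= state:        # supported: effects take hold
                state |= finished.add_effects
                state -= finished.del_effects
                for fact in finished.del_effects:
                    last_deleter[fact] = (finished, f_owner)
        if owner == player and not action.preconditions <= state:
            # A required precondition has been deleted: record the subverting actions.
            for fact in action.preconditions - state:
                if fact in last_deleter and last_deleter[fact][1] != player:
                    conflicts.append((action, last_deleter[fact][0], fact))
            if conflicts:
                break              # stop at the first conflicted action in player's plan
        heapq.heappush(end_q, (action.end, next(tie), action, owner))
    return conflicts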
Exemplary pseudo code for getNextAction is set forth below:
The routine getNextAction, called in line 5 of generateConflicts, returns the next action in time from each player's plan. The same two priority queues, startQ and endQ, are used in both methods. GetNextAction replaces the top node on the startQ with its successor and puts the node into the endQ, where it can be processed according to its end time. In lines 5 and 10 the endQ is not necessarily emptied; actions are removed and applied only as long as their end times are not later than that of the node just pulled off the startQ. If the node is a decision node, the method returns the next action after it.
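For illustration, a minimal Python sketch of a getNextAction-style helper follows, under the assumption that a contingent plan is stored as a tree of PlanNode objects, each holding an action (or marking a decision point) and linking to its successors. The structure, field names, and the apply_effects callback are hypothetical and do not correspond to the line numbers cited above.

# Illustrative getNextAction-style helper over a hypothetical PlanNode tree.
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_tie = count()

@dataclass
class PlanNode:
    action: Optional[object]             # None for a decision node
    start: float
    end: float
    successors: List["PlanNode"] = field(default_factory=list)
    is_decision: bool = False

def get_next_action(start_q, end_q, apply_effects):
    """Pop the earliest-starting node, schedule it, and return the next real action."""
    if not start_q:
        return None
    _, _, node = heapq.heappop(start_q)
    # Replace the popped node on the start queue with its successor(s).
    for succ in node.successors:
        heapq.heappush(start_q, (succ.start, next(_tie), succ))
    # Apply every scheduled action whose end time is not later than this node's start.
    while end_q and end_q[0][0] <= node.start:
        _, _, finished = heapq.heappop(end_q)
        apply_effects(finished)
    if node.is_decision:
        # Decision nodes carry no action of their own; return the action after them.
        return get_next_action(start_q, end_q, apply_effects)
    heapq.heappush(end_q, (node.end, next(_tie), node))
    return node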
Step 2: Finding a Resolution to the Chosen Conflict
A resolution is a fact and an associated time that would resolve the chosen conflict if the value of the fact could be changed by the specified time. There are three types of resolutions for a conflict: (1) negating a precondition of the conflicting opponent action itself; (2) negating a precondition of an earlier action in the chain of actions that supports the conflict; and (3) causing the opponent to branch differently in its contingency plan so that the conflicting action is never taken.
The exemplary method generateResolutions generates these three types of resolutions. A specific resolution will be chosen for inclusion with the original set of goals. The choice may be made by asking the user, or a decision engine can make the choice based on some metric. Resolution type 1 is straightforward (see lines 1-3 of generateResolutions below): each precondition of the conflicting action is negated and added individually to the list of candidate resolutions. This means that if we can make any one of the preconditions false, the action cannot be performed, and hence cannot lead to a conflict.
Type 2 is a generalization of Type 1 (see generateResolutions, lines 4-11) and requires the information resulting from our serial simulation. The basic idea is that a chain of actions, each one providing support for the next, eventually leads to the conflict. Interrupting this chain by negating the precondition of any action in the chain at the appropriate time would effectively prevent the conflict from arising later on. Hence, the serial simulation list is processed backwards to find the action that most recently supported each fact that we want to subvert. Then we find the actions that supported each of those facts and put negations of their preconditions on the resolution list. Of course, this process can be repeated all the way back to the initial conditions, although we only show one step for clarity.
A Type 3 resolution causes the opponent to choose a different branch of its contingentPlan tree so that the action on the current branch will not be taken (see generateResolutions, lines 12-25). Each decision node (dNode) in the opponent's plan is inspected. A decision node is equivalent to a chain of if-then-else-if statements. Each if-condition is a set of propositions (or their negations) whose conjunction must be true in order for that particular branch to be taken. A default case is one with no conditions, and is taken if none of the other cases are true. The strategy is to manipulate the state so that the opponent would branch differently in its contingent plan upon arrival at a decision point, thereby avoiding the path that leads to the observed conflict. In military situations, this is akin to operations like “channelizing the enemy,” in which we cause the enemy to move in a way that is easier for us to prepare for. This is done by falsifying the condition that would cause the conflicting branch to be taken (the branch of the opponent's contingentPlan that contains the action that is in conflict with ours) and, at the same time, making one of the other branch conditions true. Because of our assumptions about the iterative method of building the opponent model, any alternative branch behavior to the current one would necessarily reduce the opponent model to a previously solved problem. The choice of which other branch is actually made true may be left up to the user or to a decision engine; the gamemaster module is only compiling the user's options into the contingentPlan.
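By way of illustration only, a compact Python sketch covering these three resolution types is given below; it is not the numbered generateResolutions listing referenced in this description. Resolutions are modeled as tuples, the structures used (preconditions, add_effects, branches, condition, and so on) are hypothetical, and only one step of the Type 2 back-chaining is shown.

# Illustrative resolution generator for the three types described above.
def generate_resolutions(conflict_action, serial_sim, opponent_decision_nodes):
    """Return candidate resolutions for a conflict caused by `conflict_action`."""
    resolutions = []
    # Type 1: negate any precondition of the conflicting opponent action itself,
    # by the time that action is scheduled to start.
    for fact in conflict_action.preconditions:
        resolutions.append(("negate", fact, conflict_action.start))
    # Type 2: walk the serial simulation backward to the action that most recently
    # supported each such precondition, and negate that supporter's preconditions.
    for fact in conflict_action.preconditions:
        for earlier in reversed(serial_sim):
            if earlier.end <= conflict_action.start and fact in earlier.add_effects:
                for pre in earlier.preconditions:
                    resolutions.append(("negate", pre, earlier.start))
                break                    # only one step back, as in the text
    # Type 3: make the opponent branch differently at a decision node by falsifying
    # the conflicting branch's condition and satisfying one of the other conditions.
    for dnode in opponent_decision_nodes:
        for branch in dnode.branches:
            if conflict_action in branch.actions:
                for other in dnode.branches:
                    if other is not branch:
                        resolutions.append(
                            ("rebranch", branch.condition, other.condition, dnode.time))
    return resolutions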
An exemplary pseudo code for generateResolutions for generating resolutions to a chosen conflict is set forth below:
Step 3. Planning to Achieve the Resolution
By now we have found ways of resolving the conflict and have chosen which resolution we want to implement. A resolution is just a fact that we want to negate, which will prevent the generation of the conflict. By “planning to achieve the resolution” we mean finding a plan that not only achieves our original goals but also makes a particular fact true or false by a time deadline. The resulting plan must be spliced into the current plan no later than the resolution deadline less the makespan of the partial plan. We search for a partial plan iteratively, moving backward in time from the required resolution time, until we can construct a successful partial plan. Each time, the planner is tasked with the original goals plus the new resolution goal, and replans from an initial state that is one step earlier in the existing plan (an earlier action from serial_sim in line 2). In addition, we constrain the planner to react to enemy actions in the serial simulation by asserting them as constraints whose form will be explained below. The process proceeds as in the pseudocode for resolve(conflict, resolution) below. Note that in step 5 we are planning with the world state after action a as the initial state, and with the resolution added to the goals.
Exemplary pseudo code for resolve(conflict, resolution) for generating a plan to achieve the chosen resolution is set forth below:
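For illustration, a simplified Python sketch of this backward search for a splice point is given here; it is not the exemplary pseudo code itself. The task_planner and project_opponent_tils callables stand in for the single-agent planning service and the constraint projection explained below, and these names, together with the plan and resolution fields used, are hypothetical.

# Illustrative backward search for the latest workable splice point.
def resolve(conflict, resolution, serial_sim, goals, task_planner,
            project_opponent_tils):
    """Return (partial_plan, splice_time), or (None, None) if no plan can be found."""
    deadline = resolution.time          # time by which the resolution fact must change
    # Walk backward through the serial simulation, replanning from ever-earlier
    # world states until a partial plan fits before the deadline.
    for step in reversed([s for s in serial_sim if s.end <= deadline]):
        initial_state = step.state_after                # world state after this action
        # Project the adversary's remaining scripted actions as timed constraints.
        tils = project_opponent_tils(serial_sim, conflict.opponent, after=step.end)
        plan = task_planner(initial_state, goals + [resolution.fact], constraints=tils)
        if plan is not None and step.end + plan.makespan <= deadline:
            return plan, step.end       # first success = latest workable splice point
    return None, None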
Note that this procedure returns the first plan with which we can achieve the resolution successfully; i.e., moving backward in time, we return the first (latest) point at which we can implement the resolution and subvert the conflicting action. There is an argument for preferring the latest splice point, and it is worth mentioning here. First, the later the splice point, the more the “element of surprise” is capitalized upon, giving the opponent less time to find alternative means to generate that same conflict. Second, the further back we place the splice point, the less accurately the state at that point predicts the opponent's intent to cause the conflict. However, in some circumstances it may be desirable to keep searching for splice points earlier in the plan to find the best place to branch. For example, a required resource may be more available at an earlier time.
The constraints are asserted to the planner in the form of “Timed Initial Literals” (TILs). As is known in the art, TILs were developed for the 2004 International Planning Competition as a way to express “a certain restricted form of exogenous events: facts that will become TRUE or FALSE at time points that are known to the planner in advance, independently of the actions that the planner chooses to execute. Timed initial literals are thus deterministic unconditional exogenous events.” Planners that are capable of processing TILs turn them into preconditions that, when active, may disallow some actions and enable others. We use them to describe the appearance and activities of an adversary at certain times and places. The consequence of using this mechanism for asserting our constraints is that the TILs are just a projection of the opponent model and simply play back a pre-determined script of propositions being asserted and negated. Therefore the single-player planning agent is not allowed to interact with these propositions, but only allowed to plan around the events. In fact, in order to allow actions to change these events, it is necessary to encode the opponent model into the planner itself. In such a case we wouldn't be able to simply substitute in any single-agent planner in the system.
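For illustration, a small Python sketch of such a projection is given below: each opponent action recorded in the serial simulation is played back as literals becoming true or false at fixed times, rendered in the (at <time> <literal>) form that PDDL 2.2 uses for timed initial literals. The function, its signature, and the action fields it reads are hypothetical.

# Illustrative projection of the opponent model into Timed Initial Literals.
def project_opponent_tils(serial_sim, opponent, after=0.0):
    """Render the opponent's scripted effects after a given time as TIL strings."""
    tils = []
    for action in serial_sim:
        if action.owner != opponent or action.end <= after:
            continue
        for fact in action.add_effects:
            tils.append("(at {:g} {})".format(action.end, fact))
        for fact in action.del_effects:
            tils.append("(at {:g} (not {}))".format(action.end, fact))
    return tils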
Step 4. Splicing the Resolution into the ContingencyPlan
The splice method is given a plan that achieves the resolution and a time when it should be spliced into our plan. The main purpose of splice is to figure out how to set up the decision node that will become the splice point. Again, a serial simulation is created by adding all the actions in all plans to one list and sorting them by start time. We calculate a conjunctive set of facts that are preconditions of any opponent action that can create the conflict; these facts become the test condition in the decision node. This is done by iterating backward over the serial simulation to find the fact preconditions of the actions whose effects support the conflict fact. In general, the properties of the state that this method recommends examining may be an inaccurate indicator of the opponent's intent to cause a particular conflict. The inaccuracy increases when there are multiple ways an opponent might cause such a conflict, in which case a predictor keyed to a single method of causing the conflict would fail.
Another issue is figuring out the splice point in the current player's contingentPlan. This is not obvious, because typically we are given an insertion point from the serial simulation that is just before the adversary's action that we want to subvert, and we need to translate that into a corresponding point in the current player's plan (i.e., the node in the current player's plan that occurs immediately before the splice point in the serial simulation). This is implemented by traversing backward in the serial simulation to the first action that our agent owns that occurs after the insertion point. Then we traverse backward from this node in our contingentPlan to the first node whose parent starts prior to the other player's action. This node is the splice point, or “effectiveSP”.
The partial plan is spliced into the current player's contingentPlan by adding a decision node linked to the partial plan. If the effectiveSP points to a pre-existing decision node, we simply add a case to that node. Otherwise, we add a new decision node.
Exemplary pseudo code for splicing in a plan is set forth below:
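For illustration, a simplified Python sketch of this splice step is given here; it is not the exemplary pseudo code itself. It reuses the hypothetical PlanNode structure sketched earlier, assumes each serial-simulation entry records its owning player, follows only the first branch of the player's tree when locating the effectiveSP, and treats decision-node cases as (condition, subplan) pairs.

# Illustrative splice of a partial plan into the player's contingent plan.
def splice(player_plan_root, partial_plan, splice_time, serial_sim,
           conflict_fact, player):
    """Splice `partial_plan` into the player's contingent plan as a new decision branch."""
    # Build the decision-node test: preconditions of the opponent action whose
    # effects support the conflict fact, found by walking the simulation backward.
    condition = set()
    for action in reversed(serial_sim):
        if action.owner != player and conflict_fact in action.add_effects:
            condition |= set(action.preconditions)
            break
    # Find the effective splice point: the last node in the player's own plan
    # that starts before the splice time (first branch only, for brevity).
    effective_sp, node = None, player_plan_root
    while node is not None and node.start < splice_time:
        effective_sp = node
        node = node.successors[0] if node.successors else None
    if effective_sp is None:
        effective_sp = player_plan_root
    # Splice: add a case to an existing decision node, or insert a new one.
    if effective_sp.is_decision:
        effective_sp.cases.append((frozenset(condition), partial_plan))
    else:
        dnode = PlanNode(action=None, start=splice_time, end=splice_time,
                         is_decision=True)
        dnode.cases = [(frozenset(condition), partial_plan),
                       (frozenset(), effective_sp.successors[:])]    # default case
        effective_sp.successors = [dnode]
    return player_plan_root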
Adversarial problems are asserted to RAPSODI in our variant of the Planning Domain Description Language, PDDL 2.2, developed for the International Planning Competitions. PDDL describes actions in terms of a predicate-logic language of precondition facts that must be satisfied for the action to fire, and effect facts that become true when the action is applied. Durative actions can be specified, and the PDDL specification also includes quantification in preconditions.
Our Adversarial PDDL (APDDL) adds to PDDL 2.2 features for describing multiple agents with private knowledge and individual goals. An excerpt of the APDDL problem description files used to specify the problem discussed above is given in the accompanying figures.
The RAPSODI system keeps track of the set of actions that each agent can perform and of each agent's goal that must be achieved. It is possible to feed each agent a separate set of facts to plan with; this is the place to feed in beliefs that each agent may hold. Note that a fact that is not referenced in the preconditions of an action is in effect a private fact. Since APDDL provides a way to specify which agents can perform which actions, a private belief is implemented by ensuring that only actions owned by a certain agent can read or write that fact.
The top-level gamemaster process shown above asks in each iteration which conflict to resolve (step 4) and which of a number of possible resolutions is most desirable to attempt (step 7). In these decisions a user applies heuristics and experience that cannot be captured in our simple problem definition format. For now, we leave this up to the user, regarding it as a positive way for the user to interact with the planning process and influence its decisions while the computer works out the details. So during each iteration of the algorithm, the user is given a choice of conflicts and resolutions for each player. However, this approach means that the planner must describe the conflicts and resolutions in a meaningful way, which is actually more difficult than having the planner make the choices. One would like to describe a conflict in a way that includes the cost of ignoring it versus the cost of dealing with it.
For example, consider the plans generated for the problem specified in the APDDL excerpts discussed above:
The start and end times of each action are listed on the left. BLUE is moving unit armorsqd b1 to the objective, bldg e, where a RED unit is expected. It performs a contact operation to neutralize the RED unit, and then a clear-building action. RED's plan is to move another unit into the building and then clear the building, putting it under RED control. When we ask for conflicts from BLUE's perspective, the presence of the extra RED unit in the building is flagged because it violates a constraint that one contact action neutralizes only one enemy. The system displays the conflict in terms of the two conflicting actions:
The conflict is chosen, and 5 resolutions are found. Each is a fact that, if made true, will resolve the conflict in favor of player BLUE:
The planner is tasked to find a partial plan that can implement the chosen resolution. The resolution is to bring up another BLUE platoon to attack the RED squad in a contact action. The gamemaster then merges the resolution into the contingent plan. In this process it must find a partial plan that can be implemented in time, so there is an additional check for a starting time from which the resolution can be planned. Finally, the partial plan is spliced into the main contingency plan at a decision node that contains a masking conditional used to decide which way to branch:
The present invention provides methods and apparatus for an iterative plan-critic technique for adversarial reasoning that has been implemented in an automated planning system, RAPSODI (Rapid Adversarial Planning with Strategic Opponent-Driven Intelligence). The main process, gamemaster, can connect to one or more planning services at a time over a socket. The single-agent planner could in theory be replaced by any planner that implements the planner API.
It is understood that exemplary methods and apparatus of the invention may take the form, at least partially, of program code (i.e., instructions) embodied in tangible media 950.
Having described exemplary embodiments of the invention, it will now become apparent to one of ordinary skill in the art that other embodiments incorporating their concepts may also be used. Accordingly, the invention should not be limited to the disclosed embodiments but rather should be limited only by the spirit and scope of the appended claims. All publications and references cited herein are expressly incorporated herein by reference in their entirety.
The present application claims the benefit of U.S. Provisional Patent Application No. 60/968,987, filed on Aug. 30, 2007, which is incorporated herein by reference.