This invention relates to probabilistic graphical models (PGMs) and more particularly to evidence decay in probabilistic trees via pseudo virtual evidence.
A probabilistic graphical model (PGM) is a data structure representing the conditional dependence structure between a set of random variables. As such, it is a compact representation of the joint distribution between the variables.
As a concrete example of a PGM, consider a simple probabilistic graph 100 shown in
Each random variable is discrete with variable Iris having three possible values corresponding to a species of Iris. The other two variables have two values each. Sepal_width has values LT_4 and GTE_4. These correspond to sepal widths less than 4.0 centimeters and sepal widths greater than or equal to 4.0 centimeters, respectively. Petal_width has values LT_1P5 and GTE_1P5. These correspond to petal widths less than 1.5 cm and petal widths greater than or equal to 1.5 cm, respectively.
The conditional dependence between these variables is represented by the tree structure of the model in which Iris is a parent node and the two width nodes are children. This encodes the fact that the probability of a width variable having a certain value is determined by the species of the parent variable. These probabilities are given in conditional probability tables (CPTs) 202, which are shown in
Belief vectors 200 for a node are computed using a process called inference. A standard method of inference in PGMs is belief propagation (BP), which is described below. In
The PGM is represented by a probabilistic tree structure including parent and child nodes, some of which represent unobservable (query) variables and others of which represent observable (evidence) variables. Evidence nodes correspond to variables whose values can be directly measured. In the example above, the two width nodes are evidence nodes. Query nodes correspond to variables whose values cannot be directly measured or observed. In this example, the Iris node is a query node since species is not something that can be directly measured, at least in an automated manner. The usefulness of a PGM lies in the process of using measurements regarding observable evidence nodes to update the belief regarding non-observable query nodes. For example, an automated system measuring sepal width and petal width could be used to compute the probability (i.e. belief) that a given example of Iris is a particular species. This is accomplished by setting a given evidence value in the model and subsequently performing inference. For example, suppose a flower instance was measured (observed) to have a petal width of 0.9 cm. In this case, evidence is set corresponding to Petal_Width=LT_1P5: the probability of LT_1P5 is set to one and the probability of GTE_1P5 to zero. The resulting belief vectors 204 are shown in
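As a sketch of this inference, the posterior over Iris given the petal-width observation can be computed directly with Bayes' rule. The prior and CPT values below are hypothetical stand-ins for the actual tables 202 shown in the figures:

```python
import numpy as np

# Hypothetical prior and CPT values; the actual tables 202 appear in the figures.
prior_iris = np.array([1/3, 1/3, 1/3])   # P(Iris) over [setosa, versicolor, virginica]
cpt_petal = np.array([[0.9, 0.1],        # P(Petal_width | Iris): rows = species,
                      [0.7, 0.3],        # columns = [LT_1P5, GTE_1P5]
                      [0.2, 0.8]])

# Observation: petal width of 0.9 cm sets Petal_width=LT_1P5, i.e. evidence [1, 0].
evidence = np.array([1.0, 0.0])

# Bayes' rule: BEL(Iris) is proportional to P(Iris) * P(evidence | Iris).
likelihood = cpt_petal @ evidence        # P(Petal_width=LT_1P5 | each species)
belief = prior_iris * likelihood
belief /= belief.sum()                   # normalize to a probability vector
```

With these assumed values, the species most consistent with a narrow petal receives the highest belief; the same computation with the model's real CPTs yields the belief vectors 204.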
Belief propagation (BP) is a method for performing inference in a probabilistic graphical model. In other words, it is a method for computing node beliefs, either when a model is first instantiated or when something changes in the model such as evidence being applied or removed. BP was described by Judea Pearl in “Probabilistic Reasoning in Intelligent Systems”, Morgan Kaufmann Publishers, 1988, and remains a standard approach for performing inference in PGMs. For PGMs in tree or polytree topologies, it provides exact belief values, whereas for more general PGM topologies it provides only approximate results.
BP is based on a message passing process in which each node performs local computations and then constructs messages that are sent to neighboring nodes. Each node's local computations take into account the messages it has received from neighbors. In this manner, the effect of information stored at one node propagates to the rest of the model. For a tree topology 300, like that of
1. Starting at leaf nodes, lambda messages 302 are passed upward to parent nodes. Each node waits until it has received messages from all of its children before composing its own lambda message to send to its parent. This step is complete when the root node has received messages from each child.
2. Starting at the root node, pi messages 304 are sent downward to child nodes. Each node receiving a pi message from a parent then composes a new pi message to send to each of its children. This step terminates when each leaf node has received a pi message from its parent.
For example, leaf nodes Y and Z pass lambda messages 302 to parent node X, which then computes its own lambda message. Leaf nodes V, X and W then compose and pass lambda messages to parent and root node U. Root node U passes pi messages 304 down to child nodes V, X and W and node X in turn sends pi messages to child nodes Y and Z.
Pi messages 304 are vectors sent from a node to each child node that represent the probability of the sending node, given the evidence contained in the sub-network of the model consisting of the sending node and above. For example, node U's pi message to child node X is denoted πX(u) and is defined as:
πX(u)=P(u|eX+)
Here, eX+ represents the evidence in the model above node X and thus includes any evidence applied to nodes U, V or W. The phrase above node X means all nodes that are not in a sub-network in which X is the root node. The pi message is a vector of probabilities, one for each value of u.
Lambda messages 302 are vectors sent from a node to the node's parent node that represent the probability of the evidence in the model in a sub-network below the parent, given the parent value. For example, node X's lambda message to parent node U is denoted λX(u) and is defined as:
λX(u)=P(eX−|u)
Here, eX− represents the evidence in the model below node U and thus includes any evidence applied to a sub-network of nodes X, Y or Z. The lambda message is a vector of probabilities, one for each value of u.
A node computes its own belief vector once it has received all expected pi and lambda messages. It does this by computing a pi value and a lambda value and then using these to compute belief.
The pi value represents the probability of a node's value given all evidence in the model above the node. For example, node X's pi value is denoted π(x) and is defined as:
π(x)=P(x|eX+)
This value is computed using the pi message from the parent and the CPT values:
π(x)=Σu P(x|u)πX(u)
Note that P(x|u) is obtained from node X's CPT.
The lambda value represents the probability of the evidence below the node, given a particular node value. For example, node X's lambda value is denoted λ(x) and is defined as:
λ(x)=P(eX−|x)
This value is computed using the lambda messages from the child nodes:
λ(x)=λY(x)λZ(x)
Having computed the pi and lambda values, a node can then update its belief as follows:
BEL(x)=αλ(x)π(x)
Where α is a normalizing constant.
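The pi value, lambda value and belief computations above can be sketched as follows for a node X with parent U and children Y and Z; the CPT and message values here are hypothetical:

```python
import numpy as np

# Hypothetical CPT and message values for a node X with parent U and children Y, Z.
cpt_x = np.array([[0.8, 0.2],    # P(x | u): rows indexed by u, columns by x
                  [0.3, 0.7]])
pi_msg = np.array([0.6, 0.4])    # pi message from parent U: P(u | e_X+)

# Pi value: pi(x) = sum over u of P(x | u) * pi_X(u)
pi_x = cpt_x.T @ pi_msg

# Lambda value: lambda(x) = lambda_Y(x) * lambda_Z(x)
lam_y = np.array([0.9, 0.5])     # lambda message from child Y
lam_z = np.array([0.4, 0.8])     # lambda message from child Z
lam_x = lam_y * lam_z

# Belief: BEL(x) = alpha * lambda(x) * pi(x), where alpha normalizes
bel = lam_x * pi_x
bel /= bel.sum()
```

The normalizing constant α is realized simply by dividing the element-wise product by its sum so that the belief vector sums to one.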
Certain evidence nodes may be both observable and have one or more child evidence nodes. Belief computation accounts for both observations of the random variable and evidence provided by the children.
PGMs are used in intelligence, surveillance and reconnaissance (ISR) systems and other systems, which often face limited collection (observation) opportunities. In these systems, evidence collected in the past, such as the fact of a ship being docked at a particular port, represents knowledge that comes under greater and greater doubt as time passes and the state of the object under surveillance has more and more opportunity to change. If the system is unable to update knowledge via re-observation, then the system is forced to somehow represent increasing doubt in the ISR models. This process of doubt increase is referred to here as “evidence decay”.
The present state-of-art for handling evidence decay in PGMs is the timer method. Under the timer method, a timer is started when an observation is made and evidence resulting from the observation is set in a model. Then, when the timer expires, the observations are removed from the relevant variables. The timer is set to a variable length of time representing the maximum duration over which the results of a particular observation are trustworthy.
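A minimal sketch of the timer method under simulated time follows; the class and method names are illustrative, not from the source. Each observation is held verbatim until its timer expires and is then removed all at once, with no intermediate decay:

```python
# Minimal sketch of the timer method under simulated time; names are illustrative.
class TimerEvidence:
    def __init__(self):
        self.active = {}                   # node name -> (evidence vector, expiry)

    def observe(self, node, value_index, n_values, now, ttl):
        ev = [0.0] * n_values
        ev[value_index] = 1.0              # deterministic state: observed value P=1
        self.active[node] = (ev, now + ttl)

    def current_evidence(self, now):
        # Drop any observation whose timer has expired; the rest are unchanged.
        self.active = {n: (ev, t) for n, (ev, t) in self.active.items() if t > now}
        return {n: ev for n, (ev, _) in self.active.items()}

timers = TimerEvidence()
timers.observe("Petal_width", 0, 2, now=0.0, ttl=50.0)
held = timers.current_evidence(10.0)       # observation still held in full
gone = timers.current_evidence(60.0)       # timer expired: evidence removed abruptly
```

The abrupt removal at expiry is precisely the discontinuity that motivates the evidence decay approach.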
As an illustration of the timer method, consider a simple probabilistic graph such as model 100 shown in
PGMs as constituted only allow for an observation of a particular value or state of a random variable, which sets that value to a probability of one and the remaining values to a probability of zero. The observation may either be set in the model or removed. There is no mechanism to “decay” the observation over a period of time to provide a smooth, continuous transition of the node belief from the certainty of the observation to what would be dictated by the model absent the observation; it is all or nothing. Under the timer method this produces beliefs for observed nodes that are artificially inflated over a period of time and a discontinuity in the belief of the observed nodes and of the nodes to which they provide evidence.
The following is a summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description and the defining claims that are presented later.
The present invention provides for the decay of evidence in PGMs in which evidence node beliefs, following an observation of the node's random variable, decay in a smooth, continuous manner over a decay period to a target belief in a manner that is determined and updated via pseudo virtual evidence.
This is accomplished within the PGM construct by creating virtual evidence nodes that create and send lambda messages that, when combined with the other evidence, force a specified belief onto the parent (decaying) evidence node. Observations are treated as before to assign a deterministic state to the random variable of a node, that state being assigned a probability of one and the remaining states a probability of zero. Unlike the timer method, the observations are not held over the decay period. Following an observation, a target belief is computed by executing belief propagation on the model absent the evidence of a given decay node. The virtual evidence node computes a step along a path from the current belief (initially the deterministic state) to the target belief to determine the specified belief BEL*. The virtual node uses the pi value π(x) for the decaying node x to determine the lambda message required to force the specified belief BEL* onto the parent node. The lambda message constitutes “pseudo” virtual evidence derived from the model; it does not represent actual observed evidence. After each application of evidence, belief propagation is executed to process the lambda and pi messages to update node beliefs; target beliefs for decaying nodes are then updated and observation evidence is removed. As time progresses over the decay period, the observation and the initial deterministic state are de-emphasized relative to the belief assigned to a node by the model absent the observation, providing for a smooth, continuous transition at the end of the decay period.
These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings, in which:
For intelligence, surveillance and reconnaissance (ISR) systems and other systems that often face limited collection (observation) opportunities, a better solution for “decay evidence” in PGMs is desired than the timer method, in which observations are artificially maintained over a decay period, causing node beliefs to exhibit a sharp discontinuity at the end of the decay period. Following an observation of a node's random variable, the node belief should decay in a smooth, continuous manner over a decay period from the deterministic state to the target belief the node has, or will have, when the evidence in question is completely removed from the model at the end of the decay period. This solution must reside within the PGM construct and belief propagation.
The “decay evidence” approach leverages the concept of “virtual evidence” as described by Pearl. Under this approach, a virtual evidence node 500 is conceptually instantiated in a simple model 502 for a given evidence node 504, as shown in
Virtual evidence is a category of “soft” evidence that is also referred to as likelihood evidence, as it conveys the likelihood of a node's values, as opposed to the actual probability distribution of those values. For example, the virtual evidence vector [0.5, 0.25, 0.25] indicates that the first value is twice as likely as the other two but does not indicate that the probability of the first value is 0.5.
Virtual evidence corresponds to lambda messages from the virtual evidence node to the parent. Since the beliefs of virtual nodes are irrelevant, there is no need to send pi messages to them and no need for belief computation within them. Consequently, the virtual nodes are not implemented as actual nodes in the model. Instead, the virtual nodes are implemented as special lambda messages.
Furthermore, Pearl's virtual evidence, which is not used to decay evidence, represents actual observed evidence. The evidence originates from a source external to the model but nevertheless represents observed evidence of some sort. The “decay evidence” approach uses “pseudo” virtual evidence that is derived entirely from the model itself; it does not represent actual observed evidence.
This approach creates virtual evidence nodes that send lambda messages that, when combined with the other evidence, “force” a belief onto the parent (decaying) evidence node. Observations are treated as before to assign a deterministic state to the random variable of a node, that state being assigned a probability of one and the remaining states a probability of zero. Unlike the timer method, the observations are not held over the decay period. Following an observation, a target belief is computed by executing belief propagation on the model absent the evidence of a given decay node. The virtual evidence node computes a step along a path from the current belief (initially the deterministic state) to the target belief to determine the specified belief BEL*. The virtual node uses the pi value π(x) for the decaying node x to determine the lambda message required to force the specified belief BEL* onto the parent node. The lambda message constitutes “pseudo” virtual evidence derived from the model; it does not represent actual observed evidence. After each application of evidence, belief propagation is executed to process the lambda and pi messages to update node beliefs. The target beliefs for decaying nodes are then updated and observation evidence is removed. As time progresses over the decay period, the observation and the initial deterministic state are de-emphasized relative to the target belief assigned to a node by the model absent the observation, providing for a smooth, continuous transition at the end of the decay period.
Referring now to
However, the knowledge does not discontinuously change at times 40 and 90 at the end of the decay periods for the observations of variables B and C. During the period following the observation of an evidence random variable, confidence in the knowledge reduces monotonically, eventually becoming zero. During this knowledge “decay period”, it makes sense that the current “forced” belief should change in a smooth manner toward the belief value it has when the evidence in question is completely removed from the model. As shown, the beliefs 602 and 604 decay smoothly and continuously from the time of the observation over the decay periods 606 and 608, respectively, to the belief when all evidence is removed. As a result, belief 600 of the parent node does not exhibit discontinuous changes at the end of the decay periods. Note that, because the target belief for a given decaying node is computed and updated by executing belief propagation on the model absent the evidence of the decaying node, that target belief can and will vary based on the evidence in the rest of the model.
To achieve the “decay” of an evidence node's belief, virtual evidence is used in a non-standard way to “force” a desired belief distribution on an evidence node. This enables moving the belief associated with a given evidence variable along a trajectory in belief space that represents decreasing confidence. The evidence decay approach maintains a belief target for every decaying evidence node. The vector from the present node belief to this decay target belief represents the belief path to follow during the decay process, as discussed above. The velocity with which the specified belief moves along the trajectory is based, at least in part, on the trajectory length and the remaining decay time. The belief target for an evidence node is defined as the belief for that node given all other evidence in the model, except any evidence for the node in question. Thus, if evidence is applied to a node, it must be removed and the model updated in order to obtain the target belief.
Consider a 3-dimensional evidence variable, V, with belief, BEL(V)=(x,y,z). V is observed and BEL(V) is assigned a certain value of P1=(0,1,0) 700 as shown in
At the next update, the node belief is forced from P1 to P* 706 by a step 708 along a trajectory 710 between P1 and P2. Belief propagation is applied to the model to update beliefs and then, for each decaying node, is again applied to the model absent the evidence of the decaying node to generate the updated target beliefs BEL*. In this example, belief propagation produces a resulting belief P3=(0.2, 0.1, 0.7) 712, which is saved as the target belief. At the next update, the node belief is forced from P* 706 to P** 714 by a step 716 along a trajectory 718 between P* and P3. The process repeats at each evidence update until reaching the end of the decay period. This ensures smooth changes in the evidence node's belief and also accomplishes the goal of achieving smooth changes in target node belief during the evidence decay process.
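The stepping process above can be illustrated numerically. P1 and P3 follow the values in the text, while P2 is a hypothetical initial target since its value is not stated:

```python
import numpy as np

# Numeric illustration of stepping the forced belief toward a moving target.
P1 = np.array([0.0, 1.0, 0.0])       # belief immediately after the observation
P2 = np.array([0.1, 0.2, 0.7])       # initial target belief (hypothetical value)
P3 = np.array([0.2, 0.1, 0.7])       # updated target after new evidence elsewhere

step = 0.2                            # fraction of the remaining path per update
P_star = P1 + step * (P2 - P1)        # first forced belief, along P1 -> P2
P_star2 = P_star + step * (P3 - P_star)   # next step, toward the updated target P3
```

Because each forced belief is a convex combination of probability vectors, it remains a valid probability distribution at every step.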
Under BP, posterior probabilities in a probabilistic model (i.e. beliefs) are updated via a message passing algorithm [Pearl]. Belief at node X is computed as:
BEL(x)=αλ(x)π(x) (1)
Here, π(x) represents the probability of x given evidence in the model above node X (i.e. the parent's network excluding node X) and λ(x) represents the probability of the evidence below node X, given the observed value of X. The latter is given by the product of lambda messages from the children of node X:
λ(x)=ΠY∈CH(X) λY(x)
Where CH(X) is the set of nodes corresponding to X's children. The model is only concerned with controlling belief in evidence nodes, which may or may not have child nodes. A conceptual child node corresponding to a virtual evidence node, X*, sends a lambda message λX*(x). This is a conceptual node only since it is not necessary to actually instantiate the virtual evidence node in the model.
Given a specified belief to force on node x, BEL*, solve for the corresponding lambda message necessary under (1):
λX*(x)=BEL*(x)/π(x) (2)
Here, all quantities are vectors and the division is performed element-wise. This lambda message is pseudo virtual evidence because it does not represent actual observed evidence. Instead, it represents the virtual evidence implied by the belief to be imposed. Thus, belief is forced on an evidence node by setting the node's lambda value according to (2) and executing the standard BP algorithm.
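A minimal sketch of this belief forcing, with hypothetical vectors: dividing the specified belief by the pi value element-wise yields the pseudo virtual-evidence lambda message, and the standard belief update then reproduces the specified belief.

```python
import numpy as np

# Sketch of equation (2) with hypothetical vectors.
pi_x = np.array([0.5, 0.3, 0.2])         # pi value of the decaying node
bel_star = np.array([0.1, 0.6, 0.3])     # belief BEL* to be forced onto the node

lam_virtual = bel_star / pi_x            # element-wise division per equation (2)

# Check: the belief update BEL(x) = alpha * lambda(x) * pi(x) with this lambda
# recovers the specified belief exactly after normalization.
bel = lam_virtual * pi_x
bel /= bel.sum()
```

The normalizing constant α absorbs any overall scale, so the lambda message need only be proportional to the element-wise ratio.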
Referring now to
A PGM is a model of random variables whose conditional dependence is represented by a probabilistic tree structure including parent and child nodes that represent unobservable query and observable evidence random variables. Each node (x) has a belief vector BEL(x) of n possible values whose probabilities sum to one for a random variable. The belief vectors BEL(x) are computed by inference using belief propagation (BP), in which lambda messages, representing the probability of the evidence in a sub-network below the parent node given the value of the parent node, are passed upward to the parent nodes, and pi messages, representing the probability of the parent node given the evidence in the sub-network including the parent node and above, are passed downward to each child node.
The method initializes the PGM using belief propagation on the tree structure to initialize the beliefs for all nodes (step 802). The method creates virtual evidence nodes for evidence nodes that may or will be decayed (step 804). This may be done as part of the initialization process as shown or created on the fly following the observation of a random variable associated with a particular evidence node.
Upon occurrence of an evidence update or “trigger” (step 806), which may be the result of either an asynchronous observation of a random variable (step 808) or a synchronous update of a decaying random variable (step 810), the method applies the new evidence to the model (step 812). The method applies “conventional evidence” to the model by updating the evidence node belief based on an observation (step 814) for every evidence node with a corresponding observation. The deterministic state associated with the observation is applied once and is not held. The virtual evidence nodes apply “decaying evidence” to the model by computing a step along a path from the node's current belief to a target belief to determine a specified belief, and generating a lambda message that, when combined with other evidence in the model, forces the specified belief onto the decaying node, e.g., any evidence node within its decay period following an observation (step 816). This is done using a belief forcing algorithm shown in
Once all of the evidence has been applied to the model, the method executes belief propagation on the model to process all of the evidence, including the “special” lambda messages from the virtual evidence nodes, to update node beliefs (step 818). Any time evidence is applied to a node, either as conventional evidence in a non-decay context or as pseudo virtual evidence in a decay context, the belief target for all other nodes potentially changes. Thus, it is necessary to update these belief targets every time evidence is set anywhere in the model.
For each decaying evidence node (including those for which an observation was just made), the method removes observation evidence at the onset of the decay period (step 819) and executes belief propagation on the model without the evidence of the particular decaying evidence node to update the target belief for that node (step 820). This is done using an update target belief algorithm shown in
The model waits for the next trigger (step 806) and repeats the process, updating the joint probability distributions over all of the nodes represented by the model and the posterior probability distribution for each node. In particular, the model provides the posterior probability distribution for non-observable query nodes. These belief vectors provide the likelihood of some physical attribute or characteristic of an object. For example, the belief vector may provide the likelihood of a flower being a particular type of Iris.
This process requires more executions of the BP algorithm than are necessary under the timer method; however, it allows the process to precisely decay evidence in a smooth manner. Note that every time a variable's evidence changes, either through decay or via an observation, it is necessary to update the belief target of every other variable in the model. Thus, the decay trajectories are not necessarily lines in belief space since the targets do not remain fixed.
One limitation of the approach described above is that the resulting belief in the evidence nodes is non-commutative, because the order in which the decay value is applied to nodes under the decay process affects the results. This means that only the last node for which a belief was forced will have exactly the forced value; all others will be perturbed slightly off of their forced values by the subsequent variables. This should not prevent the overall goal, i.e. smooth changes in belief in the target variable, from being achieved, but could be considered undesirable. The effects of this on the evidence variables could be minimized by decaying variables on small time intervals and randomizing the order in which the variables are selected during the update loop.
In an alternate embodiment, to reduce computational complexity, the method may calculate the updated target belief only once for an observed node, using either the current belief just prior to the observation or an updated target belief immediately following the observation. The deterministic state (belief) associated with the observation would be stepped over the decay period to this target belief. If the model generates a target belief for the node (sans the node evidence) that is relatively stable over the decay period, the overall goal of a smooth decay and transition at the end of the decay period should be achieved. If the target belief changes considerably due to new evidence at other nodes, the decay, although smooth, may exhibit a degree of discontinuity at the end of the decay period.
Referring now to
In an embodiment, the belief forcing algorithm receives as inputs knowledge of the current time t, decay end time T, step duration d and target belief BT to force a belief onto a decaying node (step 902). The forcing algorithm saves the current belief (node.BEL) in variable B (step 904) and computes a belief path ΔB as the difference between the target belief BT and current belief B (step 906). The forcing algorithm computes a unitless measure α as the ratio of step duration d to the remaining time in the decay period (T−t) (step 908) and a unitless measure β as a function of α (step 910).
The forcing algorithm multiplies a step size (α*ΔB) with a scale factor (β/α) and adds that to the current belief B to get the specified belief BEL* (step 912). The forcing algorithm divides the specified belief BEL* by the other evidence for the parent node (pi value π(x)) to determine the lambda message (node.λ) that when combined with the other evidence will force the specified belief onto the parent node (step 914). The forcing algorithm then sends the lambda message to the parent node (step 916) and returns (step 918).
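The stepping portion of the forcing algorithm can be sketched as follows for the case β=α (a scale factor of one) with a fixed target belief; the function name and values are illustrative. Under these assumptions the steps are equal in length and the belief lands exactly on the target at the end of the decay period:

```python
import numpy as np

# Sketch of the forcing step (steps 902-916) with beta = alpha; names and
# values are illustrative.
def forcing_step(B, BT, t, T, d):
    dB = BT - B                    # step 906: belief path toward the target
    alpha = d / (T - t)            # step 908: step duration over remaining time
    beta = alpha                   # step 910: constant-rate decay (scale factor 1)
    return B + (beta / alpha) * (alpha * dB)   # step 912: specified belief BEL*

B = np.array([0.0, 1.0, 0.0])      # deterministic state after the observation
BT = np.array([0.2, 0.1, 0.7])     # target belief, held fixed for illustration
T, d = 10.0, 1.0                   # decay period spanning 10 equal-duration steps

beliefs = [B]
for k in range(10):
    B = forcing_step(B, BT, t=k * d, T=T, d=d)
    beliefs.append(B)
# With a fixed target, every step has the same length and the final belief
# lands exactly on the target at the end of the decay period.
```

In the full algorithm the lambda message for each step would then be computed as BEL* divided element-wise by the node's pi value per equation (2).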
Generally speaking α and β can be computed in many different ways as long as the current belief B is stepped along the path towards the target belief BT. The steps should be small enough to maintain a smooth, continuous decay and large enough to reach the target belief at the end of the decay period to avoid a discontinuity.
In this particular embodiment, the function for computing α in step 908 provides for equal length steps down the path under certain conditions. Assuming that the target belief does not change (which in practice it will), that the decay period contains an integer number of steps n of equal step duration d, and that the scale factor is a constant of one, the update of BEL* in step 912 reduces to stepping the belief by 1/n of the initial path ΔB at each update. For example, if n=10, α takes the values 1/10, 1/9, . . . , 1/2, 1 while the remaining path shrinks from ΔB to (9/10)ΔB, . . . , (2/10)ΔB, (1/10)ΔB, so each step moves the belief by (1/10)ΔB. Under actual conditions in which the target belief does change and the decay period is not an integer number of steps, the step size will vary somewhat, but the goal of taking approximately equal steps throughout the decay period to de-emphasize the initial observation is maintained.
The function β=f(α) 920 in step 910 for computing the scale factor β/α provides the ability to control the rate of decay over the decay period. For simplicity, assume α provides for equal step sizes over the decay period. If β=α, the scale factor is one throughout the decay period and the emphasis of the initial observation will decay at a constant rate 922. If f(α) is a function whereby the scale factor starts at a value >1 and is reduced as α increases with time, the emphasis of the initial observation will decay quickly at first and then slow down 924. Conversely, if f(α) is a function whereby the scale factor starts at a value <1 and is increased as α increases with time, the emphasis of the initial observation will decay slowly at first and then speed up.
These embodiments are merely illustrative of ways to step along the path from the current belief towards the target belief. Alternatively, one could forego the computations of α and β and simply fix the value of β in step 912 at a certain percentage, e.g. 50%. In this case, the forcing algorithm would take a step of 50% of the current path ΔB. Unless the target belief is fluctuating wildly, which is unlikely, ΔB should get smaller and smaller, and thus stepping halfway down the path at each update should converge to approximately the target value by the end of the decay period.
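A one-component sketch of this fixed-fraction alternative: stepping 50% of the remaining path at each update halves the residual distance every time, so the belief approaches the target geometrically.

```python
# Fixed-fraction stepping on a single belief component; values are illustrative.
B, BT = 1.0, 0.0          # current belief component and its (fixed) target
for _ in range(10):
    B += 0.5 * (BT - B)   # step halfway down the remaining path
# After 10 updates the residual is (0.5)**10 of the original distance.
```

After ten such updates the remaining distance to the target is below one tenth of one percent of the original, which is close enough in practice to avoid a visible discontinuity when the evidence is finally removed.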
Referring now to
Referring now to
At noon, with the sun 120 out, the helicopter 1106 images the ship at the dock and determines that the supply trucks are present. The visible light sensor is unable to determine whether the ship's engines are running or not. As shown in
At midnight, with the moon 122 out, the UAV 1108 takes an IR image, which reveals that the ship's engines are running. The IR image cannot determine the presence of the supply trucks. Although there is an EO visible light sensor on the UAV, it is too dark to see the trucks. This means the system cannot re-observe the presence of the trucks.
As shown in
Generally speaking, the PGM is represented by a probabilistic tree structure including parent and child nodes, some of which represent unobservable (query) variables and others of which represent observable (evidence) variables. Evidence nodes correspond to variables whose values can be directly measured. The unobservable variables represent a state for one or more objects. For example, the state may be a detection, classification, identification, or operating status of the object. The observable variables represent a physical attribute of the object that provides evidence as to the state of that object. For example, the physical attribute could be size, shape, color, heat signature, RF signature or countless other physical attributes that provide evidence. The “decay evidence” approach processes the observations of the physical attributes to update beliefs for the unobservable variables for the state of the object(s).
The machine 1200 may include processors 1210 and memory 1230, which may be configured to communicate with each other such as via a bus 1202. In an example embodiment, the processors 1210 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Radio-Frequency Integrated Circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, processor 1212 and processor 1214 that may execute instructions 1216. The term “processor” is intended to include a multi-core processor that may comprise two or more independent processors (sometimes referred to as “cores”) that may execute instructions contemporaneously. Although
The memory/storage 1230 may include a memory 1232, such as a main memory, or other memory storage, and a storage unit 1236, both accessible to the processors 1210 such as via the bus 1202. The storage unit 1236 and memory 1232 store the instructions 1216 embodying any one or more of the methodologies or functions described herein. The instructions 1216 may also reside, completely or partially, within the memory 1232, within the storage unit 1236, within at least one of the processors 1210 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1200. Accordingly, the memory 1232, the storage unit 1236, and the memory of processors 1210 are examples of machine-readable media.
As used herein, “machine-readable medium” means a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., Electrically Erasable Programmable Read-Only Memory (EEPROM)) and/or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions 1216. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., instructions 1216) for execution by a machine (e.g., machine 1200), such that the instructions, when executed by one or more processors of the machine 1200 (e.g., processors 1210), cause the machine 1200 to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes signals per se.
While several illustrative embodiments of the invention have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. Such variations and alternate embodiments are contemplated, and can be made without departing from the spirit and scope of the invention as defined in the appended claims.