1. Field of the Invention
The invention relates to computerized data modeling and more specifically to the generation of a hybrid classifier to support decision-making under uncertainty.
2. Introduction and Related Art
Uncertainty encountered in predictive modeling for various decision-making domains requires using probability estimates or other methods for dealing with uncertainty. For such modeling the probabilities must be derived using a combination of probabilistic modeling and analysis. Generally in such domains, probability-based systems should capture the analyst's causal understanding of uncertain events and system operational aspects and use this knowledge to construct probabilistic models (in contrast to an expert system, where the knowledge worker attempts to capture the reasoning process that a subject matter expert uses during analysis).
The probability-based systems that are most often used to incorporate uncertainty reasoning are Bayesian networks. A Bayesian network (BN) is a graph-based framework combined with a rigorous probabilistic foundation used to model and reason in the presence of uncertainty. The ability of Bayesian inference to propagate consistently the impact of evidence on the probabilities of uncertain outcomes in the network has led to the rapid emergence of BNs as the method of choice for uncertain reasoning in many civilian and military applications.
In the last two decades, much effort has been focused on the development of efficient probabilistic inference algorithms. These algorithms have for the most part been designed to efficiently compute the posterior probability of a target node or the result of simple arbitrary queries. It is well known that, for classification purposes, the algorithms for exact inference are either computationally infeasible for dense networks or impossible for networks containing mixed (discrete and continuous) variables with nonlinear or non-Gaussian probability distributions. In those cases, one either has to discretize all the continuous variables in order to apply an exact algorithm or rely on approximate algorithms such as stochastic simulation methods. However, the simulation methods may take a long time to converge to a reliable answer and are not suitable for real-time applications.
In practical situations, Bayesian nets with mixed variables are commonly used for applications where real-time classification is required, as described in R. Fung and K. C. Chang, Weighting and Integrating Evidence for Stochastic Simulation in Bayesian Networks, Proceedings of the 5th Uncertainty in AI Conference, 1989, and in Uri N. Lerner, Hybrid Bayesian Networks for Reasoning about Complex Systems, PhD Dissertation, Stanford University, 2002. It is therefore important to develop efficient algorithms to apply in such situations. The trade-offs of some existing inference approaches for mixed Bayesian nets can be assessed by comparing their performance on a mixed linear Gaussian test network. The algorithms to be compared include: (1) an exact algorithm (e.g., junction tree) on the original network, and (2) an approximate algorithm based on stochastic simulation with likelihood weighting on the original network (see Lerner, 2002, cited above, and Ross D. Shachter and Mark A. Peot, Simulation Approaches to General Probabilistic Inference on Belief Networks, Proceedings of the 5th Uncertainty in AI Conference, 1989).
Since inference is, in general, computationally intensive, one approach is to develop a hybrid method that combines the Bayesian net with a decision tree. An approach called NBTree (R. Kohavi, Scaling up the Accuracy of Naïve-Bayes Classifiers: a Decision-Tree Hybrid, Proceedings of KDD-96, 1996) was developed as a hybrid of a decision-tree classifier and a Naïve-Bayesian classifier. The structure of the tree is generated as it is in regular decision trees, but the leaves contain local Naïve-Bayesian classifiers. These local classifiers are used to predict the classes of examples that are traced down to the leaf, instead of predicting the single labeled class of the leaf.
Assuming a mixed Bayesian net is given, an object of the present invention is to provide an efficient algorithm for classification where direct Bayesian inference is computationally intensive. This object is achieved in accordance with the invention by developing a corresponding decision tree, given the target and the feature nodes of the Bayesian net, to control the classification process. The decision tree is learned from simulated data generated by forward sampling from the Bayesian network (Max Henrion, Propagation of Uncertainty in Bayesian Networks by Probabilistic Logic Sampling, Proceedings of the 4th Uncertainty in AI Conference, 1988) or from the real data (if available) from which the Bayesian net was constructed.
The above hybrid approach in accordance with the invention can be extended to dynamic Bayesian networks. Two embodiments are multiple tree projection for integration with dynamic Bayesian networks, and incremental tree update for integration with dynamic Bayesian networks.
The inventive method, in all forms, is embodied in a computer program product stored on a computer-readable medium that causes the inventive method to be implemented when loaded into a computer.
Elements of the Hybrid Approach
Bayesian Network
A generic Bayesian net with mixed (discrete-continuous) variables is first considered. Without loss of generality, assume that the goal of inference is to identify the target class with the highest posterior probability of a target node S from K possible states, S ∈ {s1, . . . , sK}, given a set of evidence/observations E. The a posteriori probability of each state sk is given by
where the coefficient ck is a normalization factor and Ω is the set of unknown random variables, other than the observable set E, that may exist in the network.
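Equation (1) is not reproduced in this text. A form consistent with the definitions above, offered only as an assumed reconstruction, marginalizes the joint distribution over the unobserved variables Ω (an integral for continuous members of Ω, a sum for discrete ones) and then selects the state with the largest posterior:

```latex
% Assumed form of the posterior; the original equation may differ in notation.
P(S = s_k \mid E) \;=\; c_k \int_{\Omega} P(S = s_k,\ \Omega,\ E)\, d\Omega,
\qquad k = 1, \dots, K

% Classification rule implied by "the target class with the highest posterior".
\hat{s} \;=\; \arg\max_{s_k \in \{s_1, \dots, s_K\}} P(S = s_k \mid E)
```

Under this assumed form the normalization factor makes the K posteriors sum to one, so it does not affect which state attains the maximum.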
Decision Tree
A decision tree (
The tree learning process employs information theory for the selection of attributes at the decision tree nodes. An entropy measure, described by a mathematical equation (2), is calculated and optimized at every node. It is used to determine the existence of a branch and the selection of node attribute name and its threshold value.
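Equation (2) is not reproduced in this text. As a minimal sketch, assuming the measure is the usual Shannon entropy and that a node threshold is chosen to maximize the resulting information gain, the selection for one real-valued attribute could look as follows. The function names `entropy` and `best_threshold` are illustrative and do not come from the original.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split of one
    real-valued attribute, or (None, 0.0) if no split reduces the entropy."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)  # midpoint between neighbours
    return best

# Two well-separated classes: the best cut lies between 2.0 and 3.0.
threshold, gain = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

A gain of zero for every candidate threshold corresponds to the case where no branch is created at the node.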
The Hybrid Approach (Decision Tree+Static Bayesian Network)
The hybrid approach according to the invention synergistically combines the strengths of the two techniques, trading accuracy against computation. Experiments conducted by the patent applicants show that a significant computational saving can be achieved with a minimal drop in performance.
One main difference of the approach according to the invention is that, instead of a Naïve-Bayesnet as in the aforementioned article by Kohavi, a regular Bayesian net with the normal conditional independence assumptions is used. The inventor has found that, in general, performance could be poor when a Naïve-Bayesnet was used.
This hybrid approach builds a decision tree based on the target and the feature nodes of the given Bayesian net. The decision tree is constructed/learned based on the simulated data using forward sampling (See Max Henrion. Propagation of Uncertainty in Bayesian Networks by Probabilistic Logic Sampling. Proceedings of the 4th Uncertainty in AI Conference, 1988) from the Bayesian net or the data (if available) from which the Bayesian net was constructed.
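Forward sampling visits the nodes in topological order and draws each one from its conditional distribution given the parents already sampled. The sketch below illustrates this on a hypothetical toy network with one discrete target and two continuous evidence nodes; the structure, the parameter values, and the names `forward_sample` and `make_training_set` are assumptions for illustration, not the network used by the inventor.

```python
import random

# Hypothetical toy network (all numbers are illustrative assumptions):
#   S  : discrete target with two states
#   E1 : Gaussian whose mean depends on S
#   E2 : linear Gaussian depending on S and E1
PRIOR = {"s1": 0.5, "s2": 0.5}
E1_MEAN = {"s1": 0.0, "s2": 3.0}
E2_OFFSET = {"s1": 1.0, "s2": -1.0}

def forward_sample(rng):
    """Draw one joint sample by sampling parents before children."""
    s = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    e1 = rng.gauss(E1_MEAN[s], 1.0)
    e2 = rng.gauss(0.5 * e1 + E2_OFFSET[s], 1.0)
    return s, (e1, e2)

def make_training_set(n, seed=0):
    """Return n (target_state, evidence_vector) pairs for tree learning."""
    rng = random.Random(seed)
    return [forward_sample(rng) for _ in range(n)]

training = make_training_set(1000)
```

Each pair supplies the class label (the target state) and the attribute vector (the evidence values) that a decision tree learner expects.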
In the resulting decision tree, each leaf corresponds either to a strong rule, where the data falling into the leaf are highly likely to come from the same class, or to a weak rule, where the decision is less confident. For example, a leaf node with 1% or less data from the target with the declared ID (identification) is considered a weak rule (
To train the decision tree in accordance with the invention, random samples obtained by forward sampling from the Bayesian net, as described earlier, are used. An algorithm such as InferView (J. Bala, S. Baik, B. K. Gogia, A. Hadjarian, Inferring and Visualizing Classification Rules, International Symposium on Data Mining and Statistics, University of Augsburg, Germany, Nov. 20-21, 2000) was used to derive the tree structure. To do so, the target node is treated as the classification node and all the evidence nodes are treated as the attribute variables. The resulting tree contained approximately 1,200 leaves.
The basic steps in the inventive computerized method can be summarized as follows:
(1) Generate random samples for the evidence nodes, given each target state, as training data using forward sampling, where each sample consists of a six-dimensional vector of real values.
(2) Learn the decision tree from the training data. Each leaf of the resulting decision tree corresponds to a rule for classifying a target ID.
(3) At each leaf, calculate the percentage of data from the declared target ID. A leaf with this percentage below some threshold value (e.g., below 1%) is labeled as a weak rule.
(4) Generate a different set of random samples of the evidence nodes to test the algorithm. Each data sample is first passed through the decision tree. A sample that falls into a non-weak rule is declared to be from the target ID designated by that rule. Otherwise, the sample is sent to the Bayesian net for the classification decision.
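Steps (3) and (4) can be sketched as follows, assuming the learned tree is available as a function that maps an evidence vector to a leaf identifier and that Bayesian net inference is available as a second function. Both are stand-ins here: `toy_leaf_of`, `toy_bn_classify`, the toy data, and the threshold of 0.9 are hypothetical choices for illustration only and are not values from the text.

```python
from collections import Counter, defaultdict

def label_leaves(leaf_of, training, threshold):
    """Step (3): for each leaf, record the declared (majority) class and
    flag the leaf as weak if that class's share of the leaf is below threshold."""
    counts = defaultdict(Counter)
    for features, target in training:
        counts[leaf_of(features)][target] += 1
    rules = {}
    for leaf, c in counts.items():
        declared, hits = c.most_common(1)[0]
        rules[leaf] = (declared, hits / sum(c.values()) < threshold)
    return rules

def hybrid_classify(features, leaf_of, rules, bn_classify):
    """Step (4): trust a non-weak rule, otherwise defer to the Bayesian net.
    Returns (class, which_model_decided)."""
    declared, weak = rules.get(leaf_of(features), (None, True))
    if weak:
        return bn_classify(features), "bn"
    return declared, "tree"

# --- Toy stand-ins (assumptions, not the inventor's tree or network) ---
def toy_leaf_of(features):
    x = features[0]
    return 0 if x < 1.0 else (1 if x < 2.0 else 2)

def toy_bn_classify(features):
    # Placeholder for posterior inference in the Bayesian net.
    return "B" if features[0] >= 1.5 else "A"

toy_training = [((0.2,), "A"), ((0.7,), "A"),
                ((1.2,), "A"), ((1.4,), "B"), ((1.6,), "B"), ((1.8,), "A"),
                ((2.5,), "B"), ((2.9,), "B")]
toy_rules = label_leaves(toy_leaf_of, toy_training, threshold=0.9)

strong_case = hybrid_classify((0.5,), toy_leaf_of, toy_rules, toy_bn_classify)
weak_case = hybrid_classify((1.7,), toy_leaf_of, toy_rules, toy_bn_classify)
```

The computational saving comes from the fraction of test samples that stop at the tree, since only samples reaching weak leaves incur the cost of Bayesian net inference.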
The hybrid approach described above can be extended to dynamic Bayesian networks. A dynamic BN predicts the future states of the system. Two approaches for hybrid modeling (i.e., combining decision trees with dynamic BNs) are described below.
DTBN Multiple Tree Projection
In dynamic states, the data points from different states are correlated. The decision trees for the future states are learned from synthetic data (similar to that described above) obtained from the dynamic BN for specific time points. Each of these trees is interfaced with a transitioned BN for a specific time point. The method is shown in
There are two kinds of voting for multiple trees; one is uniform voting and the second is called weighted voting and is based on the rule strength. The stronger rule has a higher priority to dominate the decision-making.
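The two voting schemes can be illustrated with a short sketch. It assumes each tree returns a class label together with a numeric rule strength (for example, the share of leaf data belonging to the declared class); that representation and the function name `vote` are assumptions, since the text does not define how rule strength is computed.

```python
from collections import defaultdict

def vote(predictions, weighted=False):
    """Combine per-tree predictions given as (class_label, rule_strength).
    Uniform voting counts each tree once; weighted voting sums rule strengths."""
    scores = defaultdict(float)
    for label, strength in predictions:
        scores[label] += strength if weighted else 1.0
    return max(scores, key=scores.get)

# Two weak rules agree on "A"; one strong rule says "B".
per_tree = [("A", 0.40), ("A", 0.40), ("B", 0.95)]
uniform_choice = vote(per_tree)
weighted_choice = vote(per_tree, weighted=True)
```

In this toy case uniform voting follows the majority of trees, while weighted voting lets the single stronger rule dominate the decision.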
DTBN Method with Incremental Tree Update
The dynamic state changes gradually with time; therefore, a decision tree learned from an early data set may become obsolete and have no predictive power on new target information. Consequently, another approach is incremental decision tree learning. This approach requires an online tree to be updated incrementally as needed. It is applied to the data points for which no pre-computed (learned) tree exists. The following steps summarize this approach (schematically illustrated in
The above-described inventive method is physically implemented in the form of a computer program product embodying the inventive method, in any or all of the above forms, as computer-readable data (software) stored on a suitable medium.
Experimental Results
First, a set of 10,000 random samples is generated to train the decision tree. A second set of random samples is used to test the algorithm. The results are summarized in Table 1. Table 1 shows that the hybrid approach saves approximately 70% of the computation with only about a 1.4% reduction in performance. While the decision tree approach is the fastest, it suffers a significant performance loss.
The inventor has also investigated learning rules versus accuracy.
Although modifications and changes may be suggested by those skilled in the art, it is the intention of the inventor to embody within the patent warranted hereon all changes and modifications as reasonably and properly come within the scope of his contribution to the art.
The present application claims the benefit of the filing date of provisional application 60/556,554 filed Mar. 26, 2004.
| Number | Date | Country |
|---|---|---|---|
| 60556554 | Mar 2004 | US |