Claims
- 1. A method of forming a decision tree that comprises a hierarchical set of nodes and that predicts properties of interest of physical items by testing characteristics of physical items against defined thresholds at each of a series of nodes of said hierarchy, said physical items having two or more known characteristics and at least one unknown property of interest, said method comprising defining tests or thresholds for a series of two or more nodes in said hierarchy by determining the total combined predictive accuracy of said series of nodes under a plurality of alternative test or threshold definitions for at least one of the nodes in said series.
- 2. The method of claim 1 wherein the series of two or more nodes separates physical items into a plurality of classes based on the properties of interest.
- 3. The method of claim 1 wherein one of the plurality of alternative tests or thresholds is selected for each node having said plurality of alternative tests or thresholds by choosing the alternative tests that yield the highest combined predictive accuracy.
- 4. A method of forming a decision tree, said decision tree operating to provide predictions of at least one attribute of interest of a physical item, wherein the physical item has one or more measured or computed descriptors representative of one or more physical characteristics of the physical item, and wherein the physical item has one or more unknown attributes of interest which have not been physically measured or otherwise previously determined, the decision tree comprising two or more nodes in a hierarchical structure, wherein each node is defined by a test based on one or more of the physical item's descriptors, wherein the result of the test at a node operates to classify the physical item into one of two or more classes, and wherein the classification made at any given node determines the identity of a node lower in the hierarchy which is next applied to the physical item until a final node in the hierarchy is reached, and wherein the classification made by the final node comprises a prediction of the behavior of the physical item with respect to the unknown attribute of interest, the method of forming the decision tree comprising:
providing a set of training physical items, each training physical item in said set of training physical items having one or more measured or computed physical descriptors and one or more attributes of interest which have been physically measured or previously determined;
defining a set of two or more alternative tests for defining a selected node of the decision tree;
defining at least one test for defining at least one node lower in said hierarchy for each of said alternative tests previously defined for said selected node higher in the hierarchy;
determining the results of applying at least said selected node and said at least one node lower in said hierarchy to said set of training physical items; and
choosing one of said alternative tests from said set to define said selected node based on said determined results (an illustrative sketch of this lookahead selection follows the claims).
- 5. The method of claim 4 wherein said one of said alternative tests is chosen by choosing as said one of said alternative tests the test that yields the highest accuracy in predicting the one or more attributes of interest of the set of training physical items.
- 6. The method of claim 4 further comprising the step of dividing the set of training items into clusters having similar descriptor values and attributes of interest, and wherein one of said two or more alternative tests is defined based on an improvement in cluster purity when said set of training items is classified by said test.
- 7. The method of claim 4 wherein one of said two or more alternative tests is defined based on an improvement in class purity when said set of training items is classified by said test.
- 8. A method of forming a super tree, said super tree representing a plurality of decision trees, each decision tree operating to provide a prediction of at least one property of interest of a physical item, wherein the physical item has one or more measured or computed descriptors representative of one or more physical characteristics of the physical item and one or more physical properties of interest, and wherein said super tree comprises a plurality of nodes in a hierarchical structure, said plurality of nodes defined by at least one test based on at least one of said one or more physical characteristics, at least one of said plurality of nodes comprising a plurality of alternative tests such that each alternative test results in an alternative decision tree, the method comprising:
providing a set of training physical items, the set of items having one or more physical descriptors and one or more physical properties of interest;
creating a first node comprising two or more alternative tests based on one or more of the set of items' physical descriptors, wherein each test operates to classify training items into two or more classes; and
creating one or more additional nodes that comprise two or more alternative tests based on one or more of the set of items' physical descriptors, wherein each additional node operates on one of the two or more classes created by the first node and each alternative test in each additional node operates to further classify the training items into two or more classes for each additional node.
- 9. A method of pruning a decision super tree, comprising discarding all but one alternative test at each node to produce one or more decision trees.
- 10. The method of claim 9 wherein the one alternative test at each node is chosen based on the produced decision tree's combined predictive accuracy.
- 11. The method of claim 9 further comprising pruning by removing all nodes below a particular partition to produce one or more decision trees.
- 12. The method of claim 9 wherein pruning comprises minimizing the formula R_α = R_0 + α·N_leaf, wherein R_α is a cost-complexity measure to be minimized, R_0 is a misclassification cost on the training items, N_leaf is a number of leaf nodes of a given decision tree, and α is a parameter controlling the trade-off between misclassification cost and tree size (see the pruning sketch following the claims).
- 13. A method of clustering a set of items, the set of items having a plurality of numeric physical descriptors and one or more physical properties, at least one of the properties being characterized by a non-numeric value, the method comprising:
for each item, representing each property that has a non-numeric value with a numeric value; and
using a clustering algorithm to cluster the items into subsets of items that have similar descriptor values, similar values among the numeric values that represent non-numeric property values, and similar values for properties inherently characterized by a numerical value, if any.
- 14. The method of claim 13 wherein the step of representing each property that has a non-numeric value with a numeric value comprises:
for each item, creating a bit vector for each property of a set of properties having a non-numeric value, each bit vector corresponding to a property having a non-numeric value, each bit vector comprising one bit for each distinct non-numeric value that the property to which the bit vector corresponds can have, each bit corresponding to a distinct non-numeric value; and
for each bit within each bit vector, setting the bit to a designated value if the bit corresponds to a non-numeric value that the property to which the bit vector corresponds has, and setting the bit to a different designated value if the bit corresponds to a non-numeric value that the property to which the bit vector corresponds does not have (see the bit-vector sketch following the claims).
- 15. A method of forming a decision tree, said decision tree operating to provide predictions of the behavior of a physical item with respect to at least one attribute of interest, wherein the physical item has one or more measured or computed descriptors representative of one or more characteristics of the physical item, and wherein the physical item exhibits unknown behavior with respect to at least one attribute of interest, the decision tree comprising two or more nodes in a hierarchical structure, wherein each node is defined by a test based on one or more of the physical item's descriptors, wherein the result of the test at a given node operates to categorize the physical item into one of a plurality of categories, and wherein the categorization made at any given node determines the identity of a node lower in the hierarchy which is next applied to the physical item until an end node of the hierarchy is reached, and wherein the classification made by said end node comprises a prediction of the behavior of the physical item with respect to the unknown attribute of interest, the method of forming the decision tree comprising:
providing a set of training physical items, each training physical item in said set of training physical items having one or more measured or computed physical descriptors and one or more attributes of interest which have been physically measured or previously determined;
clustering said training items into a plurality of clusters defined by similarity in both descriptors and attributes; and
defining at least one of said nodes during tree formation based on an improvement in cluster purity when the training items are partitioned at said node, irrespective of improvement or degradation of class purity when the training items are partitioned at said node (see the cluster-purity sketch following the claims).
- 16. A computer implemented system for constructing decision trees for predicting the behavior of a physical item with respect to an attribute of interest, said system comprising:
a memory storing a set of measured or computed characteristics for each of a set of training physical items, said memory also storing information assigning each of said training physical items a behavioral classification defining an aspect of each training item's actual physical behavior with respect to an attribute of interest;
a clustering module operative to assign each of said training physical items to one of a plurality of clusters of training items, said clusters defined by similarities in characteristics and in behavioral classification among subsets of said set of training physical items;
a test definition module operative to define tests and test thresholds for separating said training physical items into separate groups when said tests and test thresholds are applied to said characteristics; and
a decision tree construction module operative to store a hierarchical set of nodes, each node defined by a test and test threshold created by said test definition module;
wherein said test definition module and said decision tree construction module create and store several alternative node definitions during tree construction, wherein one of said alternative node definitions produces maximum association between said groups and said clusters, and wherein another of said alternative node definitions produces maximum association between said groups and said behavioral classifications.
- 17. The system of claim 16, wherein said decision tree construction module comprises an alternative node definition selection module that is operative to select single definitions for each of a series of nodes from said alternatives based on an evaluation of the predictive accuracy of the series in separating training items having different behavioral classifications.
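
Claims 1 and 3-5 select a node's test by the combined predictive accuracy of a short series of nodes rather than by the quality of that node's split alone. The following is a minimal sketch of that lookahead idea, assuming threshold tests on numeric descriptors, a single candidate pool shared by parent and child nodes, and accuracy measured by majority-label prediction in each resulting group; all function names and the data layout are illustrative, not taken from the specification.

```python
# Illustrative sketch of lookahead test selection (claims 1 and 3-5).
# An item is a (descriptors, label) pair; a test is a (descriptor_index,
# threshold) pair applied as "descriptor <= threshold". Names are hypothetical.
from itertools import product

def split(items, test):
    """Partition items by a (descriptor_index, threshold) test."""
    d, t = test
    left = [it for it in items if it[0][d] <= t]
    right = [it for it in items if it[0][d] > t]
    return left, right

def majority_correct(items):
    """Number of items correctly predicted by the group's majority label."""
    if not items:
        return 0
    labels = [label for _, label in items]
    return max(labels.count(l) for l in set(labels))

def series_accuracy(items, parent_test, candidate_tests):
    """Best combined accuracy of the parent test plus one child test per branch."""
    if not items:
        return 0.0
    left, right = split(items, parent_test)
    best = 0
    for left_test, right_test in product(candidate_tests, repeat=2):
        groups = split(left, left_test) + split(right, right_test)
        best = max(best, sum(majority_correct(g) for g in groups))
    return best / len(items)

def choose_test(items, candidate_tests):
    """Pick the parent test whose best two-level series is most accurate."""
    return max(candidate_tests,
               key=lambda t: series_accuracy(items, t, candidate_tests))
```

The parent test is scored by the best accuracy achievable once the nodes below it are also defined, i.e., the total combined predictive accuracy of the series, rather than by the usual greedy, one-node-at-a-time criterion.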
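Claims 6 and 15 define a node's test by the improvement in cluster purity it produces, where the clusters were formed beforehand from both descriptors and attributes of interest. Below is a small sketch under the assumption that purity is the fraction of a group belonging to its most common cluster; the specification does not commit to a particular purity measure, so this choice and the helper names are illustrative. The sketch reuses `split()` from the lookahead sketch above.

```python
# Illustrative cluster-purity gain for a candidate test (claims 6 and 15).
# Each training item has a precomputed cluster id, supplied here by the
# caller as a function cluster_of(item); names are hypothetical.

def purity(items, cluster_of):
    """Fraction of items that belong to the group's most common cluster."""
    if not items:
        return 1.0
    ids = [cluster_of(it) for it in items]
    return max(ids.count(c) for c in set(ids)) / len(items)

def cluster_purity_gain(items, test, cluster_of):
    """Weighted purity after splitting by the test, minus purity before."""
    left, right = split(items, test)   # split() from the lookahead sketch above
    n = len(items)
    after = (len(left) / n) * purity(left, cluster_of) \
          + (len(right) / n) * purity(right, cluster_of)
    return after - purity(items, cluster_of)
```

Claim 15 adds that such a node may be kept for its cluster-purity improvement even if class purity, measured against the attribute of interest rather than the cluster labels, does not improve.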
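Claim 12 prunes by minimizing the cost-complexity measure R_α = R_0 + α·N_leaf. The sketch below evaluates that trade-off over a handful of candidate pruned trees, each summarized by its misclassification count on the training items and its leaf count; the candidate trees and the values of α are hypothetical.

```python
# Illustrative cost-complexity pruning criterion (claim 12):
#   R_alpha = R_0 + alpha * N_leaf
# R_0 is the misclassification cost on the training items and N_leaf is the
# number of leaf nodes of a candidate pruned tree. Candidates are hypothetical.
from dataclasses import dataclass

@dataclass
class CandidateTree:
    name: str
    misclassified: int   # R_0 on the training items
    n_leaves: int        # N_leaf

def cost_complexity(tree: CandidateTree, alpha: float) -> float:
    return tree.misclassified + alpha * tree.n_leaves

def best_pruned_tree(candidates, alpha):
    """Among candidate pruned trees, pick the one minimizing R_alpha."""
    return min(candidates, key=lambda t: cost_complexity(t, alpha))

candidates = [
    CandidateTree("full tree", misclassified=3, n_leaves=12),
    CandidateTree("pruned once", misclassified=5, n_leaves=6),
    CandidateTree("stump", misclassified=11, n_leaves=2),
]
print(best_pruned_tree(candidates, alpha=0.1).name)  # "full tree"
print(best_pruned_tree(candidates, alpha=1.0).name)  # "pruned once"
```

A small α keeps accuracy on the training items dominant; increasing α penalizes leaf count and drives selection toward smaller trees.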
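Claim 14 turns each non-numeric property into a bit vector with one bit per distinct value the property can take, so that a numeric clustering algorithm can treat all properties uniformly. A sketch of that encoding (essentially one-hot encoding) follows, assuming 1 and 0 as the designated values; the property name and value set in the example are hypothetical.

```python
# Illustrative bit-vector encoding of non-numeric properties (claim 14).
# One bit vector per non-numeric property; one bit per distinct value that the
# property can take; 1 marks the value the item has, 0 marks the values it
# does not have. The property names and values below are hypothetical.

def bit_vectors(item_properties, allowed_values):
    """item_properties: {property_name: value for this item}
    allowed_values:  {property_name: list of distinct values the property can take}"""
    vectors = {}
    for prop, values in allowed_values.items():
        vectors[prop] = [1 if item_properties[prop] == v else 0 for v in values]
    return vectors

allowed = {"phase": ["amorphous", "crystalline", "mixed"]}
print(bit_vectors({"phase": "crystalline"}, allowed))
# {'phase': [0, 1, 0]}
```

The resulting bits can be placed alongside the item's numeric descriptors and inherently numeric property values, so the clustering step of claim 13 operates on a single all-numeric representation of each item.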
RELATED APPLICATIONS
[0001] This application claims priority to provisional application No. 60/434,169, filed on Dec. 16, 2002, which is hereby incorporated by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60/434,169 | Dec. 16, 2002 | US |