Claims
- 1. A decision tree system, comprising:
a module to read the data; a module to sort the data; a module to evaluate a potential split of the data according to some criterion, using a random sample of the data; a module to split the data; and a module to combine multiple decision trees in ensembles.
- 2. The decision tree system of claim 1 including a file and a main memory, wherein said module to read the data reads the data from said file to said main memory.
- 3. The decision tree system of claim 1 wherein said module to read the data creates multiple decision trees.
- 4. The decision tree system of claim 1 wherein said module to sort the data sorts it N times, once for each of the N trees in the ensemble.
- 5. The decision tree system of claim 1 wherein said module to sort the data sorts it once for the N trees in the ensemble.
- 6. The decision tree system of claim 1 wherein said module to evaluate a potential split of the data according to some criterion using a random sample of the data uses a different sample for each attribute.
- 7. The decision tree system of claim 1 wherein said module to evaluate a potential split of the data according to some criterion using a random sample of the data uses the same sample for each attribute.
- 8. The decision tree system of claim 6 wherein said sample is a fixed number of the instances.
- 9. The decision tree system of claim 7 wherein said sample is a fixed number of the instances.
- 10. The decision tree system of claim 6 wherein said sample is a percentage of the instances.
- 11. The decision tree system of claim 7 wherein said sample is a percentage of the instances.
- 12. The decision tree system of claim 1, wherein said criterion to evaluate a potential split of the data is the information gain criterion.
- 13. The decision tree system of claim 1, wherein said criterion to evaluate a potential split of the data is the Gini criterion.
- 14. The decision tree system of claim 1, wherein said criterion to evaluate a potential split of the data is the information gain ratio criterion.
- 15. The decision tree system of claim 1, wherein said criterion to evaluate a potential split of the data is the Twoing rule.
- 16. The decision tree system of claim 1, wherein said module to combine multiple trees in ensembles uses plurality (that is, majority) voting.
- 17. The decision tree system of claim 1, wherein said module to combine multiple trees in ensembles uses weighted voting, wherein a different weight is given to the output of each tree.
- 18. A decision tree system, comprising:
means to read the data; means to sort the data; means to evaluate a potential split of the data according to some criterion, using a random sample of the data; means to split the data; and means to combine multiple decision trees in ensembles.
- 19. The decision tree system of claim 18 including a file and a main memory, wherein said means to read the data reads the data from said file to said main memory.
- 20. The decision tree system of claim 18 wherein said means to read the data creates multiple decision trees.
- 21. The decision tree system of claim 18 wherein said means to sort the data sorts it N times, once for each of the N trees in the ensemble.
- 22. The decision tree system of claim 18 wherein said means to sort the data sorts it once for the N trees in the ensemble.
- 23. The decision tree system of claim 18 wherein said means to evaluate a potential split of the data according to some criterion using a random sample of the data uses a different sample for each attribute.
- 24. The decision tree system of claim 18 wherein said means to evaluate a potential split of the data according to some criterion using a random sample of the data uses the same sample for each attribute.
- 25. The decision tree system of claim 23 wherein said sample is a fixed number of the instances.
- 26. The decision tree system of claim 24 wherein said sample is a fixed number of the instances.
- 27. The decision tree system of claim 23 wherein said sample is a percentage of the instances.
- 28. The decision tree system of claim 24 wherein said sample is a percentage of the instances.
- 29. The decision tree system of claim 18, wherein said means to evaluate a potential split of the data uses the information gain criterion.
- 30. The decision tree system of claim 18, wherein said means to evaluate a potential split of the data uses the Gini criterion.
- 31. The decision tree system of claim 18, wherein said means to evaluate a potential split of the data uses the information gain ratio criterion.
- 32. The decision tree system of claim 18, wherein said means to evaluate a potential split of the data uses the Twoing rule.
- 33. The decision tree system of claim 18, wherein said means to combine multiple trees in ensembles uses plurality (that is, majority) voting.
- 34. The decision tree system of claim 18, wherein said means to combine multiple trees in ensembles uses weighted voting, wherein a different weight is given to the output of each tree.
- 35. A decision tree method, comprising the steps of:
reading the data; sorting the data; evaluating a potential split of the data according to some criterion, using a random sample of the data; splitting the data; and combining multiple decision trees in ensembles.
- 36. The decision tree method of claim 35 including a file and a main memory, wherein said step of reading the data reads the data from said file to said main memory.
- 37. The decision tree method of claim 35 wherein said step of reading the data creates multiple decision trees.
- 38. The decision tree method of claim 35 wherein said step of sorting the data sorts it N times, once for each of the N trees in the ensemble.
- 39. The decision tree method of claim 35 wherein said step of sorting the data sorts it once for the N trees in the ensemble.
- 40. The decision tree method of claim 35 wherein said step of evaluating a potential split of the data according to some criterion using a random sample of the data uses a different sample for each attribute.
- 41. The decision tree method of claim 35 wherein said step of evaluating a potential split of the data according to some criterion using a random sample of the data uses the same sample for each attribute.
- 42. The decision tree method of claim 40 wherein said sample is a fixed number of the instances.
- 43. The decision tree method of claim 41 wherein said sample is a fixed number of the instances.
- 44. The decision tree method of claim 40 wherein said sample is a percentage of the instances.
- 45. The decision tree method of claim 41 wherein said sample is a percentage of the instances.
- 46. The decision tree method of claim 35, wherein said step of evaluating a potential split of the data uses the information gain criterion.
- 47. The decision tree method of claim 35, wherein said step of evaluating a potential split of the data uses the Gini criterion.
- 48. The decision tree method of claim 35, wherein said step of evaluating a potential split of the data uses the information gain ratio criterion.
- 49. The decision tree method of claim 35, wherein said step of evaluating a potential split of the data uses the Twoing rule.
- 50. The decision tree method of claim 35, wherein said step of combining multiple trees in ensembles uses plurality (that is, majority) voting.
- 51. The decision tree method of claim 35, wherein said step of combining multiple trees in ensembles uses weighted voting, wherein a different weight is given to the output of each tree.
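Claims 4-5 and their counterparts (21-22, 38-39) distinguish sorting the data N times, once per tree, from sorting it once and reusing the result for all N trees in the ensemble. A minimal sketch of the sort-once approach, using NumPy and an illustrative helper name (`presort_attributes` is not taken from the disclosure): each attribute column is argsorted a single time, and the resulting index arrays can be consulted by every tree instead of re-sorting.

```python
import numpy as np

def presort_attributes(X):
    """Sort each attribute column once; sort_idx[:, j] lists the row
    indices of X in ascending order of attribute j. Every tree in the
    ensemble can reuse these index arrays instead of re-sorting."""
    return np.argsort(X, axis=0)

# Hypothetical toy data: 5 instances, 2 numeric attributes.
X = np.array([[3.0, 10.0],
              [1.0, 40.0],
              [2.0, 30.0],
              [5.0, 20.0],
              [4.0, 50.0]])
sort_idx = presort_attributes(X)
# Attribute 0 in ascending order visits rows 1, 2, 0, 4, 3.
```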
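Claims 6-11 (and their counterparts) cover evaluating splits on a random sample that is either a fixed number or a percentage of the instances, drawn either once and shared across all attributes or afresh for each attribute. A hedged sketch of both variants; `draw_sample` is an illustrative helper, not a name from the claims:

```python
import random

def draw_sample(indices, fixed=None, fraction=None, rng=None):
    """Randomly sample instance indices for split evaluation.
    Specify either a fixed count or a fraction of the instances."""
    rng = rng or random.Random(0)  # seeded here only for reproducibility
    if fixed is not None:
        k = min(fixed, len(indices))
    else:
        k = max(1, int(fraction * len(indices)))
    return rng.sample(indices, k)

instances = list(range(100))
# One sample (10% of the instances) reused for every attribute:
shared = draw_sample(instances, fraction=0.10)
# A fresh sample (fixed size 20) drawn per attribute:
per_attribute = [draw_sample(instances, fixed=20, rng=random.Random(seed))
                 for seed in range(3)]
```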
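Claims 12-15 (and their counterparts) name four split criteria: information gain, Gini, information gain ratio, and the Twoing rule. Minimal reference implementations over lists of class labels, under the standard textbook definitions (function names are illustrative, not from the disclosure):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def gain_ratio(parent, left, right):
    """Information gain divided by the entropy of the split itself (C4.5)."""
    split_info = entropy([0] * len(left) + [1] * len(right))
    return information_gain(parent, left, right) / split_info if split_info else 0.0

def twoing(parent, left, right):
    """Twoing rule (CART): rewards splits that separate class distributions."""
    n = len(parent)
    p_left, p_right = len(left) / n, len(right) / n
    cl, cr = Counter(left), Counter(right)
    spread = sum(abs(cl[c] / len(left) - cr[c] / len(right)) for c in set(parent))
    return (p_left * p_right / 4.0) * spread ** 2

# A pure split of a balanced two-class node scores highly on every criterion.
parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]
```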
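Claims 16-17 (and their counterparts 33-34, 50-51) distinguish plurality (majority) voting from weighted voting, where each tree's output carries its own weight. A minimal sketch of both combination schemes (helper names are illustrative):

```python
from collections import Counter

def plurality_vote(predictions):
    """Plurality (majority) voting: the most common prediction wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weighted voting: each tree's prediction contributes its weight;
    the class with the largest weight total wins."""
    totals = Counter()
    for pred, w in zip(predictions, weights):
        totals[pred] += w
    return totals.most_common(1)[0][0]

# Three trees disagree; one heavily weighted tree can outvote two others.
plurality_vote(["a", "b", "a"])                   # -> "a"
weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.3])   # -> "a"
```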
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] Related subject matter is disclosed and claimed in the following commonly owned, copending U.S. patent applications: “PARALLEL OBJECT-ORIENTED DECISION TREE SYSTEM,” by Chandrika Kamath and Erick Cantu-Paz, U.S. patent application Ser. No. 09/977,570, filed Jun. 8, 2001, and “CREATING ENSEMBLES OF OBLIQUE DECISION TREES WITH EVOLUTIONARY ALGORITHMS AND SAMPLING,” by Erick Cantu-Paz and Chandrika Kamath, U.S. patent application Ser. No. xx/xxx,xxx, filed Apr. 25, 2002. The commonly owned, copending U.S. patent applications identified above are incorporated herein by reference in their entirety.
Government Interests
[0002] The United States Government has rights in this invention pursuant to Contract No. W-7405-ENG-48 between the United States Department of Energy and the University of California for the operation of Lawrence Livermore National Laboratory.