Claims
- 1. An oblique decision tree induction method based on evolutionary algorithms and statistical sampling techniques, comprising the steps of:
reading the data; sorting the data, if necessary; evaluating a potential split of the data according to some criterion; determining an initial split of the data; determining the final split of the data using evolutionary algorithms and statistical sampling techniques; splitting the data; and combining multiple decision trees in ensembles.
- 2. The oblique decision tree induction method of claim 1, wherein said step of evaluating a potential split of the data according to some criterion utilizes the Gini index.
- 3. The oblique decision tree induction method of claim 1, wherein said step of evaluating a potential split of the data according to some criterion utilizes information gain.
- 4. The oblique decision tree induction method of claim 1, wherein said step of evaluating a potential split of the data according to some criterion utilizes information ratio.
- 5. The oblique decision tree induction method of claim 1, wherein said step of evaluating a potential split of the data according to some criterion utilizes the Twoing rule.
- 6. The oblique decision tree induction method of claim 1, wherein said step of determining the initial split of the data is based on tests on single attributes of the data or on random multivariate tests.
- 7. The oblique decision tree induction method of claim 1, wherein said step of determining the final split of the data using evolutionary algorithms and statistical sampling techniques is based on tests of linear combinations of attributes of the data obtained using evolutionary algorithms and statistical sampling techniques.
- 8. The oblique decision tree induction method of claim 7, wherein said statistical sampling techniques are applied once at the beginning of an experiment or every time that a potential split is evaluated.
- 9. The oblique decision tree induction method of claim 8, wherein said statistical sampling techniques include simple random sampling, in which every data item has an equal probability of being selected, or other techniques such as stratified sampling, which preserves the proportion of items of each class in the original data.
- 10. The oblique decision tree induction method of claim 1, wherein said step of combining multiple decision trees in ensembles is based on plurality (usually called majority) voting.
- 11. The oblique decision tree induction method of claim 1, wherein said step of combining multiple decision trees in ensembles is based on combination techniques that assign a different weight to each tree according to its accuracy or other criteria.
- 12. An oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques, comprising:
means for reading the data; means for sorting the data, if necessary; means for evaluating a potential split of the data according to some criterion; means for determining an initial split of the data; means for determining the final split of the data using evolutionary algorithms and statistical sampling techniques; means for splitting the data; and means for combining multiple decision trees in ensembles.
- 13. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for evaluating a potential split of the data according to some criterion utilizes the Gini index.
- 14. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for evaluating a potential split of the data according to some criterion utilizes information gain.
- 15. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for evaluating a potential split of the data according to some criterion utilizes information ratio.
- 16. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for evaluating a potential split of the data according to some criterion utilizes the Twoing rule.
- 17. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for determining an initial split of the data is based on tests on single attributes of the data or on random multivariate tests.
- 18. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for determining the final split of the data using evolutionary algorithms and statistical sampling techniques is based on tests of linear combinations of attributes of the data obtained using evolutionary algorithms and statistical sampling techniques.
- 19. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 18, wherein said statistical sampling techniques are applied once at the beginning of an experiment or every time that a potential split is evaluated.
- 20. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 19, wherein said statistical sampling techniques include simple random sampling, in which every data item has an equal probability of being selected, or other techniques such as stratified sampling, which preserves the proportion of items of each class in the original data.
- 21. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for combining multiple decision trees in ensembles is based on plurality (usually called majority) voting.
- 22. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 12, wherein said means for combining multiple decision trees in ensembles is based on combination techniques that assign a different weight to each tree according to its accuracy or other criteria.
- 23. An oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques, comprising:
a module to read the data, a module to sort the data if necessary, a module to evaluate a potential split of the data according to some criterion, a module to determine an initial split of the data, a module to determine the final split of the data using evolutionary algorithms and statistical sampling techniques, a module to split the data, and a module to combine multiple decision trees in ensembles.
- 24. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to evaluate a potential split of the data according to some criterion utilizes the Gini index.
- 25. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to evaluate a potential split of the data according to some criterion utilizes information gain.
- 26. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to evaluate a potential split of the data according to some criterion utilizes information ratio.
- 27. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to evaluate a potential split of the data according to some criterion utilizes the Twoing rule.
- 28. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to determine the initial split of the data is based on tests on single attributes of the data or on random multivariate tests.
- 29. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to determine the final split of the data using evolutionary algorithms and statistical sampling techniques is based on tests of linear combinations of attributes of the data obtained using evolutionary algorithms and statistical sampling techniques.
- 30. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 29, wherein said statistical sampling techniques are applied once at the beginning of an experiment or every time that a potential split is evaluated.
- 31. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 30, wherein said statistical sampling techniques include simple random sampling, in which every data item has an equal probability of being selected, or other techniques such as stratified sampling, which preserves the proportion of items of each class in the original data.
- 32. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to combine multiple decision trees in ensembles is based on plurality (usually called majority) voting.
- 33. The oblique decision tree induction system based on evolutionary algorithms and statistical sampling techniques of claim 23, wherein said module to combine multiple decision trees in ensembles is based on combination techniques that assign a different weight to each tree according to its accuracy or other criteria.
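The split-evaluation criteria named in the claims (Gini index, information gain, information ratio, Twoing rule) can be illustrated with a minimal Python sketch. It assumes each criterion is computed from the class labels that fall on the left and right sides of a candidate split. The "information ratio" is read here as the conventional gain ratio (information gain divided by split information), which is an assumption, and the Twoing rule is written in its usual CART form. All function names are hypothetical and are not taken from the described system.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of the two children (lower is better)."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def information_gain(left, right):
    """Parent entropy minus size-weighted child entropy (higher is better)."""
    n = len(left) + len(right)
    return (entropy(left + right)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def gain_ratio(left, right):
    """Assumption: 'information ratio' is read as the standard gain ratio."""
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return information_gain(left, right) / split_info if split_info > 0 else 0.0


def twoing(left, right):
    """CART Twoing value (P_L * P_R / 4) * (sum_j |p(j|L) - p(j|R)|)^2 (higher is better)."""
    nl, nr = len(left), len(right)
    if nl == 0 or nr == 0:
        return 0.0
    n = nl + nr
    cl, cr = Counter(left), Counter(right)
    diff = sum(abs(cl[k] / nl - cr[k] / nr) for k in set(cl) | set(cr))
    return (nl / n) * (nr / n) / 4.0 * diff ** 2
```

A split search would minimize `gini_split` but maximize the other three scores; for a perfect two-class split such as `(["a", "a"], ["b", "b"])` the weighted Gini impurity is 0 and the information gain is 1 bit.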
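The two sampling techniques mentioned in the claims, simple random sampling and stratified sampling that preserves class proportions, can be sketched as follows. The sampling fraction, the rounding rule, and the function names are illustrative assumptions rather than details of the described system.

```python
import random
from collections import defaultdict


def simple_random_sample(items, fraction, rng):
    """Simple random sampling: every item has the same chance of being selected."""
    k = min(len(items), max(1, round(fraction * len(items))))
    return rng.sample(items, k)


def stratified_sample(items, labels, fraction, rng):
    """Stratified sampling: draw the same fraction from each class so that the
    class proportions of the original data are (approximately) preserved.
    Returns (item, label) pairs."""
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append((item, label))
    sample = []
    for members in by_class.values():
        k = min(len(members), max(1, round(fraction * len(members))))
        sample.extend(rng.sample(members, k))
    return sample
```

With 80 items of one class and 20 of another, a 10% stratified sample contains exactly 8 and 2 items of each class, whereas a simple random sample of the same size only matches those proportions on average.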
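The sequence of determining an initial split from tests on single attributes and then refining it into an oblique (linear-combination) split with an evolutionary algorithm on sampled data can be sketched as below. The claims do not fix a particular evolutionary algorithm, so a simple (1+1) evolution strategy with Gaussian mutation stands in for it here; the weighted Gini impurity as the fitness, the mutation scale `sigma`, and all function and parameter names are assumptions for illustration. The `resample_each_eval` flag mirrors the two options in the claims: sampling once at the beginning, or drawing a fresh sample each time candidate splits are compared.

```python
import random
from collections import Counter


def weighted_gini(X, y, w, b):
    """Size-weighted Gini impurity of the oblique test w . x + b > 0 (lower is better)."""
    left, right = [], []
    for xi, yi in zip(X, y):
        side = right if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else left
        side.append(yi)

    def gini(labels):
        m = len(labels)
        return 0.0 if m == 0 else 1.0 - sum((c / m) ** 2 for c in Counter(labels).values())

    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def best_axis_parallel(X, y):
    """Initial split: the best test on a single attribute, written as a hyperplane."""
    d = len(X[0])
    best = (float("inf"), [1.0] + [0.0] * (d - 1), 0.0)
    for j in range(d):
        values = sorted(set(x[j] for x in X))  # sort the values of attribute j
        for lo, hi in zip(values, values[1:]):
            w = [0.0] * d
            w[j] = 1.0
            b = -(lo + hi) / 2.0  # threshold halfway between adjacent values
            score = weighted_gini(X, y, w, b)
            if score < best[0]:
                best = (score, w, b)
    return best[1], best[2]


def evolve_oblique_split(X, y, generations=200, sigma=0.5, sample_size=None,
                         resample_each_eval=False, seed=0):
    """Assumption: a (1+1) evolution strategy over the hyperplane coefficients,
    seeded with the best axis-parallel split and scored on a random sample."""
    rng = random.Random(seed)

    def draw():
        if sample_size is None or sample_size >= len(X):
            return X, y
        chosen = rng.sample(range(len(X)), sample_size)
        return [X[i] for i in chosen], [y[i] for i in chosen]

    Xs, ys = draw()  # sample drawn once, at the beginning
    w, b = best_axis_parallel(Xs, ys)
    best = weighted_gini(Xs, ys, w, b)
    for _ in range(generations):
        if resample_each_eval:
            Xs, ys = draw()  # fresh sample each generation
            best = weighted_gini(Xs, ys, w, b)  # re-score the parent on it
        child_w = [wj + rng.gauss(0.0, sigma) for wj in w]
        child_b = b + rng.gauss(0.0, sigma)
        score = weighted_gini(Xs, ys, child_w, child_b)
        if score <= best:  # keep the child if it is no worse than the parent
            w, b, best = child_w, child_b, score
    return w, b
```

Because the evolution starts from the best axis-parallel test and only accepts children that are no worse on the same data, the returned hyperplane (when no subsampling is used) never has higher impurity than the initial single-attribute split; on data separated by a diagonal boundary it can do strictly better.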
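The two ways of combining trees in an ensemble recited in the claims, plurality voting and voting with per-tree weights, can be sketched as follows. Trees are modeled as plain callables that map an input to a class label, and the weights (for example, validation accuracies) are supplied by the caller; both are assumptions, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def plurality_vote(predictions):
    """Label predicted by the most trees; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Label with the largest total weight, e.g. weights = per-tree accuracy."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def ensemble_predict(trees, x, weights=None):
    """Combine the predictions of several trees for a single input x."""
    predictions = [tree(x) for tree in trees]
    if weights is None:
        return plurality_vote(predictions)
    return weighted_vote(predictions, weights)
```

For predictions `["a", "b", "a"]`, plurality voting returns `"a"`, but with weights `[0.2, 0.9, 0.3]` the weighted vote returns `"b"` because its total weight (0.9) exceeds that of `"a"` (0.5).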
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] Related subject matter is disclosed and claimed in two commonly owned, copending U.S. patent applications: "PARALLEL OBJECT-ORIENTED DECISION TREE SYSTEM," by Chandrika Kamath and Erick Cantu-Paz, U.S. patent application Ser. No. 09/977,570, filed Jun. 8, 2001; and "PARALLEL OBJECT-ORIENTED DATA MINING SYSTEM," by Chandrika Kamath and Erick Cantu-Paz, U.S. patent application Ser. No. 09/877,685, filed Jun. 8, 2001. Both applications are incorporated herein by reference in their entirety.
Government Interests
[0002] The United States Government has rights in this invention pursuant to Contract No. W-7405-ENG-48 between the United States Department of Energy and the University of California for the operation of Lawrence Livermore National Laboratory.