Claims
- 1. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of said examples; transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only; optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes; and optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership.
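The one-vs-rest class assignment recited in claim 1 can be sketched as follows (a minimal illustration; the data structures and function name are assumptions, not part of the claims):

```python
# Sketch of the one-vs-rest class assignment described in claim 1
# (hypothetical data structures; the claims do not fix an API).
def split_classes(examples, target_category):
    """Assign examples to a positive or a negative class for one category.

    `examples` is a list of (feature_vector, categories) pairs, where
    `categories` is the set of category labels the example belongs to.
    Per claim 1, an example belonging to both the target category and
    another category is assigned to the positive (first) class only.
    """
    positive, negative = [], []
    for vector, categories in examples:
        if target_category in categories:
            positive.append(vector)   # member of first category -> first class
        else:
            negative.append(vector)   # all remaining examples -> second class
    return positive, negative
```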
- 2. The method of claim 1 further comprising determining whether said first category has more than a predetermined number of training examples assigned to it, wherein if the number of training examples assigned to said first category does not exceed said predetermined number, the process of building a SVM classifier for said first category is aborted.
- 3. The method of claim 1 further comprising testing whether the trained SVM classifier could be optimized, wherein if said SVM classifier could not be optimized, said SVM classifier for said first category is discarded.
- 4. The method of claim 1 wherein said at least one tunable parameter of said SVM classifier is optimized using a method comprising the steps of:
allocating a subset of the training examples assigned to said first category to a “holdout” set, wherein said subset of training examples is left out of said training step; calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization.
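The holdout allocation in claim 4 can be sketched as a simple random split (the fraction, seed, and function name are illustrative choices, not from the claims):

```python
import random

def allocate_holdout(examples, fraction=0.2, seed=0):
    """Set aside a holdout subset that is left out of SVM training,
    as in claim 4. Returns (holdout, training) lists.
    """
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = examples[:]           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```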
- 5. The method of claim 4 wherein said test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a relationship between symmetric SVM classifier scores (e.g. s and −s) produced by said SVM classifier, a first estimated probability indicative of class membership and a second estimated probability indicative of non-class membership for training examples with an SVM classifier score s, as provided by probability equations q(C|s) and 1.0−q(C|−s), respectively.
- 6. The method of claim 5 wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of said first and second estimated probabilities, on the one hand, and an arithmetic mean of said first and second estimated probabilities, on the other hand.
- 7. The method of claim 4 wherein said at least one tunable parameter comprises two tunable parameters for said SVM classifier, one for a positive class, and one for a negative class.
- 8. The method of claim 4 wherein said SVM classifier is based on a formulation having two cost factors (one for a positive class, one for a negative class), as follows:
- 9. The method of claim 4 wherein said SVM classifier is based on a formulation having two cost factors (one for a positive class, one for a negative class), as follows:
- 10. The method of claim 4 wherein said steps of calculating a solution and testing said solution are repeated as necessary according to a numerical optimization routine based on a simplex optimization method.
- 11. The method of claim 1 wherein said SVM classifier for said first category calculates a score s for said first category, wherein said score is optimized to fit a slope parameter in a sigmoid function that transforms SVM scores to probability estimates.
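The score-to-probability mapping in claim 11 can be sketched as a Platt-style sigmoid with a fitted slope (the single-slope, zero-offset form shown here is an illustrative assumption; the claims only require a sigmoid with a slope parameter):

```python
import math

def sigmoid_probability(score, slope):
    """Map an SVM score s to an estimated probability of category
    membership via a sigmoid with a trained slope, as in claim 11."""
    return 1.0 / (1.0 + math.exp(-slope * score))
```

Note that this form is symmetric: the probabilities for scores s and −s sum to one, which is the relationship the over-fit test in claim 5 exploits.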
- 12. The method of claim 1 wherein the calibration of SVM scores is performed using unbound support vector training examples.
- 13. The method of claim 1, wherein the calibration of SVM scores is performed using training examples allocated to a holdout set.
- 14. The method of claim 1, wherein the calibration of SVM scores is performed using both unbound support vector training examples and training examples allocated to a holdout set.
- 15. The method of claim 1 wherein said training examples comprise documents containing text.
- 16. A method of classifying new examples into at least one of multiple categories, comprising:
analyzing a new example so as to generate a feature vector for said new example; classifying the feature vector into a first category using a binary SVM classifier; transforming a SVM score calculated for said first category using a sigmoid function so as to generate at least one probability estimate; using a misclassification cost matrix to transform said at least one probability estimate; and determining whether any of said at least one transformed probability estimates are above a probability threshold.
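The classification flow of claim 16 can be sketched end to end as follows. The function name, the per-category scoring callables, and the reduction of the misclassification cost matrix to a per-category weight are all simplifying assumptions for illustration:

```python
import math

def classify(feature_vector, classifiers, slopes, cost, threshold=0.5):
    """Sketch of claim 16: score with a binary SVM per category,
    transform the score to a probability via a trained sigmoid slope,
    weight the probability by a misclassification cost, and keep the
    categories whose transformed estimate exceeds the threshold.

    `classifiers` maps category -> scoring function (the binary SVM),
    `slopes` maps category -> fitted sigmoid slope, and `cost` maps
    category -> cost weight (a simplified stand-in for the cost matrix).
    """
    accepted = {}
    for category, score_fn in classifiers.items():
        s = score_fn(feature_vector)                       # raw SVM score
        p = 1.0 / (1.0 + math.exp(-slopes[category] * s))  # sigmoid -> probability
        p_cost = p * cost[category]                        # cost-weighted estimate
        if p_cost > threshold:
            accepted[category] = p_cost
    return accepted
```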
- 17. The method of claim 16 wherein said transformation sigmoid function includes a slope parameter that is optimized during training for said first category.
- 18. A method of classifying new examples into at least one of multiple categories, comprising:
analyzing a new example so as to generate a feature vector for said new example; classifying the feature vector into a first category using a binary SVM classifier; transforming a SVM score calculated for said first category using a sigmoid function so as to generate at least one probability estimate; using a category prior matrix to transform said at least one probability estimate; and determining whether any of said at least one transformed probability estimates are above a probability threshold.
- 19. The method of claim 18 wherein said transformation sigmoid function includes a slope parameter that is optimized during training for said first category.
- 20. The method of claim 18 wherein values contained in said category prior matrix are estimated during training of said SVM classifier by counting the number of examples in said first category and dividing by the number of examples in all categories.
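The prior estimate recited in claim 20 is a straightforward frequency ratio; a minimal sketch (function name and input shape are assumptions):

```python
def estimate_priors(category_counts):
    """Estimate category priors as in claim 20: the number of examples
    in each category divided by the number of examples in all
    categories. `category_counts` maps category -> example count."""
    total = sum(category_counts.values())
    return {c: n / total for c, n in category_counts.items()}
```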
- 21. A computer-readable medium for storing instructions that when executed by a computer perform a method of training a multi-category classifier using a binary SVM algorithm, the method comprising:
storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of said examples; transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only; optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes; and optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership.
- 22. The computer-readable medium of claim 21 wherein said method further comprises determining whether said first category has more than a predetermined number of training examples assigned to it, wherein if the number of training examples assigned to said first category does not exceed said predetermined number, the process of building a SVM classifier for said first category is aborted.
- 23. The computer-readable medium of claim 21 wherein said method further comprises testing whether the trained SVM classifier could be optimized, wherein if said SVM classifier could not be optimized, said SVM classifier for said first category is discarded.
- 24. The computer-readable medium of claim 21 wherein said at least one tunable parameter of said SVM classifier is optimized using a method comprising:
allocating a subset of the training examples assigned to said first category to a “holdout” set, wherein said subset of training examples is left out of said training step; calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization (i.e., too specific).
- 25. The computer-readable medium of claim 24 wherein said test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a relationship between symmetric SVM classifier scores (e.g. s and −s) produced by said SVM classifier, a first estimated probability indicative of class membership and a second estimated probability indicative of non-class membership for training examples with an SVM classifier score s, as provided by probability equations q(C|s) and 1.0−q(C|−s), respectively.
- 26. The computer-readable medium of claim 25 wherein said test to determine whether said SVM classifier solution is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of said first and second estimated probabilities, on the one hand, and an arithmetic mean of said first and second estimated probabilities, on the other hand.
- 27. The computer-readable medium of claim 24 wherein said at least one tunable parameter comprises two tunable parameters for said SVM classifier, one for a positive class, and one for a negative class.
- 28. The computer-readable medium of claim 24 wherein said SVM classifier is based on a formulation having two cost factors (one for a positive class, one for a negative class), as follows:
- 29. The computer-readable medium of claim 24 wherein said SVM classifier is based on a formulation having two cost factors (one for a positive class, one for a negative class), as follows:
- 30. The computer-readable medium of claim 24 wherein said steps of calculating a solution and testing said solution are repeated as necessary according to a numerical optimization routine based on a simplex optimization method.
- 31. The computer-readable medium of claim 21 wherein said SVM classifier for said first category calculates a score s for said first category, wherein said score is optimized to fit a slope parameter in a sigmoid function that transforms SVM scores to probability estimates.
- 32. The computer-readable medium of claim 31 wherein the calibration of SVM scores is performed using unbound support vector training examples.
- 33. The computer-readable medium of claim 31, wherein the calibration of SVM scores is performed using training examples allocated to a holdout set.
- 34. The computer-readable medium of claim 31, wherein the calibration of SVM scores is performed using both unbound support vector training examples and training examples allocated to a holdout set.
- 35. The computer-readable medium of claim 21 wherein said training examples comprise documents containing text.
- 36. A computer-readable medium containing computer-executable instructions that when executed by a computer perform a method of classifying new examples into at least one of multiple categories, the method comprising:
analyzing a new example so as to generate a feature vector for said new example; classifying the feature vector into a first category using a binary SVM classifier; transforming a SVM score calculated for said first category using a sigmoid function so as to generate at least one probability estimate; using a misclassification cost matrix to transform said at least one probability estimate; and determining whether any of said at least one transformed probability estimates are above a probability threshold.
- 37. The computer-readable medium of claim 36 wherein said transformation sigmoid function includes a slope parameter that is optimized during training for said first category.
- 38. A computer-readable medium containing computer-executable instructions that when executed by a computer perform a method of classifying new examples into at least one of multiple categories, the method comprising:
analyzing a new example so as to generate a feature vector for said new example; classifying the feature vector into a first category using a binary SVM classifier; transforming a SVM score calculated for said first category using a sigmoid function so as to generate at least one probability estimate; using a category prior matrix to transform said at least one probability estimate; and determining whether any of said at least one transformed probability estimates are above a probability threshold.
- 39. The computer-readable medium of claim 38 wherein said transformation sigmoid function includes a slope parameter that is optimized during training for said first category.
- 40. The computer-readable medium of claim 38 wherein values contained in said category prior matrix are estimated during training of said SVM classifier by counting the number of examples in said first category and dividing by the number of examples in all categories.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Serial No. 60/431,299 entitled “EFFECTIVE MULTICLASS SVM CLASSIFICATION,” filed on Dec. 6, 2002, the entirety of which is incorporated by reference herein.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60431299 | Dec 2002 | US |