Claims
- 1. A text classifier in a natural language interface that receives a natural language user input, the text classifier comprising:
a feature extractor extracting a feature vector from a textual input indicative of the natural language user input; and a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector.
- 2. The text classifier of claim 1 wherein the statistical classifier comprises:
a plurality of statistical classification components each outputting a class identifier.
- 3. The text classifier of claim 2 wherein the statistical classifier comprises:
a class selector coupled to the plurality of statistical classification components and selecting one of the class identifiers as identifying the target class.
- 4. The text classifier of claim 3 wherein the class selector comprises a voting component.
- 5. The text classifier of claim 3 wherein the class selector comprises an additional statistical classifier.
- 6. The text classifier of claim 1 and further comprising:
a rule-based classifier receiving the textual input and outputting a class identifier; and a selector selecting at least one of the class identifiers as identifying the target class.
- 7. The text classifier of claim 1 and further comprising:
a rule-based parser receiving the textual input and the class identifier and outputting a semantic representation of the textual input.
- 8. The text classifier of claim 7 wherein the semantic representation includes a class having slots, the slots being filled with semantic expressions.
- 9. The text classifier of claim 1 and further comprising:
a preprocessor identifying words in the textual input having semantic content.
- 10. The text classifier of claim 9 wherein the preprocessor is configured to remove words from the textual input that have insufficient semantic content.
- 11. The text classifier of claim 9 wherein the preprocessor is configured to insert tags for words in the textual input, the tags being semantic labels for the words.
- 12. The text classifier of claim 1 wherein the feature vector is based on words in a vocabulary supported by the natural language interface.
- 13. The text classifier of claim 12 wherein the feature vector is based on n-grams of the words in the vocabulary.
- 14. The text classifier of claim 12 wherein the feature vector is based on words in the vocabulary having semantic content.
- 15. The text classifier of claim 1 wherein the statistical classifier comprises a Naive Bayes Classifier.
- 16. The text classifier of claim 1 wherein the statistical classifier comprises a support vector machine.
- 17. The text classifier of claim 1 wherein the statistical classifier comprises a plurality of class-specific statistical language models.
- 18. The text classifier of claim 1 wherein a number c of classes are supported by the natural language interface and wherein the statistical classifier comprises c class-specific statistical language models.
- 19. The text classifier of claim 1 and further comprising:
a speech recognizer receiving a speech signal indicative of the natural language input and providing the textual input.
- 20. The text classifier of claim 1 wherein the statistical classifier identifies a plurality of n-best target classes.
- 21. The text classifier of claim 20 and further comprising:
an output displaying the n-best target classes for user selection.
- 22. The text classifier of claim 2 wherein each statistical classifier outputs a plurality of n-best target classes.
- 23. A computer-implemented method of processing a natural language input for use in completing a task represented by the natural language input, comprising:
performing statistical classification on the natural language input to obtain a class identifier for a target class associated with the natural language input; identifying rules in a rule-based analyzer based on the class identifier; and analyzing the natural language input with the rule-based analyzer using the identified rules to fill semantic slots in the target class.
- 24. The method of claim 23 and further comprising:
prior to performing statistical classification, identifying words in the natural language input that have semantic content.
- 25. The method of claim 23 wherein the natural language input is represented by a speech signal and further comprising:
performing speech recognition on the speech signal prior to performing statistical classification.
- 26. The method of claim 23 wherein performing statistical classification comprises:
performing statistical classification on the natural language input using a plurality of different statistical classifiers; and selecting a class identifier output by one of the statistical classifiers as representing the target class.
- 27. The method of claim 26 wherein selecting comprises:
performing statistical classification on the class identifiers output by the plurality of statistical classifiers to select the class identifier that represents the target class.
- 28. The method of claim 26 wherein selecting comprises:
selecting the class identifier output by a greatest number of the plurality of statistical classifiers.
- 29. The method of claim 23 and further comprising:
performing rule-based analysis on the natural language input to obtain a class identifier; and identifying the target class based on the class identifier obtained from the statistical classification and the class identifier obtained from the rule-based analysis.
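
The selection step recited in claims 26 and 28 (choosing the class identifier output by the greatest number of statistical classifiers) can be illustrated with a minimal sketch. The function name `select_by_vote` and the tie-breaking rule are assumptions made for illustration only; the claims do not specify how ties are resolved.

```python
from collections import Counter


def select_by_vote(class_ids):
    """Return the class identifier output by the greatest number of classifiers.

    class_ids holds one class identifier per statistical classifier.
    Tie-breaking (earliest identifier wins) is an assumption of this sketch.
    """
    if not class_ids:
        raise ValueError("at least one class identifier is required")
    counts = Counter(class_ids)
    # most_common() keeps equal counts in first-seen order (Python 3.7+),
    # so a tie goes to the identifier that appeared first.
    return counts.most_common(1)[0][0]
```

For example, `select_by_vote(["SendMail", "ShowFlights", "SendMail"])` selects `"SendMail"`, since two of the three hypothetical classifiers agree on it.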
- 30. A system for identifying a task to be performed by a computer based on a natural language input, comprising:
a feature extractor extracting features from the natural language input; and a statistical classifier, trained to accommodate unseen data, receiving the extracted features and identifying the task based on the features.
- 31. The system of claim 30 wherein probabilities used by the statistical classifier are smoothed using smoothing data to accommodate the unseen data.
- 32. The system of claim 31 wherein smoothing data is obtained using cross-validation data.
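
One way to picture a statistical classifier whose probabilities are smoothed to accommodate unseen data, as in claims 30 to 32, is a Naive Bayes model with additive smoothing. The sketch below is an illustration under stated assumptions, not the method of the claims: additive smoothing is a swapped-in technique, and the names `train`, `classify`, and `alpha` are hypothetical. The smoothing constant `alpha` is fixed here, whereas claim 32 contemplates obtaining smoothing data from cross-validation data.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Build a model from (tokens, class_id) pairs.

    alpha is an assumed additive smoothing constant; it could instead be
    tuned on held-out (cross-validation) data.
    """
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, class_id in examples:
        class_counts[class_id] += 1
        token_counts[class_id].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "token_counts": token_counts,
            "vocab": vocab, "alpha": alpha}


def classify(model, tokens):
    """Return the class with the highest smoothed log-probability."""
    alpha = model["alpha"]
    total_examples = sum(model["class_counts"].values())
    # Reserve one extra vocabulary slot so unseen tokens get nonzero mass.
    vocab_size = len(model["vocab"]) + 1
    best_class, best_score = None, -math.inf
    for class_id, n in model["class_counts"].items():
        counts = model["token_counts"][class_id]
        denom = sum(counts.values()) + alpha * vocab_size
        score = math.log(n / total_examples)
        for token in tokens:
            # Counter returns 0 for unseen tokens; alpha keeps the log finite.
            score += math.log((counts[token] + alpha) / denom)
        if score > best_score:
            best_class, best_score = class_id, score
    return best_class
```

Without the `alpha` term, a single token absent from the training data would drive every class probability to zero; with it, the classifier still ranks classes by the tokens it does recognize.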
- 33. A text classifier identifying a target class corresponding to a natural language input, comprising:
a feature extractor extracting a set of features from the natural language input; and a Naïve Bayes Classifier receiving the set of features and identifying the target class based on the set of features.
- 34. The text classifier of claim 33 wherein the target class is indicative of a task to be performed based on the natural language input.
- 35. The text classifier of claim 34 and further comprising:
a preprocessor identifying content words in the natural language input prior to the feature extractor extracting the set of features.
- 36. The text classifier of claim 35 wherein the preprocessor identifies the content words by removing from the natural language input words having insufficient semantic content.
- 37. A text classifier identifying a target class corresponding to a natural language input, comprising:
a feature extractor extracting a set of features from the natural language input; and a statistical language model classifier receiving the set of features and identifying the target class based on the set of features.
- 38. The text classifier of claim 37 wherein the set of features includes n-grams.
- 39. The text classifier of claim 37 and further comprising:
a preprocessor identifying content words in the natural language input prior to the feature extractor extracting the set of features.
- 40. A text classifier identifying one or more target classes corresponding to a natural language input, comprising:
a feature extractor extracting a set of features from the natural language input; and a plurality of statistical classifiers receiving the set of features and identifying a target class based on the set of features.
- 41. The text classifier of claim 40 wherein each statistical classifier outputs a class identifier based on the set of features and further comprising:
a selector receiving the class identifiers from each of the statistical classifiers and selecting the target class as a class identified by at least one of the class identifiers.
- 42. The text classifier of claim 40 and further comprising:
a preprocessor identifying content words in the natural language input prior to the feature extractor extracting the set of features.
- 43. A text classifier identifying a target class corresponding to a natural language input, comprising:
a feature extractor extracting a set of features from the natural language input; a statistical classifier receiving the set of features and outputting a class identifier based on the set of features; a rule-based classifier outputting a class identifier based on the natural language input; and a selector selecting a target class based on the class identifiers output by the statistical classifier and the rule-based classifier.
- 44. The text classifier of claim 43 and further comprising:
a preprocessor identifying content words in the natural language input prior to the feature extractor extracting the set of features and prior to the rule-based classifier receiving the natural language input.
- 45. A text classifier identifying a target task to be completed corresponding to a natural language input, comprising:
a feature extractor extracting a set of features from a textual input indicative of the natural language input; a statistical classifier receiving the set of features and identifying the target task based on the set of features; and a rule-based parser receiving the textual input and a class identifier indicative of the identified target task and outputting a semantic representation of the textual input.
- 46. The text classifier of claim 45 wherein the rule-based parser is configured to identify semantic expressions in the textual input.
- 47. The text classifier of claim 46 wherein the semantic representation includes a class having slots, the slots being filled with the semantic expressions.
- 48. The text classifier of claim 45 and further comprising:
a preprocessor identifying words in the textual input having semantic content.
- 49. The text classifier of claim 48 wherein the preprocessor is configured to remove words from the textual input that have insufficient semantic content.
- 50. The text classifier of claim 48 wherein the preprocessor is configured to insert tags for words in the textual input, the tags being semantic labels for the words.
- 51. The text classifier of claim 48 wherein the preprocessor is configured to replace words in the textual input with semantic tags, the semantic tags being semantic labels for the words.
- 52. A text classifier in a natural language interface that receives a natural language user input, the text classifier comprising:
a statistical classifier configured to receive a textual input and output a class identifier identifying a target class associated with the textual input.
- 53. The text classifier of claim 52 wherein the statistical classifier is configured to form tokens of the textual input and access a lexicon to ascertain token frequency of each token corresponding to the textual input in order to identify a target class.
- 54. The text classifier of claim 53 wherein the statistical classifier is configured to calculate a probability that the textual input corresponds to each of a plurality of possible classes based on token frequency of each token corresponding to the textual input.
- 55. The text classifier of claim 54 wherein the statistical classifier is configured to use a default value for token frequency if a token is not present in the lexicon.
- 56. The text classifier of claim 54 wherein the statistical classifier is configured to apply a scaling factor to a probability of a class based on whether a token is present in the lexicon.
- 57. The text classifier of claim 56 wherein the scaling factor varies as a function of the class.
- 58. The text classifier of claim 57 wherein the scaling factor for a class is a function of how frequently unseen words are encountered for the class.
- 59. The text classifier of claim 53 wherein tokens in the lexicon comprise words.
- 60. The text classifier of claim 53 wherein tokens in the lexicon comprise groups of words.
- 61. The text classifier of claim 53 wherein tokens in the lexicon comprise auxiliary features.
- 62. The text classifier of claim 53 wherein tokens in the lexicon comprise named entities.
- 63. The text classifier of claim 53 wherein tokens in the lexicon comprise generalized tokens that represent specific words.
- 64. The text classifier of claim 53 wherein the statistical classifier is configured to provide a list of class identifiers identifying target classes associated with the textual input.
- 65. The text classifier of claim 64 wherein the statistical classifier is configured to calculate a probability that the textual input corresponds to each of a plurality of possible classes based on token frequency of each token corresponding to the textual input.
- 66. The text classifier of claim 65 wherein the statistical classifier is configured to select a target class as a function of comparing calculated probabilities for each possible class.
- 67. The text classifier of claim 66 wherein the statistical classifier is configured to select a target class as a function of comparing calculated probabilities exceeding a selected threshold.
- 68. The text classifier of claim 67 wherein the statistical classifier is configured to use a first selected threshold for a first set of classes and a second selected threshold for a second set of classes.
- 69. The text classifier of claim 67 wherein the statistical classifier is configured to use a first selected threshold for a set of classes when a first class of the set has a greater probability than a second class of the set, and is configured to use a second selected threshold when the second class of the set has a greater probability than the first class of the set.
- 70. The text classifier of claim 53 wherein the lexicon includes a first class associated with natural language commands and a second class associated with search queries.
- 71. The text classifier of claim 52 and further comprising an interpretation collection module configured to receive the output from the statistical classifier and combine the output with an output from a semantic analyzer analyzing the textual input to form a combined list of possible interpretations.
- 72. The text classifier of claim 71 wherein the interpretation collection module is configured to remove duplicates in the combined list.
- 73. The text classifier of claim 72 wherein the interpretation collection module is configured to ascertain if a first interpretation in the combined list is a subset of another interpretation.
- 74. A computer-implemented method of processing textual input, comprising:
performing statistical classification on the textual input to obtain a target class associated with the textual input; and forwarding the textual input to a search service if the target class identified relates to the textual input comprising a search query.
- 75. The computer-implemented method of claim 74 and further comprising:
forwarding the textual input to a statistical classifier if the target class identified relates to the textual input comprising a natural-language command; and performing statistical classification on the textual input to obtain a target class indicative of a natural language command associated with the textual input.
- 76. The computer-implemented method of claim 74 wherein the step of performing includes forming tokens of the textual input and accessing a lexicon to ascertain token frequency of each token corresponding to the textual input in order to identify a target class.
- 77. The computer-implemented method of claim 76 wherein the step of performing includes calculating a probability that the textual input corresponds to each of a plurality of possible classes based on token frequency of each token corresponding to the textual input.
- 78. The computer-implemented method of claim 77 wherein the step of performing includes providing a list of class identifiers identifying target classes associated with the textual input.
- 79. The computer-implemented method of claim 78 wherein the step of performing includes selecting a target class for the list as a function of comparing calculated probabilities for each possible class.
- 80. The computer-implemented method of claim 77 and further comprising taking action as a function of a calculated probability exceeding a selected threshold.
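
The method of claims 74 to 80 (tokenizing the input, looking up token frequencies in a lexicon, computing a probability for each class, and acting on a threshold) can be sketched roughly as follows. Everything concrete here is hypothetical: the `LEXICON` contents, the `DEFAULT_FREQ` value used for tokens missing from the lexicon, the uniform class prior, the 0.8 threshold, and the `"ask_user"` fallback are assumptions chosen only to make the example runnable.

```python
import math

# Hypothetical lexicon: per-class relative token frequencies.
LEXICON = {
    "search": {"weather": 0.30, "seattle": 0.30, "pictures": 0.20, "of": 0.20},
    "command": {"send": 0.30, "mail": 0.30, "to": 0.20, "open": 0.20},
}
DEFAULT_FREQ = 0.01  # assumed default frequency for tokens not in the lexicon


def class_probabilities(text):
    """Return a normalized probability per class, assuming a uniform prior."""
    tokens = text.lower().split()
    log_scores = {
        cls: sum(math.log(freqs.get(tok, DEFAULT_FREQ)) for tok in tokens)
        for cls, freqs in LEXICON.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


def route(text, threshold=0.8):
    """Pick a destination for the input based on the most probable class."""
    probs = class_probabilities(text)
    best = max(probs, key=probs.get)
    if probs[best] < threshold:
        return "ask_user"  # hypothetical fallback when no class is confident
    return "search_service" if best == "search" else "command_classifier"
```

With this toy lexicon, `route("weather seattle")` goes to the search service and `route("send mail to bob")` goes to the command classifier, while an input made up entirely of unknown tokens falls below the threshold.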
- 81. A computer-implemented method of processing textual input comprising a natural-language command, comprising:
performing statistical classification on the textual input to obtain a target class and an interpretation associated with the textual input; and combining the interpretation from performing statistical classification with an interpretation from another form of analysis of the textual input to form a combined list of possible interpretations.
- 82. The computer-implemented method of claim 81 wherein combining includes removing duplicates in the combined list.
- 83. The computer-implemented method of claim 82 wherein combining includes ascertaining if a first interpretation in the combined list is a subset of another interpretation.
- 84. The computer-implemented method of claim 83 wherein combining includes removing the first interpretation from the combined list.
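
The combining steps of claims 81 to 84 (merging two lists of interpretations, removing duplicates, and removing an interpretation that is a subset of another) can be sketched as below. Representing an interpretation as a flat dictionary of slot names to values is an assumption of this example, as is the function name `combine_interpretations`; the claims do not prescribe a data structure.

```python
def combine_interpretations(statistical, other):
    """Merge two interpretation lists, dropping duplicates and subsets.

    Each interpretation is assumed to be a dict of slot name -> hashable value.
    """
    combined = []
    for interp in list(statistical) + list(other):
        items = frozenset(interp.items())
        if items not in combined:  # remove exact duplicates
            combined.append(items)
    # Drop any interpretation that is a strict subset of another one.
    kept = [a for a in combined if not any(a < b for b in combined)]
    return [dict(items) for items in kept]
```

For instance, if the statistical classifier yields `{"task": "SendMail"}` and another analysis yields `{"task": "SendMail", "to": "bob"}`, only the more specific interpretation survives in the combined list.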
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 10/350,199, entitled SYSTEM OF USING STATISTICAL CLASSIFIERS FOR SPOKEN LANGUAGE UNDERSTANDING, filed Jan. 23, 2003.
Continuation in Parts (1)
|        | Number   | Date     | Country |
|--------|----------|----------|---------|
| Parent | 10350199 | Jan 2003 | US      |
| Child  | 10449708 | May 2003 | US      |