Claims
- 1. A method of improving the reliability of keyword identification by a speech recognition system for continuously spoken speech in a given language comprising the steps of:
- providing a set of keyword templates each of which represents a respective keyword for recognition by said system;
- providing a set of filler templates each of which is representative of an arbitrary sound or utterance that is a component of spoken speech including words in said given language;
- generating a set of signals indicative of said spoken speech in a given time interval;
- providing parallel operations of:
- (a) comparing said set of signals with said keyword templates and selecting the keyword template having the greatest statistical similarity to said set of signals in said given time interval; and generating a keyword match score indicative of the statistical distance of said selected keyword template from said set of signals; and
- (b) separately comparing said set of signals with said filler templates and selecting a concatenation of filler templates having the greatest statistical similarity to said set of signals in the same given time interval; and generating a filler concatenation match score indicative of the statistical distance of said concatenation of filler templates from said set of signals; and
- comparing the keyword match score to the filler concatenation match score to determine, according to a pre-established threshold, whether said keyword match score is sufficiently better than said filler concatenation match score to confirm keyword identification.
- 2. A method of detecting keywords in continuously spoken speech comprising:
- generating a series of speech samples from said spoken speech for a given time interval;
- comparing said samples in said given time interval to a set of keyword templates each of which represents a respective keyword and selecting one of said keyword templates as a best keyword match for said speech samples;
- generating a keyword match score indicative of the degree of matching of said samples in said given time interval to said selected keyword template;
- separately comparing said samples to a set of filler templates, each filler template corresponding to an arbitrary sound or utterance that is a component of spoken speech including words, and selecting a concatenation of said filler templates as a best filler concatenation match for said samples in the same given time interval;
- generating a filler concatenation match score indicative of the degree of matching of said samples in said given time interval to said filler concatenation; and
- comparing said keyword match score to said filler concatenation match score to confirm that said keyword match score exceeds said filler concatenation match score by a predetermined threshold to confirm keyword detection.
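For illustration only, the comparison recited in claims 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the claims do not specify a feature representation or a matching algorithm, so the one-dimensional features, the dynamic-time-warping distance, and all function names (`dtw_distance`, `best_keyword`, `best_filler_concatenation`, `confirm_keyword`) are hypothetical. Scores here are distances, so lower is better, and a keyword is confirmed when the filler concatenation distance exceeds the keyword distance by at least the threshold.

```python
from math import inf


def dtw_distance(frames, template):
    """Dynamic-time-warping distance between two non-empty 1-D sequences.

    Assumption: absolute difference is the per-frame distance; lower is better.
    """
    n, m = len(frames), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(frames[i - 1] - template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_keyword(frames, keyword_templates):
    """Return (name, distance) of the keyword template closest to the interval."""
    scored = ((name, dtw_distance(frames, t)) for name, t in keyword_templates.items())
    return min(scored, key=lambda pair: pair[1])


def best_filler_concatenation(frames, filler_templates):
    """Return (filler names, total distance) for the best concatenation of
    filler templates covering the same interval, found by dynamic programming
    over every possible segment boundary."""
    n = len(frames)
    best = [(inf, []) for _ in range(n + 1)]
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            prev_cost, prev_seq = best[start]
            if prev_cost == inf:
                continue
            for name, t in filler_templates.items():
                total = prev_cost + dtw_distance(frames[start:end], t)
                if total < best[end][0]:
                    best[end] = (total, prev_seq + [name])
    return best[n][1], best[n][0]


def confirm_keyword(frames, keyword_templates, filler_templates, threshold):
    """Confirm the best keyword only if it beats the filler concatenation
    by at least `threshold`; return (keyword or None, keyword distance,
    filler distance)."""
    name, kw_dist = best_keyword(frames, keyword_templates)
    _, filler_dist = best_filler_concatenation(frames, filler_templates)
    confirmed = name if filler_dist - kw_dist >= threshold else None
    return confirmed, kw_dist, filler_dist
```

In this sketch both comparisons run over the same interval, mirroring the "same given time interval" language of the claims. The threshold value is a tuning parameter that the claims leave unspecified.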
Government Interests
The government has rights in this invention pursuant to contract No. MDA-904-83-C-0475 awarded by the Maryland Procurement Office.
US Referenced Citations (8)