Claims
- 1. A communications device, comprising:
- an interface for allowing a user to access a communications channel according to a control signal; and
- a speech-recognition system for producing the control signal in response to a spoken command, the speech-recognition system including:
- a feature extractor for extracting a plurality of features from the spoken command; and
- a classifier for generating a discriminant signal according to a polynomial expansion having a form $y = \sum_{i=1}^{m} w_i \prod_{j=1}^{n} x_j^{g_{ji}}$ wherein $x_j$ represents the plurality of features, $y$ represents the discriminant signal, $w_i$ represents a coefficient, $g_{ji}$ represents an exponent, and $i$, $j$, $m$ and $n$ are integers; wherein the control signal is based on the discriminant signal.
- 2. The communications device of claim 1, wherein the polynomial expansion has a form $y = a_0 + \sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} c_{ij} x_i x_j$ wherein $a_0$ represents a zero-order coefficient, $b_i$ represents a first-order coefficient, and $c_{ij}$ represents a second-order coefficient.
- 3. The communications device of claim 1, wherein the interface includes a device selected from a group consisting of: a two-way radio, a telephone, a PDA, and a pager.
- 4. The communications device of claim 1, wherein the spoken command is a word selected from a group consisting of a digit from 0 to 9, "page", "send", and "help".
- 5. The communications device of claim 1, wherein the speech-recognition system further comprises:
- a pre-processor, operatively associated with the feature extractor, for transforming an audio signal using signal processing techniques into a sequence of data vectors that represent the spoken command and from which the plurality of features are extracted.
- 6. The communications device of claim 1, wherein the plurality of features are selected from a group consisting of: cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
- 7. A communications device, comprising:
- a pre-processor for transforming an audio signal into a sequence of data vectors;
- extraction means for extracting a plurality of feature frames from the sequence of data vectors;
- a plurality of classifiers for generating a plurality of discriminant signals, each of the plurality of classifiers designating a different spoken command and generating a discriminant signal according to a polynomial expansion having a form $y = \sum_{i=1}^{m} w_i \prod_{j=1}^{n} x_j^{g_{ji}}$ wherein $x_j$ represents a feature frame, $y$ represents the discriminant signal, $w_i$ represents a coefficient, $g_{ji}$ represents an exponent, and $i$, $j$, $m$ and $n$ are integers;
- an accumulator for generating a plurality of accumulated discriminant signals, the accumulator generating each of the plurality of accumulated discriminant signals by summing ones of the plurality of discriminant signals produced by a respective one of the plurality of classifiers;
- a selector for selecting a largest accumulated discriminant signal from the plurality of accumulated discriminant signals; and
- a two-way audio interface for transmitting and receiving data across a communications channel according to a control signal, the control signal being a function of the largest accumulated discriminant signal.
- 8. The communications device of claim 7, wherein the extraction means includes:
- a feature extractor for extracting a sequence of feature frames from the sequence of data vectors; and
- a speech activity detector for selecting from the sequence of feature frames the plurality of feature frames representing a spoken command.
- 9. The communications device of claim 7, wherein the extraction means includes:
- a speech activity detector for selecting from the sequence of data vectors a vector sub-sequence representing a spoken command; and
- a feature extractor for extracting a plurality of feature frames from the vector sub-sequence.
- 10. The communications device of claim 7, wherein the polynomial expansion has a form $y = a_0 + \sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} c_{ij} x_i x_j$ wherein $a_0$ represents a zero-order coefficient, $b_i$ represents a first-order coefficient, and $c_{ij}$ represents a second-order coefficient.
- 11. The communications device of claim 7, wherein the two-way audio interface includes a device selected from a group consisting of: a two-way radio, a telephone, a PDA, and a pager.
- 12. The communications device of claim 7, wherein the audio signal represents a spoken command selected from a group consisting of a digit from 0 to 9, "page", "send", and "help".
- 13. The communications device of claim 7, wherein each of the plurality of feature frames includes a plurality of features selected from a group consisting of: cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
- 14. A two-way handheld communications device, comprising:
- a microphone for generating an audio signal;
- an A/D converter for digitizing the audio signal to produce a digitized audio signal;
- a pre-processor for transforming the digitized audio signal into a sequence of data vectors;
- a speech activity detector for producing a vector sub-sequence representing a spoken command, the speech activity detector continuously receiving the sequence of data vectors and including in the vector sub-sequence those of the sequence of data vectors having an energy-level that exceeds a background noise threshold;
- a feature extractor for extracting a sequence of feature frames from the vector sub-sequence;
- a plurality of classifiers for generating a plurality of discriminant signals, each of the plurality of classifiers designating a different spoken command and generating a discriminant signal according to a polynomial expansion having a form $y = \sum_{i=1}^{m} w_i \prod_{j=1}^{n} x_j^{g_{ji}}$ wherein $x_j$ represents a feature frame, $y$ represents the discriminant signal, $w_i$ represents a coefficient, $g_{ji}$ represents an exponent, and $i$, $j$, $m$ and $n$ are integers;
- a plurality of accumulators for generating a plurality of accumulated discriminant signals, each of the accumulators summing ones of the plurality of discriminant signals produced by a respective one of the plurality of classifiers;
- a selector for selecting a largest accumulated discriminant signal from the plurality of accumulated discriminant signals; and
- a two-way audio interface for transmitting and receiving data across a radio channel according to a control signal, the control signal being a function of the largest accumulated discriminant signal.
- 15. The two-way handheld communications device of claim 14, wherein the polynomial expansion has a form $y = a_0 + \sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} c_{ij} x_i x_j$ wherein $a_0$ represents a zero-order coefficient, $b_i$ represents a first-order coefficient, and $c_{ij}$ represents a second-order coefficient.
- 16. The two-way handheld communications device of claim 14, wherein the two-way audio interface includes a device selected from a group consisting of: a two-way radio, a telephone, a PDA, and a pager.
- 17. The two-way handheld communications device of claim 14, wherein the spoken command is a word selected from a group consisting of a digit from 0 to 9, "page", "send", and "help".
- 18. The two-way handheld communications device of claim 14, wherein the speech activity detector detects boundaries of the spoken command by determining energy-level transitions across the background noise threshold.
- 19. The two-way handheld communications device of claim 18, wherein the speech activity detector associates an end-of-word boundary with a negative energy-level transition if the energy-level remains below the background noise threshold during a subsequent predetermined interval.
- 20. A method for controlling access to a communications channel, comprising the following steps:
- receiving a spoken command;
- extracting a plurality of features from the spoken command;
- generating a discriminant signal based on a polynomial expansion having a form $y = \sum_{i=1}^{m} w_i \prod_{j=1}^{n} x_j^{g_{ji}}$ wherein $x_j$ represents the plurality of features, $y$ represents the discriminant signal, $w_i$ represents a coefficient, $g_{ji}$ represents an exponent, and $i$, $j$, $m$ and $n$ are integers; and
- accessing the communications channel according to the discriminant signal.
- 21. The method of claim 20, wherein the step of generating includes the following sub-step:
- basing the discriminant signal on a second-order polynomial expansion having a form $y = a_0 + \sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} c_{ij} x_i x_j$ wherein $a_0$ represents a zero-order coefficient, $b_i$ represents a first-order coefficient, and $c_{ij}$ represents a second-order coefficient.
- 22. The method of claim 20, further comprising the following step:
- selecting the spoken command from a group consisting of a digit from 0 to 9, "page", "send", and "help".
- 23. The method of claim 20, further comprising the step of:
- transforming an audio signal using signal processing techniques into a sequence of data vectors that represent the spoken command and from which the plurality of features are extracted.
- 24. The method of claim 20, wherein the step of extracting includes the following sub-step:
- generating the plurality of features selected from a group consisting of: cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
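Claims 1, 7, 14 and 20 all recite the same general polynomial expansion, in which each of the $m$ terms is a coefficient $w_i$ times a product of the features $x_j$ raised to exponents $g_{ji}$. The following is a minimal sketch of evaluating that expression, assuming the features arrive as a plain list of floats and the exponents are stored as a table with `exponents[i][j]` holding $g_{ji}$. The function name `polynomial_discriminant` is hypothetical and not taken from the patent.

```python
from typing import Sequence


def polynomial_discriminant(
    x: Sequence[float],
    weights: Sequence[float],
    exponents: Sequence[Sequence[int]],
) -> float:
    """Evaluate y = sum_i w_i * prod_j x_j ** g_ji (hypothetical helper).

    x         -- feature vector x_1 .. x_n
    weights   -- coefficients w_1 .. w_m
    exponents -- exponents[i][j] is g_ji, the power of x_j in term i
    """
    if len(weights) != len(exponents):
        raise ValueError("need one exponent row per coefficient")
    y = 0.0
    for w_i, g_i in zip(weights, exponents):
        if len(g_i) != len(x):
            raise ValueError("each exponent row needs one entry per feature")
        term = w_i
        for x_j, g_ji in zip(x, g_i):
            term *= x_j ** g_ji  # an exponent of 0 contributes a factor of 1
        y += term
    return y


# Three terms over two features: 1 + 0.5*x1 - x1*x2
print(polynomial_discriminant([2.0, 3.0], [1.0, 0.5, -1.0], [[0, 0], [1, 0], [1, 1]]))
```

With $x = (2, 3)$ the three terms are $1$, $0.5 \cdot 2 = 1$ and $-2 \cdot 3 = -6$, so the example prints `-4.0`. Storing the exponents explicitly lets the same routine evaluate any term structure, including the second-order special case of claims 2, 10, 15 and 21.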
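Claims 2, 10, 15 and 21 narrow the classifier to a second-order expansion with a zero-order coefficient $a_0$, first-order coefficients $b_i$ and second-order coefficients $c_{ij}$. The sketch below assumes the quadratic sum runs over $j \ge i$, so each cross-product $x_i x_j$ appears once, and stores $c_{ij}$ in a dictionary keyed by 0-based index pairs. Both the storage layout and the name `second_order_discriminant` are illustrative assumptions.

```python
from typing import Mapping, Sequence, Tuple


def second_order_discriminant(
    x: Sequence[float],
    a0: float,
    b: Sequence[float],
    c: Mapping[Tuple[int, int], float],
) -> float:
    """y = a0 + sum_i b_i x_i + sum_{i <= j} c_ij x_i x_j (0-based indices).

    Assumption: only pairs with i <= j are stored, so each product appears once.
    Pairs missing from `c` are treated as zero coefficients.
    """
    if len(b) != len(x):
        raise ValueError("need one first-order coefficient per feature")
    y = a0                                          # zero-order term
    y += sum(b_i * x_i for b_i, x_i in zip(b, x))   # first-order terms
    for (i, j), c_ij in c.items():                  # second-order terms
        if not 0 <= i <= j < len(x):
            raise ValueError(f"invalid second-order index pair {(i, j)}")
        y += c_ij * x[i] * x[j]
    return y


print(second_order_discriminant([2.0, 3.0], a0=1.0, b=[0.5, 0.0], c={(0, 1): -1.0}))
```

The example evaluates $1 + 0.5 \cdot 2 - 2 \cdot 3$ and prints `-4.0`. With $n$ features and every pair populated, this form has $1 + n + n(n+1)/2$ terms, i.e. it is the general expansion with every exponent $g_{ji}$ limited to 0, 1 or 2 and total degree at most two.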
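Claims 6, 13 and 24 list cepstral coefficients and their first-order derivatives as candidate features, but the claims do not say how the derivatives are computed. One common choice, used here purely as an assumption, is a linear-regression estimate over $\pm K$ neighbouring frames, with edge frames repeated at the boundaries. The function names `delta_features` and `with_deltas` are hypothetical.

```python
from typing import List, Sequence


def delta_features(frames: Sequence[Sequence[float]], k_max: int = 2) -> List[List[float]]:
    """Regression-based first-order derivatives of each coefficient track.

    delta_t = sum_{k=1..K} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2),
    where frames beyond either end are replaced by the nearest edge frame.
    """
    if k_max < 1:
        raise ValueError("k_max must be at least 1")
    n_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, k_max + 1))

    def at(t: int) -> Sequence[float]:
        # Clamp the frame index so the first/last frame is repeated at the edges.
        return frames[min(max(t, 0), n_frames - 1)]

    return [
        [
            sum(k * (at(t + k)[d] - at(t - k)[d]) for k in range(1, k_max + 1)) / denom
            for d in range(len(frames[t]))
        ]
        for t in range(n_frames)
    ]


def with_deltas(frames: Sequence[Sequence[float]], k_max: int = 2) -> List[List[float]]:
    """Append each frame's delta coefficients to its cepstral coefficients."""
    return [list(c) + d for c, d in zip(frames, delta_features(frames, k_max))]


print(delta_features([[0.0], [1.0], [2.0], [3.0], [4.0]], k_max=1))
```

On a one-coefficient linear ramp the interior deltas equal the slope of 1.0, while the two edge frames come out as 0.5 because the repeated edge frame halves the apparent change. `with_deltas` then produces the concatenated per-frame vectors a feature extractor might hand to the classifiers.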
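Claims 14, 18 and 19 describe a speech activity detector that keeps data vectors whose energy exceeds a background noise threshold, finds word boundaries at energy transitions across that threshold, and accepts a negative transition as an end-of-word boundary only if the energy stays below the threshold for a further predetermined interval. The sketch below is one plausible reading, assuming per-frame energies are already computed and the interval is expressed as a hypothetical `hangover` count of frames; dips shorter than the interval are bridged and treated as part of the same word.

```python
from typing import List, Optional, Sequence, Tuple


def detect_words(
    energies: Sequence[float],
    threshold: float,
    hangover: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected words; `end` is exclusive.

    A word starts at a positive transition (energy rises above `threshold`).
    A negative transition (energy falls to or below the threshold) becomes the
    end-of-word boundary only if the energy stays at or below the threshold
    for `hangover` consecutive frames, counted from the transition frame.
    """
    if hangover < 1:
        raise ValueError("hangover must be at least one frame")
    words: List[Tuple[int, int]] = []
    start: Optional[int] = None          # first frame of the current word
    candidate_end: Optional[int] = None  # latest unconfirmed negative transition
    quiet = 0                            # consecutive frames at/below threshold
    for t, energy in enumerate(energies):
        above = energy > threshold
        if start is None:
            if above:                    # positive transition: a word begins
                start = t
            continue
        if above:
            # Energy returned before the interval elapsed: same word continues.
            candidate_end, quiet = None, 0
            continue
        if candidate_end is None:        # negative transition
            candidate_end = t
        quiet += 1
        if quiet == hangover:            # stayed quiet long enough: confirm end
            words.append((start, candidate_end))
            start, candidate_end, quiet = None, None, 0
    if start is not None:
        # Input ended mid-word or during an unconfirmed pause.
        end = candidate_end if candidate_end is not None else len(energies)
        words.append((start, end))
    return words


print(detect_words([0.1, 0.2, 5, 6, 0.1, 7, 8, 0.1, 0.1, 0.1, 0.2, 9, 0.1], threshold=1.0, hangover=3))
```

The example prints `[(2, 7), (11, 12)]`: the one-frame dip at index 4 is shorter than the interval and does not end the first word, while the three quiet frames starting at index 7 confirm its end-of-word boundary there.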
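Claims 7 and 14 pair one classifier per spoken command with an accumulator that sums each classifier's discriminant signal over the feature frames, and a selector that picks the largest accumulated value. A minimal sketch, assuming each classifier is any callable mapping a feature frame to a float (for instance one of the polynomial expansions recited above); the function name `recognize` and the toy classifiers in the example are hypothetical, and mapping the winning command to a control signal is left out.

```python
from typing import Callable, Dict, Sequence

Frame = Sequence[float]
Classifier = Callable[[Frame], float]


def recognize(frames: Sequence[Frame], classifiers: Dict[str, Classifier]) -> str:
    """Return the command whose accumulated discriminant signal is largest."""
    if not frames or not classifiers:
        raise ValueError("need at least one feature frame and one classifier")
    # Accumulator: one running sum per classifier, i.e. per spoken command.
    totals = {command: 0.0 for command in classifiers}
    for frame in frames:
        for command, classify in classifiers.items():
            totals[command] += classify(frame)
    # Selector: the largest accumulated discriminant wins
    # (ties go to the first command in dictionary order).
    return max(totals, key=lambda command: totals[command])


# Hypothetical toy classifiers standing in for trained polynomial expansions.
frames = [[1.0, 0.0], [0.8, 0.1], [0.9, 0.2]]
classifiers = {
    "send": lambda f: f[0] - f[1],
    "page": lambda f: f[1] - f[0],
}
print(recognize(frames, classifiers))
```

Here "send" accumulates about 2.4 and "page" about -2.4, so the example prints `send`. Summing over frames rather than deciding per frame lets every frame of the utterance contribute evidence before a single command is chosen.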
Parent Case Info
This is a continuation-in-part of application Ser. No. 08/253,893, filed Jun. 3, 1994, now U.S. Pat. No. 5,509,103, and assigned to the same assignee as the present invention. The above-listed application is incorporated herein by reference.