Claims
- 1. An apparatus for recognizing speech sounds contained in audio speech input comprised of sound utterances separated by silences of at least a minimum silence duration, wherein said sound utterances may include non-speech sounds other than ambient noise, comprising:
- frame forming means for continuously forming frames of digital signals representative of said speech sounds, silences, and said non-speech sounds;
- recognizing means for continuously recognizing said speech sounds, silences and said non-speech sounds from said frames of digital signals;
- determining means for determining endpoints of said speech sounds based on the recognition of non-speech-speech-non-speech sequences, wherein the non-speech portions of the respective sequences include recognized silences as well as recognized non-speech sounds other than the ambient noise; and
- processing means for processing at least the speech portions of the respective sequences demarcated by the determined endpoints in accordance with a predefined syntax.
- 2. The apparatus of claim 1, wherein said recognizing means includes memory means for storing representations of speech and non-speech sounds as respective speech and non-speech templates, wherein said non-speech templates include a silence template of the minimum silence duration, an ambient noise template for the ambient noise, and multiple non-speech noise templates for noises other than the ambient noise having a duration several times longer than the minimum silence duration; and wherein said processing means includes means responsive to an endpoint determination by said determining means for recognizing said speech and non-speech sounds by comparing the respective frames of original signals with said speech and non-speech templates.
- 3. A method of recognizing speech sounds contained in audio speech input comprised of sound utterances separated by silences of at least a minimum silence duration, wherein said sound utterances may include non-speech sounds other than ambient noise, comprising the steps of:
- storing representations of speech sounds and non-speech sounds as respective speech and non-speech templates in a memory, wherein said non-speech templates include a silence template of the minimum silence duration, an ambient noise template for the ambient noise, and multiple non-speech noise templates for noises other than the ambient noise;
- continuously converting said speech sounds, silences and said non-speech sounds into frames of digital signals;
- determining endpoints of said speech sounds from respective non-speech-speech-non-speech sequences, including continuously comparing said frames of said speech sounds, silences, and non-speech sounds with said speech and non-speech templates until detection of the respective sequence; and
- processing at least those of said frames in the respective sequence that are representative of speech sounds demarcated by the determined endpoints, in accordance with a predefined syntax, to recognize such speech sounds.
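The endpoint logic recited in claims 1 and 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. Each frame is assumed to be a precomputed feature vector. Each frame is labeled with the name of its nearest template by Euclidean distance, which is a per-frame simplification of the multi-frame template comparison described in claims 2 and 3. A non-speech run (silence, ambient noise, or another noise such as a hypothetical "cough" template) counts as a boundary only if it lasts at least `min_silence_frames` frames, which stands in for the minimum silence duration. The template names and the functions `label_frames`, `runs`, and `find_endpoints` are hypothetical, and the syntax-driven processing of the demarcated speech is not modeled.

```python
import math

# Assumption: each frame is a precomputed feature vector, and the template
# names are hypothetical. Only frames whose nearest template is named
# "speech" count as speech; every other template ("silence", "ambient",
# "cough", ...) models non-speech input.
SPEECH = "speech"


def label_frames(frames, templates):
    """Label each frame with the name of its nearest template vector.

    `templates` maps a template name to a list of feature vectors. This is a
    per-frame simplification of matching whole multi-frame templates.
    """
    candidates = [(name, vec) for name, vecs in templates.items() for vec in vecs]
    return [
        min(candidates, key=lambda nv: math.dist(frame, nv[1]))[0]
        for frame in frames
    ]


def runs(flags):
    """Collapse a list of booleans into (value, start, length) runs."""
    out = []
    for i, flag in enumerate(flags):
        if out and out[-1][0] == flag:
            value, start, length = out[-1]
            out[-1] = (value, start, length + 1)
        else:
            out.append((flag, i, 1))
    return out


def find_endpoints(labels, min_silence_frames):
    """Return (start, end) frame indices, end exclusive, of speech segments
    found as non-speech / speech / non-speech sequences.

    A non-speech run (silence, ambient noise or other noise) is a boundary
    only if it lasts at least `min_silence_frames` frames; shorter non-speech
    runs are treated as pauses inside an utterance.
    """
    segments = []
    start = None           # first frame of the open speech segment, if any
    seen_boundary = False  # a segment may only start after a boundary
    for is_speech, run_start, length in runs([lab == SPEECH for lab in labels]):
        if not is_speech and length >= min_silence_frames:
            if start is not None:
                # Trailing non-speech found: the segment ends just before
                # this boundary run.
                segments.append((start, run_start))
                start = None
            seen_boundary = True
        elif is_speech and seen_boundary and start is None:
            start = run_start
    return segments


if __name__ == "__main__":
    templates = {
        "speech": [[1.0, 1.0]],
        "silence": [[0.0, 0.0]],
        "cough": [[5.0, 0.0]],  # hypothetical non-ambient noise template
    }
    frames = (
        [[0.0, 0.0]] * 3 + [[1.0, 1.0]] * 4 + [[0.0, 0.0]] * 1
        + [[1.0, 1.0]] * 2 + [[5.0, 0.0]] * 3 + [[1.0, 1.0]] * 2
        + [[0.1, 0.0]] * 3 + [[0.9, 1.1]] * 2
    )
    labels = label_frames(frames, templates)
    print(find_endpoints(labels, min_silence_frames=3))  # [(3, 10), (13, 15)]
```

With this input, the one-frame pause at frame 7 is absorbed into the first utterance, the cough at frames 10 to 12 closes it, and the final speech run is not reported because no trailing non-speech has been observed yet.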
Parent Case Info
This application is a continuation of application Ser. No. 687,610, filed Dec. 31, 1984, now abandoned.
US Referenced Citations (2)
Number | Name | Date | Kind
4481593 | Bahlen | Nov 1984 |
4489435 | Moshier | Dec 1984 |
Continuations (1)
Relation | Number | Date | Country
Parent | 687610 | Dec 1984 |