Claims
- 1. A system for detecting endpoints of an utterance, comprising: a processor configured to manipulate speech energy corresponding to said utterance; a filter bank which band-passes said speech energy before providing said speech energy to an endpoint detector that is responsive to said processor, said endpoint detector analyzing said speech energy in real time by progressively examining frames of said speech energy in sequence to determine threshold values and energy parameters, said energy parameters being short-term energy parameters corresponding to said frames of said speech energy, said short-term energy parameters being calculated using a following equation: $DTF(i) = \sum_{m=0}^{M-1} y_i(m)\, w_i(m)$ where $w_i(m)$ is a respective weighting value, $y_i(m)$ is channel signal energy of a channel $m$ at a frame $i$, and $M$ is a total number of channels of said filter bank, said endpoint detector smoothing said short-term energy parameters by using a multiple-point median filter, said endpoint detector using a starting threshold and said short-term energy parameters to determine a starting point for a reliable island, said speech energy including at least one reliable island in which said short-term energy parameters are greater than said starting threshold and an ending threshold, said endpoint detector calculating a background noise value, said background noise value being derived from said short-term energy parameters during a background noise period, said background noise period ending at least 250 milliseconds ahead of said reliable island and having a normalized deviation that is less than a predetermined value, said endpoint detector comparing said threshold values with said energy parameters to identify a beginning point and an ending point of said utterance; and a validity manager, responsive to said processor, for analyzing said speech energy according to selectable criteria to thereby verify said utterance.
- 2. The system of claim 1 wherein said endpoint detector uses a stopping threshold and said short-term energy parameters to determine a stopping point for said reliable island.
- 3. The system of claim 2 wherein said endpoint detector calculates an ending threshold used to refine said ending point by comparing said short-term parameters to said ending threshold or said stopping threshold.
- 4. The system of claim 1 wherein said endpoint detector calculates signal-to-noise ratios corresponding to said speech energy, and wherein said endpoint detector calculates said threshold values using said signal-to-noise ratios, said background noise value, and pre-determined constant values.
- 5. The system of claim 1 wherein said endpoint detector calculates a beginning threshold used to refine said beginning point by comparing said short-term parameters to said beginning threshold.
- 6. A method for detecting endpoints of a spoken utterance, comprising: analyzing speech energy corresponding to said spoken utterance; calculating energy parameters in real time, said energy parameters corresponding to frames of said speech energy; determining a starting threshold corresponding to a reliable island in said speech energy; locating a starting point of said reliable island by comparing said energy parameters to said starting threshold; performing a refinement procedure to identify a beginning point for said spoken utterance by calculating a beginning threshold corresponding to said spoken utterance, and comparing said energy parameters to said beginning threshold to locate said beginning point of said spoken utterance, said beginning threshold $T_{sr}$ being calculated according to a following equation: $T_{sr} = N_{bg}(1 + SNR_{ls}/c_{sr}) + f(N_w) + c_1 V_{bg}$ where $N_{bg}$ is said background noise value, $SNR_{ls}$ is a starting signal-to-noise ratio, $c_{sr}$ is a starting constant, $c_1$ is a constant value, $N_w$ is a parameter related to gain that is imposed on said energy parameters due to a weight vector $w$, $f$ represents a mathematical weighting function that applies said $N_w$ to said energy parameters, and $V_{bg}$ is a sample standard deviation of said background noise; determining a stopping threshold corresponding to said reliable island in said speech energy; determining an ending threshold corresponding to said spoken utterance; comparing said energy parameters to said stopping threshold and to said ending threshold; performing a refinement procedure to identify an ending point for said spoken utterance; and analyzing said speech energy using a validity manager to thereby verify said utterance according to selectable criteria.
- 7. The method of claim 6 wherein said ending threshold is a threshold $T_{er}$ that is calculated according to a following equation: $T_{er} = N_{bg}(1 + SNR_{le}/c_{er}) + f(N_w) + c_1 V_{bg}$ where $N_{bg}$ is said background noise value, $SNR_{le}$ is an ending signal-to-noise ratio, $c_{er}$ is an ending constant, $c_1$ is said constant value, $N_w$ is a parameter related to gain that is imposed on said energy parameters due to a weight vector $w$, $f$ represents said mathematical weighting function that applies said $N_w$ to said energy parameters, and $V_{bg}$ is a sample standard deviation of said background noise.
- 8. The system of claim 7 wherein said $N_w$ is defined by a following equation: $N_w = \sum_{m=0}^{P} w(m)\, s_w(m)$ where $w(m)$ is a weighting value and $s_w(m)$ is a speech energy distribution value.
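The quantities recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the edge handling of the median filter, the use of the sample mean as the background noise value, and all numeric inputs are assumptions made for illustration only. The claims fix only the weighted channel sum for DTF(i), the use of a multiple-point median filter, the form of the refinement thresholds, and the comparison of energy parameters against a starting threshold.

```python
from statistics import mean, median, stdev


def short_term_energy(channel_energy, weights):
    """DTF(i) = sum over m of y_i(m) * w_i(m) for a single frame i."""
    if len(channel_energy) != len(weights):
        raise ValueError("one weight is required per filter-bank channel")
    return sum(y * w for y, w in zip(channel_energy, weights))


def median_smooth(values, points=5):
    """Multiple-point median filter.

    Assumption: the window is simply truncated at both ends of the
    sequence; the claims do not say how edges are handled.
    """
    half = points // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def background_noise(noise_energies):
    """Return (N_bg, V_bg) for a background noise period.

    Assumption: N_bg is taken as the sample mean of the short-term
    energies. V_bg is the sample standard deviation, as in the claims.
    At least two frames are required.
    """
    return mean(noise_energies), stdev(noise_energies)


def refinement_threshold(n_bg, snr, c, f_nw, c1, v_bg):
    """T = N_bg * (1 + SNR / c) + f(N_w) + c1 * V_bg.

    f_nw is the already-evaluated value of f(N_w); the claims leave the
    function f unspecified.
    """
    return n_bg * (1.0 + snr / c) + f_nw + c1 * v_bg


def find_island_start(energies, starting_threshold):
    """Index of the first frame whose energy exceeds the threshold."""
    for i, energy in enumerate(energies):
        if energy > starting_threshold:
            return i
    return None


# Hypothetical two-channel frames and weights, for illustration only.
frames = [[1.0, 1.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0], [9.0, 9.0]]
weights = [0.5, 0.5]
energies = [short_term_energy(frame, weights) for frame in frames]
smoothed = median_smooth(energies, points=3)
island_start = find_island_start(smoothed, starting_threshold=5.0)
```

The median filter is applied before thresholding so that a single-frame spike in energy cannot by itself start a reliable island.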
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to, and claims priority in, co-pending U.S. Provisional Patent Application Serial No. 60/160,809, entitled “Method For Utilizing Validity Constraints In A Speech Endpoint Detector,” filed on Oct. 21, 1999. This application is a continuation-in-part to, and claims priority in, U.S. patent application Ser. No. 08/957,875, entitled “Method For Implementing A Speech Recognition System For Use During Conditions With Background Noise,” filed on Oct. 20, 1997, now U.S. Pat. No. 6,216,103, and a continuation-in-part to U.S. patent application Ser. No. 09/176,178, entitled “Method For Suppressing Background Noise In A Speech Detection System,” filed on Oct. 21, 1998, now U.S. Pat. No. 6,230,122, entitled “Speech Detection With Noise Suppression Based On Principal Components Analysis.” All of the foregoing related applications are commonly assigned, and are hereby incorporated by reference.
US Referenced Citations (10)
Non-Patent Literature Citations (1)
Lawrence E. Rabiner and Ronald W. Schafer, Digital Processing of Speech Signals, Prentice Hall, Upper Saddle River, NJ, 1978, pp. 158-161.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60/160809 | Oct 1999 | US |
Continuation in Parts (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09/176178 | Oct 1998 | US |
| Child | 09/482396 | | US |
| Parent | 08/957875 | Oct 1997 | US |
| Child | 09/176178 | | US |