1. Technical Field
This invention relates to automatic speech recognition, and more particularly, to a system that isolates spoken utterances from background noise and non-speech transients.
2. Related Art
Within a vehicle environment, Automatic Speech Recognition (ASR) systems may be used to provide passengers with navigational directions based on voice input. This functionality enhances safety because a driver's attention need not be diverted from the road to manually key in or read information from a screen. Additionally, ASR systems may be used to control audio systems, climate controls, or other vehicle functions. ASR systems enable a user to speak into a microphone and have the resulting signals translated into a command that is recognized by a computer. Upon recognition of the command, the computer may implement an application. One factor in implementing an ASR system is correctly recognizing spoken utterances. This requires locating the beginning and/or the end of the utterances (“end-pointing”).
Some systems search for energy within an audio frame. Upon detecting the energy, the systems predict the end-points of the utterance by subtracting a predetermined time period from the point at which the energy is detected (to determine the beginning time of the utterance) and adding a predetermined time period to that point (to determine the end time of the utterance). The selected portion of the audio stream is then passed to an ASR system in an attempt to recognize a spoken utterance.
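The fixed-offset scheme described above can be sketched as follows. The frame length, energy threshold, and offsets below are illustrative assumptions chosen for the sketch, not values taken from this description.

```python
import numpy as np

def fixed_window_endpoints(signal, sample_rate, energy_threshold,
                           pre_ms=350, post_ms=600):
    """Sketch of fixed-offset end-pointing: find the first frame whose
    short-term energy exceeds a threshold, then subtract/add fixed time
    offsets around that point to bound the portion sent to the ASR."""
    frame_len = sample_rate * 32 // 1000            # 32 ms frames (assumed)
    for start in range(0, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len].astype(float)
        if np.mean(frame ** 2) > energy_threshold:  # energy detected here
            begin = max(0, start - pre_ms * sample_rate // 1000)
            end = min(len(signal), start + post_ms * sample_rate // 1000)
            return begin, end                        # sample range for the ASR
    return None                                      # no energy found
```

As the surrounding text notes, a transient (door slam, road bump) trips this detector just as readily as speech, which motivates the rule-based approach that follows.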
Energy within an acoustic signal may come from many sources. Within a vehicle environment, for example, acoustic signal energy may derive from transient noises such as road bumps, door slams, thumps, cracks, engine noise, movement of air, etc. The system described above, which focuses on the existence of energy, may misinterpret these transient noises to be a spoken utterance and send a surrounding portion of the signal to an ASR system for processing. The ASR system may thus unnecessarily attempt to recognize the transient noise as a speech command, thereby generating false positives and delaying the response to an actual command.
Therefore, a need exists for an intelligent end-pointer system that can identify spoken utterances in transient noise conditions.
A rule-based end-pointer comprises one or more rules that determine a beginning, an end, or both a beginning and end of an audio speech segment in an audio stream. The rules may be based on various factors, such as the occurrence of an event or combination of events, or the duration of a presence/absence of a speech characteristic. Furthermore, the rules may comprise analyzing a period of silence, a voiced audio event, a non-voiced audio event, or any combination of such events; the duration of an event; or a duration relative to an event. Depending upon the rule applied or the contents of the audio stream being analyzed, the amount of the audio stream the rule-based end-pointer sends to an ASR may vary.
A dynamic end-pointer may analyze one or more dynamic aspects related to the audio stream, and determine a beginning, an end, or both a beginning and end of an audio speech segment based on the analyzed dynamic aspect. The dynamic aspects that may be analyzed include, without limitation: (1) the audio stream itself, such as the speaker's pace of speech, the speaker's pitch, etc.; (2) an expected response in the audio stream, such as an expected response (e.g., “yes” or “no”) to a question posed to the speaker; or (3) the environmental conditions, such as the background noise level, echo, etc. Rules may utilize the one or more dynamic aspects in order to end-point the audio speech segment.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
A rule-based end-pointer may examine one or more characteristics of the audio stream for a triggering characteristic. A triggering characteristic may include voiced or non-voiced sounds. Voiced speech segments (e.g. vowels), generated when the vocal cords vibrate, emit a nearly periodic time-domain signal. Non-voiced speech sounds, generated when the vocal cords do not vibrate (such as when speaking the letter “f” in English), lack periodicity and have a time-domain signal that resembles a noise-like structure. By identifying a triggering characteristic in an audio stream and employing a set of rules that operate on the natural characteristics of speech sounds, the end-pointer may improve the determination of the beginning and/or end of a speech utterance.
Alternatively, an end-pointer may analyze at least one dynamic aspect of an audio stream. Dynamic aspects of the audio stream that may be analyzed include, without limitation: (1) the audio stream itself, such as the speaker's pace of speech, the speaker's pitch, etc.; (2) an expected response in an audio stream, such as an expected response (e.g., “yes” or “no”) to a question posed to the speaker; or (3) the environmental conditions, such as the background noise level, echo, etc. The dynamic end-pointer may be rule-based. The dynamic nature of the end-pointer enables improved determination of the beginning and/or end of a speech segment.
There are a variety of ways in which the voicing analysis may identify the presence of a vowel in the frame. One manner is through the use of a pitch estimator. The pitch estimator may search for a periodic signal in the frame, indicating that a vowel may be present. Alternatively, the pitch estimator may search the frame for a predetermined level of a specific frequency, which may indicate the presence of a vowel.
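One way to sketch the periodicity-based manner of voicing analysis is with a normalized autocorrelation: a voiced frame (vowel) is nearly periodic, so its autocorrelation shows a strong peak at a lag in the typical pitch range, while a noise-like non-voiced frame does not. The pitch range and the threshold below are illustrative assumptions.

```python
import numpy as np

def is_voiced(frame, sample_rate, periodicity_threshold=0.3):
    """Sketch of a pitch-estimator voicing test: look for a strong
    normalized autocorrelation peak at a lag corresponding to a pitch
    between roughly 50 Hz and 400 Hz (an assumed range)."""
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0:
        return False                       # an empty frame is not voiced
    min_lag = sample_rate // 400           # shortest pitch period considered
    max_lag = sample_rate // 50            # longest pitch period considered
    # autocorrelation at non-negative lags
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    peak = ac[min_lag:max_lag].max() / energy
    return peak > periodicity_threshold
```

A periodic input (a vowel-like tone) clears the threshold; white noise, which models a non-voiced sound such as “f,” does not.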
When the voicing analysis determines that a vowel is present in frame n, frame n is marked as speech, as shown at block 310. The system then may examine one or more previous frames. The system may examine the immediately preceding frame, frame n−1, as shown at block 312. The system may determine whether the previous frame was previously marked as containing speech, as shown at block 314. If the previous frame was already marked as speech (i.e., an answer of “Yes” at block 314), the system has already determined that speech is included in that frame and moves on to analyze a new audio frame, as shown at block 304. If the previous frame was not marked as speech (i.e., an answer of “No” at block 314), the system may use one or more rules to determine whether the frame should be marked as speech.
If the rules indicate that speech is not present, the frame may be designated as being outside the end-point. If decision block 316 indicates that frame n−1 is outside the end-point (e.g., no speech is present), then a new audio frame, frame n+1, is input into the system and marked as non-speech, as shown at block 304. If decision block 316 indicates that frame n−1 is within the end-point (e.g., speech is present), then frame n−1 is marked as speech, as shown at block 318. The previous audio stream may be analyzed, frame by frame, until the last frame in memory is analyzed, as shown at block 320.
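The backward scan of blocks 310 through 320 can be sketched as follows. The callables `contains_vowel` and `within_endpoint` stand in for the voicing analysis and the rule evaluation described in the text; they are placeholders for this sketch, not functions defined in this description.

```python
def mark_speech_frames(frames, contains_vowel, within_endpoint):
    """Sketch of the frame-marking flow: when a vowel is found in frame n,
    mark it as speech (block 310), then walk backward through earlier
    frames (blocks 312-320), marking each as speech while the rules place
    it within the end-point."""
    marks = [False] * len(frames)
    for n, frame in enumerate(frames):
        if not contains_vowel(frame):
            continue
        marks[n] = True                        # block 310: frame n is speech
        for p in range(n - 1, -1, -1):         # examine previous frames
            if marks[p]:                       # block 314: already marked
                break
            if not within_endpoint(frames[p]): # block 316: outside end-point
                break
            marks[p] = True                    # block 318: mark as speech
    return marks
```

With per-frame energies as a toy stand-in for frames, a low-energy frame just before a vowel is swept into the utterance while surrounding silence is not.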
The rules may be based on analyzing an event (e.g., voiced energy, non-voiced energy, an absence/presence of silence, etc.) or any combination of events (e.g., non-voiced energy followed by silence followed by voiced energy, voiced energy followed by silence followed by non-voiced energy, silence followed by non-voiced energy followed by silence, etc.). Specifically, the rules may examine transitions from periods of silence into energy events or from energy events into periods of silence. A rule may analyze the number of transitions before a vowel, providing that speech may include no more than one transition from a non-voiced event or silence before a vowel. Or a rule may analyze the number of transitions after a vowel, providing that speech may include no more than two transitions from a non-voiced event or silence after a vowel.
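The transition rules above can be sketched over a sequence of per-frame labels. Interpreting a "transition" as a boundary between silence and an energy event (in either direction) is an assumption made for this sketch, as are the label names.

```python
def silence_energy_transitions(events):
    """Count boundaries where consecutive labels cross between 'silence'
    and an energy event (voiced or non-voiced), in either direction."""
    return sum(1 for a, b in zip(events, events[1:])
               if (a == "silence") != (b == "silence"))

def passes_transition_rules(events, vowel_index):
    """Rule sketch: at most one silence/energy transition before the vowel
    and at most two after it, per the example limits in the text."""
    before = silence_energy_transitions(events[:vowel_index + 1])
    after = silence_energy_transitions(events[vowel_index:])
    return before <= 1 and after <= 2
```

A word like “stop” (silence, non-voiced burst, vowel, non-voiced burst, silence) satisfies the rules, while a sequence with repeated silence/energy alternation before the vowel does not.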
One or more rules may examine various duration periods. Specifically, the rules may examine a duration relative to an event (e.g. voiced energy, non-voiced energy, an absence/presence of silence, etc.). A rule may analyze the time duration before a vowel with a rule that speech may include a time duration before a vowel in the range of about 300 ms to 400 ms, and may be about 350 ms. Or a rule may analyze the time duration after a vowel with a rule that speech may include a time duration after a vowel in the range of about 400 ms to about 800 ms, and may be about 600 ms.
One or more rules may examine the duration of an event. Specifically, the rules may examine the duration of a certain type of energy or the lack of energy. Non-voiced energy is one type of energy that may be analyzed. A rule may analyze the duration of continuous non-voiced energy with a rule that speech may include a duration of continuous non-voiced energy in the range of about 150 ms to about 300 ms, and may be about 200 ms. Alternatively, continuous silence may be analyzed as a lack of energy. A rule may analyze the duration of continuous silence before a vowel with a rule that speech may include a duration of continuous silence before a vowel in the range of about 50 ms to about 80 ms, and may be about 70 ms. Or a rule may analyze the time duration of continuous silence after a vowel with a rule that speech may include a duration of continuous silence after a vowel in the range of about 200 ms to about 300 ms, and may be about 250 ms.
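The duration rules of the two preceding paragraphs can be collected into a single table of thresholds. The values below are the representative midpoints named in the text; the dictionary keys are names invented for this sketch.

```python
# Representative duration thresholds from the text (ranges in comments).
DURATION_RULES_MS = {
    "time_before_vowel": 350,        # about 300-400 ms before a vowel
    "time_after_vowel": 600,         # about 400-800 ms after a vowel
    "nonvoiced_energy": 200,         # about 150-300 ms continuous non-voiced
    "silence_before_vowel": 70,      # about 50-80 ms continuous silence
    "silence_after_vowel": 250,      # about 200-300 ms continuous silence
}

def consistent_with_speech(duration_ms, rule):
    """A measured duration is consistent with speech while it does not
    exceed the threshold for the named rule."""
    return duration_ms <= DURATION_RULES_MS[rule]
```

For example, 180 ms of continuous non-voiced energy stays within the speech model, while 320 ms exceeds it and suggests a transient noise.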
At block 402, a check is performed to determine if a frame or group of frames being analyzed has energy above the background noise level. A frame or group of frames having energy above the background noise level may be further analyzed based on the duration of a certain type of energy or a duration relative to an event. If the frame or group of frames being analyzed does not have energy above the background noise level, then the frame or group of frames may be further analyzed based on a duration of continuous silence, a transition into energy events from periods of silence, or a transition from periods of silence into energy events.
If energy is present in the frame or group of frames being analyzed, an “Energy” counter is incremented at block 404. The “Energy” counter counts an amount of time and is incremented by the frame length; if the frame size is about 32 ms, then block 404 increments the “Energy” counter by about 32 ms. At decision block 406, a check is performed to see if the value of the “Energy” counter exceeds a time threshold. The threshold evaluated at decision block 406 corresponds to the continuous non-voiced energy rule, which may be used to determine the presence and/or absence of speech. At decision block 406, the threshold for the maximum duration of continuous non-voiced energy may be evaluated. If decision block 406 determines that the threshold setting is exceeded by the value of the “Energy” counter, then the frame or group of frames being analyzed is designated as being outside the end-point (e.g., no speech is present) at block 408.
If no time threshold is exceeded by the value of the “Energy” counter at decision block 406, then a check is performed at decision block 410 to determine if the “noEnergy” counter exceeds an isolation threshold. Similar to the “Energy” counter 404, “noEnergy” counter 418 counts time and is incremented by the frame length when a frame or group of frames being analyzed does not possess energy above the noise level. The isolation threshold is a time threshold defining an amount of time between two plosive events. A plosive is a consonant produced by momentarily blocking airflow to build up pressure and then releasing it; plosives may include the sounds “P”, “T”, “B”, “D”, and “K”. This threshold may be in the range of about 10 ms to about 50 ms, and may be about 25 ms. If the isolation threshold is exceeded, an isolated non-voiced energy event, such as a plosive surrounded by silence (e.g., the “P” in “STOP”), has been identified, and “isolatedEvents” counter 412 is incremented. The “isolatedEvents” counter 412 is incremented in integer values. After incrementing the “isolatedEvents” counter 412, “noEnergy” counter 418 is reset at block 414. This counter is reset because energy was found within the frame or group of frames being analyzed. If the “noEnergy” counter 418 does not exceed the isolation threshold, then “noEnergy” counter 418 is reset at block 414 without incrementing the “isolatedEvents” counter 412; again, it is reset because energy was found within the frame or group of frames being analyzed. After resetting “noEnergy” counter 418, the outside end-point analysis designates the frame or frames being analyzed as being inside the end-point (e.g., speech is present) by returning a “NO” value at block 416.
Alternatively, if decision block 402 determines there is no energy above the noise level, then the frame or group of frames being analyzed contains silence or background noise. In this case, “noEnergy” counter 418 is incremented. At decision block 420, a check is performed to see if the value of the “noEnergy” counter exceeds a time threshold. The threshold evaluated at decision block 420 corresponds to the continuous silence rule, which may be used to determine the presence and/or absence of speech. At decision block 420, the threshold for a duration of continuous silence may be evaluated. If decision block 420 determines that the threshold setting is exceeded by the value of the “noEnergy” counter, then the frame or group of frames being analyzed is designated as being outside the end-point (e.g., no speech is present) at block 408.
If no time threshold is exceeded by the value of the “noEnergy” counter 418, then a check is performed at decision block 422 to determine if the maximum number of allowed isolated events has occurred. The “isolatedEvents” counter provides the information needed for this check. The maximum number of allowed isolated events is a configurable parameter. If a particular grammar is expected (e.g., a “Yes” or a “No” answer), the maximum number of allowed isolated events may be set accordingly so as to “tighten” the end-pointer's results. If the maximum number of allowed isolated events has been exceeded, then the frame or frames being analyzed are designated as being outside the end-point (e.g., no speech is present) at block 408.
If the maximum number of allowed isolated events has not been reached, “Energy” counter 404 is reset at block 424. “Energy” counter 404 may be reset when a frame of no energy is identified. After resetting “Energy” counter 404, the outside end-point analysis designates the frame or frames being analyzed as being inside the end-point (e.g., speech is present) by returning a “NO” value at block 416.
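The counter logic of blocks 402 through 424 can be sketched as a small per-frame state machine. The threshold values below are the representative values named earlier in the text; the class and method names are invented for this sketch.

```python
class OutsideEndpointAnalyzer:
    """Sketch of the outside end-point analysis: per-frame "Energy" and
    "noEnergy" time counters, an isolation threshold that detects plosives
    surrounded by silence, and a cap on allowed isolated events."""

    def __init__(self, frame_ms=32, max_energy_ms=200,
                 max_silence_ms=250, isolation_ms=25, max_isolated_events=2):
        self.frame_ms = frame_ms
        self.max_energy_ms = max_energy_ms          # continuous non-voiced rule
        self.max_silence_ms = max_silence_ms        # continuous silence rule
        self.isolation_ms = isolation_ms            # gap between plosives
        self.max_isolated_events = max_isolated_events
        self.energy_ms = 0                          # "Energy" counter (404)
        self.no_energy_ms = 0                       # "noEnergy" counter (418)
        self.isolated_events = 0                    # "isolatedEvents" (412)

    def is_outside_endpoint(self, frame_has_energy):
        """Return True when the current frame is outside the end-point."""
        if frame_has_energy:                         # decision block 402
            self.energy_ms += self.frame_ms          # block 404
            if self.energy_ms > self.max_energy_ms:  # decision block 406
                return True                          # block 408: outside
            if self.no_energy_ms > self.isolation_ms:  # decision block 410
                self.isolated_events += 1            # block 412: plosive found
            self.no_energy_ms = 0                    # block 414: reset
            return False                             # block 416: inside
        self.no_energy_ms += self.frame_ms           # counter 418
        if self.no_energy_ms > self.max_silence_ms:  # decision block 420
            return True                              # block 408: outside
        if self.isolated_events > self.max_isolated_events:  # block 422
            return True                              # block 408: outside
        self.energy_ms = 0                           # block 424: reset
        return False                                 # block 416: inside
```

Feeding the analyzer an unbroken run of energy frames eventually exceeds the continuous non-voiced energy threshold and is rejected, as a sustained transient would be; likewise an unbroken run of silent frames exceeds the continuous silence threshold.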
Block 512 illustrates how the end-pointer may respond to an input audio stream.
The end-pointer may also be configured to determine the beginning and/or end of an audio speech segment by analyzing at least one dynamic aspect of an audio stream.
The global and local initializations may occur at various times throughout the system's operation. The estimation of the background noise (local aspect initialization) may be performed every time the system is first powered up and/or after a predetermined time period. The determination of a speaker's pace of speech or pitch (global initialization) may be analyzed and initialized less frequently. Similarly, the local aspect that a certain response is expected may be initialized less frequently; this initialization may occur when the ASR communicates to the end-pointer that a certain response is expected. The local aspect for the environment condition may be configured to initialize only once per power cycle.
During initialization periods 1002 and 1004, the end-pointer may operate at its default threshold settings, as previously described.
A dynamic end-pointer may be configured similarly to the end-pointer described above.
The operation of a dynamic end-pointer may be similar to that of the end-pointer described above.
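One way the dynamic aspects might feed back into the rule thresholds can be sketched as follows. The scaling factors, the pace cutoff, and the key names are assumptions made for illustration; the text does not specify how thresholds are adapted.

```python
def adapt_thresholds(base, pace_wpm=None, noise_db=None):
    """Hypothetical adaptation sketch: a slow speaker is granted longer
    allowed silences, and a noisy cabin raises the energy floor used to
    separate speech energy from background noise."""
    adapted = dict(base)
    if pace_wpm is not None and pace_wpm < 120:      # assumed slow-pace cutoff
        adapted["max_silence_after_vowel_ms"] = int(
            base["max_silence_after_vowel_ms"] * 1.5)  # assumed scale factor
    if noise_db is not None:
        adapted["energy_floor_db"] = base.get("energy_floor_db", 0) + noise_db
    return adapted
```

An expected response (e.g., “yes”/“no”) could similarly tighten thresholds rather than relax them; the sketch only illustrates the direction of adaptation for pace and noise.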
The methods described above may be encoded in software stored on a computer-readable medium.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any means that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction-executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection having one or more wires (electronic), a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation of prior U.S. patent application Ser. No. 11/152,922, filed Jun. 15, 2005, now U.S. Pat. No. 8,170,875 which is incorporated by reference.
Number | Date | Country
---|---|---
20120265530 A1 | Oct 2012 | US

 | Number | Date | Country
---|---|---|---
Parent | 11152922 | Jun 2005 | US
Child | 13455886 | | US