Claims
- 1. An integrated voice activation detector for detecting whether voice is present, the integrated voice activation detector comprising:
a semiconductor integrated circuit including,
at least one signal processing unit to perform voice detection; and a processor readable storage means to store signal processing instructions for execution by the at least one signal processing unit to:
detect whether noise is present to determine whether a noise flag should be set; detect a predetermined number of zero crossings to determine whether a zero crossing flag should be set; detect whether a threshold amount of energy is present to determine whether an energy flag should be set; detect whether instantaneous energy is present to determine whether an instantaneous energy flag should be set; and utilize a combination of the noise, zero crossing, energy, and instantaneous energy flags to determine whether voice is present.
- 2. The integrated voice activation detector of claim 1, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform fast Fourier transformation (FFT) processing to determine whether a FFT flag should be set.
- 3. The integrated voice activation detector of claim 1, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform an interim voice activity decision, an interim voice activity decision flag being set to indicate voice has been detected by determining if the instantaneous energy flag is set or the energy flag is set and the noise flag is not set and the zero crossing flag is not set.
- 4. The integrated voice activation detector of claim 3, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform HangOver and Speech Kick in processing after the interim voice activity decision has been made to determine whether a voice activity flag should be set or cleared.
- 5. The integrated voice activation detector of claim 4, wherein the signal processing instructions further for execution by the at least one signal processing unit to, if the voice activity flag is set, send a speech payload to be packetized and update the voice activity detection flag for external interaction with other functions of the semiconductor integrated circuit.
- 6. The integrated voice activation detector of claim 4, wherein the signal processing instructions further for execution by the at least one signal processing unit to, if the voice activity flag is not set, disable an automatic level control and cause a silence insertion description payload to be prepared.
- 7. The integrated voice activation detector of claim 1, wherein detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set includes determining whether a root mean square crossing value is greater than a threshold value.
- 8. The integrated voice activation detector of claim 1, wherein detecting whether noise is present to determine whether a noise flag should be set includes determining whether energy in a current frame multiplied by a threshold is greater than delayed frame energy.
- 9. The integrated voice activation detector of claim 1, wherein detecting whether a threshold amount of energy is present to determine whether an energy flag should be set includes determining if a logarithm of an autocorrelation of a frame is greater than an energy threshold.
- 10. The integrated voice activation detector of claim 1, wherein detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set includes determining whether a difference between a current frame's energy at an autocorrelation of a tenth delayed sample and a prior frame's energy at an autocorrelation of a tenth delayed sample is greater than a previous frame's autocorrelation multiplied by a threshold.
- 11. A method for voice activation detection to detect whether voice is present, the method comprising:
detecting whether noise is present to determine whether a noise flag should be set; detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set; detecting whether a threshold amount of energy is present to determine whether an energy flag should be set; detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set; and utilizing a combination of the noise, zero crossing, energy, and instantaneous energy flags to determine whether voice is present.
- 12. The method of claim 11, further comprising, performing fast Fourier transformation (FFT) processing to determine whether a FFT flag should be set.
- 13. The method of claim 11, further comprising, performing an interim voice activity decision, an interim voice activity decision flag being set to indicate that voice has been detected by determining if the instantaneous energy flag is set or the energy flag is set and the noise flag is not set and the zero crossing flag is not set.
- 14. The method of claim 13, further comprising, performing HangOver and Speech Kick in processing after the interim voice activity decision has been made to determine whether a voice activity flag should be set or cleared.
- 15. The method of claim 14, further comprising, if the voice activity flag is set, sending a speech payload to be packetized and updating the voice activity detection flag for external interaction with other functions.
- 16. The method of claim 14, further comprising, if the voice activity flag is not set, disabling an automatic level control and causing a silence insertion description payload to be prepared.
- 17. The method of claim 11, wherein detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set includes determining whether a root mean square crossing value is greater than a threshold value.
- 18. The method of claim 11, wherein detecting whether noise is present to determine whether a noise flag should be set includes determining whether energy in a current frame multiplied by a threshold is greater than delayed frame energy.
- 19. The method of claim 11, wherein detecting whether a threshold amount of energy is present to determine whether an energy flag should be set includes determining if a logarithm of an autocorrelation of a frame is greater than an energy threshold.
- 20. The method of claim 11, wherein detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set includes determining whether a difference between a current frame's energy at an autocorrelation of a tenth delayed sample and a prior frame's energy at an autocorrelation of a tenth delayed sample is greater than a previous frame's autocorrelation multiplied by a threshold.
- 21. An apparatus comprising:
at least one signal processing unit to perform voice detection; and a storage device to store signal processing instructions for execution by the at least one signal processing unit to:
determine whether a noise flag, a zero crossing flag, an energy flag, and an instantaneous energy flag should be set; and utilize a combination of the noise, zero crossing, energy, and instantaneous energy flags to determine whether voice is present.
- 22. The apparatus of claim 21, wherein the signal processing instructions further for execution by the at least one signal processing unit to:
detect whether noise is present to determine whether the noise flag should be set; detect a predetermined number of zero crossings to determine whether the zero crossing flag should be set; detect whether a threshold amount of energy is present to determine whether the energy flag should be set; and detect whether instantaneous energy is present to determine whether the instantaneous energy flag should be set.
- 23. The apparatus of claim 21, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform fast Fourier transformation (FFT) processing to determine whether a FFT flag should be set.
- 24. The apparatus of claim 21, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform an interim voice activity decision, an interim voice activity decision flag being set to indicate voice has been detected by determining if the instantaneous energy flag is set or the energy flag is set and the noise flag is not set and the zero crossing flag is not set.
- 25. The apparatus of claim 24, wherein the signal processing instructions further for execution by the at least one signal processing unit to, perform HangOver and Speech Kick in processing after the interim voice activity decision has been made to determine whether a voice activity flag should be set or cleared.
- 26. The apparatus of claim 25, wherein the signal processing instructions further for execution by the at least one signal processing unit to, if the voice activity flag is set, send a speech payload to be packetized and update the voice activity detection flag for external interaction with other functions of the semiconductor integrated circuit.
- 27. The apparatus of claim 25, wherein the signal processing instructions further for execution by the at least one signal processing unit to, if the voice activity flag is not set, disable an automatic level control and cause a silence insertion description payload to be prepared.
- 28. The apparatus of claim 22, wherein detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set includes determining whether a root mean square crossing value is greater than a threshold value.
- 29. The apparatus of claim 22, wherein detecting whether noise is present to determine whether a noise flag should be set includes determining whether energy in a current frame multiplied by a threshold is greater than delayed frame energy.
- 30. The apparatus of claim 22, wherein detecting whether a threshold amount of energy is present to determine whether an energy flag should be set includes determining if a logarithm of an autocorrelation of a frame is greater than an energy threshold.
- 31. The apparatus of claim 22, wherein detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set includes determining whether a difference between a current frame's energy at an autocorrelation of a tenth delayed sample and a prior frame's energy at an autocorrelation of a tenth delayed sample is greater than a previous frame's autocorrelation multiplied by a threshold.
- 32. A method comprising:
determining whether a noise flag, a zero crossing flag, an energy flag, and an instantaneous energy flag should be set; and utilizing a combination of the noise, zero crossing, energy, and instantaneous energy flags to determine whether voice is present.
- 33. The method of claim 32, further comprising:
detecting whether noise is present to determine whether the noise flag should be set; detecting a predetermined number of zero crossings to determine whether the zero crossing flag should be set; detecting whether a threshold amount of energy is present to determine whether the energy flag should be set; and detecting whether instantaneous energy is present to determine whether the instantaneous energy flag should be set.
- 34. The method of claim 33, further comprising, performing fast Fourier transformation (FFT) processing to determine whether a FFT flag should be set.
- 35. The method of claim 32, further comprising, performing an interim voice activity decision, an interim voice activity decision flag being set to indicate that voice has been detected by determining if the instantaneous energy flag is set or the energy flag is set and the noise flag is not set and the zero crossing flag is not set.
- 36. The method of claim 35, further comprising, performing HangOver and Speech Kick in processing after the interim voice activity decision has been made to determine whether a voice activity flag should be set or cleared.
- 37. The method of claim 36, further comprising, if the voice activity flag is set, sending a speech payload to be packetized and updating the voice activity detection flag for external interaction with other functions.
- 38. The method of claim 36, further comprising, if the voice activity flag is not set, disabling an automatic level control and causing a silence insertion description payload to be prepared.
- 39. The method of claim 33, wherein detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set includes determining whether a root mean square crossing value is greater than a threshold value.
- 40. The method of claim 33, wherein detecting whether noise is present to determine whether a noise flag should be set includes determining whether energy in a current frame multiplied by a threshold is greater than delayed frame energy.
- 41. The method of claim 33, wherein detecting whether a threshold amount of energy is present to determine whether an energy flag should be set includes determining if a logarithm of an autocorrelation of a frame is greater than an energy threshold.
- 42. The method of claim 33, wherein detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set includes determining whether a difference between a current frame's energy at an autocorrelation of a tenth delayed sample and a prior frame's energy at an autocorrelation of a tenth delayed sample is greater than a previous frame's autocorrelation multiplied by a threshold.
- 43. A machine-readable medium having stored thereon instructions which, when executed by a machine, cause the machine to perform operations comprising:
determining whether a noise flag, a zero crossing flag, an energy flag, and an instantaneous energy flag should be set; and utilizing a combination of the noise, zero crossing, energy, and instantaneous energy flags to determine whether voice is present.
- 44. The machine-readable medium of claim 43, further comprising:
detecting whether noise is present to determine whether the noise flag should be set; detecting a predetermined number of zero crossings to determine whether the zero crossing flag should be set; detecting whether a threshold amount of energy is present to determine whether the energy flag should be set; and detecting whether instantaneous energy is present to determine whether the instantaneous energy flag should be set.
- 45. The machine-readable medium of claim 43, further comprising, performing fast Fourier transformation (FFT) processing to determine whether a FFT flag should be set.
- 46. The machine-readable medium of claim 43, further comprising, performing an interim voice activity decision, an interim voice activity decision flag being set to indicate that voice has been detected by determining if the instantaneous energy flag is set or the energy flag is set and the noise flag is not set and the zero crossing flag is not set.
- 47. The machine-readable medium of claim 46, further comprising, performing HangOver and Speech Kick in processing after the interim voice activity decision has been made to determine whether a voice activity flag should be set or cleared.
- 48. The machine-readable medium of claim 47, further comprising, if the voice activity flag is set, sending a speech payload to be packetized and updating the voice activity detection flag for external interaction with other functions.
- 49. The machine-readable medium of claim 47, further comprising, if the voice activity flag is not set, disabling an automatic level control and causing a silence insertion description payload to be prepared.
- 50. The machine-readable medium of claim 44, wherein detecting a predetermined number of zero crossings to determine whether a zero crossing flag should be set includes determining whether a root mean square crossing value is greater than a threshold value.
- 51. The machine-readable medium of claim 44, wherein detecting whether noise is present to determine whether a noise flag should be set includes determining whether energy in a current frame multiplied by a threshold is greater than delayed frame energy.
- 52. The machine-readable medium of claim 44, wherein detecting whether a threshold amount of energy is present to determine whether an energy flag should be set includes determining if a logarithm of an autocorrelation of a frame is greater than an energy threshold.
- 53. The machine-readable medium of claim 44, wherein detecting whether instantaneous energy is present to determine whether an instantaneous energy flag should be set includes determining whether a difference between a current frame's energy at an autocorrelation of a tenth delayed sample and a prior frame's energy at an autocorrelation of a tenth delayed sample is greater than a previous frame's autocorrelation multiplied by a threshold.
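
For illustration only, the flag tests recited in the dependent claims (noise, energy, and the interim voice activity decision) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the use of lag 0 and the natural logarithm for the energy test, and the grouping of the interim decision as (instantaneous energy OR energy) AND NOT noise AND NOT zero crossing are all assumptions, since the claim language leaves them open.

```python
import math


def autocorr(frame, lag=0):
    """Autocorrelation of a frame at the given lag (lag 0 is the frame energy)."""
    return sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))


def noise_flag(current_energy, delayed_energy, threshold):
    # Noise test: current-frame energy multiplied by a threshold
    # is greater than the delayed-frame energy.
    return current_energy * threshold > delayed_energy


def energy_flag(frame, energy_threshold):
    # Energy test: logarithm of the frame autocorrelation exceeds a threshold.
    # Assumption: lag 0 and natural log; an all-zero frame never sets the flag.
    r0 = autocorr(frame, 0)
    return r0 > 0 and math.log(r0) > energy_threshold


def interim_vad(inst_energy, energy, noise, zero_crossing):
    # Interim decision under ONE assumed reading of the claim wording:
    # (instantaneous energy OR energy) AND NOT noise AND NOT zero crossing.
    return (inst_energy or energy) and not noise and not zero_crossing
```

Under the alternative reading, in which the instantaneous energy flag alone is sufficient, the last function would instead return `inst_energy or (energy and not noise and not zero_crossing)`.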
RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/231,510 filed on Sep. 9, 2000.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60231510 | Sep 2000 | US |