Claims
- 1. A method for discriminating a digital speech sound comprising dividing digital speech signals into signal blocks each including a predetermined number of samples, and making a decision for each of said signal blocks as to whether the speech sound is voiced, said method further comprising the steps of:
- transforming signals of each of said signal blocks into data on the frequency scale,
- finding low frequency range energies based on said data on the frequency scale,
- finding high frequency range energies based on said data on the frequency scale,
- finding a mean signal level of each of said signal blocks from low frequency range energies and high frequency range energies,
- dividing signals of each of said signal blocks into plural sub-blocks,
- analyzing said sub-blocks to find statistical characteristics of each of said sub-blocks,
- calculating a bias of said statistical characteristics of said signals in the time domain, and,
- deciding whether or not said signal blocks are voiced by comparing said mean signal level with a first predetermined threshold and by further comparing said bias of said statistical characteristics in the time domain with a second predetermined threshold.
- 2. The method as claimed in claim 1 wherein a decision as to whether or not said signal blocks are voiced is made further based on a ratio between said low frequency range energies and said high frequency range energies.
- 3. The method as claimed in claim 1 wherein a ratio between low frequency range energies and high frequency range energies is found based on said low frequency range energies and said high frequency range energies and wherein a decision as to whether or not said signal blocks are voiced is made by further comparing said ratio with a predetermined threshold.
- 4. The method as claimed in claim 1 wherein said low frequency range energies and said high frequency range energies are demarcated from each other at a demarcation frequency which is between 0 kHz and 3.4 kHz.
- 5. The method as claimed in claim 1 further comprising the step of:
- finding a ratio between said low frequency range energies and said high frequency range energies, said ratio being used as a basis in deciding whether or not said signal blocks are voiced.
- 6. The method as claimed in claim 1 further comprising the steps of:
- finding a ratio between said low frequency range energies and said high frequency range energies, and,
- deciding whether or not each of said signal blocks is voiced by further comparing said ratio with a predetermined threshold.
- 7. A method for discriminating a digital speech sound comprising dividing digital speech signals into signal blocks each including a predetermined number of samples, and making a decision as to whether or not the speech sound is voiced for each of said signal blocks, said method further comprising the steps of:
- finding an effective value of signals in each of a plurality of sub-blocks divided from each of said signal blocks,
- finding a standard deviation and a mean value of said signals of each signal block based on the effective value as found for each of said sub-blocks,
- finding a normalized standard deviation in the time domain based on said standard deviation and said mean value,
- frequency-analyzing signals of each of said signal blocks to find spectral intensities at a plurality of frequencies,
- finding an energy distribution based on said spectral intensity at each of said plurality of frequencies,
- finding a mean signal level of signals of each of said signal blocks from said energy distribution, and,
- making a decision as to whether or not said signal blocks are voiced by comparing said normalized standard deviation, said energy distribution and said mean signal level with each corresponding predetermined threshold.
- 8. The method as claimed in claim 7 wherein said spectral intensities at each point of the frequency domain are divided into groups of low-range frequency and high-range frequency and wherein said energy distribution is found based on a ratio between energies of the respective groups.
- 9. The method as claimed in claim 8 wherein said low frequency range energies and said high frequency range energies are demarcated from each other at a demarcation frequency which is between 0 kHz and 3.4 kHz.
- 10. An apparatus for discriminating a digital speech sound by dividing digital speech signals into signal blocks each including a predetermined number of samples, and making a decision for each of said signal blocks, as to whether or not the speech sound is voiced, said apparatus comprising:
- frequency data calculating means for transforming signals of each of said signal blocks into frequency-domain data,
- means for finding low frequency range energies based on said frequency-domain data,
- means for finding high frequency range energies based on said frequency-domain data,
- means for finding a mean signal level of each of said signal blocks from said low frequency range energies and said high frequency range energies,
- means for dividing signals of said signal block into plural sub-blocks,
- means for analyzing said sub-blocks for finding statistical characteristics of each of said sub-blocks,
- means for calculating a bias of said statistical characteristics of said signals in the time domain, and,
- decision means for making a decision as to whether or not said signal blocks are voiced by comparing said mean signal level with a first predetermined threshold and by further comparing said bias of said statistical characteristics in the time domain with a second predetermined threshold.
- 11. The apparatus as claimed in claim 10 wherein said decision means decides whether or not said signal blocks are voiced further based on a ratio between said low frequency range energies and said high frequency range energies.
- 12. The apparatus as claimed in claim 10 further comprising:
- means for finding a ratio between said low frequency range energies and said high frequency range energies based on said low frequency range energies and said high frequency range energies wherein said decision means decides whether or not said signal blocks are voiced by further comparing said ratio with a predetermined threshold.
- 13. The apparatus as claimed in claim 10 wherein said low frequency range energies and said high frequency range energies are demarcated from each other at a demarcation frequency which is between 0 kHz and 3.4 kHz.
- 14. The apparatus as claimed in claim 10 further comprising:
- means for finding a ratio between said low frequency range energies and said high frequency range energies, said ratio being used as a basis in deciding whether or not said signal blocks are voiced.
- 15. The apparatus as claimed in claim 10 further comprising:
- means for finding a ratio between said low frequency range energies and said high frequency range energies, wherein said decision means decides whether or not said signal blocks are voiced by further comparing said ratio with a predetermined threshold.
- 16. An apparatus for discriminating a digital speech sound by dividing digital speech signals into signal blocks each including a predetermined number of samples, and making a decision for each of said signal blocks as to whether or not the speech sound is voiced, said apparatus comprising:
- means for finding an effective value of signals in each of a plurality of sub-blocks divided from each of said signal blocks,
- means for finding a standard deviation and a mean value of said signals of each signal block based on an effective value as found for each of said sub-blocks,
- means for finding a normalized standard deviation in the time domain based on said standard deviation and said mean value,
- means for frequency-analyzing signals of each of said signal blocks to find spectral intensities at a plurality of frequencies,
- means for finding energy distribution based on said spectral intensity at each of said plurality of frequencies,
- means for finding a mean signal level of signals of each of said signal blocks from said energy distribution, and,
- decision means for deciding whether or not said signal blocks are voiced by comparing said normalized standard deviation, said energy distribution and said mean signal level with each corresponding predetermined threshold.
- 17. The apparatus as claimed in claim 16 wherein said spectral intensities at each point of the frequency domain are divided into groups of low-range frequency and high-range frequency and wherein said energy distribution is found based on a ratio between energies of the respective groups.
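The steps recited in claims 7 and 16 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claims do not fix any of the following, so all of it is assumed for illustration: the 8 kHz sample rate, 256-sample blocks, 8 sub-blocks, the 1 kHz demarcation frequency, the threshold values, the way the mean level and the energy distribution are normalized, and the direction of each comparison. The function and parameter names are hypothetical.

```python
import cmath
import math


def sub_block_rms(block, n_sub):
    """Effective (RMS) value of each of n_sub equal-length sub-blocks."""
    size = len(block) // n_sub
    return [
        math.sqrt(sum(x * x for x in block[i * size:(i + 1) * size]) / size)
        for i in range(n_sub)
    ]


def normalized_std(values):
    """Standard deviation of the values divided by their mean."""
    mean = sum(values) / len(values)
    if mean == 0.0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


def band_energies(block, sample_rate, split_hz):
    """Low- and high-range energies from a naive DFT of one block."""
    n = len(block)
    low = high = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block)
        )
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power
    return low, high


def block_features(block, sample_rate, n_sub, split_hz):
    """Return (mean level, energy distribution, normalized std) for a block."""
    low, high = band_energies(block, sample_rate, split_hz)
    total = low + high
    # Assumed normalizations: level scaled by block length, distribution as
    # a low/high balance in [-1, 1] (one possible form of a band ratio).
    level = math.sqrt(total) / len(block)
    balance = (low - high) / total if total > 0.0 else 0.0
    spread = normalized_std(sub_block_rms(block, n_sub))
    return level, balance, spread


def is_voiced(block, sample_rate=8000, n_sub=8, split_hz=1000.0,
              level_min=0.01, balance_min=0.0, spread_max=1.0):
    """Compare each feature with its own threshold (directions assumed)."""
    level, balance, spread = block_features(block, sample_rate, n_sub, split_hz)
    return level > level_min and balance > balance_min and spread < spread_max


if __name__ == "__main__":
    fs, n = 8000, 256
    tone = [0.5 * math.sin(2 * math.pi * 250 * t / fs) for t in range(n)]
    print(is_voiced(tone))
```

In this sketch a steady 250 Hz tone passes all three comparisons, while silence, a 3 kHz tone, or a single isolated click fails at least one of them. A practical implementation would replace the naive DFT with an FFT.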
Priority Claims (2)
Number | Date | Country | Kind
4-121460 | Apr 1992 | JPX |
5-000828 | Jan 1993 | JPX |
Parent Case Info
This application is a division of application Ser. No. 08/048,034, filed Apr. 14, 1993, pending, which is hereby incorporated by reference.
US Referenced Citations (9)
Non-Patent Literature Citations (4)
Entry |
"New Feature Extraction Methods and the Concept of Time-Warped Distance in Speech Processing", 1991, Gordos. |
"Single Channel Adaptive Noise Cancelation for Enhancing Noisy Speech", 1994 International Symposium on Speech, Image Processing, and Neural Networks, Apr. 1994. |
"Harmonic and noise coding of LPC residuals with classified vector quantization", Nishiguchi et al, May 1995. |
"Vector Quantized MBE with Simplified V/UV Division at 3.0 kbps", Nishiguchi et al, 1993. |
Divisions (1)
 | Number | Date | Country
Parent | 048034 | Apr 1993 |