This application claims the benefit of Korean Patent Application No. 10-2005-0010189, filed on Feb. 3, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to a speech enhancement apparatus and method, and more particularly, to a speech enhancement apparatus and method for enhancing the quality and naturalness of speech by efficiently removing noise included in a speech signal received in a noisy environment and appropriately processing the peak and valley of a speech spectrum where the noise has been removed.
2. Description of the Related Art
In general, although speech recognition apparatuses exhibit high performance in a clean environment, recognition performance deteriorates due to surrounding noise in the actual environments where such apparatuses are used, such as in a car, in a display space, or in a telephone booth. This deterioration of speech recognition performance caused by noise has been an obstacle to the widespread adoption of speech recognition technology, and accordingly, many studies have been conducted to solve the problem. A spectrum subtraction method, which removes additive noise included in the speech signal input to a speech recognition apparatus, has been widely used to make speech recognition robust with respect to a noisy environment.
The spectrum subtraction method estimates the average spectrum of noise in a speech-absence section, that is, in a period of silence, and subtracts the estimated average noise spectrum from the input speech spectrum, exploiting the fact that the frequency characteristic of noise changes relatively smoothly compared to that of speech. When an error exists in the estimated average noise spectrum |Ne(ω)|, negative values may occur in the spectrum obtained by subtracting |Ne(ω)| from the speech spectrum |Y(ω)| input to the speech recognition apparatus.
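The spectrum subtraction described above can be illustrated with a minimal NumPy sketch. The frame handling (real FFT of fixed-length frames, averaging magnitude spectra over silence frames) is an illustrative assumption, not the specific procedure of the present invention; note that the subtracted magnitudes may go negative when the noise estimate is in error, which is the problem the later correction addresses.

```python
import numpy as np

def estimate_noise_spectrum(noise_frames):
    # Average the magnitude spectra of frames taken from a speech-absence
    # (silence) section to obtain the estimated noise spectrum |Ne(w)|.
    return np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

def spectral_subtraction(speech_frame, noise_mag):
    # Subtract |Ne(w)| from the input magnitude |Y(w)|.  The result may
    # contain negative values when the noise estimate is in error.
    return np.abs(np.fft.rfft(speech_frame)) - noise_mag
```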
To prevent the occurrence of negative values in the subtracted spectrum, a conventional method (hereinafter referred to as the “HWR”) adjusts a portion 110 of the subtracted spectrum (|Y(ω)|−|Ne(ω)|) having an amplitude less than “0” to uniformly have “0” or a very small positive value. Although the noise removal performance of this method is superior, the possibility of speech distortion occurring while the portion 110 is adjusted to “0” or a very small positive value increases, so that the quality of speech or the performance of recognition deteriorates.
In another conventional method (hereinafter referred to as the “FWR”), a portion of the subtracted spectrum (|Y(ω)|−|Ne(ω)|) having an amplitude less than “0”, for example, an amplitude value of P1, is adjusted to its absolute value, that is, an amplitude value of P2, as shown in
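The two conventional rectification schemes contrasted above can be sketched as follows. This is a minimal illustration of HWR (clamping) versus FWR (absolute value) applied to the subtracted spectrum; the `floor` argument standing in for the “very small positive value” is an assumption.

```python
import numpy as np

def half_wave_rectify(subtracted, floor=0.0):
    # HWR: clamp amplitudes below "0" to 0 (or a very small positive floor).
    return np.maximum(subtracted, floor)

def full_wave_rectify(subtracted):
    # FWR: replace a negative amplitude P1 with its absolute value P2.
    return np.abs(subtracted)
```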
To solve the above and/or other problems, the present invention provides a speech enhancement apparatus and a method for enhancing the quality and natural characteristics of speech by efficiently removing noise included in a speech signal received in a noisy environment.
The present invention provides a speech enhancement apparatus and a method for enhancing the quality and natural characteristics of speech by efficiently removing noise included in a speech signal received in a noisy environment and appropriately processing the peak and valley of a speech spectrum where the noise has been removed.
The present invention provides a speech enhancement apparatus and method for enhancing the quality and natural characteristics of speech by appropriately processing the peaks and valleys existing in a speech spectrum received in a noisy environment.
According to an aspect of the present invention, there is provided a speech enhancement apparatus comprising: a spectrum subtraction unit generating a subtracted spectrum by subtracting an estimated noise spectrum from a received speech spectrum; a correction function modeling unit modeling a correction function to minimize a noise spectrum using variation of the noise spectrum included in training data; and a spectrum correction unit generating a corrected spectrum by correcting the subtracted spectrum using the correction function.
According to another aspect of the present invention, a speech enhancement method includes: generating a subtracted spectrum by subtracting an estimated noise spectrum from a received speech spectrum; modeling a correction function to minimize the noise spectrum using variation of a noise spectrum included in training data; and generating a corrected spectrum by correcting the subtracted spectrum using the correction function.
According to another aspect of the present invention, a speech enhancement apparatus includes: a spectrum subtraction unit generating a subtracted spectrum by subtracting an estimated noise spectrum from a received speech spectrum; a correction function modeling unit modeling a correction function to minimize a noise spectrum using variation of the noise spectrum included in training data; a spectrum correction unit generating a corrected spectrum by correcting the subtracted spectrum using the correction function; and a spectrum enhancement unit enhancing the corrected spectrum by emphasizing a peak and suppressing a valley which exist in the corrected spectrum.
According to another aspect of the present invention, a speech enhancement method includes: generating a subtracted spectrum by subtracting an estimated noise spectrum from a received speech spectrum; modeling a correction function to minimize the noise spectrum using variation of a noise spectrum included in training data; generating a corrected spectrum by correcting the subtracted spectrum using the correction function; and enhancing the corrected spectrum by emphasizing/enlarging a peak and suppressing a valley in the corrected spectrum.
According to another aspect of the present invention, a speech enhancement apparatus includes: a spectrum subtraction unit subtracting an estimated noise spectrum from a received speech spectrum, and generating a subtracted spectrum, in which a negative number portion is corrected; and a spectrum enhancement unit enhancing the corrected spectrum by emphasizing a peak and suppressing a valley in the subtracted spectrum.
According to another aspect of the present invention, a speech enhancement method includes: subtracting an estimated noise spectrum from a received speech spectrum and generating a subtracted spectrum where a negative number portion is corrected; and enhancing a corrected spectrum by emphasizing a peak and suppressing a valley in the subtracted spectrum.
Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the invention.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee. The above and other features and advantages of the present invention will become more apparent by describing in detail embodiments thereof with reference to the attached drawings in which:
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
Referring to
In
Referring to
J = E[(x−y)^2] [Equation 1]
When the value of r for classifying the first through third areas A1, A2, and A3 is determined, the correction function g(x) for each area is determined: a decreasing function, generally a first-order (linear) function, for the first area A1; an increasing function, generally a first-order function, for the second area A2; and the function g(x)=0 for the third area A3. That is, the correction function of the first area A1 is g(x)=−βx and the correction function of the second area A2 is g(x)=β(x+2r). The slope β of each correction function is obtained by substituting the correction function into the first error function J, partially differentiating with respect to β, and taking the value that makes the derivative equal to “0”, as shown in Equation 2.
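The piecewise correction function described above can be sketched as follows. How the three areas are delimited by r is not stated explicitly in this text; the boundaries below (A1: −r ≤ x < 0, A2: −2r ≤ x < −r, A3: x < −2r) are an assumption inferred from the fact that the two linear branches −βx and β(x+2r) meet at x = −r, and the pass-through for non-negative amplitudes is likewise assumed.

```python
def correction_function(x, r, beta):
    # Piecewise correction g(x) for the subtracted-spectrum amplitude x.
    # Area boundaries are inferred: the two branches agree at x = -r.
    if x >= 0:
        return x                    # non-negative amplitudes pass through (assumed)
    if x >= -r:
        return -beta * x            # A1: decreasing linear function g(x) = -bx
    if x >= -2 * r:
        return beta * (x + 2 * r)   # A2: increasing linear function g(x) = b(x+2r)
    return 0.0                      # A3: g(x) = 0
```

With 0 < β < 1 (as stated for the slope in Equation 2), the corrected value is a damped positive substitute for the negative amplitude rather than the full absolute value used by FWR.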
In Equation 2, the slope β is greater than 0 and less than 1.
Referring to
That is, when the amplitude value of the current frequency component is greater than the average amplitude value of the adjacent frequency components, the current frequency component is determined as a peak.
The valley detection unit 630 detects valleys in the spectrum corrected by the spectrum correction unit 350. As with peaks, valleys are detected by comparing the amplitude value x(k) of the current frequency component sampled from the corrected spectrum provided by the spectrum correction unit 350 with the amplitude values x(k−1) and x(k+1) of the two adjacent frequency components. When the following Equation 5 is satisfied, the position of the current frequency component is detected as a valley.
That is, when the amplitude value of the present frequency component is less than the average amplitude value of the adjacent frequency components, the current frequency component is determined as a valley.
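The neighbor-comparison rule described above can be sketched as follows. Equations 4 and 5 are not reproduced in this text, so the sketch implements only the stated criterion: a bin is a peak when its amplitude exceeds the average of its two adjacent bins, and a valley when it falls below that average.

```python
def detect_peaks_and_valleys(mag):
    # Compare each amplitude x(k) with the average of its neighbours
    # x(k-1) and x(k+1); greater -> peak, smaller -> valley.
    peaks, valleys = [], []
    for k in range(1, len(mag) - 1):
        avg = 0.5 * (mag[k - 1] + mag[k + 1])
        if mag[k] > avg:
            peaks.append(k)
        elif mag[k] < avg:
            valleys.append(k)
    return peaks, valleys
```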
The peak emphasis unit 650 estimates an emphasis parameter from a second error function K between the spectrum corrected by the spectrum correction unit 350 and the original spectrum of the speech signal and emphasizes/enlarges a peak by applying the estimated emphasis parameter to each peak detected by the peak detection unit 610. When the second error function K is expressed as a sum of the errors of the peaks and valleys using an emphasis parameter μ and a suppression parameter η, as shown in the following Equation 6, the emphasis parameter μ is estimated as in Equation 7.
The emphasis parameter μ is generally greater than 1.
That is, the amplitude value of each peak is multiplied by the emphasis parameter μ obtained from Equation 7 to enhance the spectrum.
The valley suppression unit 670 estimates a suppression parameter from the second error function K between the spectrum corrected by the spectrum correction unit 350 and the original spectrum of the speech signal and suppresses a valley by applying the estimated suppression parameter to each valley detected by the valley detection unit 630. When the second error function K is expressed as a sum of the errors of the peaks and valleys using the emphasis parameter μ and the suppression parameter η, as shown in the above Equation 6, the suppression parameter η is estimated as in Equation 8.
The suppression parameter η is generally greater than 0 and less than 1.
In the above Equations 6 through 8, “x” denotes the spectrum corrected by the spectrum correction unit 350 and “y” denotes the original spectrum of a speech signal. That is, the amplitude value of each valley is multiplied by the suppression parameter η obtained from Equation 8 to enhance the spectrum.
The synthesis unit 690 synthesizes the peaks emphasized/enlarged by the peak emphasis unit 650 and the valleys suppressed by the valley suppression unit 670 and outputs a finally enhanced speech spectrum.
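The emphasis and suppression steps above amount to multiplying each detected peak by μ and each detected valley by η. The sketch below illustrates that multiplication only; the values of μ and η are illustrative constants, not the estimates of Equations 7 and 8, which require the original spectrum.

```python
import numpy as np

def enhance_spectrum(mag, peaks, valleys, mu=1.2, eta=0.8):
    # Multiply each peak bin by the emphasis parameter mu (> 1) and each
    # valley bin by the suppression parameter eta (0 < eta < 1), then
    # return the combined (synthesized) enhanced spectrum.
    out = np.asarray(mag, dtype=float).copy()
    out[peaks] *= mu
    out[valleys] *= eta
    return out
```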
The invention can also be embodied as computer readable codes on a computer readable recording medium. The computer readable recording medium is any data storage medium or device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. Also, functional programs, codes, and code segments for accomplishing the present invention can be easily constructed by programmers skilled in the art to which the present invention pertains.
As described above, according to the speech enhancement apparatus and method of the present invention, the portion of the subtracted spectrum where negative values are generated is corrected using a correction function which is optimized for a given environment and minimizes distortion of speech. Thus, the noise removal performance is improved, and at the same time, the quality and naturalness of speech are improved.
Also, according to the speech enhancement apparatus and method of the present invention, since a frequency component having a relatively greater amplitude value is emphasized/enlarged and a frequency component having a relatively smaller amplitude value is suppressed in the subtracted spectrum, speech is enhanced without estimating a formant.
While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2005-0010189 | Feb 2005 | KR | national |
Number | Name | Date | Kind |
---|---|---|---|
5742924 | Nakayama | Apr 1998 | A |
5742927 | Crozier et al. | Apr 1998 | A |
5752226 | Chan et al. | May 1998 | A |
5812970 | Chan et al. | Sep 1998 | A |
5943429 | Handel | Aug 1999 | A |
6289309 | deVries | Sep 2001 | B1 |
6757395 | Fang et al. | Jun 2004 | B1 |
6766292 | Chandran et al. | Jul 2004 | B1 |
6778954 | Kim et al. | Aug 2004 | B1 |
7054808 | Yoshida | May 2006 | B2 |
7158932 | Furuta | Jan 2007 | B1 |
7428490 | Xu et al. | Sep 2008 | B2 |
20020128830 | Kanazawa et al. | Sep 2002 | A1 |
20020156623 | Yoshida | Oct 2002 | A1 |
20030078772 | Wu et al. | Apr 2003 | A1 |
20050071156 | Xu et al. | Mar 2005 | A1 |
20070073537 | Jang et al. | Mar 2007 | A1 |
Number | Date | Country |
---|---|---|
0505645 | Sep 1992 | EP |
1416473 | May 2004 | EP |
Number | Date | Country
---|---|---
20070185711 A1 | Aug 2007 | US |