The present invention relates generally to audio coding and, in particular, to a method and apparatus for coding a noise-suppressed audio signal.
Cellular telephones, speaker phones, and various other communication devices utilize background noise suppression to enhance the quality of a received signal. In particular, the presence of acoustic background noise can substantially degrade the performance of a speech communication system. The problem is exacerbated when a digital speech coder is used in the communication link, since such coders are tuned to specific characteristics of clean speech signals and handle noisy speech and background noise rather poorly.
A simplified block diagram of a basic noise suppression system 100 is shown in
Prior-art noise suppression circuitry 100 additionally includes analysis circuitry 107 and synthesis circuitry 108. These components tend to blend signal discontinuities associated with the dynamics of the noise suppression system. More specifically, as the input speech+noise frames are processed, the filter gain characteristics within channel gain generator 104 change from frame to frame, thus leaving the potential for abrupt changes in output signal content at frame boundaries. Therefore, it is necessary to blend adjacent frames together by adding a decreasing signal envelope from the current frame to an increasing signal envelope for the next frame. Such a technique can be described as “overlap windowing”, and is well known in the prior art. An example of an overlap window is given in equation 4.1.2.1-3 as described in Cellular System Remote unit-Base Station Compatibility Standard of the Electronic Industry Association/Telecommunications Industry Association Interim Standard 127 as:
where g(n) is the windowed, zero-padded input sequence, d(n,m) is the input signal, n is the sample index, m is the frame index, D is the overlap delay, L is the frame length, and M is the FFT length. Here, we are interested in the increasing signal envelope at the beginning of the frame (samples 0 to D−1), and the decreasing signal envelope near the end of the frame (samples L to D+L−1). The significance of these envelopes is that when the signal is reconstructed at the noise suppression output, the output signal with the increasing signal envelope at the beginning of the current frame will be added to the output signal with the decreasing envelope from the previous frame. As one skilled in the art would appreciate, the sum of the two envelopes (windows) yields the trigonometric identity:
$$\sin^2\!\left(\frac{\pi(n+0.5)}{2D}\right)+\cos^2\!\left(\frac{\pi(n+0.5)}{2D}\right)=1$$
Thus, the signal at the overlap portions of the noise suppression output will be reconstructed properly due to the sum of the overlapping windows having unity weight.
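The unity-weight property can be checked numerically. The following is a minimal sketch only: the window equation of the cited standard is not reproduced in this text, so the sketch assumes a sine-squared rising ramp over samples 0 to D−1, a flat top, a cosine-squared falling ramp over samples L to D+L−1, and zero padding up to M. The function name `overlap_window` and the values D=40, L=160, M=256 are illustrative assumptions, not values taken from the standard.

```python
import math


def overlap_window(D, L, M):
    """Build an M-point window (assumed shape): sin^2 ramp up, flat, cos^2 ramp down, zeros."""
    if D + L > M:
        raise ValueError("window (D + L samples) must fit in the FFT length M")
    w = [0.0] * M
    for n in range(D + L):
        if n < D:
            # increasing envelope at the beginning of the frame
            w[n] = math.sin(math.pi * (n + 0.5) / (2 * D)) ** 2
        elif n < L:
            # unweighted middle of the frame
            w[n] = 1.0
        else:
            # decreasing envelope near the end of the frame
            w[n] = math.cos(math.pi * (n - L + 0.5) / (2 * D)) ** 2
    return w


# Illustrative (assumed) sizes: 40-sample overlap, 160-sample frame, 256-point FFT.
D, L, M = 40, 160, 256
w = overlap_window(D, L, M)

# The tail of one frame (samples L..L+D-1) overlaps the head of the next (samples 0..D-1).
overlap_sums = [w[L + n] + w[n] for n in range(D)]
max_error = max(abs(s - 1.0) for s in overlap_sums)
print("largest deviation from unity in the overlap:", max_error)
```

Under these assumptions the deviation is at the level of floating-point rounding, which is the numerical counterpart of the identity above.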
While this method is effective in smoothing frame discontinuities, it also increases the delay through the noise suppression system. The samples for the next frame are not yet available for the addition process, so the addition of these samples to the overlap section of the current frame must be delayed until the next frame is processed. Thus, there is a tradeoff between performance and delay: longer smoothing intervals yield better performance but longer delays.
The delay problem is compounded when noise suppression is included as part of a speech coding system, as is the case with many wireless digital communications systems. In such systems, the speech coder also adds delay, typically in the form of what is known as linear predictive coding (LPC) “look-ahead” delay. This delay comprises additional buffering (via buffer 110) that is required to extend speech samples beyond the current frame for the purpose of estimating the short-term spectrum towards the end of the current frame. The reason is that the spectral parameters (or LP parameters) are interpolated over shorter time intervals (called sub-frames), and it is desirable for the current set of LP parameters to be representative of the center of the last sub-frame of the current frame. This, however, requires an LPC analysis buffer that extends beyond the frame currently being coded, which incurs delay. As is the case with noise suppression, there is a tradeoff between performance and delay.
Thus, for typical LPC analysis, analyzer 111 accesses buffer 110. As discussed above, speech samples beyond the current frame are included in the analysis buffer 110. The window that is applied to the current analysis buffer may be symmetric or non-symmetric based on the amount of look-ahead delay that is used and the length of analysis buffer circuitry 111. As is known in the art, autocorrelation analysis is applied, which is followed by a process to solve the autocorrelation “normal equations”, known as the Levinson-Durbin recursion. The result is a set of direct form LP coefficients (A(z)), which are used by the speech coder to represent the short-term spectral envelope.
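The autocorrelation analysis and Levinson-Durbin recursion mentioned above can be sketched as follows. This is an illustrative textbook form, not the analyzer of any particular speech coder: the function names, the sign convention A(z) = 1 + a[1]z⁻¹ + ... + a[p]z⁻ᵖ, and the example autocorrelation values are assumptions, and windowing of the analysis buffer is omitted for brevity.

```python
def autocorrelation(x, order):
    """Return autocorrelation lags r[0..order] of the (already windowed) buffer x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err), where a[0] == 1.0 and a[1..order] are the direct-form
    coefficients of A(z) = 1 + sum(a[i] * z**-i), and err is the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # update coefficients 1..i-1 from the previous stage, then append k
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Hypothetical autocorrelation of a first-order process with correlation 0.5.
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, 2)
print("A(z) coefficients:", a, "residual energy:", err)
```

For this example input the recursion recovers a single-tap predictor, with the second coefficient vanishing, which is what a first-order correlation structure would be expected to give.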
As is evident, the analysis window overlaps with the previous frame by 40 samples (or 5 ms). This overlap facilitates the inter-frame smoothing as discussed previously, which, after noise suppression is applied, produces a corresponding output from the noise suppression synthesis circuitry 303. Although a 40-sample overlap is used, other values (up to 160 samples) are possible. Here it can be seen how the overlapping of the frames contributes to the delay. In particular, for the given frame m, the corresponding noise suppression output frame represents samples that were received 5 ms earlier. This delay is denoted as Dns on the lower right of the diagram. The noise suppression output is then loaded directly into the LPC analysis buffer 304.
From
Supporting evidence for the first point can be found in
Because it is desirable in a two-way voice communications system to minimize round-trip delay while maximizing audio quality, there is a need for a method and apparatus for coding a noise-suppressed signal that can consolidate the noise suppression and LPC analysis delays into a smaller net delay while maintaining the same audio quality or, conversely, maintain a given delay while improving overall audio quality.
To address the above-mentioned need, a method and apparatus for coding a noise-suppressed audio signal is described herein. In accordance with the preferred embodiment of the present invention, an unfiltered frame portion from a second frame is blended together with a filtered frame portion from a first frame to produce a combined frame portion. The combined frame portion is then buffered along with the filtered frame for LPC analysis.
Since the unfiltered frame portion from a second frame is blended together with a filtered frame portion from a first frame, system delay is greatly reduced. More particularly, since the unfiltered frame portion for the next frame is immediately available for combining, the delay incurred by prior-art filtering is eliminated.
The present invention encompasses a method comprising the steps of filtering a first frame of data to produce a filtered first frame, combining a portion of the filtered first frame with an unfiltered portion of a second frame to produce a combined portion, and substituting the combined portion for the portion of the filtered first frame.
The present invention additionally encompasses a method for coding a noise-suppressed signal. The method comprises the steps of performing noise suppression on a first frame of data to produce a noise-suppressed first frame, overlapping and adding a portion of the noise-suppressed first frame with a non-noise-suppressed portion of a second frame to produce a combined portion, and substituting the combined portion for the portion of the noise-suppressed first frame. Linear predictive coding (LPC) is then performed on the noise-suppressed first frame containing the combined portion.
The present invention additionally encompasses an apparatus comprising a filter having a first frame of data as an input and outputting a filtered first frame. The apparatus additionally encompasses a signal combiner having a portion of the filtered first frame as an input and a portion of an unfiltered second frame as an input and outputting a combined portion, wherein the combined portion comprises an addition of the portion of the filtered first frame with the portion of the unfiltered second frame. Finally, the apparatus comprises a buffer storing the filtered first frame having the combined portion substituted for the portion of the filtered first frame.
Turning now to the drawings, wherein like numerals designate like components,
Since filter 510 performs filtering on frames as a whole, a filtered portion 2 of frame 503 is unavailable until the whole of frame 503 is filtered. Thus a filtered frame portion (2) for the next frame is unavailable for a period of time after the current frame has been filtered. However, this problem is alleviated in the preferred embodiment of the present invention since frame portion 2 (of frame 503) is not filtered prior to addition with frame portion 1 (of frame 501).
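The combining and substituting operation can be sketched as below. This is a minimal illustration under stated assumptions, not the circuitry of the preferred embodiment: the text says only that the portions are added, so the cosine-squared weighting of the filtered tail and sine-squared weighting of the unfiltered head are assumed here, as are the function name `blend_and_substitute`, the sizes L=160 and D=40, and the use of an identity "filter" as a stand-in for noise suppression.

```python
import math


def blend_and_substitute(filtered_first, unfiltered_second_head):
    """Cross-fade the tail of a filtered frame with the unfiltered head of the next frame.

    The last D samples of filtered_first are replaced by the combined portion,
    where D is the length of unfiltered_second_head. The weighting is an assumption.
    """
    D = len(unfiltered_second_head)
    if D == 0 or D > len(filtered_first):
        raise ValueError("overlap length must be between 1 and the frame length")
    start = len(filtered_first) - D
    out = list(filtered_first)
    for n in range(D):
        phase = math.pi * (n + 0.5) / (2 * D)
        fall = math.cos(phase) ** 2  # decreasing envelope for the filtered first frame
        rise = math.sin(phase) ** 2  # increasing envelope for the unfiltered second frame
        out[start + n] = fall * filtered_first[start + n] + rise * unfiltered_second_head[n]
    return out


# Illustrative (assumed) sizes and a synthetic test tone.
L, D = 160, 40
signal = [math.sin(0.1 * n) for n in range(L + D)]

first = signal[:L + D]           # identity "filter" stands in for noise suppression
second_head = signal[L:L + D]    # unfiltered portion of the second frame, available immediately
buffered = blend_and_substitute(first, second_head)
```

With an identity filter the two weighted portions sum back to the original samples, so the buffered frame matches the input; with a real noise suppressor the combined portion would instead move gradually from suppressed to unsuppressed samples across the overlap.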
A combined signal is produced by adding the outputs of the secondary analysis circuitry and the secondary synthesis circuitry. This combined signal is then loaded into the front of LPC analysis buffer 704. As one skilled in the art may now notice, the noise suppression delay Dns has been eliminated, and the look-ahead delay Dlpc has been increased from 40 samples (5 ms) to 80 samples (10 ms). This is important in the sense that, despite using a sub-optimal auxiliary signal in the LPC look-ahead, a symmetric LPC window 705 may be used to improve quality when compared to the prior art system in
A further embodiment of the present invention is illustrated in
Since the present invention utilizes a linear phase noise suppression circuit, the signals presented to the signal combiner 505 are generally phase aligned, which enables an input signal with relatively high SNR to be reconstructed very readily for use in the LPC analysis buffers. But in cases where noisy (i.e., lower SNR) signals are encountered, the preceding embodiments may suffer in that the auxiliary output signal comprises both noise-suppressed and non-noise-suppressed audio samples. In this case it is beneficial to employ the circuit given in
As shown in
As one skilled in the art may appreciate, other functions within the gain determiner are possible, including average gain, median gain, etc., without deviating from the scope of the present invention. Additionally, other noise suppression state variables may be used to assist in a variation of the gain determiner output. Furthermore, the preferred embodiment of the present invention has been described using an 8000 Hz sampling rate, a 20 ms frame length, a 5 ms sub-frame length, a 5 ms noise suppression delay, and a 5 ms look-ahead delay. It is obvious to one skilled in the art that other such parameters may be used without departing from the scope of the present invention.
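The sample counts implied by the stated parameters follow from simple arithmetic, sketched here for convenience. The helper name `ms_to_samples` and the dictionary keys are illustrative; only the numeric values (8000 Hz, 20 ms, 5 ms) come from the text.

```python
def ms_to_samples(duration_ms, sample_rate_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    scaled = duration_ms * sample_rate_hz
    if scaled % 1000 != 0:
        raise ValueError("duration does not map to a whole number of samples")
    return scaled // 1000


SAMPLE_RATE_HZ = 8000

# Durations stated for the preferred embodiment, in milliseconds.
params_ms = {
    "frame": 20,
    "sub_frame": 5,
    "noise_suppression_delay": 5,
    "look_ahead_delay": 5,
}

params_samples = {name: ms_to_samples(ms, SAMPLE_RATE_HZ) for name, ms in params_ms.items()}
sub_frames_per_frame = params_samples["frame"] // params_samples["sub_frame"]
print(params_samples, "sub-frames per frame:", sub_frames_per_frame)
```

Changing the sampling rate or durations in this sketch shows how the buffer sizes would scale if other parameters were used.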
Continuing, at step 1105 an unfiltered portion of the second frame is combined with a filtered portion of the first frame to create combined frame portion 507. As discussed above, the combined frame portion blends signal discontinuities associated with the dynamics of the noise suppression system. More specifically, as the input speech+noise frames are processed, the filter gain characteristics within channel gain generator 104 change from frame to frame, thus leaving the potential for abrupt changes in output signal content at frame boundaries. In order to alleviate this problem, adjacent frames are blended together by adding portions of each frame.
At step 1107 the combined frame portion is output to buffer 110 along with the filtered first frame. In the preferred embodiment of the present invention the filtered portion of the first frame is replaced by the combined frame portion. At step 1109 LPC analysis circuitry 111 performs LPC analysis on the filtered first frame containing the combined frame portion.
While the invention has been particularly shown and described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. For example, while the preferred embodiment has specified the use of a noise suppressor with a speech coder that utilizes LPC analysis, certain generic preprocessor and coding methods exist which also use overlap-and-add systems coupled to spectral analysis. Furthermore, any type of signal analysis (not limited to spectral analysis) can be employed, if that analysis allows the extended signal from the preprocessor to be discarded once the true signal becomes available. It is intended that such changes come within the scope of the following claims.
Number | Date | Country |
---|---|---|
20040083095 A1 | Apr 2004 | US |