Certain embodiments of the invention relate to speech communication. More specifically, certain embodiments of the invention relate to a method and system for improving speech quality.
As competition in the mobile device business has increased, manufacturers of mobile devices may have found themselves struggling to differentiate their respective products. Although mobile device styling may have been the preferred way of attracting consumers, manufacturers are increasingly turning to additional features to increase market share. For example, many cellular telephones run familiar applications such as email applications, calendars, and other personal information management type software. Some may also include speakerphone capabilities, which may enable, for example, a cellular telephone to be utilized as a conference call phone. In addition, some cellular telephones may include hardware and software to support hands-free capability. For example, the phone may be capable of working with a Bluetooth headset, which may free up the hands of the user.
To improve speech quality, some cellular telephones may include a wind noise filter. Such a filter may be needed when the user of a cellular phone is, for example, operating the phone under windy conditions. This may be particularly useful when the speakerphone and hands-free capabilities described above are utilized. Wind noise filters may attenuate the effects of the wind noise by, for example, dynamically activating a filter that may attenuate those frequencies commonly associated with wind noise, such as frequencies below 800 Hz.
In the process, however, application of a wind noise filter may attenuate necessary speech components because the filter may not be capable of discerning between normal speech and wind noise in those frequency regions. The result of this may be that a listener may have difficulty understanding the speaker. This problem may be exacerbated because the wind noise filter may be turning on and off frequently, thus resulting in a less than pleasing communication experience.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
A system and/or method is provided for improving speech quality, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Certain embodiments of the invention may be found in a method and system for improving speech quality. The method may include estimating at least one component of a distorted portion of a speech signal from at least one component of an undistorted portion of the speech signal and reinforcing the component of the distorted portion based on the estimating. The components may include the pitch, spectral envelope and spectral energy of the speech signal. The method may also include delaying the undistorted portion of the speech signal and interpolating the components of the distorted portion of the speech signal from the components of a delayed undistorted portion and a current undistorted portion of the speech signal. The components of the distorted portion of the speech signal may be extrapolated from a current undistorted portion of the speech signal. The method may also include estimating the components of the distorted portion of the speech signal from frequency bands other than the frequency band affected by the distortion.
The buffer 405 may comprise suitable logic, circuitry, and/or code that may enable the storage of pitch and spectral envelope samples of the input signal. In this regard, the buffer 405 may be capable of storing, for example, 10 ms, 15 ms, or 40 ms worth of samples. The samples may be utilized by the signal reconstructor 406 to reconstruct those parts of the input signal affected by wind noise 101.
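As an illustration only, a buffer of this kind might be sketched as a fixed-length history of per-frame parameters. The class name `ParameterBuffer`, the 5 ms frame interval, and the tuple layout are assumptions made for this sketch and are not taken from the description.

```python
from collections import deque


class ParameterBuffer:
    """Fixed-length history of per-frame speech parameters (hypothetical helper)."""

    def __init__(self, history_ms=40, frame_ms=5):
        # Number of frames needed to cover history_ms of signal (assumed framing).
        self.frames = deque(maxlen=history_ms // frame_ms)

    def push(self, pitch_hz, envelope):
        # Once the buffer is full, the oldest frame is dropped automatically.
        self.frames.append((pitch_hz, tuple(envelope)))

    def latest(self):
        return self.frames[-1]

    def oldest(self):
        return self.frames[0]
```

With the assumed defaults, the buffer holds eight frames, so pushing a ninth frame discards the first.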
The wind detector 403 may comprise suitable logic, circuitry, and/or code that may enable detection of wind noise 101 interference produced at a microphone. It may be shown that wind noise 101 may occur in the lower end of the audible frequency spectrum. For example, the wind noise 101 may be present in frequencies below 800 Hz. In this regard, the wind noise 101 may distort those voice signal frequencies below 800 Hz. The wind detector 403 may detect the presence of wind noise 101 by observing sudden changes to the audio spectrum below 800 Hz. For example, it may be shown that changes in the voice spectrum may occur at frequencies above 800 Hz as well as below 800 Hz. By observing a situation where the lower part of the spectrum changes without the upper part of the spectrum changing, the wind detector 403 may detect the presence of wind noise 101 in the voice spectrum.
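The detection rule described above (the lower band changes while the upper band does not) might be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the 8 kHz sample rate, the 6 dB and 3 dB thresholds, and the use of a direct DFT are all choices made for illustration, not details given in the description.

```python
import cmath
import math


def band_energy(frame, fs, f_lo, f_hi):
    """Sum of |X[k]|^2 over DFT bins whose frequency lies in [f_lo, f_hi)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                     for t in range(n))
            total += abs(xk) ** 2
    return total


def wind_suspected(prev_frame, cur_frame, fs=8000, split_hz=800.0,
                   jump_db=6.0, steady_db=3.0):
    """True when the band below split_hz changes sharply while the band above stays steady."""
    eps = 1e-12  # avoids log of zero for silent bands

    def change_db(f_lo, f_hi):
        before = band_energy(prev_frame, fs, f_lo, f_hi) + eps
        after = band_energy(cur_frame, fs, f_lo, f_hi) + eps
        return abs(10.0 * math.log10(after / before))

    low_change = change_db(0.0, split_hz)
    # Upper bound is just past Nyquist so the last bin is included.
    high_change = change_db(split_hz, fs / 2 + 1)
    return low_change > jump_db and high_change < steady_db
```

A uniform level change, such as the talker speaking louder, moves both bands together and therefore does not trigger this rule.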
The high pass filter 400 may comprise suitable logic, circuitry, and/or code that may enable the removal of noise associated with wind noise 101. As described above, wind noise 101 may be predominately present in the lower part of the audio spectrum. For example, it may occur at frequencies below 800 Hz. In this case, the high pass filter 400 may attenuate those frequencies below 800 Hz and allow frequencies above 800 Hz to pass without attenuation.
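For illustration, a high pass filter with an 800 Hz cutoff might be sketched as a first-order recursive filter. The first-order design, the function name `high_pass`, and the 8 kHz sample rate are assumptions of this sketch. A first-order filter only approximates the behavior described above, since it attenuates gradually around the cutoff rather than passing everything above 800 Hz untouched.

```python
import math


def high_pass(samples, fs=8000, cutoff_hz=800.0):
    """First-order high pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A steeper filter would separate the bands more cleanly, at the cost of more delay and computation.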
The correlator 401 may comprise suitable logic, circuitry, and/or code that may enable the detection of the pitch of the input signal. In this regard, the correlator 401 may detect the pitch, as shown in
where x_n is the input signal. The pitch samples detected may be stored to the buffer 405.
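The exact correlation used by the correlator 401 is not reproduced here. As one common possibility, pitch may be estimated from the peak of the autocorrelation of the input signal. The sketch below assumes that approach; the function name `detect_pitch`, the 8 kHz sample rate, and the 60 Hz to 400 Hz search range are illustrative assumptions.

```python
def detect_pitch(frame, fs=8000, f_min=60.0, f_max=400.0):
    """Estimate pitch in Hz from the largest autocorrelation value in the lag range."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    # A silent or unvoiced frame yields no positive peak; report 0.0 in that case.
    return fs / best_lag if best_lag else 0.0
```

Because the lag is an integer number of samples, the resolution of the estimate is limited by the sample rate.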
The linear predictor 402 may comprise suitable logic, circuitry, and/or code that may enable detection of the spectral envelope of the input signal. The linear predictor may estimate future samples as a linear function of previous samples. In this regard, the function performed by the linear predictor 402 may be represented by the following equation:
\[ \hat{s}_n = \sum_{i=1}^{p} a_i \, s_{n-i} \]
where ŝ_n is the predicted sample, s_{n-i} is the previous observed sample, a_i are the predictor coefficients, and p is the number of previous samples used. The transfer function H(z) of this function may correspond to the spectral envelope shown in
The linear predictor may utilize the above functions to compute the spectral envelope of a time slice of a signal and may then store the spectral envelope to the buffer 405. In this regard, the time slices of the spectral envelope may be represented by the spectrogram described in
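As a minimal sketch, predictor coefficients of the standard form ŝ[n] = a[1]·s[n-1] + ... + a[p]·s[n-p] may be computed from a time slice with the Levinson-Durbin recursion. The description does not state which solution method is used, so the recursion, the function names, and the predictor order are assumptions of this example.

```python
def autocorr(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one time slice."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Return [a_1, ..., a_order] such that s_hat[n] = sum(a_i * s[n-i])."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break            # silent or perfectly predicted slice
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err        # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def predict(history, coeffs):
    """Predict the next sample from the most recent len(coeffs) samples."""
    return sum(coeffs[i] * history[-1 - i] for i in range(len(coeffs)))
```

Under the usual all-pole model, the spectral envelope follows from the coefficients as the magnitude of 1 / (1 - Σ a_i z^-i) evaluated on the unit circle.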
The signal reconstructor 406 may comprise suitable logic, circuitry, and/or code that may enable the interpolation and reconstruction of the signal when the wind filter is enabled. In this regard, the signal reconstructor 406 may be activated when the processor 404 has, for example, detected wind noise 101 above a certain threshold or when there has been an abrupt change in the pitch, spectral envelope or spectral energy of the input signal. In this case, the signal reconstructor 406 may utilize samples of the pitch information that occurred before and after the signal in question as well as samples of the spectral envelope of the signal before and after the detection to interpolate for the effects of the wind noise 101.
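The interpolation between parameters observed before and after the affected interval might be sketched as a linear blend. Linear weighting and the function name `interpolate_params` are assumptions of this sketch; the parameter vectors may hold, for example, a pitch value followed by envelope samples.

```python
def interpolate_params(before, after, n_missing):
    """Linearly interpolate each parameter across n_missing distorted frames."""
    frames = []
    for i in range(1, n_missing + 1):
        w = i / (n_missing + 1)  # weight moves from 'before' toward 'after'
        frames.append([(1.0 - w) * b + w * a for b, a in zip(before, after)])
    return frames
```

If predictor coefficients are interpolated directly, the blended filter is not guaranteed to be stable, so a practical system might interpolate a different representation of the envelope.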
At step 502, the estimate of the signal energy may be computed as a function of time and/or frequency. This result may be stored to the buffer 405. At step 503, the random noise-like component of the speech signal may be computed, for example, every 5 ms and this may be stored to the buffer 405 as well. At step 504, a determination may be made as to whether there has been an abrupt change in the pitch, spectral envelope or spectral energy of the input signal. This may occur, for example, when the high pass filter 400 has been activated. If no change in, for example, the pitch, spectral envelope or spectral energy is detected, the process may go back to step 500 and repeat. If a change in, for example, the pitch, spectral envelope or spectral energy has been detected, then at step 505, a determination may be made as to whether all or part of the speech signal is affected by the wind noise 101. This may be accomplished, for example, by comparing the spectral envelopes 201 and 204 of the signal before and after the abrupt change.
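The determination at step 504 might be sketched as a comparison of consecutive parameter sets against thresholds. The dictionary keys, the threshold values, and the function names below are hypothetical and chosen only to make the example concrete.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple energy estimate)."""
    return sum(s * s for s in frame) / len(frame)


def envelope_distance_db(env_a, env_b):
    """RMS difference between two log-magnitude envelopes given in dB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a, env_b)) / len(env_a))


def abrupt_change(prev, cur, pitch_tol=0.2, energy_tol_db=6.0, env_tol_db=6.0):
    """True when pitch, energy, or envelope differs sharply between two frames."""
    eps = 1e-12
    pitch_jump = abs(cur["pitch"] - prev["pitch"]) > pitch_tol * max(prev["pitch"], eps)
    energy_ratio = (cur["energy"] + eps) / (prev["energy"] + eps)
    energy_jump = abs(10.0 * math.log10(energy_ratio)) > energy_tol_db
    env_jump = envelope_distance_db(prev["envelope_db"], cur["envelope_db"]) > env_tol_db
    return pitch_jump or energy_jump or env_jump
```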
If only part of the spectrum is affected, then at step 506 a determination may be made as to whether the system has look ahead delay. That is, whether past and future samples of the speech signal are stored in the buffer 405. If look ahead delay is supported, then at step 508, the reconstructor 406 may compensate for the effects of the wind noise 101 by utilizing the information from the unaffected bands as well as the parameters stored in the buffer 405 representing past and/or future parameters of the speech signal that were not affected by the wind noise 101. For example, the pitch, spectral envelope, and signal energy estimates stored in the buffer 405, along with information about the unaffected portion of the speech signal may be utilized to reconstruct the pitch, formants, and spectral envelope of the affected area of the signal. Alternatively, the signal may be compensated by interpolating the frequency spectrum between past and future speech samples or by utilizing an interpolative packet loss concealment method, which may be utilized to mask the effects of lost or discarded packets. In other words, rather than correct the distorted portion of the speech, the previous undistorted portion of the speech may, for example, be repeated.
Referring back to step 506, if look ahead delay is not supported, then at step 509, the reconstructor 406 may compensate for the effects of the wind noise 101 by utilizing the information from the unaffected bands as well as the parameters stored in the buffer 405 representing past parameters of the speech signal that were not affected by the wind noise 101. In this regard, it may be necessary to decay the signal level gracefully. Alternatively, the signal may be compensated by utilizing an interpolative packet loss concealment method as described above.
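When only past parameters are available, one way to decay the signal level gracefully is to repeat the last unaffected parameters while reducing the gain each frame. The sketch below assumes a geometric decay; the function name, the dictionary keys, and the decay factor are illustrative assumptions.

```python
def extrapolate_with_decay(last_good, n_frames, decay_per_frame=0.8):
    """Repeat the last unaffected parameters while the gain decays geometrically."""
    out = []
    gain = last_good["gain"]
    for _ in range(n_frames):
        gain *= decay_per_frame
        out.append({
            "pitch": last_good["pitch"],
            "envelope": list(last_good["envelope"]),
            "gain": gain,
        })
    return out
```

The decay limits how long stale parameters are heard if the distortion persists.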
Referring back to step 505, if the entire spectrum is affected, then at step 507, a determination may be made as to whether the system has look ahead delay. If look ahead delay is supported, then at step 510, the reconstructor 406 may compensate for the effects of the wind noise 101 by utilizing the parameters stored in the buffer 405 representing past and future parameters of the speech signal that were not affected by the wind noise 101. For example, the pitch, spectral envelope, and signal energy estimates stored in the buffer 405 may be utilized to reconstruct the pitch, formants, and spectral envelope of the entire signal. Alternatively, the signal may be compensated by interpolating the frequency spectrum between past and future speech samples or by utilizing an interpolative packet loss concealment method as described above.
Referring back to step 507, if look ahead delay is not supported, then at step 511, the reconstructor 406 may compensate for the effects of the wind noise 101 by utilizing the parameters stored in the buffer 405 representing past parameters of the speech signal that were not affected by the wind noise 101. In this regard, it may be necessary to decay the signal level gracefully. Alternatively, the signal may be compensated by utilizing an interpolative packet loss concealment method as described above.
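The branching among steps 505 through 511 may be summarized in a short sketch. The function name `choose_compensation` and the source labels are hypothetical; the step numbers and the information each step draws on follow the description above.

```python
def choose_compensation(entire_spectrum_affected, has_look_ahead):
    """Return the step number and the information sources that step may use."""
    sources = ["past"]
    if has_look_ahead:
        sources.append("future")
    if not entire_spectrum_affected:
        # Bands untouched by the wind noise can also guide the reconstruction.
        sources.append("unaffected_bands")
    step = {
        (False, True): 508,
        (False, False): 509,
        (True, True): 510,
        (True, False): 511,
    }[(entire_spectrum_affected, has_look_ahead)]
    return step, sources
```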
In another embodiment of the invention, the steps described herein may be performed in different domains. For example, the speech parameters may be characterized as a frequency domain representation, a prototype waveform representation, or a perceptual domain representation.
Another embodiment of the invention may provide a method for performing the steps as described herein for improving speech quality. For example, the system shown in
In accordance with another embodiment of the invention, a method for processing signals may comprise replacing a frequency component that matches a background noise estimate of a speech signal with an estimate derived from a signal that is characteristic of the background noise estimate. The background noise estimate of the speech signal may comprise a long-term background noise estimate. The signal that is characteristic of the background noise estimate may comprise a frequency component that is derived from a history of background noise estimates. In other words, the background noise estimate may be derived from prior background noise estimates. The signal background noise estimate of the speech signal may comprise comfort noise. One aspect of the invention may comprise detecting when at least a portion of the speech signal is distorted. Accordingly, based on the detection, replacement of the frequency component that matches a background noise estimate and/or reinforcement of one or more components of the distorted portion of the speech based on the estimating may occur.
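As a rough sketch of this embodiment, frequency components whose magnitude matches the current background noise estimate may be replaced with a value derived from a history of such estimates. The matching tolerance, the use of a per-bin mean over the history, and the function names are assumptions of this sketch.

```python
import math


def long_term_noise(history):
    """Per-bin mean of past background noise magnitude estimates."""
    n = len(history)
    return [sum(frame[k] for frame in history) / n
            for k in range(len(history[0]))]


def replace_noise_bins(spectrum, noise_estimate, history, tol_db=3.0):
    """Replace bins that match the noise estimate with the long-term estimate."""
    eps = 1e-12
    fill = long_term_noise(history)
    out = []
    for mag, noise, comfort in zip(spectrum, noise_estimate, fill):
        diff_db = abs(20.0 * math.log10((mag + eps) / (noise + eps)))
        # Bins within tol_db of the noise estimate are treated as noise.
        out.append(comfort if diff_db <= tol_db else mag)
    return out
```

Bins that stand well above the noise estimate are left untouched, so speech components pass through unchanged.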
Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.