1. Field of the Invention
The present invention relates to an audio processing system and, more particularly, to an audio processing system for eliminating noise.
2. Description of Related Art
Recently, with the rapid development of multimedia techniques, the functions of smart phones, such as video recording and voice recording, have become increasingly powerful, and the demand for recording voice or video has also greatly increased. However, when a user records voice in an actual application, additional noises from the surrounding environment, for example human voices in the background, may appear in the recording, so that the quality of the voice recording is low. In addition, because mobile phones are so widely used, users often carry out speech communication via mobile phones while they are moving. The quality of such speech communication may be low due to background noises, and this problem becomes more serious when the hands-free function of the mobile phone is used.
For example, it is very dangerous for a driver to use a mobile phone while driving a car, and thus the hands-free function becomes indispensable to the driver. However, the hands-free function is likely to be affected by many background noises, for example roadwork sounds and car horns, which may reduce the quality of the phone call or even distract the driver, resulting in traffic accidents.
Therefore, there is a need to provide an improved audio processing system, which can effectively suppress background noises and thus provide a better audio signal quality.
An object of the present invention is to provide an audio processing system for eliminating noise in audio signals, which comprises: an audio receiving module for receiving at least two audio signals; a sound source separation module for receiving a plurality of space features of the audio signals and obtaining a main sound source signal separated from the audio signals based on the space features; and a noise suppression module for processing the main sound source signal based on an averaged amplitude value of noise in the main sound source signal so as to suppress noise in the main sound source signal, wherein each audio signal of the at least two audio signals includes signals from a plurality of sound sources. Thus, the system can separate a plurality of sound sources from the audio signals, and process each separated sound source based on the noise level in that sound source to further suppress its noise.
Another object of the present invention is to provide an audio processing method performed on an audio processing system for eliminating noise in audio signals. The method comprises the steps of: (A) receiving at least two audio signals, each including signals from a plurality of sound sources; (B) receiving a plurality of space features of the audio signals, and separating a main sound source signal from the audio signals based on the space features; and (C) processing the main sound source signal based on an averaged amplitude value of noise in the main sound source signal so as to suppress noise in the main sound source signal. Thus, the system executes the method to separate a plurality of sound sources from the audio signals, and to process each separated sound source based on the noise level in that sound source, thereby further suppressing its noise.
Other objects, advantages, and novel features of the invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.
The audio receiving module 10 is used to receive audio signals from the outside. For example, the audio receiving module 10 receives audio signals through an external microphone, and transmits the received audio signals to other modules of the audio processing system 1 for further processing. More specifically, the audio receiving module 10 can receive audio signals through a plurality of microphones, and the microphones can be disposed at different positions for receiving audio signals, respectively. Thus, the audio receiving module 10 can receive a plurality of audio signals; i.e., a plurality of audio signals can be inputted to the audio processing system 1. In addition, the audio signal received by each microphone may include sound from a plurality of sound sources; for example, when a user drives a car and uses the hands-free function of a mobile phone, the microphone of the mobile phone may receive the voice of the user together with a plurality of background noises.
Then, step S62 is executed, in which the feature extracting module 22 is used to extract features from the audio signals signal_1(f) and signal_2(f), so as to obtain amplitude ratio information and phase difference information in each frequency band of the audio signals signal_1(f) and signal_2(f), and the amplitude ratio information and the phase difference information are then used as the space features. Subsequently, the feature extracting module 22 uses the K-Means algorithm to cluster the space features in each frequency band, so as to obtain a plurality of clusters with similar space features from the audio signals signal_1(f) and signal_2(f), wherein each cluster represents one sound source signal. In this embodiment, the audio signals signal_1 and signal_2 are formed by mixing three sound source signals v1, v2 and v3, and thus three clusters can be obtained.
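The feature extraction and clustering of step S62 can be sketched as follows. This is a minimal illustration and not the embodiment's implementation: the function names space_features and k_means are hypothetical, the two inputs are assumed to be lists of complex frequency-domain values for the same bins, and a deterministic farthest-point initialization is assumed for the K-Means centers.

```python
import cmath
import math


def space_features(spec_1, spec_2, eps=1e-12):
    """Return (amplitude ratio, phase difference) for each frequency bin.

    spec_1 and spec_2 are assumed to be equal-length lists of complex values.
    """
    features = []
    for a, b in zip(spec_1, spec_2):
        ratio = abs(a) / (abs(b) + eps)  # eps avoids division by zero
        # The phase of a * conj(b) is the phase difference, wrapped to [-pi, pi].
        phase_diff = cmath.phase(a * b.conjugate())
        features.append((ratio, phase_diff))
    return features


def k_means(points, k, iterations=20):
    """Plain Lloyd's K-Means on 2-D points; returns one cluster label per point."""
    # Assumption: farthest-point initialization, chosen to keep the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels
```

With three mixed sound sources, calling k_means(space_features(spec_1, spec_2), 3) would be expected to yield three labels, one per source, provided the sources are well separated in amplitude ratio and phase difference.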
Then, step S63 is executed, in which the mask module 23 is used to generate a binary time frequency mask based on the space features of the cluster of the main sound source signal. The binary time frequency mask is intersected with the space features in each frequency band of at least one of the audio signals to remove the clusters whose space features do not satisfy the mask, so as to retain the cluster of the main sound source, thereby forming the main sound source signal v1′. The feature extracting module 22 and the mask module 23 can analyze components of the space features, and determine the cluster of the main sound source based on a predetermined condition. For example, for a mobile phone, the predetermined condition for determining the cluster of the main sound source may be to find the cluster with a larger amplitude and a stable signal, to determine the cluster according to the distance between the sound source of the user and the mobile phone, or to allow the user to select the cluster of the main sound source from the space features of each cluster displayed by the audio processing system 1.
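The masking of step S63 can be sketched as follows, assuming each frequency bin already carries a cluster label from the clustering step. The names binary_mask, apply_mask and pick_main_cluster are hypothetical, and the selection rule shown (largest mean amplitude) is only one reading of the "bigger amplitude" condition; the stability and distance conditions are not modeled.

```python
def binary_mask(labels, main_label):
    """Return 1 where a bin belongs to the main sound source cluster, else 0."""
    return [1 if lab == main_label else 0 for lab in labels]


def apply_mask(spectrum, mask):
    """Keep the masked-in bins of one input spectrum and zero the rest."""
    return [x if m else 0j for x, m in zip(spectrum, mask)]


def pick_main_cluster(spectrum, labels):
    """Assumed rule: choose the cluster with the largest mean amplitude."""
    totals, counts = {}, {}
    for x, lab in zip(spectrum, labels):
        totals[lab] = totals.get(lab, 0.0) + abs(x)
        counts[lab] = counts.get(lab, 0) + 1
    return max(totals, key=lambda lab: totals[lab] / counts[lab])
```

Applying apply_mask to one of the input spectra with the mask of the selected cluster leaves only the bins attributed to the main sound source, which corresponds to the frequency domain signal v1′.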
Then, step S64 is executed, in which the frequency domain to time domain converting module 24 is used to convert the frequency domain main sound source signal v1′ to the time domain sound source signal v1, wherein the frequency domain to time domain converting module 24 and the time domain to frequency domain converting module 21 can be implemented in the same module. As a result, the audio processing system 1 can remove the background sound source signals v2 and v3.
Then, step S72 is executed, in which the rectification module 32 is used to set to zero every amplitude in the main sound source signal v1′ that is smaller than the noise amplitude average value, thereby obtaining a noise reduction signal v1″, wherein the noise reduction signal v1″ is expressed as
wherein S(e^jω) represents the noise reduction signal v1″, X(e^jω) represents the main sound source signal v1′, and Navg represents the noise amplitude average value. Thus, any amplitude in the main sound source signal v1′ that is smaller than the noise amplitude average value is lowered to zero.
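The rectification of step S72 can be sketched as follows. The expression itself is not reproduced in this text, so the sketch shows only one reading consistent with the wording: bins with an amplitude at or above Navg are assumed to pass unchanged, although a variant that first subtracts Navg from the amplitude would match the description equally well. The name rectify is hypothetical.

```python
def rectify(spectrum, noise_avg):
    """Zero every bin whose amplitude is below the noise amplitude average.

    Assumption: bins at or above noise_avg are returned unchanged.
    """
    return [x if abs(x) >= noise_avg else 0j for x in spectrum]
```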
Because the noise suppressed in step S72 is noise with an amplitude smaller than the noise average value, some residual noise with amplitudes larger than the noise average value still remains. Therefore, step S73 is executed, in which the remained noise eliminating module 33 determines whether the amplitude in each frequency band of the noise reduction signal v1″ is smaller than the maximum amplitude value Nmax of the noise, wherein the maximum amplitude value Nmax is the maximum amplitude value within a 0.3-second period at the beginning of the time domain main sound source signal v1. If the amplitude in a frequency band is smaller than the maximum amplitude value Nmax, that amplitude in the noise reduction signal v1″ is replaced with the minimum of three amplitudes: the amplitude at the frequency in question and the amplitudes at the two frequencies adjacent thereto. Thus, the noises with higher amplitude can be eliminated while the continuity of real speech is kept, wherein the aforementioned operation can be expressed as:
wherein S(e^jω)′ represents the noise reduction signal without remained noise, and Nmax represents the maximum amplitude value of the noise.
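The residual noise elimination of step S73 can be sketched as follows, working on a list of per-band amplitudes. The name remove_residual_noise is hypothetical, and the handling of the first and last bands (which have only one neighbor) is an assumption, since the text does not address the edges.

```python
def remove_residual_noise(amplitudes, noise_max):
    """Replace each amplitude below noise_max with the minimum of itself
    and its adjacent bands; amplitudes at or above noise_max are kept."""
    out = list(amplitudes)
    last = len(amplitudes) - 1
    for k, a in enumerate(amplitudes):
        if a < noise_max:
            # Assumption: edge bands use only the neighbor that exists.
            lo, hi = max(k - 1, 0), min(k + 1, last)
            out[k] = min(amplitudes[lo:hi + 1])
    return out
```

For example, with noise_max set to 1.0, an isolated peak of 0.8 surrounded by zeros is removed, whereas a value of 0.9 lying between two strong speech bands is kept because its neighbors are larger.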
In addition, because real speech in an audio signal may be discontinuous, for example there usually being some conversation pauses in a phone call, the user may hear some un-removed noises in the conversation pauses. Thus, a mechanism is required to determine whether actual speech exists and to perform another noise eliminating method for any frequency band in which no speech exists. Accordingly, step S74 is further executed, in which the speech existence determining module 45 is used to determine whether an amplitude ratio of the noise reduction signal v1″ to the noise average value Navg is smaller than a predetermined value T. If the amplitude ratio is smaller than the predetermined value T, it indicates that there is no actual speech in the frequency band and thus the speech existence determining module 45 attenuates the main sound source signal corresponding to the frequency band, wherein the attenuation is preferably 30 dB and the predetermined value T is preferably 12 dB. Thus, noise in the noise reduction signal v1″ can be further suppressed to provide excellent speech quality.
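The speech existence determination of step S74 can be sketched as follows. The name attenuate_non_speech is hypothetical, the amplitude ratio is assumed to be converted to decibels as 20·log10(amplitude / Navg), and the defaults follow the preferred values stated above (T of 12 dB and attenuation of 30 dB).

```python
import math


def attenuate_non_speech(amplitudes, noise_avg, threshold_db=12.0, attenuation_db=30.0):
    """Attenuate every band whose amplitude ratio to noise_avg is below threshold_db."""
    gain = 10.0 ** (-attenuation_db / 20.0)  # 30 dB corresponds to about 0.0316
    out = []
    for a in amplitudes:
        # Assumption: a zero amplitude is treated as a ratio of minus infinity dB.
        ratio_db = 20.0 * math.log10(a / noise_avg) if a > 0.0 else float("-inf")
        out.append(a * gain if ratio_db < threshold_db else a)
    return out
```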
Furthermore, when step S72 is executed, some discontinuities may be introduced because each frequency band is processed separately. Therefore, an averaging operation can be performed on the amplitude of the main sound source signal v1′ and the amplitudes adjacent thereto, so as to reduce the errors in the frequency spectrum, wherein the operation can be expressed as:
wherein k represents the current frequency band to be calculated, Xk(e^jω) represents the main sound source signal v1′, M is the number of adjacent frequency bands, and Xavg(e^jω) represents the main sound source signal with reduced errors in the frequency spectrum. Thus, the main sound source signal of steps S71 to S73 can be replaced by the main sound source signal with reduced errors in the frequency spectrum, thereby reducing the errors in time/frequency domain conversion.
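The averaging operation can be sketched as follows. Because the expression is not reproduced in this text, the window shape is an assumption: the sketch averages band k with up to M bands on each side and shortens the window at the edges of the spectrum. The name smooth_amplitudes is hypothetical.

```python
def smooth_amplitudes(amplitudes, m=1):
    """Average each band with up to m neighboring bands on each side.

    Assumption: a centered window that is clipped at the spectrum edges.
    """
    out = []
    for k in range(len(amplitudes)):
        window = amplitudes[max(k - m, 0):k + m + 1]
        out.append(sum(window) / len(window))
    return out
```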
In addition, those skilled in the art will understand that the sequence of executing steps S72 to S74 can be varied or some of the steps can be omitted, and will appreciate how the result obtained differs accordingly.
In view of the foregoing, it is known that, in the present invention, the sound source separation module 20 of the audio processing system 1 can be employed to remove the background voices and obtain the signal of the main sound source, and the noise suppression module 30 of the audio processing system 1 can be employed to suppress the noise in the main sound source. For example, when a user drives a car and uses the hands-free function of a mobile phone with the audio processing system 1 in accordance with the present invention, the sound source separation module 20 can first remove background voices other than the main speech, and the noise suppression module 30 can further suppress the noise in the main speech, so as to significantly improve the quality of the phone call.
Although the present invention has been explained in relation to its preferred embodiment, it is to be understood that many other possible modifications and variations can be made without departing from the spirit and scope of the invention as hereinafter claimed.
Number | Date | Country | Kind |
---|---|---|---|
104112050 | Apr 2015 | TW | national |