1. Technical Field
The present disclosure relates to the field of signal processing. In particular, it relates to a system and method for noise estimation with music detection.
2. Related Art
Audio signal processing systems such as telephony terminals/handsets use signal processing methods (such as noise reduction, echo cancellation, automatic gain control and bandwidth extension/compression) to improve the transmitted speech quality. These components can be viewed as a chain of audio processing modules in an audio processing subsystem.
These signal processing methods rely on a noise modeling method that continually tries to accurately model the environmental noise in an input signal received from, for example, a microphone. The resulting noise model, or noise estimate, is used to control various feature detectors such as speech detectors, signal-to-noise calculators and other mechanisms. These feature detectors directly affect the signal processing methods (noise suppression, echo cancellation, etc.) and thus directly affect the transmitted signal quality.
Noise modeling methods in audio signal processing systems typically assume that the background noise does not contain significant speech-like content or structure. As a result, when reasonably loud music (which does contain speech-like components) is present in the environment, these algorithms act unpredictably, potentially causing drastic decreases in transmitted signal quality.
The system may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included with this description, be within the scope of the invention, and be protected by the following claims.
The system and method for noise estimation with music detection described herein provide for generating a music classification for music content in an audio signal. A music detector may classify the audio signal as music or non-music; the non-music signal may be considered to consist of signal and noise. An adaption rate may be adjusted responsive to the generated music classification, and a noise estimate is calculated by applying the adjusted adaption rate. The system and method described herein thereby adapt the noise estimate quickly when the noise content changes, while mitigating adaption of the noise estimate in response to the presence of music. Unlike typical noise estimation methods, the system and method for noise estimation with music detection described herein may not attempt to model the music component; instead, they may reduce the extent to which the noise modeling algorithms are misled by the music components.
The signal quality of many audio signal-processing methods may depend on the accuracy of a noise estimate. For example, a signal-to-noise ratio may be calculated as the magnitude of an input audio signal divided by the noise level. The noise level is typically estimated because the exact noise characteristics are unknown. Errors in the estimated noise level, or noise estimate, may therefore propagate into the signal-to-noise calculation, which in turn may be used by many audio signal-processing methods.
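To illustrate how an error in the noise estimate propagates into the signal-to-noise calculation, the following minimal Python sketch computes an SNR in decibels from a signal magnitude and an estimated noise level. The function name `snr_db`, the 20·log10 magnitude convention, and the small floor value are assumptions for illustration rather than details from the disclosure.

```python
import math


def snr_db(signal_magnitude: float, noise_estimate: float, floor: float = 1e-12) -> float:
    """Return the signal-to-noise ratio in dB.

    The ratio is the signal magnitude divided by the estimated noise level
    (both treated as magnitudes, hence 20*log10). A small floor avoids
    division by zero and log(0). Illustrative sketch only.
    """
    ratio = max(signal_magnitude, floor) / max(noise_estimate, floor)
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    magnitude = 1.0
    true_noise = 0.1
    # Accurate noise estimate.
    print(f"{snr_db(magnitude, true_noise):.1f} dB")
    # Noise overestimated by a factor of two (e.g. music leaking into the
    # noise model): the SNR drops by about 6 dB.
    print(f"{snr_db(magnitude, 2.0 * true_noise):.1f} dB")
```

Running it prints 20.0 dB and then 14.0 dB: doubling the noise estimate lowers the SNR by 20·log10(2), about 6 dB, and any SNR-driven feature detector downstream inherits that error.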
Noise modeling methods in speech systems typically assume that the noise estimate does not contain significant speech-like content or structure. An example noise modeling method that keeps speech-like content out of the noise estimate may classify the current audio input signal as speech or noise. When the current audio signal is classified as noise, the noise estimate is updated with a processed version of the current audio signal. In practice, noise modeling methods are typically more complicated. For example, in one implementation, the background noise level estimate is calculated using the background noise estimation techniques disclosed in U.S. Pat. No. 7,844,453, which is incorporated herein by reference, except that in the event of any inconsistent disclosure or definition from the present specification, the disclosure or definition herein shall be deemed to prevail. In other implementations, alternative background noise estimation techniques may be used, such as a noise power estimation technique based on minimum statistics.
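The simple classify-then-update scheme described above might be sketched as follows, assuming a per-subband power representation and an external speech/noise decision passed in as `is_noise`. The first-order recursive average, the smoothing factor `alpha`, and the function name are illustrative assumptions; this sketch does not reproduce the technique of U.S. Pat. No. 7,844,453 or a minimum-statistics estimator.

```python
from typing import List


def update_noise_estimate(noise: List[float], frame_power: List[float],
                          is_noise: bool, alpha: float = 0.9) -> List[float]:
    """Return an updated per-subband noise estimate.

    If the current frame was classified as noise, each subband estimate is
    smoothed toward the frame power with a first-order recursive average
    (alpha close to 1 means slow adaption). If the frame was classified as
    speech, the estimate is returned unchanged so speech does not leak into
    the noise model.
    """
    if len(noise) != len(frame_power):
        raise ValueError("noise and frame_power must have the same length")
    if not is_noise:
        return list(noise)
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, frame_power)]


if __name__ == "__main__":
    noise = [1.0, 2.0]
    # Frame classified as noise: with alpha=0.5 the estimate moves halfway.
    print(update_noise_estimate(noise, [2.0, 2.0], is_noise=True, alpha=0.5))
    # Frame classified as speech: the estimate is frozen.
    print(update_noise_estimate(noise, [9.0, 9.0], is_noise=False))
```

The weakness this illustrates is the one the disclosure addresses: if loud music is misclassified as noise, `is_noise` is true and the music is averaged straight into the estimate.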
Noise modeling methods in audio signal processing systems may handle environmental noise as well as speech and noise in the audio signal. Music may be considered another form of environmental noise; as such, when reasonably loud music (which does contain speech-like components) is present in the environment, the noise modeling methods act unpredictably, causing potentially drastic decreases in transmitted signal quality.
Described herein are a system and method for noise estimation with music detection. This document describes an audio signal processing system with a noise estimator and a music detector that can model environmental noise both in the presence of music and when no music is present, to produce a noise estimate. The system and method for noise estimation with music detection may be applied to, for example, telephony use cases where there is speech in a noisy environment or where there is speech and music (also referred to as media) in a noisy environment. The first use case is referred to as (signal+noise) and the second use case as (signal+music+noise). It may be desirable to remove the noise component regardless of whether music is present. Typical audio processing systems may not be able to remove the noise component in the (signal+music+noise) use case without negatively impacting signal quality.
The music may be modeled as having a steady-state music component and a transient music component. Typical noise estimation techniques will attempt to model both the noise and the steady-state music component (noise+steady-state music). When the noise estimation models transient components, it may also attempt to model the transient music components. This will typically cause feature detectors and audio processing algorithms to fail, by over-attenuating or distorting speech, temporally clipping speech, or passing bursts of distorted music. The system and method for noise estimation with music detection may provide a conservative noise estimate such that noise is removed during the (signal+noise) case and noise, or a fraction of the noise, is removed during the (signal+music+noise) case. In the latter case, modeling only a fraction of the noise may be sufficient, because the music component often masks any residual noise that is passed.
The memory 110 may comprise any device for storing and retrieving data, or any combination of such devices. The memory 110 may include non-volatile and/or volatile memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a flash memory. The memory 110 may comprise a single device or multiple devices that may be disposed on one or more dedicated memory devices or on a processor or other similar device. Alternatively or in addition, the memory 110 may include an optical, magnetic (hard-drive) or any other form of data storage device.
The memory 110 may store computer code, such as a voice detector 114, a music detector 116, a rate adaptor 118, a noise estimator 120 and/or any other module. The computer code may include instructions executable with the processor 108. The computer code may be written in any computer language, such as C, C++, assembly language, channel program code, and/or any combination of computer languages. The memory 110 may store information in data structures such as the data storage 112 and one or more noise estimates 106. The I/O interface 122 may be used to connect devices such as microphones, as well as other components internal or external to the system.
An example music detector 116 may use algorithms that estimate the presence and amount of music content. One approach may use an autocorrelation-based periodicity detector that identifies periodic audio components, including tones and harmonics, that are typical of music content. This approach applies to both narrowband and wideband audio signals, so the autocorrelation-based periodicity detector may be preceded by several other components. For example, a “sloppy” downsampler without an anti-alias filter may be used to increase the computational efficiency of the autocorrelation, while the aliasing it allows increases the partial content in the downsampled band. An example “sloppy” downsampler may halve the sample rate by discarding every other sample or by mixing every other sample. Another example approach may comprise one or more filters to remove common periodic components (e.g., 60 Hz). The autocorrelation-based periodicity detector works well for certain types of music; for other types, other detectors that recognize musical content (such as beat detectors or other methods) may be used to indicate the presence of music components.
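The downsampler and periodicity measure described above can be sketched in plain Python as follows. The sketch assumes frames of floating-point samples, interprets "mixing every other sample" as averaging each adjacent pair (an assumption), and uses the peak of a normalized autocorrelation over a lag range as a hypothetical periodicity score; the function names and lag range are illustrative and not taken from the disclosure. The 60 Hz removal filter and beat detection are not shown.

```python
import math
from typing import List, Sequence


def sloppy_downsample(x: Sequence[float], mix: bool = False) -> List[float]:
    """Halve the sample rate without an anti-alias filter.

    mix=False discards every other sample. mix=True averages each adjacent
    pair of samples (one possible reading of "mixing"). In both cases
    content above the new Nyquist frequency is allowed to alias.
    """
    if mix:
        return [0.5 * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return list(x[::2])


def periodicity(x: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Peak normalized autocorrelation of x over lags min_lag..max_lag.

    Values near 1 suggest strongly periodic content (tones, harmonics);
    values near 0 suggest aperiodic, noise-like content.
    """
    mean = sum(x) / len(x)
    y = [v - mean for v in x]
    energy = sum(v * v for v in y)  # lag-0 autocorrelation
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(y) - 1) + 1):
        r = sum(y[n] * y[n + lag] for n in range(len(y) - lag))
        best = max(best, r / energy)
    return best


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(1024)]
    ds = sloppy_downsample(tone)  # 8 kHz after decimation
    # Lags 8..160 at 8 kHz cover periods of roughly 1 kHz down to 50 Hz.
    print(f"periodicity of 440 Hz tone: {periodicity(ds, 8, 160):.2f}")
```

A detector built this way would compare the score against a tuned threshold. Because the score is normalized by the lag-0 energy, it shrinks slightly at long lags, which is one reason a practical detector may refine or supplement it.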
A noise estimate 106 may be calculated using the adjusted adaption rate. The noise estimate calculation may be continuous, periodic or aperiodic. The noise estimator 120 may use the adaption rate 204 to calculate the new noise estimate 106. The adaption rate 204 may govern the noise estimator 120 over a range from no adaption of the noise estimate 106 when music is present through to full adaption when no music is present. Other embodiments comprise techniques that may allow the noise estimator 120 to adapt in the presence of music. The music detector 116 may be incorporated in the noise estimator 120 or may alternatively be a cooperating component separate from the noise estimator 120.
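A minimal sketch of this behaviour follows, assuming the music detector 116 reports a confidence between 0 (no music) and 1 (music) and that the noise estimator 120 uses a first-order update per subband. The linear confidence-to-rate mapping, the `full_rate` value, and both function names are hypothetical choices for illustration.

```python
from typing import List


def music_adaption_rate(music_confidence: float, full_rate: float = 0.1) -> float:
    """Map a music confidence in [0, 1] to an adaption rate.

    0.0 (no music) yields full adaption; 1.0 (music) yields no adaption;
    intermediate confidences scale the rate linearly.
    """
    c = min(max(music_confidence, 0.0), 1.0)
    return full_rate * (1.0 - c)


def adapt_noise(noise: List[float], frame_power: List[float], rate: float) -> List[float]:
    """Move each subband noise estimate toward the frame power by `rate`.

    rate=0 freezes the estimate; rate=1 replaces it with the frame power.
    """
    return [n + rate * (p - n) for n, p in zip(noise, frame_power)]


if __name__ == "__main__":
    noise = [1.0, 1.0]
    frame = [3.0, 5.0]
    for confidence in (0.0, 0.5, 1.0):
        rate = music_adaption_rate(confidence, full_rate=0.5)
        print(confidence, rate, adapt_noise(noise, frame, rate))
```

With full music confidence the rate is zero and the estimate stays at [1.0, 1.0], which corresponds to the "no adaption if music is present" end of the range.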
The rate adaptor 118 may combine the output of the music detector 116 with the outputs of other detectors that may contribute to setting the adaption rate 204. In one embodiment the rate adaptor 118 may set the adaption rate 204 for the noise estimator 120 based only on the output of the music detector 116. In a second embodiment the rate adaptor 118 may set the adaption rate 204 for the noise estimator 120 based on multiple detectors including the music detector 116 and the voice detector 114.
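The two embodiments might be sketched as a single adaptor whose voice-detector input is optional. Combining the detector outputs by multiplying their complements is an assumption made for this illustration, as are the class and method names; the disclosure does not specify how the rate adaptor 118 combines detectors.

```python
from dataclasses import dataclass
from typing import Optional


def _clamp01(value: float) -> float:
    """Limit a detector confidence to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


@dataclass
class RateAdaptor:
    """Hypothetical rate adaptor that turns detector outputs into a rate."""

    full_rate: float = 0.1

    def rate(self, music: float, voice: Optional[float] = None) -> float:
        """Return the adaption rate for the noise estimator.

        music and voice are detector confidences in [0, 1]. With voice=None
        only the music detector is used; otherwise both detectors must be
        quiet for full adaption, since each one scales the rate down.
        """
        gate = 1.0 - _clamp01(music)
        if voice is not None:
            gate *= 1.0 - _clamp01(voice)
        return self.full_rate * gate


if __name__ == "__main__":
    adaptor = RateAdaptor(full_rate=0.2)
    print(adaptor.rate(music=0.0))             # music detector only
    print(adaptor.rate(music=0.0, voice=0.5))  # music and voice detectors
```

A multiplicative gate keeps the rate at zero whenever either detector is fully confident, so neither music nor speech can pull the noise estimate upward on its own.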
A subband filter may process the received audio signal 102 to extract frequency information. The subband filter may be implemented by various methods, such as a Fast Fourier Transform (FFT), a critical-band filter bank, an octave filter bank, or a one-third-octave filter bank. Alternatively, the subband analysis may include a time-based filter bank. The time-based filter bank may be composed of a bank of overlapping bandpass filters whose center frequencies have non-linear spacing, such as octave, third-octave, bark, mel, or other spacing.
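The non-linear center-frequency spacings mentioned above can be illustrated with a short sketch. Octave and third-octave centers are anchored at an assumed 1 kHz reference, and the mel conversion uses the common 2595·log10(1 + f/700) formula; these parameter choices and the function names are illustrative assumptions. Bark spacing and the design of the bandpass filters around each center are not shown.

```python
import math
from typing import List


def octave_centers(fraction: int, f_min: float, f_max: float,
                   f_ref: float = 1000.0) -> List[float]:
    """Center frequencies spaced 1/fraction octave apart, anchored at f_ref.

    fraction=1 gives octave spacing; fraction=3 gives third-octave spacing.
    Only centers within [f_min, f_max] are returned.
    """
    k_min = math.ceil(fraction * math.log2(f_min / f_ref))
    k_max = math.floor(fraction * math.log2(f_max / f_ref))
    return [f_ref * 2.0 ** (k / fraction) for k in range(k_min, k_max + 1)]


def hz_to_mel(f: float) -> float:
    """Convert frequency in Hz to mels (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """n_bands center frequencies equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    print(octave_centers(1, 100.0, 10000.0))
    print([round(f, 1) for f in octave_centers(3, 900.0, 2100.0)])
    print([round(f) for f in mel_centers(4, 100.0, 4000.0)])
```

In each case the spacing in Hz widens with frequency, so a fixed number of bands gives fine resolution at low frequencies, where speech and musical pitch information is concentrated.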
The system and method for noise estimation with music detection described herein provides for generating a music classification for music content in an audio signal. The music detector may classify the audio signal as music or non-music. The non-music signal may be considered to be signal and noise. An adaption rate may be adjusted responsive to the generated music classification. A noise estimate is calculated applying the adjusted adaption rate.
All of the disclosure, regardless of the particular implementation described, is exemplary in nature, rather than limiting. The systems 100 and 200 may include more, fewer, or different components than illustrated in
The functions, acts or tasks illustrated in the figures or described may be executed in response to one or more sets of logic or instructions stored in or on computer readable media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, distributed processing, and/or any other type of processing. In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the logic or instructions may be stored within a given computer such as, for example, a CPU.
While various embodiments of the system and method for noise estimation with music detection have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the present invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application claims priority from U.S. Provisional Patent Application Ser. No. 61/599,767, filed Feb. 16, 2012, the entirety of which is incorporated herein by reference.
U.S. Patent Documents
Number | Name | Date | Kind |
---|---|---|---|
5778335 | Ubale et al. | Jul 1998 | A |
7844453 | Hetherington | Nov 2010 | B2 |
20030128851 | Furuta | Jul 2003 | A1 |
Foreign Patent Documents
Number | Date | Country |
---|---|---|
0 939 401 | Sep 1999 | EP |
WO 02091570 | Nov 2002 | WO |
WO 2008143569 | Nov 2008 | WO |
Other References
Entry |
---|
European Search Report for corresponding European Application EP 13 15 5352.1, dated Jan. 7, 2014, pp. 1-10. |
Jarina, Roman et al., “Rhythm Detection for Speech-Music Discrimination in MPEG Compressed Domain,” IEEE, Digital Signal Processing, 14th International Conference, vol. 1 (2002) pp. 129-132, U.S. |
Martin, Rainer, “Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics,” IEEE Transactions on Speech and Audio Processing, vol. 9, No. 5 (Jul. 2001) pp. 504-512, U.S. |
Thoshkahna, Balaji et al., “A Speech-Music Discriminator Using HILN Model Based Features,” IEEE Acoustics, Speech and Signal Processing, vol. 5 (2006) pp. V-425-V-428. |
Wang, Jun et al., “Codec-independent Sound Activity Detection Based on the Entropy with Adaptive Noise Update,” IEEE Signal Processing, ICSP Guide, 9th International Conference (2008) pp. 549-552, U.S. |
Examination Report corresponding to European Application No. 13 155 352.1, dated Aug. 7, 2015, 5 pages. |
Prior Publication Data
Number | Date | Country |
---|---|---|
20130226572 A1 | Aug 2013 | US |
Related U.S. Provisional Application
Number | Date | Country |
---|---|---|
61599767 | Feb 2012 | US |