1. Technical Field
The present disclosure relates to the field of signal processing and, in particular, to a system and method for dynamic residual noise shaping.
2. Related Art
A high frequency hissing sound is often heard in wideband microphone recordings. While the high frequency hissing sound, or hiss noise, may not be audible when the environment is loud, it becomes noticeable and even annoying when in a quiet environment, or when the recording is amplified. The hiss noise can be caused by a variety of sources, from poor electronic recording devices to background noise in the recording environment from air conditioning, computer fan, or even the lighting in the recording environment.
The system may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Disclosed herein are a system and method for dynamic residual noise shaping. Dynamic shaping of residual noise may include, for example, the reduction of hiss noise.
U.S. patent application Ser. No. 11/923,358 filed Oct. 24, 2007 and having common inventorship, the entirety of which is incorporated herein by reference, describes a system and method for dynamic noise reduction. This document discloses principles and techniques to automatically adjust the shape of high frequency residual noise.
In a classical additive noise model, a noisy audio signal is given by
y(t)=x(t)+n(t) (1)
where x(t) and n(t) denote a clean audio signal and a noise signal, respectively.
Let |Yi,k|, |Xi,k|, and |Ni,k| designate, respectively, the short-time spectral magnitudes of the noisy audio signal, the clean audio signal, and noise signal at the ith frame and the kth frequency bin. A noise reduction process involves the application of a suppression gain Gi,k to each short-time spectrum value. For the purpose of noise reduction the clean audio signal and the noise signal are both estimates because their exact relationship is unknown. As such, the spectral magnitude of an estimated clean audio signal is given by:
|X̂i,k|=Gi,k·|Yi,k| (2)
where Gi,k are the noise suppression gains. Various methods are known in the literature to calculate these gains. One example, further described below, is a recursive Wiener filter.
A typical problem with noise reduction methods is that they create audible artifacts, such as musical tones, in the resulting signal, the estimated clean audio signal |X̂i,k|. These audible artifacts are due to errors in signal estimates that cause further errors in the noise suppression gains. For example, the noise signal |Ni,k| can only be estimated. To mitigate or mask the audible artifacts, the noise suppression gains may be floored (e.g. limited or constrained):
Ĝi,k=max(σ,Gi,k) (3)
The parameter σ in (3) is a constant noise floor, which defines a maximum amount of noise attenuation in each frequency bin. For example, when σ is set to 0.3, the system will attenuate the noise by a maximum of approximately 10 dB at frequency bin k, since 20 log10(0.3)≈−10.5 dB. The limited noise suppression gains produced by the noise reduction process therefore correspond to an attenuation of between 0 dB and approximately 10 dB at each frequency bin k.
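The flooring in equation (3) and the gain application in equation (2) can be sketched in a few lines. This is a minimal illustration rather than the disclosed implementation; the function names are hypothetical, and the gains and magnitudes are assumed to be plain per-bin lists for one frame.

```python
import math


def floor_gains(gains, sigma):
    """Limit each suppression gain from below by sigma, as in equation (3)."""
    return [max(sigma, g) for g in gains]


def max_attenuation_db(sigma):
    """Largest attenuation, in dB, that a linear floor sigma permits."""
    return -20.0 * math.log10(sigma)


def apply_gains(gains, noisy_magnitudes):
    """Estimate clean magnitudes as gain times noisy magnitude, as in equation (2)."""
    return [g * y for g, y in zip(gains, noisy_magnitudes)]
```

With sigma set to 0.3, `max_attenuation_db` evaluates to roughly 10.46 dB, which is the "maximum of 10 dB" figure quoted above.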
The conventional noise reduction method based on the above noise suppression gain limiting applies the same maximum amount of noise attenuation to all frequencies. The constant noise floor in the noise suppression gain limiting may result in good performance for conventional noise reduction in narrowband communication. However, it is not ideal for reducing hiss noise in high fidelity audio recordings or wideband communications. In order to remove the hiss noise, a lower constant noise floor in the suppression gain limiting may be required but this approach may also impair low frequency voice or music quality. Hiss noise may be caused by, for example, background noise or audio hardware and software limitations within one or more signal processing devices. Any of the noise sources may contribute to residual noise and/or hiss noise.
Unlike conventional noise reduction methods that do not change the overall shape of background noise after processing, a dynamic residual noise shaping method may automatically detect hiss noise 106 and, once hiss noise 106 is detected, may apply a dynamic attenuation floor to adjust the high frequency noise shape so that the residual noise may sound more natural after processing. For lower frequencies or when no hiss noise is detected in an input signal (e.g. a recording), the method may apply noise reduction similar to the conventional noise reduction methods described above. Hiss noise as described herein comprises relatively higher frequency noise components of residual or background noise. Relatively higher frequency noise components may occur, for example, at frequencies above 500 Hz in narrowband applications, above 3 kHz in wideband applications, or above 5 kHz in fullband applications.
The frequency transformation of the audio signal 102 may be processed by a subband signal power module 204 to produce the spectral magnitude of the audio signal |Yi,k|. The subband signal power module 204 may also perform averaging of frequency bins over time and frequency. The averaging calculation may include simple averages, weighted averages or recursive filtering.
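The averaging options mentioned above can be illustrated as follows. This is a sketch under stated assumptions: the smoothing constant `alpha`, the window `half_width`, and both function names are hypothetical choices, not values taken from the disclosure.

```python
def smooth_over_time(previous, current, alpha=0.7):
    """First-order recursive average of per-bin magnitudes across frames."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]


def smooth_over_frequency(values, half_width=1):
    """Moving average across neighbouring bins; edge bins use the bins available."""
    smoothed = []
    for k in range(len(values)):
        lo = max(0, k - half_width)
        hi = min(len(values), k + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A larger `alpha` weights past frames more heavily and gives a steadier but slower-reacting magnitude estimate.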
A subband background noise power module 206 may calculate the spectral magnitude of the estimated background noise |N̂i,k| in the audio signal 102. The background noise estimate may include signal information from previously processed frames. In one implementation, the spectral magnitude of the background noise is calculated using the background noise estimation techniques disclosed in U.S. Pat. No. 7,844,453, which is incorporated in its entirety herein by reference, except that in the event of any inconsistent disclosure or definition from the present specification, the disclosure or definition herein shall be deemed to prevail. In other implementations, alternative background noise estimation techniques may be used, such as a noise power estimation technique based on minimum statistics.
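As a rough illustration of the minimum-statistics idea, the sketch below tracks, for each bin, the minimum of a recursively smoothed magnitude over a sliding window of recent frames. It is deliberately simplified: it omits the bias compensation used by full minimum-statistics estimators, it is not the technique of U.S. Pat. No. 7,844,453, and the class name, window length, and smoothing constant are all hypothetical.

```python
from collections import deque


class SlidingMinimumNoiseTracker:
    """Per-bin noise estimate: minimum of smoothed magnitudes over recent frames."""

    def __init__(self, num_bins, window_frames=50, alpha=0.9):
        self.alpha = alpha
        self.smoothed = [0.0] * num_bins
        # Only the most recent window_frames smoothed frames are retained.
        self.history = deque(maxlen=window_frames)
        self.started = False

    def update(self, magnitudes):
        """Feed one frame of magnitudes and return the current noise estimate."""
        if not self.started:
            # Seed the smoother with the first frame instead of zeros.
            self.smoothed = list(magnitudes)
            self.started = True
        else:
            self.smoothed = [self.alpha * s + (1.0 - self.alpha) * m
                             for s, m in zip(self.smoothed, magnitudes)]
        self.history.append(list(self.smoothed))
        return [min(frame[k] for frame in self.history)
                for k in range(len(self.smoothed))]
```

Because the estimate depends on the stored history, it carries information from previously processed frames, as the paragraph above describes.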
A noise reduction module 208 calculates suppression gains Gi,k using any of various methods known in the literature. An exemplary noise reduction method is a recursive Wiener filter. The Wiener suppression gain, or noise suppression gain, is defined as:
Gi,k=SN̂Rpriori,i,k/(SN̂Rpriori,i,k+1) (4)
where SN̂Rpriori,i,k is the a priori SNR estimate, calculated recursively by:
SN̂Rpriori,i,k=Gi−1,k·SN̂Rpost,i,k−1 (5)
and SN̂Rpost,i,k is the a posteriori SNR estimate given by:
SN̂Rpost,i,k=|Yi,k|²/|N̂i,k|² (6)
where |N̂i,k| is the background noise estimate.
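One way to realize a recursive Wiener gain update is sketched below. It assumes that the a posteriori SNR is the ratio of noisy power to estimated noise power, and that the a priori SNR is the previous frame's gain times the a posteriori SNR, minus one. The lower clamp `snr_floor` and the guard against a zero noise estimate are added assumptions to keep the gain positive and finite, and the function name is hypothetical.

```python
def recursive_wiener_gains(noisy_mag, noise_mag, previous_gains, snr_floor=1e-3):
    """Return per-bin Wiener gains for one frame from the previous frame's gains."""
    gains = []
    for y, n, g_prev in zip(noisy_mag, noise_mag, previous_gains):
        # A posteriori SNR: noisy power over estimated noise power.
        snr_post = (y * y) / max(n * n, 1e-12)
        # A priori SNR from the previous gain; clamped so the gain stays positive.
        snr_priori = max(g_prev * snr_post - 1.0, snr_floor)
        gains.append(snr_priori / (snr_priori + 1.0))
    return gains
```

For a bin where the noisy magnitude is twice the noise estimate and the previous gain was 1, the a posteriori SNR is 4, the a priori SNR is 3, and the gain is 0.75.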
A hiss detector module 210 estimates the amount of hiss noise in the audio signal. The hiss detector module 210 may indicate the presence of hiss noise 106 by analyzing any combination of the audio signal, the spectral magnitude of the audio signal |Yi,k|, and the background noise estimate |N̂i,k|. An exemplary hiss detector method utilized by the hiss detector module 210 may first convert the short-time power spectrum of a background noise estimate, or background noise level, into the dB domain by:
B(f)=20 log10|N(f)|. (7)
The background noise level may be estimated using a background noise level estimator. The dB power spectrum B(f) may be further smoothed in frequency to remove small dips or peaks in the spectrum. A pre-defined hiss cutoff frequency f0 may be chosen to divide the whole spectrum into a low frequency portion and a high frequency portion. The dynamic hiss noise reduction may be applied to the high frequency portion of the spectrum.
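The dB conversion of equation (7) and the split at the hiss cutoff frequency might look like the sketch below. It assumes a uniform DFT in which bin k sits at k times the sample rate divided by the transform size. The sample rate, transform size, `eps` guard, and function names are hypothetical and do not come from the disclosure.

```python
import math


def noise_level_db(noise_magnitudes, eps=1e-12):
    """B = 20*log10|N| per bin, as in equation (7); eps guards against log of zero."""
    return [20.0 * math.log10(max(n, eps)) for n in noise_magnitudes]


def cutoff_bin(cutoff_hz, sample_rate_hz, fft_size):
    """Index of the first bin at or above the hiss cutoff frequency f0."""
    return math.ceil(cutoff_hz * fft_size / sample_rate_hz)
```

For example, a 3 kHz cutoff with a 16 kHz sample rate and a 512-point transform gives cutoff bin 96, so bins 96 and above would form the high frequency portion.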
Hiss noise 106 is usually audible in high frequencies. In order to eliminate or mitigate hiss noise after noise reduction, the residual noise may be constrained to have a target noise shape, or to have a certain color. Constraining the residual noise to have a certain color may be achieved by making the residual noise power density proportional to 1/f^β. For instance, white noise has a flat spectral density, so β=0, while pink noise has β=1, and brown noise has β=2. The greater the β value, the quieter the noise in high frequencies. In an alternative embodiment, the residual noise power density may be a function that has a flatter spectral density at lower frequencies and a more steeply sloped spectral density at higher frequencies.
The target residual noise dB power spectrum is defined by:
T(f)=B(f0)−10β log10(f/f0) (8)
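Equation (8) translates directly into code. The sketch below evaluates the target level at a list of frequencies at or above f0; the function and parameter names are hypothetical.

```python
import math


def target_noise_db(freqs_hz, b_at_cutoff_db, cutoff_hz, beta):
    """T(f) = B(f0) - 10*beta*log10(f/f0) for each frequency f at or above f0."""
    return [b_at_cutoff_db - 10.0 * beta * math.log10(f / cutoff_hz)
            for f in freqs_hz]
```

With β=1 (pink) the target falls by 10 dB per decade above f0, and with β=2 (brown) by 20 dB per decade. A noise level of −40 dB at a 3 kHz cutoff therefore gives targets of −50 dB and −60 dB, respectively, at 30 kHz.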
The difference between the background noise level and the target noise level at a frequency may be calculated with a difference calculator. Whenever the difference between the noise estimate and the target noise, defined by:
D(f)=B(f)−T(f) (9)
is greater than a hiss threshold δ, hiss noise is detected and a dynamic floor may be used to apply substantial noise suppression to eliminate the hiss. A detector may detect when the residual background noise level exceeds the hiss threshold. The dynamic suppression factor for a given frequency above the hiss cutoff frequency f0 may be given by:
Alternatively, for each bin above the hiss cutoff frequency bin k0 the dynamic suppression factor may be given by:
The dynamic noise floor may be defined as:
By combining the dynamic floor described above with the conventional noise reduction method, the color of residual noise may be constrained by a pre-defined target noise shape, and the quality of the noise-reduced speech signal may be significantly improved. Below the hiss cutoff frequency f0, a constant noise floor may be applied. The hiss cutoff frequency f0 may be a fixed frequency, or may be adaptive depending on the noise spectral shape.
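The detection and flooring steps can be sketched as follows. The exact expressions for the dynamic suppression factor and the dynamic noise floor used here are assumptions made for illustration only. The factor is taken as the linear-amplitude equivalent of the dB excess D over the target, 10^(−D/20), wherever D exceeds the hiss threshold δ, and as 1 elsewhere. The floor is taken as the constant floor σ scaled by that factor at or above the cutoff bin, and as σ below it. All function and parameter names are hypothetical.

```python
def dynamic_suppression_factors(noise_db, target_db, threshold_db):
    """Assumed per-bin factor: 10**(-D/20) where D = B - T exceeds the threshold, else 1."""
    factors = []
    for b, t in zip(noise_db, target_db):
        d = b - t  # equation (9): excess of the noise level over the target
        factors.append(10.0 ** (-d / 20.0) if d > threshold_db else 1.0)
    return factors


def dynamic_noise_floor(factors, sigma, cutoff_bin):
    """Assumed floor: sigma below the cutoff bin, sigma times the factor at or above it."""
    return [sigma * lam if k >= cutoff_bin else sigma
            for k, lam in enumerate(factors)]
```

Under these assumptions, a bin held at the floor has a residual level of B−D+20 log10(σ)=T+20 log10(σ) in dB, so the residual noise follows the target shape offset by the constant floor.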
A suppression gain limiting module 212 may limit the noise suppression gains according to the result of the hiss detector module 210. In an alternative to flooring the noise suppression gains by a constant floor as in equation (3), the dynamic hiss noise reduction approach may use the dynamic noise floor defined in equation (12) to estimate the noise suppression gains:
Ĝi,k=max(η(k),Gi,k). (13)
A noise suppression gain applier 214 applies the noise suppression gains to the frequency transformation of the audio signal 102.
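Equation (13) and the gain application step can be combined in one short routine. The sketch assumes the frequency transformation is available as a list of complex bins for one frame and that a per-bin floor η(k) has already been computed by some means; the function name is hypothetical.

```python
def limit_and_apply(spectrum, gains, floors):
    """Floor each gain by its per-bin limit (equation 13), then scale the complex bin."""
    limited = [max(eta, g) for eta, g in zip(floors, gains)]
    return [g * s for g, s in zip(limited, spectrum)]
```

Because the limited gains are real and non-negative, only the magnitude of each bin changes and the phase of the noisy signal is carried through unchanged.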
The method according to the present description may be implemented by computer executable program instructions stored on a computer-readable storage medium. A system for dynamic hiss reduction may comprise electronic components, analog and/or digital, for implementing the processes described above. In some embodiments the system may comprise a processor and memory for storing instructions that, when executed by the processor, enact the processes described above.
The memory 708 may comprise a device for storing and retrieving data or any combination thereof. The memory 708 may include non-volatile and/or volatile memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a flash memory. The memory 708 may comprise a single device or multiple devices that may be disposed on one or more dedicated memory devices or on a processor or other similar device. Alternatively or in addition, the memory 708 may include an optical, magnetic (hard-drive) or any other form of data storage device.
The memory 708 may store computer code, such as the hiss detector module 210, the noise reduction module 208 and/or any other component. The computer code may include instructions executable with the processor 704. The computer code may be written in any computer language, such as C, C++, assembly language, channel program code, and/or any combination of computer languages. The memory 708 may store information in data structures such as the calculated noise suppression gains 402 and the modified noise suppression gains 406.
The memory 708 may store instructions 710 that, when executed by the processor 704, configure the system to enact the system and method for reducing hiss noise described herein with reference to any of the preceding figures.
All of the disclosure, regardless of the particular implementation described, is exemplary in nature, rather than limiting. The system 200 may include more, fewer, or different components than illustrated in the figures.
The functions, acts or tasks illustrated in the figures or described may be executed in response to one or more sets of logic or instructions stored in or on computer readable media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, microcode and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, distributed processing, and/or any other type of processing. In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the logic or instructions may be stored within a given computer such as, for example, a central processing unit (“CPU”).
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the present invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation of and claims the benefit of priority to U.S. patent application Ser. No. 13/768,108, and further claims priority to U.S. Provisional Patent Application Ser. No. 61/599,762, filed Feb. 16, 2012, both of which are incorporated herein by reference in their entirety.
Number | Name | Date | Kind |
---|---|---|---|
3921077 | Suzuki | Nov 1975 | A |
4641344 | Kasai | Feb 1987 | A |
5926334 | Suzuki | Jul 1999 | A |
6523003 | Chandran | Feb 2003 | B1 |
7844453 | Hetherington | Nov 2010 | B2 |
8015002 | Li | Sep 2011 | B2 |
20060251268 | Hetherington et al. | Nov 2006 | A1 |
20070170992 | Cho | Jul 2007 | A1 |
20080075300 | Isaka | Mar 2008 | A1 |
20110125490 | Furuta et al. | May 2011 | A1 |
20140289630 | Duwenhorst | Sep 2014 | A1 |
Number | Date | Country |
---|---|---|
2 056 296 | May 2009 | EP |
Entry |
---|
European Search Report for corresponding European Application No. 15160720.7 dated Jul. 1, 2015, 5 pages. |
Number | Date | Country |
---|---|---|
20150348568 A1 | Dec 2015 | US |
Number | Date | Country |
---|---|---|
61599762 | Feb 2012 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 13768108 | Feb 2013 | US |
Child | 14821364 | | US |