The present invention relates to digital signal processing of audio and speech, and more particularly to architectures and methods for digital camera front-ends.
Imaging and audio/visual capabilities have become the trend in consumer electronics. Digital cameras, digital camcorders, and camera cellphones are common, and many other new gadgets are evolving in the market. Advances in high-resolution CCD/CMOS sensors, coupled with the availability of low-power digital signal processors (DSPs), have led to the development of digital cameras with both high-resolution image and short audio/visual clip capabilities. The high resolution (e.g., a sensor with a 2560×1920 pixel array) provides the quality offered by traditional film cameras.
a shows typical functional blocks of digital camera control and image processing (the “image pipeline”). The automatic focus, automatic exposure, and automatic white balancing are referred to as the 3A functions; and the image processing includes functions such as color filter array (CFA) interpolation, gamma correction, white balancing, color space conversion, and JPEG/MPEG compression/decompression (JPEG for single images and MPEG for video clips). A lens stepper motor moves the lens to adjust focus (optical zoom), and a (directional) microphone picks up sounds from the scene being imaged for audio/visual recording.
Typical digital cameras provide a capture mode with full resolution image or audio/visual clip processing plus compression and storage, a preview mode with lower resolution processing for immediate display, and a playback mode for displaying stored images or audio/visual clips.
In movie capture applications, sound is recorded along with and synchronized to the captured video frames. The sound signal is converted to an electrical signal by the microphone and then converted to a digital signal by an ADC. Often, the intent of movie capture is to record speech associated with the video (either verbal comments of the camera operator or speech of the human subjects in the scene under movie capture). While capturing video, it is possible to adjust lens focus (zoom in/zoom out). When active, the lens stepper motor causes audible noise which gets added onto the speech signal that is picked up by the microphone and recorded. The microphone also picks up background noises of various types.
However, digital cameras typically have limited computing power and limited battery life, and this implies a problem for effective noise suppression (both audio and visual).
The present invention provides mitigation of digital camera lens motor noise by activation of bandpass filtering, and cascaded bandpass and notch filtering, to enhance speech intelligibility; and/or use of different filter banks based on camera activity or the nature of the noise (e.g., zoom in and zoom out); and/or use of an automatic level controller (ALC) to maintain signal energy during filter operations; and/or marking of the audio recorded during lens motor operation for later noise suppression processing.
a-1c show camera components plus filter and a filter cross-coherence for noisy input.
a-2b are flowcharts.
a-3c show functions of an image pipeline, processor, and internet communication.
a-4c show lowpass filter characteristics.
a-5c illustrate highpass filter characteristics.
a-10b show experimental results.
a-12b are block diagrams of hardware implementations.
Preferred embodiment methods of lens motor noise mitigation for digital cameras apply: (1) bandpass filtering to the audio input recorded during camera lens motor operation in order to make speech more intelligible and/or (2) cascaded bandpass and multiple stages of notch filters to get a desired magnitude spectrum and/or (3) multiple stages of highpass filter (HPF) or lowpass filter (LPF) to get a desired attenuation and magnitude curve and/or (4) noise masking principles to reduce the number of filter stages and/or (5) automatic level control to maintain signal energy after filtering noise and/or (6) filter bank selection based on camera activity/noise characteristic and/or (7) hardware and software realization of cascaded filter stages and/or (8) marking of such audio segments for later noise suppression processing or bandpass filtering during playback.
Preferred embodiment systems (camera cellphones, PDAs, notebook computers, et cetera) perform preferred embodiment methods with any of several types of hardware: digital signal processors (DSPs), general purpose programmable processors, application specific circuits, or systems on a chip (SoC) such as combinations of a DSP and a RISC processor together with various specialized programmable accelerators.
Preferred embodiment cameras and methods have objectives including:
The effect of lens motor noise audibility, added to a captured speech signal, depends on:
Additive noise has a spectrum which adds onto the speech spectrum:
X_{noisy}(k) = X(k) + N(k)
and various noise suppression methods are known, such as spectral subtraction.
Noise may be stationary or non-stationary. Stationary noise characteristics remain the same with respect to time and spectrum, whereas non-stationary noise characteristics vary with time and/or spectrum.
Microphone rumble noise is low frequency sound caused by wind, a speaker being close to the microphone, and/or mechanical sounds. Rumble noise (<100 Hz) typically lies outside the speech spectrum. Thus the fundamental frequency of rumble can be filtered out by a highpass filter with a low cut-off frequency.
Lens motor noise is wideband with frequency content existing over the entire speech spectrum. The noise can be considered as segmented stationary noise (i.e. the noise when taken in short time windows remains stationary). The lens motor noise further has the characteristic of having significant power at low frequencies, high frequencies, and distributed narrow-band noise as shown in
By reducing the noise power outside of the speech spectrum, the SNR for the speech signal can be improved. The speech signal bandwidth is about 50-5000 Hz. The prominent speech section is around 150-3500 Hz, which is the telephone voice band. By band-limiting (i.e., bandpass filtering) the audio input signal to 100-5000 Hz, noise power can be reduced without adversely affecting the speech signal. This increase in SNR improves speech intelligibility within the noisy input audio signal. Indeed, bandpass filtering to an even narrower band, such as 150-3500 Hz, will further increase speech intelligibility.
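As a rough worked example of the band-limiting argument, the sketch below estimates how much noise power is removed when the noise is assumed to be spectrally flat from 0 Hz to the Nyquist frequency of a 16 kHz sampled signal. The flat-noise model and the function name `noise_reduction_db` are assumptions made only for illustration; actual lens motor noise is not flat, so the figures indicate the trend rather than measured results.

```python
import math

def noise_reduction_db(full_bw_hz, f_lo_hz, f_hi_hz):
    """Noise power removed (in dB) by keeping only f_lo..f_hi.

    Assumes spectrally flat noise over 0..full_bw_hz, so noise power is
    proportional to bandwidth. This is an illustrative model only.
    """
    return 10.0 * math.log10(full_bw_hz / (f_hi_hz - f_lo_hz))

NYQUIST_HZ = 8000.0  # half of an assumed 16 kHz sampling rate

wide = noise_reduction_db(NYQUIST_HZ, 100.0, 5000.0)    # 100-5000 Hz passband
narrow = noise_reduction_db(NYQUIST_HZ, 150.0, 3500.0)  # 150-3500 Hz passband

print(f"100-5000 Hz passband: {wide:.2f} dB less noise power")
print(f"150-3500 Hz passband: {narrow:.2f} dB less noise power")
```

Under this model the narrower passband removes more noise power, which is consistent with the statement that filtering to 150-3500 Hz further increases intelligibility.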
Since the lens motor is controlled within the camera, the start time and the duration for which the lens motor is running are known in the camera processor. Thus, lens motor bandpass filtering only needs to be turned on during the operation of lens adjustment. This limited duration bandpass filtering would aid in preserving a natural (e.g., wideband) sound of speech when the lens motor is inactive and speech intelligibility is less of a problem.
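The motor-gated filtering described above might be sketched as follows. This is a minimal illustration, not the implementation of any particular camera: the function name `filter_during_motor`, the per-sample `motor_active` flag, and the choice to reset the filter state at the start of each motor-active segment are all assumptions. The filter is a single second order section with numerator coefficients a0, a1, a2 and feedback coefficients b1, b2.

```python
def filter_during_motor(samples, motor_active, coeffs):
    """Filter only the samples captured while the lens motor runs.

    samples:      audio samples (floats)
    motor_active: one boolean per sample, True while the motor is running
                  (hypothetical flag; the camera processor knows this timing)
    coeffs:       (a0, a1, a2, b1, b2) of one second order IIR section
    """
    a0, a1, a2, b1, b2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    was_active = False
    out = []
    for xn, active in zip(samples, motor_active):
        if not active:
            out.append(xn)       # motor idle: pass the wideband audio through
            was_active = False
            continue
        if not was_active:
            # Assumption: start each motor-active segment from a cleared state.
            x1 = x2 = y1 = y2 = 0.0
        yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
        was_active = True
    return out

# Usage with a trivial gain-only "filter" so the gating is easy to see.
gated = filter_during_motor([1.0, 1.0, 1.0, 1.0],
                            [False, True, True, False],
                            (0.5, 0.0, 0.0, 0.0, 0.0))
print(gated)
```

Resetting the state at each segment start keeps the sketch simple but can produce a short transient; running the filter continuously and switching only the output is an alternative design.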
Band-limiting the ADC output is effective for speech signals embedded in background noise. Note that anti-aliasing analog filters may precede some types of ADC (and could be part of the microphone high-frequency roll-off), but the filter cut-off frequency would correspond to one-half of the sampling rate regardless of lens motor noise.
Analog microphone output is converted to digital data by analog-to-digital converters (ADCs). ADCs for audio are typically delta-sigma modulators with decimation filters (to convert oversampled digital data to the desired sampling rate), gain controllers (preamp), and optional anti-aliasing filters (to attenuate high frequency noise).
In order to prevent aliasing resulting from the downsampling in the ADC, the digital data needs to be band-limited to the Nyquist frequency (half the sampling rate).
The decimation filter in a delta-sigma ADC acts as a lowpass filter with cut-off at half the sampling rate. Thus, with the ADC set to an 8 kHz sampling rate, the speech signal is limited to a 4 kHz maximum frequency. However, with a 16 kHz sampling rate the ADC output contains frequency components up to 8 kHz.
In order to limit the signal bandwidth to that of the prominent speech signal, so as to reduce noise power (increase SNR), a low-pass filter would be needed.
The low frequency noise can be removed by the use of a highpass filter, without affecting signal power. Thus a bandpass filter is suitable for improving SNR and speech intelligibility. The bandpass filter can be realized by cascading the lowpass and highpass filters shown in
As can be seen from
b illustrates the cascading of filter stages, and
In normal recording without zoom operations (lens stepper motor is inactive), 1- or 2-stage highpass filters can be used to eliminate microphone rumble. Additionally, highpass filters can be used to reduce or minimize background noise (stationary and non-stationary).
To facilitate advanced filtering options available on PCs, which typically have much greater processing power than digital cameras, the raw unfiltered audio data would be useful. In this case, the camera marks the audio segments in the container (e.g., QuickTime) in which the lens motor noise is present. Either the bandpass filtering on the camera recorder is disabled, or the noise marking is added in addition to the bandpass filtering. By disabling the bandpass filtering, non-speech data can be recorded in natural form within the allowed frequencies for the selected sampling frequency.
In the playback path of the camera, the bandpass filter is activated if it had been disabled during capture of the audio segment and the audio segment contains noise marking; see
In the case of transfer of movies captured by the digital camera to a PC, a software module running on the PC can be used for post-processing the recorded audio for enhancing SNR by known noise suppression methods, such as spectral subtraction. The enhanced audio can then replace the audio stored within the container.
Second order IIR lowpass and highpass filters can be used in cascade to realize the bandpass filter as shown in
Alternatively, biquad filters can be used for realizing different frequency responses by programming the coefficients. Recall that a biquad filter has a transfer function as a ratio of two quadratics:
H(z) = (a_0 + a_1 z^{-1} + a_2 z^{-2}) / (b_0 + b_1 z^{-1} + b_2 z^{-2})
There are only five independent coefficients, and typically either b0 or a0 is taken equal to 1. Solving Y(z)=H(z)X(z) for y[n] gives the usual IIR filter implementation form (for b0=1):
y[n] = a_0 x[n] + a_1 x[n-1] + a_2 x[n-2] - b_1 y[n-1] - b_2 y[n-2]
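The difference equation above maps directly onto a few lines of code. The following is a minimal floating-point sketch (a fixed-point version would add scaling and saturation); the function name `biquad` and the coefficient values in the usage lines are illustrative assumptions, not values from this description.

```python
def biquad(x, a0, a1, a2, b1, b2):
    """Apply y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] - b1*y[n-1] - b2*y[n-2].

    The leading feedback coefficient b0 is taken as 1, and the filter
    starts from a zero state (all past inputs and outputs are 0).
    """
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, xn   # shift the input history
        y2, y1 = y1, yn   # shift the output history
        y.append(yn)
    return y

# First three samples of the impulse response for illustrative coefficients.
h = biquad([1.0, 0.0, 0.0], a0=0.5, a1=0.25, a2=0.125, b1=-0.5, b2=0.25)
print(h)
```

Only four past values need to be kept per stage, which is why the structure suits the limited computing power of a camera processor.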
a shows the magnitude response of a lowpass biquad with filter coefficients as follows: a0=0.0793, a1=0.1335, a2=0.0793, b1=−1.1064, b2=0.3983. Note that the frequency roll-off in
a shows the magnitude response of a highpass biquad with filter coefficients as follows: a0=0.9617, a1=−1.9233, a2=0.9617, b1=−1.9219, b2=0.9248. The frequency roll-off in
When the noise energy contained in low frequency is removed and noise energy at high frequency is reduced, the increased SNR of the output signal allows for masking of the noise signal while preserving intelligibility of speech. Cascading the filters of
a shows experimental results for a speech signal embedded in motor noise, a sneeze, and a thud on the microphone. The upper panel shows the histogram prior to filtering, and the lower panel the histogram after speech-intelligibility filtering. The filter was a cascade of a Chebyshev-II second order lowpass filter and a second order IIR highpass filter. The sampling rate of the input signal is 16 kHz.
a-6c shows the filter response of an HPF with coefficients as follows: a0=0.846459, a1=−1.692918, a2=0.846459, b1=−1.669203, b2=0.716633. The cut-off frequency is at 300 Hz, and attenuation is around 20 dB at 100 Hz. The group delay is small and the phase response is close to linear. A cascade of two stages of the same HPF would provide attenuation of 40 dB at 100 Hz.
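The quoted attenuation figures can be checked by evaluating the biquad transfer function on the unit circle. The sketch below does this for the HPF coefficients listed above. It assumes an 8 kHz sampling rate, which is not stated with the coefficients but is the rate given for the cascaded-filter experiment; the helper name `biquad_gain_db` is hypothetical.

```python
import cmath
import math

def biquad_gain_db(f_hz, fs_hz, a0, a1, a2, b1, b2):
    """Magnitude response in dB of one biquad (b0 = 1) at frequency f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    h = (a0 + a1 * z1 + a2 * z1 * z1) / (1.0 + b1 * z1 + b2 * z1 * z1)
    return 20.0 * math.log10(abs(h))

FS_HZ = 8000.0  # assumed sampling rate
HPF = (0.846459, -1.692918, 0.846459, -1.669203, 0.716633)

cutoff_gain = biquad_gain_db(300.0, FS_HZ, *HPF)  # gain at the cut-off
one_stage = biquad_gain_db(100.0, FS_HZ, *HPF)    # single HPF stage
two_stage = 2.0 * one_stage                       # dB gains add in a cascade

print(f"300 Hz: {cutoff_gain:.1f} dB")
print(f"100 Hz, one stage: {one_stage:.1f} dB, two stages: {two_stage:.1f} dB")
```

Because the stages are identical and their gains multiply, the attenuation in dB simply doubles for a two-stage cascade.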
a-7c shows the filter response of an LPF with coefficients as follows: a0=0.227117, a1=0.454235, a2=0.227117, b1=−0.276664, b2=0.185136. The cut-off frequency is at 1700 Hz, and attenuation is around 10 dB at 3 kHz. The LPF has a slower roll-off than the HPF in order to maintain the speech signal energy at high frequencies, which is important for intelligibility. The group delay is small.
a-8c shows the filter response of a notch filter with coefficients as follows: a0=0.910339, a1=−0.925094, a2=0.910339, b1=−0.925094, b2=0.820678. The cut-off frequencies are 1200 Hz and 1450 Hz, with as much as 40 dB attenuation at the centre of the stop-band. The purpose of band-stop or notch filters is to reduce the noise energy by attenuating the frequencies where noise energy is concentrated. The impact on the speech signal is minimal with respect to intelligibility and signal energy, since a speech signal consists of a fundamental and its harmonics.
a-9c shows the filter response of a notch filter with coefficients as follows: a0=0.894168, a1=−0.488815, a2=0.894168, b1=−0.488815, b2=0.788336. The cut-off frequencies are 1500 Hz and 1800 Hz, with as much as 50 dB attenuation at the centre of the stop-band.
The cascade of filters in
b illustrates noise reduction by use of cascaded second order Butterworth filters: a highpass filter with cut-off frequency at 300 Hz, a lowpass filter with cut-off frequency at 1700 Hz, and two bandstop (notch) filters with cut-off frequencies 1500-1800 Hz and 1200-1450 Hz. The input signal is speech with additive lens motor noise, sampled at 8 kHz.
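The overall response of such a cascade is the product of the individual stage responses. The sketch below evaluates that product using the four coefficient sets listed earlier (HPF, LPF, and the two notch filters) at the 8 kHz sampling rate. Pairing those coefficient sets with this experiment is an assumption based on the matching cut-off frequencies, and the names `STAGES`, `stage_response`, and `cascade_gain_db` are illustrative.

```python
import cmath
import math

FS_HZ = 8000.0  # sampling rate of the input signal

# (a0, a1, a2, b1, b2) per stage, with b0 = 1.
STAGES = {
    "highpass 300 Hz":    (0.846459, -1.692918, 0.846459, -1.669203, 0.716633),
    "lowpass 1700 Hz":    (0.227117, 0.454235, 0.227117, -0.276664, 0.185136),
    "notch 1200-1450 Hz": (0.910339, -0.925094, 0.910339, -0.925094, 0.820678),
    "notch 1500-1800 Hz": (0.894168, -0.488815, 0.894168, -0.488815, 0.788336),
}

def stage_response(f_hz, coeffs, fs_hz=FS_HZ):
    """Complex frequency response of one biquad at f_hz."""
    a0, a1, a2, b1, b2 = coeffs
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return (a0 + a1 * z1 + a2 * z1 * z1) / (1.0 + b1 * z1 + b2 * z1 * z1)

def cascade_gain_db(f_hz):
    """Gain in dB of all stages in series (responses multiply)."""
    h = 1.0 + 0.0j
    for coeffs in STAGES.values():
        h *= stage_response(f_hz, coeffs)
    return 20.0 * math.log10(max(abs(h), 1e-12))  # guard against log(0)

for f in (100, 800, 1320, 1650, 3000):
    print(f"{f:5d} Hz: {cascade_gain_db(f):7.1f} dB")
```

Frequencies in the passband between the highpass cut-off and the notches pass nearly unchanged, while rumble and the two narrow noise bands are strongly attenuated.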
Scaling may follow the filtering to achieve unity gain. Biquad filters can easily be implemented in fixed-point software or hardware.
In order to reduce the gate count for cascaded filters in hardware, loopback can be used: programmable coefficients together with context (past output and input samples) save/restore features make only a single hardware stage necessary.
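A software model may help make the loopback idea concrete. In the sketch below a single function, `biquad_step`, stands in for the one hardware stage; each logical filter stage has its own coefficients and its own saved context, which are loaded before and stored after every pass. The names and the per-sample scheduling are assumptions made for illustration, not a description of a specific circuit.

```python
def biquad_step(coeffs, ctx, xn):
    """One pass through the shared biquad datapath.

    coeffs: (a0, a1, a2, b1, b2) programmed for the current logical stage
    ctx:    [x1, x2, y1, y2], the restored context; updated in place (saved)
    """
    a0, a1, a2, b1, b2 = coeffs
    x1, x2, y1, y2 = ctx
    yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
    ctx[:] = [xn, x1, yn, y1]  # save the updated history for this stage
    return yn

def run_looped(samples, stage_coeffs):
    """Emulate N cascaded stages by looping each sample through one datapath."""
    contexts = [[0.0] * 4 for _ in stage_coeffs]  # one saved context per stage
    out = []
    for xn in samples:
        v = xn
        for coeffs, ctx in zip(stage_coeffs, contexts):
            v = biquad_step(coeffs, ctx, v)  # output loops back as next input
        out.append(v)
    return out

# Usage: two passes of the 300 Hz highpass coefficients listed earlier.
HPF = (0.846459, -1.692918, 0.846459, -1.669203, 0.716633)
impulse_out = run_looped([1.0] + [0.0] * 7, [HPF, HPF])
print(impulse_out[:3])
```

The trade-off is throughput: the single stage must run N times per sample, which is acceptable at audio sampling rates.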
The computational complexity required by spectral domain noise subtraction is not affordable in most digital cameras. Also, the nature of noise is variable as can be seen above.
With the addition of highpass filtering in the case of audio sampled at 8 kHz or a bandpass filter with bandwidth covering the prominent speech spectrum in the case of 16 kHz sampling, the background noise power can be reduced to improve SNR and intelligibility of the speech signal. In the case that the natural sound needs to be preserved and only the lens motor noise is to be eliminated, the bandpass filter can be turned on only during the periods of motor operation.
Filter design can take advantage of equal-loudness curves, which indicate that the human ear is most sensitive to sound in the 3-4 kHz band. A second order IIR lowpass filter does not have a sharp cut-off, so gradual attenuation starting around 3 kHz is used. The lowpass filter can be used for signals sampled at rates starting from 4 kHz.
The highpass filter eliminates low frequency noises such as rumble and wind noise from the signal captured by the microphone. If the noise attenuation is not sufficient with a single highpass stage, cascaded highpass stages of second order IIR filters can be used.
Narrow band noises (e.g., hum) can be eliminated by the use of notch filters. The biquad filter structure can be programmed for notch filter realizations.
After filter stages, an optional automatic level controller (ALC), as shown in
In one embodiment, the results may be:
This application claims priority from provisional application No. 60/944,158, filed Jun. 15, 2007, which is herein incorporated by reference.