The invention relates generally to audio systems and, more particularly, to a system and method for automatically adjusting the volume of an audio device to compensate only for noise that interferes with the intelligibility of speech or appreciation of music from said audio device.
The automatic volume control (AVC) of this invention is a fully automatic system and method for adjusting the volume of an audio output device, such as a car radio, in accordance with listener preferences, to compensate selectively for changing levels of ambient noise only in the time and frequency domains that interfere with intelligibility of speech or appreciation of music.
An example of an audio device is a car radio. Many sources of noise can interfere with hearing a car radio, including tire (road) noise, wind, engine noise, traffic (highway) noise, the fan of a heater or air conditioner, and noises made by the driver and passengers. The noise levels of all of these sources can change with time, depending on factors like the speed of the car or changing environmental conditions outside or inside the car. The noise levels can change abruptly or quasi-continuously or can be transient. Having to adjust the volume of an audio device manually and repeatedly to compensate for changing noise levels is a nuisance and, in a car, can compromise the safety of the occupants and others.
Not all noise, however, interferes with a listener's understanding or appreciation of the output of an audio device. And not all noise, therefore, would impel a listener to want to change the volume. For example, nearly all the information in speech is contained within the frequency interval 200 Hz to 6 kHz [L. E. Kinsler et al., Fundamentals of Acoustics, Third Ed. (John Wiley & Sons, NY, 1982), p. 283]. Generally, only the frequency components of noise within this interval can detract significantly from intelligibility of speech. Similarly, the intelligibility of full sentences in noise environments is substantially greater than the intelligibility of isolated words. Generally, only noises that persist long enough to mask more than a few words can detract significantly from intelligibility of speech.
Any system that attempts to compensate for all noise, regardless of frequency or duration, will generally overcompensate by raising or lowering the volume of an audio device to adjust for noise that is not significantly interfering with the ability to listen to the audio device. For example, the occurrence of a high-pitched whine above 6 kHz should not generally be cause for the volume of an audio device to be increased automatically, or to be decreased upon its cessation. Similarly, a transient shout within a car, or another car passing at high speed in the opposite direction, should not generally be cause for the volume to be changed.
What is needed, therefore, is not a means for automatically adjusting the volume of an audio device to compensate for changes in all ambient noise, but rather only that noise of a frequency and duration that detracts from the ability to listen to the audio device. That is, the AVC should have some means of discriminating significant noise, which persistently detracts from listening ability, from noise that is less consequential. One means of identifying such significant noise is to measure its interference with the intelligibility of speech. One measure of interference with intelligibility considered suitable for field use is the preferred speech interference level (PSIL), which is the arithmetic average of the noise levels in the three octave bands centered at 500, 1000, and 2000 Hz [ibid., p. 284].
To be fully automatic, an AVC should impose no need for additional manual controls on an audio device, other than possibly an on-off switch for the AVC feature. Listener preferences for volume should be established through normal operation of the audio device and a minimum of manual volume adjustments. The two key listener preferences that should be automatically registered by an AVC are the preferred signal-to-noise ratio and the preferred signal floor. The relevant signal-to-noise ratio is the ratio of the amplifier gain of an audio device to a suitable measure of significant noise, such as the PSIL. The preferred signal floor is the lowest amplifier gain acceptable to the listener, independent of how quiet the environment may be.
For an audio amplifier providing an audio signal to one or more speakers, this invention provides an automatic volume control to compensate for speech interference noise including: a microphone for detecting acoustic waves emanating from the one or more speakers and background noise, and in response for producing a corresponding signal; a phase correlator process for phase correlating the microphone and audio signals; an amplitude correlator process for amplitude correlating the phase correlated microphone and audio signals; a subtraction process for producing a signal corresponding to a difference between the phase and amplitude correlated microphone and audio signals; a transform process for producing over a period of time a signal corresponding to the amplitude of each frequency component of the difference signal within the spectrum of said transform process; a bandpass filter process for filtering the transform process produced signal to pass only those frequency components within selected bands; a speech-interference level calculation process for receiving the bandpass filtered frequency components and responsive to produce a signal corresponding to a combination of the amplitudes of the bandpass filtered frequency components; and a solver process for receiving the combined signal and responsive to produce according to an algorithm a signal for controlling the gain of the audio amplifier. Preferably the selected bands include the three octave bands centered at 500, 1000 and 2000 Hz. Preferably the transform process comprises a fast Fourier transform module. Preferably the combination of the amplitudes of the bandpass filtered frequency components is an arithmetic average of the noise levels in the octave bands. Preferably some or all the processes, algorithms and filtering are performed in and by a digital signal processor that receives both the digitized microphone signal and the digitized audio signal.
FIGS. 3(a) through 3(d) are exemplary representations of an audio amplifier signal and a corresponding AVC microphone signal at certain stages of processing according to this invention.
Referring to
As described thus far, the automatic control of the volume of the audio device by the AVC 2 is within the state of the art.
Referring to
The correlators, 11 and 12, and the signal subtraction process 13 cooperate to separate the sound of the speakers from the background noise so that the background noise can be processed separately. The correlators correlate the digitized inputs, 21 and 22, from the two A/Ds, 8B and 10B, so that they can be subtracted from each other by the signal subtraction process 13 with the remainder being the background noise, as illustrated by
It might be possible, using factory settings, to subtract the inputs 21 and 22 directly without first correlating them, but the tolerance for jitter between the inputs 21 and 22 is so demanding that over time the system characteristics may drift and detune. Components 11 and 12 can correlate the inputs 21 and 22 continuously in near real time, if necessary, or only at each start-up of the audio device, if such is sufficient. Both the phase and amplitude can be correlated with respect to inputs 21 and 22 over multiple processing periods for greater accuracy.
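The cooperation of the correlators and the subtraction process can be illustrated by a minimal sketch. The description does not specify how components 11 and 12 perform their correlations, so the sketch assumes one common approach: the phase correlation is taken to be the delay that maximizes the cross-correlation of the two inputs, and the amplitude correlation is taken to be a least-squares scale factor. The function names and the `max_lag` parameter are hypothetical.

```python
def estimate_lag(audio, mic, max_lag):
    """Delay (in samples) of mic relative to audio that maximizes their
    cross-correlation, searched over lags 0..max_lag (assumed method)."""
    span = len(audio) - max_lag  # same number of terms for every lag

    def xcorr(lag):
        return sum(audio[n] * mic[n + lag] for n in range(span))

    return max(range(max_lag + 1), key=xcorr)


def estimate_gain(audio, mic, lag, span):
    """Least-squares scale g minimizing sum((mic[n+lag] - g*audio[n])**2)."""
    num = sum(audio[n] * mic[n + lag] for n in range(span))
    den = sum(audio[n] ** 2 for n in range(span))
    return num / den if den else 0.0


def background_noise(audio, mic, max_lag=16):
    """Align and scale the audio input, then subtract it from the microphone
    input; the remainder approximates the background noise.
    Requires len(mic) >= len(audio) > max_lag."""
    span = len(audio) - max_lag
    lag = estimate_lag(audio, mic, max_lag)
    gain = estimate_gain(audio, mic, lag, span)
    residual = [mic[n + lag] - gain * audio[n] for n in range(span)]
    return lag, gain, residual
```

When the microphone input is simply a delayed, attenuated copy of the audio input, the residual returned by `background_noise` is essentially zero; any ambient noise picked up by the microphone remains in the residual.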
Referring again to
The operating characteristics of a preferred embodiment of an FFT module 14 can be best described as follows. Let the sampling rate of the A/D converters 8B and 10B be s samples/second. Let the number of samples to be processed in each processing period of the FFT module be N, where N must be an integer power of 2. Then each processing period is N/s, and the time from receiving the first sample to the last in each processing period is
T=(N−1)/s. (1)
The frequency resolution of the Fourier transform is
Δf=1/T=s/(N−1). (2)
The highest frequency component of the Fourier transform is
fm=NΔf/2=[N/(N−1)]s/2. (3)
In the preferred embodiment the FFT module described below is particularly well suited to calculating the preferred speech interference level (PSIL) from the noise background. The PSIL is the arithmetic average of the noise levels in the three octave bands centered at 500, 1000, and 2000 Hz, that is, the three octave bands from 354 to 707 Hz, from 707 to 1414 Hz, and from 1414 to 2828 Hz, respectively.
The following design guidelines are preferred for an accurate calculation of the PSIL:
(a) The frequency resolution of the Fourier transform should be finer than about 40 Hz, that is,
Δf=s/(N−1)≦40 Hz, (4)
in order to get good statistics on the noise level by having at least of the order of 10 frequency components, even in the lowest octave band.
(b) The processing period of the FFT module should be no longer than about 25 ms, that is,
T=(N−1)/s≦25 ms, (5)
in order to provide at least of the order of 10 PSIL calculations to the solver 17 every quarter second or so. A quarter second is less than or about the time over which the AVC should begin to respond to a rapidly changing noise background.
(c) The highest frequency component of the Fourier transform should be at least about 2800 Hz, that is,
fm=[N/(N−1)]s/2≧2800 Hz, (6)
in order to get good statistics on the noise level in the highest octave band by populating it fully.
Combining these design guidelines, Equations (4)-(6), leads to the following point design as an example of an FFT module that is particularly well suited to calculating the PSIL for an AVC: N=128; s=5600 Hz; T=22.7 ms; Δf=44.1 Hz; fm=2822 Hz.
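The point design can be checked numerically from Equations (1)-(3). The following is a minimal sketch; the function name `fft_design` is hypothetical and is used only for illustration.

```python
def fft_design(n_samples, sample_rate_hz):
    """Return (T, delta_f, f_m) per Equations (1)-(3) of the description."""
    if n_samples < 2 or n_samples & (n_samples - 1):
        raise ValueError("N must be an integer power of 2")
    period_s = (n_samples - 1) / sample_rate_hz      # Equation (1)
    resolution_hz = 1.0 / period_s                   # Equation (2)
    max_freq_hz = n_samples * resolution_hz / 2.0    # Equation (3)
    return period_s, resolution_hz, max_freq_hz


T, df, fm = fft_design(128, 5600.0)
print(f"T = {T * 1e3:.1f} ms, delta_f = {df:.1f} Hz, f_m = {fm:.0f} Hz")
# prints "T = 22.7 ms, delta_f = 44.1 Hz, f_m = 2822 Hz"
```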
After each processing period, the FFT module 14 sends a signal as an input 28 to the bandpass filters 15, the signal comprising an amplitude for each of the frequency components of the FFT spectrum. With the point design in the preferred embodiment, the FFT calculates 65 amplitudes each processing period for the frequency components fj=jΔf=j(44.1 Hz), where j=0, 1, 2, . . . , 64. In the preferred embodiment, the 8 frequency components, f9=397 Hz through f16=706 Hz, populate the lowest octave of the PSIL. The 16 frequency components, f17=750 Hz through f32=1411 Hz, populate the middle octave of the PSIL. The 32 frequency components, f33=1455 Hz through f64=2822 Hz, populate the highest octave of the PSIL.
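The assignment of frequency components to the three octave bands can be reproduced with a short sketch. It assumes half-open band edges (lower edge included, upper edge excluded), which the description does not state; the names are hypothetical.

```python
# Octave bands centered at 500, 1000, and 2000 Hz (edges in Hz).
OCTAVE_BANDS_HZ = [(354.0, 707.0), (707.0, 1414.0), (1414.0, 2828.0)]


def band_bins(n_samples, sample_rate_hz, bands=OCTAVE_BANDS_HZ):
    """List the indices j whose frequency j*delta_f falls in each band."""
    delta_f = sample_rate_hz / (n_samples - 1)       # Equation (2)
    return [
        [j for j in range(n_samples // 2 + 1) if lo <= j * delta_f < hi]
        for lo, hi in bands
    ]


for bins in band_bins(128, 5600.0):
    print(f"f{bins[0]} through f{bins[-1]}: {len(bins)} components")
```

For the point design (N=128, s=5600 Hz) this yields components 9 through 16, 17 through 32, and 33 through 64, that is, 8, 16, and 32 components respectively.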
The bandpass filters 15 pass only those frequency components within bands 29 that are used by the speech-interference noise level (SIL) calculator 16. In the preferred embodiment as described above, in which 16 calculates the PSIL, the bands 29 include the 56 frequency components from f9 through f64. The SIL calculator 16 calculates the arithmetic average (in dB) of the noise levels in the three (octave) frequency bands 29 passed by the filters 15 and sends as an input 30 to the solver 17 a single PSIL value (in dB) every processing period (N/s=22.9 ms in the preferred embodiment).
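A PSIL calculation of this kind can be sketched as follows. The description does not state how the level of each octave band is formed from the individual amplitudes; the sketch assumes the band level is the summed squared amplitude expressed in dB relative to an arbitrary reference. The function names and the `ref` parameter are hypothetical.

```python
import math

# Component indices for the point design, as given in the description.
PSIL_BINS = [range(9, 17), range(17, 33), range(33, 65)]


def band_level_db(amplitudes, bins, ref=1.0):
    """Band level in dB: summed squared amplitude over the band (assumed)."""
    power = sum(amplitudes[j] ** 2 for j in bins)
    return 10.0 * math.log10(power / ref ** 2) if power > 0 else float("-inf")


def psil_db(amplitudes, bins_per_band=PSIL_BINS):
    """Arithmetic average (in dB) of the three octave-band levels."""
    levels = [band_level_db(amplitudes, bins) for bins in bins_per_band]
    return sum(levels) / len(levels)
```

Averaging in dB, rather than averaging the band powers, follows the statement that the SIL calculator 16 takes the arithmetic average (in dB) of the band levels.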
The solver 17 calculates a gain control signal 19, subject to certain constraints 32, to be sent to the audio amplifier 4B every processing period. The purpose of the solver 17 is to calculate a gain control signal 19 that responds proportionately to changing noise levels of a duration sufficient to interfere with intelligibility of speech or appreciation of music, and that responds negligibly to fluctuations of noise levels at the processing cycle frequency, s/N, or to brief noise transients. The response of the gain control signal 19 must be somewhat dilatory to allow the solver 17 to distinguish SIL changes of significant duration from insignificant transients. But it should not be so dilatory as to seem to the listener to be unresponsive to substantial changes of SIL.
In the preferred embodiment, the model used for the solver 17 is that of a driven damped harmonic oscillator. The gain control signal 19 (in dB), a(t), as a function of time t satisfies the second-order differential equation,
a″(t) + bω0a′(t) + ω0²a(t) = ω0²[S(t) + R0], (7)
where a prime denotes a derivative with respect to time, b is a damping constant, ω0 is a constant frequency indicative of the ‘stiffness’ of the response, S(t) is the SIL (in dB), and R0 is the listener's preferred signal-to-SIL ratio (in dB). (R0 is one of the constraints 32 imposed on the solver 17 by user interaction through the manual volume control 6.)
In terms of a normalized gain control signal, A(t)≡a(t)−R0, Equation (7) may be written as
A″(t) + bω0A′(t) + ω0²[A(t) − S(t)] = 0. (8)
For the ith processing cycle, this model is implemented in the solver 17 by the following algorithm:
A′_{i+1} = A′_i + (N/s)A″_i; (9a)
if |A_i − S_i| ≧ r0, then A_{i+1} = A_i + (N/s)A′_i; (9b)
otherwise A_{i+1} = A_i; (9c)
A″_{i+1} = ω0²S_{i+1} − bω0A′_{i+1} − ω0²A_{i+1}; (9d)
if A_{i+1} ≦ Amin, then A_{i+1} = Amin. (9e)
The constant r0 (in dB) is a threshold difference of the normalized gain control signal, A(t), from the SIL, S(t), below which the gain control signal remains unchanged. The constant Amin (in dB) is the user-preferred floor of the normalized gain control signal, A(t).
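One processing cycle of the solver can be sketched as follows. The sketch reads Equation (9b) as the usual forward-Euler step, advancing A by (N/s) times the first derivative A′, and it applies the floor of Equation (9e) after Equation (9d), in the order listed. The values of b and ω0 and the start-from-rest initial conditions are assumptions chosen only for illustration; they are not taken from the description.

```python
import math


def solver_step(A, dA, ddA, S_now, S_next, dt, b, w0, r0, A_min):
    """Advance the normalized gain control signal by one processing cycle."""
    dA_next = dA + dt * ddA                      # (9a)
    if abs(A - S_now) >= r0:
        A_next = A + dt * dA                     # (9b), read as an Euler step
    else:
        A_next = A                               # (9c): inside the dead band
    ddA_next = w0 ** 2 * S_next - b * w0 * dA_next - w0 ** 2 * A_next  # (9d)
    if A_next <= A_min:
        A_next = A_min                           # (9e): user-preferred floor
    return A_next, dA_next, ddA_next


def run_solver(sil_db, A0, dt, b=2.0, w0=math.pi, r0=1.0, A_min=-math.inf):
    """Run the solver over a sequence of SIL values (dB).
    Assumed start: A = A0, A' = 0, A'' from Equation (8); b and w0 are
    hypothetical values (b = 2 corresponds to critical damping)."""
    A, dA = A0, 0.0
    ddA = w0 ** 2 * (sil_db[0] - A)
    history = [A]
    for S_now, S_next in zip(sil_db, sil_db[1:]):
        A, dA, ddA = solver_step(A, dA, ddA, S_now, S_next, dt, b, w0, r0, A_min)
        history.append(A)
    return history
```

With a steady SIL, for example `run_solver([60.0] * 500, A0=50.0, dt=128 / 5600)`, the normalized gain rises toward the SIL and then holds once it is within r0 of it; with a floor set, the gain never falls below Amin however quiet the environment becomes.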
The constant r0 is intended to desensitize the algorithm to most of the high-frequency fluctuations of the SIL in an otherwise constant noise background, and to keep A(t) constant in such an environment. A typical factory setting for r0 might be about 1 dB. The constant r0 could also be made adaptive by making it proportional to the root-mean-square fluctuation of the SIL, for example, at the cost of additional processing.
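An adaptive threshold of the kind mentioned can be sketched as follows. The proportionality constant `k`, the lower bound `r0_floor`, and the choice of which recent SIL values to include are hypothetical; the description states only that r0 could be made proportional to the root-mean-square fluctuation of the SIL.

```python
import math


def adaptive_r0(recent_sil_db, k=1.0, r0_floor=0.0):
    """Threshold proportional to the RMS fluctuation of recent SIL values."""
    mean = sum(recent_sil_db) / len(recent_sil_db)
    rms = math.sqrt(
        sum((s - mean) ** 2 for s in recent_sil_db) / len(recent_sil_db)
    )
    return max(r0_floor, k * rms)
```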
The constant Amin is the listener's preferred minimum normalized gain control signal, which is generally independent of how quiet the environment may become. The listener establishes or re-establishes Amin through the manual volume control 6 by adjusting the volume higher in quiet environments.
The initial conditions for the algorithm in Equations (9) at system start-up (t=0), or whenever the user establishes new constraints 32 through the manual volume control 6 (
Constraints 18 are applied as inputs 32 to the solver 17. Generally, it is preferable to apply at least two constraints: (1) R0, the listener's preferred signal-to-SIL ratio (in dB); and (2) Amin, the listener's preferred floor for the normalized gain control signal (in dB). There are many variations of algorithms for providing these and other constraints 32 from the constraint module 18. One example follows.
Any time the manual volume control 6 is adjusted (including at start-up of the audio device in
where wi is a normalized weighting function. An example of a normalized weighting function that weights SILs in processing periods near the end of an adjustment more heavily is wi=2i/(m+1). A typical time for calculating a weighted average of SILs might be about a quarter second, or about 11 processing periods in the example given above.
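A weighted average using the example weighting function can be sketched as follows. The sketch assumes that "normalized" means the m weights average to one, so that the weighted average is the sum of the weighted SILs divided by m; this normalization is an interpretation, and the function name is hypothetical.

```python
def weighted_sil(sil_db):
    """Weighted average of m SIL values (dB), later periods weighted more.
    Uses weights 2i/(m+1), i = 1..m, whose mean is one (assumed normalization)."""
    m = len(sil_db)
    weights = [2.0 * i / (m + 1) for i in range(1, m + 1)]
    return sum(w * s for w, s in zip(weights, sil_db)) / m
```

A constant SIL is returned unchanged, while a rising SIL yields a weighted average above the plain mean, reflecting the heavier weighting of the most recent processing periods.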
Any time a weighted average of SILs is below some threshold value SILt, and the manual volume control 6 is adjusted upward, a new value of Amin is calculated by module 18 and sent as an input to the solver 17. (The threshold SILt may be, for example, the lowest weighted average of SILs since start-up that did not prompt a manual volume adjustment during some latency period.) The new value of Amin is the normalized gain control signal established manually by the end of each such adjustment. When these conditions are met for establishing a new Amin, a new R0 is not also calculated. That is, if Amin is changed by a manual volume adjustment, R0 remains unchanged by that adjustment.
Any further manual volume adjustments establish new values of Amin and R0, in accordance with the same algorithms.
The foregoing description and drawings were given for illustrative purposes only, it being understood that the invention is not limited to the embodiments disclosed, but is intended to embrace any and all alternatives, equivalents, modifications and rearrangements of elements falling within the scope of the invention as defined by the following claims.