1. Field of the Invention
The present invention relates to sound signal enhancement.
2. State of the Art
For hearing aid wearers, hearing speech clearly is very difficult, especially in noisy locations. Discrimination of the speech signal is impaired because directional cues are not well received or processed by the hearing impaired, and the normal directional cues are poorly preserved by standard hearing aid microphone technologies. For this reason, electronic directionality has been shown to be very beneficial, and directional microphones are becoming common in hearing aids. However, there are limitations to the amount of directionality achievable in microphones alone. Therefore, further benefits are being sought by the use of beamforming techniques, utilizing the multiple microphone signals available, for example, from a binaural pair of hearing aids.
Beamforming is a method whereby a narrow (or at least narrower) polar directional pattern can be developed by combining multiple signals from spatially separated sensors to create a monaural, or simple, output signal representing the signal from the narrower beam. Another name for this general category of processing is “array processing,” used, for example, in broadside antenna array systems, underwater sonar systems and medical ultrasound imaging systems. Signal processing usually includes the steps of adjusting the phase (or delay) of the individual input signals and then adding (or summing) them together. Sometimes predetermined, fixed amplitude weightings are applied to the individual signals prior to summation, for example to reduce sidelobe amplitudes.
With two sensors, it is possible to create a direction of maximum sensitivity and a null, or direction of minimum sensitivity.
One known beamforming algorithm is described in U.S. Pat. No. 4,956,867, incorporated herein by reference. This algorithm operates to direct a null at the strongest noise source. Since it is assumed that the desired talker signal is from straight ahead, a small region of angles around zero degrees is excluded so that the null is never steered to straight ahead, where it would remove the desired signal. Because the algorithm is adaptive, time is required to find and null out the interfering signal. The algorithm works best when there is a single strong interferer with little reverberation. (Reverberant signals operate to create what appears to be additional interfering signals with many different angles of arrival and times of arrival—i.e., a reverberant signal acts like many simultaneous interferers.) Also, the algorithm works best when an interfering signal is long-lasting—it does not work well for transient interference.
The prior-art beamforming method suffers from serious drawbacks. First, adaptation takes too long: the algorithm needs too much time to acquire the interfering signal and null it out. Long adaptation time creates a problem with wearer head movements (which change the angle of arrival of the interfering signal) and with transient interfering signals. Second, it does not beneficially reduce the noise in real-life situations with numerous interfering signals and/or moderate-to-high reverberation.
A simpler beamforming approach is known from classical beamforming. With only two signals (e.g., in the case of binaural hearing health care, one from the microphone at each ear) classical beamforming simply sums the two signals together. Since it is assumed that the target speech is from straight ahead (i.e., that the hearing aid wearer is looking at the talker), the speech signal in the binaural pair of raw signals is highly correlated, and therefore the sum increases the level of this signal, while the noise sources, assumed to be off-axis, create highly uncorrelated noise signals at each ear. Therefore, there is an enhancement of the desired speech signal over that of the noise signal in the beamformer output. This enhancement is analogous to the increased sensitivity of a broadside array to signals coming from in front as compared to those coming from the side.
This classical beamforming approach still does not optimize the signal-to-noise (voice-to-background) ratio, however, producing only a maximum 3 dB improvement. It is also fixed, and therefore cannot adjust to varying noise conditions.
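The 3 dB ceiling can be checked with a short calculation. The sketch below assumes an idealized model that the text implies but does not state as a formula: the voice is identical (fully correlated) in both signals, and the noise in the two signals is fully uncorrelated, so voice amplitudes add while noise powers add. The function name and parameters are illustrative only.

```python
import math

def summed_snr_gain_db(noise_left: float, noise_right: float) -> float:
    """SNR gain (dB) of a fixed 50/50 sum over the left signal alone.

    Assumes unit voice power in each ear, fully correlated voice,
    and fully uncorrelated noise with the given powers.
    """
    w = 0.5
    voice_power_out = (w + w) ** 2                              # correlated: amplitudes add
    noise_power_out = w**2 * noise_left + w**2 * noise_right    # uncorrelated: powers add
    snr_out = voice_power_out / noise_power_out
    snr_left = 1.0 / noise_left
    return 10.0 * math.log10(snr_out / snr_left)

equal_noise_gain = summed_snr_gain_db(1.0, 1.0)      # same noise power in both ears
unequal_noise_gain = summed_snr_gain_db(1.0, 10.0)   # other ear is 10 dB noisier
```

Under these assumptions the gain with equal noise powers is about 3 dB, and when the other ear is 10 dB noisier the fixed sum comes out worse than the quieter ear alone, which illustrates why a fixed mix cannot adjust to varying noise conditions.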
The present invention, generally speaking, picks up a voice or other sound signal of interest and creates a higher voice-to-background-noise ratio in the output signal so that a user enjoys higher intelligibility of the voice signal. In particular, beamforming techniques are used to provide optimized signals to the user for further increasing the understanding of speech in noisy environments and for reducing user listening fatigue. In one embodiment, signal-to-noise performance is optimized even if some of the binaural cues are sacrificed. In this embodiment, an optimum mix ratio or weighting ratio is determined in accordance with the ratio of noise power in the binaural signals. Enhancement circuitry is easily implemented in either analog or digital form and is compatible with existing sound processing methods, e.g., noise reduction algorithms and compression/expansion processing. The sound enhancement approach is compatible with, and additive to, any microphone directionality or noise cancelling technology.
The present invention may be further understood from the following description in conjunction with the appended drawing. In the drawing:
Underlying the present invention is the recognition that, for any ratio of noise power in the binaural signals, for example, there is an optimum mix ratio or weighting ratio that optimizes the SNR of the output signal. For example, if the noise power is equal in each signal, such as in a crowded restaurant with people all around, moving chairs, clattering plates, etc., then the optimum weighting is 50%/50%. In other environments, the noise power in the two signals will be quite unequal, e.g., on the side of a road. If there is more noise in one signal by, for example 10 dB, the optimum mix is not 50/50, but moves toward including a greater amount of the quieter signal. In the case of a 10 dB noise differential, the optimum noise mix is 92% quieter signal and 8% noisier signal. Such a result is counterintuitive, where intuition would suggest simply using the quieter signal. Simply using the quieter signal would be optimal only if the noise and voice both had the same amount of correlation. However, in nearly all real-world situations, the voice signals are highly correlated, while the noise signals are not. This disparity biases the optimum point.
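The existence of an optimum mix can be illustrated numerically. The sketch below assumes an idealized case that the text does not spell out: the voice is fully correlated between the two signals, the noise is fully uncorrelated, and the two weights sum to one so that the voice level is unchanged by the mix. Under those assumptions the output noise power is minimized when each signal is weighted by the other signal's share of the total noise power. The function names are illustrative.

```python
import math
from typing import Tuple

def optimum_weights(noise_left: float, noise_right: float) -> Tuple[float, float]:
    """Weights (left, right), summing to 1, that minimize output noise power."""
    total = noise_left + noise_right
    return noise_right / total, noise_left / total

def output_noise_db(w_left: float, noise_left: float, noise_right: float) -> float:
    """Output noise power in dB for weights (w_left, 1 - w_left)."""
    w_right = 1.0 - w_left
    return 10.0 * math.log10(w_left**2 * noise_left + w_right**2 * noise_right)

# Left signal carries 10 dB more noise power than the right signal.
n_left, n_right = 10.0, 1.0
w_left, w_right = optimum_weights(n_left, n_right)
noise_optimum = output_noise_db(w_left, n_left, n_right)
noise_quieter_only = output_noise_db(0.0, n_left, n_right)   # right signal alone
noise_equal_mix = output_noise_db(0.5, n_left, n_right)      # classical 50/50
```

In this idealized model the optimum places roughly 91% of the weight on the quieter signal, in the neighborhood of the 92%/8% split quoted above, and its output noise is lower than that of either the quieter signal alone or the 50/50 mix. The exact split in practice depends on how correlated the voice and noise actually are.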
Referring now to
1−WR=WL
Corresponding control signals are applied to the respective attenuators to cause the input signals to be attenuated in proportion to the input signal's weighting ratio. For example, for a 60/40 weight, the left input signal is attenuated to 60% of its input value while the right input signal is attenuated to 40% of its input value. Attenuated versions of the input signals, attenuated by the optimum amount, are then applied to a summing block, which sums the attenuated signals to produce an output signal that is then applied to both ears.
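The attenuate-and-sum step can be sketched in a few lines. This is a minimal illustration operating on lists of samples; the function name and the sample values are hypothetical, and the weights are taken as given rather than derived from a noise measurement.

```python
from typing import List, Sequence

def mix_to_both_ears(left: Sequence[float], right: Sequence[float], w_left: float) -> List[float]:
    """Attenuate each input by its weight and sum; the result feeds both ears."""
    w_right = 1.0 - w_left   # from the relation 1 - WR = WL
    return [w_left * l + w_right * r for l, r in zip(left, right)]

# 60/40 weighting: left input scaled to 60%, right input scaled to 40%.
output = mix_to_both_ears([1.0, 0.5, -0.25], [1.0, 0.0, 0.25], w_left=0.6)
```

Because the two weights sum to one, a sample that is identical in both inputs (the first sample here) passes through at its original level.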
Noise measurement may be performed as described in U.S. application Ser. No. 09/247,621 filed Feb. 10, 1999, incorporated herein by reference. Generally speaking, a noise measurement is obtained by squaring the instantaneous signal and filtering the result using a low-pass filter or valley detector (opposite of peak detector).
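A rough sketch of such a measurement follows. It does not reproduce the method of the referenced application; it only illustrates the general idea of squaring the signal, smoothing with a one-pole low-pass filter, and tracking the valleys of the result. The smoothing constants `alpha` and `rise` and the test signal are hypothetical values chosen for illustration.

```python
from typing import Iterable, List

def smoothed_power(samples: Iterable[float], alpha: float = 0.01) -> List[float]:
    """Square each sample and smooth with a one-pole low-pass filter."""
    estimate = 0.0
    out = []
    for x in samples:
        estimate += alpha * (x * x - estimate)
        out.append(estimate)
    return out

def valley_detector(power: Iterable[float], rise: float = 0.001) -> List[float]:
    """Track the minima of a power envelope: drop at once, rise slowly."""
    valley = None
    out = []
    for p in power:
        if valley is None or p < valley:
            valley = p                       # follow downward moves immediately
        else:
            valley += rise * (p - valley)    # creep upward slowly
        out.append(valley)
    return out

# Steady background at amplitude 0.1 with a short loud burst in the middle.
signal = [0.1] * 2000 + [1.0] * 50 + [0.1] * 2000
power = smoothed_power(signal)
floor = valley_detector(power)
```

In this example the smoothed power jumps during the burst, while the valley-detector output stays near the background level and settles back to it afterward, which is the behavior wanted from a noise-floor estimate.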
One suitable control function for the power ratio block is shown in
The resulting SNR improvement over classical 50/50 beamforming achieved using the foregoing control strategy is shown in
Assuming that the signal of interest to the listener is straight ahead, then the signal of interest will be equal in both ears. Signals from other directions, which because of head shadowing are not equal in both ears, may therefore be considered to be noise. If a signal is equal in both ears, then beamforming has no effect on it. Therefore, although noise power detectors may be used as shown in
As a further improvement, the foregoing approach to beamforming is not limited to simultaneous operation on the signals over their entire bandwidths. Rather, the same approach can be implemented on a frequency-band-by-frequency-band basis. Existing sound processing algorithms of the assignee divide the audio frequency bandwidth into multiple separate, narrower bands. By applying the current method separately to each band, the optimum SNR can be achieved on a band-by-band basis to further optimize the voice-to-noise ratio in the overall output.
Referring more particularly to
The multiband beamformer has the advantage of optimally reducing background noises from multiple sources with different spectral characteristics, for example a fan with mostly low-frequency rumble on one side and a squeaky toy on the other. As long as the interferers occupy different frequency bands, this multiband approach improves upon the single band method discussed above.
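The fan-and-toy scenario can be sketched as a per-band weight calculation. The weighting rule used here (each side weighted by the other side's share of the band's noise power) is an assumption that mirrors the cross-over power ratio described for the circuit later in this description; the four-band layout and the noise powers are hypothetical.

```python
from typing import List, Sequence, Tuple

def per_band_weights(noise_left: Sequence[float], noise_right: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (w_left, w_right) for each band from per-band noise powers."""
    weights = []
    for n_l, n_r in zip(noise_left, noise_right):
        total = n_l + n_r
        if total == 0.0:
            weights.append((0.5, 0.5))                   # no noise measured: equal mix
        else:
            weights.append((n_r / total, n_l / total))   # favor the quieter side
    return weights

# Hypothetical noise powers, lowest band first:
# low-frequency fan on the left, high-frequency squeak on the right.
fan_left = [8.0, 4.0, 1.0, 1.0]
toy_right = [1.0, 1.0, 4.0, 8.0]
bands = per_band_weights(fan_left, toy_right)
```

Here the low bands lean toward the right signal and the high bands lean toward the left signal, something a single full-bandwidth weighting could not do.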
As a further enhancement, some binaural cues can be left in the final output by biasing the weightings slightly away from the optimum mix. For example, the right ear output signal might be weighted N % (say, 5–10%) away from the optimum toward the right ear signal, and the left ear output signal might be weighted N % away from the optimum toward the left ear signal. To take a concrete example, if the optimum mix was 60% left and 40% right, then the right ear would get 55% L+45% R and the left ear would get 65% L+35% R (with N=5%). This arrangement helps to make a more comfortable sound and “externalizes the image,” i.e., causes the user to perceive an external aural environment containing discernible sound sources. Furthermore, this arrangement entails some but very little compromise of SNR. Referring again to
More generally, N may be regarded as a “binaurality coefficient” that controls the amount of binaural information retained in the output. Such a binaurality coefficient may be used to vary the beamformer smoothly between full binaural (N=100%; no beamforming) and full beamforming (N=0%; no binaural). This binaurality parameter can be tailored for the individual. As this parameter is varied, there is little loss of directionality until after the binaural cues are significantly restored, so the directionality and noise reduction benefits of the beamformer's signal processing can still be realized even with a usable level of binaural cue retention.
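The biasing of each ear's mix can be sketched as follows, using the additive form of the 60/40 example with N=5%. Clamping the weights to the range 0 to 1 is an assumption made here so that N=100% yields each ear's own signal unchanged; the text does not specify how the bias saturates. The function name is illustrative.

```python
from typing import Tuple

def per_ear_weights(w_left_opt: float, n: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Bias the optimum mix toward each ear's own signal by n (0.0 to 1.0).

    Returns ((left_ear_wL, left_ear_wR), (right_ear_wL, right_ear_wR)).
    """
    left_ear_wl = min(1.0, w_left_opt + n)    # left ear hears more of the left input
    right_ear_wl = max(0.0, w_left_opt - n)   # right ear hears more of the right input
    return (left_ear_wl, 1.0 - left_ear_wl), (right_ear_wl, 1.0 - right_ear_wl)

left_ear, right_ear = per_ear_weights(0.60, 0.05)      # the 60/40 example with N = 5%
full_binaural = per_ear_weights(0.60, 1.0)             # N = 100%
full_beamforming = per_ear_weights(0.60, 0.0)          # N = 0%
```

With the example values, the left ear receives about 65% L + 35% R and the right ear about 55% L + 45% R. At N=0% both ears receive the same optimum mix, and at N=100% each ear receives only its own input.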
Furthermore, human binaural processing tends to be lost in proportion to hearing deficit. So those individuals most needing the benefits that can be provided by the beamforming algorithm tend to be those who have already lost the ability to beneficially utilize their natural binaural processing for extracting a voice from noise or babble. Thus, the algorithm can provide the greatest directionality benefit for those needing it the most, but can be adjusted, although with a loss of directionality, for those with better binaural processing who need it less.
The foregoing approach to beamforming is simple and therefore easy to implement. Whereas the adaptive method can take seconds to adapt, the present method can react nearly instantaneously to changes in noise or other varying environmental conditions such as the user's head position, since there is no adaptation requirement. The present method, thus, can remove impulse noise such as the sound of a fork dropped on a plate at a restaurant or the sound of a car door being closed. Furthermore, noise power detectors are already provided in some binaural hearing aid sets for use in noise-reduction algorithms. The simple addition of two multipliers (attenuators) and an additional processing step enables dramatically improved results to be achieved. An important observation is that the improvement in voice-to-background noise that the invention provides is in addition to that of the noise-reduction created by pre-existing noise-reduction algorithms—further improving the SNR.
Moreover, the foregoing methods all lend themselves to easy implementation in digital form, especially using a digital signal processor (DSP). In a DSP implementation, all of the blocks are realized in the form of DSP code. Most of the required software functions are simply multiplications (e.g., attenuators) or additions (summing blocks). To do frequency band implementations, FFT methods may be employed. Outputs from FFT processes are easily analyzed as power spectra for implementing the noise power detectors. One such implementation divides the sound spectrum into 64 FFT bins and processes all 64 bins simultaneously every 3.5 ms. Thus, the beamformer is able to adjust for various noise conditions in 64 separate frequency bands at approximately 300 times each second.
Referring to
The circuit 909 calculates attenuation ratios for the left and right ears by forming the sum S of the squares of the signals and by forming 1) the ratio L/S of the square of the left ear signal to the sum; and 2) the ratio R/S of the square of the right ear signal to the sum. The operations for forming these ratios are represented as an addition (931) and two divisions (933, 935). The resulting attenuation factors are coupled in cross-over fashion to the multipliers; that is, the signal L/S is used to control the multiplier for the right ear, and the signal R/S is used to control the multiplier for the left ear. Hence, as a noise source increases the signal level in one ear, the signal of the other ear is emphasized and the signal of the ear most influenced by the noise source is de-emphasized.
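The ratio calculation and cross-over coupling can be sketched as below. This is a minimal illustration on single values; in practice the squared signals would presumably be smoothed noise-power estimates rather than raw samples. The function name and the equal-mix fallback for silent input are assumptions not taken from the text.

```python
from typing import Tuple

def crossover_factors(left_value: float, right_value: float) -> Tuple[float, float]:
    """Return (left_multiplier, right_multiplier) from the squared inputs."""
    l_sq = left_value * left_value
    r_sq = right_value * right_value
    s = l_sq + r_sq                   # the addition (931)
    if s == 0.0:
        return 0.5, 0.5               # assumed fallback when both inputs are silent
    ratio_l = l_sq / s                # the two divisions (933, 935)
    ratio_r = r_sq / s
    return ratio_r, ratio_l           # cross-over: R/S drives left, L/S drives right

# Left input three times the amplitude of the right input.
left_mult, right_mult = crossover_factors(3.0, 1.0)
```

The louder left input ends up with the smaller multiplier, so the ear most influenced by the noise source is de-emphasized. The two multipliers always sum to one.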
The circuitry may be simplified to conserve compute power by, instead of performing two divisions, performing a single division and a subtraction as illustrated in
An embodiment of a corresponding binaural DSP-based beamformer is shown in
The remainder of the arrangement of
To take a particular example of the operation of the arrangement of
Now assume a noisy situation in which the ratio L/S is 0.6. To obtain the signal at node Y, L/S is decreased by 10% to 0.54. At the same time, to obtain the signal at node Z, L/S is increased by 10% to 0.66. In the output processing stage, to form the left output signal, the left input signal is multiplied by a factor 1−0.54=0.46, and the right input signal is multiplied by 0.54. To form the right output signal, the left input signal is multiplied by a factor 1−0.66=0.34, and the right input signal is multiplied by 0.66. In both output signals, the right (quieter) input signal is weighted more heavily, but in the left output signal, the left input signal is weighted more heavily than it would otherwise be, and in the right output signal, the right input signal is weighted more heavily than it would otherwise be for optimum noise reduction.
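The arithmetic of this example can be reproduced with a short sketch. It reads the adjustment as a multiplicative change of L/S by the stated percentage, which is how the numbers in the example work out; the function name is illustrative.

```python
from typing import Tuple

def biased_output_weights(ratio_l: float, bias: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return weights on (left input, right input) for the left and right outputs."""
    y = ratio_l * (1.0 - bias)    # node Y: L/S decreased by the bias
    z = ratio_l * (1.0 + bias)    # node Z: L/S increased by the bias
    left_out = (1.0 - y, y)       # left output: 1 - Y on left input, Y on right input
    right_out = (1.0 - z, z)      # right output: 1 - Z on left input, Z on right input
    return left_out, right_out

left_out, right_out = biased_output_weights(0.6, 0.10)
```

For L/S = 0.6 and a 10% bias, the left output weights come to about 0.46 and 0.54, and the right output weights to about 0.34 and 0.66, so each pair sums to one.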
In accordance with a further aspect of the invention, beamforming can be performed selectively within one or more frequency ranges. In particular, since most binaural directionality cues are carried by the lower frequencies (typically below 1000 Hz), an enhancement to the beamformer would be to pass the frequencies below, say, 1000 Hz directly to their respective ears, while beamforming only those frequency bins above that frequency in order to achieve better SNR in the higher frequency band where directionality cues are not needed.
In one implementation, the beamforming algorithm is simply applied only to the higher frequencies as stated.
In another implementation, a look-up table is provided having a series of “binaurality” coefficients, one for each frequency bin, to control the amount of binaural cues retained at each frequency. The use of such a “binaurality coefficient” to vary the beamformer smoothly between full binaural (no beamforming) and full beamforming (no binaural) has been previously described. By extending this concept to provide for per-bin binaurality coefficients, the coefficients for each low frequency bin may be biased far toward, or even at, full binaural processing, while the coefficients for each high frequency bin may be biased toward, or completely at, full beamforming, thus achieving the desired action. Although the coefficients could change abruptly at some frequency, such as 1000 Hz, the transition more preferably occurs gradually over, say, 800 Hz to 1200 Hz, where the coefficients “fade” smoothly from full binaural to full beamforming.
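Such a look-up table might be built as sketched below. The linear shape of the fade, the bin width of 125 Hz, and the evaluation at bin center frequencies are all assumptions chosen for illustration; the text specifies only that the transition is gradual over roughly 800 Hz to 1200 Hz.

```python
from typing import List

def binaurality_coefficient(freq_hz: float, fade_start_hz: float = 800.0,
                            fade_end_hz: float = 1200.0) -> float:
    """1.0 = full binaural, 0.0 = full beamforming, linear fade in between."""
    if freq_hz <= fade_start_hz:
        return 1.0
    if freq_hz >= fade_end_hz:
        return 0.0
    return (fade_end_hz - freq_hz) / (fade_end_hz - fade_start_hz)

def build_table(num_bins: int, bin_width_hz: float) -> List[float]:
    """One coefficient per bin, evaluated at each bin's center frequency."""
    return [binaurality_coefficient((k + 0.5) * bin_width_hz) for k in range(num_bins)]

# Hypothetical layout: 64 bins of 125 Hz each, covering 0 to 8 kHz.
table = build_table(64, 125.0)
```

With this layout the lowest bins sit at full binaural, the bins whose centers fall between 800 Hz and 1200 Hz take intermediate values, and all higher bins sit at full beamforming.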
Note that other beamforming methods, although inferior to those disclosed, may also be used to enhance sound signals. In addition, a beamformer as described herein can be used in products other than hearing aids, i.e., anywhere that a more “focused” sound pickup is desired.
The foregoing beamforming methods demonstrate very high directionality, and enable the user of a binaural hearing aid product to be provided with a “super directionality” mode of operation for those noisy situations where conversation is otherwise extremely difficult. Second-order microphone technology may be used to further enhance directionality.
The described beamformer was modeled in the dSpace/MatLab environment, and the MLSSA method of directionality measurement was implemented in the same environment. The MLSSA method, which uses signal autocorrelation, is quite immune to ambient noises and gives very clean results. Only data for the usual 500, 1000, 2000 and 4000 Hz frequencies was recorded. Two BZ5 first-order directional microphones were placed in-situ on a KEMAR mannequin, and the 0° axis was taken to be a line straight in front of the mannequin as is standard practice. Measurements were taken at 3.75° increments between ±30° and at 15° increments elsewhere. Care was taken to ensure that the system was working well above the noise floor and below saturation or clipping.
As compared to DI values for a single microphone, shown in
Directionality can be improved further still using second-order microphones. Since the second-order microphones have superior directionality, as compared to first-order designs, especially with respect to their front-to-back ratio, this property of the second-order microphone complements the beamformer's processing algorithm, which is limited to side-to-side enhancement. Thus, the combined result is a very narrow, forward-only beam pattern as shown in
Unlike prior art beamformers, the present beamforming technique is based upon Head Related Transfer Functions (HRTFs) documented in the paper by E.A.G. Shaw. HRTFs describe the effects of the head upon signal reception at the ears, and include what is called “head shadowing.” In particular, the present method uses the head shadowing effect to optimize SNR.
Furthermore, whereas prior art beamforming systems usually include delay or phase shift of signals in addition to amplitude-based operations, the foregoing embodiments of the present beamformer do not. Only amplitudes are adjusted or modified—thereby making the present beamformer simpler and less costly to implement.
In other embodiments, however, phase adjustment may be used to provide a more natural sound quality and in fact to further improve the directionality of the beamformer. Note that in the pattern of
As previously described, the basic beamformer algorithm has the attribute of matching (in amplitude) the contribution from each ear's signal to the output. Accordingly, a phase shift of an odd multiple of 180 degrees will create a deep null, i.e., nearly perfect cancellation, and a phase shift of a multiple of 360 degrees will create a +6 dB peak. This is one reason why the beamformer polar pattern shows such distinct peaks and nulls. If the amplitudes were not well matched, the peaks and nulls would be much less distinct, although there would still be as many, at the same angular locations.
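The effect of phase shift and amplitude matching on the summed level can be checked numerically. The sketch below sums two sinusoids represented as phasors and reports the level relative to a single unit-amplitude signal; the function name and the 2:1 mismatch used for comparison are illustrative choices.

```python
import cmath
import math

def summed_gain_db(phase_deg: float, amp_a: float = 1.0, amp_b: float = 1.0) -> float:
    """Level of a + b*exp(j*phase) in dB relative to a unit-amplitude signal."""
    total = amp_a + amp_b * cmath.exp(1j * math.radians(phase_deg))
    magnitude = max(abs(total), 1e-12)   # floor avoids log10(0) at a perfect null
    return 20.0 * math.log10(magnitude)

matched_peak = summed_gain_db(360.0)                  # equal amplitudes, in phase
matched_null = summed_gain_db(180.0)                  # equal amplitudes, out of phase
mismatched_peak = summed_gain_db(360.0, 1.0, 0.5)     # 2:1 amplitude mismatch
mismatched_null = summed_gain_db(180.0, 1.0, 0.5)
```

With matched amplitudes the peak is about +6 dB and the null is extremely deep. With a 2:1 mismatch the peak drops to roughly +3.5 dB and the null is only about 6 dB down, at the same phase shifts.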
Due to the relatively large spacing between the two ear microphones (sensors), a large path length difference for the two signals exists. In turn, this creates a large phase shift for relatively small off-axis (azimuthal) angles, and thus, enough phase shift to reach 180, 360, 540, 720, etc. electrical degrees for arrival angles between 0 and 90 azimuthal degrees, especially at the higher frequencies. This is the second reason that the beamformer pattern shows numerous peaks and nulls. A closer spacing (a pin head, for example) would move the peaks and nulls azimuthally toward 90 degrees, so that fewer would show up. If the spacing were small enough, no peaks or nulls would show up at all, except at very high frequencies.
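The dependence of the null positions on spacing and frequency can be sketched with a simple free-field model of two sensors and a plane wave. This model ignores diffraction around the head, so it is only a rough guide; the 0.18 m ear spacing, the 0.01 m close spacing, and the speed of sound are assumed values.

```python
import math
from typing import List

SPEED_OF_SOUND_M_S = 343.0   # assumed, air at about 20 degrees C

def phase_shift_deg(freq_hz: float, spacing_m: float, azimuth_deg: float) -> float:
    """Inter-sensor phase shift for a plane wave, free-field model."""
    path_diff_m = spacing_m * math.sin(math.radians(azimuth_deg))
    return 360.0 * freq_hz * path_diff_m / SPEED_OF_SOUND_M_S

def null_angles_deg(freq_hz: float, spacing_m: float) -> List[float]:
    """Azimuths in 0..90 degrees where the phase shift is an odd multiple of 180."""
    angles = []
    k = 0
    while True:
        sin_theta = (2 * k + 1) * SPEED_OF_SOUND_M_S / (2.0 * freq_hz * spacing_m)
        if sin_theta > 1.0:
            return angles
        angles.append(math.degrees(math.asin(sin_theta)))
        k += 1

head_nulls_4k = null_angles_deg(4000.0, 0.18)   # hypothetical ear-to-ear spacing
head_nulls_1k = null_angles_deg(1000.0, 0.18)
close_nulls_4k = null_angles_deg(4000.0, 0.01)  # much closer spacing
```

In this model the wide spacing produces more nulls at 4 kHz than at 1 kHz, and the close spacing produces none at 4 kHz, consistent with the trend described above.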
The most desirable response pattern in
Referring to
Since IAD already forms the basis of the beamformer as previously described, it is desirable to, for each frequency, obtain a phase correction factor in terms of IAD (measured in dB) to be applied to the signal at that frequency to bring that signal substantially into phase with the 1 kHz signal. These correction factors may be obtained in the manner shown in
Using the correction values of
Referring to
The expected results of phase correction are shown in
Although the present invention has been described primarily in a hearing health care context, the principles of the invention can be applied in any situation in which an obstacle to energy propagation is present between sensors or is provided to create a shadowing effect like the head shadowing effect in hearing health care applications. The energy may be acoustic, electromagnetic, or even optical. The invention should therefore be understood to be applicable to sonar applications, medical imaging applications, etc.
It will be appreciated by those of ordinary skill in the art that the invention can be embodied in other specific forms without departing from the spirit or essential character thereof. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the invention is indicated by the appended claims rather than the foregoing description, and all changes which come within the meaning and range of equivalents thereof are intended to be embraced therein.
Number | Name | Date | Kind |
---|---|---|---|
3992584 | Dugan | Nov 1976 | A |
4956867 | Zurek et al. | Sep 1990 | A |
5228093 | Agnello | Jul 1993 | A |
5414776 | Sims, Jr. | May 1995 | A |
5764778 | Zurek | Jun 1998 | A |
6240192 | Brennan et al. | May 2001 | B1 |
6697494 | Klootsema et al. | Feb 2004 | B1 |