This application is a National Stage Application of PCT/AU2009/001566, filed 1 Dec. 2009, which claims benefit of Serial No. 2008905703, filed 5 Nov. 2008 in Australia and which applications are incorporated herein by reference. To the extent appropriate, a claim of priority is made to each of the above disclosed applications.
The present invention relates to processing of sound signals and more particularly to bilateral beamformer strategies suitable for binaural assistive listening devices such as hearing aids, earmuffs and cochlear implants.
When at least one microphone signal is available from each side of the head, it is possible to combine the microphone outputs optimally to produce a super-directional response. Most well-known binaural directional processors are based on broadside array configurations, adaptive Least Mean Square (LMS) strategies or more sophisticated Blind Source Separation (BSS) strategies.
Broadside array configurations produce efficient directional responses when the wavelength of the sound is large relative to the spacing between microphones. As a result, broadside array techniques are only effective for the low-frequency component of sounds when used in binaural array configurations.
Unlike broadside array designs, Least Mean Square (LMS) systems efficiently produce directionality independently of frequency or spacing between microphones. In such systems, Voice Activity Detectors (VADs) are needed to capture a desired signal during times when the ratio between signal level and noise level is relatively large. This captured signal, typically referred to as the estimated desired signal, is compared to filtered outputs from the microphones, thus producing an estimated error signal. The objective of the LMS algorithm is to minimize the square of the estimated error signal by iteratively improving the filter weights applied to the microphone output signals. However, the estimated desired signal may not entirely reflect the real desired signal, and therefore the adaptation of the filter weights may not always minimize the true error of the system. The optimization largely depends on the efficiency of the VAD employed. Unfortunately, most VADs work well in relatively high signal-to-noise ratio environments, but their performance degrades significantly as the signal-to-noise ratio decreases.
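The LMS adaptation described above can be sketched as follows. This is a minimal illustration only, not a binaural implementation: the function name `lms_filter`, the tap count, the step size `mu` and the synthetic signals are all assumptions, and the desired signal `d` is simply given rather than captured by a VAD.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Adapt FIR weights so that the filtered input x tracks the desired signal d."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first (zeros before the start).
        frame = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(wi * xi for wi, xi in zip(w, frame))  # filter output
        e = d[n] - y                                  # estimated error signal
        # Step the weights along the negative gradient of the squared error.
        w = [wi + mu * e * xi for wi, xi in zip(w, frame)]
        errors.append(e)
    return w, errors


# Hypothetical test signals: d is x passed through a known two-tap filter.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
weights, errors = lms_filter(x, d)
```

In this noise-free example the weights settle close to the known filter. If `d` were a poor estimate of the true desired signal, the same update would converge to the wrong weights, which is the VAD dependence noted above.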
Blind Source Separation (BSS) schemes operate by efficiently computing a set of phase cancelling filters producing directional responses in all spatial locations where sound sources are present. As a result, the system produces as many outputs as there are sound sources present without specifically targeting a desired sound source. BSS schemes also require post-filtering algorithms in order to select an output with a desired target signal. The problems with BSS approaches are: the excessive computational load required for efficiently computing phase cancelling filters; the dependence of the filters on reverberation and on small movements of the source or listener; the identification of the one output related to the target signal, which in most cases is unknown; and the need to identify in advance the number of sound sources present in the environment to guarantee separation between sound sources.
There remains a need to provide improved or alternative methods and systems for producing directional output signals.
An alternative approach to binaural beamformer designs is to exploit the natural spatial acoustics of the head to directly use interaural time and level differences to produce directional responses. The interaural time difference, arising from the spacing between microphones on each side of the head (ranging from 18 to 28 cm), can be used to cancel relatively low frequency sounds, depending on the direction of arrival, as in a broadside array configuration. On the other hand, head shadowing provides a natural level suppression of contralateral sounds (i.e. sounds arriving from the opposite side of the head), often leading to a much greater signal-to-noise ratio (SNR) in one ear than in the other. As a result, the interaural level difference (ranging from 0 to 18 dB) can be used to cancel high frequency sounds, depending on their direction of arrival, in a weighted sum configuration. This low and high pass binaural beamformer topology is superior to a conventional broadside array alone and to LMS systems relying on VADs, and it is less computationally demanding than most BSS techniques. In addition, due to the novel design, the binaural beamformer operates in complex listening environments, e.g. at low signal-to-noise ratios, and it rejects complex unwanted sounds such as wind noise.
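As a rough orientation only, the frequency at which the acoustic wavelength equals the microphone spacing can be worked out for the two ends of the quoted 18 to 28 cm range. The speed of sound c ≈ 343 m/s (air at about 20 °C) is an assumption not stated in this specification, and the symbol d for the spacing is introduced here for illustration.

```latex
\lambda = \frac{c}{f}
\quad\Longrightarrow\quad
f_{\lambda = d} = \frac{c}{d}

f_{\lambda = d}\Big|_{d = 0.18\,\mathrm{m}} = \frac{343}{0.18} \approx 1906\ \mathrm{Hz},
\qquad
f_{\lambda = d}\Big|_{d = 0.28\,\mathrm{m}} = \frac{343}{0.28} = 1225\ \mathrm{Hz}
```

Under these assumptions, the wavelength exceeds the spacing only below roughly 1.2 to 1.9 kHz, which suggests where the time-difference (broadside) behaviour would be expected to give way to the level-difference (weighted sum) behaviour.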
In a first aspect the present invention provides a method of producing a directional output signal including the steps of: detecting sounds at the left and right sides of a person's head to produce left and right signals; determining the similarity of the signals; modifying the signals based on their similarity; and combining the modified left and right signals to produce an output signal.
The signals may be modified by attenuation and/or by time-shifting.
The attenuation and/or time-shifting may be frequency specific.
The attenuation and/or time-shifting may be carried out by way of a filter block, wherein filter weights for the filter block are based on the similarity of the signals.
The step of determining the similarity of the signals may include the step of comparing their cross-power and auto-power, or comparing their cross-correlation and auto-correlation.
The step of comparing may include the steps of adding the cross-power to the auto-power and dividing the cross-power by the result.
The step of comparing may include the steps of adding the cross-correlation to the auto-correlation and dividing the cross-correlation by the result.
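The comparison steps just described can be written compactly. The symbols S (similarity), C (cross-power or cross-correlation) and A (auto-power or auto-correlation) are labels introduced here for illustration and are not taken from this specification.

```latex
S = \frac{C}{C + A}

C = A \;\Rightarrow\; S = \tfrac{1}{2},
\qquad
C = 0 \;\Rightarrow\; S = 0
```

Read this way, signals that are identical on both sides give a value of one half, and signals with nothing in common give a value of zero.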
The method may further include the step of processing the right or left signals prior to determining their similarity to thereby control the direction of the directional output signal.
The step of processing may include the step of applying a head-related transfer function or an inverse head-related transfer function.
The step of detecting sounds at the left and right sides of the head may be carried out using directional microphones, or directional microphone arrays.
The direction of the left and right directional microphones or microphone arrays may be directed outwardly from the lateral plane of the head.
The degree of modification that takes place during the step of modifying may be smoothed over time.
The step of modifying may further include the step of further enhancing the similarities between the signals.
In a second aspect the present invention provides a system for producing a directional output signal including: detection devices for detecting sounds at the left and right sides of a person's head to produce left and right signals; a determination device determining the similarity of the signals; a modifying device for modifying the signals based on their similarity; and a combining device for combining the modified left and right signals to produce an output signal.
Each detection device may include at least one microphone.
The determination device may include a computing device.
The modifying device may include a filter block.
The combining device may include a summing block.
The system may further include a processing device for processing the left or right signals and wherein the processing device is arranged to apply one or more head-related transfer functions or inverse head-related transfer functions.
The present invention exploits the interaural time and level differences of spatially separated sound sources. The system operates in the low frequencies as an optimal broadside beamformer, a technique well known to those skilled in the art. In the high frequencies the system operates as an optimal weighted sum configuration where the weights are selected based on the relative placement of sounds around the head. In embodiments of the invention the optimum filter weights are computed by examining the ratio of the cross-correlation of microphone output signals from opposite sides of the head to the auto-correlation of microphone output signals from the same side of the head. Thus, at any frequency, when the cross-correlation is equal to the auto-correlations, it is highly likely that sound sources are equally present at both sides of the head, and hence located on or close to the medial plane relative to the listener's head. On the other hand, when either of the auto-correlations is higher than the cross-correlation, it is highly likely that sound sources are located at one side of the head, that is, laterally placed relative to the listener's head. The invention relates to a novel and efficient method of combining these correlation functions to estimate directional filter weights.
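The medial and lateral cases can be illustrated with zero-lag time-domain correlations. This is a simplified sketch under stated assumptions: the helper names, the sinusoidal test source and the 12 dB head-shadow attenuation (chosen from within the 0 to 18 dB range quoted earlier) are hypothetical, and interaural time differences are ignored.

```python
import math


def correlation(a, b):
    """Zero-lag correlation of two equal-length real sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def cross_to_auto(left, right):
    """Ratio of the left/right cross-correlation to the left auto-correlation."""
    return correlation(left, right) / correlation(left, left)


source = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]

# Medial source: reaches both sides of the head equally.
medial_ratio = cross_to_auto(source, source)

# Lateral source on the left: the right side is attenuated by an assumed
# 12 dB of head shadow.
shadow = 10 ** (-12 / 20)
lateral_ratio = cross_to_auto(source, [shadow * s for s in source])
```

Here `medial_ratio` is one, because the cross-correlation equals the auto-correlation, while `lateral_ratio` equals the head-shadow gain of about 0.25, because the left auto-correlation exceeds the cross-correlation.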
The circuit according to the invention is used in an acoustic system with at least one microphone located at each side of the head producing microphone output signals, a signal processing path to produce an output signal, and optional means to present this output signal to the auditory system. Preferably, the signal processing path includes a multichannel processing block to efficiently compute the optimum filter weights at different frequency bands, a summing block to combine the left and right microphone filtered outputs, and a post filtering block to produce an output signal.
The present invention finds application in methods and systems for enhancing the intelligibility of sounds such as those described in International Patent Application No PCT/AU2007/000764 (WO2007/137364), the contents of which are herein incorporated by reference.
An embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
The preferred embodiment of the invention is discussed below with reference to all figures. However, those skilled in the art will appreciate that the detailed description given herein with respect to all figures is for explanatory purposes only, as the invention extends beyond the limited disclosed embodiment.
The binaural beamformer is intended to operate in complex acoustic environments. Referring to
The microphone outputs $x_l$, $x_r$ are transformed into the frequency domain using Fast Fourier Transform (FFT) analysis 103, 104. These signals $X_L$, $X_R$ are then processed through processing devices in the form of steering vector blocks 105, 106 to produce steered signals $\hat{X}_L$, $\hat{X}_R$ as denoted in Eq. 1 and Eq. 2. The steering vector blocks include the inverses of Head-Related Transfer Functions (HRTFs), denoted $H_{dL}^{-1}$, $H_{dR}^{-1}$, corresponding to either synthesized or pre-recorded impulse response measures from an equivalent desired point source location to the microphone input ports preferably located around the head, as further denoted in
$\hat{X}_L(k) = X_L(k) \cdot H_{dL}^{-1}(k)$ Eq. 1
$\hat{X}_R(k) = X_R(k) \cdot H_{dR}^{-1}(k)$ Eq. 2
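The steering step of Eq. 1 and Eq. 2 amounts to a per-bin division by the HRTF for the desired direction. The sketch below assumes hypothetical two-bin spectra and made-up HRTF values purely for illustration; the function name `steer` is not from this specification.

```python
def steer(spectrum, hrtf):
    """Apply the inverse HRTF per frequency bin: X_hat(k) = X(k) * H_d(k)^-1."""
    return [x / h for x, h in zip(spectrum, hrtf)]


# Hypothetical source spectrum and desired-direction HRTFs (two bins each).
source = [1.0 + 0.0j, 0.5 - 0.5j]
h_dl = [0.9 + 0.1j, 0.6 + 0.3j]
h_dr = [0.4 - 0.2j, 0.3 + 0.1j]

# A source at the desired location reaches each microphone through its HRTF.
x_l = [s * h for s, h in zip(source, h_dl)]
x_r = [s * h for s, h in zip(source, h_dr)]

steered_l = steer(x_l, h_dl)
steered_r = steer(x_r, h_dr)
```

For a source at the desired location the two steered spectra coincide, so the later similarity measure treats that direction as if it were straight ahead.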
The steered signals $\hat{X}_L$, $\hat{X}_R$ are combined 107, 108 to compute the optimum set of directional filter weights $W_L$, $W_R$. The computation of the filter weights requires estimates of cross-power (Eq. 3) and auto-power (Eq. 4 and Eq. 5) over time, where the accumulation operation is denoted by $E\{\cdot\}$. Those skilled in the art will appreciate that a ratio of accumulated spectral power estimates is equivalent to the corresponding ratio of time-correlation estimates, so the alternative operations lead to the same outcome.
where the accumulation is performed over N frames, and * denotes complex conjugate.
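One plausible reading of the accumulation over N frames is sketched below. Eq. 3 to Eq. 5 are not reproduced in this text, so the averaging by N, the function name and the data layout (a list of frames, each a list of complex bins) are assumptions.

```python
def accumulate_powers(frames_left, frames_right):
    """Average cross-power and auto-power per frequency bin over N frames."""
    n_frames = len(frames_left)
    n_bins = len(frames_left[0])
    cross = [0j] * n_bins
    auto_left = [0.0] * n_bins
    auto_right = [0.0] * n_bins
    for left, right in zip(frames_left, frames_right):
        for k in range(n_bins):
            # Cross-power: left bin times the complex conjugate of the right bin.
            cross[k] += left[k] * right[k].conjugate() / n_frames
            # Auto-power: squared magnitude of each side's own bin.
            auto_left[k] += abs(left[k]) ** 2 / n_frames
            auto_right[k] += abs(right[k]) ** 2 / n_frames
    return cross, auto_left, auto_right


# Hypothetical steered spectra: two frames of two bins, identical on both sides.
frames = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j]]
cross, auto_left, auto_right = accumulate_powers(frames, frames)
```

With identical left and right frames the cross-power is real and equal to both auto-powers, which corresponds to a source on the steered direction.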
The directional filter weights are produced by calculating the ratio between the cross-power and the auto-power estimates on each side of the head as given by Eq. 6 and Eq. 7,
where the power g is a numerical value typically set to 1, but it can be any value greater or less than one.
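Eq. 6 and Eq. 7 are not reproduced in this text, so the following is only a sketch of one form consistent with the comparison step described earlier (cross-power divided by the sum of cross-power and auto-power), raised to the power g. Using the magnitude of the complex cross-power, and the function name itself, are assumptions.

```python
def directional_weights(cross, auto, g=1.0):
    """Per-bin weight (|cross| / (|cross| + auto)) ** g for one side of the head."""
    weights = []
    for c, a in zip(cross, auto):
        c_mag = abs(c)  # assumption: use the magnitude of the complex cross-power
        denom = c_mag + a
        weights.append((c_mag / denom) ** g if denom > 0 else 0.0)
    return weights


# Bin 0: cross-power equals auto-power (source on the steered direction).
# Bin 1: no cross-power (energy present on one side only).
example = directional_weights([2 + 0j, 0j], [2.0, 3.0])
sharper = directional_weights([2 + 0j, 0j], [2.0, 3.0], g=2.0)
```

In this sketch a larger g lowers every weight below one, so bins that are less similar between the two sides are attenuated more strongly relative to similar ones.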
Those skilled in the art will realise that the value of $\hat{X}_L$ relative to $\hat{X}_R$, and hence the values of $W_L(k)$ and $W_R(k)$, will be unchanged if processing block 105 consists of response $H_{dR}$ instead of $H_{dL}^{-1}$, and processing block 106 consists of response $H_{dL}$ instead of $H_{dR}^{-1}$, because both steered signals are then scaled by the same factor $H_{dL}(k) \cdot H_{dR}(k)$.
A post-filtering stage (not shown) may be provided whereby the filter weights $W_L$, $W_R$ are enhanced according to Eq. 8 to Eq. 10,
where η is a numerical value typically ranging from 1 to 100, q is a numerical value typically ranging from 1 to 10, and κ is a numerical value typically set to 2.0.
The optimum directional filter weights $W_L^{New}$, $W_R^{New}$ are transformed back to the time domain as $w_L$, $w_R$ using Inverse Fast Fourier Transform (IFFT) blocks 109, 110. Preferably, the FFT includes zero padding and cosine time windowing, and the IFFT operation further includes an overlap-and-add operation. Those skilled in the art will appreciate that the FFT and IFFT are just one of many techniques that may be used to perform multi-channel analysis.
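The interaction between cosine windowing and overlap-and-add can be illustrated without the transforms themselves. The sketch below assumes a periodic Hann window with 50% overlap, a frame size of 8 and a constant test signal; none of these choices comes from this specification, and zero padding is omitted.

```python
import math


def periodic_hann(size):
    """Raised-cosine window whose 50%-overlapped copies sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]


def overlap_add(frames, hop):
    """Sum frames into one output, each frame shifted by hop samples."""
    length = hop * (len(frames) - 1) + len(frames[0])
    out = [0.0] * length
    for index, frame in enumerate(frames):
        start = index * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


frame_size, hop = 8, 4
signal = [1.0] * 24
window = periodic_hann(frame_size)
frames = [
    [w * s for w, s in zip(window, signal[start:start + frame_size])]
    for start in range(0, len(signal) - frame_size + 1, hop)
]
rebuilt = overlap_add(frames, hop)
```

Away from the first and last half-frame, `rebuilt` reproduces the input, which is why a cosine window pairs naturally with overlap-and-add in frame-based processing.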
The computed filter weights wL,wR can be updated 111, 112 by smoothing functions as given in Eq. 11 and Eq. 12. In the preferred embodiment the smoothing coefficient α is selected as an exponential averaging factor. Optionally, the smoothing coefficient α may be dynamically selected based on a cost function criterion derived from an estimated SNR or a statistical measure.
$w_L(n) = \alpha \cdot w_L^{old}(n) + (1 - \alpha) \cdot w_L^{new}(n)$ Eq. 11
$w_R(n) = \alpha \cdot w_R^{old}(n) + (1 - \alpha) \cdot w_R^{new}(n)$ Eq. 12
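The smoothing update of Eq. 11 and Eq. 12 is an exponential average applied tap by tap. In the sketch below the function name and the example values, including α = 0.75, are illustrative assumptions.

```python
def smooth_weights(w_old, w_new, alpha=0.9):
    """Exponential averaging of filter taps: alpha * old + (1 - alpha) * new."""
    return [alpha * old + (1.0 - alpha) * new for old, new in zip(w_old, w_new)]


# Hypothetical two-tap filters before and after a new weight estimate.
smoothed = smooth_weights([1.0, 0.0], [0.0, 1.0], alpha=0.75)
```

A value of α close to one keeps the filters stable over time at the cost of slower tracking, while a value close to zero follows each new estimate almost immediately.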
The directional filters are applied 111, 112 directly to the microphone outputs as given in Eq. 13 and Eq. 14. Optionally, the directional filters may be applied to delayed microphone output signals. Optionally, the delay blocks 113, 114 may use zero delay. Optionally, 113 and 114 may use the same delay greater than zero. Optionally, 113 and 114 may have different delays to account for asymmetrical placements of microphones on each side of the head. Optionally, the directional filters may be applied to directional microphone output signals from directional microphone arrays operating at each side of the head. Optionally, the directional filters may be applied to delayed directional microphone output signals from directional microphone arrays operating at each side of the head.
$y_L(n) = x_L(n - p_L)\, w_L(n)$ Eq. 13
$y_R(n) = x_R(n - p_R)\, w_R(n)$ Eq. 14
where $p_L$ and $p_R$ are introduced delays, typically set to 0.
The filtered outputs are combined 115 to produce a binaural directional response as given in Eq. 15.
$z(n) = y_R(n) + y_L(n)$ Eq. 15
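The delay, filter and sum chain of Eq. 13 to Eq. 15 can be sketched as below. Applying the time-domain filters by FIR convolution is an assumption, since the operator between the delayed signal and the filter is not explicit in this text; the function names and sample values are likewise illustrative.

```python
def delay(signal, p):
    """Delay a sequence by p samples, padding the start with zeros."""
    return [0.0] * p + list(signal[:len(signal) - p])


def fir_filter(signal, taps):
    """Causal FIR filtering (convolution truncated to the input length)."""
    return [
        sum(taps[i] * signal[n - i] for i in range(len(taps)) if n - i >= 0)
        for n in range(len(signal))
    ]


def beamform(x_l, x_r, w_l, w_r, p_l=0, p_r=0):
    """Filter each delayed microphone signal, then sum the two sides."""
    y_l = fir_filter(delay(x_l, p_l), w_l)
    y_r = fir_filter(delay(x_r, p_r), w_r)
    return [right + left for right, left in zip(y_r, y_l)]


# Identical signals on both sides with single-tap weights of one half each.
z = beamform([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.5], [0.5])
# The same inputs with a one-sample delay on the left side only.
z_delayed = beamform([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.5], [0.5], p_l=1)
```

With equal half weights and no delay the output reproduces the common input; unequal delays would be one way to compensate for asymmetrical microphone placement.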
As explained above, embodiments of the invention produce a single channel output signal that is focused in a desired direction. This single channel signal includes sounds detected at both the left and right microphones. At the time of reproducing the signal for presentation to the auditory system of a user, the directional signal is used to prepare left and right channels, with localisation cues being inserted according to head-related transfer functions to enable a user to perceive an apparent direction of the sound.
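A heavily simplified illustration of reinserting localisation cues is given below. A real head-related transfer function is frequency dependent; here it is reduced to a single interaural delay and level difference, and the function name, parameter names and values are all hypothetical.

```python
def spatialise(mono, itd_samples, ild_db):
    """Build near-ear and far-ear channels from a single directional signal."""
    far_gain = 10 ** (-ild_db / 20)  # level difference as a linear gain
    near = list(mono)
    # The far ear hears the signal later and quieter.
    far = [0.0] * itd_samples + [far_gain * s for s in mono[:len(mono) - itd_samples]]
    return near, far


# Hypothetical cue values: one sample of delay and 20 dB of attenuation.
near_ear, far_ear = spatialise([1.0, 1.0, 1.0, 1.0], itd_samples=1, ild_db=20)
```

Assigning `near_ear` to the left or right channel places the apparent source on that side; a fuller treatment would filter each channel with measured or synthesized head-related responses.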
Since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to what is illustrated and described. Hence, suitable modifications and equivalents may be resorted to as falling within the scope of the invention.
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.
Finally, it is to be appreciated that various alterations or additions may be made to the parts previously described without departing from the spirit or ambit of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2008905703 | Nov 2008 | AU | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/AU2009/001566 | 12/1/2009 | WO | 00 | 8/2/2011 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2010/051606 | 5/14/2010 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
4024344 | Dolby et al. | May 1977 | A |
5434924 | Jampolsky | Jul 1995 | A |
6222927 | Feng et al. | Apr 2001 | B1 |
20040057591 | Beck et al. | Mar 2004 | A1 |
20050069162 | Haykin et al. | Mar 2005 | A1 |
20050271215 | Kulkarni | Dec 2005 | A1 |
Number | Date | Country |
---|---|---|
2002 0078100 | Mar 2002 | JP |
WO 2007028250 | Mar 2007 | WO |
WO 2007137364 | Dec 2007 | WO |
Entry |
---|
Supplementary European Search Report for EP 09 82 4292, dated Dec. 3, 2012. (2 pages). |
Number | Date | Country | |
---|---|---|---|
20110293108 A1 | Dec 2011 | US |