The present invention relates generally to hearing aids, and particularly to devices and methods for improving directional hearing.
Speech understanding in noisy environments is a significant problem for the hearing-impaired. Hearing impairment is usually accompanied by a reduced time resolution of the sensorial system in addition to a gain loss. These characteristics further reduce the ability of the hearing-impaired to filter the target source from the background noise and particularly to understand speech in noisy environments.
Some newer hearing aids offer a directional hearing mode to improve speech intelligibility in noisy environments. This mode makes use of an array of microphones and applies beamforming technology to combine multiple microphone inputs into a single, directional audio output channel. The output channel has spatial characteristics that increase the contribution of acoustic waves arriving from the target direction relative to those of the acoustic waves from other directions.
For example, PCT International Publication WO 2017/158507, whose disclosure is incorporated herein by reference, describes hearing aid apparatus, including a case, which is configured to be physically fixed to a mobile telephone. An array of microphones are spaced apart within the case and are configured to produce electrical signals in response to acoustical inputs to the microphones. An interface is fixed within the case, along with processing circuitry, which is coupled to receive and process the electrical signals from the microphones so as to generate a combined signal for output via the interface.
As another example, PCT International Publication WO 2021/074818, whose disclosure is incorporated herein by reference, describes apparatus for hearing assistance, which includes a spectacle frame, including a front piece and temples, with one or more microphones mounted at respective first locations on the front piece and configured to output electrical signals in response to first acoustic waves that are incident on the microphones. A speaker mounted at a second location on one of the temples outputs second acoustic waves. Processing circuitry generates a drive signal for the speaker by processing the electrical signals output by the microphones so as to cause the speaker to reproduce selected sounds occurring in the first acoustic waves with a delay that is equal, to within 20%, to a transit time of the first acoustic waves from the first location to the second location, thereby engendering constructive interference between the first and second acoustic waves.
Embodiments of the present invention that are described hereinbelow provide improved devices and methods for hearing assistance.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
Despite the need for directional hearing assistance and the theoretical benefits of microphone arrays in this regard, in practice the directional performance of hearing aids falls far short of that achieved by natural hearing. In general, good directional hearing assistance requires a relatively large number of microphones, spaced well apart, in a design that is unobtrusive while enabling the user to aim the directional response of the hearing aid easily toward a point of interest, such as toward a conversation partner in a noisy environment. Processing circuitry applies a beamforming filter to the signals output by the microphones in response to incident acoustic waves to generate an audio output that emphasizes sounds that impinge on the microphone array within an angular range around the direction of interest while suppressing background noise. The audio output should reproduce the natural hearing experience as nearly as possible while minimizing bothersome artifacts.
One of these artifacts is the user's own voice. The proximity of the user's mouth to the microphones causes the processing circuitry to capture and amplify the user's voice together with the sounds received from the direction of interest in the environment. Users find the amplified sound of their own voice to be unnatural and disturbing. In some cases, amplification of the user's voice may mask sounds from the environment that the user wishes to hear.
Embodiments of the present invention that are described herein address this problem using a novel beamforming filter, which emphasizes sounds that impinge on an array of microphones within a selected angular range while suppressing sounds that are spoken by the user. In the disclosed embodiments, an array of microphones, mounted in proximity to the head of a user, outputs electrical signals in response to incoming acoustic waves that are incident on the microphones. A speaker, mounted in proximity to an ear of the user, conveys the resulting audio output to the user.
In some embodiments, the processing circuitry detects that the user is speaking and applies the beamforming filter to suppress the sounds spoken by the user in response to such detection. For this purpose, for example, the processing circuitry may apply two different beamforming filters: one that suppresses the sounds spoken by the user and another that does not. Both filters emphasize sounds within the selected angular range, but the filter that does not suppress sounds spoken by the user typically gives superior directivity in the beamforming pattern, as well as reduced white noise gain relative to the filter that suppresses the user's own voice. Therefore, the filter that does not suppress sounds spoken by the user is preferred for use as long as the user is not actually speaking. The processing circuitry applies the beamforming filter that suppresses the user's own voice upon detecting that the user is speaking. The processing circuitry may detect which of the beamforming filters should be used at any given time by computing and comparing the relative power levels output by the beamforming filters.
In some embodiments, the microphones and speaker are mounted on a frame that is mounted on the user's head. In the embodiments that are described below, the microphones and speakers are mounted on a spectacle frame. To mitigate the effect of the user's own voice, the processing circuitry applies a beamforming filter that suppresses sounds that originate from a location below the spectacle frame. Alternatively, the microphones and speaker can be mounted on other sorts of frames, such as a Virtual Reality (VR) or Augmented Reality (AR) headset, or in other sorts of mounting arrangements.
Processing circuitry 26 is fixed within or otherwise connected to spectacle frame 22 and is coupled by electrical wiring 27, such as traces on a flexible printed circuit, to receive the electrical signals output from microphones 23, 24. Although processing circuitry 26 is shown in
In addition, processing circuitry 26 detects when the user is speaking and applies a beamforming function that suppresses sounds spoken by the user, while emphasizing the sounds that impinge on the array of microphones within a selected angular range. These signal processing functions of circuitry 26 are described in greater detail hereinbelow. Alternatively, processing circuitry 26 may apply a beamforming filter that suppresses sounds spoken by the user at all times, regardless of whether or not the user is speaking.
Processing circuitry 26 may convey the audio output to the user's ear via any suitable sort of interface and speaker. In the pictured embodiment, the audio output is created by a drive signal for driving one or more audio speakers 28, which are mounted on temples 32, typically in proximity to the user's ears. Although only a single speaker 28 is shown on each temple 32 in
In the present embodiment, microphones 23, 24 comprise integral analog/digital converters, which output digital audio signals to processing circuitry 26. Alternatively, processing circuitry 26 may comprise an analog/digital converter for converting analog outputs of the microphones to digital form. Processing circuitry 26 typically comprises suitable programmable logic components 40, such as a digital signal processor (DSP) or a gate array, which implement the necessary filtering and mixing functions to generate and output a drive signal for speaker 28 in digital form.
These filtering and mixing functions typically include application of two beamforming filters 42, 43 with coefficients chosen to create the desired directional responses. Specifically, the coefficients of filter 42 are calculated to emphasize sounds that impinge on frame 22 (and hence on microphones 23, 24) within a selected angular range; while the coefficients of filter 43 are calculated to emphasize sounds that impinge on frame 22 within this selected angular range while suppressing sounds spoken by the user of device 20, which originate from a location below frame 22. Details of filters that may be used for these purposes are described further hereinbelow.
Alternatively or additionally, processing circuitry 26 may comprise a neural network (not shown), which is trained to determine and apply the coefficients to be used in filters 42 and 43. Further alternatively or additionally, processing circuitry 26 comprises a microprocessor, which is programmed in software or firmware to carry out at least some of the functions that are described herein.
Processing circuitry 26 may apply any suitable beamforming functions that are known in the art, in either the time domain or the frequency domain, in implementing filters 42, 43. Beamforming algorithms that may be used in this context are described, for example, in the above-mentioned PCT International Publication WO 2017/158507 (particularly pages 10-11) and in U.S. Pat. No. 10,567,888 (particularly in col. 9).
In one embodiment, processing circuitry 26 applies a Minimum Variance Distortionless Response (MVDR) beamforming algorithm in deriving the coefficients of beamforming filters 42 and 43. This sort of algorithm is advantageous in achieving fine spatial resolution and discriminating between sounds originating from the direction of interest and sounds originating from the user's own speech. The MVDR algorithm maximizes the signal-to-noise ratio (SNR) of the audio output by minimizing the average energy (while keeping the target distortion small). The algorithm can be implemented in frequency space by calculating a vector of complex weights F(ω) for the output signal from each microphone at each frequency as expressed by the following formula:

F(ω) = Szz⁻¹(ω)W(ω) / (Wᴴ(ω)Szz⁻¹(ω)W(ω))

wherein Wᴴ(ω) denotes the Hermitian transpose of W(ω).
In this formula, W(ω) is the propagation delay vector between microphones 23, 24, representing the desired response of the beamforming filter as a function of angle and frequency; and Szz(ω) is the cross-spectral density matrix, representing a covariance of the acoustic signals in the time-frequency domain. To compute the coefficients of filter 42, Szz(ω) is measured or calculated for isotropic far-field noise. To compute the coefficients of filter 43, the cross-spectral density matrix for the user's own voice is added to the far-field noise.
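The MVDR weight computation described above can be sketched in a few lines of code. The following is a minimal illustrative sketch for a single frequency bin, not the actual implementation of processing circuitry 26; the function name and the small diagonal-loading term (added to keep the matrix inversion well conditioned) are assumptions for illustration:

```python
import numpy as np

def mvdr_weights(W, Szz, loading=1e-6):
    """MVDR beamforming weights for one frequency bin.

    W       -- complex steering (propagation delay) vector, shape (n_mics,)
    Szz     -- cross-spectral density matrix, shape (n_mics, n_mics); for a
               filter that suppresses the user's own voice, the own-voice
               cross-spectral density is added to the far-field noise term
               before calling this function
    loading -- small diagonal loading (an assumption, not in the source)
    """
    n = len(W)
    Szz_inv = np.linalg.inv(Szz + loading * np.eye(n))
    numerator = Szz_inv @ W                 # Szz^-1 W
    denominator = np.conj(W) @ numerator    # W^H Szz^-1 W (a scalar)
    return numerator / denominator
```

Note that the distortionless constraint Wᴴ(ω)F(ω) = 1 holds by construction, so sounds arriving from the target direction pass with unit gain while the average output energy is minimized.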
In an alternative embodiment, processing circuitry 26 applies a Linearly Constrained Minimum Variance (LCMV) algorithm in deriving the coefficients of beamforming filters 42 and 43. LCMV beamforming causes the filters to pass signals from a desired direction with a specified gain and phase delay, while minimizing power from interfering signals and noise from all other directions. An additional constraint is imposed in computing filter 43 to specifically nullify output power coming from the direction of the user's mouth.
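The LCMV variant can be sketched in the same style. This illustrative sketch (again, not the implementation of processing circuitry 26) assumes a constraint matrix C whose first column is the steering vector toward the look direction and whose second column points toward the user's mouth, with desired responses g = [1, 0] so that the look direction passes with unit gain while the mouth direction is nullified:

```python
import numpy as np

def lcmv_weights(C, g, Szz, loading=1e-6):
    """LCMV weights: minimize output power subject to C^H F = g.

    C   -- constraint matrix, shape (n_mics, n_constraints); e.g. column 0
           steers toward the look direction, column 1 toward the mouth
           (this layout is an assumption for illustration)
    g   -- desired complex responses for the constraints, e.g. [1, 0]
    Szz -- cross-spectral density matrix of noise and interference
    """
    n = C.shape[0]
    Szz_inv = np.linalg.inv(Szz + loading * np.eye(n))
    SC = Szz_inv @ C
    # F = Szz^-1 C (C^H Szz^-1 C)^-1 g
    return SC @ np.linalg.solve(C.conj().T @ SC, g)
```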
In some embodiments, processing circuitry 26 comprises selection logic 44, which selects the beamforming filter that is to be applied at any given time. Selection logic 44 chooses the beamforming filter based on whether or not the user is speaking. The selection decision may be based on the outputs of beamforming filters 42 and 43 themselves. For this purpose, processing circuitry 26 applies both of beamforming filters 42 and 43 to the electrical signals output by microphones 23, 24 and computes respective power levels P1 and P2 output by beamforming filters 42 and 43. Selection logic 44 compares the power levels, for example by application of a threshold to the ratio of the power levels. As long as P1/P2 is less than the threshold, selection logic 44 chooses filter 42. When P1/P2 exceeds the threshold, selection logic 44 chooses filter 43. The threshold may be preset, or it may alternatively be adjusted adaptively and/or according to user preferences.
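The threshold decision of selection logic 44 reduces to a one-line comparison. The sketch below is illustrative only; the function name and the threshold value of 2.0 are placeholders (the source notes that the threshold may be preset or adapted):

```python
def select_filter(p1, p2, threshold=2.0):
    """Choose filter 42 or 43 from the measured output power levels.

    p1, p2 -- power levels output by beamforming filters 42 and 43.
    Returns 42 (no own-voice suppression) while P1/P2 stays at or below
    the threshold, and 43 once the ratio exceeds it, which is taken as
    an indication that the user has begun to speak.
    """
    return 43 if p1 / p2 > threshold else 42
```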
An audio output circuit 46, for example comprising a suitable codec and digital/analog converter, converts the digital drive signal output from filters 42 and 43 to analog form. An analog filter 48 performs further filtering and analog amplification functions so as to optimize the analog drive signal to speaker 28.
A control circuit 50, such as an embedded microcontroller, controls the programmable functions and parameters of processing circuitry 26, possibly including selection logic 44. A communication interface 52, for example a Bluetooth® or other wireless interface, enables the user and/or an audiology professional to set and adjust these parameters as desired. A power circuit 54, such as a battery inserted into temple 32, provides electrical power to the other components of the processing circuitry.
In preparation for real-time application of the method, beamforming filters 42 and 43 are computed and loaded into device 20, at a filter computation step 60. The filters may be predefined and loaded at the time of manufacture of device 20, based on standardized spatial response parameters. Alternatively or additionally, the filter coefficients may be optimized for the user, taking into account parameters such as the user's physiology, hearing deficiency, and subjective preferences.
Initially, when device 20 is turned on, selection logic 44 chooses filter 42, so that acoustic signals from microphones 23, 24 are processed without own-voice suppression, at a standard beamforming step 62. Periodically, processing circuitry 26 applies both beamforming filters 42 and 43 to the electrical signals output by microphones 23, 24 and compares the respective power levels P1 and P2 that are output by beamforming filters 42 and 43, at a power comparison step 64. As long as the ratio P1/P2 remains less than the applicable threshold, selection logic 44 continues to apply filter 42.
Detection that P1/P2 has exceeded the threshold at step 64 is taken as an indication that the user of device 20 has begun to speak. In this case, selection logic 44 switches to filter 43, at an own-voice suppression step 66. Optionally, to avoid rapid toggling between filters 42 and 43, which may be disturbing to the user, selection logic 44 applies a time lag in applying and removing the beamforming filter after detecting that the user has begun or ceased to speak. For this purpose, for example, the selection logic may apply a smoothing filter to the power values:

P̄n(t) = αPn(t) + (1−α)P̄n(t−1)
In this expression, Pn(t) is the measured value of P1 or P2 at time t, P̄n(t) is the corresponding smoothed value, and the filter coefficient α has a value between zero and one. Selection logic 44 uses the ratio of the smoothed power values, rather than the instantaneous ratio P1/P2, in the threshold comparison of step 64.
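The exponential smoothing step can be expressed, for example, as follows; this is an illustrative sketch, and the value of α is an assumption:

```python
def smooth_power(p_bar_prev, p_measured, alpha=0.2):
    """One exponential-smoothing step for a power estimate.

    Implements P_bar(t) = alpha * P(t) + (1 - alpha) * P_bar(t - 1).
    A smaller alpha gives a longer time lag, which damps rapid toggling
    between beamforming filters 42 and 43.
    """
    return alpha * p_measured + (1 - alpha) * p_bar_prev
```

The smoothed values P̄1 and P̄2 computed this way would then replace the raw power levels in the ratio that is compared against the threshold.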
Additionally or alternatively, upon detecting that the user has begun to speak (or has subsequently ceased to speak), processing circuitry 26 may blend beamforming filters 42 and 43 during a transition period to avoid sharp changes in the acoustic output to speaker 28. For this purpose, the processing circuitry blends the filter coefficients using a blend factor β, which varies gradually over time between zero and one. During the transition period, the vector of filter coefficients F applied by processing circuitry 26 is a linear combination of the respective coefficient vectors F1 and F2 of filters 42 and 43: F=(1−β)F1+βF2. The value of β may be computed, for example, using the following algorithm in a series of incremental steps over time, using small values γ and ε between zero and one:
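The incremental algorithm itself is not reproduced here, so the sketch below assumes one plausible form: on each step, β advances by a fraction γ of its remaining distance to 1 and snaps to 1 once within ε (a symmetric rule would ramp β back toward 0 when the user ceases to speak):

```python
def step_blend_factor(beta, gamma=0.1, epsilon=0.01):
    """Advance the blend factor beta one incremental step toward 1.

    gamma, epsilon -- small values between zero and one; the exact
    update rule here is an assumption for illustration only.
    """
    beta = beta + gamma * (1.0 - beta)
    return 1.0 if (1.0 - beta) < epsilon else beta

def blend_coefficients(F1, F2, beta):
    """Linear combination F = (1 - beta)*F1 + beta*F2 of filter vectors."""
    return [(1.0 - beta) * a + beta * b for a, b in zip(F1, F2)]
```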
Returning to
The features of any of examples 2-10, 12 and 13 may similarly be applied to example 14.
The embodiments described above are cited by way of example, and the present invention is not limited to what has been particularly shown and described hereinabove.
Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.