The present application relates to hearing devices, e.g. hearing aids, in particular to the processing of an electric signal representing sound according to a user's needs. A main task of a hearing aid is to increase a hearing-impaired user's intelligibility of speech content in a sound field surrounding the user in a given situation. This goal is pursued by applying a number of processing algorithms to one or more electric input signals (e.g. delivered by one or more microphones). Examples of such processing algorithms are algorithms for compressive amplification, noise reduction (including spatial filtering (beamforming)), feedback reduction, de-reverberation, etc. Embodiments of the present disclosure are also relevant for normal-hearing persons, e.g. for augmenting hearing in difficult listening situations.
In an aspect, the present disclosure deals with optimization of processing of electric input signal(s) from one or more sensors (e.g. sound input transducers, e.g. microphones, and optionally, additionally other types of sensors) with respect to a user's intelligibility of speech content, when the electric input signal(s) have been subject to such processing (e.g. after application of one or more specific processing algorithms to the electric input signal(s)). The optimization with respect to speech intelligibility considers a) the user's hearing ability (e.g. impairment) in interplay with b) the specific processing algorithms, e.g. noise reduction, including beamforming, to which the electric input signal(s) are subject before being presented to the user, and c) an acceptable goal for the user's speech intelligibility (SI, e.g. an SI-measure, e.g. reflecting an estimate of a percentage of words being understood).
The ‘electric input signals from one or more sensors’ may in general originate from identical types of sensors (e.g. sound sensors), or from a combination of different types of sensors, e.g. sound sensors, image sensors, etc. Typically, the ‘one or more sensors’ comprise at least one sound sensor, e.g. a sound input transducer, e.g. a microphone.
A Hearing Device, e.g. a Hearing Aid:
In an aspect the present application provides a hearing device, e.g. a hearing aid, adapted for being worn by a user and for receiving sound from the environment of the user and to improve (or process the sound with a view to, or in dependence of) the user's intelligibility of speech in said sound, an estimate of the user's intelligibility of speech in said sound being defined by a speech intelligibility measure I of said sound at a current point in time t. The hearing device comprises a) an input unit for providing a number of electric input signals y, each representing said sound in the environment of the user; and b) a signal processor for processing said number of electric input signals y according to a configurable parameter setting Θ of one or more processing algorithms, which when applied to said number of electric input signals y provides a processed signal yp(Θ) in dependence thereof, the signal processor being configured to provide a resulting signal yres. The hearing device may further comprise c) a controller configured to control the processor to provide said resulting signal yres at a current point in time t in dependence of (at least one of) a desired value Ides of the speech intelligibility measure, a current value I(y) of the speech intelligibility measure for the number of electric input signals y, and a current value I(yp(Θ)) of the speech intelligibility measure for the processed signal yp(Θ).
Thereby an improved hearing device may be provided.
In case—at a given point in time t—a current value I(y) of the speech intelligibility measure I for at least one of the (unprocessed) electric input signals y is larger than the desired value Ides of the speech intelligibility measure, one or more actions may be taken (e.g. controlled by the controller). An action may e.g. be to skip (bypass) the processing algorithm(s) in question and provide the resulting signal yres(t) as the at least one electric input signal y(t) exhibiting I(y(t))>Ides.
The term ‘characteristics extracted from said electric input signal(s)’ is in the present context taken to include one or more parameters extracted from the electric input signal(s), e.g. a noise covariance matrix Cv and/or a covariance matrix CY of noisy signals y, parameter(s) related to modulation, e.g. a modulation index, etc. The noise covariance matrix Cv may be predetermined in advance of use of the hearing device, or determined during use (e.g. adaptively updated). The speech intelligibility measure may be based on a predefined relationship or function, e.g. be a function of a signal to noise ratio of the input signal(s).
The controller may be configured to control the processor to provide that the resulting signal yres at a current point in time t is equal to a selectable signal ysel, in case the current values I(y) and I(yp(Θ1)) of the speech intelligibility measure I for the number of electric input signals y and the first processed signal yp(Θ1), respectively, are both smaller than said desired value Ides.
In an embodiment, the controller is configured to control the processor to provide that the resulting signal yres at a current point in time t is equal to said first processed signal yp(Θ1) based on said first parameter setting Θ1, in case the current value I(yp(Θ1)) of the speech intelligibility measure I for the first processed signal yp(Θ1) is smaller than or equal to the desired value Ides of the speech intelligibility measure. In other words, the selectable signal ysel is equal to the first processed signal yp(Θ1) (e.g. providing a maximum (but not optimal) SNR of the estimated target signal). In an embodiment, the selectable signal ysel is equal to one of the electric input signals y, e.g. an attenuated version, e.g. comprising an indication that the input signal is presently below normal standard. In an embodiment, the signal is chosen in dependence of a first threshold value Ith of the speech intelligibility measure I, where Ith is smaller than Ides. In an embodiment, ysel=yp(Θ1) when Ith<I(yp(Θ1))<Ides. In an embodiment, the selectable signal ysel is equal to or contains an information signal yinf indicating that the current input signal(s) is(are) too noisy to provide an acceptable speech intelligibility of the target signal. In an embodiment, ysel=yinf (or ysel=yinf+yp(Θ1)*G, where G is a gain factor, e.g. 0≤G≤1, or G<1), when I(yp(Θ1))<Ith.
The controller may be configured to control the processor to provide that the resulting signal yres at a current point in time t is equal to the second, optimized, processed signal yp(Θ′) exhibiting the desired value Ides of the speech intelligibility measure, in case the current value I(yp(Θ1)) of the speech intelligibility measure I for the first processed signal yp(Θ1) is larger than the desired value Ides of the speech intelligibility measure. In this case the processing parameter setting is modified (from Θ1 to Θ′) to provide a reduced speech intelligibility measure (Ides) compared to the speech intelligibility measure I(yp(Θ1)) of the first parameter setting (Θ1).
In an embodiment, the controller is configured to provide that the resulting signal yres is equal to the second processed signal yp(Θ′) in case A) I(y) is smaller than the desired value Ides, and B) I(yp(Θ1)) is larger than the desired value Ides of the speech intelligibility measure I. In an embodiment, the controller is configured to determine the second parameter setting Θ′ under the constraint that the second processed signal yp(Θ′) exhibits the desired value Ides of the speech intelligibility measure.
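By way of example, and not limitation, the control scheme outlined above may be sketched as follows in Python (a minimal sketch; the names intelligibility, process, optimize_theta and y_inf are illustrative stand-ins for the speech intelligibility measure I, the one or more processing algorithms, the determination of the second parameter setting Θ′, and an information signal, respectively):

  def control(y, theta1, I_des, I_th, intelligibility, process, optimize_theta, y_inf):
      """Return the resulting signal y_res for the current time instance t."""
      if intelligibility(y) > I_des:
          # Input already intelligible enough: bypass the processing algorithm(s).
          return y
      y_p1 = process(y, theta1)        # first processed signal y_p(Theta_1)
      I_p1 = intelligibility(y_p1)
      if I_p1 <= I_des:
          # Even the first (e.g. SNR-maximizing) setting cannot reach I_des:
          # fall back to a selectable signal y_sel.
          if I_p1 > I_th:
              return y_p1              # best available signal, though below target
          return y_inf                 # inform the user: too noisy to understand
      # Processing overshoots the target: relax the parameter setting so that
      # I(y_p(theta_prime)) equals I_des (reducing side effects of processing).
      theta_prime = optimize_theta(y, I_des)
      return process(y, theta_prime)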
In an embodiment, the first parameter setting Θ1 is a default setting. The first parameter setting Θ1 may be a setting that maximizes a signal to noise ratio (SNR) or the speech intelligibility measure I of the first processed signal yp(Θ1). In an embodiment, the second (optimized) parameter setting Θ′ is used by the one or more processing algorithms to process the number of electric input signal(s), and to provide a second (optimized) processed signal yp(Θ′) (yielding the desired level of speech intelligibility to the user, as reflected in the desired value Ides of the speech intelligibility measure). The SNR may preferably be determined in a time-frequency framework, e.g. per TF-unit.
The one or more processing algorithms may comprise a single channel noise reduction algorithm. The single channel noise reduction algorithm may be configured to receive a single electric signal (e.g. a signal from a (possibly omni-directional) microphone, or a spatially filtered signal (e.g. from a beamformer filtering unit)).
The input unit may be configured to provide a multitude of electric input signals yi, i=1, . . . , M, each representing said sound in the environment of the user, and where the one or more processing algorithms comprise a beamformer algorithm for receiving said multitude of electric input signals, or processed versions thereof, and providing a spatially filtered, beamformed, signal, the beamformer algorithm being controlled by beamformer settings, and where said first parameter setting Θ1 of said one or more processing algorithms comprises a first beamformer setting, and where said second parameter setting Θ′ of said one or more processing algorithms comprises a second beamformer setting.
The first beamformer settings are e.g. determined based on the multitude of electric input signals and one or more control signals, e.g. from one or more sensors (e.g. including a voice activity detector), without specifically considering a value of the speech intelligibility measure of the current beamformed signal. The first parameter setting Θ1 may constitute or comprise a beamformer setting that maximizes a (target) signal to noise ratio (SNR) of the (first) beamformed signal.
In an embodiment, the hearing device comprises a memory, wherein the desired value Ides of said speech intelligibility measure is stored. In an embodiment, the desired value Ides of said speech intelligibility measure is an average value (e.g. averaged over a large number of persons (e.g. >10)), e.g. empirically determined, or an estimated value. The desired speech intelligibility value Ides may be specifically determined or selected for the user of the hearing device. The desired value Ides of the speech intelligibility measure may be a user specific value, e.g. predetermined, e.g. measured or estimated in advance of the use of the hearing device. In an embodiment, the hearing device comprises a memory, wherein a desired speech intelligibility value (e.g. a percentage of intelligible words, e.g. 95%) Ides for the user is stored.
In an embodiment, the controller is configured to aim at determining the second optimized parameter setting Θ′ to provide said desired speech intelligibility value Ides of said speech intelligibility measure for the user. The term ‘aim at’ is intended to indicate that such desired speech intelligibility value Ides may not always be achievable (e.g. due to one or more of poor listening conditions (e.g. low SNR), insufficient available gain in the hearing device, feedback howl, etc.).
The input unit may be configured to provide the number of electric input signals in a time-frequency representation Yr(k′,m), r=1, . . . , M, where M is the number of electric input signals, k′ is frequency index, and m is a time index. In an embodiment, the input unit comprises a number of input transducers, e.g. microphones, each providing one of the electric input signals yr(n), where n represents time. In an embodiment, the input unit comprises a number of time to time-frequency conversion units, e.g. analysis filter banks, e.g. short-time Fourier transform (STFT) units, for converting a time-domain electric input signal yr(n) to a time-frequency domain (sub-band) electric input signal yr(k′,m). In an embodiment, the number of electric input signals is one. In an embodiment, the number of electric input signals is larger than or equal to two, e.g. larger than or equal to three or four.
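By way of example, such a time-frequency representation may be obtained with a short-time Fourier transform, e.g. as in the following minimal Python/numpy sketch (frame length, hop size and window choice are illustrative):

  import numpy as np

  def stft(y, frame_len=128, hop=64):
      """Convert a time-domain signal y(n) to a time-frequency
      representation Y(k', m), with frequency index k' and time index m."""
      window = np.hanning(frame_len)
      n_frames = 1 + (len(y) - frame_len) // hop
      Y = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
      for m in range(n_frames):
          frame = y[m * hop : m * hop + frame_len] * window
          Y[:, m] = np.fft.rfft(frame)   # one spectrum (all k') per frame m
      return Y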
The hearing device, e.g. the controller, may be configured to receive further electric input signals from a number of sensors, and to influence the control of the processor in dependence thereof. In an embodiment, the number of sensors comprises one or more of an external sound sensor, an image sensor, e.g. a camera (e.g. directed to the face (mouth) of a current target speaker, e.g. for providing alternative (SNR-independent) information about the target signal, e.g. for voice activity detection), a brain wave sensor (e.g. for identifying a sound source of current interest to the user), a movement sensor (e.g. a head tracker for providing head orientation for indication of direction of arrival (DoA) of a target signal), an EOG-sensor (e.g. for identifying DoA of a target signal, or indicating most probable DoAs). In an embodiment, the controller is configured to give a higher weight to inputs from sensors, e.g. image sensors, the smaller the current apparent SNR or estimate of speech intelligibility is. Lip reading (e.g. based on an image sensor) may e.g. be increasingly relied on in difficult acoustic situations.
The controller is configured to provide that the speech intelligibility measure I(yres) of the resulting signal yres is smaller than or equal to the desired value Ides, unless a value of the speech intelligibility measure I(y) of one or more of the number of electric input signal(s) is larger than the desired value Ides. In the latter case, the controller is configured to maintain such speech intelligibility measure I(y) without trying to further improve it by applying said one or more processing algorithms. In such case, the controller is configured to bypass the one or more processing algorithms, and to provide one of the input signals y exhibiting I(y)>Ides, as the resulting signal yres. In such case, the resulting signal is thus unprocessed by the one or more processing algorithms in question (but possibly processed by one or more other processing algorithms).
In an embodiment, the speech intelligibility measure I is a measure of a target signal to noise ratio, where the target signal represents a signal containing speech that the user currently intends to listen to, and the noise represents all other sound components in said sound in the environment of the user.
The hearing device may be adapted to a user's hearing profile, e.g. to compensate for a hearing impairment of the user. The hearing profile of the user may be defined by a parameter set Φ. The parameter set Φ may e.g. define the user's (frequency dependent) hearing thresholds (or their deviation from normal; e.g. reflected in an audiogram). In an embodiment, one of the ‘one or more processing algorithms’, is configured to compensate for a hearing loss of the user. In an embodiment, a compressive amplification algorithm (for adapting the input signal(s) to a user's needs) forms part of the ‘one or more processing algorithms’.
The controller may be configured to determine the estimate of the speech intelligibility measure I for use in determining the second, optimized, parameter setting Θ′(k′,m) with a second frequency resolution k that is lower than a first frequency resolution k′ that is used to determine the first parameter setting Θ1(k′,m) on which the first processed signal Yp(Θ1) is based. In an embodiment, a first part of the processing (e.g. the processing of the electric input signals using first processing settings Θ1(k′,m)) is applied in individual frequency bands with a first frequency resolution, represented by a first frequency index k′, and a second part of the processing (e.g. the determination of the speech intelligibility measure I(k,m,Θ,Φ) of the processed signal for use in modifying the first parameter settings Θ1(k′,m) to optimized parameter settings Θ′(k′,m)) is applied in individual frequency bands with a second (different, e.g. lower) frequency resolution, represented by a second frequency index k.
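By way of illustration, the mapping from the first (fine) frequency resolution k′ to the second (coarser) resolution k may be realized by averaging per-bin values over predefined bands, e.g. as in the following sketch (the band boundaries are hypothetical):

  import numpy as np

  def band_average(values_fine, band_edges):
      """Collapse per-bin values at fine resolution k' into coarser bands k
      by averaging the bins inside each band (band_edges in bin indices)."""
      return np.array([values_fine[lo:hi].mean()
                       for lo, hi in zip(band_edges[:-1], band_edges[1:])])

  # e.g. 257 STFT bins (k') collapsed into 6 broader analysis bands (k)
  snr_fine = np.random.rand(257)               # per-bin SNR estimates
  band_edges = [0, 8, 16, 32, 64, 128, 257]    # hypothetical band boundaries
  snr_coarse = band_average(snr_fine, band_edges)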
In an embodiment, the hearing device constitutes or comprises a hearing aid.
In an embodiment, the hearing device, e.g. a signal processor, is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user.
In an embodiment, the hearing device comprises an output unit for providing a stimulus perceived by the user as an acoustic signal based on the processed electric input signal. In an embodiment, the output unit comprises a number of electrodes of a cochlear implant or a vibrator of a bone conducting hearing aid. In an embodiment, the output unit comprises an output transducer. In an embodiment, the output transducer comprises a receiver (loudspeaker) for providing the stimulus as an acoustic signal to the user. In an embodiment, the output transducer comprises a vibrator for providing the stimulus as mechanical vibration of a skull bone to the user (e.g. in a bone-attached or bone-anchored hearing aid).
The hearing device comprises an input unit for providing an electric input signal representing sound. In an embodiment, the input unit comprises an input transducer, e.g. a microphone, for converting an input sound to an electric input signal. In an embodiment, the input unit comprises a wireless receiver for receiving a wireless signal comprising sound and for providing an electric input signal representing said sound.
In an embodiment, the hearing device comprises a directional microphone system adapted to spatially filter sounds from the environment, and thereby enhance a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing device. In an embodiment, the directional system is adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal originates. This can be achieved in various different ways as e.g. described in the prior art. In hearing aids, a microphone array beamformer is often used for spatially attenuating background noise sources. Many beamformer variants can be found in literature. The minimum variance distortionless response (MVDR) beamformer is widely used in microphone array signal processing. Ideally the MVDR beamformer keeps the signals from the target direction (also referred to as the look direction) unchanged, while attenuating sound signals from other directions maximally. The generalized sidelobe canceller (GSC) structure is an equivalent representation of the MVDR beamformer offering computational and numerical advantages over a direct implementation in its original form.
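By way of example, the MVDR weights for a single frequency bin follow from the noise covariance matrix Cv and the look-direction steering vector d as w = Cv^-1 d / (d^H Cv^-1 d), e.g. (a minimal numpy sketch):

  import numpy as np

  def mvdr_weights(Cv, d):
      """MVDR beamformer weights for one frequency bin: keep the look
      direction (steering vector d) undistorted, minimize noise power."""
      Cv_inv_d = np.linalg.solve(Cv, d)
      return Cv_inv_d / (d.conj() @ Cv_inv_d)

  # Usage for one time-frequency unit, with Y an (M,) vector of
  # microphone STFT coefficients: y_beamformed = w.conj() @ Y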
In an embodiment, the hearing device comprises an antenna and transceiver circuitry (e.g. a wireless receiver) for wirelessly receiving a direct electric input signal from another device, e.g. from an entertainment device (e.g. a TV-set), a communication device, a wireless microphone, or another hearing device. In an embodiment, the direct electric input signal represents or comprises an audio signal and/or a control signal and/or an information signal. In an embodiment, the hearing device comprises demodulation circuitry for demodulating the received direct electric input to provide the direct electric input signal representing an audio signal and/or a control signal e.g. for setting an operational parameter (e.g. volume) and/or a processing parameter of the hearing device. In general, a wireless link established by antenna and transceiver circuitry of the hearing device can be of any type. In an embodiment, the wireless link is established between two devices, e.g. between an entertainment device (e.g. a TV) and the hearing device, or between two hearing devices, e.g. via a third, intermediate device (e.g. a processing device, such as a remote control device, a smartphone, etc.). In an embodiment, the wireless link is used under power constraints, e.g. in that the hearing device is or comprises a portable (typically battery driven) device. In an embodiment, the wireless link is a link based on near-field communication, e.g. an inductive link based on an inductive coupling between antenna coils of transmitter and receiver parts. In another embodiment, the wireless link is based on far-field, electromagnetic radiation. Preferably, communication between the hearing device and other devices is based on some sort of modulation at frequencies above 100 kHz. Preferably, frequencies used to establish a communication link between the hearing device and the other device are below 70 GHz, e.g. located in a range from 50 MHz to 70 GHz, e.g. above 300 MHz, e.g. in an ISM range above 300 MHz, e.g. in the 900 MHz range or in the 2.4 GHz range or in the 5.8 GHz range or in the 60 GHz range (ISM=Industrial, Scientific and Medical, such standardized ranges being e.g. defined by the International Telecommunication Union, ITU). In an embodiment, the wireless link is based on a standardized or proprietary technology. In an embodiment, the wireless link is based on Bluetooth technology (e.g. Bluetooth Low-Energy technology).
In an embodiment, the hearing aid is a portable device, e.g. a device comprising a local energy source, e.g. a battery, e.g. a rechargeable battery.
In an embodiment, the hearing device comprises a forward or signal path between an input unit (e.g. an input transducer, such as a microphone or a microphone system and/or direct electric input (e.g. a wireless receiver)) and an output unit, e.g. an output transducer. In an embodiment, the signal processor is located in the forward path. In an embodiment, the signal processor is adapted to provide a frequency dependent gain according to a user's particular needs. In an embodiment, the hearing device comprises an analysis path comprising functional components for analyzing the input signal (e.g. determining a level, a modulation, a type of signal, an acoustic feedback estimate, etc.). In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the frequency domain. In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the time domain.
In an embodiment, an analogue electric signal representing an acoustic signal is converted to a digital audio signal in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling frequency or rate fs, fs being e.g. in the range from 8 kHz to 48 kHz (adapted to the particular needs of the application) to provide digital samples xn (or x[n]) at discrete points in time tn (or n), each audio sample representing the value of the acoustic signal at tn by a predefined number Nb of bits, Nb being e.g. in the range from 1 to 48 bits, e.g. 24 bits. Each audio sample is hence quantized using Nb bits (resulting in 2^Nb different possible values of the audio sample). A digital sample x has a length in time of 1/fs, e.g. 50 μs, for fs=20 kHz. In an embodiment, a number of audio samples are arranged in a time frame. In an embodiment, a time frame comprises 64 or 128 audio data samples. Other frame lengths may be used depending on the practical application.
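By way of a worked example with the numbers mentioned above (Python):

  fs = 20_000                            # sampling rate [Hz]
  Nb = 24                                # bits per audio sample
  sample_period = 1 / fs                 # 5e-05 s, i.e. 50 microseconds
  n_levels = 2 ** Nb                     # 16_777_216 possible sample values
  frame_len = 64                         # audio samples per time frame
  frame_duration = frame_len / fs        # 3.2e-03 s, i.e. 3.2 ms per frame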
In an embodiment, the hearing device comprises an analogue-to-digital (AD) converter to digitize an analogue input (e.g. from an input transducer, such as a microphone) with a predefined sampling rate, e.g. 20 kHz. In an embodiment, the hearing device comprises a digital-to-analogue (DA) converter to convert a digital signal to an analogue output signal, e.g. for being presented to a user via an output transducer.
In an embodiment, the hearing device, e.g. the microphone unit, and/or the transceiver unit comprise(s) a TF-conversion unit for providing a time-frequency representation of an input signal. In an embodiment, the time-frequency representation comprises an array or map of corresponding complex or real values of the signal in question in a particular time and frequency range. In an embodiment, the TF conversion unit comprises a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. In an embodiment, the TF conversion unit comprises a Fourier transformation unit for converting a time variant input signal to a (time variant) signal in the (time-)frequency domain. In an embodiment, the frequency range considered by the hearing device from a minimum frequency fmin to a maximum frequency fmax comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. Typically, a sample rate fs is larger than or equal to twice the maximum frequency fmax, fs≥2fmax. In an embodiment, a signal of the forward and/or analysis path of the hearing device is split into a number NI of frequency bands (e.g. of uniform width), where NI is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. In an embodiment, the hearing device is adapted to process a signal of the forward and/or analysis path in a number NP of different frequency channels (NP≤NI). The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping.
In an embodiment, the hearing device comprises a number of detectors configured to provide status signals relating to a current physical environment of the hearing device (e.g. the current acoustic environment), and/or to a current state of the user wearing the hearing device, and/or to a current state or mode of operation of the hearing device. Alternatively or additionally, one or more detectors may form part of an external device in communication (e.g. wirelessly) with the hearing device. An external device may e.g. comprise another hearing device, a remote control, an audio delivery device, a telephone (e.g. a Smartphone), an external sensor, etc.
In an embodiment, one or more of the number of detectors operate(s) on the full band signal (time domain). In an embodiment, one or more of the number of detectors operate(s) on band split signals ((time-) frequency domain), e.g. in a limited number of frequency bands.
In an embodiment, the number of detectors comprises a level detector for estimating a current level of a signal of the forward path. In an embodiment, the predefined criterion comprises whether the current level of a signal of the forward path is above or below a given (L-)threshold value. In an embodiment, the level detector operates on the full band signal (time domain). In an embodiment, the level detector operates on band split signals ((time-) frequency domain).
In a particular embodiment, the hearing device comprises a voice detector (VD) for estimating whether or not (or with what probability) an input signal comprises a voice signal (at a given point in time). A voice signal is in the present context taken to include a speech signal from a human being. It may also include other forms of utterances generated by the human speech system (e.g. singing). In an embodiment, the voice detector unit is adapted to classify a current acoustic environment of the user as a VOICE or NO-VOICE environment. This has the advantage that time segments of the electric microphone signal comprising human utterances (e.g. speech) in the user's environment can be identified, and thus separated from time segments only (or mainly) comprising other sound sources (e.g. artificially generated noise). In an embodiment, the voice detector is adapted to detect as a VOICE also the user's own voice. Alternatively, the voice detector is adapted to exclude a user's own voice from the detection of a VOICE.
In an embodiment, the hearing device comprises an own voice detector for estimating whether or not (or with what probability) a given input sound (e.g. a voice, e.g. speech) originates from the voice of the user of the system. In an embodiment, a microphone system of the hearing device is adapted to be able to differentiate between a user's own voice and another person's voice and possibly from NON-voice sounds.
In an embodiment, the hearing device comprises a language detector for estimating the language currently spoken, or is configured to receive such information from another device, e.g. from a remote control device, e.g. from a smartphone, or similar device. An estimated speech intelligibility may depend on whether the language used is the listener's native language or a second language. Consequently, the amount of noise reduction needed may depend on the language.
In an embodiment, the number of detectors comprises a movement detector, e.g. an acceleration sensor. In an embodiment, the movement detector is configured to detect movement of the user's facial muscles and/or bones, e.g. due to speech or chewing (e.g. jaw movement) and to provide a detector signal indicative thereof.
In an embodiment, the hearing device comprises a classification unit configured to classify the current situation based on input signals from (at least some of) the detectors, and possibly other inputs as well. In the present context ‘a current situation’ is taken to be defined by one or more of
a) the physical environment (e.g. including the current electromagnetic environment, e.g. the occurrence of electromagnetic signals (e.g. comprising audio and/or control signals) intended or not intended for reception by the hearing device, or other properties of the current environment than acoustic);
b) the current acoustic situation (input level, feedback, etc.);
c) the current mode or state of the user (movement, temperature, cognitive load, etc.); and
d) the current mode or state of the hearing device (program selected, time elapsed since last user interaction, etc.) and/or of another device in communication with the hearing device.
In an embodiment, the hearing device comprises an acoustic (and/or mechanical) feedback suppression system. In an embodiment, the hearing device further comprises other relevant functionality for the application in question, e.g. compression, noise reduction, etc.
In an embodiment, the hearing device is or comprises a hearing aid. In an embodiment, the hearing aid is or comprises a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user, or for being fully or partially implanted in the head of a user. In an embodiment, the hearing device is or comprises a headset, an earphone, or an active ear protection device.
In a further aspect, a hearing device, e.g. a hearing aid, adapted for being worn by a user and for receiving sound from the environment of the user and to improve (or process the sound with a view to) the user's intelligibility of speech in said sound is provided by the present disclosure. An estimate of the user's intelligibility of speech in said sound is defined by a speech intelligibility measure I of said sound at a current point in time t. The hearing device comprises an input unit for providing a number of electric input signals y, each representing said sound, a signal processor for processing said electric input signals according to a configurable parameter setting Θ of one or more processing algorithms, and a controller configured to control the processor to provide a resulting signal yres.
It is intended that some or all of the structural features of the hearing device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the hearing device according to the further aspect.
The number of electric input signals y may be one, or two, or more.
The controller may further be configured to control the processor to provide said resulting signal yres at a current point in time t according to the following scheme: a) yres is equal to one of the electric input signals y, if the current value I(y) of the speech intelligibility measure for that signal is larger than the desired value Ides; b) yres is equal to a selectable signal ysel, if the current values I(y) and I(yp(Θ1)), for a first processed signal yp(Θ1) based on a first parameter setting Θ1, are both smaller than or equal to the desired value Ides; and c) yres is equal to a second processed signal yp(Θ′), otherwise, where the second parameter setting Θ′ is determined under the constraint that I(yp(Θ′)) equals Ides.
The hearing device may be configured to provide that the first parameter setting Θ1 is a setting that maximizes a signal to noise ratio (SNR) or the speech intelligibility measure I of the first processed signal yp(Θ1).
In a still further aspect, a hearing device, e.g. a hearing aid, is provided. The hearing device comprises
It is intended that some or all of the structural features of the hearing device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the hearing device according to the still further aspect.
The controller may be configured to apply a higher weight to the speech intelligibility estimator the lower the estimated predictability of the sound signal, to thereby provide the modified speech intelligibility estimate.
The hearing device may be configured to control the one or more processing algorithms, e.g. a beamformer-noise reduction algorithm, in dependence of the modified speech intelligibility estimate.
Use:
In an aspect, use of a hearing aid as described above, in the ‘detailed description of embodiments’ and in the claims, is moreover provided. In an embodiment, use is provided in a system comprising one or more hearing aids (e.g. hearing instruments), or headsets, e.g. in handsfree telephone systems, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
A Method:
In an aspect, a method of operating a hearing device adapted for being worn by a user and to improve (or to process sound with a view to) the user's intelligibility of speech in sound is furthermore provided by the present application. The method comprises providing a number of electric input signals y, each representing the sound in the environment of the user; processing said number of electric input signals y according to a configurable parameter setting Θ of one or more processing algorithms to provide a processed signal yp(Θ); and providing a resulting signal yres.
The method may further comprise controlling the processing to provide said resulting signal yres at a current point in time t in dependence of (at least one of) a desired value Ides of a speech intelligibility measure I, a current value I(y) of the speech intelligibility measure for the number of electric input signals y, and a current value I(yp(Θ)) of the speech intelligibility measure for the processed signal yp(Θ).
In a further aspect, a method of operating a hearing device, e.g. a hearing aid, adapted for being worn by a user and for receiving sound from the environment of the user and to improve (or to process the sound with a view to) the user's intelligibility of speech in said sound is provided by the present disclosure. An estimate of the user's intelligibility of speech in said sound is defined by a speech intelligibility measure I of said sound at a current point in time t. The method comprises corresponding steps of providing a number of electric input signals y, processing said electric input signals according to a configurable parameter setting Θ of one or more processing algorithms, and providing a resulting signal yres.
The number of electric input signals y may be one, or two, or more.
The method may further comprise controlling the processing to provide that said resulting signal yres at a current point in time t is provided according to the following scheme: a) yres=y, if a current value I(y) of the speech intelligibility measure exceeds the desired value Ides for at least one of the electric input signals y; b) yres=ysel, a selectable signal, if both I(y) and the value I(yp(Θ1)) for a first processed signal yp(Θ1), based on a first parameter setting Θ1, are smaller than or equal to Ides; and c) yres=yp(Θ′), otherwise, where the second parameter setting Θ′ is determined under the constraint that I(yp(Θ′)) equals Ides.
It is intended that some or all of the structural features of the device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the method, when appropriately substituted by a corresponding process and vice versa. Embodiments of the method have the same advantages as the corresponding devices.
The method is repeated over time, e.g. according to a predefined scheme, e.g. periodically, e.g. every time instance m, e.g. for every time frame of a signal of the forward path. In an embodiment, the method is repeated every Nth time frame, e.g. every N=10 time frames or every N=100 time frames. In an embodiment, N is adaptively determined in dependence of the electric input signal, and/or of one or more sensor signals (e.g. indicative of a current acoustic environment of the user, and/or of a mode of operation of the hearing device, e.g. a battery status indication).
In an embodiment, the first parameter setting Θ1 is a setting that maximizes a signal to noise ratio (SNR) and/or the speech intelligibility measure I of the first processed signal yp(Θ1).
The method may comprise: providing the number of electric input signals y in a time frequency representation y(k′,m), where k′ and m are frequency and time indices, respectively.
The method may comprise: providing that determining the speech intelligibility measure I(t) comprises estimating an apparent SNR, SNR(k,m,Φ), in each time-frequency tile (k,m). The speech intelligibility measure I(t) may be a function ƒ(⋅) of an SNR, e.g. on a time-frequency tile level. The function ƒ(⋅) may be modeled by a neural network that maps SNR-estimates SNR(k,m) to predicted intelligibility I(k,m). In an embodiment, I=ƒ(SNR(k,m,Φ,Θ)), e.g.:

I(t=m0) = (1/(M′·K)) Σm Σk ƒ(SNR(k,m,Φ,Θ)),

where the sums run over the frequency bands k=1, . . . , K and the time frames m=m0−M′+1, . . . , m0, where m0 represents a current point in time, M′ represents the number of time frames containing speech considered (e.g. corresponding to a recent syllable, or a word, or an entire sentence), and where SNR(k,m,Φ,Θ) is estimated from the noisy electric input signals or processed versions thereof (using parameter setting Θ).
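By way of example, and not limitation, such a mapping and averaging may be sketched as follows in Python (the logistic function is a hypothetical stand-in for a trained mapping ƒ(⋅); names are illustrative):

  import numpy as np

  def intelligibility(snr_db, m0, M_prime):
      """Map per-tile SNR estimates SNR(k, m) [dB] to a scalar speech
      intelligibility index I at time m0, averaged over K frequency bands
      and the last M_prime speech frames."""
      f = 1.0 / (1.0 + np.exp(-0.2 * snr_db))   # hypothetical f(SNR) in [0, 1]
      tiles = f[:, m0 - M_prime + 1 : m0 + 1]   # K x M' most recent tiles
      return tiles.mean()                       # average over k and m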
In an embodiment, the method comprises: providing that the resulting signal yres at a current point in time t comprises one of the number of electric input signals y, the selectable signal ysel, or the second processed signal yp(Θ′), e.g. according to the scheme outlined above.
The one or more processing algorithms may comprise a single channel noise reduction algorithm and/or a multi-input beamformer filtering algorithm. The number of electric input signals y may be larger than one, e.g. two or more. In an embodiment, the beamformer filtering algorithm comprises an MVDR algorithm.
The method may comprise that the second parameter setting Θ′ is determined under a constraint of minimizing a change of said electric input signals y. In the event that the SNR of the electric input signal(s) (e.g. unprocessed input signals) corresponds to a speech intelligibility measure I that exceeds the desired speech intelligibility value Ides, the one or more processing algorithms should not be applied to the electric input signals. ‘Minimizing a change of the input signals’ may e.g. mean performing as little processing on the signals as possible. ‘Minimizing a change of said number of electric input signals’ may e.g. be evaluated using a distance measure, e.g. a Euclidean distance, e.g. applied to waveforms, e.g. in a time domain or a time-frequency representation.
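By way of example, and not limitation, one simple, hypothetical realization of this constrained determination of Θ′ parameterizes the processing as a blend between an electric input signal y and the maximally processed signal yp(Θ1) and searches for the smallest amount of processing reaching Ides, assuming the intelligibility measure increases monotonically with the amount of processing:

  def find_blend(y, y_p1, I_des, intelligibility, iters=20):
      """Bisection on the blend a in [0, 1]: y_res = (1 - a) * y + a * y_p1.
      Returns the smallest a for which I(y_res) reaches I_des, i.e. the
      least processing (smallest change of the input) meeting the target."""
      lo, hi = 0.0, 1.0
      for _ in range(iters):
          a = 0.5 * (lo + hi)
          if intelligibility((1 - a) * y + a * y_p1) < I_des:
              lo = a   # not intelligible enough: allow more processing
          else:
              hi = a   # target met: try with less processing
      return hi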
The method may comprise that the apparent SNR is estimated following a maximum likelihood procedure.
The method may comprise that the second parameter setting Θ′ is estimated with a first frequency resolution k′ that is finer than a second frequency resolution k that is used to determine the estimate of speech intelligibility I.
A Computer Readable Medium:
In an aspect, a tangible computer-readable medium storing a computer program comprising program code means for causing a data processing system to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims, when said computer program is executed on the data processing system is furthermore provided by the present application.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. In addition to being stored on a tangible medium, the computer program can also be transmitted via a transmission medium such as a wired or wireless link or a network, e.g. the Internet, and loaded into a data processing system for being executed at a location different from that of the tangible medium.
A Computer Program:
A computer program (product) comprising instructions which, when the program is executed by a computer, cause the computer to carry out (steps of) the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
A Data Processing System:
In an aspect, a data processing system comprising a processor and program code means for causing the processor to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
A Hearing System:
In a further aspect, a hearing system comprising a hearing aid as described above, in the ‘detailed description of embodiments’, and in the claims, AND an auxiliary device is moreover provided.
In an embodiment, the hearing system is adapted to establish a communication link between the hearing aid and the auxiliary device to provide that information (e.g. control and status signals, possibly audio signals) can be exchanged or forwarded from one to the other.
In an embodiment, the hearing system comprises an auxiliary device, e.g. a remote control, a smartphone, or other portable or wearable electronic device, such as a smartwatch or the like.
In an embodiment, the auxiliary device is or comprises a remote control for controlling functionality and operation of the hearing aid(s). In an embodiment, the function of a remote control is implemented in a SmartPhone, the SmartPhone possibly running an APP allowing the user to control the functionality of the audio processing device via the SmartPhone (the hearing aid(s) comprising an appropriate wireless interface to the SmartPhone, e.g. based on Bluetooth or some other standardized or proprietary scheme).
In an embodiment, the auxiliary device is or comprises an audio gateway device adapted for receiving a multitude of audio signals (e.g. from an entertainment device, e.g. a TV or a music player, a telephone apparatus, e.g. a mobile telephone or a computer, e.g. a PC) and adapted for selecting and/or combining an appropriate one of the received audio signals (or combination of signals) for transmission to the hearing aid.
In an embodiment, the auxiliary device is or comprises another hearing aid. In an embodiment, the hearing system comprises two hearing aids adapted to implement a binaural hearing system, e.g. a binaural hearing aid system.
In an embodiment, binaural noise reduction (comparing and coordinating noise reduction between the two hearing aids of the hearing system) is only enabled in the case where the monaural beamformers (the beamformers of the individual hearing aids) do not provide a sufficient amount of help (e.g. cannot provide a speech intelligibility measure equal to Ides). Hereby the amount of data transmitted between the ears also depends on the estimated speech intelligibility (and can thus be decreased).
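By way of illustration, such an enabling condition may be as simple as the following sketch (names hypothetical):

  def enable_binaural_nr(I_left, I_right, I_des):
      """Exchange binaural noise-reduction data between the ears only when
      neither monaural beamformer reaches the desired intelligibility."""
      return max(I_left, I_right) < I_des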
An APP:
In a further aspect, a non-transitory application, termed an APP, is furthermore provided by the present disclosure. The APP comprises executable instructions configured to be executed on an auxiliary device to implement a user interface for a hearing aid or a hearing system described above in the ‘detailed description of embodiments’, and in the claims. In an embodiment, the APP is configured to run on a cellular phone, e.g. a smartphone, or on another portable device allowing communication with said hearing aid or said hearing system.
In the present context, a ‘hearing device’ refers to a device, such as a hearing aid, e.g. a hearing instrument, or an active ear-protection device, or other audio processing device, which is adapted to improve, augment and/or protect the hearing capability of a user by receiving acoustic signals from the user's surroundings, generating corresponding audio signals, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. A ‘hearing device’ further refers to a device such as an earphone or a headset adapted to receive audio signals electronically, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. Such audible signals may e.g. be provided in the form of acoustic signals radiated into the user's outer ears, acoustic signals transferred as mechanical vibrations to the user's inner ears through the bone structure of the user's head and/or through parts of the middle ear as well as electric signals transferred directly or indirectly to the cochlear nerve of the user.
The hearing device may be configured to be worn in any known way, e.g. as a unit arranged behind the ear with a tube leading radiated acoustic signals into the ear canal or with an output transducer, e.g. a loudspeaker, arranged close to or in the ear canal, as a unit entirely or partly arranged in the pinna and/or in the ear canal, as a unit, e.g. a vibrator, attached to a fixture implanted into the skull bone, as an attachable, or entirely or partly implanted, unit, etc. The hearing device may comprise a single unit or several units communicating electronically with each other. The loudspeaker may be arranged in a housing together with other components of the hearing device, or may be an external unit in itself (possibly in combination with a flexible guiding element, e.g. a dome-like element).
More generally, a hearing device comprises an input transducer for receiving an acoustic signal from a user's surroundings and providing a corresponding input audio signal and/or a receiver for electronically (i.e. wired or wirelessly) receiving an input audio signal, a (typically configurable) signal processing circuit (e.g. a signal processor, e.g. comprising a configurable (programmable) processor, e.g. a digital signal processor) for processing the input audio signal and an output unit for providing an audible signal to the user in dependence on the processed audio signal. The signal processor may be adapted to process the input signal in the time domain or in a number of frequency bands. In some hearing devices, an amplifier and/or compressor may constitute the signal processing circuit. The signal processing circuit typically comprises one or more (integrated or separate) memory elements for executing programs and/or for storing parameters used (or potentially used) in the processing and/or for storing information relevant for the function of the hearing device and/or for storing information (e.g. processed information, e.g. provided by the signal processing circuit), e.g. for use in connection with an interface to a user and/or an interface to a programming device. In some hearing devices, the output unit may comprise an output transducer, such as e.g. a loudspeaker for providing an air-borne acoustic signal or a vibrator for providing a structure-borne or liquid-borne acoustic signal. In some hearing devices, the output unit may comprise one or more output electrodes for providing electric signals (e.g. a multi-electrode array for electrically stimulating the cochlear nerve).
In some hearing devices, the vibrator may be adapted to provide a structure-borne acoustic signal transcutaneously or percutaneously to the skull bone. In some hearing devices, the vibrator may be implanted in the middle ear and/or in the inner ear. In some hearing devices, the vibrator may be adapted to provide a structure-borne acoustic signal to a middle-ear bone and/or to the cochlea. In some hearing devices, the vibrator may be adapted to provide a liquid-borne acoustic signal to the cochlear liquid, e.g. through the oval window. In some hearing devices, the output electrodes may be implanted in the cochlea or on the inside of the skull bone and may be adapted to provide the electric signals to the hair cells of the cochlea, to one or more hearing nerves, to the auditory brainstem, to the auditory midbrain, to the auditory cortex and/or to other parts of the cerebral cortex.
A hearing device, e.g. a hearing aid, may be adapted to a particular user's needs, e.g. a hearing impairment. A configurable signal processing circuit of the hearing device may be adapted to apply a frequency and level dependent compressive amplification of an input signal. A customized frequency and level dependent gain (amplification or compression) may be determined in a fitting process by a fitting system based on a user's hearing data, e.g. an audiogram, using a fitting rationale (e.g. adapted to speech). The frequency and level dependent gain may e.g. be embodied in processing parameters, e.g. uploaded to the hearing device via an interface to a programming device (fitting system), and used by a processing algorithm executed by the configurable signal processing circuit of the hearing device.
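By way of example, and not limitation, a level dependent compressive gain for a single frequency band may be sketched as follows (the kneepoint, linear gain and compression ratio are hypothetical fitting parameters; actual values would follow from a fitting rationale applied to the user's audiogram):

  def compressive_gain_db(level_db, gain_db=20.0, kneepoint_db=50.0, ratio=2.0):
      """Linear gain below the kneepoint; above it, the gain is reduced so
      that the output level grows by only 1/ratio dB per dB of input."""
      if level_db <= kneepoint_db:
          return gain_db
      return gain_db - (level_db - kneepoint_db) * (1.0 - 1.0 / ratio)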
A ‘hearing system’ refers to a system comprising one or two hearing devices, and a ‘binaural hearing system’ refers to a system comprising two hearing devices and being adapted to cooperatively provide audible signals to both of the user's ears. Hearing systems or binaural hearing systems may further comprise one or more ‘auxiliary devices’, which communicate with the hearing device(s) and affect and/or benefit from the function of the hearing device(s). Auxiliary devices may be e.g. remote controls, audio gateway devices, mobile phones (e.g. SmartPhones), or music players. Hearing devices, hearing systems or binaural hearing systems may e.g. be used for compensating for a hearing-impaired person's loss of hearing capability, augmenting or protecting a normal-hearing person's hearing capability and/or conveying electronic audio signals to a person. Hearing devices or hearing systems may e.g. form part of or interact with public-address systems, active ear protection systems, handsfree telephone systems, car audio systems, entertainment (e.g. karaoke) systems, teleconferencing systems, classroom amplification systems, etc.
Embodiments of the disclosure may e.g. be useful in applications such as hearing aid systems, or other portable audio processing systems.
The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
The electronic hardware may include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
The present application relates to the field of hearing devices, e.g. hearing aids. A main task of a hearing aid is to increase a hearing-impaired user's intelligibility of speech content in a sound field surrounding the user in a given situation. This goal is pursued by applying a number of processing algorithms to one or more electric input signals (e.g. delivered by one or more microphones). Examples of such processing algorithms are algorithms for compressive amplification, noise reduction (including spatial filtering), feedback reduction, de-reverberation, etc.
EP3057335A1 deals with a binaural hearing system wherein processing of audio signals of respective left and right hearing devices is controlled in dependence of a (binaural) speech intelligibility measure of the processed signal. US20050141737A1 deals with a hearing aid comprising a speech optimization block adapted for selecting a gain vector representing levels of gain for respective frequency band signals, for calculating, based on the frequency band signals and the gain vector, a speech intelligibility index, and for optimizing the gain vector through iteratively varying the gain vector, calculating respective indices of speech intelligibility and selecting a vector that maximizes the speech intelligibility index. WO2014094865A1 deals with a method of optimizing a speech intelligibility measure by iteratively varying the applied gain in individual frequency bands of a hearing aid until a maximum is reached.
The hearing aid (HD) comprises an input unit (IU) for providing a number (e.g. a multitude, here one) of electric input signals, y, each representing sound in the environment of the user. The hearing aid (HD) further comprises a configurable signal processor (HAPU) for processing the electric input signal(s) according to a configurable parameter setting Θ of one or more processing algorithms, and providing a resulting (preferably optimized, e.g. processed) signal yres. The hearing aid (HD) comprises an output unit (OU) for providing stimuli representative of the (resulting) processed signal and perceivable as sound by the user. The input unit (IU), the signal processor (HAPU) and the output unit (OU) are operationally connected and form part of a forward path of the hearing aid.
The hearing aid (HD) further comprises a controller (CONT, cf. the dashed outline in the figure) for controlling the signal processor (HAPU).
An embodiment of the controller (CONT) operates as follows.
A speech intelligibility measure of one or more processed or un-processed signals is determined at successive points in time t.
The controller is configured to control the processor to provide that the resulting signal yres at a current point in time t is equal to one of the electric input signals y, in case a current value I(y) of the speech intelligibility measure I for the electric input signal y in question is larger than the desired value Ides of the speech intelligibility measure (cf. the statement ‘I(y(t))>Ides?’ being true, branch ‘Yes’).
In case the statement ‘I(y(t))>Ides?’ is false (branch ‘No’), i.e. if the speech intelligibility measure I of the number of electric input signals y is smaller than the desired value Ides, the controller is further configured to control the processor to provide the resulting signal yres at the current point in time t in dependence of a predefined criterion. The predefined criterion is related to characteristics of a first processed signal yp(Θ1) based on a first parameter setting Θ1 of the processing algorithm in question, e.g. a parameter setting that maximizes an SNR or an intelligibility measure. If, for example, the current value I(yp(Θ1)) of the speech intelligibility measure I for the first processed signal yp(Θ1) is smaller than or equal to the desired value Ides of the speech intelligibility measure I (cf. the respective units or process steps ‘Determine I(yp(Θ1,t))’ and ‘I(yp(Θ1,t))≤Ides?’, branch ‘Yes’), in other words if the processing algorithm cannot compensate sufficiently for noise in the input signal, an appropriate signal is chosen (cf. the unit or process step ‘Choose appropriate signal ysel. Set yres(t)=ysel(t)’), e.g. according to a predefined criterion, e.g. in dependence of the size of the difference Ides−I(yp(Θ1,t)), and time is advanced to the next time index (‘t=t+1’). The selectable signal ysel may e.g. comprise or be an information signal indicating to the user that the target signal is of poor quality (and difficult to understand). The controller may e.g. be configured to control the processor to provide that the selectable signal ysel, and thus the resulting signal yres, at the current point in time t is equal to one of the electric input signals y, or equal to the first processed signal yp(Θ1), e.g. attenuated and/or superposed by an information signal yinf.
In case the statement ‘I(yp(Θ1,t))≤Ides?’ is false (branch ‘No’), i.e. if the speech intelligibility measure I of the processed signal yp(Θ1,t) is larger than the desired value Ides, the controller is further configured to determine a second parameter setting Θ′ of the processing algorithm under the constraint that the second processed signal yp(Θ′) exhibits the desired value Ides of the speech intelligibility measure, and to control the processor to provide that the resulting signal yres at the current point in time t is equal to the second, optimized, processed signal yp(Θ′) (cf. respective units or process steps ‘Find Θ′ providing I(yp(Θ′,t))=Ides. Set yres=yp(Θ′,t)’, and advance time to the next time index ‘t=t+1’).
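By way of illustration only, the decision flow described above may be sketched as follows in Python. This is a minimal sketch of the three branches (bypass, fallback, optimize); the names select_resulting_signal, process and y_inf, and the grid search over a scalar trade-off parameter, are illustrative assumptions, not taken from the present disclosure.

```python
import numpy as np

def select_resulting_signal(I, y, process, I_des, y_inf=None):
    """Minimal sketch of the controller decision flow (names are illustrative).

    I       : callable estimating the speech intelligibility measure of a signal
    y       : (unprocessed) electric input signal at the reference microphone
    process : callable theta -> processed signal y_p(theta), theta in [0, 1],
              with theta = 1 the most aggressive (e.g. SNR-maximizing) setting
    I_des   : desired value of the speech intelligibility measure
    y_inf   : optional information signal indicating poor signal quality
    """
    # Branch 'Yes' of 'I(y(t)) > I_des?': bypass the processing algorithm.
    if I(y) > I_des:
        return y

    # Evaluate the first (most aggressive) parameter setting theta_1 = 1.
    y_p1 = process(1.0)
    if I(y_p1) <= I_des:
        # Even maximum processing cannot reach the goal: select a fallback
        # signal y_sel, here the processed signal plus an information signal.
        return y_p1 if y_inf is None else y_p1 + y_inf

    # Otherwise find a milder setting theta' with I(y_p(theta')) ~ I_des,
    # here by a coarse grid search (a bisection is sketched in section 2).
    for theta in np.linspace(0.0, 1.0, 21):
        y_p = process(theta)
        if I(y_p) >= I_des:
            return y_p
    return y_p1
```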
The first parameter setting Θ1 may e.g. be a setting that maximizes a signal to noise ratio (SNR) and/or the speech intelligibility measure I of the first processed signal yp(Θ1). The second (optimized) parameter setting Θ′ is e.g. a setting that (when applied by the one or more processing algorithms to process the number of electric input signals) provides the second (optimized) processed signal yp(Θ′), which yields the desired level of speech intelligibility to the user, as reflected in the desired value Ides of the speech intelligibility measure.
The one or more processing algorithms may e.g. be constituted by or comprise a single channel noise reduction algorithm. The single channel noise reduction algorithm is configured to receive a single electric signal, e.g. a signal from a (possibly omni-directional) microphone, or a spatially filtered signal, e.g. from a beamformer filtering unit. Alternatively or additionally, the one or more processing algorithms may be constituted by or comprise a beamformer algorithm for receiving a multitude of electric input signals, or processed versions thereof, and providing a spatially filtered, beamformed, signal. The controller (CONT) is configured to control the beamformer algorithm using specific beamformer settings. The first parameter setting Θ1 comprises a first beamformer setting, and the second parameter setting Θ′ comprises a second (optimized) beamformer setting. The first beamformer setting is e.g. determined based on the multitude of electric input signals and one or more control signals, e.g. from one or more sensors (e.g. including a voice activity detector), without specifically considering a value of the speech intelligibility measure of the current beamformed signal. The first parameter setting Θ1 may constitute or comprise a beamformer setting that maximizes a (target) signal to noise ratio (SNR) of the (first) beamformed signal.
In the following, the problem is illustrated by a beamforming (spatial filtering) algorithm. Beamforming/spatial filtering techniques provide the most efficient method for improving the speech intelligibility for hearing aid users in acoustically challenging environments. However, despite the benefits of beamformers in many situations, they come with negative side effects in other situations. The side effects include, e.g., a loss of loudness, a loss of ‘connectedness’ to the acoustic world, and distortion of the spatial cues of the processed noise (cf. the examples below).
In the following, we use the term “beamforming” to cover any process, where multiple sensor signals (microphones or otherwise) are combined (linearly or otherwise) to form an enhanced signal with more desirable properties than the input signals. We are also going to use the terms “beamforming” and “noise reduction” interchangeably.
It is known that the problems above involve a trade-off between the amount of noise reduction and the amount of side effects.
For example, for an acoustic situation with a single point target signal source and a single point-like noise source, a maximum-noise-reduction beamformer is able to essentially eliminate the noise source by placing a spatial zero in its direction. Hence, the noise is removed maximally, but the end-user experiences a loss of loudness and a loss of “connectedness” to the acoustic world, because the point noise source is not only suppressed to a level that e.g. allows easy speech comprehension, but is completely eliminated.
Similarly, for a binaural beamforming setup with a point target source in an isotropic (diffuse) noise field, a minimum-variance-distortion-less-response (MVDR) binaural beamformer is going to reduce the noise level quite significantly, but the spatial cues of the processed noise are modified in the process. Specifically, whereas the original noise sounds as if originating from all directions, the noise experienced after beamforming sounds as if originating from a single direction, namely the target direction.
The proposed solution to these problems lies in the observation that maximum noise reduction is often overkill in terms of speech comprehension. The end-user might have been able to understand the target speech without difficulty even if a milder noise reduction scheme had been applied, and a milder scheme would have caused far fewer of the side effects described above. Specifically, in the example with a target point source and an additive, point noise source, it could be sufficient to suppress the point noise source by, say, 6 dB to achieve a speech intelligibility of essentially 100%, rather than completely eliminating the noise point source. The idea of the proposed solution is to have the beamformer automatically find this desirable tradeoff and apply a noise reduction of 6 dB (for this situation) rather than eliminating the noise source. Furthermore, in situations where the general signal-to-noise ratio is already high enough that the user would understand speech without problems, the proposed beamformer would automatically detect this, and apply no spatial filtering.
In summary, the solution to the problem is to (automatically) find an appropriate tradeoff, namely the beamformer settings which lead to an acceptable speech intelligibility, but without overdoing the noise suppression.
In order to develop an algorithm that automatically determines the amount of spatial filtering/noise reduction necessary to achieve a sufficient speech intelligibility, a method is needed for judging the intelligibility of the signal to be presented to the user. To do so, the proposed solution relies on the very general assumption that the speech intelligibility I experienced by a (potentially hearing impaired) listener is some function ƒ( ) of the signal-to-noise ratios SNR(k,m,Φ,Θ) in relevant time-frequency tiles of the signal. The parameters k,m denote frequency and time, respectively. The variable Θ represents beamformer settings (or generally ‘processing parameters of a processing algorithm’), e.g. the beamformer weights W used to linearly combine microphone signals. Obviously, the SNR of the output signal of a beamformer is a function of the beamformer settings. The parameter Φ represents a model/characterization of the auditory capabilities of the individual in question. Specifically, Φ could represent an audiogram, i.e., the hearing loss of the user, measured at pre-specified frequencies. Alternatively, it could represent the hearing threshold as a function of time and frequency, e.g. as estimated by an auditory model. The fact that the SNR is defined as a function of Φ anticipates that a potential hearing loss may be modelled as an additive noise source (in addition to any acoustic noise) which also degrades intelligibility; hence, we often refer to the quantity SNR(k,m,Φ,Θ) as an apparent SNR [5].
Hence, we have
I=ƒ(SNR(k,m,Φ,Θ))
Generally, the function ƒ( ) is monotonically increasing with the SNR (SNR(k,m,Φ,Θ)) in each of the time-frequency tiles.
A well-known special case of this expression is the Extended Speech Intelligibility Index (ESII) [10], which may be approximated as (cf. [2]):

I ≈ (1/M′) Σm Σk γk·SNR(k,m,Φ,Θ), m=1, . . . , M′, k=1, . . . , K,

where γk denote so-called band-importance functions, SNR(k,m,Φ,Θ) is the (apparent) SNR in time-frequency tile (k,m) (suitably normalized, e.g. clipped to a finite range), and where M′ represents the number of time frames containing speech considered (e.g. corresponding to a recent syllable, or a word, or an entire sentence), and where K is the number of frequency bands considered. The frames containing speech may e.g. be identified by a voice (speech) activity detector, e.g. applied to one or more of the electric input signals.
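By way of illustration only, an ESII-style estimate of the form above may be sketched as follows. The clipped (SNRdB+15)/30 audibility mapping follows the common SII convention and stands in for the omitted normalization; all function and variable names are illustrative.

```python
import numpy as np

def esii_estimate(snr_db, gamma):
    """Approximate ESII-style intelligibility from per-tile apparent SNRs.

    snr_db : (K, M') array of apparent SNRs in dB, one per frequency band k
             and speech-active time frame m
    gamma  : (K,) band-importance functions, assumed normalized to sum to 1
    Returns a scalar in [0, 1].
    """
    # Map each tile SNR to a band audibility in [0, 1]; the +15/30 mapping
    # follows the SII convention (an assumption about the omitted equation).
    audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
    # Weight by band importance and average over the M' speech frames.
    return float(np.mean(gamma @ audibility))

# Example: K = 4 bands, M' = 3 speech-active frames
gamma = np.array([0.1, 0.3, 0.4, 0.2])
snr_db = np.array([[20., 15., 10.],
                   [ 5.,  0., -5.],
                   [10., 10., 10.],
                   [-10., 0., 30.]])
print(esii_estimate(snr_db, gamma))
```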
In an embodiment, a first part of the processing (e.g. the processing of the electric input signals to provide first beamformer settings Θ(k′,m)) is applied in individual frequency bands with a first frequency resolution, represented by a first frequency index k′, and a second part of the processing (e.g. the determination of a speech intelligibility measure I for use in modifying the first beamformer settings Θ(k′,m) to optimized beamformer settings Θ′(k′,m), which provide a desired speech intelligibility Ides) is applied in individual frequency bands with a second (different, e.g. lower) frequency resolution, represented by a second frequency index k (see the discussion of the two frequency index scales k and k′ below).
The basic idea is based on the following observations: Often, a noise reduction milder than the maximum achievable suffices to reach the desired speech intelligibility, and the mildest parameter setting reaching the desired value Ides should then be preferred, as it introduces the fewest side effects. Furthermore, should it happen that the speech intelligibility of the unprocessed signal (the electric input signal(s)) already exceeds the desired value Ides, no beamforming should be applied.
In the following, an example of a particular implementation of the basic idea described above is given. First, we outline, by way of example, how to compute SNR(k,m,Φ,Θ) for a given beamformer setting (section 1). To be able to explain this idea clearly, we use a simple example beamformer. The output of this example beamformer is a linear combination of the output of a minimum variance distortion-less response (MVDR) beamformer, and the noisy signal as observed at a pre-defined reference microphone. The coefficient in the linear combination controls the “aggressiveness” of the example beamformer. It is emphasized that this simple beamformer only serves as an example. The proposed idea is much more general and can be applied to other beamformer structures and to combinations of beamformers and single-microphone noise reduction systems, and to other processing algorithms, etc.
Next, we outline how to find the beamformer settings Θ, which achieve a pre-specified, desired intelligibility level, without unnecessarily over-suppressing the signal (section 2). As before, this description uses elements of the example beamformer introduced in section 1.
However, as before, the basic idea applies in a more general setting involving other types of beamformers, single-microphone noise reduction systems, etc.
1. SNR as Function of Beamformer Setting—Example
In this section we outline, by way of example, how to compute SNR(k,m,Φ,Θ) for a given beamformer setting.
Let us assume that an M-microphone hearing aid system is operated in a noisy environment. Specifically, let us assume that the rth microphone signal is given by

yr(n)=xr(n)+vr(n), r=1, . . . , M,

where yr(n), xr(n) and vr(n) denote the noisy, clean target, and noise signal, respectively, observed at the rth microphone. Let us assume that each microphone signal is passed through some analysis filterbank, leading to filter bank signals Y(k,m)=[Y1(k,m) . . . YM(k,m)]T, where k and m denote a subband index and a time index, respectively, and superscript T denotes transposition. We define the vectors X(k,m)=[X1(k,m) . . . XM(k,m)]T and V(k,m)=[V1(k,m) . . . VM(k,m)]T in a similar manner.
Let us, for the sake of the example, assume that we are going to apply a linear beamformer W(k,m)=[W1(k,m) . . . WM(k,m)]T to the noisy observations Y(k,m)=[Y1(k,m) . . . YM(k,m)]T to form an enhanced output

X̂(k,m)=WH(k,m)Y(k,m).
Let d′(k,m)=[d′1(k,m) . . . d′M(k,m)] denote the acoustic transfer function from the target source to each microphone, and let

d(k,m)=[d′1(k,m)/d′i(k,m) . . . d′M(k,m)/d′i(k,m)]

denote the relative acoustic transfer function with respect to the ith (reference) microphone [1]. Furthermore, let
CV(k,m)=E[V(k,m)VH(k,m)]

denote the cross-power spectral density matrix of the noise. For later convenience, let us factorize CV(k,m) as [6],

CV(k,m)=λV(k,m)ΓV(k,m),
where λV(k,m) is the power spectral density of the noise at the reference microphone (the ith microphone), and ΓV(k,m) is the noise covariance matrix, normalized so that element (i,i) equals one, cf. [6].
With these definitions, we are in a position to specify in further detail our example beamformer. Let us assume that our example beamformer W(k,m) is of the form,
W(k,m,αk,m)=αk,mWMVDR(k,m)+(1−αk,m)ei,
where

WMVDR(k,m)=ΓV−1(k,m)d(k,m)/(dH(k,m)ΓV−1(k,m)d(k,m))

denotes the weight vector of a minimum variance distortion-less response beamformer, and the vector

ei=[0 . . . 1 . . . 0],
where the 1 is located at index i (corresponding to the reference microphone), and 0≤αk,m≤1 is a trade-off parameter, which determines the “aggressiveness” of the beamformer. Instead of the linear combination of the MVDR beamformer (WMVDR) with an omni-directional beamformer (ei) as proposed in this example, the aggressiveness of the beamformer may alternatively e.g. be defined by different sets of beamformer weights (Wz, z=1, . . . , Nz, where Nz is the number of different degrees of aggressiveness of the beamformer). With αk,m=1, W(k,m) is identical to an MVDR beamformer (i.e., the most “aggressive” beamformer that can be used in this example), while with αk,m=0, W(k,m) does not apply any spatial filtering, so that the output of the beamformer is identical to the signal at the reference microphone (e.g. corresponding to the electric input signal from an omni-directional microphone).
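By way of illustration only, the computation of the trade-off beamformer W(k,m,αk,m) for a single time-frequency tile may be sketched as follows; function and argument names are illustrative.

```python
import numpy as np

def tunable_beamformer(d, Gamma_v, alpha, i_ref=0):
    """W(k,m,alpha) = alpha * W_MVDR + (1 - alpha) * e_i for one TF tile.

    d       : (M,) relative acoustic transfer function w.r.t. the reference mic
    Gamma_v : (M, M) normalized noise covariance matrix (element (i,i) = 1)
    alpha   : trade-off in [0, 1]; 1 = full MVDR, 0 = reference-mic passthrough
    """
    Gv_inv_d = np.linalg.solve(Gamma_v, d)
    w_mvdr = Gv_inv_d / (d.conj() @ Gv_inv_d)   # MVDR weight vector
    e_i = np.zeros_like(d)
    e_i[i_ref] = 1.0                            # selects the reference microphone
    return alpha * w_mvdr + (1.0 - alpha) * e_i
```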
With this example beamformer system in place, we can find the link between the beamformer settings (αk,m in this example) and the resulting SNR(k,m,Φ,Θ). Here, we have introduced the additional parameter Θ, which represents the parameter set of the beamformer system, i.e., Θ={αk,m}, to indicate explicitly that the resulting SNR is a function of the beamformer setting.
To estimate SNR(k,m,Φ,Θ), the following procedure may be applied (specific maximum likelihood (ML) estimates are applied below; obviously, many other options exist). First, estimate the input target and noise power spectral densities at the reference microphone, λ̂x,MLin(k,m) and λ̂v,MLin(k,m), and form the input SNR

SNR(k,m,Φ)=max(λ̂x,MLin(k,m)/λ̂v,MLin(k,m),ε),

where ε is a small positive constant. Next, determine the target and noise power at the beamformer output,

λ̂x,MLout(k,m)=λ̂x,MLin(k,m)|WH(k,m,αk,m)d(k,m)|2,

λ̂v,MLout(k,m)=λ̂v,MLin(k,m)WH(k,m,αk,m)ΓV(k,m)W(k,m,αk,m).

To account for the hearing profile Φ of the user, the hearing threshold T(k,m) may be included as an additional, apparent noise source, e.g. as

λ̂v,Appout(k,m)=max(λ̂v,MLout(k,m),T(k,m)),

or

λ̂v,Appout(k,m)=λ̂v,MLout(k,m)+T(k,m).

Finally, the apparent output SNR is given by

SNR(k,m,Φ,Θ)=max(λ̂x,MLout(k,m)/λ̂v,Appout(k,m),ε).
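By way of illustration only, the above procedure for a single time-frequency tile may be sketched as follows; the function name and the scalar hearing-threshold argument T are illustrative.

```python
import numpy as np

def apparent_output_snr(lam_x_in, lam_v_in, w, d, Gamma_v, T, eps=1e-10):
    """Apparent output SNR for one TF tile, following the steps above.

    lam_x_in, lam_v_in : input target / noise PSD estimates at the reference mic
    w       : (M,) beamformer weights W(k,m,alpha)
    d       : (M,) relative acoustic transfer function
    Gamma_v : (M, M) normalized noise covariance matrix
    T       : hearing threshold of the user in this tile (from the profile Phi)
    """
    lam_x_out = lam_x_in * np.abs(w.conj() @ d) ** 2           # target power out
    lam_v_out = lam_v_in * np.real(w.conj() @ Gamma_v @ w)     # noise power out
    lam_v_app = max(lam_v_out, T)   # hearing loss modelled as extra noise floor
    return max(lam_x_out / lam_v_app, eps)
```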
2. How to Find the Beamformer Settings which Achieve a Pre-Specified, Desired Intelligibility Level, without Unnecessarily Over-Suppressing the Signal—Example
We now outline a procedure to find the desired beamformer settings Θ which achieve a desired speech intelligibility level. In principle, the search for these settings may be divided into the following three situations: a) the unprocessed electric input signal(s) already provide the desired intelligibility, in which case no beamforming need be applied; b) not even the most aggressive beamformer setting can provide the desired intelligibility, in which case an appropriate fallback signal is selected (e.g. including an information signal to the user); and c) the desired intelligibility lies between that of the unprocessed signal and that of the most aggressive setting, in which case the mildest setting providing the desired intelligibility is determined.
Let us assume that a value Idesired (corresponding to Ides above) reflecting the desired level of speech intelligibility is available. This value could, for example, have been established when the hearing aid system was fitted by the audiologist. Then, the proposed approach may be outlined as follows.
1) Determine the speech intelligibility measure of the (unprocessed) electric input signal; if it exceeds Idesired, set the resulting signal equal to the unprocessed signal and apply no beamforming (cf. step A3 below).
2) Otherwise, determine the speech intelligibility measure of the output of the most aggressive beamformer setting (e.g. αk,m=1, i.e. the MVDR beamformer); if this is still smaller than or equal to Idesired, select an appropriate fallback signal ysel, e.g. superposed by an information signal (cf. step B4 below).
3) Otherwise, find the mildest setting (e.g. the smallest αk,m) for which the speech intelligibility measure of the beamformed signal equals Idesired, and set the resulting signal equal to the corresponding processed signal (cf. steps C1-C2 below, and the bisection sketch following this list).
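By way of illustration only, step 3) may be implemented as a bisection over the trade-off parameter αk,m, exploiting that the intelligibility measure is monotonically increasing in the amount of beamforming; names and the tolerance are illustrative.

```python
def least_aggressive_alpha(intelligibility, I_des, tol=1e-3):
    """Find the smallest alpha in [0, 1] whose intelligibility reaches I_des.

    intelligibility : callable alpha -> I(y_p(alpha)); assumed monotonically
                      non-decreasing in alpha (more beamforming, higher I)
    Assumes intelligibility(0) < I_des <= intelligibility(1); the two boundary
    situations (steps 1 and 2 above) should be handled beforehand.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if intelligibility(mid) >= I_des:
            hi = mid        # mid is aggressive enough; try something milder
        else:
            lo = mid        # mid is too mild; need more noise reduction
    return hi
```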
The general function of these elements is as discussed in connection with the embodiments described above.
The input unit (IU) comprises a multitude (≥2) of microphones (M1, . . . , MM), each providing an electric input signal yr, r=1, . . . , M, each representing sound in the environment of the hearing aid (or the user wearing the hearing aid). The input unit (IU) may e.g. comprise analogue to digital converters and time domain to frequency domain converters (e.g. filter banks) as appropriate for the processing algorithms and analysis and control thereof.
The signal processor (HAPU) is configured to execute one or more processing algorithms. The signal processor (HAPU) comprises a beamformer filtering unit (BF) and is configured to execute a beamformer algorithm. The beamformer filtering unit (BF) receives the multitude of electric input signals yr, r=1, . . . , M from the input unit (IU), or processed versions thereof, and is configured to provide a spatially filtered, beamformed, signal yBF. The beamformer algorithm, and thus the beamformed signal, is controlled by beamformer parameter settings Θ. A default first parameter setting Θ1 of the beamformer algorithm is e.g. determined based on the multitude of electric input signals yr, r=1, . . . , M, and optionally one or more control signals (det1, det2, . . . , detND), e.g. from one or more sensors (e.g. including a voice activity detector), to maximize a signal to noise ratio of the beamformed signal yBF, with or without specifically considering a value of the speech intelligibility measure I of the current beamformed signal yBF. The first parameter setting Θ1, and/or the beamformed signal yBF(Θ1) based thereon, is/are fed to the control unit (CONT) together with at least one (here all) of the electric input signals yr, r=1, . . . , M. An estimate of the intelligibility I(yBF(Θ)) of the beamformed signal yBF(Θ) based on the first parameter setting Θ1 (and the user's hearing profile, e.g. reflecting an impairment, Φ) is provided by the speech intelligibility estimator (ESI) of the controller.
Examples of Processing Algorithms that May Benefit from the Proposed Scheme:
Beamforming (e.g. monaural beamforming) is—as described in the above example—an important candidate for use of the processing optimization scheme of the present disclosure. The first parameter setting Θ1 and the optimized parameter setting Θ′ (provided by the proposed scheme) typically include frequency and time dependent beamformer weights W(k,m).
Another processing algorithm is binaural beamforming, where beamformer weights WL and WR for a left and right hearing aid, respectively, are optimized according to the present disclosure, e.g. according to the present scheme:
WL=αk,mWL,mvdr+(1−αk,m)eL

WR=αk,mWR,mvdr+(1−αk,m)eR

where WL,mvdr and WR,mvdr denote the weight vectors of minimum variance distortion-less response beamformers of the left and right hearing aids, respectively, and the vectors eL and eR have the form

ex,i=[0 . . . 1 . . . 0],
where x=L, R, and the 1 is located at index i (corresponding to a reference microphone), and where 0≤αk,m≤1 is a trade-off parameter, which determines the “aggressiveness” of the beamformer.
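By way of illustration only, the binaural weights with a common trade-off parameter αk,m (shared between the two sides, as in the equations above) may be sketched as follows; all names are illustrative.

```python
import numpy as np

def binaural_weights(w_mvdr_L, w_mvdr_R, alpha, i_ref_L=0, i_ref_R=0):
    """W_L and W_R with a common trade-off alpha, as in the equations above."""
    e_L = np.zeros_like(w_mvdr_L)
    e_L[i_ref_L] = 1.0                # reference microphone, left hearing aid
    e_R = np.zeros_like(w_mvdr_R)
    e_R[i_ref_R] = 1.0                # reference microphone, right hearing aid
    w_L = alpha * w_mvdr_L + (1.0 - alpha) * e_L
    w_R = alpha * w_mvdr_R + (1.0 - alpha) * e_R
    return w_L, w_R
```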
Still another processing algorithm is single channel noise reduction, where relevant parameter settings (Θ, Θ′) would include weights gk′,m applied to each time-frequency tile, e.g. of a beamformed signal, where the frequency index k′ has a finer resolution than the frequency index k of the speech intelligibility estimate I (cf. the two frequency index scales discussed below).
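By way of illustration only, applying gains optimized in K coarse bands to a signal represented in K′ finer bands may be sketched as follows; the band_of index map is an illustrative assumption about how the two resolutions are linked.

```python
import numpy as np

def apply_band_gains(Y, g_coarse, band_of):
    """Apply gains optimized in K coarse bands to K' fine-resolution tiles.

    Y        : (K', M) time-frequency representation of the (beamformed) signal
    g_coarse : (K, M) gains g(k, m) from the intelligibility-driven optimization
    band_of  : (K',) index map assigning each fine band k' to a coarse band k
    """
    g_fine = g_coarse[band_of, :]   # broadcast coarse gains onto fine bands
    return Y * g_fine
```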
In an analogue to digital (AD) process, a digital sample y(n) has a length in time of 1/fs, e.g. 50 μs, for fs=20 kHz. A number of (audio) samples Ns are e.g. arranged in a time frame, as schematically illustrated in the figure.
The two frequency index scales k and k′ represent two different levels of frequency resolution (a first, higher (index k′), and a second, lower (index k) frequency resolution). The two frequency scales may e.g. be used for processing in different parts of the processor or controller. In an embodiment, the controller (CONT) operates in the second, lower frequency resolution (index k), while the forward path operates in the first, higher frequency resolution (index k′).
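By way of illustration only, the conversion from K′ fine frequency sub-bands to K coarser bands (the ‘band sum’ (BS) operation used by the analysis paths described below) may be sketched as follows; names are illustrative.

```python
import numpy as np

def band_sum(P_fine, band_of, K):
    """'Band sum' (BS): convert K' fine sub-band powers to K coarse bands.

    P_fine  : (K', M) per-tile powers (e.g. |Y(k', m)|^2)
    band_of : (K',) index of the coarse band k that each fine band k' maps to
    K       : number of coarse bands
    """
    P_coarse = np.zeros((K, P_fine.shape[1]))
    np.add.at(P_coarse, band_of, P_fine)  # sum fine bands within each coarse band
    return P_coarse
```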
The first (upper) signal path of the forward path provides a first processed signal

yp(k′,m,Θ1)=Y(k′,m)·g(k′,m,Θ1).
The second (middle) signal path of the forward path provides a second (optimized) processed signal

yp(k′,m,Θ′)=Y(k′,m)·g(k′,m,Θ′).
A given parameter setting Θ (comprising individual g(k′,m,Θ)=gΘ(k′,m)) is thus calculated in each time-frequency unit (k′,m), cf. the hatched rectangle in the figure.
The third (lower) signal path of the forward path provides the (unprocessed) electric input signal Y(k′,m), allowing the resulting signal yres to be equal to one of the electric input signals (cf. the case I(y(t))>Ides discussed above).
The controller (CONT), cf. the dashed outline comprising two separate analysis paths and an adjustment unit (ADJ), provides the second (optimized) parameter setting Θ′ to the processor (HAPU). Each analysis path comprises a ‘band sum’ unit (BS) for converting K′ frequency sub-bands to K frequency sub-bands (indicated by K′->K), thus providing respective input signals in K frequency bands (TF-units (k,m)). Each analysis path further comprises a speech intelligibility estimator (ESI) for providing an estimate of a user's intelligibility of speech I (in K frequency sub-bands) in the input signal in question. The first analysis path determines the speech intelligibility measure of the (unprocessed) electric input signal, and the second determines that of the processed signal.
The information unit (INF) (e.g. forming part of the signal processor (HAPU)) provides an information signal yinf (either as a time domain signal, or as a time-frequency domain (frequency sub-band) signal Yinf), which is configured to indicate to the user a status of the present acoustic situation regarding the estimated speech intelligibility I, in particular (or solely) in case the intelligibility is estimated to be sub-optimal (e.g. below the desired speech intelligibility measure Ides, or below a (first) threshold value Ith). The information signal may contain a spoken message (e.g. stored in a memory of the hearing device or generated by an algorithm).
The further processing unit (FP) provides further processing of the resulting signal Yres(k′,m) and provides a further processed signal Y′res(k′,m) in K′ frequency sub-bands. The further processing may e.g. comprise the application of a frequency and/or level dependent gain (or attenuation) g(k′,m) to the resulting signal Yres(k′,m) to compensate for a hearing impairment of the user (or to further compensate for a difficult listening situation of a normally hearing user), according to a hearing profile Φ of the user.
The additional inputs from internal or external sensors (e.g. speech (voice) activity detectors, and/or other detectors, e.g. optical detectors or bio-sensors) are not shown in the figure.
A1. Determine SNR for an electric input signal yref received at a reference microphone;
A2. Determine a measure I of a user's speech intelligibility I(yref) of the unprocessed electric input signal yref;
A3. If I(yref)>Ides, where Ides is a desired value of the speech intelligibility measure I, set yres=yref, and do not apply the processing algorithm;
otherwise
B1. Determine beamformer filtering weights w (M×1) (first parameter setting Θ1) for a maximum SNR beamformer (e.g. an MVDR beamformer):

w=CV−1d/(dHCV−1d),

where CV=CV(k,m) is the noise cross-power spectral density matrix and d=d(k,m) is the relative acoustic transfer function of the target signal (cf. section 1 above).
(A beamformed signal (processed signal yp(Θ1)=yp(w)), representing an estimate Ŝ(1×1) of the target (speech) signal S of current interest to the user may then be determined by Ŝ=wH Y, where Y is the noisy input signal (M×1). The expression for the (maximum SNR) estimate Ŝ of the target signal may e.g. be provided in a time-frequency representation, i.e. a value of Ŝ for each time frequency tile (k′,m)).
B2. Determine the output SNR of the maximum SNR beamformer (processed signal yp(Θ1)):

SNRmax-SNR=ƒ(w,CV,d),

where ƒ(⋅) represents the functional relationship between the beamformer weights and the resulting (apparent) output SNR, e.g. as outlined in section 1 above.
B3. Determine an estimated speech intelligibility

Imax-SNR=f′(SNRmax-SNR),

where f′(⋅) represents a functional relationship.
B4. If Imax-SNR (=I(yp(Θ1))) ≤ Ides (path ‘Yes’ above), even the maximum SNR beamformer cannot provide the desired intelligibility; choose an appropriate signal ysel and set yres=ysel, e.g. the processed signal attenuated and/or superposed by an information signal;
C1. If Imax-SNR (=I(yp(Θ1))) > Ides (path ‘No’ above), determine a second (optimized) parameter setting Θ′ under the constraint that I(yp(Θ′))=Ides;
C2. Set yres=yp(Θ′).
Preferably, the parameter setting Θ′ (k′,m) is determined in a finer frequency resolution k′ than the speech intelligibility measure I(k,m).
Example, Noise Reduction Control Based on an Estimate of Speech Intelligibility:
In an aspect of the present disclosure, the speech intelligibility measure is based on predictability. Highly predictable parts of an audio signal carry less information than parts of the audio signal with a lower predictability. One way to estimate intelligibility based on predictability is to weight frames in time and frequency higher if the frames are less predictable from the surrounding frames.
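By way of illustration only, a predictability-based weighting of time-frequency frames may be sketched as follows. The neighbour-averaging predictor is an illustrative assumption; any predictor (e.g. linear prediction over past frames) could be substituted.

```python
import numpy as np

def predictability_weights(P, eps=1e-10):
    """Weight TF frames by how poorly they are predicted by their neighbours.

    P : (K, M) per-tile log-powers of the audio signal
    Returns (K, M) weights in [0, 1]; unpredictable (information-rich) tiles
    receive weights close to 1.
    """
    # Predict each frame as the mean of its two temporal neighbours
    # (np.roll wraps at the edges, acceptable for a sketch).
    pred = 0.5 * (np.roll(P, 1, axis=1) + np.roll(P, -1, axis=1))
    err = np.abs(P - pred)                # large error = hard to predict
    return err / (np.max(err) + eps)      # normalize prediction error to [0, 1]
```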
A conceptual block diagram of the proposed joint design is shown in the figure.
At any given time, only one beamformer-postfilter pair is connected to the electric input signals in the circuit (cf. ‘microphone array signals’ connected to ‘Beamformer 1’ and ‘Postfilter 1’ via a switch in the figure).
In practice, it may not be desirable to implement several beamformers and postfilters in hardware. A more practical block diagram that encompasses the above idea is shown in the figure.
To avoid unpleasant artifacts during switching from one beamformer-postfilter pair to another, the transition between the respective settings may be smoothed, e.g. by gradually fading from one setting to the other.
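By way of illustration only, such a smooth transition may be sketched as a cross-fade between two weight settings; the cross-fade mechanism and all names are illustrative assumptions.

```python
def crossfade_settings(w_old, w_new, n_steps=64):
    """Gradually interpolate between two beamformer weight settings.

    Yields intermediate weight vectors, to be applied over successive frames,
    so the output never jumps abruptly from one beamformer-postfilter pair
    to another. w_old and w_new may be numpy arrays of equal shape.
    """
    for s in range(1, n_steps + 1):
        c = s / n_steps                       # fade coefficient in (0, 1]
        yield (1.0 - c) * w_old + c * w_new   # convex combination of settings
```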
It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but intervening elements may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
Accordingly, the scope should be judged in terms of the claims that follow.
Number | Date | Country | Kind |
---|---|---|---|
17195685.7 | Oct 2017 | EP | regional |