The present disclosure relates to a hearing system, e.g. comprising one or more hearing devices, e.g. headsets, earphones or hearing aids, in particular to individualization of a multi-channel noise reduction system exploiting and extending a database comprising a dictionary of acoustic transfer functions, e.g. relative acoustic transfer functions (RATF). The present disclosure further relates to an equivalent method of operating a hearing system.
An essential part of multi-channel noise reduction systems (such as minimum variance distortionless response (MVDR), Multichannel Wiener Filter (MWF), etc.) in hearing devices is to have access to the relative acoustic transfer function (RATF) for the source of interest. Any mismatch between the true RATF and the RATF employed in the noise reduction system may lead to distortion and/or suppression of the signal of interest.
A Hearing System:
In an aspect of the present application, a hearing system (e.g. comprising at least one hearing device, e.g. a hearing aid) configured to be worn by a user is provided. The hearing system comprises
The processor may be configured
The processor may be further configured to
Thereby an improved noise reduction system may be provided.
The present disclosure relates to dynamically estimating appropriate acoustic transfer functions during use of a hearing device, e.g. to account for possible changes in distances between microphones, different placement of the hearing device on the user's head, resulting in different locations of the microphones relative to a target sound source (e.g. the user's mouth), etc. The term ‘current values of the electric input signals’ is intended to mean values of the signals during (normal) use of the hearing system.
The term ‘unconstrained’ is in the present context taken to mean that the estimate of a current value of an acoustic transfer function vector (ATFuc,cur) is independent of the stored (previously determined) values of acoustic transfer function vectors (ATFpd) of the dictionary (Δpd). The unconstrained estimate of a current acoustic transfer function vector (ATFuc,cur) depends on current values of at least one (e.g. all) of the current electric input signals from the M microphones, and optionally on current values of other signals (e.g. from a contralateral hearing device, and/or from one or more detectors or sensors).
The term ‘constrained’ is in the present context taken to mean that the estimate of a current value of an acoustic transfer function vector (ATFpd,cur) is dependent on stored (previously determined) values of acoustic transfer function vectors (ATFpd) of the dictionary (Δpd).
The ‘unconstrained’ estimate of a current value of an acoustic transfer function vector (ATFuc,cur) as well as the ‘constrained’ estimate of a current value of an acoustic transfer function vector (ATFpd,cur) are in the present context both (automatically) determined by the hearing device during (normal) use of the hearing device (e.g. when mounted on the user as intended, and powered up in a mode intended for use).
The confidence measure may be related to the target sound signal impinging on the microphone system (i.e. the target signal as captured from the target sound source by the microphone system), e.g. to an estimated quality of the target signal.
The confidence measure is intended to be automatically determined by the hearing aid during (normal) use.
The (hearing system may be configured to provide that the) confidence measure may comprise at least one of
The hearing system may comprise a target signal quality estimator configured to provide said target-signal-quality-measure indicative of a signal quality of a target signal from said target sound source in dependence of (current values of) at least one of said M electric input signals or a signal or signals originating therefrom.
The signal quality estimator may be constituted by or comprise a signal-to-noise-ratio (SNR) estimator. The target signal quality measure may be a signal-to-noise-ratio (SNR) of at least one of the (current values of the) M electric input signals or a signal or signals originating therefrom (e.g. a beamformed signal). The SNR estimator may e.g. rely on the identification of a target signal source, e.g. comprising speech (e.g. from a particular direction). The SNR estimator may e.g. comprise a voice activity detector for estimating whether or not (or with what probability) an input signal comprises a voice signal (at a given point in time). Thereby a noise level can be estimated during speech pauses. A signal-to-noise-ratio (SNR) estimator is e.g. disclosed in US20190378531A1.
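By way of illustration, a minimal sketch (in Python) of such an SNR estimator is given below; it assumes a voice activity detector providing a per-frame speech probability and updates a running noise estimate during speech pauses. The function and parameter names (estimate_snr_db, vad_speech_prob, alpha) are illustrative assumptions and do not refer to the estimator of US20190378531A1.

```python
import numpy as np

def estimate_snr_db(power_spec, vad_speech_prob, noise_psd, alpha=0.9):
    """power_spec: current magnitude-squared spectrum (one value per frequency bin).
    vad_speech_prob: probability in [0, 1] that the current frame contains speech.
    noise_psd: running noise power estimate (numpy array, updated in place)."""
    if vad_speech_prob < 0.5:                        # speech pause: track the noise level
        noise_psd[:] = alpha * noise_psd + (1.0 - alpha) * power_spec
    signal_psd = np.maximum(power_spec - noise_psd, 1e-12)
    return 10.0 * np.log10(signal_psd / np.maximum(noise_psd, 1e-12))
```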
Other signal quality estimators may e.g. be based on signal level estimation, speech intelligibility estimation, modulation index estimation, etc.
The hearing system may comprise an ATF-vector-comparator configured to provide an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the constrained estimate and the unconstrained estimate of a current acoustic transfer function vector (ATFpd,cur, ATFuc,cur), respectively. The ATF-vector-comparator may be configured to apply a distance measure (e.g. a Euclidean distance) to the respective ATF-vectors, e.g. to compare a distance between coordinates of their end-points assuming identical starting points of the two vectors (or vice versa).
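A minimal sketch of such an ATF-vector-comparator is shown below (Python), mapping a Euclidean distance between two ATF vectors to a matching measure in the interval [0,1]; the mapping 1/(1+d) is an illustrative choice and is not prescribed by the present disclosure.

```python
import numpy as np

def atf_matching_measure(atf_a, atf_b):
    # Euclidean distance between the two (possibly complex-valued) ATF vectors
    d = np.linalg.norm(np.asarray(atf_a) - np.asarray(atf_b))
    return 1.0 / (1.0 + d)      # 1 for identical vectors, approaching 0 for distant ones
```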
The hearing system may comprise a location estimator configured to provide said target-sound-source-location-identifier. The location estimator may be configured to provide the target-sound-source-location-identifier in dependence of at least one of
The unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) may be used as the resulting acoustic transfer function vector (ATF*) (for said user), if a first criterion depending on said target-signal-quality-measure is fulfilled. The hearing device may be configured to provide that the constrained estimate of the current acoustic transfer function vector (ATFpd,cur) is used as the resulting acoustic transfer function vector (ATF*) for the user, if the first criterion depending on said target-signal-quality-measure is NOT fulfilled.
The first criterion may e.g. comprise that the target signal quality measure (TQM) is larger than a first threshold value (TQMth1).
The unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) may be used as the resulting acoustic transfer function vector (ATF*) (for said user), if a first criterion depending on said acoustic-transfer-function-vector-matching-measures is fulfilled. The first criterion may e.g. comprise that the acoustic-transfer-function-vector-matching-measure (ATF-MMuc) for the unconstrained estimate of a current acoustic transfer function vector is larger than the acoustic-transfer-function-vector-matching-measure (ATF-MMpd) for the constrained estimate of a current acoustic transfer function vector, e.g. that the difference is larger than a minimum value (e.g. ΔATF = ATF-MMuc − ATF-MMpd ≥ 10% of ATF-MMpd). A large value of a respective acoustic-transfer-function-vector-matching-measure (ATF-MMuc, ATF-MMpd) is intended to reflect a high degree of matching. The acoustic-transfer-function-vector-matching-measure(s) may assume values between 0 and 1 and reflect a degree of matching (‘1’ being e.g. associated with perfect matching).
The first criterion may depend on the target-signal-quality-measure AND the acoustic-transfer-function-vector-matching-measures.
The first criterion may depend on the target-signal-quality-measure AND the target-sound-source-location-identifier.
The first criterion may depend on the acoustic-transfer-function-vector-matching-measures AND the target-sound-source-location-identifier.
The first criterion may depend on the target-signal-quality-measure AND the acoustic-transfer-function-vector-matching-measures AND the target-sound-source-location-identifier.
The resulting acoustic transfer function vector (ATF*) for the user may be determined as a mixture of said constrained estimate of the current acoustic transfer function vector (ATFpd,cur) and said unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) in dependence of said target signal quality measure and/or said acoustic-transfer-function-vector-matching-measure. The mixture may be a weighted mixture. The target signal quality measure (TQM) and/or the acoustic-transfer-function-vector-matching-measures (ATF-MMuc, ATF-MMpd) may be normalized (N) to take on values only in an interval between 0 and 1 (i.e. 0≤TQMN≤1; 0≤ATF-MMuc,N≤1; 0≤ATF-MMpd,N≤1), where 1 represents a high signal quality or degree of matching and 0 represents a low target signal quality or degree of matching, respectively. The resulting acoustic transfer function vector (ATF*) (for given electric input signals at a given point in time) may e.g. be determined as ATF*=ATFuc,cur·TQMN+ATFpd,cur·(1−TQMN), when the mixture is exemplified by a dependence of the target signal quality measure (TQMN) (only).
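The mixture rule above may e.g. be implemented as in the following sketch (Python), which directly transcribes ATF* = ATFuc,cur·TQMN + ATFpd,cur·(1−TQMN); the normalization by clipping to [0,1] is an assumption made for illustration.

```python
import numpy as np

def resulting_atf(atf_uc_cur, atf_pd_cur, tqm):
    tqm_n = float(np.clip(tqm, 0.0, 1.0))   # normalized target signal quality measure in [0,1]
    return tqm_n * np.asarray(atf_uc_cur) + (1.0 - tqm_n) * np.asarray(atf_pd_cur)
```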
The database (Θ) may comprise a sub-dictionary (Δpd,std) of previously determined, standard acoustic transfer function vectors (ATFpd,std). The sub-dictionary (Δpd,std) of previously determined, standard acoustic transfer function vectors (ATFpd,std) may e.g. comprise non-personalized acoustic transfer function vectors, e.g. from a standard database (like the KEMAR HRTF database of [Gardner and Martin, 1994]), e.g. recorded using a model of a human head (e.g. the Head and Torso Simulator (HATS) 4128C from Brüel & Kjær Sound & Vibration Measurement A/S, or the head and torso model KEMAR from GRAS Sound and Vibration A/S), or recorded on one or more natural persons (e.g. not including the user), or a mixture thereof.
The unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) may be stored in a sub-dictionary (Δpd,tr) of said database, if a second criterion is fulfilled. The second criterion may e.g. depend on the target signal quality measure and/or the acoustic-transfer-function-vector-matching-measure (and possibly further parameters, e.g. the target-sound-source-location-identifier). The second criterion may e.g. comprise that the target signal quality measure is larger than a second threshold value (TQMth2). The first and second criteria may be identical (e.g. in that TQMth2=TQMth1). The first and second criteria may, however, be different. The second criterion may e.g. be more restrictive than the first criterion (e.g. in that the second threshold value is larger than the first threshold value, TQMth2>TQMth1). The unconstrained estimate of a current acoustic transfer function vector (ATFuc,cur) is not stored in the database in case the criterion, e.g. the criterion depending on the target signal quality measure (TQM), is not fulfilled (e.g. if the target signal quality measure (TQM) is smaller than the second threshold value). The unconstrained estimate of a current acoustic transfer function vector (ATFuc,cur), e.g. a relative acoustic transfer function, RATFuc,cur, e.g. estimated at high SNRs (e.g. SNR>30 dB, or SNR>40 dB), may e.g. be stored as a new dictionary element ATFpd,tr, which will then be available as a plausible acoustic transfer function (ATF, e.g. a RATF) in the dictionary Δpd of stored (previously determined) acoustic transfer function vectors (ATFpd). The dictionary Δpd hence comprises sub-dictionaries (Δpd,std) (standard (std), non-personalized) and (Δpd,tr) (which are personalized, ‘trained’ (tr), cf. e.g.
The dictionary elements that are allowed to be updated (trained (ATFpd,tr)) can hence be regarded as additional dictionary elements (of an (adaptively changing) sub-dictionary (Δpd,tr)). In other words, a base of (possibly predetermined, standard) dictionary elements (ATFpd,std) of a sub-dictionary (Δpd,std) may always be kept, while dictionary elements (ATFpd,tr) of a sub-dictionary (Δpd,tr) are allowed to be updated/generated. The keeping of the elements of sub-dictionary (Δpd,std) may be practical in order to guarantee reasonable performance, even if erroneous dictionary elements are included in the adaptively updated (personalized) sub-dictionary (Δpd,tr).
The unconstrained estimate of the current acoustic transfer function vector (ATFtr,cur) may be assigned a target location (θ*j) in dependence of its proximity to the existing dictionary elements (ATFpd(θj)). The unconstrained estimate of the current acoustic transfer function vector (ATFtr,cur) may e.g. be assigned the target location (θ*j) of the existing dictionary element (ATFpd(θ*j)) that has the smallest distance to the unconstrained estimate of the current acoustic transfer function vector (ATFtr,cur). The distance may e.g. be determined as or based on the mean-square error (MSE), or on other distance measures allowing a ranking of vectors in order of similarity (proximity). The current acoustic transfer function vector (ATFtr,cur) may only be assigned a target location (θ*j) if its distance to the nearest existing dictionary element (ATFpd(θ*j)) is smaller than a threshold value.
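A sketch of such a proximity-based assignment is given below (Python); the dictionary is assumed to be a mapping from location labels (θj) to stored ATF vectors, the mean-square error is used as the distance measure, and the threshold handling (max_mse) is illustrative.

```python
import numpy as np

def assign_location(atf_new, dictionary, max_mse=None):
    """Return the location label of the nearest dictionary element, or None if
    the smallest distance exceeds the (optional) threshold max_mse."""
    best_theta, best_mse = None, np.inf
    for theta, atf_pd in dictionary.items():
        mse = np.mean(np.abs(np.asarray(atf_new) - np.asarray(atf_pd)) ** 2)
        if mse < best_mse:
            best_theta, best_mse = theta, mse
    if max_mse is not None and best_mse > max_mse:
        return None        # too far from any known location: leave unlabelled
    return best_theta
```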
A target location (θ*) of the target sound source of current interest to the user may be independently estimated for the unconstrained estimate of the current acoustic transfer function vector (ATFtr,cur). The target location (θ*) of the target sound source of current interest to the user may be estimated by prior art sound source location algorithms. The target location (θ*) of the target sound source of current interest to the user may alternatively or additionally be indicated by the user via a user interface. The target location (θ*) may be fed to one or more algorithms of the processor.
The previously determined acoustic transfer function vectors (ATFpd) of the dictionary (Δpd) may be ranked in dependence of their frequency of use. The processor may be configured to log the use of the previously determined acoustic transfer function vectors (ATFpd) of the dictionary (Δpd) (and thus be able to provide a (historic) frequency of use at a given time). The processor may be configured to log the use of the previously determined (personalized) additional dictionary elements (ATFpd,tr) of the sub-dictionary (Δpd,tr) (and thus be able to provide a (historic) frequency of use at a given time). Thereby an improved scheme for storing new dictionary elements in the sub-dictionary (Δpd,tr) may be provided. The lowest ranking elements may e.g. be deleted, when a certain number of elements have been included in the personalized sub-dictionary (Δpd,tr). Thereby a qualified criterion may be provided to limit the number of additional elements in the personalized sub-dictionary (Δpd,tr). The processor may further be configured to provide a frequency of use of the previously determined (standard) dictionary elements (ATFpd,std) of the sub-dictionary (Δpd,std). A comparison of the frequency of use of corresponding dictionary elements of the standard and personalized sub-dictionaries (Δpd,std, Δpd,tr) may be provided (e.g. logged). Based thereon conclusions regarding the relevance of the standard and/or personalized elements can be drawn.
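The logging and ranking scheme may e.g. be sketched as below (Python); class and attribute names (TrainedSubDictionary, max_elements) are illustrative assumptions, not terms of the present disclosure.

```python
from collections import Counter

class TrainedSubDictionary:
    """Personalized sub-dictionary with frequency-of-use logging and pruning."""
    def __init__(self, max_elements=16):
        self.elements = {}                 # label -> stored ATF vector
        self.use_count = Counter()         # label -> number of times selected
        self.max_elements = max_elements

    def log_use(self, label):
        self.use_count[label] += 1

    def add(self, label, atf_vector):
        self.elements[label] = atf_vector
        if len(self.elements) > self.max_elements:
            worst = min(self.elements, key=lambda lbl: self.use_count[lbl])
            self.elements.pop(worst)       # delete the lowest-ranking element
            self.use_count.pop(worst, None)
```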
The number of elements in the standard and personalized sub-dictionaries (Δpd,std, Δpd,tr) may e.g. be controlled via the ranking procedure. The lowest ranking elements (e.g. elements being ranked below a certain number of maximum stored elements, either in total or per sub-dictionary) may e.g. be deleted. This clean-up process may be executed automatically or manually, the latter e.g. performed by the user or by a hearing care professional.
Frequency of use (or ranking based thereon) may be used for labelling the dictionary elements of the standard and personalized sub-dictionaries (Δpd,std, Δpd,tr), e.g. instead of or in addition to the location parameter (θ).
Other measures for labelling the dictionary elements may be used, however. Such another measure may e.g. be proximity to existing dictionary elements. Proximity between acoustic transfer function vectors may e.g. be determined by comparing their respective ‘directions’ and possibly lengths (in an M-dimensional space, where M is the number of microphones, e.g. 2 or 3 or higher). Criteria for inclusion may relate to a degree of diversity: vectors that are parallel to an existing vector (and possibly have the same length) may e.g. not be stored, whereas vectors that are orthogonal to existing vectors of the dictionary may be stored. Criteria therebetween for storing or not storing new dictionary elements may be envisioned.
In an embodiment, the (standard) dictionary (Δpd) may be empty from the beginning of its use, so that all dictionary elements are learned during use. This may e.g. be relevant for applications for which an estimated ‘personalization’ is difficult to provide, e.g. for a speakerphone that should be adapted to a specific location (e.g. a room).
The acoustic transfer function vectors (ATF) of the database (Θ) may be or comprise relative acoustic transfer function vectors (RATF).
The hearing system may comprise at least one hearing device configured to be worn on the head at or in an ear of a user of the hearing system. The hearing system may be wearable by the user, e.g. adapted to be worn on the head of the user.
The hearing system or the hearing device may be constituted by or comprise an air-conduction type hearing aid, a bone-conduction type hearing aid, a cochlear implant type hearing aid, or a combination thereof. The output unit may comprise an output transducer, e.g. a loudspeaker of an air-conduction type hearing aid, or a vibrator of a bone conduction type hearing aid. The output unit may comprise a multi-electrode of a cochlear implant type hearing aid for electric stimulation of the cochlear nerve.
The hearing system or the hearing device may be constituted by or comprise a hearing aid or a headset, or a combination thereof. The output unit may be configured to provide a stimulus perceivable by the user as an acoustic signal in dependence of the processed signal (e.g. in a hearing aid). The output unit may comprise a transmitter for transmitting the processed signal to another device or system (e.g. in a headset, or in a telephone mode of a hearing aid).
The hearing system may comprise left and right hearing devices and comprise antenna and transceiver circuitry configured to allow an exchange of data between the left and right hearing devices. The hearing system may comprise or constitute a binaural hearing system, e.g. a binaural hearing aid system.
The unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) is determined in each of the left and right hearing devices and stored in said database(s) jointly in dependence of a common criterion regarding at least one of said target signal quality measure(s), said acoustic-transfer-function-vector-matching-measure, and said target-sound-source-location-identifier.
In a further aspect, a hearing system comprising a hearing device as described above, in the ‘detailed description of embodiments’, and in the claims, AND an auxiliary device is moreover provided.
The hearing system may be adapted to establish a communication link between the hearing device and the auxiliary device to provide that information (e.g. control and status signals, possibly audio signals) can be exchanged or forwarded from one to the other.
The auxiliary device may comprise a remote control, a smartphone, or other portable or wearable electronic device, such as a smartwatch or the like.
The auxiliary device may be constituted by or comprise a remote control for controlling functionality and operation of the hearing device(s). The function of a remote control may be implemented in a smartphone, the smartphone possibly running an APP allowing the user to control the functionality of the hearing device or hearing system via the smartphone (the hearing device(s) comprising an appropriate wireless interface to the smartphone, e.g. based on Bluetooth or some other standardized or proprietary scheme).
The auxiliary device may be constituted by or comprise an audio gateway device adapted for receiving a multitude of audio signals (e.g. from an entertainment device, e.g. a TV or a music player, a telephone apparatus, e.g. a mobile telephone or a computer, e.g. a PC) and adapted for selecting and/or combining an appropriate one of the received audio signals (or combination of signals) for transmission to the hearing device.
The hearing device, e.g. a hearing aid, may be adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user. The hearing device may comprise a signal processor for enhancing the input signals and providing a processed output signal.
The hearing device may comprise an output unit for providing a stimulus perceived by the user as an acoustic signal based on a processed electric signal. The output unit may comprise a number of electrodes of a cochlear implant (for a CI type hearing aid) or a vibrator of a bone conducting hearing aid. The output unit may comprise an output transducer. The output transducer may comprise a receiver (loudspeaker) for providing the stimulus as an acoustic signal to the user (e.g. in an acoustic (air conduction based) hearing aid or a headset). The output transducer may comprise a vibrator for providing the stimulus as mechanical vibration of a skull bone to the user (e.g. in a bone-attached or bone-anchored hearing aid).
The hearing device may comprise an input unit for providing an electric input signal representing sound. The input unit may comprise an input transducer, e.g. a microphone, for converting an input sound to an electric input signal. The input unit may comprise a wireless receiver for receiving a wireless signal comprising or representing sound and for providing an electric input signal representing said sound. The wireless receiver may e.g. be configured to receive an electromagnetic signal in the radio frequency range (3 kHz to 300 GHz). The wireless receiver may e.g. be configured to receive an electromagnetic signal in a frequency range of light (e.g. infrared light 300 GHz to 430 THz, or visible light, e.g. 430 THz to 770 THz).
The hearing device may comprise a directional microphone system adapted to spatially filter sounds from the environment, and thereby enhance a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing device. The directional system may be adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal originates. This can be achieved in various different ways as e.g. described in the prior art. In hearing aids or headsets, a microphone array beamformer is often used for spatially attenuating background noise sources. Many beamformer variants can be found in literature. The minimum variance distortionless response (MVDR) beamformer is widely used in microphone array signal processing. Ideally the MVDR beamformer keeps the signals from the target direction (also referred to as the look direction) unchanged, while attenuating sound signals from other directions maximally. The generalized sidelobe canceller (GSC) structure is an equivalent representation of the MVDR beamformer offering computational and numerical advantages over a direct implementation in its original form.
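For reference, the textbook MVDR weights for one frequency bin are w = Rn⁻¹d / (dᴴ Rn⁻¹ d), with Rn the noise covariance matrix and d the (relative) acoustic transfer function vector for the look direction. A minimal sketch (Python), given as an illustration and not as the implementation of the present disclosure:

```python
import numpy as np

def mvdr_weights(noise_cov, d):
    """noise_cov: M x M noise covariance matrix; d: length-M (relative) ATF vector."""
    d = np.asarray(d, dtype=complex).reshape(-1, 1)
    r_inv_d = np.linalg.solve(noise_cov, d)              # Rn^{-1} d
    return (r_inv_d / (d.conj().T @ r_inv_d)).ravel()    # unit gain in the look direction
```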
The hearing device may comprise antenna and transceiver circuitry allowing a wireless link to an entertainment device (e.g. a TV-set), a communication device (e.g. a telephone), a wireless microphone, or another hearing device, etc. The hearing device may thus be configured to wirelessly receive a direct electric input signal from another device. Likewise, the hearing device may be configured to wirelessly transmit a direct electric output signal to another device. The direct electric input or output signal may represent or comprise an audio signal and/or a control signal and/or an information signal.
In general, a wireless link established by antenna and transceiver circuitry of the hearing device can be of any type. The wireless link may be a link based on near-field communication, e.g. an inductive link based on an inductive coupling between antenna coils of transmitter and receiver parts. The wireless link may be based on far-field, electromagnetic radiation. Preferably, frequencies used to establish a communication link between the hearing aid and the other device are below 70 GHz, e.g. located in a range from 50 MHz to 70 GHz, e.g. above 300 MHz, e.g. in an ISM range above 300 MHz, e.g. in the 900 MHz range or in the 2.4 GHz range or in the 5.8 GHz range or in the 60 GHz range (ISM=Industrial, Scientific and Medical, such standardized ranges being e.g. defined by the International Telecommunication Union, ITU). The wireless link may be based on a standardized or proprietary technology. The wireless link may be based on Bluetooth technology (e.g. Bluetooth Low-Energy technology), or Ultra WideBand (UWB) technology.
The hearing device may constitute or form part of a portable (i.e. configured to be wearable) device, e.g. a device comprising a local energy source, e.g. a battery, e.g. a rechargeable battery. The hearing device may e.g. be a low weight, easily wearable, device, e.g. having a total weight less than 500 g (e.g. a headset), e.g. less than 100 g, such as less than 20 g (e.g. a hearing aid). The hearing device may e.g. have maximum dimensions less than 0.2 m, e.g. less than 0.1 m, such as less than 0.05 m.
The hearing device may comprise a ‘forward’ (or ‘signal’) path for processing an audio signal between an input and an output of the hearing device. A signal processor may be located in the forward path. The signal processor may be adapted to provide a frequency dependent gain according to a user's particular needs (e.g. hearing impairment) and/or to improve a target signal in a noisy environment. The hearing device may comprise an ‘analysis’ path comprising functional components for analyzing signals and/or controlling processing of the forward path.
The hearing device (e.g. a headset) may comprise a ‘microphone path’ (e.g. for transmitting a sound picked up by the microphone(s) to a remote device) and a (e.g. separate) ‘loudspeaker path’ (e.g. for receiving an audio signal from a remote device and play it for the user). Some or all signal processing of the analysis path and/or the forward path and/or the microphone and/or loudspeaker paths may be conducted in the frequency domain, in which case the hearing aid comprises appropriate analysis and synthesis filter banks. Some or all signal processing of the analysis path and/or the forward path and/or the microphone and/or loudspeaker paths may be conducted in the time domain.
An analogue electric signal representing an acoustic signal may be converted to a digital audio signal in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling frequency or rate fs, fs being e.g. in the range from 8 kHz to 48 kHz (adapted to the particular needs of the application) to provide digital samples xn (or x[n]) at discrete points in time tn (or n), each audio sample representing the value of the acoustic signal at tn by a predefined number Nb of bits, Nb being e.g. in the range from 1 to 48 bits, e.g. 24 bits. Each audio sample is hence quantized using Nb bits (resulting in 2^Nb different possible values of the audio sample). A digital sample x has a length in time of 1/fs, e.g. 50 μs, for fs=20 kHz. A number of audio samples may be arranged in a time frame. A time frame may comprise 64 or 128 audio data samples. Other frame lengths may be used depending on the practical application.
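The relations above may be illustrated with a few worked numbers; the particular values fs = 20 kHz, Nb = 24 bits and a 64-sample frame are examples taken from the text.

```python
fs = 20_000                    # sampling rate [Hz]
Nb = 24                        # bits per audio sample
sample_duration = 1 / fs       # 1/fs = 50 microseconds at 20 kHz
num_levels = 2 ** Nb           # 2^Nb = 16,777,216 possible sample values
frame_duration = 64 / fs       # a 64-sample time frame lasts 3.2 ms at 20 kHz
```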
The hearing device may comprise an analogue-to-digital (AD) converter to digitize an analogue input (e.g. from an input transducer, such as a microphone) with a predefined sampling rate, e.g. 20 kHz. The hearing device may comprise a digital-to-analogue (DA) converter to convert a digital signal to an analogue output signal, e.g. for being presented to a user via an output transducer.
The hearing device, e.g. the input unit, and or the antenna and transceiver circuitry may comprise a transform unit for converting a time domain signal to a signal in the transform domain (e.g. frequency domain or Laplace domain, etc.). The transform unit may be constituted by or comprise a TF-conversion unit for providing a time-frequency representation of an input signal. The time-frequency representation may comprise an array or map of corresponding complex or real values of the signal in question in a particular time and frequency range. The TF conversion unit may comprise a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. The TF conversion unit may comprise a Fourier transformation unit (e.g. a Discrete Fourier Transform (DFT) algorithm, or a Short Time Fourier Transform (STFT) algorithm, or similar) for converting a time variant input signal to a (time variant) signal in the (time-)frequency domain. The frequency range considered by the hearing aid from a minimum frequency fmin to a maximum frequency fmax may comprise a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. Typically, a sample rate fs is larger than or equal to twice the maximum frequency fmax, fs≥2fmax. A signal of the forward and/or analysis path of the hearing aid may be split into a number NI of frequency bands (e.g. of uniform width), where NI is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. The hearing aid may be adapted to process a signal of the forward and/or analysis path in a number NP of different frequency channels (NP≤NI). The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping.
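A minimal sketch of such a TF-conversion unit, implemented as a windowed STFT analysis filter bank, is given below (Python); the frame length, hop size and window are illustrative choices and are not prescribed above.

```python
import numpy as np

def stft(x, frame_len=128, hop=64):
    """Return a (num_frames, frame_len // 2 + 1) array of complex STFT coefficients."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(np.fft.rfft(window * x[start:start + frame_len]))
    return np.array(frames)
```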
The hearing device may be configured to operate in different modes, e.g. a normal mode and one or more specific modes, e.g. selectable by a user, or automatically selectable. A mode of operation may be optimized to a specific acoustic situation or environment. A mode of operation may include a low-power mode, where functionality of the hearing aid is reduced (e.g. to save power), e.g. to disable wireless communication, and/or to disable specific features of the hearing device.
The hearing device may comprise a number of detectors configured to provide status signals relating to a current physical environment of the hearing device (e.g. the current acoustic environment), and/or to a current state of the user wearing the hearing device, and/or to a current state or mode of operation of the hearing device. Alternatively or additionally, one or more detectors may form part of an external device in communication (e.g. wirelessly) with the hearing device. An external device may e.g. comprise another hearing device (e.g. another hearing aid or another earpiece of a headset), a remote control, an audio delivery device, a telephone (e.g. a smartphone), an external sensor, etc.
One or more of the number of detectors may operate on the full band signal (time domain). One or more of the number of detectors may operate on band split signals ((time-) frequency domain), e.g. in a limited number of frequency bands.
The number of detectors may comprise a level detector for estimating a current level of a signal of the forward path. The detector may be configured to decide whether the current level of a signal of the forward path is above or below a given (L-)threshold value. The level detector may operate on the full band signal (time domain) or on band split signals ((time-)frequency domain).
The hearing device may comprise a voice activity detector (VAD) for estimating whether or not (or with what probability) an input signal comprises a voice signal (at a given point in time). A voice signal may in the present context be taken to include a speech signal from a human being. It may also include other forms of utterances generated by the human speech system (e.g. singing). The voice activity detector unit may be adapted to classify a current acoustic environment of the user as a VOICE or NO-VOICE environment. This has the advantage that time segments of the electric microphone signal comprising human utterances (e.g. speech) in the user's environment can be identified, and thus separated from time segments only (or mainly) comprising other sound sources (e.g. artificially generated noise). The voice activity detector may be adapted to detect as a VOICE also the user's own voice. Alternatively, the voice activity detector may be adapted to exclude a user's own voice from the detection of a VOICE.
The hearing device may comprise an own voice detector for estimating whether or not (or with what probability) a given input sound (e.g. a voice, e.g. speech) originates from the voice of the user of the system. A microphone system of the hearing device may be adapted to be able to differentiate between a user's own voice and another person's voice and possibly from NON-voice sounds.
The number of detectors may comprise a movement detector, e.g. an acceleration sensor. The movement detector may be configured to detect movement of the user's facial muscles and/or bones, e.g. due to speech or chewing (e.g. jaw movement) and to provide a detector signal indicative thereof.
The hearing device may comprise a classification unit configured to classify the current situation based on input signals from (at least some of) the detectors, and possibly other inputs as well. In the present context ‘a current situation’ may be taken to be defined by one or more of
The classification unit may be based on or comprise a neural network, e.g. a trained neural network, e.g. a recurrent neural network, such as a gated recurrent unit (GRU).
The hearing device may comprise an acoustic (and/or mechanical) feedback control (e.g. suppression) or echo-cancelling system. Adaptive feedback cancellation has the ability to track feedback path changes over time. It is typically based on a linear time invariant filter to estimate the feedback path but its filter weights are updated over time. The filter update may be calculated using stochastic gradient algorithms, including some form of the Least Mean Square (LMS) or the Normalized LMS (NLMS) algorithms. They both have the property to minimize the error signal in the mean square sense with the NLMS additionally normalizing the filter update with respect to the squared Euclidean norm of some reference signal.
The hearing device may further comprise other relevant functionality for the application in question, e.g. level compression, noise reduction, active noise cancellation, etc.
The hearing device may comprise a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user, e.g. a headset, an earphone, an ear protection device or a combination thereof. A hearing system may comprise a speakerphone (comprising a number of input transducers and a number of output transducers, e.g. for use in an audio conference situation), e.g. comprising a beamformer filtering unit, e.g. providing multiple beamforming capabilities.
Use:
In an aspect, use of a hearing device as described above, in the ‘detailed description of embodiments’ and in the claims, is moreover provided. Use may be provided in a system comprising one or more hearing devices (e.g. hearing instruments), headsets, ear phones, active ear protection systems, etc., e.g. in handsfree telephone systems, teleconferencing systems (e.g. including a speakerphone), public address systems, karaoke systems, classroom amplification systems, etc.
A Method:
In an aspect, a method of operating a hearing system, e.g. comprising at least one hearing device configured to be worn on the head at or in an ear of a user, is furthermore provided by the present application. The hearing system may comprise a microphone system comprising a multitude M of microphones, where M is larger than or equal to two, the microphone system being adapted for picking up sound from the environment, and an output unit for providing an output signal in dependence of a processed signal.
The method may comprise
The method may further comprise
The method may further comprise
It is intended that some or all of the structural features of the device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the method, when appropriately substituted by a corresponding process and vice versa. Embodiments of the method have the same advantages as the corresponding devices.
The method may comprise that the confidence measure (is determined by said hearing system and) comprises at least one of
A Computer Readable Medium or Data Carrier:
In an aspect, a tangible computer-readable medium (a data carrier) storing a computer program comprising program code means (instructions) for causing a data processing system (a computer) to perform (carry out) at least some (such as a majority or all) of the (steps of the) method described above, in the ‘detailed description of embodiments’ and in the claims, when said computer program is executed on the data processing system is furthermore provided by the present application.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Other storage media include storage in DNA (e.g. in synthesized DNA strands). Combinations of the above should also be included within the scope of computer-readable media. In addition to being stored on a tangible medium, the computer program can also be transmitted via a transmission medium such as a wired or wireless link or a network, e.g. the Internet, and loaded into a data processing system for being executed at a location different from that of the tangible medium.
A Computer Program:
A computer program (product) comprising instructions which, when the program is executed by a computer, cause the computer to carry out (steps of) the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
A Data Processing System:
In an aspect, a data processing system comprising a processor and program code means for causing the processor to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
An APP:
In a further aspect, a non-transitory application, termed an APP, is furthermore provided by the present disclosure. The APP comprises executable instructions configured to be executed on an auxiliary device to implement a user interface for a hearing device or a hearing system described above, in the ‘detailed description of embodiments’, and in the claims. The APP may be configured to run on a cellular phone, e.g. a smartphone, or on another portable device allowing communication with said hearing aid or said hearing system.
Embodiments of the disclosure may e.g. be useful in applications such as hearing aids or headsets or table- or wireless microphones or microphone systems, e.g. speakerphones.
The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effect will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
The present disclosure relates to a wearable hearing system comprising one or more hearing devices, e.g. headsets or hearing aids. The present disclosure relates in particular to individualization of a multi-channel noise reduction system exploiting and extending a database comprising a dictionary of acoustic transfer functions, e.g. relative acoustic transfer functions (RATF).
The human ability to spatially localize a sound source is to a large extent dependent on perception of the sound at both ears. Due to different physical distances between the sound source and the left and right ears, a difference in time of arrival of a given wavefront of the sound at the left and right ears is experienced (the Interaural Time Difference, ITD). Consequently, a difference in phase of the sound signal (at a given point in time) will likewise be experienced and in particular perceivable at relatively low frequencies (e.g. below 1500 Hz). Due to the shadowing effect of the head (diffraction), a difference in level of the received sound signal at the left and right ears is likewise experienced (the Interaural Level Difference, ILD). The attenuation by the head (and body) is larger at relatively higher frequencies (e.g. above 1500 Hz). The detection of the cues provided by the ITD and ILD largely determine our ability to localize a sound source in a horizontal plane (i.e. perpendicular to a longitudinal direction of a standing person). The diffraction of sound by the head (and body) is described by the Head Related Transfer Functions (HRTF). The HRTF for the left and right ears ideally describe respective transfer functions from a sound source (from a given location) to the ear drums of the left and right ears. If correctly determined, the HRTFs provide the relevant ITD and ILD between the left and right ears for a given direction of sound relative to the user's ears. Such HRTFleft and HRTFright are preferably applied to a sound signal received by a left and right hearing assistance device in order to improve a user's sound localization ability (cf. e.g. Chapter 14 of [Dillon; 2001]).
Several methods of generating HRTFs are known. Standard HRTFs from a dummy head can e.g. be provided, as e.g. derived from the KEMAR HRTF database of [Gardner and Martin, 1994], and applied to sound signals received by left and right hearing assistance devices of a specific user. Alternatively, a direct measurement of the user's HRTF, e.g. during a fitting session, can—in principle—be performed, and the results thereof be stored in a memory of the respective (left and right) hearing assistance devices. During use, e.g. in case the hearing assistance device is of the Behind The Ear (BTE) type, where the microphone(s) that pick up the sound typically are located near the top of (and often, a little behind) pinna, a direction of impingement of the sound source may be determined by each device, and the respective relative HRTFs applied to the (raw) microphone signal to (re)establish the relevant localization cues in the signal presented to the user, cf. e.g. EP2869599A1.
An essential part of multi-channel noise reduction systems (such as minimum variance distortionless response (MVDR), Multichannel Wiener Filter (MWF), etc.) in hearing devices is to have access to the relative acoustic transfer function (RATF) for the source of interest. Any mismatch between the true RATF and the RATF employed in the noise reduction system may lead to distortion and/or suppression of the signal of interest.
A first method (‘Method 1’) to find the RATF that is associated with the source signal of interest is the selection of a RATF from a dictionary of plausible (previously determined) RATFs. This method is referred to as constrained maximum likelihood RATF estimation [1,2]. For all the (previously determined (pd)) RATFs (RATFpd) in the database, the likelihood that a source of interest can be associated with a specific RATF is calculated based on the microphone input(s). The RATF (among the multitude of RATFs (RATFpd) of the database) which is associated with the maximum likelihood is then selected as the current acoustic transfer function (RATFpd,cur) for the current electric input signal(s).
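Schematically, the constrained selection of ‘Method 1’ may be sketched as below (Python); the scoring function, which evaluates the likelihood of a candidate RATF given the current microphone signals (e.g. as in [1,2]), is left abstract here and the names are illustrative assumptions.

```python
def select_constrained_ratf(dictionary, score_fn):
    """dictionary: mapping location theta -> stored RATF vector.
    score_fn(ratf): (log-)likelihood of that RATF given the current electric input signals."""
    best_theta = max(dictionary, key=lambda theta: score_fn(dictionary[theta]))
    return best_theta, dictionary[best_theta]
```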
The advantage of this (first) method is good performance even in acoustic environments of poor target signal quality (e.g. low SNR) because the selected RATF (RATFpd,cur) is always a plausible RATF. Another advantage is that prior information may be used for the RATF selection, for example if some target directions are more likely than others (e.g. in dependence of a sensor or detector, e.g. an own voice detector, e.g. in case the user's own voice is the target signal).
The disadvantage is that the dictionary elements need to be known beforehand and are typically measured on a mannequin (e.g. a head and torso model). Even though the RATFs (RATFpd,std) measured on the mannequin are plausible, they may differ from the true RATFs due to differences in the acoustics due to difference in the wearer's anatomy, and/or device placement.
The second method (‘Method 2’) of RATF estimation is unconstrained which means that any RATF may be estimated from the input data. A maximum likelihood estimator is e.g. provided by the covariance whitening method (see e.g. [3,4]). The second, unconstrained RATF estimation method may e.g. comprise an estimator of the noisy input- and noise-only-covariance matrices, where the latter requires a target speech activity detector (to separate noise-only parts from noisy parts). Furthermore, the method may comprise an eigenvalue decomposition of the noise-only covariance matrix which is used to “whiten” the noisy input covariance matrix. The results may finally be used to compute the maximum likelihood estimate of the RATF. Any RATF may be found by this method, under the condition that the target signal is active in the input signals. Unconstrained HRTFs, e.g. RATFs, of a binaural hearing system, e.g. a binaural hearing aid system, for given electric input signals from microphones of the system may e.g. be determined as discussed in EP2869599A1.
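A minimal sketch of a covariance-whitening RATF estimator along the lines described above is given below (Python). The text mentions an eigenvalue decomposition of the noise-only covariance matrix; this sketch uses the equivalent Cholesky-based whitening, and the function and argument names are illustrative assumptions.

```python
import numpy as np

def ratf_covariance_whitening(cov_noisy, cov_noise, i_ref=0):
    """cov_noisy / cov_noise: M x M noisy-input and noise-only covariance matrices."""
    L = np.linalg.cholesky(cov_noise)               # cov_noise = L @ L^H
    L_inv = np.linalg.inv(L)
    whitened = L_inv @ cov_noisy @ L_inv.conj().T   # whitened noisy covariance
    eigvals, eigvecs = np.linalg.eigh(whitened)
    principal = eigvecs[:, np.argmax(eigvals)]      # dominant eigenvector
    h = L @ principal                               # de-whiten to an ATF estimate
    return h / h[i_ref]                             # normalize to the reference microphone
```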
The advantage of this (second) method is that an accurate estimate of the RATF can be found at high SNR, more accurately than with the constrained ML method (dictionary method), since it is not constrained to a finite/discrete set of dictionary elements. Further, the unconstrained acoustic transfer functions are personalized, in that they are estimated while the user wears the hearing system.
A disadvantage is that less accurate estimates are obtained in low SNR due to estimation errors, as compared to the constrained method, because the unconstrained method does not employ the prior knowledge that the RATF in question is related to a human head/mannequin—in other words, it could produce estimates which are not physically plausible.
The present disclosure proposes to combine these two methods (‘Method 1’, ‘Method 2’) into a hybrid method, in such a way that their advantages are harvested, and their disadvantages avoided.
Consider a RATF estimator that uses a pre-calibrated dictionary (cf. e.g. Δpd in
Under certain conditions (see example below) this more accurate RATF, estimated at high SNRs, can be stored as a new dictionary element which will then be available in ‘Method 1’ as a plausible RATF. We will refer to these dictionary elements as ‘trained’ (cf. e.g. Δpd and (dashed) arrow ATFuc,cur from controller (CTR3) to the data base (MEM [DB]) in
The dictionary elements that are allowed to be updated can be regarded as additional dictionary elements, i.e. a base of dictionary elements (cf. e.g. Δpd,std in
The dictionary elements may be updated jointly in both of a left and a right hearing instrument (of a binaural hearing system). A database adapted to the particular location of the left hearing device of a binaural hearing aid system (on the user's head) may be stored in the left hearing device. Likewise, a database adapted to the particular location of the right hearing device of a binaural hearing aid system (on the user's head) may be stored in the right hearing device. A database located in a separate device (e.g. a processing device in communication with the left and right hearing devices) may comprise a set of dictionary elements for the left hearing device and a corresponding set of dictionary elements for the right hearing device.
The RATFs estimated by the unconstrained method (and stored in the additional dictionary (Δpd,tr)) may (or may not) be assigned to a target location, e.g. depending on the proximity to the existing dictionary elements (which may (typically) be related to a specific target location, cf. e.g. θj). The distance may e.g. be determined as or based on the mean-squared error (MSE), or other distance measures allowing a ranking of vectors in order of similarity (proximity).
Instead of (or in addition to) assigning a location to the personalized additional dictionary elements (ATFpd,tr) of the sub-dictionary (Δpd,tr), the processor may be configured to log a frequency of use of these vectors to allow a ‘ranking’ of their use to be made. Thereby an improved scheme for storing new dictionary elements in the sub-dictionary (Δpd,tr) can be provided. The lowest ranking elements may e.g. be deleted, when a certain number of elements have been included in the personalized sub-dictionary (Δpd,tr). Thereby a qualified criterion is provided to limit the number of additional elements in the personalized sub-dictionary (Δpd,tr).
The previously determined acoustic transfer function vectors (ATFpd) of the dictionary (Δpd) may generally be ranked in dependence of their frequency of use, e.g. in that the processor logs a frequency of use of the vectors. The processor may e.g. be configured to log a frequency of use of the previously determined (standard) dictionary elements (ATFpd,std) of the sub-dictionary (Δpd,std). A comparison of the frequency of use of corresponding dictionary elements of the standard and personalized sub-dictionaries (Δpd,std, Δpd,tr) can be provided (e.g. logged). Based thereon conclusions regarding the relevance of the standard and/or personalized elements can be drawn. Elements concluded to be irrelevant may e.g. be deleted, either in an automatic process (e.g. the lowest ranking elements, e.g. above a certain number of stored elements) or manually (e.g. by the user or by a hearing care professional).
A dictionary Δpd of absolute and/or relative transfer functions may be determined as indicated in
To determine the relative acoustic transfer functions (RATF), e.g. RATF-vectors dθ, of the dictionary Δpd, from the corresponding absolute acoustic transfer functions (ΔATF), Hθ, the element of the RATF-vector (dθ) for the mth microphone and direction (θ) is dm(k,θ) = Hm(θ,k)/Hi(θ,k), where Hi(θ,k) is the (absolute) acoustic transfer function from the given location (θ) to a reference microphone (m=i) among the M microphones of the microphone system (e.g. of a hearing instrument, or a binaural hearing system), and Hm(θ,k) is the (absolute) acoustic transfer function from the given location (θ) to the mth microphone. Such absolute and relative transfer functions (for a given artificial (e.g. a mannequin) or natural person (e.g. the user or (typically) another person)) can be estimated (e.g. in advance of the use of the hearing aid system) and stored in the dictionary Δpd as indicated above. The resulting (absolute) acoustic transfer function (ΔATF) vector Hθ for sound from a given location (θ) to a hearing instrument or hearing system comprising M microphones may be written as
H(θ,k) = [H1(θ,k) … HM(θ,k)]^T, k = 1, …, K.
The corresponding relative acoustic transfer function (RATF) vector dθ from this location may be written as
d(θ,k) = [d1(θ,k) … dM(θ,k)]^T, k = 1, …, K,
where di(k,θ) = 1.
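The relation dm(k,θ) = Hm(θ,k)/Hi(θ,k) may be transcribed directly, e.g. as in the following sketch (Python), where H is assumed to be an M x K array of absolute transfer functions and i_ref the index of the reference microphone:

```python
import numpy as np

def absolute_to_relative_atf(H, i_ref=0):
    """H: (M, K) array of absolute ATFs (microphones x frequency bins).
    Returns the RATF array d with d[i_ref, k] == 1 for every frequency bin k."""
    H = np.asarray(H, dtype=complex)
    return H / H[i_ref:i_ref + 1, :]
```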
Target Estimation in Hearing Aids:
Classical hearing aid beamformers assume that the target of interest is in front of the hearing aid user. Beamformer systems may perform better in terms of target loss and thereby provide an SNR improvement for the user if they have access to accurate estimates of the target location.
The proposed method may use predetermined (standard) dictionary (vector) elements (ATFpd,std) measured on a mannequin (e.g. the Head and Torso Simulator (HATS) 4128C from Brüel & Kjær Sound & Vibration Measurement A/S, or the head and torso model KEMAR from GRAS Sound and Vibration A/S, or similar) as a baseline (e.g. stored in dictionary Δpd,std of the database Θ). The proposed method may further estimate more accurate (unconstrained) dictionary (vector) elements (ATFuc,cur) (e.g. RATFs) in good SNR (as estimated by an SNR estimator) and store them as dictionary elements (ATFpd,tr) given certain conditions (e.g. in a dictionary Δpd,tr of the database Θ).
An advantage is that this method can accommodate individual acoustic properties as well as replacement effects, in both good and less good input SNR scenarios.
Example of usage in hearing aid application: A base dictionary (Δpd,std) may be given by 48 plausible RATF vectors (RATFpd,std) describing relative transfer functions of hearing aid microphones, measured on a HATS in the horizontal plane at 7.5 degree intervals (cf. e.g.
Own Voice Enhancement in Headsets (or Hearing Aids):
Beamforming is used in headsets to enhance the user's own voice in communication scenarios—hence, in this situation, the user's own voice is the signal of interest to be retrieved by a beamforming system. Microphones can be mounted at different locations in the headset. For example, multiple microphones may be mounted on a boom-arm pointing at the user's mouth, and/or multiple microphones may be mounted inside and outside of small in-ear headsets (or earpieces).
The RATFs which are needed for own voice capture may be affected by acoustic variations, such as: Individual user acoustic properties (as opposed to HATS in a calibration situation), microphone location variations due to boom arm placement, and human head movements (for example jaw movements affecting microphones placed in the ear canal).
A baseline dictionary may contain RATFs measured on a HATS in a standard boom arm placement and in a set of representative boom arm placements. The extended dictionary elements can then accommodate individual-user variations and replacement variations related to the actual wearing situation, for example if the boom arm is outside the expected range of variations.
In a hearing aid, estimation of the user's own voice may also be of interest in a communication mode of operation, e.g. for transmission to a far-end communication partner (when using the hearing aid in a headset- or telephone-mode). Also, estimation of the user's own voice may be of interest in a hearing aid in connection with a voice control interface, where the user's own voice may be analysed in a keyword detector or by a speech recognition algorithm.
Hybrid Method Operation:
The RATF estimator may operate in different ways:
Rationale for Updating a Trained Dictionary Element:
In order to update a trainable dictionary element, the method needs a rationale. A straightforward rationale is when the target signal is available in good quality, e.g. when the (target) signal-to-noise ratio (SNR) is sufficiently high, e.g. larger than a threshold value (SNRTH). A (preferably reliable/robust) target signal quality estimator, e.g. an SNR estimator, may provide this. The Power Spectral Density (PSD) estimators provided by the maximum likelihood (ML) methods of e.g. [2] and [5] may e.g. be used to determine the SNR. US20190378531A1 teaches SNR estimation.
Furthermore, the rationale may include the likelihood (cf. e.g. p(ATFuc,cur) in
The rationale may also be related to other detection algorithms, e.g. voice activity detection (VAD) algorithms, see [4] for an example (no update unless clear voice activity is detected), or sound pressure level estimators (no update unless the sound pressure level is within a reasonable range for noise-free speech, e.g. between 55 and 70 dB SPL), cf. e.g. the voice activity control signal (V-NV) from the voice activity detector (VAD) to the controller (CTR) in
A criterion for determining whether or not an estimated HRTF is plausible may be established (e.g. does it correspond to a likely direction; is it within a reasonable range of values, etc.), e.g. relying on an own voice detector (OVD), or a proximity detector, or a direction-of-arrival (DOA) detector. Hereby an estimated HRTF may be disqualified if it is not likely (and hence not used or stored).
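The update rationale could, purely as an illustration, be collected in a single gating function as sketched below. The SNR and likelihood thresholds are assumptions (only the 55-70 dB SPL range is taken from the example above), and the function name and signature are hypothetical.

```python
def may_update_trained_element(snr_db, likelihood, voice_active, spl_db,
                               snr_threshold_db=10.0, likelihood_threshold=0.7,
                               spl_range_db=(55.0, 70.0)):
    """Decide whether an unconstrained estimate qualifies as a trained dictionary element.

    Thresholds are illustrative assumptions, except the 55-70 dB SPL range
    mentioned in the disclosure as an example for noise-free speech.
    """
    good_snr = snr_db > snr_threshold_db            # target available in good quality
    plausible = likelihood > likelihood_threshold   # e.g. p(ATF_uc,cur) sufficiently high
    speech_level_ok = spl_range_db[0] <= spl_db <= spl_range_db[1]
    return good_snr and plausible and voice_active and speech_level_ok
```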
Binaural Devices:
With one device on each ear, for example hearing aids and in-ear headsets, we may exploit a binaural decision rationale for updating a trainable dictionary element.
The update criterion may be a binaural criterion, also taking into account that e.g. an otherwise plausible 45 degree HRTF is not plausible if the contralateral HRTF-angle does not correspond to a similar direction. Such differences may indicate that the hearing instruments are not correctly mounted (see also section on ‘user feedback’ below).
Comparing estimated left and right angles may e.g. reveal whether the angles related to the dictionary elements agree on both sides. It could be that the angles are systematically shifted by a few degrees when comparing the left and right angles. This may indicate that the mounted instruments are not pointing towards the same direction. This bias may be taken into account when assigning the dictionary elements.
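A minimal sketch of such a binaural consistency check is given below; the discrepancy threshold is an assumption, and the bias term models the systematic left/right offset described above, which could be estimated over time and compensated before the comparison.

```python
def binaural_update_allowed(angle_left_deg, angle_right_deg,
                            max_discrepancy_deg=15.0, bias_deg=0.0):
    """Allow a dictionary update only if left and right angle estimates agree.

    bias_deg compensates a systematic left/right offset (e.g. instruments not
    pointing in exactly the same direction). The threshold is illustrative.
    """
    # Wrap the difference to the range [-180, 180) degrees before comparing.
    diff = (angle_left_deg - angle_right_deg - bias_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= max_discrepancy_deg
```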
User Feedback on Device Usage:
If there is a large difference between trained elements (cf. e.g. ATFpd,tr in
It may also imply problems with microphones, for example in the case of dust or dirt in the microphone inlets.
Also, in the case of unexpected deviations in the binaural case, the user can be informed about possible problems with the device.
Relation to “Head Dictionaries”:
In our co-pending European patent application number EP20210249.7, filed with the European Patent Office on 27 Nov. 2020 and having the title “A hearing aid system comprising a database of acoustic transfer functions”, it is proposed to include dictionaries of head related transfer functions for different heads (e.g. different users, sizes, forms, etc., cf. e.g.
The exemplary contents of the database Θ are illustrated in the upper right part of
The location of the sound source (S, or loudspeaker symbol in
Exemplary Embodiments of a Hearing Device:
The hearing device (HD) further comprises a database Θ stored in memory (MEM [DB]). The database Θ comprises a dictionary Δpd of stored acoustic transfer function vectors (ATFpd), whose elements ATFpd,m, m=1, . . . , M, are frequency dependent acoustic transfer functions representing location-dependent (θ) and frequency dependent (k) propagation of sound from a location (θj) of a target sound source to each of said M microphones, k being a frequency index, k=1, . . . , K, where K is a number of frequency bands. The stored acoustic transfer function vectors (ATFpd(θ, k)) may e.g. be determined in advance of use of the hearing device, while the microphone system (M1, . . . , MM) is mounted on a head at or in an ear of a natural or artificial person (preferably as it is when the hearing system/device is operationally worn for normal use by the user), e.g. gathered in a standard dictionary (Δpd,std). The (or some of the) stored acoustic transfer function vectors (ATFpd) may e.g. be updated during use of the hearing device (where the user wears the microphone system (M1, . . . , MM)), or a further dictionary (Δpd,tr) comprising said updated or ‘trained’ acoustic transfer function vectors (determined by the unconstrained method, and evaluated to be reliable (e.g. by fulfilling a target signal quality criterion)) may be generated during use of the hearing system. The dictionary Δpd comprises standard acoustic transfer function vectors (ATFpd,std) for the natural or artificial person (e.g. grouped in dictionary Δpd,std) and, optionally, trained acoustic transfer function vectors (ATFpd,tr) (e.g. grouped in dictionary Δpd,tr), for a multitude (J′) of different locations θ′j, j=1, . . . , J′, relative to the microphone system (see
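The organization of the database Θ into a standard sub-dictionary (Δpd,std) and a trained sub-dictionary (Δpd,tr) could, purely as an illustration, be represented as below. The field names, the array shapes and the rule that a trained vector takes precedence over the standard vector for the same location are assumptions of this sketch, not requirements of the disclosure.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Database:
    """Sketch of the database Theta with standard and trained sub-dictionaries.

    Each entry maps a location index j to an ATF vector of shape (M, K):
    M microphones, K frequency bands. Names and shapes are illustrative.
    """
    std: dict = field(default_factory=dict)   # Delta_pd,std: predetermined (e.g. HATS) vectors
    tr: dict = field(default_factory=dict)    # Delta_pd,tr: trained vectors added during use

    def all_vectors(self):
        """Merged view; trained vectors take precedence for the same location (assumption)."""
        return {**self.std, **self.tr}

db = Database()
db.std[0] = np.ones((4, 64), dtype=complex)   # placeholder standard ATF vector
db.tr[0] = 1.1 * db.std[0]                    # trained update for the same location
```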
The hearing device (HD), e.g. the controller (CTR), is configured to determine a constrained estimate of a current acoustic transfer function vector (ATFpd,cur) in dependence of said M electric input signals and said dictionary Δpd of stored acoustic transfer function vectors (ATFpd,std, and optionally ATFpd,tr, cf.
The database Θ is in the embodiment of
In the embodiment of
The embodiment of
In the embodiment of
In the embodiment of
The embodiment of
The embodiment of
The embodiment of
The controller (CTR) is connected to the database (MEM [DB]), cf. signal ATF, and configured to determine the constrained estimate of a current acoustic transfer function vector (ATFpd,cur) in dependence of the M electric input signals and the dictionary Δpd of stored acoustic transfer function vectors (ATFpd, and optionally ATFpd,tr, cf.
In the embodiments of
The hearing system, e.g. the processor (PRO), may comprise a multitude M of analysis filter banks (FBAm, m=1, . . . , M) for converting the time domain electric input signals (x1, . . . , xM) to electric signals (X1, . . . , XM) in a time frequency representation (k, l).
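A minimal sketch of such an analysis stage, using an STFT as a stand-in for the hearing-device analysis filter banks (FBAm), is shown below; the FFT size and overlap are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def analysis_filterbank(x, fs, n_fft=128):
    """Convert M time-domain microphone signals to a time-frequency representation.

    x: array of shape (M, n_samples). Returns X of shape (M, K, L) with K frequency
    bands (index k) and L time frames (index l). The STFT parameters are illustrative.
    """
    _, _, X = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft // 2)
    return X
```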
The hearing system, e.g. the processor (PRO), comprises a controller (CTR1) configured to determine a constrained estimate of a current acoustic transfer function vector (ATFpd,cur) in dependence of the M electric input signals (X1, . . . , XM) and the dictionary (Δpd) of previously determined acoustic transfer function vectors (ATFpd) stored in the database (θ, MEM [DB]) via signal ATF. The database may form part of the at least one hearing device (HD), e.g. of the processor (PRO), or be accessible to the processor, e.g. via wireless link. The controller (CTR1) is further configured to provide an estimate of the reliability (p(ATFpd,cur)) of the constrained estimate of the current acoustic transfer function vector (ATFpd,cur). The reliability may e.g. be provided in the form of an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the constrained estimate of the current acoustic transfer function vector (ATFpd,cur) considering the current electric input signals. The reliability may e.g. be related to how well the constrained estimate of the current acoustic transfer function vector (ATFpd,cur) matches the current electric input signals in a maximum likelihood sense (see e.g. EP3413589A1).
The hearing system, e.g. the processor (PRO), comprises a controller (CTR2) configured to determine an unconstrained estimate of a current acoustic transfer function vector (ATFuc,cur) in dependence of the M electric input signals (X1, . . . , XM). The controller (CTR2) is further configured to provide an estimate of the reliability (p(ATFuc,cur), e.g. in the form of a probability) of the unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur). The reliability may e.g. be provided in the form of an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) considering the current electric input signals. The reliability may e.g. be related to how well the unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur) matches the current electric input signals in a maximum likelihood sense (see e.g. [4]).
The hearing system, e.g. the processor (PRO), comprises a target signal quality estimator (TQM-E, e.g. a target signal to noise (SNR) estimator, see e.g. SNRE in
The hearing system, e.g. the processor (PRO), comprises a controller (CTR3) configured to determine a resulting acoustic transfer function vector (ATF*) for the user in dependence of a) the constrained estimate of the current acoustic transfer function vector (ATFpd,cur), b) the unconstrained estimate of the current acoustic transfer function vector (ATFuc,cur), and c) at least one of c1) the acoustic-transfer-function-vector-matching-measure (p(ATFpd,cur)) indicative of a degree of matching of the constrained estimate (ATFpd,cur), c2) the acoustic-transfer-function-vector-matching-measure (p(ATFuc,cur)) of the unconstrained estimate (ATFuc,cur), and c3) a target-sound-source-location-identifier (TSSLI) indicative of a location of, direction to, or proximity of, the current target sound source.
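One possible decision policy for the controller (CTR3) is sketched below: the unconstrained estimate is used when the target signal quality and its matching measure support it, and the constrained (dictionary-based) estimate is used otherwise. This is only an illustrative hard-decision sketch; the disclosure also allows other combinations (e.g. soft weighting of the two estimates), and the function name and arguments are assumptions.

```python
def resulting_atf(atf_pd_cur, atf_uc_cur, p_pd, p_uc, quality_ok):
    """Combine constrained and unconstrained ATF estimates into ATF* (illustrative policy).

    atf_pd_cur / atf_uc_cur: constrained and unconstrained ATF vector estimates.
    p_pd / p_uc: their matching measures; quality_ok: e.g. SNR above threshold.
    """
    if quality_ok and p_uc >= p_pd:
        return atf_uc_cur       # trust the unconstrained estimate in good conditions
    return atf_pd_cur           # otherwise fall back to the dictionary-based estimate
```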
The hearing system, e.g. the processor (PRO), may comprise a location estimator (LOCE) connected to one or more of the electric input signals (here X1, . . . , XM), or to a signal or signals derived therefrom. The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of an own voice detector configured to estimate whether or not (or with what probability) a given input sound (e.g. a voice, e.g. speech) originates from the voice of the user of the wearable hearing system (e.g. the hearing device), e.g. in dependence of at least one of said M electric input signals or a signal or signals originating therefrom. If own voice is detected (or detected with a high probability) in the electric input signal(s), and if own voice is assumed to be the target signal (e.g. in a communication mode of operation), the target source location is the user's mouth (and all other locations around the user can be ignored, or given less probability, in relation to the determination of an appropriate current acoustic transfer function). The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of a direction of arrival estimator configured to estimate a direction of arrival of a current target sound source, e.g. in dependence of at least one of said M electric input signals or a signal or signals originating therefrom. Thereby acoustic transfer functions associated with locations within an angular range of the estimated direction of the location estimator may be associated with a higher probability than other transfer functions. The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of a proximity detector configured to estimate a distance to a current target sound source, e.g. in dependence of at least one of the M electric input signals or a signal or signals originating therefrom, or in dependence of a distance sensor or detector. Thereby appropriate acoustic transfer functions associated with locations around the user that are within a range of the estimated distance of the location estimator may be associated with a higher probability than other transfer functions.
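A sketch of how a target-sound-source-location-identifier (here, a DOA estimate) could be turned into a prior weight over dictionary locations is given below; the angular tolerance and the residual weight for implausible locations are assumptions of the example.

```python
import numpy as np

def location_prior(dict_angles_deg, estimated_doa_deg, angular_tolerance_deg=30.0):
    """Illustrative prior over dictionary locations given a DOA estimate (TSSLI).

    Dictionary elements whose stored direction lies within the tolerance of the
    estimated direction of arrival get weight 1, the rest a small residual weight,
    so that implausible locations contribute (almost) nothing to the constrained estimate.
    """
    # Wrap angle differences to [-180, 180) degrees before thresholding.
    diff = (np.asarray(dict_angles_deg) - estimated_doa_deg + 180.0) % 360.0 - 180.0
    return np.where(np.abs(diff) <= angular_tolerance_deg, 1.0, 0.05)
```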
The hearing system, e.g. the processor (PRO), comprises an audio signal processing part (SP) configured to provide the processed signal (OUT) in dependence of the resulting acoustic transfer function vector (ATF*) for the user. The audio signal processing part (SP) may e.g. comprise a beamformer (cf. BF in
The controller (CTR) in
The hearing device (HD), e.g. a hearing aid, of
The synthesis filter bank (FBS) is configured to convert a number of frequency sub-band signals (OUT) to one time-domain signal (out). The signal processor (SP) is configured to apply one or more processing algorithms to the electric input signals (e.g. beamforming and compressive amplification) and to provide a processed output signal (OUT; out) for presentation to the user via an output unit (OU), e.g. an output transducer. The output unit is configured to a) convert a signal representing sound to stimuli perceivable by the user as sound (e.g. in the form of vibrations in air, or vibrations in bone, or as electric stimuli of the cochlear nerve) or to b) transmit the processed output signal (out) to another device or system.
The processor (PRO) and the signal processor (SP) may form part of the same digital signal processor (or be independent units). The analysis filter banks (FB-A1, FB-A2), the processor (PRO), the signal processor (SP), the synthesis filter bank (FBS), the controller (CTR), the target signal quality estimator (TQME; SNR-E), the voice activity detector (VAD), the target-sound-source-location-identifier (TSSLI), and the memory (MEM [DB]) may form part of the same digital signal processor (or be independent units).
The hearing device may comprise a transceiver allowing an exchange of data with another device, e.g. a contra-lateral hearing device of a binaural hearing system, a smartphone or any other portable or stationary device or system. The database Θ may be located in the other device. Likewise, the processor PRO (or a part thereof) may be located in the other device (e.g. a dedicated processing device).
The beamformers (BF) and (OV-BF) are connected to an acoustic transfer function estimator (ATFE) for providing the current acoustic transfer function vector (ATF*) in dependence of the current electric input signals (and possible sensors or detectors) according to the present invention. In a communication mode (e.g. telephone mode) of operation, the own-voice beamformer (OV-BF) is activated and the current acoustic transfer function vector (ATF*) is an own voice acoustic transfer function (ATF*ov), determined when the user speaks. In a non-communication mode of operation, the environment beamformer (BF) is activated and the current acoustic transfer function vector (ATF*) is an environment acoustic transfer function (ATF*env) (e.g. determined when the user does not speak). Likewise, in a communication mode wherein the environment beamformer is activated, the environment acoustic transfer function (ATF*env) may be determined from the electric input signals (X1, X2) when the user's voice is not present (e.g. when the far-end communication partner speaks).
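The mode-dependent choice between the own-voice and environment transfer functions could, as a simple illustration, look as follows; the function and argument names are hypothetical.

```python
def select_atf(mode, user_speaking, atf_ov, atf_env):
    """Sketch of mode-dependent selection between own-voice and environment ATFs.

    In a communication (e.g. telephone) mode the own-voice transfer function
    ATF*_ov is used while the user speaks; otherwise the environment transfer
    function ATF*_env is used (and may be updated when the user's voice is absent).
    """
    if mode == "communication" and user_speaking:
        return atf_ov
    return atf_env
```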
The control unit (CONT) is configured to dynamically control the processing of the SSP- and MSP-signal processing units (G1 and G2, respectively), e.g. based on one or more control input signals (not shown).
The input signals (S-IN, M-IN) to the headset (HD) may be presented in the (time-) frequency domain or converted from the time domain to the (time-) frequency domain by appropriate functional units, e.g. included in the receiver unit (Rx) and input unit (IU) of the headset. A headset according to the present disclosure may e.g. comprise a multitude of time-domain to time-frequency conversion units (e.g. one for each input signal that is not otherwise provided in a time-frequency representation, e.g. in the form of analysis filter bank units (FB-Am, m=1, . . . , M) of
In the embodiment of a hearing device in
The hearing system (here, the hearing device HD) may further comprise a detector unit e.g. comprising one or more inertial measurement units (IMU), e.g. a 3D gyroscope, a 3D accelerometer and/or a 3D magnetometer, here denoted IMU1 and located in the BTE-part (BTE). Inertial measurement units (IMUs), e.g. accelerometers, gyroscopes, and magnetometers, and combinations thereof, are available in a multitude of forms (e.g. multi-axis, such as 3D-versions), e.g. constituted by or forming part of an integrated circuit, and thus suitable for integration, even in miniature devices, such as hearing devices, e.g. hearing aids. The sensor IMU1 may thus be located on the substrate (SUB) together with other electronic components (e.g. MEM, FE, DSP). One or more movement sensors (IMU) may alternatively or additionally be located in or on the ITE part (ITE) or in or on the connecting element (IC), e.g. used to pick up sound from the user's mouth (own voice).
The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in
The electric input signals (from input transducers MBTE1, MBTE2, MBTE3, M1, M2, M3, IMU1) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
The hearing device (HD) exemplified in
In the above description and examples, focus has been made on wearable hearing devices associated with a particular person. The inventive ideas of the present disclosure (to select a predetermined acoustic transfer function from a dictionary (constrained method) OR to estimate a new acoustic transfer function (un-constrained method) in dependence of a confidence parameter, e.g. regarding the quality of a current target signal, or the location of the audio source of current interest to the user) may, however, further be applied to hearing devices associated with a particular acoustic environment, e.g. of a particular location where the hearing device is located, e.g. a particular room. An example of such device may be a speakerphone configured to pick up sound from audio sources (e.g. one or more persons speaking) located in the particular room, and to (e.g. process and) transmit the captured sound to one or more remote listeners. The speakerphone may further be configured to play sound received from the one or more remote listeners to allow persons located in the particular room to hear it. Instead of being adapted to and adapting to a particular person, acoustic transfer functions of the speakerphone (or other audio device) may be adapted to the particular room.
It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
Number | Date | Country | Kind |
---|---|---|---|
21192443 | Aug 2021 | EP | regional |
Number | Name | Date | Kind |
---|---|---|---|
20150010160 | Udesen | Jan 2015 | A1 |
20150055783 | Luo | Feb 2015 | A1 |
20190378531 | Jensen et al. | Dec 2019 | A1 |
Number | Date | Country |
---|---|---|
2 822 301 | Jan 2015 | EP |
2 869 599 | May 2015 | EP |
3 236 672 | Oct 2017 | EP |
3 413 589 | Dec 2018 | EP |
Entry
Hoang et al., “Joint Maximum Likelihood Estimation of Power Spectral Densities and Relative Acoustic Transfer Function for Acoustic Beamforming”, ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 1-5.
Jensen et al., “Analysis of Beamformer Directed Single-Channel Noise Reduction System for Hearing Aid Applications”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5728-5732.
Markovich-Golan et al., “Performance Analysis of the Covariance Subtraction Method for Relative Transfer Function Estimation and Comparison to the Covariance Whitening Method”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 544-548.
Ye et al., “Maximum Likelihood DOA Estimation and Asymptotic Cramer-Rao Bounds for Additive Unknown Colored Noise”, IEEE Transactions on Signal Processing, vol. 43, No. 4, Apr. 1995, pp. 938-949.
Zohourian et al., “Binaural Speaker Localization Integrated Into an Adaptive Beamformer for Hearing Aids”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, No. 3, Mar. 2018, pp. 515-528.
Extended European Search Report, issued in Priority Application No. 21192443.6, dated Mar. 1, 2022.
Number | Date | Country | |
---|---|---|---|
20230054213 A1 | Feb 2023 | US |