Spatial filtering (directionality) by beam forming in hearing aids is an efficient way to attenuate unwanted noise: a direction-dependent gain can cancel noise from one direction while preserving the sound of interest impinging from another direction, thereby potentially improving speech intelligibility. Typically, beam formers in hearing instruments have beam patterns that are continuously adapted in order to minimize the noise while sound impinging from the target direction is left unaltered. Because the acoustic properties of the noise signal change over time, the beam former is implemented as an adaptive system.
Despite the potential benefit, adaptive directionality also has some drawbacks. In a fluctuating acoustic environment, the adaptive system needs to react fast. The parameter estimates for such a fast system will have a high variance, which will lead to poorer performance in steady environments.
We thus propose a smoothing scheme which provides more smoothing of the adaptive parameter in fluctuating environments and less smoothing of the adaptive parameter in more steady acoustic environments.
In another aspect, a smoothing scheme based on adaptive covariance smoothing is presented. It may be advantageous in environments or situations where the direction to a sound source of interest changes, e.g. when more than one (e.g. localized) sound source of interest is present and the sound sources are active at different points in time, e.g. one after the other, or uncorrelated.
A hearing aid:
In a first aspect of the present application, a hearing aid adapted for being located in an operational position at or in or behind an ear or fully or partially implanted in the head of a user, is provided. The hearing aid comprises
where * denotes complex conjugation, ⟨·⟩ denotes the statistical expectation operator, and c is a constant. The hearing aid is adapted to provide that said adaptive beam former filtering unit (BFU) comprises a smoothing unit for implementing said statistical expectation operator by smoothing the complex expression C2*·C1 and the real expression |C2|² over time.
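As an illustration of the smoothing described above, the following minimal Python sketch tracks the adaptation parameter for a single frequency band. It assumes that β is formed as the smoothed C2*·C1 divided by the sum of the smoothed |C2|² and the constant c, and that the expectation is approximated by a 1st-order recursive average. The function names, the coefficient value and the placement of c are assumptions made for the example and are not taken from the disclosure.

```python
def smooth(prev, new, coef):
    """One step of a 1st-order recursive average; a larger coef reacts faster."""
    return prev + coef * (new - prev)


def estimate_beta(c1_frames, c2_frames, coef=0.1, c=1e-9):
    """Track beta = <C2* . C1> / (<|C2|^2> + c) for one frequency band.

    c1_frames, c2_frames: sequences of complex beamformer outputs C1 and C2.
    coef: hypothetical smoothing coefficient, shared by numerator and denominator.
    c: small constant assumed to keep the denominator away from zero.
    """
    num = 0j   # smoothed complex expression C2* . C1
    den = 0.0  # smoothed real expression |C2|^2
    betas = []
    for c1, c2 in zip(c1_frames, c2_frames):
        num = smooth(num, c2.conjugate() * c1, coef)
        den = smooth(den, abs(c2) ** 2, coef)
        betas.append(num / (den + c))
    return betas
```

Using the same coefficient for the numerator and the denominator keeps their ratio free of a smoothing-induced scale factor: when C1 is a fixed complex multiple of C2, the sketch returns that multiple from the first frame on, up to the small offset caused by c.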
In a second aspect of the present application, a hearing aid adapted for being located in an operational position at or in or behind an ear or fully or partially implanted in the head of a user, is provided. The hearing aid comprises
In an embodiment, wC1^H wC2 = 0, in other words, the first and second beam patterns are preferably mutually orthogonal. The following relations between beamformer weights and weighting parameters exist: wC1 = [W11, W12]^T and wC2 = [W21, W22]^T.
In an embodiment, the first beam pattern (C1) represents a target maintaining beamformer, e.g. implemented as a delay and sum beamformer. In an embodiment, the second beam pattern (C2) represents a target cancelling beamformer, e.g. implemented as a delay and subtract beamformer. In another embodiment C1 represents a front cardioid and C2 represents a rear cardioid. This may also represent a target cancelling beamformer and a target enhancing beamformer, but the target enhancing beamformer is implemented as a delay and subtract (differential) beamformer.
The expression for β has its basis in the generalized sidelobe canceller (GSC) structure, where in the special case of two microphones we have (assuming that wC1^H wC2 = 0)
wGSC(k) = wC1(k) − wC2(k)·β*(k)
where (omitting the frequency index k)
where E[·] represents the expectation operator. VAD denotes a voice activity detector, and VAD=0 represents a situation where speech is absent (e.g. only noise is present in the given time segment). x represents the input signals or a processed version of the input signals (e.g. x = [X1(k,m), X2(k,m)]^T). In the above expressions for β, Cv is also updated when VAD=0.
We notice that we may find β either directly from the signals C1 = wC1^H x and C2 = wC2^H x (cf. 1st aspect) or we may find β from the noise covariance matrix Cv, i.e.
(cf. second aspect). This may be a choice of implementation. If, e.g., signals C1 and C2 are already used elsewhere in the device or algorithm, it may be advantageous to derive β directly from these signals
but if we need to change the look direction (and hereby wC1 and wC2), it is a disadvantage that the weights are included inside the expectation operator. In that case, it is an advantage to derive β directly from the noise covariance matrix Cv (as in the 2nd aspect). Thereby, wC1 and wC2 will not be part of the smoothing, and thus β can change quickly based on, for example, a change in target DOA (which will result in a change of wC1 and wC2, where wC1 = [W11 W12]^T and wC2 = [W21 W22]^T). An embodiment of determining β according to this method is e.g. illustrated in
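The equivalence of the two routes to β discussed above can be checked numerically. The sketch below is written under stated assumptions: it takes the direct form to be β = E[C2*·C1]/E[|C2|²] and the covariance form to be β = (wC1^H Cv wC2)/(wC2^H Cv wC2), both of which follow from minimizing the power of C1 − β·C2 (the output of wGSC = wC1 − wC2·β*). Expectations are replaced by averages over noise-only frames, and all function names are hypothetical.

```python
import random


def herm_dot(w, x):
    """Return w^H x for two-element complex vectors."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


def beta_from_signals(w1, w2, frames):
    """Direct route: average C2* . C1 and |C2|^2 over the frames."""
    num = sum(herm_dot(w2, x).conjugate() * herm_dot(w1, x) for x in frames)
    den = sum(abs(herm_dot(w2, x)) ** 2 for x in frames)
    return num / den


def noise_covariance(frames):
    """Sample estimate of Cv = E[x x^H] over noise-only frames (VAD = 0)."""
    n = len(frames)
    return [[sum(x[i] * x[j].conjugate() for x in frames) / n
             for j in range(2)] for i in range(2)]


def beta_from_covariance(w1, w2, cv):
    """Covariance route: the weights stay outside the averaging."""
    def quad(a, b):  # a^H Cv b
        return sum(a[i].conjugate() * cv[i][j] * b[j]
                   for i in range(2) for j in range(2))
    return quad(w1, w2) / quad(w2, w2)


def random_noise_frames(n, seed=0):
    """Synthetic two-microphone noise frames, for demonstration only."""
    rng = random.Random(seed)
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            for _ in range(n)]
```

With the covariance route, a change of look direction only requires new weights wC1 and wC2 to be applied to the already smoothed Cv, which is the advantage noted in the text.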
In an embodiment, the adaptive beam former filtering unit is configured to provide adaptive smoothing of a covariance matrix for said electric input signals comprising adaptively changing time constants (τatt, τrel) for said smoothing in dependence of changes (ΔC) over time in covariance of said first and second electric input signals, wherein said time constants have first values (τatt1, τrel1) for changes in covariance below a first threshold value (ΔCth1) and second values (τatt2, τrel2) for changes in covariance above a second threshold value (ΔCth2), wherein the first values are larger than corresponding second values of said time constants, while said first threshold value (ΔCth1) is smaller than or equal to said second threshold value (ΔCth2). In an embodiment, the adaptive beam former filtering unit is configured to provide adaptive smoothing of the noise covariance matrix Cv. In an embodiment, the adaptive beam former filtering unit is configured to provide that the noise covariance matrix Cv is updated when only noise is present. In an embodiment, the hearing aid comprises a voice activity detector for providing a (binary or continuous, e.g. over frequency bands) indication of whether, at a given point in time, the input signal(s) comprise speech or not.
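The threshold-dependent choice of time constants can be sketched as follows. This is a simplified illustration, not the disclosed implementation: it uses a single time constant per regime instead of separate attack and release values, blends linearly between the two thresholds (the text leaves this region open), and expects the change measure ΔC to be computed elsewhere. All names and default values are hypothetical.

```python
import math


def coef_from_tau(tau_s, frame_rate_hz):
    """Map a time constant in seconds to a 1st-order IIR coefficient in (0, 1)."""
    return 1.0 - math.exp(-1.0 / (tau_s * frame_rate_hz))


def pick_time_constant(delta_c, th1, th2, tau_slow, tau_fast):
    """Large time constant below th1, small one above th2 (th1 <= th2)."""
    if delta_c <= th1:
        return tau_slow
    if delta_c >= th2:
        return tau_fast
    frac = (delta_c - th1) / (th2 - th1)  # assumed linear blend in between
    return tau_slow + frac * (tau_fast - tau_slow)


def smooth_covariance(cov, x, delta_c, th1=0.1, th2=0.5,
                      tau_slow=1.0, tau_fast=0.05, frame_rate_hz=100.0):
    """Update a 2x2 covariance estimate with the outer product x x^H."""
    tau = pick_time_constant(delta_c, th1, th2, tau_slow, tau_fast)
    a = coef_from_tau(tau, frame_rate_hz)
    return [[cov[i][j] + a * (x[i] * x[j].conjugate() - cov[i][j])
             for j in range(2)] for i in range(2)]
```

A small ΔC leaves the estimate heavily smoothed (low variance in steady conditions), whereas a large ΔC lets the estimate follow the new covariance within a few frames.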
Thereby an improved beam former filtering unit may be provided.
The statistical expectation operator is approximated by a smoothing operation, e.g. implemented as a moving average or by a low pass filter, e.g. a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter.
In an embodiment, the smoothing unit is configured to apply substantially the same smoothing time constants for the smoothing of the complex expression C2*·C1 and the real expression |C2|². In an embodiment, the smoothing time constants comprise attack and release time constants τatt and τrel. In an embodiment, the attack and release time constants are substantially equal. Thereby no bias is introduced in the estimate by the smoothing operation. In an embodiment, the smoothing unit is configured to enable the use of different attack and release time constants τatt and τrel in the smoothing. In an embodiment, the attack time constants τatt for the smoothing of the complex expression C2*·C1 and the real expression |C2|² are substantially equal. In an embodiment, the release time constants τrel for the smoothing of the complex expression C2*·C1 and the real expression |C2|² are substantially equal.
In an embodiment, the smoothing unit is configured to smooth a resulting adaptation parameter β(k). In an embodiment, the smoothing unit is configured to provide that the time constants of the smoothing of the resulting adaptation parameter β(k) are different from the time constants of the smoothing of the complex expression C2*·C1 and the real expression |C2|².
In an embodiment, the smoothing unit is configured to provide that the attack and release time constants involved in the smoothing of the resulting adaptation parameter β(k) are larger than the corresponding attack and release time constants involved in the smoothing of the complex expression C2*·C1 and the real expression |C2|². This has the advantage that the smoothing of the signal level dependent expressions C2*·C1 and |C2|² is performed relatively fast (so that a sudden level change, in particular a level drop, can be detected quickly). The resulting increased variance in the adaptation parameter β(k) is handled by performing a relatively slow smoothing of the adaptation parameter β(k) (providing a smoothed adaptation parameter β̄(k) = ⟨β(k)⟩).
In an embodiment, the smoothing unit is configured to provide that the attack and release time constants involved in the smoothing of the complex expression C2*·C1 and the real expression |C2|² are adaptively determined.
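The two-stage arrangement described above (fast smoothing of the level dependent expressions, slow smoothing of the resulting β) may be sketched as below. The sketch assumes 1st-order recursive averaging in both stages, equal attack and release behaviour, and a β of the form smoothed C2*·C1 over smoothed |C2|² plus a constant c; the coefficient values and function names are illustrative assumptions only.

```python
def one_pole(state, value, coef):
    """1st-order recursive average; coef near 1 is fast, near 0 is slow."""
    return state + coef * (value - state)


def two_stage_beta(c1_frames, c2_frames, fast_coef=0.5, slow_coef=0.05, c=1e-9):
    """Fast smoothing of C2*.C1 and |C2|^2, followed by slow smoothing of beta."""
    num, den = 0j, 0.0   # fast-smoothed level dependent expressions
    beta_smoothed = 0j   # slowly smoothed adaptation parameter
    out = []
    for c1, c2 in zip(c1_frames, c2_frames):
        num = one_pole(num, c2.conjugate() * c1, fast_coef)
        den = one_pole(den, abs(c2) ** 2, fast_coef)
        beta = num / (den + c)  # level independent ratio, higher variance
        beta_smoothed = one_pole(beta_smoothed, beta, slow_coef)
        out.append(beta_smoothed)
    return out
```

Because β is a ratio, it does not scale with the input level, so the slow second stage reduces variance without delaying the detection of level changes in the first stage.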
In an embodiment, the smoothing unit is configured to provide that the attack and release time constants involved in the smoothing of the resulting adaptation parameter β(k) are adaptively determined. In an embodiment, the smoothing unit comprises a low pass filter. In an embodiment, the low pass filter is adapted to allow the use of different attack and release coefficients. In an embodiment, the smoothing unit comprises a low pass filter implemented as an IIR filter with fixed or configurable time constant(s).
In an embodiment, the smoothing unit comprises a low pass filter implemented as an IIR filter with a fixed time constant, and an IIR filter with a configurable time constant. In an embodiment, the smoothing unit is configured to provide that the smoothing coefficients (representing the time constants) take values between 0 and 1. A coefficient close to 0 applies averaging with a long time constant, while a coefficient close to 1 applies a short time constant. In an embodiment, at least one of said IIR filters is a 1st order IIR filter. In an embodiment, the smoothing unit comprises a number of 1st order IIR filters.
In an embodiment, the smoothing unit is configured to determine the configurable time constant by a function unit providing a predefined function of the difference between a first filtered value of the real expression |C2|² when filtered by an IIR filter with a first time constant, and a second filtered value of the real expression |C2|² when filtered by an IIR filter with a second time constant, wherein the first time constant is smaller than the second time constant. In an embodiment, the smoothing unit comprises two 1st order IIR filters using said first and second time constants for filtering said real expression |C2|² and providing said first and second filtered values, a combination unit (e.g. a sum or difference unit) for providing said difference between said first and second filtered values of the real expression |C2|², a function unit for providing said configurable time constant, and a 1st order IIR filter for filtering the real expression |C2|² using said configurable time constant.
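The relation between a coefficient in the range 0 to 1 and the corresponding time constant can be made concrete with a short sketch. It assumes the common 1st-order recursion y[n] = y[n−1] + coef·(x[n] − y[n−1]) and the convention that the attack coefficient applies when the input rises above the current state; neither the recursion nor that convention is stated in the text, and the function names are hypothetical.

```python
import math


def coefficient_to_time_constant(coef, update_rate_hz):
    """Time constant in seconds of the 1st-order recursion, for 0 < coef < 1."""
    return -1.0 / (update_rate_hz * math.log(1.0 - coef))


def attack_release_step(state, value, attack_coef, release_coef):
    """One filter update with separate attack and release coefficients."""
    coef = attack_coef if value > state else release_coef
    return state + coef * (value - state)
```

At an update rate of 1 kHz, for example, a coefficient of 0.01 corresponds to a time constant of roughly 0.1 s, whereas a coefficient of 0.5 corresponds to roughly 1.4 ms.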
In an embodiment, the function unit comprises an absolute value (ABS) unit providing an absolute value of the difference between the first and second filtered values.
In an embodiment, the first and second time constants are fixed time constants.
In an embodiment, the first time constant is a fixed time constant and the second time constant is the configurable time constant.
In an embodiment, the predefined function is a decreasing function of the difference between the first and second filtered values. In an embodiment, the predefined function is a monotonically decreasing function of the difference between the first and second filtered values. The larger the difference between the first and second filtered values, the faster the smoothing should be performed, i.e. the smaller the time constant.
In an embodiment, the predefined function is one of a binary function, a piecewise linear function, and a continuous monotonic function. In an embodiment, the predefined function is a sigmoid function.
In an embodiment, the smoothing unit comprises respective low pass filters implemented as IIR filters using said configurable time constant for filtering real and imaginary parts of the expression C2*·C1 and the real expression |C2|², and wherein said configurable time constant is determined from |C2|².
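The three families of predefined functions named above can be sketched as mappings from the (absolute) difference between the two filtered values to a time constant. The thresholds, slope and time constant values are hypothetical parameters chosen for the example; only the decreasing shape is taken from the text.

```python
import math


def binary_tau(diff, threshold, tau_slow, tau_fast):
    """Two-level mapping: short time constant once diff exceeds the threshold."""
    return tau_fast if diff > threshold else tau_slow


def piecewise_linear_tau(diff, lo, hi, tau_slow, tau_fast):
    """Constant below lo and above hi, linear in between (lo < hi)."""
    if diff <= lo:
        return tau_slow
    if diff >= hi:
        return tau_fast
    return tau_slow + (diff - lo) / (hi - lo) * (tau_fast - tau_slow)


def sigmoid_tau(diff, midpoint, slope, tau_slow, tau_fast):
    """Smooth transition from tau_slow to tau_fast around the midpoint."""
    z = max(-60.0, min(60.0, slope * (diff - midpoint)))  # avoid exp overflow
    s = 1.0 / (1.0 + math.exp(-z))
    return tau_slow + s * (tau_fast - tau_slow)
```

All three mappings are non-increasing in the difference, so a larger difference never yields slower smoothing.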
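A minimal sketch of such a smoothing unit is given below. It assumes two fixed 1st-order filters on |C2|² (one fast, one slow), a binary function of the absolute difference between their outputs, and a comparison relative to the slow estimate; the relative comparison, the parameter values and the function name are assumptions of the example and are not specified in the text.

```python
def adaptive_smooth(c1_frames, c2_frames, fast_coef=0.5, slow_coef=0.02,
                    threshold=0.5):
    """Smooth C2*.C1 and |C2|^2 with a coefficient controlled by |C2|^2.

    Returns the smoothed numerator, the smoothed denominator and the list of
    coefficients that were selected frame by frame.
    """
    fast = slow = 0.0    # |C2|^2 tracked by the two fixed filters
    num, den = 0j, 0.0   # outputs of the filters with configurable coefficient
    coefs = []
    for c1, c2 in zip(c1_frames, c2_frames):
        power = abs(c2) ** 2
        fast += fast_coef * (power - fast)
        slow += slow_coef * (power - slow)
        diff = abs(fast - slow)
        # Binary function: fast smoothing while the two estimates disagree.
        coef = fast_coef if diff > threshold * max(slow, 1e-12) else slow_coef
        num += coef * (c2.conjugate() * c1 - num)  # real and imaginary parts
        den += coef * (power - den)
        coefs.append(coef)
    return num, den, coefs
```

In a steady input the two estimates of |C2|² converge and the slow coefficient is selected; a sudden level increase separates them again and switches the unit back to fast smoothing.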
In an embodiment, the hearing aid comprises a hearing instrument adapted for being located at or in an ear of a user or for being fully or partially implanted in the head of a user, a headset, an earphone, an ear protection device or a combination thereof.
In an embodiment, the hearing aid is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user. In an embodiment, the hearing aid comprises a signal processing unit for enhancing the input signals and providing a processed output signal.
In an embodiment, the hearing aid comprises an output unit (e.g. a loudspeaker, or a vibrator or electrodes of a cochlear implant) for providing output stimuli perceivable by the user as sound. In an embodiment, the hearing aid comprises a forward or signal path between the first and second microphones and the output unit. In an embodiment, the beam former filtering unit is located in the forward path. In an embodiment, a signal processing unit is located in the forward path. In an embodiment, the signal processing unit is adapted to provide a level and frequency dependent gain according to a user's particular needs. In an embodiment, the hearing aid comprises an analysis path comprising functional components for analyzing the electric input signal(s) (e.g. determining a level, a modulation, a type of signal, an acoustic feedback estimate, etc.). In an embodiment, some or all signal processing of the analysis path and/or the forward path is conducted in the frequency domain. In an embodiment, some or all signal processing of the analysis path and/or the forward path is conducted in the time domain.
In an embodiment, an analogue electric signal representing an acoustic signal is converted to a digital audio signal in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling frequency or rate fs, fs being e.g. in the range from 8 kHz to 48 kHz (adapted to the particular needs of the application) to provide digital samples xn (or x[n]) at discrete points in time tn (or n), each audio sample representing the value of the acoustic signal at tn by a predefined number Ns of bits, Ns being e.g. in the range from 1 to 16 bits. A digital sample x has a length in time of 1/fs, e.g. 50 μs, for fs=20 kHz. In an embodiment, a number of audio samples are arranged in a time frame. In an embodiment, a time frame comprises 64 or 128 audio data samples. Other frame lengths may be used depending on the practical application.
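The timing figures quoted above follow directly from the sampling frequency. The short sketch below reproduces the arithmetic; the function names are hypothetical.

```python
def sample_period_us(fs_hz):
    """Duration of one sample in microseconds."""
    return 1e6 / fs_hz


def frame_duration_ms(samples_per_frame, fs_hz):
    """Duration of one time frame in milliseconds."""
    return 1e3 * samples_per_frame / fs_hz
```

For fs = 20 kHz, one sample lasts 50 μs, a frame of 64 samples spans 3.2 ms and a frame of 128 samples spans 6.4 ms.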
In an embodiment, the hearing aids comprise an analogue-to-digital (AD) converter to digitize an analogue input with a predefined sampling rate, e.g. 20 kHz. In an embodiment, the hearing aids comprise a digital-to-analogue (DA) converter to convert a digital signal to an analogue output signal, e.g. for being presented to a user via an output transducer.
In an embodiment, the hearing aid, e.g. each of the first and second microphones, comprises a (TF-)conversion unit for providing a time-frequency representation of an input signal. In an embodiment, the time-frequency representation comprises an array or map of corresponding complex or real values of the signal in question in a particular time and frequency range. In an embodiment, the TF conversion unit comprises a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. In an embodiment, the TF conversion unit comprises a Fourier transformation unit for converting a time variant input signal to a (time variant) signal in the frequency domain. In an embodiment, the frequency range considered by the hearing aid from a minimum frequency fmin to a maximum frequency fmax comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. In an embodiment, a signal of the forward and/or analysis path of the hearing aid is split into a number NI of frequency bands, where NI is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. In an embodiment, the hearing aid is adapted to process a signal of the forward and/or analysis path in a number NP of different frequency channels (NP≤NI). The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping. Each frequency channel comprises one or more frequency bands.
In an embodiment, the hearing aid is a portable device, e.g. a device comprising a local energy source, e.g. a battery, e.g. a rechargeable battery.
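One possible form of a Fourier-based TF conversion unit is sketched below using only the Python standard library. It produces a map X[m][k] with time frame index m and frequency index k. The frame length, hop size, Hann window and the direct (slow) discrete Fourier transform are choices made for illustration; an actual device would typically use an efficient filter bank.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame (O(N^2), for clarity)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def time_frequency_map(signal, frame_len=64, hop=32):
    """Return X[m][k] for overlapping, Hann-windowed frames of a real signal.

    Only the non-negative frequency indices k = 0 .. frame_len // 2 are kept.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / frame_len))
                    for i, s in enumerate(segment)]
        frames.append(dft(windowed)[: frame_len // 2 + 1])
    return frames
```

Each column k of the resulting map corresponds to one frequency band, and several adjacent bands may be grouped into a frequency channel as described above.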
In an embodiment, the hearing aid comprises a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user, or for being fully or partially implanted in the head of the user.
In an embodiment, the hearing aid comprises a number of detectors configured to provide status signals relating to a current physical environment of the hearing aid (e.g. the current acoustic environment), and/or to a current state of the user wearing the hearing aid, and/or to a current state or mode of operation of the hearing aid. Alternatively or additionally, one or more detectors may form part of an external device in communication (e.g. wirelessly) with the hearing aid. An external device may e.g. comprise another hearing assistance device, a remote control, an audio delivery device, a telephone (e.g. a Smartphone), an external sensor, etc.
In an embodiment, one or more of the number of detectors operate(s) on the full band signal (time domain). In an embodiment, one or more of the number of detectors operate(s) on band split signals ((time-) frequency domain).
In an embodiment, the number of detectors comprises a level detector for estimating a current level of a signal of the forward path. In an embodiment, the number of detectors comprises a noise floor detector. In an embodiment, the number of detectors comprises a telephone mode detector.
In a particular embodiment, the hearing aid comprises a voice detector (VD) for determining whether or not an input signal comprises a voice signal (at a given point in time). A voice signal is in the present context taken to include a speech signal from a human being. It may also include other forms of utterances generated by the human speech system (e.g. singing). In an embodiment, the voice detector unit is adapted to classify a current acoustic environment of the user as a VOICE or NO-VOICE environment. This has the advantage that time segments of the electric microphone signal comprising human utterances (e.g. speech) in the user's environment can be identified, and thus separated from time segments only comprising other sound sources (e.g. artificially generated noise). In an embodiment, the voice detector is adapted to detect as a VOICE also the user's own voice. Alternatively, the voice detector is adapted to exclude a user's own voice from the detection of a VOICE. In an embodiment, the voice activity detector is adapted to differentiate between a user's own voice and other voices.
In an embodiment, the hearing aid comprises an own voice detector for detecting whether a given input sound (e.g. a voice) originates from the voice of the user of the system. In an embodiment, the microphone system of the hearing aid is adapted to be able to differentiate between a user's own voice and another person's voice and possibly from NON-voice sounds.
In an embodiment, the memory comprises a number of fixed adaptation parameters βfixj(k), j=1, . . . , Nfix, where Nfix is the number of fixed beam patterns, representing different (third) fixed beam patterns, which may be selected in dependence of a control signal, e.g. from a user interface or based on a signal from one or more detectors. In an embodiment, the choice of fixed beam former is dependent on a signal from the own voice detector and/or from a telephone mode detector.
In an embodiment, the hearing assistance device comprises a classification unit configured to classify the current situation based on input signals from (at least some of) the detectors, and possibly other inputs as well. In the present context ‘a current situation’ is taken to be defined by one or more of
a) the physical environment (e.g. including the current electromagnetic environment, e.g. the occurrence of electromagnetic signals (e.g. comprising audio and/or control signals) intended or not intended for reception by the hearing aid, or other properties of the current environment than acoustic);
b) the current acoustic situation (input level, feedback, etc.);
c) the current mode or state of the user (movement, temperature, etc.); and
d) the current mode or state of the hearing assistance device (program selected, time elapsed since last user interaction, etc.) and/or of another device in communication with the hearing aid.
In an embodiment, the hearing aid further comprises other relevant functionality for the application in question, e.g. compression, noise reduction, feedback suppression, etc.
In an embodiment, the hearing aid comprises a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user or fully or partially implanted in the head of a user, a headset, an earphone, an ear protection device or a combination thereof.
Use:
In an aspect, use of a hearing aid as described above, in the ‘detailed description of embodiments’ and in the claims, is moreover provided. In an embodiment, use is provided in a system comprising one or more hearing instruments, headsets, ear phones, active ear protection systems, etc., e.g. in handsfree telephone systems, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
A Method of Operating a Hearing Aid:
In an aspect, a method of operating a hearing aid adapted for being located in an operational position at or in or behind an ear or fully or partially implanted in the head of a user is provided. The method comprises
where * denotes complex conjugation, ⟨·⟩ denotes the statistical expectation operator, and c is a constant. The method further comprises smoothing the complex expression C2*·C1 and the real expression |C2|² over time.
In a further aspect of the present application, a method of operating a hearing aid adapted for being located in an operational position at or in or behind an ear or fully or partially implanted in the head of a user is provided. The method comprises
In an embodiment, wC1^H wC2 = 0, in other words, the first and second beam patterns are preferably mutually orthogonal.
It is intended that some or all of the structural features of the device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the method, when appropriately substituted by a corresponding process and vice versa. Embodiments of the method have the same advantages as the corresponding devices.
A Method of Adaptive Covariance Matrix Smoothing:
In another aspect, a smoothing scheme based on adaptive covariance smoothing is provided by the present disclosure. Adaptive covariance smoothing may be advantageous in environments or situations where the direction to a sound source of interest changes, e.g. when more than one (in space) stationary or semi-stationary sound source is present and the sound sources are active at different points in time, e.g. one after the other, or uncorrelated in time.
A method of operating a hearing device, e.g. a hearing aid, is provided. The method comprises
In an embodiment, the first X1 and second X2 electric input signals are provided in a time-frequency representation X1(k,m) and X2(k,m), where k is a frequency index, k=1, . . . , K, and m is a time frame index. In an embodiment, said changes (ΔC) over time in covariance of said first and second electric input signals are related to changes over one or more (possibly overlapping) time frames (i.e. Δm≥1).
In an embodiment, said time constants represent attack and release time constants, respectively (τatt, τrel).
A Hearing Device Comprising an Adaptive Beamformer:
A hearing device configured to implement the method of adaptive covariance matrix smoothing is also provided.
A hearing device, e.g. a hearing aid, is furthermore provided. The hearing device comprises
This has the advantage of providing an improved hearing device that is suitable for determining a direction of arrival (and/or location over time) of sound from sources in a dynamic listening environment with multiple competing speakers (and thus to steer a beam towards a currently active sound source).
A Computer Readable Medium:
In an aspect, a tangible computer-readable medium storing a computer program comprising program code means for causing a data processing system to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims, when said computer program is executed on the data processing system is furthermore provided by the present application.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. In addition to being stored on a tangible medium, the computer program can also be transmitted via a transmission medium such as a wired or wireless link or a network, e.g. the Internet, and loaded into a data processing system for being executed at a location different from that of the tangible medium.
A Data Processing System:
In an aspect, a data processing system comprising a processor and program code means for causing the processor to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
A Hearing System:
In a further aspect, a hearing system comprising a hearing aid as described above, in the ‘detailed description of embodiments’, and in the claims, and an auxiliary device is moreover provided.
In an embodiment, the system is adapted to establish a communication link between the hearing aid and the auxiliary device to provide that information (e.g. control and status signals, possibly audio signals) can be exchanged or forwarded from one to the other.
In an embodiment, the auxiliary device is or comprises an audio gateway device adapted for receiving a multitude of audio signals (e.g. from an entertainment device, e.g. a TV or a music player, a telephone apparatus, e.g. a mobile telephone or a computer, e.g. a PC) and adapted for selecting and/or combining an appropriate one of the received audio signals (or combination of signals) for transmission to the hearing aid. In an embodiment, the auxiliary device is or comprises a remote control for controlling functionality and operation of the hearing aid(s). In an embodiment, the function of a remote control is implemented in a SmartPhone, the SmartPhone possibly running an APP allowing to control the functionality of the audio processing device via the SmartPhone (the hearing aid(s) comprising an appropriate wireless interface to the SmartPhone, e.g. based on Bluetooth or some other standardized or proprietary scheme). In an embodiment, the auxiliary device is or comprises a smartphone, or similar communication device.
In an embodiment, the auxiliary device is another hearing aid. In an embodiment, the hearing system comprises two hearing aids adapted to implement a binaural hearing aid system.
In an embodiment, the binaural hearing aid system (e.g. each of the first and second hearing aids of the binaural hearing aid system) is (are) configured to binaurally exchange the smoothed β-values in order to create one joint βbin(k) value based on a combination of the first and second smoothed β-values, β1(k), β2(k), of the first and second hearing aids, respectively.
Definitions:
In the present context, a ‘hearing aid’ refers to a device, such as e.g. a hearing instrument or an active ear-protection device or other audio processing device, which is adapted to improve, augment and/or protect the hearing capability of a user by receiving acoustic signals from the user's surroundings, generating corresponding audio signals, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. A ‘hearing aid’ further refers to a device such as an earphone or a headset adapted to receive audio signals electronically, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. Such audible signals may e.g. be provided in the form of acoustic signals radiated into the user's outer ears, acoustic signals transferred as mechanical vibrations to the user's inner ears through the bone structure of the user's head and/or through parts of the middle ear as well as electric signals transferred directly or indirectly to the cochlear nerve of the user.
The hearing aid may be configured to be worn in any known way, e.g. as a unit arranged behind the ear with a tube leading radiated acoustic signals into the ear canal or with a loudspeaker arranged close to or in the ear canal, as a unit entirely or partly arranged in the pinna and/or in the ear canal, as a unit attached to a fixture implanted into the skull bone, as an entirely or partly implanted unit, etc. The hearing aid may comprise a single unit or several units communicating electronically with each other.
More generally, a hearing aid comprises an input transducer for receiving an acoustic signal from a user's surroundings and providing a corresponding input audio signal and/or a receiver for electronically (i.e. wired or wirelessly) receiving an input audio signal, a (typically configurable) signal processing circuit for processing the input audio signal and an output means for providing an audible signal to the user in dependence on the processed audio signal. In some hearing aids, an amplifier may constitute the signal processing circuit. The signal processing circuit typically comprises one or more (integrated or separate) memory elements for executing programs and/or for storing parameters used (or potentially used) in the processing and/or for storing information relevant for the function of the hearing aid and/or for storing information (e.g. processed information, e.g. provided by the signal processing circuit), e.g. for use in connection with an interface to a user and/or an interface to a programming device. In some hearing aids, the output means may comprise an output transducer, such as e.g. a loudspeaker for providing an air-borne acoustic signal or a vibrator for providing a structure-borne or liquid-borne acoustic signal. In some hearing aids, the output means may comprise one or more output electrodes for providing electric signals.
In some hearing aids, the vibrator may be adapted to provide a structure-borne acoustic signal transcutaneously or percutaneously to the skull bone. In some hearing aids, the vibrator may be implanted in the middle ear and/or in the inner ear. In some hearing aids, the vibrator may be adapted to provide a structure-borne acoustic signal to a middle-ear bone and/or to the cochlea. In some hearing aids, the vibrator may be adapted to provide a liquid-borne acoustic signal to the cochlear liquid, e.g. through the oval window. In some hearing aids, the output electrodes may be implanted in the cochlea or on the inside of the skull bone and may be adapted to provide the electric signals to the hair cells of the cochlea, to one or more hearing nerves, to the auditory cortex and/or to other parts of the cerebral cortex.
A ‘hearing system’ refers to a system comprising one or two hearing aids, and a ‘binaural hearing system’ refers to a system comprising two hearing aids and being adapted to cooperatively provide audible signals to both of the user's ears. Hearing systems or binaural hearing systems may further comprise one or more ‘auxiliary devices’, which communicate with the hearing aid(s) and affect and/or benefit from the function of the hearing aid(s). Auxiliary devices may be e.g. remote controls, audio gateway devices, mobile phones (e.g. SmartPhones), public-address systems, car audio systems or music players. Hearing aids, hearing systems or binaural hearing systems may e.g. be used for compensating for a hearing-impaired person's loss of hearing capability, augmenting or protecting a normal-hearing person's hearing capability and/or conveying electronic audio signals to a person.
Embodiments of the disclosure may e.g. be useful in applications such as hearing aids, headsets, ear phones, active ear protection systems or combinations thereof.
The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effect will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practised without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
The electronic hardware may include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
The present application relates to the field of hearing aids.
An adaptive beampattern (Y(k)), for a given frequency band k, is obtained by linearly combining two beam formers C1(k) and C2(k). C1(k) and C2(k) are different (possibly fixed) linear combinations of the microphone signals X1(k) and X2(k):
$C_1(k) = w_{11}(k)\,X_1(k) + w_{12}(k)\,X_2(k)$
$C_2(k) = w_{21}(k)\,X_1(k) + w_{22}(k)\,X_2(k)$
The beampatterns could e.g. be the combination of an omnidirectional delay-and-sum beam former C1(k) and a delay-and-subtract beam former C2(k) with its null direction pointing towards the target direction (target cancelling beam former). The adaptive beampattern is then given by
$Y(k) = C_1(k) - \beta(k)\,C_2(k)$.
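The combination of the two beam formers can be illustrated with a minimal sketch for a single frequency band. The microphone spacing, the speed of sound, the plane-wave model and the particular weights returned by `example_weights` are assumptions made for illustration only; they are not taken from the disclosure. The sketch shows that, when C2 has its null towards the target, the target component of Y is unaffected by the value of β.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MIC_DISTANCE = 0.01     # m, front-to-rear microphone spacing (assumed)


def plane_wave(theta_deg, freq_hz):
    """Band signals (X1, X2) for a unit plane wave; 0 degrees = front (assumed target)."""
    tau = MIC_DISTANCE * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
    return 1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * tau)


def example_weights(freq_hz):
    """Illustrative weights: C1 sums the time-aligned signals, C2 cancels the front."""
    align = cmath.exp(2j * math.pi * freq_hz * MIC_DISTANCE / SPEED_OF_SOUND)
    return (0.5, 0.5 * align), (1.0, -align)


def fixed_beamformers(x1, x2, weights):
    """C1 = w11*X1 + w12*X2 and C2 = w21*X1 + w22*X2."""
    (w11, w12), (w21, w22) = weights
    return w11 * x1 + w12 * x2, w21 * x1 + w22 * x2


def adaptive_output(c1, c2, beta):
    """Y = C1 - beta*C2."""
    return c1 - beta * c2
```

With these assumed weights, a wave from the front gives C2 close to zero, so Y equals C1 for any β, whereas a wave from the rear gives a non-zero C2 that β can scale to cancel the rear component of C1.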
The beam former is adapted to work optimally in situations where the microphone signals consist of a point-noise target sound source in the presence of additive noise sources. Given this situation, the scaling factor β(k) is adapted to minimize the noise under the constraint that the sound impinging from the target direction is unchanged. For each frequency band k, the adaptation factor β(k) can be found in different ways. The solution may be found in closed form as
$\beta(k) = \dfrac{\langle C_2^*(k)\,C_1(k) \rangle}{\langle |C_2(k)|^2 \rangle}$, (1)
where * denotes the complex conjugation and ⟨·⟩ denotes the statistical expectation operator, which may be approximated in an implementation as a time average. As an alternative, the adaptation factor may be updated by an LMS or NLMS equation.
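As a hedged illustration of the closed-form solution, the sketch below approximates the expectation operator by a plain average over a block of frames. The function name `estimate_beta`, the block averaging and the default value of the regularizing constant `c` are assumptions for illustration; the summary only states that c is a constant.

```python
def estimate_beta(c1_frames, c2_frames, c=1e-12):
    """Closed-form beta = <C2* C1> / (<|C2|^2> + c) for one frequency band.

    The expectation <.> is approximated by an average over the supplied frames.
    """
    if len(c1_frames) != len(c2_frames) or not c1_frames:
        raise ValueError("need two non-empty sequences of equal length")
    count = len(c1_frames)
    numerator = sum(b.conjugate() * a for a, b in zip(c1_frames, c2_frames)) / count
    denominator = sum(abs(b) ** 2 for b in c2_frames) / count
    return numerator / (denominator + c)
```

If the noise in C1 is an exact scaled copy of C2 (a single noise source), the estimate recovers that scale factor, and Y = C1 − β·C2 then removes the noise completely.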
In the following we omit the frequency channel index k. In (1), the adaptation factor β is estimated by averaging across the input data. A simple way to average across data is by low-pass filtering the data as shown in
Such a low-pass filter LP may e.g. be implemented by a first order IIR filter as shown in
This is illustrated in
We propose different ways to overcome this problem. A simple extension is to enable different attack and release coefficients in the low-pass filter. Such a low-pass filter is shown in
The advantage of smoothing the estimate of β is that the estimate is less sensitive to sudden drops in input level. Consequently, we can apply a shorter time constant to the low-pass filters used in the numerator and the denominator of (1). We can thereby adapt faster when the level suddenly decreases. By post-smoothing β, we cope with the increased estimation variance.
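The combination of fast smoothing of the numerator and denominator with post-smoothing of β can be sketched as follows. The class name `SmoothedBeta`, the coefficient values and the convention that a larger coefficient means faster tracking are assumptions for illustration, not values from the disclosure. The helper `lowpass_attack_release` shows one possible first-order IIR low-pass with separate attack and release coefficients for a real-valued input.

```python
def lowpass(prev, new, coef):
    """First-order IIR low-pass; coef in (0, 1], larger means faster tracking."""
    return prev + coef * (new - prev)


def lowpass_attack_release(prev, new, attack, release):
    """First-order IIR low-pass for a real-valued input with separate
    coefficients for rising (attack) and falling (release) input."""
    coef = attack if new > prev else release
    return prev + coef * (new - prev)


class SmoothedBeta:
    """Fast smoothing of <C2* C1> and <|C2|^2>, followed by post-smoothing of beta."""

    def __init__(self, fast_coef=0.2, post_coef=0.05, c=1e-12):
        self.fast_coef = fast_coef  # short time constant (assumed value)
        self.post_coef = post_coef  # longer time constant for beta (assumed value)
        self.c = c                  # small regularizing constant (assumed value)
        self.num = 0j
        self.den = 0.0
        self.beta = 0j

    def update(self, c1, c2):
        self.num = lowpass(self.num, c2.conjugate() * c1, self.fast_coef)
        self.den = lowpass(self.den, abs(c2) ** 2, self.fast_coef)
        raw_beta = self.num / (self.den + self.c)
        self.beta = lowpass(self.beta, raw_beta, self.post_coef)
        return self.beta
```

Which quantities receive attack/release smoothing, and with which coefficients, is a design choice that this sketch leaves open.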
Another option is to apply an adaptive smoothing coefficient that changes if a sudden input level change is detected. Embodiments of such low-pass filters are shown in
The resulting smoothing estimate from the low-pass filter shown in
The hearing aid (HD) exemplified in
The hearing aid (HD) comprises a directional microphone system (beam former filtering unit (BFU)) adapted to enhance a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing aid device. In an embodiment, the directional system is adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal (e.g. a target part and/or a noise part) originates. In an embodiment, the beam former filtering unit is adapted to receive inputs from a user interface (e.g. a remote control or a smartphone) regarding the present target direction. The memory unit (MEM) may e.g. comprise predefined (or adaptively determined) complex, frequency dependent constants (Wij) defining predefined (or adaptively determined) ‘fixed’ beam patterns (e.g. omni-directional, target cancelling, etc.), together defining the beamformed signal YBF (cf. e.g.
The hearing aid of
The hearing aid (HD) according to the present disclosure may comprise a user interface UI, e.g. as shown in
The auxiliary device and the hearing aid are adapted to allow communication of data representative of the currently selected direction (if deviating from a predetermined direction (already stored in the hearing aid)) to the hearing aid via a, e.g. wireless, communication link (cf. dashed arrow WL2 in
The method is configured to operate a hearing aid adapted for being located in an operational position at or in or behind an ear or fully or partially implanted in the head of a user.
The Method Comprises
A method of Adaptive Covariance Matrix Smoothing for Accurate Target Estimation and Tracking.
In a further aspect of the present disclosure, a method of adaptively smoothing covariance matrices is outlined in the following. A particular use of the scheme is for (adaptively) estimating a direction of arrival of sound from a target sound source to a person (e.g. a user of a hearing aid, e.g. a hearing aid according to the present disclosure).
The method is exemplified as an alternative scheme for smoothing of the adaptation parameter β(k) according to the present disclosure (cf.
Signal Model:
We consider the following signal model of the signal x impinging on the ith microphone of a microphone array consisting of M microphones:
$x_i(n) = s_i(n) + v_i(n)$, (1)
where s is the target signal, v is the noise signal, and n denotes the time sample index. The corresponding vector notation is
$x(n) = s(n) + v(n)$, (2)
where $x(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$. In the following, we consider the signal model in the time frequency domain. The corresponding model is thus given by
$X(k,m) = S(k,m) + V(k,m)$, (3)
where k denotes the frequency channel index and m denotes the time frame index. Likewise, $X(k,m) = [X_1(k,m), X_2(k,m), \ldots, X_M(k,m)]^T$. The signal at the ith microphone, xi, is a linear mixture of the target signal si and the noise vi, where vi is the sum of all noise contributions from different directions as well as microphone noise. The target signal at the reference microphone, sref, is given by the target signal s convolved by the acoustic transfer function h between the target location and the location of the reference microphone. The target signal at the other microphones is thus given by the target signal at the reference microphone convolved by the relative transfer function $d = [1, d_2, \ldots, d_M]^T$ between the microphones, i.e. $s_i = s * h * d_i$. The relative transfer function d depends on the location of the target signal. As this is typically the direction of interest, we term d the look vector. At each frequency channel, we thus define a target power spectral density $\sigma_s^2(k,m)$ at the reference microphone, i.e.
$\sigma_s^2(k,m) = \langle |S(k,m)H(k,m)|^2 \rangle = \langle |S_{\mathrm{ref}}(k,m)|^2 \rangle$, (4)
where ⟨·⟩ denotes the expected value. Likewise, the noise spectral power density at the reference microphone is given by
$\sigma_v^2(k,m) = \langle |V_{\mathrm{ref}}(k,m)|^2 \rangle$, (5)
The inter-microphone cross-spectral covariance matrix at the kth frequency channel for the clean signal s is then given by
$C_s(k,m) = \sigma_s^2(k,m)\,d(k,m)\,d^H(k,m)$, (6)
where H denotes Hermitian transposition. We notice the M×M matrix Cs(k,m) is a rank 1 matrix, as each column of Cs(k,m) is proportional to d(k,m). Similarly, the inter-microphone cross-power spectral density matrix of the noise signal impinging on the microphone array is given by,
$C_v(k,m) = \sigma_v^2(k,m)\,\Gamma(k,m_0), \quad m > m_0$, (7)
where Γ(k, m0) is the M×M noise covariance matrix of the noise, measured some time in the past (frame index m0). Since all operations are identical for each frequency channel index, we skip the frequency index k for notational convenience wherever possible in the following. Likewise, we skip the time frame index m, when possible. The inter-microphone cross-power spectral density matrix of the noisy signal is then given by
$C = C_s + C_v$ (8)
$C = \sigma_s^2\,d\,d^H + \sigma_v^2\,\Gamma$ (9)
where the target and noise signals are assumed to be uncorrelated. The fact that the first term describing the target signal, Cs, is a rank-one matrix implies that the beneficial part (i.e., the target part) of the speech signal is assumed to be coherent/directional. Parts of the speech signal, which are not beneficial, (e.g., signal components due to late-reverberation, which are typically incoherent, i.e., arrive from many simultaneous directions) are captured by the second term.
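The structure of the noisy covariance model can be checked numerically for the two-microphone case. In the sketch below, the look vector, the noise structure matrix and the power values are hypothetical numbers chosen for illustration. The target-only matrix has a zero determinant, in line with the rank-one property, while adding the noise term makes the matrix full rank.

```python
import cmath


def outer_product(d):
    """Return d d^H as a nested list."""
    return [[di * dj.conjugate() for dj in d] for di in d]


def noisy_covariance(sigma_s2, d, sigma_v2, gamma):
    """C = sigma_s^2 d d^H + sigma_v^2 Gamma (target and noise uncorrelated)."""
    dd = outer_product(d)
    size = len(d)
    return [[sigma_s2 * dd[i][j] + sigma_v2 * gamma[i][j] for j in range(size)]
            for i in range(size)]


def det2(a):
    """Determinant of a 2x2 matrix given as a nested list."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


look_vector = [1.0 + 0j, 0.8 * cmath.exp(-0.5j)]  # hypothetical d = [1, d2]
gamma = [[1.0, 0.0], [0.0, 1.0]]                  # hypothetical noise structure
target_only = noisy_covariance(2.0, look_vector, 0.0, gamma)
noisy = noisy_covariance(2.0, look_vector, 0.5, gamma)
```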
Covariance Matrix Estimation
A look vector estimate can be found efficiently in the case of only two microphones based on estimates of the noisy input covariance matrix and the noise only covariance matrix. We select the first microphone as our reference microphone. Our noisy covariance matrix estimate is given by
where * denotes complex conjugate. Each element of our noisy covariance matrix is estimated by low-pass filtering the outer product of the input signal, XXH. We estimate each element by a first order IIR low-pass filter with the smoothing factor α∈[0;1], i.e.
We thus need to low-pass filter four different values (two real and one complex value), i.e. Ĉx11(m), Re{Ĉx12(m)}, Im{Ĉx12(m)}, and Ĉx22(m). We don't need Ĉx21(m) since Ĉx21(m)=Ĉx12*(m). It is assumed that the target location does not change dramatically in speech pauses, i.e. it is beneficial to keep target information from previous speech periods using a slow time constant giving accurate estimates. This means that Ĉx is not always updated with the same time constant and does not converge to Ĉv in speech pauses, which is normally the case. In long periods with speech absence, the estimate will (very slowly) converge towards Cno using a smoothing factor close to one. The covariance matrix Cno could represent a situation where the target DOA is zero degrees (front direction), such that the system prioritizes the front direction when speech is absent. Cno may e.g. be selected as an initial value of Cx.
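A minimal sketch of such a recursive two-microphone covariance estimate is given here. The update form Ĉ(m) = Ĉ(m−1) + α·(new − Ĉ(m−1)), in which a larger α tracks faster, is an assumption chosen to match the description of α in the adaptive smoothing discussion; under the complementary convention a slow update corresponds to a factor close to one. The class name and the parameters `alpha_no` and `c_no` are hypothetical names for the slow pause-time rate and the matrix Cno.

```python
class NoisyCovarianceEstimate:
    """Recursive estimate of C_x11, C_x12 and C_x22 for one frequency band."""

    def __init__(self, alpha, alpha_no, c_no):
        self.alpha = alpha        # update rate while the target is present
        self.alpha_no = alpha_no  # much slower rate used in speech pauses (assumed)
        self.c_no = c_no          # (C_no11, C_no12, C_no22), e.g. front direction
        # C_no is used as the initial value of the estimate.
        self.c11, self.c12, self.c22 = c_no

    def update(self, x1, x2, target_present):
        if target_present:
            rate = self.alpha
            t11, t12, t22 = abs(x1) ** 2, x1 * x2.conjugate(), abs(x2) ** 2
        else:
            # In speech pauses the estimate drifts slowly towards C_no.
            rate = self.alpha_no
            t11, t12, t22 = self.c_no
        self.c11 += rate * (t11 - self.c11)
        self.c12 += rate * (t12 - self.c12)
        self.c22 += rate * (t22 - self.c22)

    @property
    def c21(self):
        """C_x21 is not stored, since it is the complex conjugate of C_x12."""
        return self.c12.conjugate()
```

Only two real values and one complex value are stored per band, which corresponds to the four low-pass filtered values named in the text.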
In a similar way, we estimate the elements in the noise covariance matrix, in that case
The noise covariance matrix is updated when only noise is present. Whether the target is present or not may be determined by a modulation-based voice activity detector. It should be noted that “Target present” (cf.
Adaptive Smoothing
The performance of look vector estimation is highly dependent on the choice of smoothing factor α, which controls the update rate of Ĉx(m). When α is close to zero, an accurate estimate can be obtained in spatially stationary situations. When α is close to 1, estimators will be able to track fast spatial changes, for example when tracking two talkers in a dialogue situation. Ideally, we would like to obtain accurate estimates and fast tracking capabilities which is a contradiction in terms of the smoothing factor and there is a need to find a good balance. In order to simultaneously obtain accurate estimates in spatially stationary situations and fast tracking capabilities, an adaptive smoothing scheme is proposed.
In order to control a variable smoothing factor, the normalized covariance
$\rho(m) = C_{x11}^{-1}\,C_{x12}$, (13)
can be observed as an indicator for changes in the target DOA (where $C_{x11}^{-1}$ and $C_{x12}$ are complex numbers).
In a practical implementation, e.g. a portable device, such as a hearing aid, we prefer to avoid the division and reduce the number of computations, so we propose the following log normalized covariance measure
$\rho(m) = \sum_k \{\log(\max\{0, \mathrm{Im}\{\hat{C}_{x12}\} + 1\}) - \log(\hat{C}_{x11})\}$, (14)
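The log normalized covariance measure can be sketched as a sum over frequency channels. The function name and the small positive `floor` are assumptions: the floor replaces the lower limit 0 of the max operation so that the logarithm stays finite in floating-point arithmetic.

```python
import math


def log_normalized_covariance(c11_bands, c12_bands, floor=1e-12):
    """Sum over bands of log(max(floor, Im{C_x12} + 1)) - log(C_x11).

    c11_bands: real auto-power estimates, one per frequency channel.
    c12_bands: complex cross-power estimates, one per frequency channel.
    """
    total = 0.0
    for c11, c12 in zip(c11_bands, c12_bands):
        total += math.log(max(floor, c12.imag + 1.0)) - math.log(max(floor, c11))
    return total
```

Because the measure is a difference of logarithms, no division is needed, and an implementation may replace `math.log` by a cheaper approximation.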
Two instances of the (log) normalized covariance measure are calculated, a fast instance ρ̃(m) and an instance with a variable update rate.
where α̃ is a fast time constant smoothing factor, and the corresponding fast covariance estimate
according to
$\tilde{\rho}(m) = \sum_k \{\log(\max\{0, \mathrm{Im}\{\tilde{C}_{x12}\} + 1\}) - \log(\tilde{C}_{x11})\}$, (17)
Similar expressions for the instance with variable update rate
where ã is a fast time constant smoothing factor, and the corresponding fast covariance estimate
according to
ρ(m)=Σk{log(max{0,Im{
The smoothing factor
where α0 is a slow time constant smoothing factor, i.e. α0<
The pre-smoothing unit (PreS) makes an initial smoothing over time (illustrated by ABS-squared units |·|2 for providing magnitude squared of the input signals Xi(k,m) and subsequent low-pass filtering provided by low-pass filters LP) to provide pre-smoothed covariance estimates Cx11, Cx12 and Cx22, as illustrated in
The Target Present input is e.g. a control input from a voice activity detector. In an embodiment, the Target Present input (cf. signal TP in
The Fast Rel Coef, the Fast Atk Coef, the Slow Rel Coef, and the Slow Atk Coef are fixed (e.g. determined in advance of the use of the procedure) fast and slow attack and release times, respectively. Generally, fast attack and release times are shorter than slow attack and release times. In an embodiment, the time constants (cf. signals TC in
It should be noted that the goal of the computation of y=log(max(Im{x12}+1,0))−log(x11) (cf. two instances in the right part of
The adaptive low-pass filters used in
The above scheme may e.g. be relevant for adaptively estimating a direction of arrival of alternatingly active sound sources at different locations (e.g. at different angles in a horizontal plane relative to a user wearing one or more hearing aids according to the present disclosure).
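One way to switch between a slow and a fast smoothing factor based on the two instances of the normalized covariance measure is sketched here. The decision rule (comparing the absolute difference of the two measures with a threshold) and the names `select_smoothing_factor` and `threshold` are assumptions made for illustration; the text only states that a fast instance and an instance with variable update rate are calculated and that the slow smoothing factor is smaller than the fast one.

```python
def select_smoothing_factor(rho_fast, rho_variable, alpha_slow, alpha_fast, threshold):
    """Pick the smoothing factor for the variable-rate covariance estimate.

    A large deviation between the fast measure and the variable-rate measure is
    taken as a sign that the target direction has changed (assumed rule), in
    which case the fast factor is used; otherwise the slow factor is kept.
    """
    if alpha_slow >= alpha_fast:
        raise ValueError("alpha_slow must be smaller than alpha_fast")
    changed = abs(rho_fast - rho_variable) > threshold
    return alpha_fast if changed else alpha_slow
```

In a dialogue between two talkers at different angles, such a rule would keep the accurate slow estimate while one talker is active and switch to fast tracking when the other talker takes over.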
It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but intervening elements may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
Accordingly, the scope should be judged in terms of the claims that follow.
Number | Date | Country | Kind |
---|---|---|---|
16172042 | May 2016 | EP | regional |
This application is a Continuation of co-pending application Ser. No. 15/608,294, filed on May 30, 2017, which claims priority under 35 U.S.C. § 119(a) to Application No. 16172042.0, filed in the European Patent office on May 30, 2016, all of which are hereby expressly incorporated by reference into the present application.
Number | Date | Country |
---|---|---|
20190158965 A1 | May 2019 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15608294 | May 2017 | US |
Child | 16256742 | | US |