The present application deals with a hearing device, such as a hearing aid, comprising a dynamic compressive amplification system for adapting a dynamic range of levels of an input sound signal, e.g. adapted to a reduced dynamic range of a person, e.g. a hearing impaired person, wearing the hearing device. Embodiments of the present disclosure address the problem of undesired amplification of noise produced by applying (traditional) compressive amplification to noisy signals.
By restoring audibility for soft signals while maintaining comfort for louder signals, compressive amplification (CA) has been designed to overcome degraded speech perception caused by sensorineural hearing loss (hearing loss compensation, HLC).
Fitting rationales, either proprietary or generic (e.g. NAL-NL2 of the National Acoustic Laboratories, Australia, cf. e.g. [Keidser et al.; 2011]), provide target gain and compression ratios for speech in quiet. The only exception is DSLm[i/o] 5.0 (Desired Sensation Level (DSL) version 5.0 of Western University, Ontario, Canada, cf. e.g. [Scollie et al.; 2005]), for which Western University has generated targets for speech in noise; however, to date these targets have not been widely adopted by the hearing aid industry.
In summary, classic CA schemes, used in today's hearing aids (HA), are designed and fitted for speech in quiet. They apply gain and compression independently of the amount of noise present in the environment, which typically leads to two main issues:
1. SNR Degradation in Noisy Speech Environment
2. Undesired Noise Amplification in Pure Noise Environment
The following sub-sections describe these two issues as well as the traditional countermeasure usually implemented in current HAs.
Issue 1: SNR Degradation in Noisy Speech Environment
In a noisy speech condition (positive, but non-infinite long-term signal-to-noise ratio (SNR)), classic CA causes a long-term SNR degradation proportional to the static compression ratio, the time domain resolution (i.e. the level estimation time constants) and the frequency resolution (i.e. the number of level estimation sub-bands). [Naylor & Johannesson; 2009] have shown that the long-term SNR at the output of a compression system may be higher or lower than the long-term SNR at the input. This depends on interactions between the actual long-term input SNR within the environment, the modulation characteristics of the signal and the noise, and the characteristics of the compression of the system (e.g. level estimation time constants, number of level estimation channels and compression ratio). SNR requirements for individuals with a hearing loss may vary greatly depending upon a number of factors (see [Naylor; 2016] for a discussion of this and other issues).
It should be remembered that using a noise reduction (NR) system to improve the long-term SNR, will not prevent the long-term SNR degradation caused by classic CA:
Issue 2: Undesired Noise Amplification in Pure Noise Environment
In more or less noisy environments where speech is absent (SNR close to minus infinity), classic CA applies gain as if the input signal was clean speech at the same level,
Traditional Countermeasure: Environment Specific CA Configuration:
The above described two issues occur in particular sound environments (soundscapes). Hearing loss compensation in the environments speech in noise, quiet/soft noise or loud noise, requires other CA configuration approaches than the environment speech in quiet. Traditionally, the solution proposed to the above two issues has been based on environmental classification: The measured soundscape is classified as a pre-defined type of environment, typically:
For each environment, the characteristics of the compression scheme might be corrected, applying some offsets on the settings (see below). The classification might either use:
Alleviating Issue 1 with Environment Specific CA Configuration
In classic CA schemes, the long-term SNR degradation (issue 1) is often limited by applying the following steps
Linearization can typically be accomplished by:
However, such a solution has severe limitations:
More generally, limiting the long-term SNR degradation by acting directly on the configuration of the compression ratio, the level estimation time constants and/or the number of level estimation channels reduces the degrees of freedom required in the optimization of speech audibility restoration, i.e. the hearing loss compensation (HLC), which is the ultimate goal of CA.
It should be remembered (as mentioned above) that using a noise reduction (NR) system to improve the long-term SNR, will not prevent the long-term SNR degradation caused by classic CA.
Alleviating Issue 2 with Environment Specific CA Configuration
In classic CA schemes, the undesired amplification in pure noise environment (issue 2) is often limited by applying the following steps
Such negative gain offsets (attenuation offsets) can typically be applied to the CA characteristic curves defined during the fitting of the HA.
However, such a solution might have a practical limitation: The environment classification engine is designed to solve issues 1 and 2. Because of that, it is trained to discriminate at least 3 environments: noise, speech in noise, speech in quiet. Assuming issue 1 is solved by another dedicated engine, the classification engine can be made more robust if it only has to behave like a voice activity detector (VAD), i.e. if it only has to discriminate between the environments speech present and speech absent.
A Hearing Device:
It is an object of the present disclosure to provide a dynamic system that decreases the negative impact of state of the art compressive amplification (CA) in noisy environments.
In an aspect of the present application, a hearing device, e.g. a hearing aid, is provided. The hearing device comprises
The hearing device further comprises,
Thereby an improved compression system for a hearing aid may be provided.
In the following the dynamic compressive amplification system according to the present disclosure is termed the ‘SNR driven compressive amplification system’ and abbreviated SNRCA.
The SNR driven compressive amplification system (SNRCA) is a compressive amplification (CA) scheme that aims to:
Compression Relaxing
The SNR degradation caused by CA is minimized on average. The CA is only linearized when the SNR of the input signal is locally low (see below) causing minimal reduction of the HLC performance, when:
The linearization is realized using estimated level post-processing. This functionality is termed the “Compression Relaxing” feature of SNRCA.
Gain Relaxing
This feature applies a (configured) reduction of the prescribed gain for very low SNR (i.e. noise only) environments. The reduction is realized using prescribed gain post-processing. This functionality is termed the “Gain Relaxing” feature of SNRCA.
In the present context, the target signal is taken to be a signal intended to be listened to by the user. In an embodiment, the target signal is a speech signal. In the present context, the noise signal is taken to comprise signals from one or more signal sources not intended to be listened to by the user. In an embodiment, the one or more signal sources not intended to be listened to by the user comprises voice and/or non-voice signal sources, e.g. artificially or naturally generated sound sources, e.g. traffic noise, wind noise, babble (an unintelligible mixture of different voices), etc.
The hearing device comprises a forward path comprising the electric signal path from the input unit to the output unit including the forward gain unit (gain application unit) and possible further signal processing units.
In an embodiment, the hearing device, e.g. the control unit, is adapted to provide that classification of the electric input signal is indicative of a current acoustic environment of the user. In an embodiment, the control unit is configured to classify the acoustic environment in a number of different classes, said number of different classes e.g. comprising one or more of speech in noise, speech in quiet, noise, and clean speech. In an embodiment, the control unit is configured to classify noise as loud noise or soft noise.
In an embodiment, the control unit is configured to provide the classification according to (or based on) a current mixture of target signal and noise signal components in the electric input signal or a processed version thereof.
In an embodiment, the hearing device comprises a voice activity detector for identifying time segments of an electric input signal comprising speech and time segments comprising no speech (or time segments comprising speech or no speech with a certain probability), and providing a voice activity signal indicative thereof. In an embodiment, the voice activity detector is configured to provide the voice activity signal in a number of frequency sub-bands. In an embodiment, the voice activity detector is configured to provide that the voice activity signal is indicative of a speech absence likelihood.
In an embodiment, the control unit is configured to provide the classification in dependence of a current target signal to noise signal ratio. In the present context, a signal to noise ratio (SNR), at a given instance in time, is taken to include a ratio of an estimated target signal component and an estimated noise signal component of an electric input signal representing audio, e.g. sound from the environment of a user wearing the hearing device. In an embodiment, the signal to noise ratio is based on a ratio of estimated levels or power or energy of said target and noise signal components. In an embodiment, the signal to noise ratio is an a priori signal to noise ratio based on a ratio of a level or power or energy of a noisy input signal to an estimated level or power or energy of the noise signal component. In an embodiment, the signal to noise ratio is based on broadband signal component estimates (e.g. in the time domain, SNR=SNR(t), where t is time). In an embodiment, the signal to noise ratio is based on sub-band signal component estimates (e.g. in the time-frequency domain, SNR=SNR (t,f), where t is time and f is frequency).
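By way of a non-limiting illustration, the SNR definitions above may be sketched as follows, assuming that power estimates of the target and noise signal components are already available (how they are obtained is not shown). The names `snr_db`, `subband_snr_db` and the floor `EPS` are hypothetical and introduced only for this sketch.

```python
import math

EPS = 1e-12  # assumed floor, avoids division by zero and log of zero


def snr_db(target_power: float, noise_power: float) -> float:
    """Broadband SNR in dB from estimated target and noise powers, SNR(t)."""
    return 10.0 * math.log10(max(target_power, EPS) / max(noise_power, EPS))


def subband_snr_db(target_powers, noise_powers):
    """Sub-band SNR in dB, one value per frequency sub-band, SNR(t, f)."""
    if len(target_powers) != len(noise_powers):
        raise ValueError("one noise estimate is needed per sub-band")
    return [snr_db(t, n) for t, n in zip(target_powers, noise_powers)]
```

The same `snr_db` helper may be applied to the ratio of a noisy input power to an estimated noise power, corresponding to the variant of the signal to noise ratio based on the noisy input signal.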
In an embodiment, the hearing device is adapted to provide that the electric input signal can be received or provided as a number of frequency sub-band signals. In an embodiment, the hearing device (e.g. the input unit) comprises an analysis filter bank for providing said electric input signal as a number of frequency sub-band signals. In an embodiment, the hearing device (e.g. the output unit) comprises a synthesis filter bank for providing an electric output signal in the time domain from a number of frequency sub-band signals.
In an embodiment, the hearing device comprises a memory wherein said hearing data of the user or data or algorithms derived therefrom are stored. In an embodiment, the user's hearing data comprises data characterizing a user's hearing impairment (e.g. a deviation from a normal hearing ability). In an embodiment, the hearing data comprises the user's frequency dependent hearing threshold levels. In an embodiment, the hearing data comprises the user's frequency dependent uncomfortable levels. In an embodiment, the hearing data includes a representation of the user's frequency dependent dynamic range of levels between a hearing threshold and an uncomfortable level.
In an embodiment, the level compression unit is configured to determine said compressive amplification gain according to a fitting algorithm. In an embodiment, the fitting algorithm is a standardized fitting algorithm. In an embodiment, the fitting algorithm is based on a generic (e.g. NAL-NL1 or NAL-NL2 or DSLm[i/o] 5.0) or a predefined proprietary fitting algorithm. In an embodiment, the hearing data of the user or data or algorithms derived therefrom comprises user specific level and frequency dependent gains. Based thereon, the level compression unit is configured to provide an appropriate (frequency and level dependent) gain for a given (modified) level of the electric input signal (at a given time).
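By way of a non-limiting illustration, a level dependent gain for a single frequency band may be sketched as a static compression characteristic with one knee point. The default values (gain at the knee, knee level, compression ratio) and the name `compressive_gain_db` are assumptions for this sketch only; an actual fitting algorithm prescribes user specific, frequency dependent values.

```python
def compressive_gain_db(level_db: float,
                        gain_at_knee_db: float = 30.0,
                        knee_db: float = 45.0,
                        ratio: float = 2.0) -> float:
    """Gain in dB for a given (possibly modified) input level in dB.

    Below the knee the gain is constant (linear amplification). Above the
    knee the output level grows by 1/ratio dB per dB of input level.
    All default values are hypothetical.
    """
    if ratio < 1.0:
        raise ValueError("a compression ratio below 1 would be expansion")
    excess_db = max(level_db - knee_db, 0.0)
    return gain_at_knee_db - excess_db * (1.0 - 1.0 / ratio)
```

With these assumed defaults, an input level 20 dB above the knee receives 10 dB less gain than at the knee, so that a 20 dB increase of the input level yields a 10 dB increase of the output level (compression ratio 2).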
In an embodiment, the level detector unit is configured to provide an estimate of a level of an envelope of the electric input signal. In an embodiment, the classification of the electric input signal comprises an indication of a current or average level of an envelope of the electric input signal. In an embodiment, the level detector unit is configured to determine a top tracker and a bottom tracker (envelope) from which a noise floor and a modulation index can be derived. A level detector which can be used as or form part of the level detector unit is e.g. described in WO2003081947A1.
In an embodiment, the hearing device comprises first and second level estimators configured to provide first and second estimates of the level of the electric input signal, respectively, the first and second estimates of the level being determined using first and second time constants, respectively, wherein the first time constant is smaller than the second time constant. In other words, the first and second level estimators correspond to fast and slow level estimators, respectively, providing fast and slow level estimates, respectively. In an embodiment, the first level estimator is configured to track the instantaneous level of the envelope of the electric input signal (e.g. comprising speech) (or a processed version thereof). In an embodiment, the second level estimator is configured to track an average level of the envelope of the electric input signal (or a processed version thereof). In an embodiment, the first and/or the second level estimates is/are provided in frequency sub-bands.
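By way of a non-limiting illustration, fast and slow level estimators may be sketched as first-order recursive smoothers of the signal power that differ only in their time constant. The smoothing structure, the class name `LevelEstimator` and the example time constants are assumptions for this sketch; the disclosure only requires that the first time constant is smaller than the second.

```python
import math


class LevelEstimator:
    """First-order recursive power smoother returning a level in dB."""

    def __init__(self, time_constant_s: float, sample_rate_hz: float):
        if time_constant_s <= 0.0 or sample_rate_hz <= 0.0:
            raise ValueError("time constant and sample rate must be positive")
        # Smoothing coefficient: closer to 1 means slower tracking.
        self.alpha = math.exp(-1.0 / (time_constant_s * sample_rate_hz))
        self.power = 0.0

    def update(self, sample: float) -> float:
        """Feed one sample and return the current level estimate in dB."""
        self.power = self.alpha * self.power + (1.0 - self.alpha) * sample * sample
        return 10.0 * math.log10(max(self.power, 1e-12))


# Hypothetical time constants: 5 ms (fast) and 500 ms (slow) at fs = 20 kHz.
fast_estimator = LevelEstimator(0.005, 20000.0)
slow_estimator = LevelEstimator(0.5, 20000.0)
```

After an onset, the fast estimator approaches the instantaneous envelope level within a few milliseconds, whereas the slow estimator lags behind and thereby reflects an average level.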
In an embodiment, the control unit is configured to determine first and second signal to noise ratios of the electric input signal or a processed version thereof, wherein said first and second signal-to-noise ratios are termed local SNR and global SNR, respectively, and wherein the local SNR denotes a relatively short-time (τL) and sub-band specific (ΔfL) signal-to-noise ratio and wherein the global SNR denotes a relatively long-time (τG) and broad-band (ΔfG) signal to noise ratio, and wherein the time constant τG and frequency range ΔfG involved in determining the global SNR are larger than corresponding time constant τL and frequency range ΔfL involved in determining the local SNR. In an embodiment, τL is much smaller than τG (τL<<τG). In an embodiment, ΔfL is much smaller than ΔfG (ΔfL<<ΔfG).
In an embodiment, the control unit is configured to determine said first and/or said second control signals based on said first and/or second signal to noise ratios of said electric input signal or a processed version thereof. In an embodiment, the control unit is configured to determine said first and/or said second signal to noise ratios using said first and second level estimates, respectively. The first, ‘fast’ signal-to-noise ratio is termed the local SNR. The second, ‘slow’ signal-to-noise ratio is termed the global SNR. In an embodiment, the first, ‘fast’, local, signal-to-noise ratio is frequency sub-band specific. In an embodiment, the second, ‘slow’, global, signal-to-noise ratio is based on a broadband signal.
In an embodiment, the control unit is configured to determine the first control signal based on said first and second signal to noise ratios. In an embodiment, the control unit is configured to determine the first control signal based on a comparison of the first (local) and second (global) signal to noise ratios. In an embodiment, the control unit is configured to increase the level estimate for decreasing first SNR-values if the first SNR-values are smaller than the second SNR-values. In an embodiment, the control unit is configured to decrease the level estimate for increasing first SNR-values if the first SNR-values are smaller than the second SNR-values. In an embodiment, the control unit is configured not to modify the level estimate for first SNR-values larger than the second SNR-values.
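By way of a non-limiting illustration, the comparison of local and global SNR described above may be sketched as a level offset that is zero when the local SNR is at or above the global SNR and grows as the local SNR falls below it. The linear mapping, the `slope` and `max_offset_db` parameters and the name `relaxed_level_db` are assumptions for this sketch; the text above only specifies the direction of the modification.

```python
def relaxed_level_db(level_db: float,
                     local_snr_db: float,
                     global_snr_db: float,
                     slope: float = 0.5,
                     max_offset_db: float = 10.0) -> float:
    """Return a modified level estimate in dB.

    The estimate is left unchanged when the local SNR is not below the
    global SNR. Otherwise it is increased, more so for lower local SNR,
    up to an assumed maximum offset.
    """
    deficit_db = global_snr_db - local_snr_db
    if deficit_db <= 0.0:
        return level_db
    return level_db + min(slope * deficit_db, max_offset_db)
```

Feeding the increased level estimate to a compressive gain characteristic lowers the gain in time-frequency regions of locally low SNR, while leaving the prescribed compression untouched where the local SNR is high.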
In an embodiment, the control unit is configured to determine the second control signal based on a smoothed signal to noise ratio of said electric input signal or a processed version thereof. In an embodiment, the control unit is configured to determine the second control signal based on the second (global) signal to noise ratio.
In an embodiment, the control unit is configured to determine the second control signal in dependence of said voice activity signal. In an embodiment, the control unit is configured to determine the second control signal based on the second (global) signal to noise ratio, when the voice activity signal is indicative of a speech absence likelihood.
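By way of a non-limiting illustration, a gain reduction depending on the global SNR and on a speech absence likelihood may be sketched as follows. The linear ramp between two SNR limits, the maximum reduction, the weighting by the speech absence probability and the name `gain_reduction_db` are all assumptions for this sketch and are not taken from the disclosure.

```python
def gain_reduction_db(global_snr_db: float,
                      speech_absence_prob: float,
                      max_reduction_db: float = 6.0,
                      snr_low_db: float = -10.0,
                      snr_high_db: float = 0.0) -> float:
    """Return a gain reduction in dB (0 means the prescribed gain is kept).

    The reduction is largest when the global SNR is at or below
    snr_low_db and speech is certainly absent, and vanishes when the
    global SNR is at or above snr_high_db or speech is certainly present.
    """
    if not 0.0 <= speech_absence_prob <= 1.0:
        raise ValueError("speech_absence_prob must lie in [0, 1]")
    weight = (snr_high_db - global_snr_db) / (snr_high_db - snr_low_db)
    weight = min(max(weight, 0.0), 1.0)  # clamp the ramp to [0, 1]
    return max_reduction_db * weight * speech_absence_prob
```

The returned value would be subtracted from the prescribed gain (in dB), so that noise-only segments receive less amplification than speech at the same level.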
In an embodiment, the hearing device comprises a hearing aid (e.g. a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user, or for being fully or partially implanted in the head of a user), a headset, an earphone, an ear protection device or a combination thereof.
In an embodiment, the hearing device is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression) of one or more frequency ranges to one or more other frequency ranges, e.g. to compensate for a hearing impairment of a user. In an embodiment, the hearing device comprises a signal processing unit for enhancing the electric input signal and providing a processed output signal, e.g. including a compensation for a hearing impairment of a user.
The hearing device comprises an output unit for providing a stimulus perceived by the user as an acoustic signal based on a processed electric signal. In an embodiment, the output unit comprises a number of electrodes of a cochlear implant or a vibrator of a bone conducting hearing device. In an embodiment, the output unit comprises an output transducer. In an embodiment, the output transducer comprises a receiver (loudspeaker) for providing the stimulus as an acoustic signal to the user. In an embodiment, the output transducer comprises a vibrator for providing the stimulus as mechanical vibration of a skull bone to the user (e.g. in a bone-attached or bone-anchored hearing device).
The hearing device comprises an input unit for providing an electric input signal representing sound. In an embodiment, the input unit comprises an input transducer, e.g. a microphone, for converting an input sound to an electric input signal. In an embodiment, the input unit comprises a wireless receiver for receiving a wireless signal comprising sound and for providing an electric input signal representing said sound. In an embodiment, the hearing device comprises a directional microphone system (e.g. comprising a beamformer filtering unit) adapted to spatially filter sounds from the environment, and thereby enhance a target acoustic source among a multitude of acoustic sources in the local environment of the user wearing the hearing device. In an embodiment, the directional system is adapted to detect (such as adaptively detect) from which direction a particular part of the microphone signal originates.
In an embodiment, the hearing device comprises an antenna and transceiver circuitry for wirelessly receiving a direct electric input signal from another device, e.g. a communication device or another hearing device. In an embodiment, the hearing device comprises a (possibly standardized) electric interface (e.g. in the form of a connector) for receiving a wired direct electric input signal from another device, e.g. a communication device or another hearing device. In an embodiment, the direct electric input signal represents or comprises an audio signal and/or a control signal and/or an information signal. In an embodiment, the hearing device comprises demodulation circuitry for demodulating the received direct electric input to provide the direct electric input signal representing an audio signal and/or a control signal e.g. for setting an operational parameter (e.g. volume) and/or a processing parameter of the hearing device. In general, a wireless link established by a transmitter and antenna and transceiver circuitry of the hearing device can be of any type. In an embodiment, the wireless link is used under power constraints, e.g. in that the hearing device comprises a portable (typically battery driven) device. In an embodiment, the wireless link is a link based on near-field communication, e.g. an inductive link based on an inductive coupling between antenna coils of transmitter and receiver parts. In another embodiment, the wireless link is based on far-field, electromagnetic radiation. In an embodiment, the communication via the wireless link is arranged according to a specific modulation scheme, e.g. an analogue modulation scheme, such as FM (frequency modulation) or AM (amplitude modulation) or PM (phase modulation), or a digital modulation scheme, such as ASK (amplitude shift keying), e.g. On-Off keying, FSK (frequency shift keying), PSK (phase shift keying), e.g. MSK (minimum shift keying), or QAM (quadrature amplitude modulation). 
In an embodiment, the wireless link is based on a standardized or proprietary technology. In an embodiment, the wireless link is based on Bluetooth technology (e.g. Bluetooth Low-Energy technology).
In an embodiment, the hearing device is a portable device, e.g. a device comprising a local energy source, e.g. a battery, e.g. a rechargeable battery.
In an embodiment, the hearing device comprises a forward or signal path between an input transducer (microphone system and/or direct electric input (e.g. a wireless receiver)) and an output transducer. In an embodiment, the signal processing unit is located in the forward path. In an embodiment, the signal processing unit is adapted to provide a frequency dependent gain according to a user's particular needs. In an embodiment, the hearing device comprises an analysis path comprising functional components for analyzing the input signal (e.g. determining a level, a modulation, a type of signal, an acoustic feedback estimate, etc.). In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the frequency domain. In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the time domain.
In an embodiment, an analogue electric signal representing an acoustic signal is converted to a digital audio signal in an analogue-to-digital (AD) conversion process, where the analogue signal is sampled with a predefined sampling frequency or rate fs, fs being e.g. in the range from 8 kHz to 48 kHz (adapted to the particular needs of the application), to provide digital samples xn (or x[n]) at discrete points in time tn (or n), each audio sample representing the value of the acoustic signal at tn by a predefined number Nb of bits, Nb being e.g. in the range from 1 to 48 bits, e.g. 24 bits. A digital sample x has a length in time of 1/fs, e.g. 50 μs for fs = 20 kHz. In an embodiment, a number of audio samples are arranged in a time frame. In an embodiment, a time frame comprises 64 or 128 audio data samples. Other frame lengths may be used depending on the practical application.
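By way of a worked example of the figures given above, the duration of a sample and of a time frame follows directly from the sampling frequency. The helper names below are hypothetical and serve only to make the arithmetic explicit.

```python
def sample_period_s(fs_hz: float) -> float:
    """Length in time of one digital sample, 1/fs, in seconds."""
    return 1.0 / fs_hz


def frame_duration_s(samples_per_frame: int, fs_hz: float) -> float:
    """Length in time of a frame of audio samples, in seconds."""
    return samples_per_frame / fs_hz


# For fs = 20 kHz: one sample lasts 50 microseconds, a frame of
# 64 samples lasts 3.2 ms and a frame of 128 samples lasts 6.4 ms.
period = sample_period_s(20000.0)
frame_64 = frame_duration_s(64, 20000.0)
frame_128 = frame_duration_s(128, 20000.0)
```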
In an embodiment, the hearing device comprises an analogue-to-digital (AD) converter to digitize an analogue input with a predefined sampling rate, e.g. 20 kHz. In an embodiment, the hearing device comprises a digital-to-analogue (DA) converter to convert a digital signal to an analogue output signal, e.g. for being presented to a user via an output transducer.
In an embodiment, the hearing device, e.g. the microphone unit and/or the transceiver unit, comprises a TF-conversion unit for providing a time-frequency representation of an input signal. In an embodiment, the time-frequency representation comprises an array or map of corresponding complex or real values of the signal in question in a particular time and frequency range. In an embodiment, the TF conversion unit comprises a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. In an embodiment, the TF conversion unit comprises a Fourier transformation unit for converting a time variant input signal to a (time variant) signal in the frequency domain. In an embodiment, the frequency range considered by the hearing device from a minimum frequency fmin to a maximum frequency fmax comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. In an embodiment, a signal of the forward and/or analysis path of the hearing device is split into a number M of frequency bands, where M is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. In an embodiment, the hearing device is adapted to process a signal of the forward and/or analysis path in a number Q of different frequency channels (Q≤M). The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping.
In an embodiment, the hearing device comprises a number of detectors configured to provide status signals relating to a current physical environment of the hearing device (e.g. the current acoustic environment), and/or to a current state of the user wearing the hearing device, and/or to a current state or mode of operation of the hearing device. Alternatively or additionally, one or more detectors may form part of an external device in communication (e.g. wirelessly) with the hearing device. An external device may e.g. comprise another hearing device, a remote control, an audio delivery device, a telephone (e.g. a smartphone), an external sensor, etc.
In an embodiment, one or more of the number of detectors operate(s) on the full band signal (time domain). In an embodiment, one or more of the number of detectors operate(s) on band split signals ((time-) frequency domain).
In an embodiment, the number of detectors comprises a level detector for estimating a current level of a signal of the forward path. In an embodiment, the predefined criterion comprises whether the current level of a signal of the forward path is above or below a given (L-)threshold value.
In a particular embodiment, the hearing device comprises a voice detector (VD) for determining whether or not an input signal comprises a voice signal (at a given point in time). A voice signal is in the present context taken to include a speech signal from a human being. It may also include other forms of utterances generated by the human speech system (e.g. singing). In an embodiment, the voice detector unit is adapted to classify a current acoustic environment of the user as a VOICE or NO-VOICE environment. This has the advantage that time segments of the electric microphone signal comprising human utterances (e.g. speech) in the user's environment can be identified, and thus separated from time segments only comprising other sound sources (e.g. artificially generated noise). In an embodiment, the voice detector is adapted to detect as a VOICE also the user's own voice. Alternatively, the voice detector is adapted to exclude a user's own voice from the detection of a VOICE.
In an embodiment, the hearing device comprises an own voice detector for detecting whether a given input sound (e.g. a voice) originates from the voice of the user of the system. In an embodiment, the microphone system of the hearing device is adapted to be able to differentiate between a user's own voice and another person's voice and possibly from NON-voice sounds.
In an embodiment, the hearing device comprises a classification unit configured to classify the current situation based on input signals from (at least some of) the detectors, and possibly other inputs as well. In the present context ‘a current situation’ is taken to be defined by one or more of
a) the physical environment (e.g. including the current electromagnetic environment, e.g. the occurrence of electromagnetic signals (e.g. comprising audio and/or control signals) intended or not intended for reception by the hearing device, or other properties of the current environment than acoustic);
b) the current acoustic situation (input level, acoustic feedback, etc.);
c) the current mode or state of the user (movement, temperature, activity, etc.); and
d) the current mode or state of the hearing device (program selected, time elapsed since last user interaction, etc.) and/or of another device in communication with the hearing device.
In an embodiment, the hearing device further comprises other relevant functionality for the application in question, e.g. feedback suppression, etc.
Use:
In an aspect, use of a hearing device as described above, in the ‘detailed description of embodiments’ and in the claims, is moreover provided. In an embodiment, use is provided in a system comprising audio distribution, e.g. a system comprising a microphone and a loudspeaker. In an embodiment, use is provided in a system comprising one or more hearing instruments, headsets, earphones, active ear protection systems, etc., e.g. in handsfree telephone systems, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
A Method:
In an aspect, a method of operating a hearing device, e.g. a hearing aid, is provided. The method comprises
It is intended that some or all of the structural features of the hearing device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the method, when appropriately substituted by a corresponding process and vice versa. Embodiments of the method have the same advantages as the corresponding hearing devices.
A Computer Readable Medium:
In an aspect, a tangible computer-readable medium storing a computer program comprising program code means for causing a data processing system to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims, when said computer program is executed on the data processing system is furthermore provided by the present application.
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. In addition to being stored on a tangible medium, the computer program can also be transmitted via a transmission medium such as a wired or wireless link or a network, e.g. the Internet, and loaded into a data processing system for being executed at a location different from that of the tangible medium.
A Data Processing System:
In an aspect, a data processing system comprising a processor and program code means for causing the processor to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
A Hearing System:
In a further aspect, a hearing system comprising a hearing device as described above, in the ‘detailed description of embodiments’, and in the claims, AND an auxiliary device is moreover provided.
In an embodiment, the system is adapted to establish a communication link between the hearing device and the auxiliary device to provide that information (e.g. control and status signals, possibly audio signals) can be exchanged or forwarded from one to the other.
In an embodiment, the auxiliary device is or comprises an audio gateway device adapted for receiving a multitude of audio signals (e.g. from an entertainment device, e.g. a TV or a music player, a telephone apparatus, e.g. a mobile telephone or a computer, e.g. a PC) and adapted for selecting and/or combining an appropriate one of the received audio signals (or combination of signals) for transmission to the hearing device. In an embodiment, the auxiliary device is or comprises a remote control for controlling functionality and operation of the hearing device(s). In an embodiment, the function of a remote control is implemented in a SmartPhone, the SmartPhone possibly running an APP allowing to control the functionality of the audio processing device via the SmartPhone (the hearing device(s) comprising an appropriate wireless interface to the SmartPhone, e.g. based on Bluetooth or some other standardized or proprietary scheme).
In an embodiment, the auxiliary device is another hearing device. In an embodiment, the hearing system comprises two hearing devices adapted to implement a binaural hearing system, e.g. a binaural hearing aid system.
An App:
In a further aspect, a non-transitory application, termed an APP, is furthermore provided by the present disclosure. The APP comprises executable instructions configured to be executed on an auxiliary device to implement a user interface for a hearing device or a hearing system described above in the ‘detailed description of embodiments’, and in the claims. In an embodiment, the APP is configured to run on a cellular phone, e.g. a smartphone, or on another portable device allowing communication with said hearing device or said hearing system.
In the present context, a ‘hearing device’ refers to a device, such as a hearing aid, e.g. a hearing instrument, or an active ear-protection device, or other audio processing device, which is adapted to improve, augment and/or protect the hearing capability of a user by receiving acoustic signals from the user's surroundings, generating corresponding audio signals, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. A ‘hearing device’ further refers to a device such as an earphone or a headset adapted to receive audio signals electronically, possibly modifying the audio signals and providing the possibly modified audio signals as audible signals to at least one of the user's ears. Such audible signals may e.g. be provided in the form of acoustic signals radiated into the user's outer ears, acoustic signals transferred as mechanical vibrations to the user's inner ears through the bone structure of the user's head and/or through parts of the middle ear as well as electric signals transferred directly or indirectly to the cochlear nerve of the user.
The hearing device may be configured to be worn in any known way, e.g. as a unit arranged behind the ear with a tube leading radiated acoustic signals into the ear canal or with an output transducer, e.g. a loudspeaker, arranged close to or in the ear canal, as a unit entirely or partly arranged in the pinna and/or in the ear canal, as a unit, e.g. a vibrator, attached to a fixture implanted into the skull bone, as an attachable, or entirely or partly implanted, unit, etc. The hearing device may comprise a single unit or several units communicating electronically with each other. The loudspeaker may be arranged in a housing together with other components of the hearing device, or may be an external unit in itself (possibly in combination with a flexible guiding element, e.g. a dome-like element).
More generally, a hearing device comprises an input transducer for receiving an acoustic signal from a user's surroundings and providing a corresponding input audio signal and/or a receiver for electronically (i.e. wired or wirelessly) receiving an input audio signal, a (typically configurable) signal processing circuit for processing the input audio signal and an output unit for providing an audible signal to the user in dependence on the processed audio signal. The signal processing unit may be adapted to process the input signal in the time domain or in a number of frequency bands. In some hearing devices, an amplifier and/or compressor may constitute the signal processing circuit. The signal processing circuit typically comprises one or more (integrated or separate) memory elements for executing programs and/or for storing parameters used (or potentially used) in the processing and/or for storing information relevant for the function of the hearing device and/or for storing information (e.g. processed information, e.g. provided by the signal processing circuit), e.g. for use in connection with an interface to a user and/or an interface to a programming device. In some hearing devices, the output unit may comprise an output transducer, such as e.g. a loudspeaker for providing an air-borne acoustic signal or a vibrator for providing a structure-borne or liquid-borne acoustic signal. In some hearing devices, the output unit may comprise one or more output electrodes for providing electric signals (e.g. a multi-electrode array for electrically stimulating the cochlear nerve).
In some hearing devices, the vibrator may be adapted to provide a structure-borne acoustic signal transcutaneously or percutaneously to the skull. In some hearing devices, the vibrator may be implanted in the middle ear and/or in the inner ear. In some hearing devices, the vibrator may be adapted to provide a structure-borne acoustic signal to a middle-ear bone and/or to the cochlea. In some hearing devices, the vibrator may be adapted to provide a liquid-borne acoustic signal to the cochlear fluids, e.g. through the oval window. In some hearing devices, the output electrodes may be implanted in the cochlea or on the inside of the skull bone and may be adapted to provide the electric signals to the hair cells of the cochlea, to one or more hearing nerves, to the auditory brainstem, to the auditory midbrain, to the auditory cortex and/or to other parts of the cerebral cortex and associated structures.
A hearing device, e.g. a hearing aid, may be adapted to a particular user's needs, e.g. a hearing impairment. A configurable signal processing circuit of the hearing device may be adapted to apply a frequency and level dependent compressive amplification of an input signal. A customized frequency and level dependent gain may be determined in a fitting process by a fitting system based on a user's hearing data, e.g. an audiogram, using a generic or proprietary fitting rationale. The frequency and level dependent gain may e.g. be embodied in processing parameters, e.g. uploaded to the hearing device via an interface to a programming device (fitting system), and used by a processing algorithm executed by the configurable signal processing circuit of the hearing device.
A ‘hearing system’ refers to a system comprising one or two hearing devices, and a ‘binaural hearing system’ refers to a system comprising two hearing devices and being adapted to cooperatively provide audible signals to both of the user's ears. Hearing systems or binaural hearing systems may further comprise one or more ‘auxiliary devices’, which communicate with the hearing device(s) and affect and/or benefit from the function of the hearing device(s). Auxiliary devices may be e.g. remote controls, audio gateway devices, mobile phones (e.g. SmartPhones), or music players. Hearing devices, hearing systems or binaural hearing systems may e.g. be used for compensating for a hearing-impaired person's loss of hearing capability, augmenting or protecting a normal-hearing person's hearing capability and/or conveying electronic audio signals to a person. Hearing devices or hearing systems may e.g. form part of or interact with public-address systems, active ear protection systems, hands free telephone systems, car audio systems, entertainment (e.g. karaoke) systems, teleconferencing systems, classroom amplification systems, etc.
The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out for the sake of brevity. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effect will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are intentionally left out. Throughout, the same reference signs are used for identical or corresponding parts.
Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practised without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
The electronic hardware may include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. The term ‘computer program’ shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
The present application relates to the field of hearing devices, e.g. hearing aids.
In the following, the concept of compressive amplification (CA) is outlined in an attempt to highlight the problems that the SNR driven compressive amplification system (SNRCA) of the present disclosure addresses.
Compressive amplification (CA) is designed and used to restore speech audibility.
With x[n] denoting the signal at the input of the compressor (i.e. the CA scheme), e.g. the electric input signal (time domain), and n the sampled time index, one can write x[n] as the sum of the M sub-band signals xm[n]:
$x[n]=\sum_{m=0}^{M-1} x_m[n]$
Each of the M sub-bands can be used as a level estimation channel and produces lm,τ[n], an estimate of the power level of xm[n] obtained with an averaging time constant τ:
$l_{m,\tau}[n]=H_m(|x_m[n]|^2,n,\tau)$
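The level estimator Hm is not specified further at this point. As a minimal sketch, assuming a simple one-pole (first-order recursive) averager with a single time constant τ, the level estimate could be computed as follows. The function name, the sampling rate and the test signal are hypothetical and only serve as illustration.

```python
import math

def level_estimate(x_m, fs, tau):
    """Smooth the instantaneous power |x_m[n]|^2 with a one-pole averager.

    Assumption: H_m is a first-order recursive filter with time constant
    tau (in seconds); the source does not prescribe this particular form.
    """
    alpha = math.exp(-1.0 / (tau * fs))  # smoothing coefficient
    level = 0.0
    out = []
    for sample in x_m:
        level = alpha * level + (1.0 - alpha) * sample * sample
        out.append(level)
    return out

# Hypothetical usage: one second of a constant-amplitude sub-band signal.
fs = 16000.0
x = [0.5] * 16000
fast = level_estimate(x, fs, tau=0.005)  # short time constant
slow = level_estimate(x, fs, tau=0.5)    # long time constant
```

With a short time constant the estimate settles quickly on the signal power, whereas with a long time constant it still lags behind after one second.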
Using the compression characteristic curve, i.e. a function that maps the level of each channel lm to a channel gain gm(lm), the compressor computes, for each estimated level lm,τ[n], a gain gm[n]=gm(lm,τ[n]) that can be applied on xm[n] to produce the amplified mth sub-band ym[n]:
ym[n]=gm[n]xm[n]
The gain gm[n] is a function of the estimated input level lm,τ[n], i.e. gm[n]=gm(lm,τ[n]), under the following constraints: For two estimated levels lsoft and lloud with
lsoft<lloud
The corresponding gains gsoft=g(lsoft) and gloud=g(lloud) satisfy:
gsoft≥gloud
However, the compression ratio shall not be negative, so the following condition is always satisfied:
lsoftgsoft≤lloudgloud
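The two constraints above can be checked on a simple compression characteristic. The following is a minimal sketch, assuming a curve expressed in dB with a constant gain below a knee point and a fixed compression ratio above it. The parameter values (30 dB gain, 50 dB knee, ratio 2) are hypothetical and do not come from any fitting rationale. In dB, the constraints read Gsoft≥Gloud and Lsoft+Gsoft≤Lloud+Gloud.

```python
def gain_db(level_db, gain0_db=30.0, knee_db=50.0, ratio=2.0):
    """Gain in dB for an estimated input level in dB.

    Below the knee the gain is constant (linear amplification); above it,
    the output level grows by only 1/ratio dB per dB of input level.
    All default values are illustrative assumptions.
    """
    if ratio < 1.0:
        raise ValueError("compression ratio must be >= 1")
    excess = max(level_db - knee_db, 0.0)
    return gain0_db - (1.0 - 1.0 / ratio) * excess

# Hypothetical soft and loud input levels (dB).
soft_db, loud_db = 40.0, 80.0
g_soft, g_loud = gain_db(soft_db), gain_db(loud_db)
```

A ratio of at least 1 keeps the output level a non-decreasing function of the input level, which corresponds to the non-negative compression ratio condition.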
The compressor output signal y[n] can be reconstructed as follows:
$y[n]=\sum_{m=0}^{M-1} y_m[n]$
However, applied to noisy signals, CA tends to degrade the SNR, behaving as a noise amplifier (see the next section for more details). In other words, SNRO, the SNR at the output of the compressor, is potentially smaller than SNRI, the SNR at the input of the compressor:
SNRO≤SNRI
1. Compressive Amplification and SNR Degradation:
Depending on the long-term broadband SNR at the compressor input, classical CA can (in certain acoustic situations) be counter-productive in terms of SNR, as mentioned above. Before this is discussed in more detail in the next sub-sections, some definitions are given in the following:
Time Constants
τL and τG are averaging time constants satisfying
τL≤τG
τL represents a relatively short time: its order of magnitude typically corresponds to the length of a phoneme or a syllable (i.e. 1 ms to less than 100 ms).
τG represents a relatively long time: its order of magnitude typically corresponds to the length of one to several words or even sentences (i.e. 0.5 s to more than 5 s).
Usually, the difference in order of magnitude between τL and τG is large, i.e.
τL<<τG
e.g. 10τL≤τG.
Bandwidths
ΔfL and ΔfG are bandwidths satisfying
ΔfL≤ΔfG
ΔfL represents a relatively narrow bandwidth. It is typically the bandwidth used in auditory filter banks, i.e. from several Hertz to several kHz.
ΔfG represents the full bandwidth of the processed signal. It is defined as half the sampling frequency fs, i.e. ΔfG=fs/2. In current HA, it is typically between 8 and 16 kHz.
Usually, the difference in order of magnitude between ΔfL and ΔfG is large, i.e.
ΔfL<<ΔfG
e.g. 10ΔfL≤ΔfG.
Input and Output Signals
The input signal of the compressor (CA scheme), e.g. the electric input signal, is denoted x[n], where n is the sampled time index.
The output signal of the compressor (CA scheme) is denoted y[n].
Both x and y are broadband signals, i.e. they use the full bandwidth ΔfG.
xm[n] is the mth of the M sub-bands of the input signal x[n]. Its bandwidth ΔfL,m is smaller than ΔfG: compared to x, xm is localized in frequency.
ym[n] is the mth of the M sub-bands of the output signal y[n]. Its bandwidth ΔfL,m is smaller than ΔfG: compared to y, ym is localized in frequency.
Note that if the filter bank that splits x into the M sub-bands xm is uniform, then ΔfL,m=ΔfL for all m. In the rest of this text, we assume the use of constant-bandwidth sub-bands, i.e. ΔfL,m=ΔfL, without loss of generality: assuming the signal is split into M′ sub-bands with non-constant bandwidths ΔfL,m′, one can select a bandwidth ΔfL,m=ΔfL that is the greatest common divisor of the bandwidths ΔfL,m′, i.e. ΔfL,m′=Cm′ΔfL with Cm′ a strictly positive integer for all m′. The new number of sub-bands is
$M=\sum_{m'=0}^{M'-1} C_{m'}$
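The greatest-common-divisor construction can be illustrated with a short sketch. It assumes integer bandwidths in Hz; the function name and the example bandwidths are hypothetical.

```python
from functools import reduce
from math import gcd

def uniform_split(bandwidths_hz):
    """Return (common bandwidth, multiples C_m', total number of sub-bands).

    Assumption: the non-uniform bandwidths are given as positive integers
    (Hz), so that a greatest common divisor exists.
    """
    common = reduce(gcd, bandwidths_hz)
    multiples = [bw // common for bw in bandwidths_hz]
    return common, multiples, sum(multiples)

# Hypothetical non-uniform filter bank with M' = 4 sub-bands.
common, multiples, m_total = uniform_split([250, 500, 1000, 2000])
```

Each original sub-band m′ is thus represented by Cm′ adjacent uniform sub-bands of width ΔfL.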
Level estimation in sub-bands in the gain application can be emulated:
Gain application in larger sub-bands can be emulated:
The broadband input signal segment
The broadband output signal segment
The broadband input signal segment
Additive Noise Model
The broadband input signal x[n] can be modelled as the sum of the broadband input speech signal s[n] and the broadband input noise (disturbance) d[n]:
x[n]=s[n]+d[n]
The sub-band input signal xm[n] can be modelled as the sum of the input sub-band speech signal sm[n] and the input sub-band noise (disturbance) dm[n]:
xm[n]=sm[n]+dm[n]
The broadband output signal y[n] can be modelled as the sum of the broadband output speech signal ys[n] and the broadband output noise (disturbance) yd[n]:
y[n]=ys[n]+yd[n]
The sub-band output signal ym[n] can be modelled as the sum of the output sub-band speech signal $y_{s_m}[n]$ and the output sub-band noise (disturbance) $y_{d_m}[n]$:
$y_m[n]=y_{s_m}[n]+y_{d_m}[n]$
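The additive model makes it possible to track speech and noise separately through the compressor. The following is a minimal numerical sketch, assuming that the gain computed from the noisy signal is applied identically to the speech and noise components, and assuming a pure power-law compressor without knee point. The frame powers are hypothetical and describe strongly modulated speech in steady noise.

```python
def snr_in_out(speech_power, noise_power, ratio=2.0):
    """Return (input SNR, output SNR) as linear power ratios.

    Assumptions: frame-wise processing, a power gain of P**(1/ratio - 1)
    driven by the total frame power P (speech + noise), and the same gain
    applied to both components of the frame.
    """
    out_speech = 0.0
    out_noise = 0.0
    for ps, pd in zip(speech_power, noise_power):
        total = ps + pd
        gain = total ** (1.0 / ratio - 1.0)  # less gain for louder frames
        out_speech += gain * ps
        out_noise += gain * pd
    snr_in = sum(speech_power) / sum(noise_power)
    snr_out = out_speech / out_noise
    return snr_in, snr_out

# Hypothetical frame powers: speech peaks and valleys over steady noise.
speech = [10.0, 0.1, 10.0, 0.1]
noise = [1.0, 1.0, 1.0, 1.0]
snr_in, snr_out = snr_in_out(speech, noise)
```

In this example the frames dominated by speech receive less gain than the frames dominated by noise, so the long-term output SNR is lower than the long-term input SNR. With ratio=1 (linear amplification) both SNRs coincide.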
Input Power
Px
Note that in CA, the level estimation stage provide an estimate lm,τ
lm,τ
Ps
Pd
Note that in SNRCA, a noise power estimator is used to provide an estimate ld
ld
Note also that Px
Px,τ
Ps,τ
Pd,τ
Note that Px,τ
Px=Px,τ
Ps=Ps,τ
Pd=Pd,τ
Note that Px,τ
Output Power
$P_{y_m,\tau_L}[n]$ is the average sub-band output signal power over a time τL=KL/fs
$P_{y_{s_m},\tau_L}[n]$ is the average sub-band output speech power over a time τL=KL/fs
$P_{y_{d_m},\tau_L}[n]$ is the average sub-band output noise power over a time τL=KL/fs
Py,τ
Py
Py
Py=Py,τ
Py
Py
Input SNR
SNRI,m,τ
SNRI,m,τ
SNRI,τ
SNRI,τ
SNRI,m,τ
SNRI,m,τ
SNRI=SNRI,τ
SNRI,τ
Output SNR
SNRO,m,τ
SNRO,τ
SNRO,m,τ
SNRO=SNRO,τ
SNRO,τ
Global and Local SNR
The term ‘input global SNR’ or simply ‘global SNR’ denotes a signal to noise ratio computed on the broadband (i.e. full bandwidth ΔfG) input signal x of the compressor, and averaged over a relative long time τG:
SNR(
The term ‘output global SNR’ denotes a signal to noise ratio computed on the broadband (i.e. full bandwidth ΔfG) output signal y of the compressor, and averaged over a relative long time τG:
SNR(
The term ‘input local SNR’ or simply ‘local SNR’ denotes interchangeably, according to the context:
a signal to noise ratio computed on the broadband (i.e. full bandwidth ΔfG) input signal x of the compressor, and averaged over a relative short time τL
SNR(
or a signal to noise ratio computed on the sub-band (i.e. bandwidth ΔfL,m) input signal xm of the compressor, and averaged over a relative long time τG
SNR(
or a signal to noise ratio computed on the sub-band (i.e. bandwidth ΔfL) input signal xm of the compressor, and averaged over a relative short time τL
SNR(
The local SNR is denoted SNRL as long as, in the discussed context:
SNR and Modulated Temporal Envelope
Let a be the sum of two orthogonal signals u and v, i.e.
a=u+v
and
Pa,τ
Let u have a temporal envelope that is more modulated than the temporal envelope of v. This means that the variance
of Pu,τ
of Pv,τ
With
And
The variances can be estimated as follows:
Respectively
Let u have a long term power larger than v, i.e.
Pu,τ
The situation is illustrated by an example on
Pv,τ
Pa,τ
Because
Pu,τ
On the other hand, in the modulated envelope valleys (approximately 0.6 s and 1.6 s) the total power Pa,τ
Pa,τ
Because
Pu,τ
Let b be the output of CA with a as input, with bu and bv the compressed counterpart of u and v respectively:
b=bu+bv
Pb
If u represents the speech and v the noise (case 1a), the soundscape can be described as follows:
Speech is more modulated than steady state noise.
Note: This situation might happen to be broadband, i.e. if u=s, v=d, a=x, bu=yu, bv=yv and, b=y or in some sub-band m, i.e. u=sm, v=dm, a=xm, bu=ys
If v represents the speech and u the noise (case 1b), the soundscape can be described as follows:
Speech is less modulated than noise.
Note: This situation might happen to be broadband, i.e. if u=s, v=d, a=x, bu=yu, bv=yv and, b=y or in some sub-band m, i.e. u=sm, v=dm, a=xm, bu=ys
Let u have a long term power smaller than v, i.e.
Pu,τ
The situation is illustrated by an example on
Pv,τ
Pa,τ
except on the peaks of the temporal envelope (approximately 0.4 s and 1.25 s) where Pu,τ
Pu,τ
Or even
Pu,τ
Let b be the output of CA with a as input, with bu and bv the compressed counterpart of u and v respectively:
b=bu+bv
Pb
If u represents the speech and v the noise (case 2a),
Speech is more modulated than noise.
Note: This situation might happen to be broadband, i.e. if u=s, v=d, a=x, bu=yu, bv=yv and, b=y or in some sub-band m, i.e. u=sm, v=dm, a=xm, bu=ys
If v represents the speech and u the noise (case 2b),
Speech is less modulated than noise.
Note: This situation might happen to be broadband, i.e. if u=s, v=d, a=x, bu=yu, bv=yv and, b=y or in some sub-band m, i.e. u=sm, v=dm, a=xm, bu=ys
Summary for compressive amplification of the modulated temporal envelope:
SNR and Modulated Spectral Envelope
Let am be the sum of two orthogonal sub-band signals um and vm, i.e.
am=um+vm
and
Pa
Let um have a higher spectral contrast than vm, i.e. um has a spectral envelope that is more modulated than the spectral envelope of vm. This means that the variance
of Pu
of Pv
With
And
The variances can be estimated as follows:
Respectively
Let u have a broadband power larger than v, i.e.
Pu,τ≥Pv,τ
The situation is illustrated by an example on
On the peak of the spectral envelope (e.g. approximately 200 Hz) the total power Pa
Pa
Because
Pu
On the other hand, in the modulated envelope valleys (e.g. 8 kHz) the total power Pa
Pa
Because
Pu
Let bm be the output of CA with a as input, with bu
bm=bu
Pb
If um represents the speech and vm the noise (case 1a), the soundscape can be described as follows:
Speech has more spectral contrast than noise.
Note: This situation might happen over a long term (τ=τG) or a short term (τ=τL).
If vm represents the speech and um the noise (case 1b), the soundscape can be described as follows:
Noise has more spectral contrast than speech.
Note: This situation might happen over the long term (τ=τG) or the short term (τ=τL).
Let v have a broadband power larger than u, i.e.
Pv,τ≥Pu,τ
The situation is illustrated by an example on
Because vm has more power than um, am has a relative weak spectral contrast, similar to vm. In general, the total power Pa
Pa
except on the peaks of the spectral envelope (e.g. at approximately 200 Hz) where Pu
Pu
Or even
Pu
Let bm be the output of CA with a as input, with bu
Pb
If um represents the speech and vm the noise (case 2a), the soundscape can be described as follows:
Speech has more spectral contrast than noise.
Note: This situation might happen over the long term (τ=τG) or the short term (τ=τL).
If vm represents the speech and um the noise (case 2b), the soundscape can be described as follows:
Noise has more spectral contrast than speech.
Note: This situation might happen over the long term (τ=τG) or the short term (τ=τL).
Summary for compressive amplification of the modulated spectral envelope:
Conclusion (CA and SNR Degradation)
In theory, CA is not systematically a bad thing in terms of SNR. However, the cases where one can expect CA to cause SNR improvements are unlikely and largely irrelevant in practice, in particular if, as is the case in modern hearing instruments (see next section), the CA is placed behind a noise reduction (NR) system. In conclusion, CA should be considered as globally counter-productive in terms of SNR.
2. Noise Reduction and Compressive Amplification:
Because a noise reduction (NR) system systematically improves the SNR (SNRO≥SNRI), while CA improves the SNR if it is negative at its input, i.e. SNRO≥SNRI if SNRI<0, but degrades it if it is positive at its input, i.e. SNRO≤SNRI if SNRI>0 (see section 1, SNR and Modulated Temporal Envelope as well as SNR and Modulated Spectral Envelope), one might be tempted to conclude that the optimal setup places the CA before the NR, maximizing the chances of SNR improvement.
However, such a design ignores that:
On one hand, let us assume that one can design an arbitrarily good NR scheme that is able to remove 100% of the noise, i.e. that systematically produces an infinite output SNR, independently of whether it is placed before or after the CA. On the other hand, it is well known that an NR scheme can, by definition, only attenuate the signal. So, at the input of the CA, the noisy input signal can only be softer if the NR is placed before the CA than if there is no NR or if the NR is placed after the CA. If one uses the arbitrarily good NR scheme described above, the output signal of the whole system, NR and CA, has an infinite SNR (independently of where one places the NR), but it is under-amplified if the NR is placed after the CA compared to a placement before the CA. Indeed, if the NR is placed after the CA, the CA analyzes a noise-corrupted signal that can only be louder than its noise-free counterpart and therefore applies less gain, which results in poorer HLC performance. Consequently, the better the NR scheme, the more sense it makes to place the NR before the CA.
It is better to place the NR in front of the CA. For SNR based CA according to the present disclosure, there is virtually no reason to place the NR at the output of the compressor.
For completeness, both placements, NR at the input and NR at the output of the compressor, are discussed below.
NR Placement Relative to CA:
Using a noise reduction (NR) system (e.g. comprising directionality (spatial filtering/beamforming) and noise suppression) potentially provides global SNR improvements but does not prevent the SNR degradation caused by classic CA. This is independent of the NR location (i.e. at the input or the output of the CA).
NR at the CA Output:
The SNR of the source signal can be:
NR at the CA Input:
As long as the NR is not able to increase the SNR to infinity (which is of course not realistic), there is still residual noise at the NR output. The SNR of the NR output signal can be:
In fact, the better the NR scheme, the higher the likelihood of a positive SNR at the output of the NR. In other words, the better the NR scheme, the more important is the design of the enhanced CA, capable of minimizing the SNR degradation. This can be accomplished with a system like SNRCA according to the present disclosure that limits the amount of SNR degradation.
3. The SNR Driven Compressive Amplification System (SNRCA):
The SNRCA is a concept designed to alleviate the undesired noise amplification caused by applying CA to noisy signals. On the other hand, it provides classic CA-like amplification for noise-free signals.
Among the 4 cases (1a, 1b, 2a, and 2b, for the time domain as well as for the frequency domain) described in section 1 above, only cases 1a and 2a are relevant use cases for modern HA (i.e. HA using NR placed before the compressor) that describe how the SNRCA must behave and what it must achieve:
The above 3 use cases can be interpreted as follows:
Requirement: Speech Distortion Minimization:
The minimal distortion requirement will only be guaranteed by proper design and configuration of the linearization and gain relaxing mechanisms, such that, in very high SNR conditions, they will not modify the expected gain in a direction that is away from the prescribed gain and compression that is achieved by classic CA.
Requirement: Linearization/Compression Relaxing:
It is possible to imagine achieving SNR dependent linearization by increasing the time constants used by the level estimation based on the SNR estimate.
However, this solution has a severe limitation: Slowed down CA minimizes undesired noise amplification at the risk of over-amplification at speech onset or transients.
Instead, it is proposed to provide an SNR based post-processing of the level estimate. In an embodiment, an SNR controlled level offset is provided, whereby SNRCA linearizes the level estimate for a decreasing SNR.
Requirement: Gain Relaxing:
Gain relaxing is provided, when the signal contains no speech but only weakly modulated noise, i.e. when the global (long-term and across sub-bands) SNR becomes very low.
The CA logically amplifies such a noise signal by a gain corresponding to its level. It is, however, questionable whether such amplification of noise is really useful. Indeed:
In other words, the CA delivered gain must be (at least partially) relaxed in such situations. Because such signals are weakly modulated, the role played by the time domain resolution (TDR, i.e. the level estimation time constants used) of the level estimation tends to be zero. Consequently, such gain relaxing cannot be achieved by linearization (increasing the time constants, post-correction of the estimated level, etc.).
However, SNRCA achieves gain relaxing by decreasing the gain at the output of the “Level to Gain Curve” unit as seen in
SNRCA Processing and Processing Elements: Short Description
Using continuous local (short-term and sub-band) as well as global (long-term and broadband) SNR estimations, the proposed SNR driven compressive amplification system (SNRCA) is able to:
Compared to classic CA, SNRCA based CA is made of 3 new components:
SNRCA Processing and Processing Elements: Full Description.
The dynamic (SNR driven) compressive amplification system (SNRCA) (in the following termed ‘the SNRCA unit’, and indicated by the dotted rectangular enclosure in
The SNRCA unit further comprises a control unit (CTRU) configured to analyse the electric input signal IN (or a signal derived therefrom), to provide a classification of the electric input signal IN, and to provide the first and second control signals CTR1, CTR2 based on the classification.
1. A level envelope estimation stage (comprising units LEU1, LEU2) providing fast and slow level estimates LE1 and LE2, respectively. The level of the temporal envelope is estimated both at a high (LE1) and at a low (LE2) time-domain resolution.
2. The SNR estimation stage (comprising units NPEU, LSNRU, GSNRU, and SALEU) that may provide and comprise:
4. A level envelope post-processing stage (comprising units LMOD and LPP) providing the modified estimated level (signal MLE) obtained by combining the level of the modulated envelope (signal LE1), i.e. the instantaneous or short-term level of the envelope, the envelope average level (signal LE2), i.e. a long-term level of the envelope, as well as a level offset bias (signal CTR1) that depends on the local and global SNR (signals LSNR, GSNR). Compared to the instantaneous short-term level (signal LE1), the modified estimated level (signal MLE) may provide linearized behavior for degraded SNR conditions (compression relaxing).
The compression characteristics (comprising unit L2G providing signal CAG): This stage consists of a level-to-gain mapping curve function. This curve generates a channel gain gq, with q=0, . . . , Q−1, for each of the Q channels, using the M sub-band level estimates as input. The output signal CAG contains Gq, the Q channel gains converted to dB, i.e. Gq=20 log10(gq). If the M estimation sub-bands and the Q gain channels have a one-to-one relationship (implying M=Q), the level-to-gain mapping is simply gm=gm(lm). If such a trivial mapping is not used, e.g. when M<Q, the mapping is done using some interpolation (usually zero-order interpolation for simplicity). In that case, each gq is potentially a function of the M level estimates lm, i.e. gq=gq(l0, . . . , lM-1), with m=0, . . . , M−1. The mapping is very often realized after converting the level estimates to dB, i.e. Gq(L0, . . . , LM-1), with Lm=10 log10(lm). As input, though, instead of the ‘true’ estimate of the level (LE1) of the envelope of the electric input signal IN, it receives the modified (post-processed in the LPP unit) level estimate MLE. In other words, MLE contains the M sub-band level estimates L̃m (see LPP unit,
5. A gain post-processing stage (comprising units GMOD and GPP) providing the modified gain (signal MCAG): The speech absence likelihood estimate (signal SALE, cf. also
As in the embodiment of
The forward path may comprise further processing units, e.g. for applying other signal processing algorithms, e.g. frequency shift, frequency transposition, beamforming, noise reduction, etc.
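The level-to-gain mapping with zero-order interpolation described for the compression characteristics (unit L2G) can be sketched as follows. This is only an illustration under stated assumptions: the index rule that assigns sub-band m to channel q, the function names and the example curve are hypothetical and are not taken from the disclosure.

```python
def channel_gains_db(levels_db, num_channels, curve):
    """Map M sub-band level estimates (dB) to Q channel gains (dB).

    Assumed zero-order interpolation: channel q reuses the level of
    sub-band floor(q * M / Q), i.e. each level is held over Q/M
    neighbouring channels.
    """
    m = len(levels_db)
    return [curve(levels_db[(q * m) // num_channels])
            for q in range(num_channels)]

def to_linear(gain_db):
    """Invert G = 20*log10(g) to obtain the linear channel gain g."""
    return 10.0 ** (gain_db / 20.0)

def example_curve(level_db):
    # Hypothetical curve: 30 dB gain up to a 50 dB level, 2:1 above it.
    return 30.0 - 0.5 * max(level_db - 50.0, 0.0)

# Hypothetical case with M = 2 level estimates and Q = 4 gain channels.
gains = channel_gains_db([40.0, 70.0], num_channels=4, curve=example_curve)
```

When M=Q the index rule reduces to the one-to-one mapping, so the same function covers both cases.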
Local SNR Estimation (Unit LSNRU)
Ξm,τ
Ξm,τ
Ξm,τ
The saturation is required because without it, the signal Ξm,τ
The choice of the operational range spanned by Ξfloor,m and Ξceil,m must be done such that the smoothed Ξm,τ
Typical values for [Ξfloor,m, Ξceil,m] are [−25,100] dB.
In the LSNRU unit, the signal W1 contains the zero-floored (unit MAX1) difference (unit SUB1) of the signals LE1 and NPE, converted in decibel (unit DBCONV1), i.e. 10 log10 (max(lm,τ
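The role of the zero-floored difference and of the saturation can be illustrated with a minimal sketch. It assumes that the local SNR is formed as the ratio of the zero-floored difference between the level estimate and the noise power estimate to the noise power estimate, which is a common construction but is an assumption here. The function name is hypothetical; the default limits are the typical values of [−25,100] dB given above.

```python
import math

def local_snr_db(level, noise_level, floor_db=-25.0, ceil_db=100.0):
    """Saturated local SNR estimate in dB for one sub-band.

    Assumption: SNR = max(level - noise_level, 0) / noise_level, with
    both inputs given as (linear) power estimates.
    """
    speech_level = max(level - noise_level, 0.0)  # zero-floored difference
    if speech_level == 0.0:
        return floor_db  # the logarithm of zero would be minus infinity
    if noise_level <= 0.0:
        return ceil_db   # no noise: the ratio would be unbounded
    snr_db = 10.0 * math.log10(speech_level / noise_level)
    return min(max(snr_db, floor_db), ceil_db)
```

Without the saturation, a level estimate at or below the noise estimate would produce an unbounded negative value that any subsequent smoothing could not recover from.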
Global SNR Estimation (Unit GSNRU)
With A being a linear low pass filter, typically a 1st order infinite impulse response filter, configured such that τG is the total averaging time constant, i.e. such that Ξτ
Ξτ
In the GSNRU unit, the input signal LSNR that contains the M local SNR estimate Ξm,τ
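A 1st order infinite impulse response low pass filter of the kind mentioned for A may be sketched as follows. The coefficient formula is the common exponential-averaging choice and is an assumption here, as are the function and parameter names; the disclosure does not specify whether the averaging is carried out on dB or linear values.

```python
import math


def smooth_first_order(values, time_constant_s, rate_hz, initial=0.0):
    """Smooth a sequence with a 1st order IIR low pass filter.

    time_constant_s: averaging time constant (e.g. tau_G) in seconds
    rate_hz: rate at which the input values are produced
    """
    alpha = math.exp(-1.0 / (time_constant_s * rate_hz))  # feedback coefficient
    state = initial
    smoothed = []
    for value in values:
        state = alpha * state + (1.0 - alpha) * value
        smoothed.append(state)
    return smoothed
```

With this choice, the response to a unit step reaches 1−1/e (about 63%) of its final value after one time constant.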
To generate the biased local SNR Bm,τ
Bm,τ
Unit SNR2ΔSNR produces the SNR bias ΔΞm,τ
With ΔΞmin,m<ΔΞmax,m≤0 the smallest and largest SNR bias, respectively, for sub-band m, Ξmin,m<Ξmax,m the threshold SNR values for sub-band m where Ξτ
Unit SNR2ΔL produces the level estimation offset ΔLm[n] (signal CTR1) by mapping the biased local SNR Bm,τ
With ΔLmin,m<ΔLmax,m≤0 the smallest and largest level estimation offset, respectively, for sub-band m, Bmin,m<Bmax,m the threshold SNR values for sub-band m where Bm,τ
wm[n]=min(f(max(pm[n]−ptol,0),1/(1−ptol)),1)
where ptol defines a tolerance (a likelihood below ptol produces a modification gain equal to zero) and f is a mapping function with an average slope of 1/(1−ptol) over the interval [ptol,1]. However, to keep the required computational resources low (as is advantageous in battery-driven, portable electronic devices, such as hearing aids), it is proposed to simply make f linear over [ptol,1], i.e.
wm[n]=min(1/(1−ptol)·max(pm[n]−ptol,0),1)
Typically, the smallest value for ptol is ptol=½.
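The linear mapping from the likelihood pm[n] to the weight wm[n] may be sketched as follows. The function name and the range check are illustrative additions; the formula itself is the linear variant given above.

```python
def modification_weight(p, p_tol=0.5):
    """Linear weight w = min(max(p - p_tol, 0) / (1 - p_tol), 1).

    p: likelihood in [0, 1] for one sub-band and time index
    p_tol: tolerance; likelihoods at or below p_tol give a weight of zero
    """
    if not 0.0 <= p_tol < 1.0:
        raise ValueError("p_tol must lie in [0, 1)")
    return min(max(p - p_tol, 0.0) / (1.0 - p_tol), 1.0)
```

For ptol=½, a likelihood of 0.75 gives a weight of 0.5, and a likelihood of 1 gives a weight of 1.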
Lm,τ
And
Lm,τ
The LPP unit output {tilde over (L)}m,τ[n] (signal MLE) is obtained by combining, for each sub-band m, the local and global level estimates (Lm,τ
{tilde over (L)}m,τ[n]=max(ΔLm,τ
However, in the unit L2G (cf.
S1 receiving or providing an electric input signal with a first dynamic range of levels representative of a time variant sound signal, the electric input signal comprising a target signal and/or a noise signal;
S2 providing a level estimate of said electric input signal;
S3 providing a modified level estimate of said electric input signal in dependence of a first control signal;
S4 providing a compressive amplification gain in dependence of said modified level estimate and hearing data representative of a user's hearing ability;
S5 providing a modified compressive amplification gain in dependence of a second control signal;
S6 analysing said electric input signal to provide a classification of said electric input signal, and providing said first and second control signals based on said classification;
S7 applying said modified compressive amplification gain to said electric input signal or a processed version thereof; and
S8 providing output stimuli perceivable by a user as sound representative of said electric input signal or a processed version thereof.
Some of the steps may, if convenient or appropriate, be carried out in another order than outlined above (or indeed in parallel).
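The order of steps S2 to S5 and S7 may be illustrated for one block of samples by the sketch below. It is not the disclosed implementation: the block-power level estimate, the additive use of the two control signals, and all names are assumptions made for illustration. Step S6, which derives the control signals from a classification of the input, is represented only by the two arguments it would supply.

```python
import math


def process_block(samples, ctr1_offset_db, ctr2_gain_db, gain_curve):
    """Apply steps S2-S5 and S7 to one block of input samples (illustrative).

    ctr1_offset_db: first control signal, assumed here to be a level offset in dB
    ctr2_gain_db: second control signal, assumed here to be a gain offset in dB
    gain_curve: callable mapping a level in dB to a compressive gain in dB
    """
    # S2: level estimate (block mean power in dB, floored to avoid log10(0))
    power = sum(s * s for s in samples) / len(samples)
    level_db = 10.0 * math.log10(max(power, 1e-12))
    # S3: modified level estimate in dependence of the first control signal
    modified_level_db = level_db + ctr1_offset_db
    # S4: compressive amplification gain from the modified level estimate
    gain_db = gain_curve(modified_level_db)
    # S5: modified gain in dependence of the second control signal
    modified_gain_db = gain_db + ctr2_gain_db
    # S7: apply the modified gain to the input
    g = 10.0 ** (modified_gain_db / 20.0)
    return [g * s for s in samples]
```

A negative level offset makes the gain curve see a softer input than was measured, whereas a negative gain offset reduces the amplification directly.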
In summary, traditional compressive amplification (CA) is designed (i.e. prescribed by fitting rationales) for speech in quiet. Applied to real-world (noisy) signals, CA has the following properties (both in the time and in the frequency domain):
a) the SNR at the output of the compressor is smaller than the SNR at the input of the compressor, if the input SNR>0 (SNR DEGRADATION),
b) the SNR at the output of the compressor is larger than the SNR at the input of the compressor, if the input SNR<0 (SNR IMPROVEMENT),
c) situation (b) is unlikely to occur, in particular when noise reduction is used,
d) when the SNR at the input of the compressor tends towards minus infinity (noise only), it is probably better not to amplify at all.
Conclusion from (a): compression might be a bad idea if the signal is noisy. Idea: relax the compression as a function of the SNR.
Conclusion from (d): pure noise signals are not strongly modulated, so the compression ratio (as a function of the time constants, number of channels and static compression ratios in the gain map) has a limited influence. Idea: it might instead be reasonable to relax the amplification, because the applied gain is defined for clean speech at the same level.
SNRCA concept/idea: drive the compressive amplification using SNR estimation(s).
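Property (a) may be illustrated with a deliberately simplified numerical example. The sketch assumes an instantaneous, single-band compressor whose gain in dB is −(1−1/CR) times the input level in dB, modulated speech and stationary noise, each given as a power per time segment; none of these choices is taken from the disclosure. Because segments containing speech have a higher level and therefore receive less gain than noise-only segments, the long-term output SNR falls below the long-term input SNR.

```python
import math


def long_term_snr_db(speech_pows, noise_pows, ratio):
    """Return (input SNR, output SNR) in dB over equal-length segments.

    speech_pows, noise_pows: per-segment powers (their sum must be > 0 per segment)
    ratio: static compression ratio CR of the toy compressor
    """
    speech_out = 0.0
    noise_out = 0.0
    for s, n in zip(speech_pows, noise_pows):
        level_db = 10.0 * math.log10(s + n)        # level of the noisy mixture
        gain_db = -(1.0 - 1.0 / ratio) * level_db  # compressive gain
        gain_pow = 10.0 ** (gain_db / 10.0)
        speech_out += gain_pow * s                 # same gain acts on both parts
        noise_out += gain_pow * n
    snr_in = 10.0 * math.log10(sum(speech_pows) / sum(noise_pows))
    snr_out = 10.0 * math.log10(speech_out / noise_out)
    return snr_in, snr_out


# One segment of speech plus noise, one segment of noise only, CR = 2
snr_in, snr_out = long_term_snr_db([100.0, 0.0], [1.0, 1.0], 2.0)
```

In this example the input SNR is about 17 dB, and the output SNR is lower. With a compression ratio of 1 (linear amplification) the two values coincide, which is consistent with the idea of relaxing the compression as the signal becomes noisier.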
Embodiments of the disclosure may e.g. be useful in applications where dynamic level compression is relevant, such as hearing aids. The disclosure may further be useful in applications such as headsets, earphones, active ear protection systems, hands-free telephone systems, mobile telephones, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but intervening elements may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
The claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
Accordingly, the scope should be judged in terms of the claims that follow.
Number | Name | Date | Kind |
---|---|---|---|
6198830 | Holube et al. | Mar 2001 | B1 |
20020057808 | Goldstein | May 2002 | A1 |
20030028374 | Ribic | Feb 2003 | A1 |
20120020485 | Visser et al. | Jan 2012 | A1 |
20120250883 | Narita | Oct 2012 | A1 |
20120263329 | Kjeldsen | Oct 2012 | A1 |
20140133666 | Tanaka | May 2014 | A1 |
20160322068 | Muesch | Nov 2016 | A1 |
Number | Date | Country |
---|---|---|
2375781 | Oct 2011 | WO |
2012161717 | Nov 2012 | WO |
2014166525 | Oct 2014 | WO |
Entry |
---|
Doblinger, “Computationally efficient speech enhancement by spectral minima tracking in subbands,” Proceedings of Euro Speech, vol. 2, 1995, 4 pages. |
Ephraim et al., “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-33, No. 2, Apr. 1985, pp. 443-445. |
Naylor et al., “Long-term Signal-to-Noise Ratio at the input and output of amplitude compression systems,” Journal of the American Academy of Audiology, vol. 20, No. 3, 2009, pp. 161-171. |
Naylor, “Theoretical Issues of Validity in the Measurement of Aided Speech Reception Threshold in Noise for Comparing Nonlinear Hearing Aid Systems,” Journal of the American Academy of Audiology, vol. 27, No. 7, 2016, pp. 504-514. |
Scollie et al., “The Desired Sensation Level Multistage Input/Output Algorithm,” Trends in Amplification, vol. 9, No. 4, 2005, pp. 159-197. |
Number | Date | Country |
---|---|---|
20180184213 A1 | Jun 2018 | US |