The present application relates to audio processing devices, in particular to identification of artifacts due to processing (e.g. noise reduction) algorithms in audio processing devices and in particular to reduction of musical noise. The disclosure relates specifically to an audio processing device comprising a forward path for processing an audio signal, the processing comprising the application of a processing (e.g. noise reduction) algorithm to a signal of the forward path.
The disclosure furthermore relates to the use of such device and to a method of operating an audio processing device. The disclosure further relates to a data processing system comprising a processor and program code means for causing the processor to perform at least some of the steps of the method.
Embodiments of the disclosure may e.g. be useful in applications such as hearing aids, headsets, ear phones, active ear protection systems, handsfree telephone systems, mobile telephones, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
The following account of the prior art relates to one of the areas of application of the present application, hearing aids.
Many state of the art hearing aids are equipped with a single-channel noise reduction (SC-NR) algorithm. In some modern hearing aids, the signal is represented internally as a time-frequency representation (which for multi-microphone hearing aids could be an output of a beamformer or directionality algorithm). A SC-NR algorithm applies a gain value to each time-frequency unit to reduce the noise level in the signal. The term ‘gain’ is in the present application used in a general sense to include amplification (gain >1) as well as attenuation (gain <1) as the case may be. In a noise reduction algorithm, however, the term ‘gain’ is typically related to ‘attenuation’. Specifically, a SC-NR algorithm estimates the signal-to-noise ratio (SNR) for each time-frequency coefficient and applies a gain value to each time-frequency unit based on this SNR estimate. Eventually, the noise-reduced (and possibly amplified and compressed) time-domain signal is reconstructed by passing the time-frequency representation of the noise-reduced signal through a synthesis filter bank.
When applying the gain to the time-frequency units, the SC-NR algorithm invariably introduces artifacts, because it bases its decisions on SNR estimates. The true SNR values are obviously not observable, since only the noisy signal is available. Some of these artifacts are known as “musical noise”, which are perceptually particularly annoying. It is well-known that the amount of “musical noise” can be reduced by limiting the maximum attenuation that the SC-NR is allowed to perform (cf. e.g. EP 2 463 856 A1), in other words by applying a ‘less aggressive’ noise reduction algorithm. The following tradeoff exists: 1) Larger maximum attenuation implies better noise reduction, but higher risk of introducing musical artifacts, and, on the other hand, 2) Lower maximum attenuation reduces the risk of musical artifacts but makes the noise reduction less effective. Therefore, an ideal maximum attenuation exists. However, the ideal maximum attenuation is dependent on input signal type, general SNR, frequency, etc. So, the ideal maximum attenuation is not fixed across time, but must be adapted to changing situations (as reflected in the input signal).
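The tradeoff above can be made concrete with a small sketch (Python/NumPy; not part of the disclosure). It shows a Wiener-style gain per time-frequency unit, floored at a configurable maximum attenuation; the function name and the 10 dB default are illustrative assumptions only.

```python
import numpy as np

def nr_gain(snr_est, max_atten_db=10.0):
    """Wiener-like noise-reduction gain per time-frequency unit,
    floored at a configurable maximum attenuation (illustrative)."""
    g = snr_est / (snr_est + 1.0)           # Wiener gain from a linear SNR estimate
    g_min = 10.0 ** (-max_atten_db / 20.0)  # gain floor, e.g. -10 dB
    return np.maximum(g, g_min)

# High SNR -> gain near 1; low SNR -> gain clamped at the floor.
g_hi = nr_gain(np.array([100.0]))
g_lo = nr_gain(np.array([0.01]))
```

Raising max_atten_db lowers the floor (more aggressive noise reduction, more risk of musical noise); lowering it does the opposite, which is exactly the tradeoff described above.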
Recently, objective measures have been presented for estimating the amount of musical noise in a given noise-reduced signal, based on the noise-reduced signal itself, and the original noisy signal, the latter being the input to the SC-NR system (cf. e.g. [Uemura et al.; 2012], [Yu & Fingerscheidt; 2012] and [Uemura et al.; 2009]). More specifically, in [Uemura et al.; 2009] it is proposed to compare characteristics of the noisy unprocessed signal with signal characteristics of the noise-reduced signal to determine to which extent musical noise is present in the noise-reduced signal. It is found that the change (the ratio, in fact) of the signal kurtosis is a robust predictor of musical noise. Based on this measure, it is proposed in EP 2 144 233 A2 to adjust the parameters of the noise reduction algorithm (e.g., the maximum attenuation) to reduce the amount of musical noise (at the price of reduced noise reduction).
EP 2 144 233 A2 describes a noise suppression estimation device that calculates a noise index value, which varies according to kurtosis of a frequency distribution of magnitude of a sound signal before or after suppression of the noise component, the noise index value indicating a degree of occurrence of musical noise after suppression of the noise component in a frequency domain. A schematic block diagram reflecting such control of a noise reduction algorithm is shown in
WO2008115445A1 deals with speech enhancement based on a psycho-acoustic model capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
WO2009043066A1 deals with a method for enhancing wide-band speech audio signals in the presence of background noise, specifically to low-latency single-channel noise reduction using sub-band processing based on masking properties of the human auditory system. WO0152242A1 deals with a multi-band spectral subtraction scheme comprising a multi-band filter architecture, noise and signal power detection, and gain function for noise reduction. WO9502288A1 deals with properties of human audio perception used to perform spectral and time masking to reduce perceived loudness of noise added to speech signals.
A weakness of the prior art kurtosis-ratio-based musical noise measure is that it treats each and every time-frequency unit identically and does not take into account aspects of the human auditory system (although its basic goal is to predict perceived quality of a noise-reduced signal). More specifically, time-frequency units which are completely masked by other signal components, and which are therefore completely unavailable to the listener, will still contribute to the traditional kurtosis-ratio based measure, leading to erroneous predictions of the musical noise level.
An object of the present application is to provide an improved scheme for identifying and removing artifacts, e.g. musical noise, in an audio processing device.
Objects of the application are achieved by the invention described in the accompanying claims and as described in the following.
In an aspect of the present application, an object of the application is achieved by an audio processing device comprising
The audio processing device further comprises,
An advantage of the present disclosure is to dynamically optimize noise reduction with a view to audibility of artifacts.
The term ‘forward path’ is in the present context taken to mean a forward signal path comprising functional components for providing, propagating and processing an input signal representing an audio signal to an output signal.
The term ‘analysis path’ is in the present context taken to mean an analysis signal path comprising functional components for analysing one or more signals of the forward path and possibly controlling one or more functional components of the forward path based on results of such analysis.
The term ‘artifact’ is in the present context of audio processing taken to mean elements of an audio signal that are introduced by signal processing (digitalization, noise reduction, compression, etc.) that are in general not perceived as natural sound elements, when presented to a listener. Such artifacts are often referred to as musical noise, which is due to random spectral peaks in the resulting signal. Such artifacts sound like short pure tones. Musical noise is e.g. described in [Berouti et al.; 1979], [Cappe; 1994] and [Linhard et al.; 1997].
According to the present disclosure, gain (attenuation) of the processing (e.g. noise reduction) algorithm at the given frequency and time is only modified in case the artifact in question is estimated to be audible as determined from a psychoacoustic or perceptual model, e.g. a masking model or an audibility model. Preferably, the attenuation of the processing (e.g. noise reduction) algorithm is optimized to provide that attenuation of noise at a given frequency and time (k,m) is maximized while keeping artifacts (just) inaudible. Psycho-acoustic models of the human auditory system are e.g. discussed in [Fastl & Zwicker, 2007], cf. e.g. chapter 4 on ‘Masking’, pages 61-110, and chapter 7.5 on ‘Models for Just-Noticeable Variations’, pages 194-202. An audibility model may e.g. be defined in terms of a speech intelligibility measure, e.g. the speech-intelligibility index (SII, standardized as ANSI S3.5-1997)
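The gating described above can be sketched as follows (Python/NumPy). The "masking model" here is a crude stand-in: an artifact is deemed audible if its level exceeds the masker level minus a fixed offset. The 12 dB offset and all function names are illustrative assumptions, not the psychoacoustic model of the disclosure.

```python
import numpy as np

def audible(artifact_level_db, masker_level_db, masking_offset_db=12.0):
    """Crude stand-in for a masking model: an artifact is deemed audible
    if its level exceeds the masker level minus a fixed offset (assumed)."""
    return artifact_level_db > masker_level_db - masking_offset_db

def modify_gain(gain, is_artifact, artifact_db, masker_db):
    """Relax attenuation (gain -> 1) only where an artifact was identified
    AND the masking model deems it audible."""
    relax = is_artifact & audible(artifact_db, masker_db)
    return np.where(relax, 1.0, gain)
```

A bin flagged as an artifact but fully masked keeps its full attenuation, which is the key difference from the prior-art kurtosis-ratio measure.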
In an embodiment, the audio processing device comprises a time to time-frequency conversion unit for converting a time domain signal to a frequency domain signal. In an embodiment, the audio processing device comprises a time-frequency to time conversion unit for converting a frequency domain signal to a time domain signal.
In an embodiment, the time-frequency conversion unit is configured to provide a time-frequency representation of a signal of the forward path in a number of frequency bands k and a number of time instances m, k being a frequency band index and m being a time index, (k, m) thus defining a specific time-frequency bin or unit comprising a complex or real value of the signal corresponding to time instance m and frequency index k.
In general, any available method of identifying and/or reducing a risk of introducing artifacts introduced by a processing algorithm can be used. Examples are methods of identifying gain variance, e.g. fast fluctuations in gains intended for being applied by the processing algorithm. Such methods may include limiting a rate of change of the applied gain, e.g. detecting gains that fluctuate and selectively decreasing the gain in these cases (cf. e.g. EP2463856A1).
In an embodiment, a predetermined criterion regarding values of the artifact identification measure indicating the presence of an artifact in a given TF-bin (k,m) is defined.
In an embodiment, the artifact identification unit is configured to determine artifacts based on a measure of kurtosis for one or more signals of the forward path. Other measures may be used, though. An alternative measure may be based on a detection of modulation spectra. A modulation spectrum may be determined and associated with each TF-bin (k,m) by making a Fourier transformation of a ‘plot’ of magnitude or magnitude squared for TF-units of a specific frequency bin k over a number of consecutive time frames (a sliding window comprising a number of previous time frames, cf. e.g.
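The modulation-spectrum alternative can be sketched as follows (Python/NumPy; an illustrative assumption, not the disclosed implementation): the magnitude-squared trajectory of one frequency bin over a sliding window of frames is Fourier-transformed.

```python
import numpy as np

def modulation_spectrum(tf_mag, k, n_frames):
    """Modulation spectrum for frequency bin k: FFT of the magnitude-squared
    trajectory over the last n_frames time frames (sliding window).
    tf_mag: array of shape (n_bins, n_time) of TF magnitudes."""
    traj = np.abs(tf_mag[k, -n_frames:]) ** 2   # energy trajectory of bin k
    traj = traj - traj.mean()                   # remove DC before the transform
    return np.abs(np.fft.rfft(traj))            # modulation magnitude spectrum
```

A bin whose energy fluctuates rapidly shows energy at high modulation frequencies; short tonal bursts of musical noise would appear as distinct peaks in this spectrum.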
In an embodiment, the artifact identification unit is configured to determine the artifact identification measure by comparing a kurtosis value based on the electric input signal or a signal originating therefrom with a kurtosis value based on the processed signal.
In an embodiment, the artifact identification unit is configured to determine the artifact identification measure based on the kurtosis values Kb(k,m) and Ka(k,m) of the input signal or a signal originating therefrom and of the processed signal, respectively.
In statistics, kurtosis describes a degree of peakedness (or ‘peak steepness’) of a probability function of a random (stochastic) variable X. Several measures of kurtosis K exist, e.g. Pearson's:

K=μ4/σ4
where μ is the mean value of X, μ4 is the fourth moment about the mean, σ is the standard deviation (μ2 is the second moment and equal to the variance Var(X)=σ2), and E[▪] is the expected value operator of ▪.
The n'th order moment μn is defined by
μn=∫0∞XnP(X)dX
where P(X) is the probability density function of X (cf. e.g. [Uemura et al.; 2009]).
In an embodiment, the artifact identification measure AIDM(k,m) comprises a kurtosis ratio Ka(k,m)/Kb(k,m). In an embodiment, the predetermined criterion is defined by the kurtosis ratio Ka(k,m)/Kb(k,m) being larger than or equal to a predefined threshold value AIDMTH.
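The kurtosis-ratio criterion can be sketched as follows (Python/NumPy; the function names and the windowed-sample interface are illustrative assumptions): Pearson's kurtosis is computed from the energy values of one frequency bin over a sliding window, before and after noise reduction, and the ratio is compared with a threshold.

```python
import numpy as np

def kurtosis(x):
    """Pearson's kurtosis K = mu4 / sigma^4 of the samples in x."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()   # second central moment (variance)
    mu4 = ((x - mu) ** 4).mean()      # fourth central moment
    return mu4 / sigma2 ** 2

def artifact_identified(energy_before, energy_after, aidm_th=1.2):
    """Kurtosis-ratio artifact measure AIDM = Ka/Kb for one frequency bin,
    evaluated over a sliding window of energy values."""
    ratio = kurtosis(energy_after) / kurtosis(energy_before)
    return ratio >= aidm_th, ratio
```

Isolated spectral peaks surviving noise reduction make the energy distribution more "peaked", raising Ka relative to Kb and pushing the ratio above the threshold.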
In an embodiment, the audio processing device comprises an SNR unit for dynamically estimating an SNR value based on estimates of the target signal part and/or the noise signal part. In an embodiment, the SNR unit is configured to determine an estimate of a signal to noise ratio.
In an embodiment, the audio processing device comprises a voice activity detector (VAD) configured to indicate whether or not a human voice is present in the input audio signal at a given point in time (e.g. by a VOICE and NO-VOICE indication, respectively).
In an embodiment, the audio processing device, e.g. the artifact identification unit, is configured to perform the analysis of kurtosis during time spans where no voice is present in the electric input signal (as e.g. indicated by a voice activity detector).
The processing algorithm preferably comprises processing steps for enhancing a user's perception of the current electric input signal. In an embodiment, the algorithm comprises a compression algorithm. In a preferred embodiment, the processing algorithm comprises a noise reduction algorithm, e.g. a single-channel noise reduction (SC-NR) algorithm. In an embodiment, the noise reduction algorithm is configured to vary the gain between a minimum value and a maximum value. In an embodiment, the noise reduction algorithm is configured to vary the gain in dependence of the SNR value.
An artifact indication measure can be determined for a given signal before and after the application of a processing algorithm, e.g. a noise reduction algorithm for reducing noise in an audio signal comprising speech, cf. e.g. signals x(n) and z(n) in
In an embodiment, only the magnitude (or magnitude squared) of a TF-bin of a signal of the forward path (e.g. x or z) is considered when determining a resulting gain of the processing algorithm. In an embodiment, the energy of each time-frequency bin is determined as the magnitude squared (|▪|2) of the signal in the TF-bins in question.
In an embodiment, the audio processing device comprises an analogue-to-digital (AD) converter for converting an analogue electric signal representing an acoustic signal to a digital audio signal. In an embodiment, the analogue signal is sampled with a predefined sampling frequency or rate fs, fs being e.g. in the range from 8 kHz to 40 kHz (adapted to the particular needs of the application) to provide digital samples xn (or x[n]) at discrete points in time tn (or n), each audio sample representing the value of the acoustic signal at tn by a predefined number Ns of bits, Ns being e.g. in the range from 1 to 16 bits. In an embodiment, the signals of a particular frequency band (index k) are analyzed over a certain time span (e.g. more than 100 ms or 200 ms), e.g. a particular number Nf of time frames of the signal. In an embodiment, a sampling frequency fs is larger than 16 kHz, e.g. equal to 20 kHz (corresponding to a sample length in time of 1/fs=50 μs). In an embodiment, a number of audio samples are arranged in a time frame. In an embodiment, the number of samples in a time frame is 64 (corresponding to a frame length in time of 3.2 ms) or more. In an embodiment, the number of time frames Nf of the (sliding) window constituting the analyzing time span is larger than 20 such as larger than 50.
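The timing relations quoted above can be verified with a few lines (Python; the specific values are the examples from the text):

```python
fs = 20_000                       # sampling rate in Hz (example from the text)
sample_len_s = 1 / fs             # 50 microseconds per sample
frame_samples = 64
frame_len_s = frame_samples / fs  # 3.2 ms per time frame
nf = 50
window_len_s = nf * frame_len_s   # 50-frame sliding window = 160 ms
```

A 50-frame window thus spans 160 ms, consistent with the "more than 100 ms" analysis span mentioned above.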
In an embodiment, the audio processing device, e.g. the artifact identification unit, is configured to determine a probability density function p(k,m) of the energy of a signal of the forward path. According to the present disclosure, a kurtosis parameter K(k,m) is determined for a probability density function of the energy (magnitude squared, |▪|2) at a given frequency (k) and time (m) of a signal of the forward path of the audio processing device before (Kb(k,m)) and after (Ka(k,m)) the processing algorithm in question, e.g. a noise reduction algorithm. A kurtosis parameter K(k,m) at a particular frequency k and time instance m is based on a number of previous time frames, e.g. corresponding to a sliding window (e.g. the Nf previous time frames relative to a given (e.g. present) time frame, cf. e.g.
An artifact identification measure AIDM(k,m) based on the kurtosis parameters Kb(k,m) and Ka(k,m) of signals of the forward path (e.g. a kurtosis ratio Ka(k,m)/Kb(k,m), or difference Ka(k,m)−Kb(k,m), or other functional relationship between the two) can be defined. A predetermined criterion regarding the value of the artifact identification measure is defined, e.g. Ka(k,m)/Kb(k,m)≧AIDMTH. In an embodiment, AIDMTH≧1.2, e.g. ≧1.5. If the predefined criterion is fulfilled by the artifact identification measure of a given TF-bin, an artifact at that frequency and time is identified.
In an embodiment, the gain control unit is configured to modify a gain of the processing algorithm (e.g. noise reduction algorithm, where an attenuation is reduced), if an artifact is identified. In an embodiment, the modification comprises that a reduction of a gain (i.e. an attenuation) otherwise intended to be applied by the processing algorithm is reduced by a predefined amount ΔG (e.g. eliminated, i.e. no attenuation, gain=1). In an embodiment, the modification comprises that a reduction of gain (an attenuation) otherwise intended to be applied by the processing algorithm is gradually modified in dependence of the size of the artifact identification difference measure. In an embodiment, attenuation is reduced with increasing kurtosis ratio and vice versa (i.e. increased with decreasing kurtosis ratio). In an embodiment, the gain control unit is configured to limit a rate of the modification, e.g. to a value between 0.5 dB/s and 5 dB/s.
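The rate limiting mentioned above can be sketched per frame (Python/NumPy; the function name and the 2 dB/s default are illustrative assumptions within the 0.5-5 dB/s range from the text):

```python
import numpy as np

def rate_limited(gain_prev_db, gain_target_db, frame_s, max_rate_db_per_s=2.0):
    """Step the applied gain toward its target, limiting the rate of
    change to max_rate_db_per_s (the text suggests 0.5-5 dB/s)."""
    max_step = max_rate_db_per_s * frame_s                       # max change per frame
    step = np.clip(gain_target_db - gain_prev_db, -max_step, max_step)
    return gain_prev_db + step
```

With 3.2 ms frames and 2 dB/s, the gain moves at most 0.0064 dB per frame, so a 6 dB relaxation of attenuation is spread over about 3 seconds.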
In an embodiment, the perceptive model comprises a masking model configured to identify to which extent an identified artifact of a given time-frequency unit of the processed signal or a signal derived therefrom is masked by other elements of the current signal.
In an embodiment, the gain control unit is configured to dynamically modify the gain of the noise reduction algorithm otherwise intended to be applied by the algorithm to provide that the amount of noise reduction is always at a maximum level subject to the constraint that no (or a minimum of) musical noise is introduced.
The audio processing device comprises a forward or signal path between an input unit, e.g. an input transducer (e.g. comprising a microphone system and/or direct electric input (e.g. a wireless receiver)) and an output unit, e.g. an output transducer. A signal processing unit is located in the forward path. In an embodiment, the signal processing unit—in addition to the processing algorithm—is adapted to provide a frequency dependent gain according to a user's particular needs. The audio processing device comprises an analysis path comprising functional components for analyzing the input signal, including determining a signal to noise ratio, a kurtosis value, etc. In an embodiment, the analysis path comprises a unit for determining one or more of a level, a modulation, a type of signal, an acoustic feedback estimate, etc. In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the frequency domain. In an embodiment, some or all signal processing of the analysis path and/or the signal path is conducted in the time domain.
In an embodiment, the audio processing device comprises a digital-to-analogue (DA) converter to convert a digital signal to an analogue output signal, e.g. for being presented to a user via an output transducer.
In an embodiment, the time to time-frequency (TF) conversion unit comprises a filter bank for filtering a (time varying) input signal and providing a number of (time varying) output signals each comprising a distinct frequency range of the input signal. In an embodiment, the TF conversion unit comprises a Fourier transformation unit for converting a time variant input signal to a (time variant) signal in the frequency domain. In an embodiment, the frequency range considered by the audio processing device from a minimum frequency fmin to a maximum frequency fmax comprises a part of the typical human audible frequency range from 20 Hz to 20 kHz, e.g. a part of the range from 20 Hz to 12 kHz. In an embodiment, a signal of the forward and/or analysis path of the audio processing device is split into a number NI of frequency bands, where NI is e.g. larger than 5, such as larger than 10, such as larger than 50, such as larger than 100, such as larger than 500, at least some of which are processed individually. In an embodiment, the audio processing device is adapted to process a signal of the forward and/or analysis path in a number NP of different frequency channels (NP≦NI). The frequency channels may be uniform or non-uniform in width (e.g. increasing in width with frequency), overlapping or non-overlapping.
In an embodiment, the audio processing device comprises a frequency analyzing unit configured to determine a power spectrum of a signal of the forward path, the power spectrum being e.g. represented by a power spectral density, PSD(k), k being a frequency index (the total power of the power spectrum at a given point in time m being determined by a sum or integral of PSD(k) over all frequencies at the given point in time). In an embodiment, the frequency analyzing unit is configured to determine a probability density function of the energy (magnitude squared, |▪|2) at a given frequency (k) and time (m) of a signal of the forward path of the audio processing device based on a number of previous time frames, e.g. corresponding to a sliding window (e.g. the Nf previous time frames relative to a given (e.g. present) time frame).
In an embodiment, the audio processing device comprises a number of microphones and a directional unit or beamformer for providing a directional (or omni-directional) signal. Each microphone picks up a separate version of a sound field surrounding the audio processing device and feeds an electric microphone signal to the directional unit. The directional unit forms a resulting output signal as a weighted combination (e.g. a weighted sum) of the electric microphone signals. In an embodiment, the processing algorithm is applied to one or more of the electric microphone signals. Preferably, however, the processing algorithm is applied to the resulting (directional or omni-directional) signal from the directional unit.
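The weighted combination formed by the directional unit can be sketched as follows (Python/NumPy; the function name is an illustrative assumption, and real weights are used for simplicity, whereas a practical beamformer typically uses complex, frequency-dependent weights):

```python
import numpy as np

def directional_output(mic_signals, weights):
    """Beamformer output as a weighted sum of microphone signals.
    mic_signals: shape (n_mics, n_samples); weights: shape (n_mics,)."""
    return weights @ mic_signals  # one weighted-sum output sample per input sample

# Equal weights give an omni-like average of the microphone signals.
```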
In an embodiment, the audio processing device comprises an acoustic (and/or mechanical) feedback suppression system. In an embodiment, the audio processing device further comprises other relevant functionality for the application in question, e.g. compression.
In an embodiment, the audio processing device comprises a listening device, such as a hearing aid, e.g. a hearing instrument, e.g. a hearing instrument adapted for being located at the ear or fully or partially in the ear canal of a user, or a headset, an earphone, an ear protection device or a combination thereof.
In an aspect, use of an audio processing device as described above, in the ‘detailed description of embodiments’ and in the claims, is moreover provided. In an embodiment, use is provided in a system comprising audio distribution, e.g. a system comprising a microphone and a loudspeaker in sufficiently close proximity of each other to cause feedback from the loudspeaker to the microphone during operation by a user. In an embodiment, use is provided in a system comprising one or more hearing instruments, headsets, ear phones, active ear protection systems, etc., e.g. in handsfree telephone systems, teleconferencing systems, public address systems, karaoke systems, classroom amplification systems, etc.
A method:
In an aspect, a method of operating an audio processing device comprising a forward path for applying a processing algorithm to an audio input signal and an analysis path for analyzing signals of the forward path to control the processing algorithm, the method comprising
The method further comprises
It is intended that some or all of the structural features of the audio processing device described above, in the ‘detailed description of embodiments’ or in the claims can be combined with embodiments of the method, when appropriately substituted by a corresponding process and vice versa. Embodiments of the method have the same advantages as the corresponding devices.
In an embodiment, the method further comprises
In an embodiment, the method comprises identifying whether or not a human voice is present in the input audio signal at a given point in time. In an embodiment, the method comprises that the analysis of kurtosis is only performed during time spans where no voice is present in the electric input signal.
In an embodiment, the method provides that the processing algorithm comprises a noise reduction algorithm, e.g. a single-channel noise reduction (SC-NR) algorithm.
In an aspect, a tangible computer-readable medium storing a computer program comprising program code means for causing a data processing system to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims, when said computer program is executed on the data processing system is furthermore provided by the present application. In addition to being stored on a tangible medium such as diskettes, CD-ROM-, DVD-, or hard disk media, or any other machine readable medium, and used when read directly from such tangible media, the computer program can also be transmitted via a transmission medium such as a wired or wireless link or a network, e.g. the Internet, and loaded into a data processing system for being executed at a location different from that of the tangible medium.
In an aspect, a data processing system comprising a processor and program code means for causing the processor to perform at least some (such as a majority or all) of the steps of the method described above, in the ‘detailed description of embodiments’ and in the claims is furthermore provided by the present application.
In a further aspect, an audio processing system comprising an audio processing device as described above, in the ‘detailed description of embodiments’, and in the claims, AND an auxiliary device is moreover provided.
In an embodiment, the system is adapted to establish a communication link between the audio processing device and the auxiliary device to provide that information (e.g. control and status signals, possibly audio signals) can be exchanged or forwarded from one to the other.
In an embodiment, the auxiliary device is or comprises an audio gateway device adapted for receiving a multitude of audio signals (e.g. from an entertainment device, e.g. a TV or a music player, a telephone apparatus, e.g. a mobile telephone or a computer, e.g. a PC) and adapted for selecting and/or combining an appropriate one of the received audio signals (or combination of signals) for transmission to the audio processing device. In an embodiment, the auxiliary device is or comprises a remote control for controlling functionality and operation of the audio processing device(s).
In an embodiment, the auxiliary device is another audio processing device. In an embodiment, the audio processing system comprises two audio processing devices adapted to implement a binaural audio processing system, e.g. a binaural hearing aid system. In a preferred embodiment, information about the control of the processing algorithm (e.g. a noise reduction algorithm) is exchanged between the two audio processing devices (e.g. first and second hearing instruments), e.g. via a specific inter-aural wireless link (IA-WLS in
Further objects of the application are achieved by the embodiments defined in the dependent claims and in the detailed description of the invention.
As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless expressly stated otherwise.
The disclosure will be explained more fully below in connection with a preferred embodiment and with reference to the drawings in which:
The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out.
Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
In the following, the functional units of the signal processing unit (SPU) are described. The analysis filter banks (A-FB) of the signal processing unit (SPU) receive time domain microphone signals IN1, . . . , INp and provide time-frequency representations INF1, . . . , INFp of the p microphone input signals. The p TF-representations of the input signals are fed to a directional (or beamforming) unit (DIR) for providing a single resulting directional or omni-directional signal. The resulting output signal BFS of the DIR unit is a weighted combination (e.g. a weighted sum) of the input signals INF1, . . . , INFp. The processing algorithm, here a noise reduction algorithm (NR), is applied to the resulting (directional or omni-directional) signal BFS. The noise reduced signal NRS is fed to a further processing algorithm (HAG) for applying a gain to signal NRS, e.g. a frequency and/or level dependent gain to compensate for a user's hearing loss and/or to compensate for unwanted sound sources in the sound field of the environment. The output AMS of the further processing algorithm (HAG) is fed to synthesis filter bank (S-FB) for conversion to time-domain signal OUT. The signal processing unit (SPU) further comprises an analysis path comprising a control unit (CNT) for controlling the noise reduction algorithm (NR). The control unit (CNT) comprises the same functional elements shown in
The noise reduction system as described in the listening device of
Kurtosis values K1(k,m) (K1=K(x)) and K2(k,m) (K2=K(z)) of signals of the forward path before and after, respectively, the application of the noise reduction algorithm are determined in units Kurtosis(x) and Kurtosis(z), respectively, for the TF-bins in question. According to the present disclosure, a kurtosis value K1(k,m) or K2(k,m) is determined for a probability density function p of the energy (magnitude squared, |▪|2) at a given frequency (k) and time (m) of the signal in question (x or z, respectively). A kurtosis parameter K(k,m) at a particular frequency k and time instance m is based on a probability density function p(|▪|2) of the energy for a number of previous time frames, e.g. corresponding to a sliding window (e.g. the Nf previous time frames relative to a given (e.g. present) time frame, cf. e.g.
An artifact identification measure AIDM(k,m), e.g. comprising a kurtosis ratio KR(k,m)=K2(k,m)/K1(k,m), is determined in unit Kurtosis ratio based on the determined kurtosis values K1(k,m) and K2(k,m). A predetermined criterion regarding the value of the artifact identification measure is defined, e.g. K2(k,m)/K1(k,m)≥AIDMTH. In an embodiment, AIDMTH≥1.2, e.g. ≥1.5. If the predefined criterion is fulfilled by the artifact identification measure of a given TF-bin, an artifact at that frequency and time is identified.
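The threshold criterion on the kurtosis ratio can be expressed as a small decision function; a minimal sketch, using the example threshold value 1.5 from the text:

```python
def detect_artifact(K1, K2, aidm_th=1.5):
    """Flag a TF bin (k, m) as containing a processing artifact when the
    kurtosis ratio meets the criterion K2/K1 >= aidm_th.

    K1 : kurtosis of the bin before noise reduction, K(x)
    K2 : kurtosis of the same bin after noise reduction, K(z)
    Returns (kurtosis_ratio, artifact_flag).
    """
    if K1 <= 0:
        return float("nan"), False      # ratio undefined: no decision
    kr = K2 / K1                        # KR(k,m) = K2(k,m) / K1(k,m)
    return kr, kr >= aidm_th
```

A kurtosis ratio above 1 indicates that the noise reduction has made the energy distribution of the bin more heavy-tailed, i.e. more impulsive, which is the statistical signature of musical noise.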
Compared to the noise reduction system described in connection with
This improved musical noise predictor can e.g. be used in an online noise-reduction system in a hearing instrument or other audio processing device, where parameters of the noise reduction system are continuously updated based on a musical noise predictor, such that the amount of noise reduction is always at the maximum level subject to the constraint that no musical noise is introduced (or that musical noise is minimized). A noise reduction system applying a band-specific scheme is e.g. described in WO 2005/086536 A1.
In an embodiment, the system is configured to control the gain of a noise reduction algorithm independently in each of the first and second hearing instruments. It may be a problem, however, if artifacts are ‘detected’, and attenuation thus reduced, at one ear but not at the other ear. In that case the gain (at that frequency and time) will increase at the one ear relative to the other ear (because of a less aggressive noise reduction, e.g. by reducing attenuation from 10 dB to 4 dB), which, in some instances, may erroneously be interpreted as spatial cues and thus cause confusion for the user.
In a preferred embodiment, information about the control of the noise reduction is exchanged between the first and second hearing instruments, e.g. via the inter-aural wireless link (IA-WLS), thus allowing a harmonized control of the noise reduction algorithms of the respective hearing instruments. Specifically, information about the control of gains of time-frequency regions for which gains should be increased (attenuation reduced) to reduce the risk of producing audible artifacts is exchanged between the first and second hearing instruments. Preferably, the same attenuation strategy is applied in first and second hearing instruments (at least regarding attenuation in time-frequency regions at risk of producing audible artifacts).
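The binaural harmonization described above can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the strategy shown here (both ears adopt the milder attenuation in flagged TF regions, after exchanging flags over the inter-aural link) is one of several possible ways to avoid interaural gain differences, and all names are hypothetical.

```python
def harmonize_nr_gains(gain_left_db, gain_right_db, flagged):
    """Harmonize noise-reduction gains between the two hearing instruments.

    gain_left_db, gain_right_db : per-TF-bin NR gains in dB (negative
        values attenuate), proposed independently at each ear
    flagged : per-bin booleans marking TF regions where either instrument
        identified a risk of audible artifacts (exchanged via IA-WLS)

    In flagged regions both ears apply the milder attenuation of the two,
    so no interaural gain difference is introduced that could be mistaken
    for a spatial cue; elsewhere each ear keeps its own gain.
    """
    left, right = [], []
    for gl, gr, f in zip(gain_left_db, gain_right_db, flagged):
        if f:
            g = max(gl, gr)             # milder attenuation (larger gain)
            left.append(g)
            right.append(g)
        else:
            left.append(gl)
            right.append(gr)
    return left, right
```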
A masking model or an audibility model applied to an output signal (e.g. the noise-reduced signal, or a further processed signal) is, however, preferably used to classify the artifacts into audible and inaudible artifacts.
Preferably, a perceptive noise reduction scheme as proposed in the present application is implemented. When an artifact identification measure AIDM(k,m) (e.g. a kurtosis ratio) for the particular TF-unit (k,m) is smaller than a threshold value AIDMTH, no risk of introducing artifacts is identified, and a normal operation of the noise reduction algorithm is applied (as described above for
The algorithm ALG is assumed to have a specific form for determining a gain for a given TF bin, when artifacts are not considered (normal mode).
According to the present disclosure, where artifacts are identified using an artifact identification measure AIDM that is calculated on a TF bin basis, AIDM(k,m), a modification ΔGALG of the ‘normal’ gain is proposed when artifacts can be identified.
In an embodiment, ΔGALG is identical for all values of k and m. In an embodiment, ΔGALG is dependent on frequency (index k). In an embodiment, ΔGALG is dependent on the artifact identification measure AIDM(k,m).
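The three embodiments of the gain modification ΔGALG listed above can be sketched in one function. This is a hedged illustration: the selector argument and all default parameter values are assumptions, not values from the disclosure.

```python
def gain_modification(k, aidm, mode="constant",
                      dg_const=1.0, dg_per_band=None, dg_slope=0.5):
    """Gain modification dG_ALG (in dB) applied when an artifact is
    identified in TF bin (k, m), in one of three variants mirroring the
    embodiments in the text; parameter values are illustrative only.

    k            : frequency index of the TF bin
    aidm         : artifact identification measure AIDM(k, m)
    dg_per_band  : per-band steps for the frequency-dependent variant
    """
    if mode == "constant":          # identical for all values of k and m
        return dg_const
    if mode == "frequency":         # dependent on frequency (index k)
        return dg_per_band[k]
    if mode == "aidm":              # dependent on AIDM(k, m),
        return dg_slope * aidm      # e.g. proportional to it
    raise ValueError("unknown mode: " + mode)
```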
In an embodiment, a speech or voice activity detector is configured to determine whether the audio signal (either the full signal and/or specific time-frequency elements of the signal) at a given time contains speech elements. For a noise reduction algorithm, a modification ΔGNR of the ‘normal’ gain (GNR in
Preferably, the rate of change of the modification is limited, the rate of change being defined by ΔGNR and the time interval tF between successive time frames of the signal. In an embodiment, a time frame has a duration of between 0.5 ms and 30 ms, depending on the application in question (and determined by the length in time of one sample (determined by the sampling rate fs) and the number of samples per time frame, e.g. 2^n, n being a positive integer, e.g. larger than or equal to 6). A relatively short time frame enables a system with a relatively low latency (e.g. necessary in applications where a transmitted sound signal is intended to be in synchrony with an image, e.g. a live image, such as in a hearing aid system). Relatively longer time frames result in higher system latency, but may be acceptable in other applications, e.g. in cell phone systems.
In an embodiment, ΔGNR is adaptively determined in dependence on the size of the artifact identification measure (AIDM), e.g. so that ΔGNR is larger, the larger AIDM(k,m) is (e.g. proportional to AIDM).
Typically, the ‘noise only’ periods of time are (by definition) periods of time with a low signal to noise ratio (see indication ‘noisy signal’ in
Preferably, the step size ΔGNR and the frame length in time (tF, determining a time unit from time index m to time index m+1) are configured so that the adaptation rate of the noise reduction gain GNR(k,m), when artifacts are detected, represents a compromise between the risk of creating artifacts in the processed signal of the forward path and the wish to ensure an aggressive noise reduction. In an embodiment, ΔGNR and tF are selected so that the adaptation rate of GNR(k,m) is in the range from 0.5 dB/s to 5 dB/s. An exemplary frame length tF of 5 ms and an adaptation rate AR of 2.5 dB/s leads, for example, to a step size per time unit of ΔGNR=0.0125 dB (ΔGNR/tF=AR).
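The rate-limited adaptation above can be sketched as a per-frame update; a minimal illustration using the example values from the text (tF = 5 ms, AR = 2.5 dB/s, giving a step of 0.0125 dB per frame):

```python
def nr_gain_rampdown(g_db, g_target_db, t_f=0.005, rate_db_per_s=2.5):
    """One frame of rate-limited noise-reduction gain adaptation.

    When artifacts are detected, the NR gain G_NR(k, m) is moved toward a
    milder target gain in steps of dG_NR = AR * t_F per frame, so the
    adaptation rate dG_NR / t_F stays at the configured value.

    g_db        : current NR gain of the bin, in dB (negative attenuates)
    g_target_db : milder target gain, in dB
    """
    step = rate_db_per_s * t_f          # 2.5 dB/s * 5 ms = 0.0125 dB
    if g_db < g_target_db:
        return min(g_db + step, g_target_db)   # ramp up, never overshoot
    return g_db
```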
The invention is defined by the features of the independent claim(s). Preferred embodiments are defined in the dependent claims. Any reference numerals in the claims are intended to be non-limiting for their scope.
Some preferred embodiments have been shown in the foregoing, but it should be stressed that the invention is not limited to these, but may be embodied in other ways within the subject-matter defined in the following claims and equivalents thereof.
Number | Date | Country | Kind |
---|---|---|---|
12197643.5 | Dec 2012 | EP | regional |
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application No. 61/738,407 filed on Dec. 18, 2012. This application also claims priority under 35 U.S.C. §119(a) to Patent Application No. 12197643.5 filed in Europe on Dec. 18, 2012. The entire contents of all the above applications are hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
61738407 | Dec 2012 | US |