The invention relates to the field of binaural hearing systems, and in particular to noise reduction in such hearing systems. It relates to methods and apparatuses according to the opening clauses of the claims.
More specifically, the present invention relates to binaural noise reduction through Wiener filtering for hearing aids that preserves interaural transfer functions (ITFs), and more particularly to an algorithm for preserving the interaural transfer functions of the speech and noise components and thus preserving the interaural time delay (ITD) and interaural level difference (ILD) cues of the speech and noise components.
A hearing device is understood to be a device which is worn in or adjacent to an individual's ear with the object of improving the individual's acoustical perception. Such improvement may also consist in barring acoustic signals from being perceived, in the sense of hearing protection for the individual. If the hearing device is tailored so as to improve the perception of a hearing-impaired individual towards the hearing perception of a “standard” individual, we speak of a hearing-aid device. A hearing-aid device is also referred to as a hearing aid. With respect to the application area, a hearing device may be applied behind the ear, in the ear, completely in the ear canal, or may be implanted.
A hearing system comprises at least one hearing device. In case the hearing system comprises at least one additional device, all devices of the hearing system are operationally connectable within the hearing system. Typically, said additional devices, such as another hearing device, a remote control or a remote microphone, are meant to be worn or carried by said individual.
By audio signals, we understand electrical signals, analogue and/or digital, which represent sound.
An interaural transfer function (ITF) is a function describing how to obtain a signal representing sound originating from one sound source and picked up in or near one ear of an individual, from a signal representing the identical sound (originating from the identical sound source) picked up in or near the other ear of said individual. An ITF can, e.g., be obtained by dividing data representing said signals picked up in or near said one ear by data representing said signals picked up in or near said other ear. An ITF is actually defined only for one single sound source, but it is nevertheless also used for a mixture of signals originating from two or more sound sources, as long as signals from one of the sources prevail over signals from other sources.
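As an illustration of how an ITF can be obtained from two such signals, the following Python sketch estimates a per-frequency-bin ITF as the least-squares ratio of a left-ear signal to a right-ear signal, averaged over short-time frames. All names (estimate_itf, frame_len, hop) and the framing parameters are illustrative assumptions, not part of the claimed apparatus.

```python
import numpy as np

def estimate_itf(left, right, frame_len=256, hop=128):
    """Estimate an interaural transfer function (left/right) per frequency bin.

    Uses the least-squares estimate E{conj(R) * L} / E{|R|^2}, averaged over
    windowed short-time frames of the two signals.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    window = np.hanning(frame_len)
    num = np.zeros(frame_len // 2 + 1, dtype=complex)  # accumulates conj(R) * L
    den = np.zeros(frame_len // 2 + 1)                 # accumulates |R|^2
    for start in range(0, len(left) - frame_len + 1, hop):
        l_spec = np.fft.rfft(window * left[start:start + frame_len])
        r_spec = np.fft.rfft(window * right[start:start + frame_len])
        num += np.conj(r_spec) * l_spec
        den += np.abs(r_spec) ** 2
    return num / np.maximum(den, 1e-12)  # complex ITF; its magnitude reflects ILD, its phase ITD
```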
By technical “beam-forming”, we understand tailoring the amplification of an electrical signal with respect to an acoustical signal as a function of the direction of arrival (DOA) of the acoustical signal relative to a predetermined spatial direction. Most generically, technical beam-forming is achieved whenever the output signals of two spaced input acoustical-to-electrical converter arrangements are processed to result in a combined output signal. Within the field of binaural hearing systems, we understand by “monaural beam-forming” beam-forming as performed separately at the respective hearing devices. By “binaural beam-forming”, we understand within this field beam-forming which exploits the mutual distance between an individual's ears.
Hearing impaired persons localize sounds better without their bilateral hearing aids than with them [2]. In addition, noise reduction algorithms currently used in hearing aids are not designed to preserve localization cues [3]. The inability to correctly localize sounds puts the hearing aid user at a disadvantage. The sooner the user can localize a speech signal, the sooner the user can begin to exploit visual cues. Generally, visual cues lead to large improvements in intelligibility for hearing impaired persons [4]. Furthermore, preserving the spatial separation between the target speech and the interfering signals leads to an improvement in speech understanding [5], [6].
Studies have shown that the spatial separation between the speech and noise sources contributes to an improvement in intelligibility [5], [6]. This is referred to as spatial release from masking. Therefore the benefit of a noise reduction algorithm that preserves localization cues is twofold. First, noise reduction leads to an improvement in intelligibility. Additionally, preserving localization cues preserves the spatial separation of the target speech and noise sources, resulting again in an improvement in intelligibility.
A hearing impaired person wearing a monaural hearing aid on each ear is said to be using bilateral hearing aids. Each monaural hearing aid uses its own microphone inputs to generate an output for its respective ear. No information is shared between the hearing aids. Contrastingly, binaural hearing aids use the microphone inputs from both the left and right hearing aid, typically through a wireless link, to generate an output for the left and right ear.
Interaural time delay (ITD) and interaural level difference (ILD) help listeners localize sounds horizontally [7]. ITD is the time delay in the arrival of the sound signal between the left and right ear, and ILD is the intensity difference between the two ears. ITD cues are more reliable in low frequencies.
On the other hand, ILD is more prominent in high frequencies, since it stems from the scattering of the sound waves by the head.
In [8], the Wiener filter cost function used in a noise reduction procedure has been extended, and includes terms related to ITD and ILD cues of the noise component. The ITD cost function is expressed as the phase difference between the output noise cross-correlation and the input noise cross correlation. The ILD cost function is expressed as the difference between the output noise power ratio and the input noise power ratio. It has been shown that it is possible to preserve the binaural cues of both the speech and noise components without significantly compromising the noise reduction performance. However, iterative optimization techniques are used to compute the filter.
It is desirable to provide for an improved noise reduction in hearing systems.
Several documents are cited throughout the text of this specification. Each of the documents herein (including any manufacturer's specifications, instructions etc.) are hereby incorporated by reference; however, there is no admission that any document cited is indeed prior art of the present invention.
Therefore, one object of the invention is to create a binaural hearing system that does not have the disadvantages mentioned above. In particular, an improved noise reduction shall be provided.
In addition, the respective method of operating a binaural hearing system shall be provided.
Another object of the invention is to provide for a way to achieve an improved speech intelligibility, in particular in noisy environments.
Another object of the invention is to provide for an alternative way of providing localization cues while performing noise reduction in a hearing system.
Further objects emerge from the description and embodiments below.
At least one of these objects is at least partially achieved by apparatuses and methods according to the patent claims.
The binaural hearing system comprises
Through this, an improved noise reduction can be achieved. In particular, this makes it possible to provide localization cues while performing noise reduction. An improved speech intelligibility can be achieved.
The corresponding method of operating a binaural hearing system comprises the steps of
Said ITF means can be a means providing said at least one interaural transfer function.
In one embodiment, said ITF means also allows said at least one interaural transfer function to be obtained, e.g., by calculation.
Said ITF means can be or comprise a storage means comprising predefined, e.g., pre-calculated data describing said at least one interaural transfer function.
Said noise reduction means can be means performing said noise reduction in dependence of said at least one interaural transfer function.
In one embodiment, said at least one interaural transfer function comprises an interaural transfer function of wanted signal components and/or an interaural transfer function of unwanted signal components. It may comprise two or more interaural transfer functions of wanted signal components and/or two or more interaural transfer functions of unwanted signal components. In most practical cases, there will be one source of wanted signals and, accordingly, one interaural transfer function of wanted signal components, and one or two sources of unwanted signals and, accordingly, one or two interaural transfer functions of unwanted signal components.
We occasionally speak of wanted/unwanted signal “components”, in order to emphasize that signals subject to noise reduction are a composition of wanted signals and unwanted signals. The primary aim of said noise reduction is to separate wanted signal components from unwanted signal components.
Typically, said wanted signals are speech signals. Said unwanted signals are often referred to as noise.
In one embodiment, said binaural hearing system comprises
In one embodiment, said first adaptive filtering unit is a first adaptive filter, and said second adaptive filtering unit is a second adaptive filter.
Said ITF means can also be referred to as an ITF unit.
In one embodiment, each input transducer unit comprises at least one input transducer. Input transducers are usually acoustic-to-electric converters, e.g., microphones.
In one embodiment, said binaural hearing system comprises a first and a second hearing device, each comprising an input transducer belonging to said first and second input transducer unit, respectively.
An input transducer unit may comprise a remote input transducer such as a remote microphone.
Typically, said first and said second input transducer units each comprise at least one input transducer that is worn in or near the left and right ear, respectively, of an individual using said binaural hearing system.
In one embodiment, said filtering in said first and second adaptive filtering units depends in essentially the same way on said at least one interaural transfer function. More particularly, the optimization functions of said first and second adaptive filtering units are identical, i.e. have the same form. (Note that differences between the filtering and the filtering coefficients in said first and second adaptive filtering units are due to the assignment of different audio signals to the inputs of said first and second adaptive filtering units, respectively.)
In one embodiment, said first and second adaptive filtering units each have a set of filtering coefficients, which depend on said at least one interaural transfer function.
We refer to filtering coefficients of an adaptive filter as coefficients (or terms), which influence the way the adaptive filter filters the signals inputted to the filter.
In one embodiment, said first and second adaptive filtering units each have an optimization function depending on said at least one interaural transfer function. For Wiener filters, said optimization function is typically referred to as “cost function”. In case of a constrained optimization, the functional expression describing the constraint is—in the framework of the present application—considered to be comprised in said optimization function.
In one embodiment, said binaural hearing system comprises
In one embodiment, said first and second output transducer units are each comprised in one device of said binaural hearing system, in particular in a hearing device.
In one embodiment, said first and second output transducer units are located in or near the left and the right ear, respectively, of said individual during normal operation of said binaural hearing system.
Typically, such output transducer units are embodied as loudspeakers, also referred to as receivers.
In one embodiment, said first and second adaptive filtering units each have an optimization function comprising at least one term describing at least one desired interaural transfer function, such as to aim at outputting audio signal components from said first and second adaptive filtering units, respectively, which are related to each other as described by said at least one desired interaural transfer function. In particular:
In one embodiment, said first and second adaptive filtering units each have an optimization function comprising a first term describing a desired interaural transfer function for wanted signal components and a second term describing a desired interaural transfer function for unwanted signal components, such as to aim at realizing that a transfer function describing the relation between wanted audio signal components outputted from said first and second adaptive filtering units, respectively, corresponds to said desired interaural transfer function for wanted signal components, and at realizing that a transfer function describing the relation between unwanted audio signal components outputted from said first and second adaptive filtering units, respectively, corresponds to said desired interaural transfer function for unwanted signal components.
Of course, as indicated above, there may be additional terms in said optimization function, for further wanted and/or (more likely) unwanted signal components.
In one embodiment, said ITF means comprises a first and a second input, for obtaining an interaural transfer function from audio signals inputted to said first and second inputs, wherein said first and second inputs are operationally connected to said first and second input transducer unit, respectively.
In this embodiment, it is possible to preserve at least one ITF. Through this, the localization cues in the filtered signals are similar to or even at least approximately the same as the localization cues in the unfiltered signals.
It is also possible to use other ITFs. This allows to virtually locate sources of sound. E.g., instead of preserving the ITF for unwanted signal components, an ITF corresponding to a source from sideways behind the hearing system user's head can be used, which can lead to an enhanced intelligibility, in particular if the actual source of noise is located in a direction close to the direction where the source of wanted signals is located, which is usually expected to be in direction of said user's nose.
In one embodiment, said binaural hearing system comprises at least one detecting unit operationally connected to at least one of said first and second input transducer units, and having an output operationally connected to said control input of at least one of said first and second adaptive filters, for deciding whether audio signals received from said at least one of said input transducer units are considered wanted signals or unwanted signals.
Said detecting unit can comprise a voice activity detector.
Said detecting unit may be based on at least one of frequency spectrum analysis, a directional analysis, e.g., as a localizer does, or classification, also referred to as acoustic scene analysis.
In case that said first and second adaptive filtering units each have an optimization function comprising a first term describing a desired interaural transfer function for wanted signal components and a second term describing a desired interaural transfer function for unwanted signal components, this embodiment provides a good way to allow to assign said obtained interaural transfer function to either said first or said second term.
In one embodiment, said first and second adaptive filtering units comprise at least one Wiener filter each, in particular multichannel Wiener filters.
It is also possible to use other types of filters. E.g., filters based on blind source separation (BSS) can be used.
In general, linear filters are preferably used. Compared to other types of filters, they have the advantage of providing good results at relatively low computational cost. Instead of implementing at least one desired ITF in a filter's optimization function, it is also possible to perform a constrained optimization. Said constraint can in this case aim at accomplishing that a relation between audio signals output from said first and second filtering units corresponds to a desired interaural transfer function.
In one embodiment, said noise reduction means comprises two binaural Wiener filters each having a cost function comprising at least one term describing a desired interaural transfer function, in particular wherein said at least one interaural transfer function provided by said ITF means is assigned to said at least one term.
In one embodiment, said binaural hearing system comprises
This can be valuable, in particular when the bandwidth for transmitting data from said sending unit to said receiving unit is limited, in particular when the bandwidth allows transmission of one audio signal stream, but not of two audio signal streams in the desired quality (defined, e.g., by bit-depth and sampling frequency). Said processing in said preprocessor typically combines the two or more audio signal streams input to the preprocessor into a smaller number of audio signal streams, typically into only one audio signal stream. But it is also possible to provide that a preprocessor outputs the same number of audio signal streams as are inputted to the preprocessor. In the latter case, the preprocessor typically performs compression of audio signals.
In one embodiment, said preprocessor performs beamforming, more precisely technical beamforming, typically monaural beamforming, e.g., by delaying one input signal stream with respect to another input signal stream and adding the two, possibly inverting one of the signals, i.e. by the well-known delay-and-add method for beamforming. It is also possible to perform the well-known filter-and-add method by delaying one input signal stream with respect to another input signal stream and frequency-bin-wise adding the two, weighting the frequency bins, and possibly inverting one of the signals.
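The following Python sketch illustrates the delay-and-add variant for two microphones of one device; the microphone spacing d, the speed of sound c, the steering angle and the function name delay_and_add are illustrative assumptions only, not taken from the embodiments.

```python
import numpy as np

def delay_and_add(mic1, mic2, fs, d=0.01, theta_deg=0.0, c=343.0, invert_second=False):
    """Monaural delay-and-add beamformer for two closely spaced microphones.

    mic2 is delayed (fractionally, via a frequency-domain phase shift) so that
    sound arriving from the steering direction theta_deg adds up coherently;
    the two streams are then combined into one output stream.
    """
    mic1 = np.asarray(mic1, dtype=float)
    mic2 = np.asarray(mic2, dtype=float)
    n = len(mic2)
    tau = d * np.sin(np.deg2rad(theta_deg)) / c           # steering delay in seconds
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delayed = np.fft.irfft(np.fft.rfft(mic2) * np.exp(-2j * np.pi * freqs * tau), n)
    if invert_second:                                     # optional inversion of one signal
        delayed = -delayed
    return 0.5 * (mic1 + delayed)                         # single combined output stream
```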
In one embodiment, said preprocessor performs compression, in particular perceptual coding, i.e. a compression making use of the fact that certain components of audio signals are not or hardly perceivable by the human ear and can therefore be omitted. It is also possible to use a compression that makes use of the fact that audio signals picked up by closely-spaced input transducers are very similar. Components in said audio signals that are identical or practically identical can be omitted in one of the preprocessed audio signals. And components that can be derived from one preprocessed audio signal stream also need not be comprised in another preprocessed audio signal stream. Said closely-spaced input transducers can comprise input transducers comprised in the same device of the binaural hearing system.
In one embodiment, said preprocessor is, at least in part, comprised in said noise reduction means. It is possible to use intermediate results of said noise reduction means or audio signals derived therefrom, as preprocessed audio signals.
Said communication link is typically a wireless communication link, but can also be a wire-bound or other communication link, e.g., one making use of skin conduction.
Said first and/or second device of said binaural hearing system can be, e.g., a hearing device, a remote control, a wearable processing unit, or a remote microphone unit.
In one embodiment, said binaural hearing system comprises
As pointed out before, said communication links are typically wireless communication links, but can also be other communication links.
In one embodiment, said ITF means is comprised in one device of said binaural hearing system, and said at least one interaural transfer function provided by said ITF means, or a portion thereof, is transmitted to another device of said binaural hearing system.
In another embodiment, said ITF means comprises two sub-units comprised in different devices of said binaural hearing system, each providing at least one interaural transfer function. This can render a transmission of at least one interaural transfer function from one device of said binaural hearing system to another device of said binaural hearing system superfluous.
Said noise reduction means and said ITF means and said preprocessor and said detecting unit are typically implemented in at least one processor, typically a programmable processor, in particular a signal processor, usually a digital signal processor (DSP). Their functions can be realized in one such processor, but typically they will be distributed over at least two such processors.
In one embodiment, said noise reduction means are or comprise two binaural Wiener filters, each having a cost function J(W) as follows
wherein the meaning of all the variables is explained in the Examples I to III in the Detailed Description of the Invention below.
A great advantage of this cost function is that its minimum can be derived analytically, and the corresponding optimum filtering coefficients W can be obtained from measurable data. In the formulae depicted after equation (1) in the Detailed Description of the Invention (Example III, section B), said optimum filtering coefficients W are explicitly given.
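For orientation, one plausible form of such a cost function, consistent with the terms described below in Example III (a speech distortion term, a residual noise term weighted by μ, and two ITF terms weighted by α and β; the expression actually used is the one depicted in the figures, and the symbols are those defined in Examples I and II), is the following sketch:

$$J(W) = \varepsilon\{|X_{L1} - W_L^H X|^2\} + \varepsilon\{|X_{R1} - W_R^H X|^2\} + \mu\left(\varepsilon\{|W_L^H V|^2\} + \varepsilon\{|W_R^H V|^2\}\right) + \alpha\,\varepsilon\{|W_L^H X - \mathrm{ITF}_X^{des}\,W_R^H X|^2\} + \beta\,\varepsilon\{|W_L^H V - \mathrm{ITF}_V^{des}\,W_R^H V|^2\}$$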
In one embodiment of said method of operating a binaural hearing system, said binaural hearing system comprises a first and a second input transducer unit and a first and a second adaptive filtering unit, and said method comprises the steps of
In one embodiment of said method of operating a binaural hearing system, said filtering in said first and second adaptive filtering units depends in essentially the same way on said at least one interaural transfer function.
In one embodiment, said method comprises the steps of
In one embodiment, said method comprises the step of obtaining said at least one interaural transfer function from calculating a relation between said first audio signals or audio signals derived therefrom and said second audio signals or audio signals derived therefrom.
In one embodiment, said method comprises the steps of
In one embodiment, said first and second adaptive filtering units each have an optimization function comprising a first term describing a desired interaural transfer function for wanted signal components and a second term describing a desired interaural transfer function for unwanted signal components, said method comprising the step of
In one embodiment, said first and second adaptive filtering units both perform Wiener filtering.
In one embodiment, said binaural hearing system comprises a first and a second device and a first and a second input transducer unit and an adaptive filtering unit having at least a first and a second audio signal input, said first input transducer unit comprising at least two input transducers, said method comprising the steps of
It has been found that, in many noise reduction systems, wanted signals are subject to relatively low distortion, and for that reason the ITF of wanted signals is usually not severely distorted. But, since it is the task of a noise reduction system to suppress unwanted signal components, the ITF of unwanted signal components is usually relatively strongly distorted by noise reduction algorithms. It has been found that providing unwanted signal components with a well-defined ITF (be it an artificial ITF or an ITF derived from the original signals) can significantly enhance the intelligibility of the noise-reduced signals. The present invention makes it possible to provide wanted and/or unwanted signal components with a well-defined ITF.
The advantages of the methods correspond to the advantages of corresponding apparatuses.
The present invention can solve problems of the related art of binaural cue preservation by preserving the ITFs of the speech and noise component.
In a specific view, the invention is drawn to an algorithm which preserves both the interaural time delay (ITD) and interaural level difference (ILD) of the speech and noise components. This is achieved by preserving the ITFs of wanted signal components (speech component) and unwanted signal components (noise component). Clearly, the interaural transfer function (ITF), which is the ratio between the speech components (noise components) in the microphone signals at the left and right ear, captures all information between the two ears including ITD and ILD cues.
Viewed from a certain angle, the present invention addresses the problem of binaural cue preservation by preserving the ITF. If the algorithm preserves the ITFs of the speech and noise components, then the algorithm preserves the ITD and ILD cues of the speech and noise components.
More particularly the present invention concerns an improvement of the binaural multi-channel Wiener filtering based noise reduction algorithm by extending the underlying cost function to incorporate terms for the interaural transfer functions (ITF) of the speech and noise components, which improvement preserves both the interaural time delay (ITD) and interaural level difference (ILD) of the speech and noise components. Using weights, the emphasis on the preservation of the ITFs can be controlled in addition to the emphasis on noise reduction. Adapting these parameters allows one to preserve the ITFs of the speech and noise component, and therefore ITD and ILD cues, while enhancing the signal-to-noise ratio.
Viewed from a certain point of view, the present invention provides a binaural noise reduction algorithm that allows one to control the ITD and ILD cues.
In a further aspect of the invention, the desired ITFs can be replaced by known ITFs for a specific direction of arrival. Preserving these desired ITFs allows one to change the direction of arrival of the speech and noise sources. Furthermore, an algorithm that intentionally distorts the localization cues of the speech and noise sources to improve the spatial separation of speech and noise could lead to improvements in intelligibility.
Considered under a specific point of view, the present invention provides a binaural Wiener filter based noise reduction procedure improved by incorporating two terms in the cost function that account for the ITFs of the speech and noise components. Using weights, the emphasis on the preservation of the ITF of the speech and noise component can be controlled in addition to the emphasis on noise reduction.
Adapting these parameters allows one to preserve the ITF of the speech and noise component, and therefore ITD and ILD cues, while enhancing the signal-to-noise ratio. Additionally, it has been shown that the algorithm can even shift the noise source to a new location, by using a different desired ITF for the noise source, while maintaining good noise reduction performance.
The present invention is, in a certain aspect, an improvement of the binaural Wiener filter described in [1], where the cost function comprises four terms. The first two terms are present in the monaural speech distortion weighted Wiener filter proposed by [9]. The remaining two terms aim at preserving the ITFs of the speech and noise components. Contrary to the Wiener filter extensions proposed in [1], this algorithm co-designs the right and left filter. In other words, the left and right filter are related to each other in that they have common dependencies.
Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Further preferred embodiments and advantages emerge from the dependent claims and the figures.
Below, the invention is described in more detail by means of examples and the included drawings. The present invention will become more fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only, and thus are not limitative of the present invention, and wherein:
The reference symbols used in the figures and their meaning are summarized in the list of reference symbols. The described embodiments are meant as examples and shall not confine the invention.
The following detailed description of the invention refers to the accompanying drawings. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
In Example I, the system model is introduced. Additionally, the notation used in this application is presented. The ITF is defined in Example II. In Example III, the original speech distortion weighted binaural Wiener filtering cost function is reviewed. Next, the cost function is extended by adding two terms to control the ITFs of the speech and noise components. Performance measures and the experimental setup are presented in Example IV.
$$Y_{Lm}(\omega) = X_{Lm}(\omega) + V_{Lm}(\omega), \quad m = 1, \ldots, M \qquad (1)$$

$$Y_{Rm}(\omega) = X_{Rm}(\omega) + V_{Rm}(\omega), \quad m = 1, \ldots, M \qquad (2)$$
In (1) and (2), XLm(ω) and XRm(ω) represent the speech component in the mth microphone pair. Likewise, VLm(ω) and VRm(ω) represent the noise component of the mth microphone pair. All received microphone signals are used to design the filters, WL(ω) and WR(ω), and to generate an output for the left and right ear, ZL0(ω) and ZR0(ω). ω indicates the frequency domain variable.
The following definitions will be used in the derivation of the Wiener filter extension. First, we define the 2M dimensional signal vector.
$$Y(\omega) = \left[\,Y_{L1}(\omega)\;\cdots\;Y_{LM}(\omega)\;\;Y_{R1}(\omega)\;\cdots\;Y_{RM}(\omega)\,\right]^{T} \qquad (3)$$
As generally known, the letter T as used in equation (3) indicates that the vector (or matrix) is transposed.
In a similar fashion we write X(ω) and V(ω), where Y(ω)=X(ω)+V(ω). Next, we define the 2M-dimensional filters for the left and right hearing aid.
$$W_{L}(\omega) = \left[\,W_{L1}(\omega)\;\cdots\;W_{L,2M}(\omega)\,\right]^{T} \qquad (4)$$

$$W_{R}(\omega) = \left[\,W_{R1}(\omega)\;\cdots\;W_{R,2M}(\omega)\,\right]^{T} \qquad (5)$$
Using (4) and (5), we write the 4M-dimensional stacked filter $W(\omega) = \left[\,W_{L}^{T}(\omega)\;\;W_{R}^{T}(\omega)\,\right]^{T}$ (6).
The outputs of the left and right Wiener filter are written below.
$$Z_{L}(\omega) = W_{L}^{H}(\omega)\,Y(\omega), \qquad Z_{R}(\omega) = W_{R}^{H}(\omega)\,Y(\omega) \qquad (7)$$
As generally known, the letter H as used in equation (7) indicates hermitian transposition.
The outputs of the left and right Wiener filters are the estimates of the speech (or noise) components in the first microphone pair. Nevertheless, the algorithm could be designed to estimate any microphone pair, more precisely, to estimate the speech or noise components in any microphone pair. For clarity, the frequency domain variable, ω, will be omitted throughout the remainder of this application.
In this example we define the desired ITFs of the speech and noise components. The cost function in Example III will incorporate these desired ITFs. This is why they are referred to as desired ITFs.
In order to preserve the ITFs of the speech and noise components, we simply have to set the desired ITFs equal to the actual ITFs. Correspondingly, the localization cues, ITD and ILD cues, of the speech and noise components can be preserved.
Alternatively, any pair of desired ITFs can be chosen. Therefore the perceived location of the speech and noise component can be manipulated.
The ITF is the ratio of the signal in the left ear to the signal in the right ear. The input speech and noise ITFs are written below.
Similarly, the ITFs of the output speech and noise components are,
In order to preserve the binaural cues of the speech and noise components, the original ITFs are selected as the desired ITFs. We assume the original ITFs (8) to be constant and to be estimable, in a least squares sense, using the microphone signals. (In the case of a single noise source, this desired noise ITF is equal to the ratio of the acoustic transfer functions between the noise source and the reference microphone signals. In this case, it can also be shown that preserving the ITF is mathematically equivalent to preserving the phase of the cross-correlation, i.e. the ITD, and preserving the power ratio, i.e. the ILD.)
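For concreteness, plausible forms of the input ITFs (8) and output ITFs (9), assuming the first microphone pair as reference and using the notation of Example I (the exact expressions are those depicted in the figures), read:

$$\mathrm{ITF}_X^{in} = \frac{X_{L1}}{X_{R1}}, \qquad \mathrm{ITF}_V^{in} = \frac{V_{L1}}{V_{R1}}$$

$$\mathrm{ITF}_X^{out} = \frac{W_L^H X}{W_R^H X}, \qquad \mathrm{ITF}_V^{out} = \frac{W_L^H V}{W_R^H V}$$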
As commonly known, the letter ε as used in equation (10) denotes the expectation operator. The index “des” stands for “desired”.
However, any set of HRTFs (head-related transfer functions) can be chosen. Therefore the direction of arrival (more precisely: the apparent direction of arrival) of the speech and noise components can be controlled. For simplicity, the desired ITFs of the speech and noise components are written as a function of the desired angles of the speech and noise components, θX and θV, and of frequency, ω.
HRTFXL(ω;θX) and HRTFXR(ω;θX) are the head-related transfer functions (HRTF) for the speech component of the left and right ear. Similarly, HRTFVL(ω;θV) and HRTFVR(ω;θV) are the HRTFs for the noise component of the left and right ear.
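In other words, the desired ITFs are then presumably the ratios of these HRTFs; a sketch consistent with this description (the precise expressions being those depicted in the figures):

$$\mathrm{ITF}_X^{des}(\omega;\theta_X) = \frac{\mathrm{HRTF}_{XL}(\omega;\theta_X)}{\mathrm{HRTF}_{XR}(\omega;\theta_X)}, \qquad \mathrm{ITF}_V^{des}(\omega;\theta_V) = \frac{\mathrm{HRTF}_{VL}(\omega;\theta_V)}{\mathrm{HRTF}_{VR}(\omega;\theta_V)}$$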
In this application we will address both situations. First we will look at the performance of the algorithms when trying to preserve the original ITFs. Later the possibility of manipulating the ITFs of the speech and noise components will be explored.
In this example we derive the binaural multi-channel Wiener filter that performs noise reduction while preserving the ITFs of the speech and noise components. We begin by looking at the binaural expansion of the speech distortion weighted cost function discussed in [9]. Using the reasoning from Example II, the cost function is manipulated to incorporate two terms used to preserve the ITFs of the speech and noise components. The final cost function contains the original speech distortion weighted terms (cf. [9]) plus two additional terms for the ITFs of the speech and noise components.
A. Original Cost Function
The multi-channel Wiener filter generates a minimum mean square error estimate of the speech component in the first microphone pair [1], [10] (however, the mth microphone pair could also be used). The original binaural cost function is written as,
In [9]-[11] the original cost function is split into two terms. The first term quantifies speech distortion and the second the residual noise. Next, a weight, μ, is added to introduce a trade-off between speech distortion and noise reduction. Analogously, this reasoning can be applied to the binaural cost function in (13). The binaural speech distortion weighted cost function is expressed below.
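Plausible forms of the original cost function (13) and of the binaural speech distortion weighted cost function (14), consistent with the description above and with [9]-[11] (the exact expressions are those depicted in the figures), are:

$$J_{MSE}(W) = \varepsilon\{|X_{L1} - W_L^H Y|^2\} + \varepsilon\{|X_{R1} - W_R^H Y|^2\}$$

$$J_{SDW}(W) = \varepsilon\{|X_{L1} - W_L^H X|^2\} + \varepsilon\{|X_{R1} - W_R^H X|^2\} + \mu\left(\varepsilon\{|W_L^H V|^2\} + \varepsilon\{|W_R^H V|^2\}\right)$$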
B. Cost Function Incorporating ITFs
In order to incorporate the ITFs of the speech and noise components, the speech distortion and residual noise vectors are broken into components that are parallel and perpendicular to the desired ITF vector. Seeing that only the direction of the desired ITF vector is important, whether preserving or manipulating the original ITFs, we can write the desired noise ITF vector as,
The decomposition of the residual noise vector is depicted in
This can be done by putting a positive weight on the perpendicular terms. Therefore our cost function is now
The speech distortion terms in (16) can be rewritten as
A similar step can be taken for the residual noise vector.
Furthermore,
for both vectors perpendicular to
Armed with (17), (19), and (20) and defining new weights, α and β, the cost function, consisting of a speech distortion term, a residual noise term and two ITF terms, is
Using the definition of the cross product, (21) can be written as (18). Next, we take the derivative of (18), set the derivative to zero, and solve for W. Since J(W) is the cost function, the optimum solution for W, i.e., the optimum filter, can be found as a zero of its derivative. The solution, i.e., the optimum filter, is expressed in matrix form below.
This notation allows us to gain some crucial insight into the filter design. Clearly, if there is no correlation between the signals at the right and left ear, the filter design is decoupled. This is logical since there are no cues to preserve. Additionally, if α and β are chosen to be zero, then the left and right filter design becomes independent. And the filters are those from the original binaural speech distortion weighted cost function in (14).
A. Experimental Setup
Two sets of simulations were run. The first set of simulations attempted to show the algorithm's ability to preserve the original ITFs of the speech and noise components. The second set of simulations showed how altering the algorithm's desired ITFs can shift the perceived location of the noise source.
The recordings used in the simulations were made in a reverberant room with T60 = 0.76 s. Two behind-the-ear (BTE) hearing aids were placed on a CORTEX MK2 artificial head. Each hearing aid had two omni-directional microphones. The sound level measured at the center of the dummy head was 70 dB SPL. Speech and noise sources were recorded separately. All recordings were performed at a sampling frequency of 16 kHz. HINT sentences and HINT noise were used for the speech and noise signals [12].
In the simulations both microphone signals from each hearing aid were used, M=2, to estimate the speech component in the first microphone pair. The statistics were calculated offline, and access to a perfect voice activity detection (VAD) algorithm was assumed. An FFT length of 256 was used.
For the first set of simulations the speech source was located in front of the artificial head, 0°, and the noise source was located at 45°. The parameter controlling the ITF of the speech component, α, was varied from 0 to 10 and the parameter controlling the ITF of the noise component, β, was varied from 0 to 100. The parameter governing noise reduction, μ, was held constant at 1.
The same setup was used for the second set of simulations. However, this time the desired noise ITF was not the least squares estimate of the actual noise ITF, but the ITF for a source located at 225°. This ITF was calculated using the HRTFs for a source located at 225°. Again, α, was varied from 0 to 10 and β was varied from 0 to 100. The noise reduction parameter, μ, was held constant at 1.
B. Performance Measures
The purpose of the simulations is to show the effect of the parameters on ITD error, ILD error, SNR improvement, and ITF error. The ITD metric, written below, is the average over frequency bins of the absolute difference between the cosine of the phase of the input cross-correlation and the cosine of the phase of the output cross-correlation.
The second measure, expressed below, assessed the preservation of the ILD cues. The average over frequency bins of the absolute difference of the ILD of the input signals and ILD of the output signals is used.
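Plausible formulations of these two measures, consistent with the descriptions above (assuming that P_LR denotes the cross-correlation between the left and right channels of the considered component, P_L and P_R the corresponding powers, N the number of frequency bins, and that the ILD is expressed in dB; the exact expressions are those depicted in the figures):

$$\Delta\mathrm{ITD} = \frac{1}{N}\sum_{k=1}^{N}\Big|\cos\!\big(\angle P_{LR}^{in}(k)\big) - \cos\!\big(\angle P_{LR}^{out}(k)\big)\Big|$$

$$\Delta\mathrm{ILD} = \frac{1}{N}\sum_{k=1}^{N}\Big|10\log_{10}\frac{P_{L}^{in}(k)}{P_{R}^{in}(k)} - 10\log_{10}\frac{P_{L}^{out}(k)}{P_{R}^{out}(k)}\Big|$$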
P stands for power, and the ILD error is averaged over the N frequency bins. The ITF error corresponds to the ITF terms of the speech and noise components in the cost function.
In order to quantify the noise reduction performance, the speech intelligibility weighted signal-to-noise-ratio, defined in [13], is used.
The weight, wj, emphasizes the importance of the jth ⅓-octave frequency band's overall contribution to intelligibility, and SNRj is the signal-to-noise-ratio of the jth ⅓-octave frequency band. The band definitions and the individual weights of the J frequency bands are given in [14].
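In formula form, a sketch consistent with this description (the exact definition is that of [13]):

$$\mathrm{SNR}_{intellig} = \sum_{j=1}^{J} w_j\,\mathrm{SNR}_j, \qquad \Delta\mathrm{SNR}_{intellig} = \mathrm{SNR}_{intellig}^{out} - \mathrm{SNR}_{intellig}^{in}$$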
C. Results and Discussion
The first set of simulations attempted to show the algorithm's ability to preserve the original ITFs of the speech and noise components.
One should begin by looking at the ITF error of the speech and noise component. Clearly, it can be seen from
In
Looking at the ILD error of the speech component, depicted in
On the other hand, the ILD cues of the noise component are clearly distorted when α and β are both zero. As β is increased, the ILD error of the noise component decreases. The parameter α has little influence on the ILD error of the noise component for β>0. Again a combination of α and β can be found that preserve the ILD cues of the speech and noise components.
Finally, the improvement in speech intelligibility weighted SNR for the left and right ear is shown in
Clearly, regardless of the values of α and β this algorithm performs good noise reduction.
Varying α and β causes some fluctuation in noise reduction performance, but the overall performance remains good. The second set of simulations is designed to show how altering the algorithm's desired ITFs can shift the perceived location of a source. In this case we focus on shifting the noise source from its original location at 45° to a new location at 225°. The main performance measure we will use is the value of the ITF terms from the cost function. The ITF error is plotted in
Additionally, by looking at
Clearly, we have shown that for the correct choice of parameters it is possible to preserve the current acoustical situation. It is even possible to alter the current acoustical situation to a more favourable one by moving noise sources. A further aspect of the present invention is the automatic selection of the parameters as a function of the current acoustical situation. Yet another aspect of the present invention is to choose α, β, and μ to be frequency dependent. These parameters can be chosen as a function of the speech and noise power in each frequency bin. It does not make sense to try to preserve the ITF of a component in a frequency bin where that component is not present. Conversely, it would be beneficial to make sure the ITF of the component is preserved when a frequency bin contains a large amount of that component. This will lead to better preservation of the localization cues and help reduce the interdependencies among the parameters.
After the more mathematically and algorithmically oriented aspects of the invention have now been described in great detail, in the following, some embodiments are described in conjunction with block-diagrammatical figures.
Input transducer units 2a,2b receive sound (in form of sound waves), and convert it into audio signals S2a,S2b, which are fed to both filtering units 5a,5b in order to be filtered, so as to reduce noise components and achieve an improved intelligibility.
ITF unit 3 also receives audio signals from input transducer units 2a and 2b and obtains therefrom at least one interaural transfer function 30 (more precisely: data representative of at least one interaural transfer function), which is fed to control inputs 55a and 55b of filtering units 5a and 5b, respectively.
Detecting units 6a,6b, which are, e.g., embodied as voice activity detectors 6a,6b, also each receive audio signals from input transducer units 2a and 2b, and obtain therefrom voice activity signals 60a and 60b, respectively. These signals are fed to control inputs 55a and 55b of filtering units 5a and 5b, respectively.
The optimization functions of filtering units 5a,5b are identical (have the same form), comprising at least one term representing a desired interaural transfer function for wanted signals and at least one term describing a desired interaural transfer function for unwanted signals. Values to be assigned to said terms are received at said control inputs 55a and 55b, respectively.
Accordingly, filtering coefficients of filtering units 5a,5b depend on data received at said control inputs 55a,55b, respectively. If voice activity signals 60a and 60b, respectively, indicate that speech signals, i.e. wanted signals, are currently prevailing, the ITF 30 will be interpreted by filtering units 5a and 5b, respectively, as an ITF of wanted signal components. Accordingly, in the calculation of the filtering coefficients in the filtering units 5a and 5b, newly obtained values will be assigned to terms representing the desired ITF for wanted signal components.
On the other hand, if voice activity signals 60a and 60b, respectively, indicate that noise signals, i.e. unwanted signals, are currently prevailing, the ITF 30 will be interpreted by filtering units 5a and 5b, respectively, as an ITF of unwanted signal components (noise). Accordingly, in the calculation of the filtering coefficients in the filtering units 5a and 5b, newly obtained values will be assigned to terms representing the desired ITF for unwanted signal components.
This allows noise-filtered audio signals S5a,S5b to be generated, in which noise is reduced while the ITF is preserved for both wanted and unwanted signal components, so as to preserve binaural cues.
These signals S5a,S5b are converted by loudspeakers 9a and 9b, respectively, into signals 11a,11b to be perceived by a user 10 of said binaural hearing system 1.
Of course, it is also possible to provide only one voice activity detector instead of two, in which case the control signal produced by this one voice activity detector would be fed to both control inputs 55a,55b.
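The following Python sketch illustrates, under stated assumptions, how such a voice activity indication can steer the assignment of a newly obtained ITF 30 to either the wanted-signal term or the unwanted-signal term of the optimization functions; all names (DesiredITFs, update_desired_itfs, recompute_filters, the smoothing constant) are hypothetical and not taken from the figures.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class DesiredITFs:
    """Desired-ITF terms of the optimization (cost) functions, per frequency bin."""
    speech: np.ndarray  # desired ITF for wanted (speech) signal components
    noise: np.ndarray   # desired ITF for unwanted (noise) signal components

def update_desired_itfs(desired, estimated_itf, speech_prevails, smoothing=0.9):
    """Assign a newly estimated ITF (signal 30) to the speech or the noise term,
    depending on the voice activity indication (signals 60a/60b), with
    recursive smoothing over time."""
    estimated_itf = np.asarray(estimated_itf)
    if speech_prevails:
        desired.speech = smoothing * desired.speech + (1.0 - smoothing) * estimated_itf
    else:
        desired.noise = smoothing * desired.noise + (1.0 - smoothing) * estimated_itf
    return desired

# The filtering units 5a, 5b would then recompute their coefficients from the
# updated desired ITFs, e.g. (hypothetical call):
#   coeffs_left, coeffs_right = recompute_filters(desired, signal_statistics,
#                                                 mu=1.0, alpha=1.0, beta=10.0)
```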
In the following
Said audio signals S2a and S2b can comprise more than one audio signal stream, in particular if the input transducer units 2a,2b comprise more than one input transducer each.
The functional units shown in
The following
In order to optimize the use of bandwidth available for data transmission between the hearing devices 2a and 2b, the transmitted audio signals should be particularly useful audio signals. An unprocessed output of an input transducer is usually not as valuable as a signal obtained by combining signals of two or more input transducers.
In order to generate particularly useful audio signals, said preprocessors 4a,4b are used. Such a preprocessor 4a,4b has a reduced number of output audio signal streams with respect to input audio signal streams; in particular, from two or more input audio signal streams, one single output audio signal stream is obtained, referred to as preprocessed audio signals S4a, S4b. Such a preprocessor 4a,4b can implement, e.g., a beamformer or a compression algorithm.
There is also one other way of optimizing the use of the available bandwidth shown in
The second input of ITF units 3a,3b is fed with un-preprocessed audio signals from the input transducer unit 2a,2b comprised in the same hearing device 1a,1b as the corresponding ITF unit 3a,3b. It is possible to use preprocessed audio signals S4a,S4b instead.
In conjunction with the transmission, in particular wireless transmission, of data between different devices of the binaural hearing system, it is actually only necessary to comprise a sender 7 in one device 1a, and a receiver 8 in another device 1b. The other functional units may be comprised in the same or in other devices. E.g., communication from one hearing device to the other hearing device may, in part, take place indirectly, via a third device. Such a third device may, e.g., be worn at a necklace. A third device may be much less restricted with respect to energy consumption and/or to transmission intensity and/or bandwidth. Such a third device may furthermore provide processing power, e.g., for implementing signal processing, e.g., for preprocessing and/or filtering.
It is to be noted that also remote microphones can be an input transducer unit or be comprised therein. As input audio signals for ITF units, nevertheless, it is usually strongly preferred to use audio signals from input transducers located in or near the left and right ear, respectively, of a user. But for the noise reduction aspect, remote microphones can be very useful.
Since only one ITF unit 3 is provided, the ITF data 30 have to be transmitted from hearing device 1b to hearing device 1a. The amount of data per time of the ITF data 30 is in principle the same as the amount of data per time of one audio signal stream. But the ITF usually will not change very fast, since sound sources usually do not move very fast. Therefore, it is possible to save data transmission bandwidth by transmitting not the full ITF data as obtainable from the audio signals; e.g. by transmitting only a portion of said full ITF data. In
E.g., it is possible to compress the ITF data 30. It is also possible to transmit data related to the ITF only when the ITF changes more than by a prescribed amount. It is also possible to use a smaller sampling rate for said data-reduced form 30′ and/or to use a smaller resolution therefor, e.g., by a smaller bit depth.
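A minimal sketch of such a change-triggered transmission of the ITF data is given below (in Python; the threshold, the relative-change criterion and the name maybe_transmit_itf are illustrative assumptions).

```python
import numpy as np

def maybe_transmit_itf(current_itf, last_sent_itf, send, threshold=0.1):
    """Transmit ITF data 30 only when it has changed by more than a prescribed
    relative amount; otherwise save transmission bandwidth.

    current_itf, last_sent_itf: complex ITF values per frequency bin (or None).
    send: callable that actually transmits the data (e.g. over wireless link 78).
    Returns the ITF now considered as 'last sent'.
    """
    current_itf = np.asarray(current_itf)
    if last_sent_itf is None:
        send(current_itf)
        return current_itf
    change = np.max(np.abs(current_itf - last_sent_itf) /
                    np.maximum(np.abs(last_sent_itf), 1e-12))
    if change > threshold:       # prescribed amount exceeded: update the other device
        send(current_itf)        # a compressed or coarser form 30' could be sent instead
        return current_itf
    return last_sent_itf         # no transmission needed this time
```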
Instead of providing both filtering units 5a,5b with said data-reduced form 30′ of the ITF, it is possible to arrange data reducing unit 35 in the location indicated by the dotted rectangle in
Accordingly, instead of having preprocessing units 4a,4b separate from filtering units 5a,5b, the preprocessing units 4a,4b are quasi comprised in filtering units 5a and 5b, respectively.
Just like in the other embodiments and as shown in
Typically, as shown in
Said separate filtering is indicated in
Instead of first adding up said audio signal streams obtained by filtering audio signal streams S2a in filtering sub-units 50a in summing unit 51a and then delaying the resulting audio signals in delay unit 54a, it is also possible to first delay each of said audio signal streams obtained by filtering audio signal streams S2a in filtering sub-units 50a and then summing up the delayed audio signal streams. The latter variant, however, is not shown in
Said first-mentioned variant has the advantage that the audio signals outputted by summing unit 51a can, even without further processing, be used as preprocessed audio signals S4a to be transmitted to device 1b.
Since, after said filtering in sub-units 50a,50b, basically only an adding of audio signals takes place in filtering units 5a,5b before obtaining audio signals S5a,S5b, the particular way of preprocessing according to the embodiment of
Accordingly, this embodiment provides—with respect to embodiments with preprocessors 4a,4b separate from filtering units 5a,5b carrying out separate calculations—an enhanced noise reduction at practically no computing cost, and—with respect to an embodiment, in which all audio signals S2a,S2b are transmitted to the respective other device—a reduced amount of data to be transmitted at nearly the same noise reduction performance.
It is, as shown in
In embodiments as described with respect to
It is to be noted that in an embodiment as shown in
In a particular view onto the invention, the present invention concerns an improvement of the binaural multi-channel Wiener filtering based noise reduction algorithm. The goal of this extension is to preserve both the interaural time delay (ITD) and interaural level difference (ILD) of the speech and noise components. This is done by extending the underlying cost function to incorporate terms for the interaural transfer functions (ITF) of the speech and noise components. Using weights, the emphasis on the preservation of the ITFs can be controlled in addition to the emphasis on noise reduction. Adapting these parameters allows one to preserve the ITFs of the speech and noise component, and therefore ITD and ILD cues, while enhancing the signal-to-noise ratio. Additionally, the desired ITFs can be replaced by known ITFs for a specific direction of arrival. Preserving these desired ITFs allows one to change the direction of arrival of the speech and noise sources.
[1] T. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, “Binaural noise reduction algorithms for hearing aids that preserve interaural time delay cues,” IEEE Trans. on Sig. Proc., vol. 55, no. 4, April 2007.
[2] T. Van den Bogaert, T. Klasen, L. Van Deun, J. Wouters, and M. Moonen, “Horizontal localization with bilateral hearing aids: without is better than with,” J. Acoust. Soc. Amer., vol. 119, no. 1, January 2006.
[3] J. Desloge, W. Rabinowitz, and P. Zurek, “Microphone-Array Hearing Aids with Binaural Output-Part I: Fixed-Processing Systems,” IEEE Trans. Speech Audio Processing, vol. 5, no. 6, pp. 529-542, November 1997.
[4] N. Erber, “Auditory-visual perception of speech,” J. Speech Hearing Dis., vol. 40, pp. 481-492, 1975.
[5] M. L. Hawley, R. Y. Litovsky, and J. F. Culling, “The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer,” J. Acoust. Soc. Amer., vol. 115, no. 2, pp. 833-843, February 2004.
[6] J. Peissig and B. Kollmeier, “Directivity of binaural noise reduction in spatial multiple noise-source arrangements for normal and impaired listeners,” J. Acoust. Soc. Amer., vol. 101, no. 3, pp. 1660-1670, 1997.
[7] W. Hartmann, “How We Localize Sound,” Physics Today, pp. 24-29, November 1999.
[8] S. Doclo, R. Dong, T. Klasen, J. Wouters, S. Haykin, and M. Moonen, “Extension of the multi-channel Wiener filter with ITD and ILD cues for noise reduction in binaural hearing aids,” in Proc. IWAENC, Eindhoven, The Netherlands, September 2005.
[9] A. Spriet, M. Moonen, and J. Wouters, “Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction,” Signal Processing, vol. 84, no. 12, pp. 2367-2387, December 2004.
[10] S. Doclo and M. Moonen, “GSVD-Based Optimal Filtering for Single and Multi-Microphone Speech Enhancement,” IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244, September 2002.
[11] S. Doclo, A. Spriet, J. Wouters, and M. Moonen, Speech Enhancement. Springer-Verlag, 2005, ch. Speech Distortion Weighted Multichannel Wiener Filtering Techniques for Noise Reduction, pp. 199-228.
[12] M. Nilsson, S. Soli, and J. Sullivan, “Development of the hearing in noise test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust. Soc. Amer., vol. 95, pp. 1085-1096, 1994.
[13] J. Greenberg, P. Peterson, and P. M. Zurek, “Intelligibility-weighted measures of speech-to-interference ratio and speech system performance,” J. Acoust. Soc. Amer., vol. 94, no. 5, pp. 3009-3010, November 1993.
[14] Acoustical Society of America, “American National Standard Methods for Calculation of the Speech Intelligibility Index,” in ANSI S3.5-1997, 1997.
List of reference symbols:
1 hearing system, binaural hearing system
1a, 1b device, hearing device, hearing-aid device
2a, 2b input transducer unit
21a, 21b, 22a, 22b input transducer
3 ITF means, ITF unit
3a, 3b ITF unit
30, 30a, 30b ITF, data representative of interaural transfer function
30′ data-reduced ITF
35 data reducing unit
4, 4a, 4b preprocessing unit, preprocessor
5 noise reduction means
5a, 5b filtering unit, adaptive filter, Wiener filter
50a, 50b filtering sub-unit
51a, 51b summing unit
52a, 52b summing unit
54a, 54b delay unit
55a, 55b control input
6a, 6b detecting unit, voice activity detector
60a, 60b control signal, indication, voice activity signal
7, 71a, 72b, 73b sender, sending unit
8, 81b, 82a, 83a receiver, receiving unit
9, 9a, 9b output transducer unit, output transducer, loudspeaker
10 individual, user
11a, 11b signals to be perceived by user
14 source of wanted signals, speaker
15 source of unwanted signals
78 link, communication link, wireless link
S2a, S2b audio signals
S4, S4a, S4b preprocessed audio signals
S5a, S5b noise-filtered audio signals
Number | Date | Country | Kind
0609248.0 | May 2006 | GB | national

Filing Document | Filing Date | Country | Kind | 371(c) Date
PCT/EP2007/054468 | 5/9/2007 | WO | 00 | 5/1/2009