The present invention relates generally to sound processing, and more particularly, to sound processing based on a confidence measure.
Auditory or hearing prostheses include, but are not limited to, hearing aids, middle ear implants, cochlear implants, auditory brainstem implants (ABIs), auditory mid-brain implants, optically stimulating implants, direct acoustic cochlear stimulators, electro-acoustic devices and other devices providing acoustic, mechanical, optical, and/or electrical stimulation to an element of a recipient's ear. Such hearing prostheses receive an electrical input signal and perform processing operations thereon so as to stimulate the recipient's ear. The input is typically obtained from a sound input element, such as a microphone, which receives an acoustic signal and provides the electrical signal as an output. For example, a conventional cochlear implant comprises a sound processor that processes the microphone signal and generates control signals according to a predefined sound processing strategy. These control signals are used by stimulator circuitry to generate the stimulation signals that are delivered to the recipient via an implanted electrode array.
A common complaint of recipients of conventional hearing prostheses is that they have difficulty discerning a target or desired sound from ambient or background noise. At times, this inability to distinguish target and background sounds adversely affects a recipient's ability to understand speech.
Aspects of the present invention are generally directed to providing a noise reduction process. This aspect of the invention implements an insight identified by the inventors: auditory stimulation device recipients tend to deal poorly with competing noise when trying to perceive speech, and speech perception may be enhanced by relatively aggressively removing noise from the signals used to drive the auditory stimulation device. This can be implemented by providing a signal processing system which outputs a noise reduced signal that has a relatively high distortion ratio.
Embodiments of the present invention are described below with reference to the drawings in which:
Certain aspects of the present invention are generally directed to a system and/or method for noise reduction in a sound processing system. In the illustrative method, a sound signal, having both noise and desired components, is received as an electrical representation. At least one estimate of a noise component is generated based thereon. This estimate, referred to herein as a noise component estimate, is an estimate of one noise component of the received sound. Such noise component estimates may be generated from different sounds, different components of a sound, and/or generated using different methods.
The illustrative method in accordance with embodiments of the present invention further includes generating a measure that allows for objective or subjective verification of the accuracy of the noise component estimate. The measure, referred to herein as a confidence measure, allows for the determination of whether the noise component estimate is likely to be reliable. In some embodiments, the noise component estimate is based on one or more assumptions. In certain such embodiments, the confidence measure may provide an indication of the validity of such assumptions. In another embodiment, the confidence measure can indicate whether a noise component of the received sound (or the desired signal component) possesses characteristics which are well suited to the use of a given noise component estimation technique.
As described in greater detail below, the confidence measure is used during sound processing operations to process the received electrical representation. For example, in the noted application of a hearing prosthesis, the output is usable for generating stimulation signals (acoustic, mechanical, electrical) for delivery to a recipient's ear. In certain embodiments, generating an estimate of a noise component may include, for example, generating a signal-to-noise ratio (SNR) estimate of the component.
The confidence measure may be used during processing for a number of different purposes. In certain embodiments, the confidence measure is used in a process that selects one of a plurality of signals for further processing and use in generating stimulation signals. In other embodiments, the confidence measure is used to scale the effect of a noise reduction process based on a noise parameter estimate. In such embodiments, the confidence measure is used as an indication of how well the noise parameter estimate is likely to reflect the actual noise parameter in the electrical representation of the sound. In specific such embodiments, a plurality of noise parameter estimates are generated and the confidence measure is used to choose which of the noise parameter estimates should be used in further processing.
The confidence measure may be generated using a number of different methods. In one embodiment, in a system with multiple input signals, the confidence measure is determined by comparing two or more of the input signals. In one example, a coherence between two input signals can be calculated. A statistical analysis of a signal (or signals) can be used as a basis for calculating a confidence measure.
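As a minimal sketch of the coherence-based option, the function below computes the magnitude-squared coherence of one frequency bin from two microphones, assuming the complex spectral values of that bin are available for several consecutive frames (for example, from an FFT filterbank). The function name and the frame-averaging choice are illustrative assumptions, not taken from the source. Values near 1 indicate that one coherent source dominates the bin, which would support a directional assumption; values near 0 suggest diffuse noise.

```python
def magnitude_squared_coherence(x_frames, y_frames):
    """Magnitude-squared coherence of one frequency bin across frames.

    x_frames, y_frames: complex spectral values of the same bin from two
    microphones over several consecutive frames (hypothetical inputs).
    Returns a value in [0, 1].
    """
    # Cross-power and auto-power spectra, accumulated over the frames.
    s_xy = sum(x * y.conjugate() for x, y in zip(x_frames, y_frames))
    s_xx = sum(abs(x) ** 2 for x in x_frames)
    s_yy = sum(abs(y) ** 2 for y in y_frames)
    if s_xx == 0 or s_yy == 0:
        return 0.0  # no energy in at least one input: no evidence either way
    return abs(s_xy) ** 2 / (s_xx * s_yy)


# Scaled copies are fully coherent (1.0); inputs whose cross terms cancel give 0.0.
x = [1 + 0j, 2j, -1 + 0j]
print(magnitude_squared_coherence(x, [2 * v for v in x]))
print(magnitude_squared_coherence([1, 1], [1, -1]))
```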
Additionally, certain embodiments of the present invention are generally directed to a method of selecting which of a plurality of input signals should be selected for use in generating stimulation signals for delivery to a recipient via electrodes of an implantable electrode array. That is, embodiments of the present invention are directed to a channel selection method in which input signals are selected on the basis of the psychoacoustic importance of each spectral component, and one or more additional signal characteristics. In certain embodiments, the psychoacoustic importance is a speech importance weighting of the spectral component. The additional channel characteristics may be, for example, channel energy, channel amplitude, a noise component estimate of the sound input signal (such as a noise or SNR estimate), and/or a confidence measure associated with a noise component estimate. In certain embodiments, the channel selection method is part of an “n of m” channel selection strategy, or a strategy that selects all channels fulfilling a predetermined channel selection criterion.
Still other aspects of the present invention are generally directed to a system and/or method that generates a signal-to-noise ratio (SNR) estimate on the basis of two or more independently-derived SNR estimates. The generated SNR estimate is used to generate a noise reduced signal. In such embodiments, the independent SNR estimates can be derived either from different signals and/or using different SNR estimation techniques. In certain embodiments, the system includes multiple microphones each of which may generate an independent sound input signal. An SNR estimate can be generated for each sound input signal. In an alternative embodiment, sound input signals may be generated by combining the outputs of different subsets of microphones. If the inputs come from different sources, the same SNR estimation technique may be used for each input. However, if the sound input signals come from the same source, then different SNR techniques are needed to give independent estimates.
The process for generating an SNR estimate from the two or more independently-derived SNR estimates may be performed in a number of ways, such as averaging two or more of the SNR estimates, or choosing one of the multiple SNR estimates based on one or more criteria. For example, the highest or lowest SNR estimate could be selected. The independently-derived SNR estimates may be derived using a conventional method, or using one of the novel SNR estimation techniques described elsewhere herein.
In some embodiments, an SNR estimate may be used in the processing of a frequency channel (typically the frequency channel from which it was derived, but possibly a different frequency channel) to generate an output signal having a reduced noise level. In one embodiment, this may include using the SNR estimate to perform noise reduction in the channel. In another embodiment, the SNR estimate may, additionally or alternatively, be used as a component (or, in some cases, the sole input) of a channel selection algorithm of a cochlear implant. In yet another embodiment, the SNR estimate can, additionally or alternatively, be used to select an input signal to be used in either of the above processes.
In another embodiment there is provided a method which uses a confidence measure in the combination or selection of SNR estimates. In one form, the method uses a single confidence measure to reject a corresponding SNR estimate. Other embodiments may be implemented in which each SNR estimate has an associated confidence measure that is used for combining the SNR estimates, by performing a weighted sum or other combination technique.
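The combination options described above (averaging, picking the highest or lowest estimate, rejecting an estimate whose confidence is too low, and a confidence-weighted sum) can be sketched as follows. The sketch assumes the SNR estimates for one channel are expressed in dB and that confidence measures lie in [0, 1]; the function names and the rejection threshold are hypothetical.

```python
from statistics import mean


def combine_average(snrs_db):
    """Plain average of several independently derived SNR estimates (dB)."""
    return mean(snrs_db)


def combine_extreme(snrs_db, pick="lowest"):
    """Select the lowest (conservative) or highest SNR estimate."""
    return min(snrs_db) if pick == "lowest" else max(snrs_db)


def combine_weighted(snrs_db, confidences, reject_below=0.0):
    """Confidence-weighted sum of SNR estimates.

    Estimates whose confidence is below reject_below (or is zero) are discarded.
    Returns None if no estimate survives.
    """
    kept = [(s, c) for s, c in zip(snrs_db, confidences) if c >= reject_below and c > 0]
    if not kept:
        return None
    total = sum(c for _, c in kept)
    return sum(s * c for s, c in kept) / total


print(combine_average([10.0, 0.0]))                    # 5.0
print(combine_weighted([10.0, 0.0], [0.75, 0.25]))     # 7.5
print(combine_weighted([10.0, 0.0], [0.9, 0.1], 0.5))  # 10.0 (second estimate rejected)
```

With a single confidence threshold and no weighting, `combine_weighted` reduces to the rejection-only form mentioned first.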
In one embodiment, two SNR estimates are generated for each input signal. The two SNR estimates include one assumptions-based SNR estimate and one statistical model-based SNR estimate. Most preferably, the assumptions-based SNR estimate is based on a directional assumption about the noise or signal, and the statistical model-based SNR estimate is non-directional. In some circumstances the statistical model-based estimate will provide a more reliable estimate of SNR (e.g., in the presence of stationary noise), and in other circumstances the assumptions-based SNR estimate will work well (e.g., in circumstances where the assumptions on which the SNR estimate is based hold). A confidence measure for each SNR estimate can be used to determine which SNR estimate should be used in further processing of the input signal. Selecting the SNR estimate with the best confidence measure allows this embodiment to adapt to changing circumstances.
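Per-channel selection of the SNR estimate with the best confidence measure could look like the sketch below. It assumes each estimator supplies one SNR value and one confidence value per channel; the data layout and names are illustrative assumptions.

```python
def select_by_confidence(estimators):
    """For every channel, pick the SNR estimate whose confidence is highest.

    estimators: dict mapping an estimator name to (snr_per_channel, conf_per_channel).
    Returns (chosen_snrs, chosen_names). Ties go to the first estimator listed.
    """
    names = list(estimators)
    n_channels = len(estimators[names[0]][0])
    chosen_snrs, chosen_names = [], []
    for ch in range(n_channels):
        best = max(names, key=lambda name: estimators[name][1][ch])
        chosen_snrs.append(estimators[best][0][ch])
        chosen_names.append(best)
    return chosen_snrs, chosen_names


estimates = {
    "statistical": ([5.0, 12.0], [0.9, 0.2]),  # e.g. reliable in stationary noise
    "directional": ([8.0, 3.0], [0.4, 0.7]),   # e.g. reliable when the spatial assumption holds
}
print(select_by_confidence(estimates))
```

Because the choice is made independently per channel, one channel can follow the statistical estimate while another follows the directional one.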
In another embodiment, an SNR estimate can be used in a channel selection process in a neural stimulation device. In certain embodiments, a so-called “n of m” channel selection strategy is performed. In this process, up to n channels are selected for continued processing from the m available channels on the basis of an SNR estimate.
In some embodiments, a combination of an SNR estimate and one or more additional channel-based criteria, including but not limited to speech importance, amplitude, and masking effects, can be used for channel selection.
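A minimal sketch of an “n of m” selection driven by a combined criterion follows. It assumes a per-channel score formed as a weighted sum of the SNR estimate (dB) and a speech importance weight in [0, 1]; the weights are hypothetical tuning values, and the other criteria named above (amplitude, masking effects, a confidence measure) could be added as further terms. Setting n equal to m and passing a minimum score gives the alternative strategy of selecting every channel that meets a predetermined criterion.

```python
def combined_scores(snr_db, importance, w_snr=1.0, w_importance=10.0):
    """Hypothetical per-channel score: weighted SNR plus weighted speech importance."""
    return [w_snr * s + w_importance * i for s, i in zip(snr_db, importance)]


def select_n_of_m(scores, n, min_score=None):
    """Return the indices (ascending) of up to n channels with the highest scores.

    If min_score is given, channels scoring below it are never selected,
    so fewer than n channels may be returned.
    """
    ranked = sorted(range(len(scores)), key=lambda ch: scores[ch], reverse=True)
    chosen = [ch for ch in ranked[:n] if min_score is None or scores[ch] >= min_score]
    return sorted(chosen)


snr = [12.0, -3.0, 6.0, 20.0]
importance = [0.2, 0.9, 0.5, 0.1]
scores = combined_scores(snr, importance)
print(select_n_of_m(scores, n=2))                  # [0, 3]
print(select_n_of_m(scores, n=4, min_score=10.0))  # [0, 2, 3]
```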
In an additional aspect, there is provided a method of performing a statistical model-based noise estimation. The method uses an analysis window that varies with channel frequency when determining channel statistics. In a preferred form, a short analysis window is used for high-frequency channels and longer analysis windows for lower-frequency channels.
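The frequency-dependent analysis window can be illustrated by mapping each channel's centre frequency to a window length in frames (long for low-frequency channels, short for high-frequency channels) and then taking each channel's minimum over its own window. The log-frequency interpolation and the specific frequencies and frame counts below are hypothetical values chosen only for illustration.

```python
import math


def window_length_frames(center_hz, f_low=250.0, f_high=8000.0,
                         long_frames=96, short_frames=24):
    """Hypothetical mapping from channel centre frequency to analysis window length.

    Frequencies are clamped to [f_low, f_high]; between them the length is
    interpolated on a log-frequency axis from long_frames down to short_frames.
    """
    f = min(max(center_hz, f_low), f_high)
    t = math.log(f / f_low) / math.log(f_high / f_low)  # 0 at f_low, 1 at f_high
    return round(long_frames + t * (short_frames - long_frames))


def channel_minimum(smoothed_power_history, center_hz):
    """Minimum of a channel's smoothed power over its frequency-dependent window."""
    n = window_length_frames(center_hz)
    return min(smoothed_power_history[-n:])


for fc in (250, 1000, 4000, 8000):
    print(fc, window_length_frames(fc))
```

A longer window gives low-frequency channels more stable statistics, while a shorter one lets high-frequency channels follow changes in the noise floor sooner.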
In an additional aspect there is provided an assumptions-based SNR estimation method. This SNR estimation method is based on assumptions about the spatial distribution of certain components of a received sound.
For a received sound signal, one or more spatial fields are defined, e.g., by filtering inputs from an array of omnidirectional microphones or by using directional microphones. The spatial fields can then be designated as either “signal” or “noise”, and SNR estimates calculated. In one embodiment, it is assumed that a desired signal will originate from an area in front of a user, and that noise will originate from behind the user or from areas other than in front of the user. In this case the front and rear spatial components can be used to derive an SNR estimate by dividing the front spatial component by the rear spatial component.
Monaural or binaural implementations are possible. In one binaural implementation, a common “noise” component is used for calculating both the left- and right-side SNR estimates. In this case, each of the left and right channels maintains a separate front-facing signal component.
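For the directional (assumptions-based) estimate, the SNR of a channel is the ratio of front-facing to rear-facing power. The sketch assumes the front and rear spatial components have already been formed (for example, by a beamformer) and are given as per-channel powers. For the binaural case it takes the common noise term as the mean of the two rear-facing powers, which is one hypothetical choice; the text only requires that a single noise component be shared by both sides.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def directional_snr_db(front_power, rear_power):
    """Monaural assumptions-based SNR: front spatial component divided by rear."""
    return 10.0 * math.log10((front_power + EPS) / (rear_power + EPS))


def binaural_snr_db(front_left, front_right, rear_left, rear_right):
    """Binaural variant: separate front (signal) terms, one shared noise term.

    Averaging the two rear powers into the shared noise term is an assumption.
    """
    common_noise = 0.5 * (rear_left + rear_right)
    return (directional_snr_db(front_left, common_noise),
            directional_snr_db(front_right, common_noise))


print(directional_snr_db(10.0, 1.0))
print(binaural_snr_db(4.0, 1.0, 1.0, 3.0))
```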
In another aspect, there is provided a method of compensating for, or correcting, noise estimates in a sound processing system. In this method a frequency dependent compensation factor is generated by applying a calibration sound with equal (or at least known) energy (signal and noise) in each frequency channel. The outputs of the noise estimation process at a plurality of frequencies are analyzed and a correction factor is determined for each channel that, when applied, will cause the noise or SNR estimates to be substantially equal (or correctly proportioned if a non-equal calibration signal is used).
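The calibration step amounts to computing one multiplicative factor per channel from the noise estimator's output for a calibration sound of known per-channel energy, then applying those stored factors to later estimates. The sketch assumes noise estimates in the linear power domain; the function names and the example numbers are illustrative.

```python
def calibration_factors(measured, expected):
    """Per-channel correction factors from a calibration run.

    measured: average noise estimate per channel while the calibration sound plays.
    expected: known calibration energy per channel (all equal for white noise).
    """
    return [e / m for m, e in zip(measured, expected)]


def apply_correction(estimates, factors):
    """Scale each channel's noise estimate by its stored correction factor."""
    return [x * f for x, f in zip(estimates, factors)]


measured = [0.5, 2.0, 1.0]  # hypothetical estimator output under white noise
factors = calibration_factors(measured, expected=[1.0, 1.0, 1.0])
print(factors)                              # [2.0, 0.5, 1.0]
print(apply_correction(measured, factors))  # [1.0, 1.0, 1.0]
```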
In yet another aspect, there is provided a noise reduction process. The noise reduction process includes applying a gain to the signal that at least partially cancels a noise component therein. The gain value applied to the signal is selected from a gain curve that varies with SNR.
In one form the gain function is a binary mask, which applies a gain of zero (0) for signals with an SNR worse than a preset threshold, and a gain of one (1) for SNR better than the threshold. The threshold SNR level is preferably above 0 dB.
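A binary-mask gain of this kind reduces to a threshold comparison per channel. The 3 dB threshold below is a hypothetical value that satisfies the stated preference for a threshold above 0 dB; an SNR exactly at the threshold is passed, which is an arbitrary tie-breaking choice.

```python
def binary_mask_gain(snr_db, threshold_db=3.0):
    """Gain of 1 for channels at or above the SNR threshold, 0 otherwise."""
    return 1.0 if snr_db >= threshold_db else 0.0


def apply_binary_mask(channel_values, channel_snrs_db, threshold_db=3.0):
    """Apply the binary mask to one frame of per-channel values."""
    return [v * binary_mask_gain(s, threshold_db)
            for v, s in zip(channel_values, channel_snrs_db)]


print(apply_binary_mask([0.8, 0.5, 0.3], [10.0, 1.0, 4.0]))  # [0.8, 0.0, 0.3]
```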
Alternatively, a smooth gain curve may be used. Such gain curves can be represented by a parametric Wiener function. In one embodiment the gain curve has an absolute threshold (or −3 dB knee point) at around 5 dB or higher.
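The parametric Wiener formula is not reproduced in this section, so the sketch below assumes the common form G(ξ) = (ξ / (ξ + α))^β, where ξ is the instantaneous SNR on a linear power scale, and treats G as an amplitude gain. Under those assumptions it also solves for the −3 dB knee point, i.e. the SNR at which G falls to 10^(−3/20).

```python
import math


def parametric_wiener_gain(snr_db, alpha, beta):
    """Assumed parametric Wiener gain (xi / (xi + alpha)) ** beta, xi = linear SNR."""
    xi = 10.0 ** (snr_db / 10.0)
    return (xi / (xi + alpha)) ** beta


def knee_snr_db(alpha, beta, drop_db=3.0):
    """Instantaneous SNR (dB) at which the assumed gain is drop_db below unity.

    Solves (xi / (xi + alpha)) ** beta = g for xi, with g treated as an amplitude gain.
    """
    g = 10.0 ** (-drop_db / 20.0)
    r = g ** (1.0 / beta)  # required value of xi / (xi + alpha)
    xi = alpha * r / (1.0 - r)
    return 10.0 * math.log10(xi)


for a, b in ((0.12, 20.0), (1.0, 20.0)):
    print(f"alpha={a}, beta={b}: knee at {knee_snr_db(a, b):.1f} dB")
```

Under this assumed form, α=0.12 with β=20 places the knee at roughly 8.4 dB and α=1 with β=20 at roughly 17.6 dB, both consistent with a knee "at around 5 dB or higher".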
In one embodiment implemented in cochlear implants, a suitable gain curve is one that has any section lying between a parametric Wiener gain function with parameter values α=0.12 and β=20 and a parametric Wiener gain function with parameter values α=1 and β=20, over the instantaneous SNR range of −5 dB to 20 dB. In some cases, a substantial portion of the gain curve in the region between −5 dB and 20 dB instantaneous SNR lies within the parametric Wiener gain functions noted above. A majority, or all, of the gain curve used can lie in the specified region.
If the SNR estimate has an associated confidence measure, the confidence measure can be used to modify the application of gain to the signal. Preferably, if the SNR estimate has a low confidence measure, the amount of attenuation applied is reduced (possibly so that the gain is 1, i.e., the signal is not attenuated), but if the confidence measure related to the SNR estimate is high, the noise reduction is performed.
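One way to let the confidence measure modulate the gain is to interpolate between no attenuation (a gain of 1) and the gain read from the gain curve, using the confidence measure as the interpolation weight. The linear interpolation and the optional confidence floor below are hypothetical design choices; the text only requires that low confidence reduce the attenuation and that high confidence allow full noise reduction.

```python
def confidence_scaled_gain(curve_gain, confidence, min_confidence=0.0):
    """Blend the gain-curve output toward 1 as confidence falls.

    curve_gain: gain in [0, 1] from the gain curve (e.g. binary mask or Wiener curve).
    confidence: confidence measure in [0, 1] for the SNR estimate.
    Below min_confidence the signal is passed unattenuated.
    """
    if confidence < min_confidence:
        return 1.0
    c = min(max(confidence, 0.0), 1.0)
    return 1.0 - c * (1.0 - curve_gain)


print(confidence_scaled_gain(0.2, 1.0))  # full noise reduction (gain of about 0.2)
print(confidence_scaled_gain(0.2, 0.0))  # no attenuation (gain of 1.0)
print(confidence_scaled_gain(0.2, 0.5))  # halfway (gain of about 0.6)
```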
In another aspect, a signal selection process can be performed prior to either noise reduction or channel selection as described above.
In some embodiments a sound processing system can generate multiple signals which could be used for further sound processing, for example, a raw input signal or spatially limited signal generated from one or more raw input signals. In the case where the assumptions underpinning the generation of a spatially limited signal hold, the spatially limited signal is already noise reduced, because it is limited to including sound arriving from a direction which corresponds to an expected position of a wanted sound. In contrast, in certain environments, e.g. places with echoes, the spatially limited signal will include noise. Thus the process includes selecting a signal, from the available signals, for further processing. The selection is preferably based on a confidence measure associated with an SNR estimate related to one or more of the available signals.
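The signal selection step can be sketched per channel: use the spatially limited signal where the confidence measure for the directional assumption is high, and fall back to the raw input elsewhere. The threshold value and the per-channel granularity are hypothetical choices.

```python
def select_signal(raw, spatial, directional_confidence, threshold=0.6):
    """Per-channel choice between a raw input and a spatially limited signal.

    raw, spatial: per-channel values of the two candidate signals for one frame.
    directional_confidence: per-channel confidence that the directional assumption
    holds (expected to be low, e.g., in echoic rooms where noise enters the beam).
    """
    return [s if c >= threshold else r
            for r, s, c in zip(raw, spatial, directional_confidence)]


print(select_signal([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [0.9, 0.1, 0.6]))
# [10.0, 2.0, 30.0]
```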
Illustrative embodiments of the present invention will be described with reference to one type of processing system, a hearing prosthesis referred to as a cochlear implant. A cochlear implant is one of a variety of hearing prostheses that provide electrical stimulation to a recipient's ear. Other such hearing prostheses include, for example, ABIs and AMIs. These and other hearing prostheses that provide electrical stimulation are generally and collectively referred to herein as electrical stimulation hearing prostheses. However, it will be appreciated that embodiments of the present invention are applicable to sound processing systems in general, and thus may be implemented in other hearing prostheses or other sound processing systems.
In a fully functional ear, outer ear 101 comprises an auricle 110 and an ear canal 102. An acoustic pressure or sound wave 103 is collected by auricle 110 and is channeled into and through ear canal 102. Disposed across the distal end of ear canal 102 is the tympanic membrane 104 which vibrates in response to the sound wave 103. This vibration is coupled to oval window or fenestra ovalis 112 through three bones of middle ear 105, collectively referred to as the ossicles 106 and comprising the malleus 108, the incus 109 and the stapes 111. Bones 108, 109 and 111 of middle ear 105 serve to filter and amplify sound wave 103, causing oval window 112 to articulate, or vibrate in response to vibration of tympanic membrane 104. This vibration sets up waves of fluid motion of the perilymph within cochlea 140. Such fluid motion, in turn, activates tiny hair cells (not shown) inside of cochlea 140. Activation of the hair cells causes appropriate nerve impulses to be generated and transferred through the spiral ganglion cells (not shown) and auditory nerve 114 to the brain (also not shown) where they are perceived as sound.
Cochlear implant 100 comprises an external component 142 which is directly or indirectly attached to the body of the recipient, and an internal component 144 which is temporarily or permanently implanted in the recipient. External component 142 typically comprises one or more sound input elements, such as microphone 124 for detecting sound, a sound processing unit 126, a power source (not shown), and an external transmitter unit 128. External transmitter unit 128 comprises an external coil 130 and, preferably, a magnet (not shown) secured directly or indirectly to external coil 130. Sound processing unit 126 processes the output of microphone 124 that is positioned, in the depicted embodiment, adjacent to the auricle 110 of the user. Sound processing unit 126 generates encoded signals, which are provided to external transmitter unit 128 via a cable (not shown).
Internal component 144 comprises an internal receiver unit 132, a stimulator unit 120, and an elongate electrode assembly 118. Internal receiver unit 132 comprises an internal coil 136, and preferably, a magnet (also not shown) fixed relative to the internal coil. Internal receiver unit 132 and stimulator unit 120 are hermetically sealed within a biocompatible housing, sometimes collectively referred to as a stimulator/receiver unit. The internal coil receives power and stimulation data from external coil 130, as noted above. Elongate electrode assembly 118 has a proximal end connected to stimulator unit 120, and a distal end implanted in cochlea 140. Electrode assembly 118 extends from stimulator unit 120 to cochlea 140 through the mastoid bone 119, and is implanted into cochlea 140. In some embodiments, electrode assembly 118 may be implanted at least in basal region 116, and sometimes further. For example, electrode assembly 118 may extend towards the apical end of cochlea 140, referred to as the cochlear apex 134. In certain circumstances, electrode assembly 118 may be inserted into cochlea 140 via a cochleostomy 122. In other circumstances, a cochleostomy may be formed through round window 121, oval window 112, the promontory 123 or through an apical turn 147 of cochlea 140.
Electrode assembly 118 comprises an electrode array 146 including a series of longitudinally aligned and distally extending electrodes 148, disposed along a length thereof. Although electrode array 146 may be disposed on electrode assembly 118, in most practical applications, electrode array 146 is integrated into electrode assembly 118. As such, electrode array 146 is referred to herein as being disposed in electrode assembly 118. Stimulator unit 120 generates stimulation signals which are applied by electrodes 148 to cochlea 140, thereby stimulating auditory nerve 114.
Because the cochlea is tonotopically mapped, that is, partitioned into regions each responsive to stimulus signals in a particular frequency range, each electrode of the implantable electrode array 146 delivers a stimulating signal to a particular region of the cochlea. In the conversion of sound to electrical stimulation, frequencies are allocated to individual electrodes of the electrode assembly. This enables the hearing prosthesis to deliver electrical stimulation to auditory nerve fibers, thereby allowing the brain to perceive hearing sensations resembling natural hearing sensations. In achieving this, processing channels of the sound processing unit 126, that is, specific frequency bands with their associated signal processing paths, are mapped to a set of one or more electrodes to stimulate a desired nerve fiber or nerve region of the cochlea. Such sets of one or more electrodes for use in stimulation are referred to herein as “electrode channels” or “stimulation channels.”
In cochlear implant 100, external coil 130 transmits electrical signals (i.e., power and stimulation data) to internal coil 136 via a radio frequency (RF) link. Internal coil 136 is typically a wire antenna coil comprised of multiple turns of electrically insulated single-strand or multi-strand platinum or gold wire. The electrical insulation of internal coil 136 is provided by a flexible silicone molding (not shown). In use, implantable receiver unit 132 may be positioned in a recess of the temporal bone adjacent auricle 110 of the recipient.
As will be appreciated, embodiments of the present invention may be implemented in a mostly or fully implantable hearing prosthesis, bone conduction device, middle ear implant, hearing aid, or other prosthesis that provides acoustic, mechanical, optical, and/or electrical stimulation to an element of a recipient's ear. Moreover, embodiments of the present invention may also be implemented in voice recognition systems or a sound processing codec used in, for example, telecommunications devices such as mobile telephones and the like.
In embodiments of the present invention, the primary input to input signal generator 202 will be the electrical outputs of one or more microphones that receive an acoustic sound signal. However, other types of transducers, such as telecoils (T-mode input), or other inputs may also be used. In implementations that are used to provide hearing assistance to a recipient of a cochlear implant or other hearing prosthesis, the input signal may be delivered via a separate electronic device such as a telephone, computer, media player, other sound reproduction device, or a receiver adapted to receive data representing sound signals, e.g. via electromagnetic waves. An exemplary input signal generator 202 is described further below with reference to
As shown in
As shown, noise estimator 204 includes three noise component estimators 205. A first noise component estimator 205A uses a statistical model based process to create at least one noise component estimate 213A. A second noise component estimator 205B creates a second noise component estimate 213B on the basis of a set of assumptions about, for example, the directionality of the received sound. Other noise estimates 213C may additionally be generated by noise component estimator 205C.
Noise estimator 204 also includes a confidence determinator 207. Confidence determinator 207 generates at least one confidence measure for one or more of the noise component estimates generated in blocks 205. A confidence measure may be determined for each of the noise estimates 213 or, in some embodiments, a single confidence measure for one of the noise estimates could be generated. A single confidence measure may be used in, for example, a system where only two noise estimates are derived.
The confidence measure(s) are processed, along with the noise estimate and a corresponding input signal. For example, the confidence measure(s) for one or more of the noise estimates can be used to create a combined noise estimate that is used in later processing, as described below. Additionally, a confidence value for one or more noise estimates could be used to select or scale an input signal during later processing. In this case the confidence measure may be viewed as an indication of how well the noise component estimate is likely to reflect the actual noise component of the signal representing the sound. In some embodiments, a plurality of noise component estimates can be made for each signal. In this case the confidence measure can be used to choose which of the noise component estimates is to be used in further processing, or to combine the plurality of noise component estimates into a single, combined noise component estimate for the signal.
The confidence measure is calculated to reflect whether or not a noise component estimate is likely to be reliable. In one embodiment the confidence measure can indicate the extent to which an assumption on which a noise parameter estimate is based holds. In another embodiment, the confidence measure can indicate whether a noise parameter of a sound (or desired signal component) possesses characteristics which are well suited to the use of a given noise parameter estimation technique. In a system with multiple input signals, the confidence measure can be determined by comparing two or more of the input signals. In one example, coherence between two input signals can be calculated. A statistical analysis of a signal (or signals) can be used as a basis for calculating a confidence measure.
Noise estimation block 204 also includes an estimate output stage 209 in which a plurality of noise estimates are processed to determine a final noise estimate 211. Stage 209 generates the final output by, for example, combining the noise component estimates or selecting a preferred noise estimate from the group. Noise estimation within noise estimation block 204 may be performed on a frequency-by-frequency basis, a channel-by-channel basis, or on a more global basis, such as across the entire frequency spectrum of one or a group of input signals.
System 200 also includes a noise compensator 206 that compensates for systematic over- or under-estimation by one or more of the noise estimation processes performed by noise estimator 204. Additionally, system 200 includes a signal-to-noise ratio (SNR) estimation block 208. SNR estimation block 208 operates similarly to block 204, but generates SNR estimates instead of noise estimates. In this regard, SNR estimator 208 includes a plurality of component SNR estimators 215. SNR estimators 215 may operate by processing a signal estimate with a corresponding noise estimate generated by a corresponding noise estimation block 205 described above. Each of the generated SNR estimates 223 may be provided to confidence determinator 217 for an associated confidence measure calculation. The confidence measure for an SNR estimate can be the confidence measure from the noise estimate corresponding to the SNR estimate, or a newly generated measure. As with noise estimator 204, SNR estimator 208 may include an output stage 219 in which a single SNR estimate 221 is generated from the one or more SNR estimates generated in blocks 215.
As shown in
SNR reducer 210 also includes a gain determinator 227 that uses a predefined gain curve to determine a gain level to be applied to an input signal, or to a spectral component of the signal. Optionally, the application of the gain curve can be adjusted by gain scaler 229 based on, for example, a confidence measure corresponding to either an SNR or a noise value of the corresponding signal component. Next, gain stage 231 applies the gain to the input signal to generate a noise reduced output 233.
System 200 also includes a channel selector 212 that is implemented in hearing prostheses, such as cochlear implants, that use different channels to stimulate a recipient. Channel selector 212 processes a plurality of channels, and selects a subset of the channels that are to be used to stimulate the recipient. For example, channel selector 212 selects up to a maximum of N from a possible M channels for stimulation.
The utilized channels may be selected based on a number of different factors. In one embodiment, channels are selected on the basis of an SNR estimate 235A. In other embodiments, SNR estimate 235 may be combined at stage 239 with one or more additional channel criteria, such as a confidence measure 235B, a speech importance function 235C, an amplitude value 235D, or some other channel criteria 235E. In certain embodiments, the combined values may be used in stage 241 for selecting channels. The channel selection process performed at stage 239 may implement an N of M selection strategy, but may more generally be used to select channels without the limitation of always selecting up to a maximum of N out of the available M channels for stimulation. As will be appreciated, channel selector 212 may not be required in a non-nerve stimulation implementation, such as a hearing aid, telecommunications device or other sound processing device.
As such, embodiments of the present invention are directed to a noise cancellation system and method for use in hearing prostheses such as cochlear implants. The system/method uses a plurality of signal-to-noise ratio (SNR) estimates of the incoming signal. These SNR estimates are used either individually or combined (e.g., on a frequency-by-frequency basis, a channel-by-channel basis, or globally) to produce a noise reduced signal for use in a stimulation strategy for the cochlear implant. Additionally, each SNR estimate has an associated confidence measure that may be used in SNR estimate combination or selection, and may additionally be used in a modified stimulation strategy.
In accordance with certain embodiments of the present invention, sound processing system 230 may, for example, form part of a signal processing chain of a Nucleus® cochlear implant, produced by Cochlear Limited. In this illustrative implementation, the outputs from FFT stages 236A, 236B will be summed to provide 22 frequency channels which correspond to the 22 stimulation electrodes of the Nucleus® cochlear implant.
The outputs from the two FFT stages 236A, 236B are passed to a noise estimation stage 238, and a signal-to-noise ratio (SNR) estimator 240. In turn, the SNR estimator 240 will pass an output to a gain stage 242 whose output will be combined with the output of processor 244 prior to downstream channel selection by the channel selector 246. The output of the channel selector 246 can then be provided to a receiver/stimulator of an implanted device e.g. device 132 of
As noted above with reference to
In component noise estimator 250, a minimum statistics algorithm is used to determine the environmental noise power on each channel through a recursive assessment of input signal 252. The statistical model based noise estimator 250 used in this example includes three main sub blocks:
In use, the current signal estimate, SE, that is output from signal estimator 254 is fed back to the input (SE in) of signal estimation block 254 via a unit delay block 260. Similarly, value alpha (α), from block 256, is passed back to the input (Alpha) of signal estimator 254 via a unit delay block 262. Thus, the signal estimate (SE in) and Alpha inputs to signal estimator 254 are from a previous time period.
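The feedback structure described above (the smoothed signal estimate SE and the smoothing factor α both taken from the previous frame) can be sketched as a simplified minimum-statistics tracker for one channel. This is a stand-in modeled on the widely used minimum-statistics approach, not a reproduction of the three sub-blocks of noise estimator 250: the α update rule, the window length and α_max are assumptions, and the systematic bias of the minimum is left uncorrected because the text handles it in a separate bias compensation stage.

```python
from collections import deque


def min_stats_noise(powers, window=8, alpha_max=0.96):
    """Simplified minimum-statistics environmental noise estimate (ENE) for one channel.

    powers: per-frame input power of the channel.
    Returns one ENE value per frame. Bias compensation is intentionally omitted.
    """
    if not powers:
        return []
    se = powers[0]     # signal estimate SE (recursively smoothed power)
    alpha = alpha_max  # smoothing factor fed back from the previous frame
    history = deque(maxlen=window)
    ene_out = []
    for p in powers:
        # First-order recursive smoothing using SE and alpha from the previous frame.
        se = alpha * se + (1.0 - alpha) * p
        history.append(se)
        # The noise floor is the minimum of the smoothed power over the window.
        ene = min(history)
        # Adapt alpha for the next frame: smooth heavily when SE is near the
        # noise floor, and track quickly when SE is far above it.
        ratio = se / ene if ene > 0 else 1.0
        alpha = alpha_max / (1.0 + (ratio - 1.0) ** 2)
        ene_out.append(ene)
    return ene_out


# A short loud burst on a steady noise floor barely moves the estimate,
# while a sustained rise in the floor is followed after roughly one window.
burst = min_stats_noise([1.0] * 10 + [100.0] * 3 + [1.0] * 10)
step = min_stats_noise([1.0] * 10 + [4.0] * 30)
print(max(burst), step[-1])
```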
In certain embodiments of the present invention, the statistics based noise estimation process described in connection with
Following noise estimation, it may be necessary to compensate the noise estimates in some frequency bands to correct for systematic errors. To this end the noise estimator 250 can be followed by a bias compensation block 264 that corresponds to noise compensator 206 described above with reference to
Bias compensation block 264 applies a frequency dependent bias factor to scale the ENE value 266 at each frequency. To calibrate the biasing gain applied by block 264, white noise is provided as input signal 252 to the system 250, and the output ENE values 266 are recorded for each frequency band. The ENE value 266 in each frequency band is then scaled so that, in each band, the estimate equals the average level of the applied white noise. These calibration biasing factors are then stored for future use.
The noise estimate generated using this statistical model based approach can also be used in a subsequent SNR estimation process (such as is described above with reference to SNR estimator 208 of
For each channel or frequency band, a signal-to-noise ratio is able to be calculated from the estimate of environmental noise (ENE) and the input signal (SIG) itself using the equations below:
If the estimate of the noise is assumed to be the actual noise floor, then $\mathrm{ENE} = \mathrm{noise}^2$, and
Accordingly, the SNR can be calculated from the input signal (SIG), which equals $(\mathrm{signal} + \mathrm{noise})^2$, and the ENE, by
where:
SIG is the input signal to the system; and
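The SNR equations themselves are not reproduced here. As a sketch only, assuming the speech and noise are uncorrelated so that SIG ≈ signal² + noise², the speech power can be approximated by SIG − ENE and the per-channel SNR computed as below. The function name `snr_from_ene` and the numerical floor are assumptions.

```python
import numpy as np


def snr_from_ene(sig, ene, floor=1e-12):
    """Per-channel SNR (linear and dB) from input power SIG and noise estimate ENE.

    Assumes speech and noise are uncorrelated, so SIG ~ signal^2 + noise^2
    and the speech power can be approximated by SIG - ENE.
    """
    sig = np.asarray(sig, dtype=float)
    ene = np.asarray(ene, dtype=float)
    speech_power = np.maximum(sig - ene, floor)   # clamp negative differences
    snr = speech_power / np.maximum(ene, floor)
    return snr, 10.0 * np.log10(snr)
```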
Accordingly, using the processing system of
As described above with reference to confidence determinator 207 of
where:
conf is the confidence measure of the associated noise or SNR estimate; SIG dB is the signal level (in dB) during periods that are predominantly noise; ENE dB is the environmental noise estimate (in dB) during periods that are predominantly noise; and k is a predefined constant that can be used to vary system sensitivity by scaling the confidence value.
When the confidence measure (conf) is high (i.e., close to 1), the statistics-based noise estimate is providing a good estimate of the noise level. If conf is low (i.e., close to 0), the statistics-based noise estimate is providing a poor estimate of the noise level.
Such a confidence calculation can be performed on the noise estimate for each frequency band or channel. However, in certain embodiments, the confidence measures for multiple channels can be combined to provide an overall confidence measure for the whole noise or SNR estimation mechanism. The confidence measures of a group of channels may be combined by multiplying the channel confidence values together, or through some other mechanism, such as averaging.
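The two combination rules mentioned above can be sketched as follows, assuming each channel confidence lies in [0, 1]. The function name `combine_confidence` is hypothetical.

```python
import numpy as np


def combine_confidence(channel_conf, method="product"):
    """Combine per-channel confidence values (each in [0, 1]) into one value."""
    conf = np.asarray(channel_conf, dtype=float)
    if method == "product":
        # Any single unreliable channel pulls the overall confidence down.
        return float(np.prod(conf))
    if method == "mean":
        return float(np.mean(conf))
    raise ValueError(f"unknown method: {method!r}")
```

The product is the more conservative choice: one low-confidence channel is enough to make the overall measure low, whereas averaging tolerates isolated unreliable channels.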
The SNR estimate generated from the statistical-model-based method may also have a confidence measure associated with it either by assigning it the confidence measure associated with its corresponding noise estimation, or by calculating a separate value.
As noted above with reference to
Further embodiments of the present invention are described below. The first embodiment, described with reference to
The system 300 receives a sound signal at the omnidirectional microphones 301 of microphone array 391 and generates time domain analog signals 302. Each of the inputs 302 is converted to a digital signal (e.g. using ADCs, such as ADCs 234 from
Embodiments of the present invention are generally described in a manner that will optimize performance when sounds of interest arrive from the front of the recipient, such as in a typical conversation. Accordingly, in this case, the first polar response pattern is a front facing cardioid, which effectively cancels all signal contribution from behind. The second polar response pattern is a rear facing cardioid which effectively cancels all signal contribution from the front. These directional signals are directly used to represent the signal and noise components of a received sound signal. Alternatively, these directional signals may be averaged across multiple FFT frames so as to introduce smoothing over time into the signal and noise estimates.
Each polar response pattern is created from the input signal data 306A, 306B by applying a complex valued frequency domain filter (T,N) (308, 310) to one of the input signals. In this case, only the processed input 306B enters the filters 308, 310. The filtered outputs 312A, 312B are then subtracted from the unfiltered signal 306A of the other microphone.
The filter coefficients T and N of filters 308 and 310, respectively, are chosen to define the sensitivity of the front-facing and rear-facing cardioids. More specifically, the coefficients are chosen such that, when the microphone array is worn by a user, the front-facing cardioid has maximum sensitivity to the forward direction and minimal sensitivity to the rear direction. The coefficients are likewise chosen such that the rear-facing cardioid is the opposite, with maximum sensitivity to the rear direction and minimal sensitivity to the front direction.
Returning to
The output 306B from FFT stage 304B is also passed to a second signal path and filtered by filter N 310, before being subtracted from the output 306A derived from the first microphone 301A. This signal 314B is converted to an energy value in block 318, by squaring the real and imaginary components in each bin and summing them. This generates an output value (cb). Because of the assumptions on which this processing scheme is based, the value cb is assumed to be an estimate of the noise energy in the sound signal received at microphones 301A, 301B. Thus, calculation of the value cb provides an example of the generation of a noise estimate as performed in block 215B of
Next, in block 320, a corresponding signal-to-noise ratio is calculated by dividing cf by cb, which effectively represents the ratio of the forward-facing energy (cf) to the rearward-facing energy (cb) in the received sound signal. Next, in block 322, this signal-to-noise ratio is converted to decibels. Thus, blocks 320, 322 implement the block 208B illustrated in
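The directional signal and noise energies can be sketched per FFT bin as below. This assumes an ideal free-field delay model for a two-microphone endfire array (microphone A in front of B, a hypothetical spacing, speed of sound 343 m/s), in which case the filters reduce to pure phase shifts. Real filters T and N would come from calibration. The names `cardioid_filters` and `directional_snr_db` are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def cardioid_filters(freqs_hz, spacing_m, c=SPEED_OF_SOUND):
    """Free-field delay-model filters T and N for a two-microphone endfire array.

    Microphone A is assumed to be in front of microphone B.
    """
    tau = spacing_m / c                      # inter-microphone travel time
    w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    T = np.exp(-1j * w * tau)   # front-facing output XA - T*XB nulls rear sound
    N = np.exp(+1j * w * tau)   # rear-facing output XA - N*XB nulls front sound
    return T, N


def directional_snr_db(XA, XB, T, N, eps=1e-12):
    """cf, cb and their ratio in dB for one FFT frame (per bin)."""
    front = XA - T * XB
    rear = XA - N * XB
    cf = front.real ** 2 + front.imag ** 2   # forward-facing energy
    cb = rear.real ** 2 + rear.imag ** 2     # rearward-facing energy (noise estimate)
    return 10.0 * np.log10((cf + eps) / (cb + eps)), cf, cb
```

For a source directly in front, the rear-facing output, and hence cb, collapses toward zero, so the ratio is large and positive; a source directly behind gives the opposite result.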
As would be appreciated, the filter coefficients T and N need to be calibrated. The two filters can be calibrated by placing the device, or more specifically microphone array 391, in an appropriate acoustic environment and using a least mean squares (LMS) update procedure to minimize the cardioid output signal energy.
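A per-bin normalized LMS (NLMS) update is one common way to realize such a calibration. The sketch below adapts one complex coefficient per FFT bin to minimize |XA − W·XB|² over frames recorded in the calibration environment (for example, with the source placed behind the array when calibrating T). The step size, the normalization and the name `nlms_calibrate` are assumptions, not details taken from this description.

```python
import numpy as np


def nlms_calibrate(XA_frames, XB_frames, mu=0.5, eps=1e-12):
    """Adapt one complex filter coefficient per bin to minimise |XA - W*XB|^2.

    XA_frames, XB_frames: complex spectra, shape (n_frames, n_bins).
    mu: NLMS step size, 0 < mu < 2 for stability.
    """
    XA = np.asarray(XA_frames, dtype=complex)
    XB = np.asarray(XB_frames, dtype=complex)
    W = np.zeros(XA.shape[1], dtype=complex)
    for a, b in zip(XA, XB):
        e = a - W * b                                        # cardioid output (error)
        W = W + mu * e * np.conj(b) / (np.abs(b) ** 2 + eps)  # normalised update
    return W
```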
Sound processing system 500 of
For the directional noise and SNR estimates described above, a measure of confidence may also be generated. In certain embodiments, the confidence measure may be based on the coherence of the two microphone input signals 302A, 302B that are used to create the directional signals. High coherence (i.e., close to 1) indicates high correlation between the two microphone outputs and that there is strong directional information in the received sound signals. This correlation consequently indicates high confidence in the measured signal-to-noise ratio. On the other hand, low coherence (i.e., close to 0) indicates uncorrelated microphone signals, such as can occur with high reverberation or turbulent airflow. This low coherence indicates low confidence in the measured signal-to-noise ratio. In a two-microphone system, the coherence between the microphone inputs can be calculated as follows.
where Sx and Sy are the complex frequency spectra of the two microphones' signals 302A and 302B used to create cf:
is the coherence.
The auto-power spectra, Pxx and Pyy, are preferably averaged across multiple FFT frames, which introduces smoothing over time into the confidence measure.
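The coherence equation itself is not reproduced in this text. A common definition is the magnitude-squared coherence, Cxy = |Pxy|² / (Pxx·Pyy), where Pxy is the cross-power spectrum; the sketch below assumes that definition and uses a plain mean over frames as the time averaging (a recursive average would also work). The function name is illustrative.

```python
import numpy as np


def magnitude_squared_coherence(Sx_frames, Sy_frames, eps=1e-12):
    """Per-bin coherence between two microphones' complex spectra.

    Sx_frames, Sy_frames: shape (n_frames, n_bins). Averaging over frames
    smooths the auto- and cross-power spectra over time.
    """
    Sx = np.asarray(Sx_frames, dtype=complex)
    Sy = np.asarray(Sy_frames, dtype=complex)
    Pxx = np.mean(np.abs(Sx) ** 2, axis=0)          # auto-power, mic 1
    Pyy = np.mean(np.abs(Sy) ** 2, axis=0)          # auto-power, mic 2
    Pxy = np.mean(Sx * np.conj(Sy), axis=0)         # cross-power
    return np.abs(Pxy) ** 2 / (Pxx * Pyy + eps)
```

Under this definition a single frame always yields a coherence of 1, so the averaging is essential rather than optional.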
As previously noted, a coherence value Cxy close to 1 indicates that the assumption on which the noise and SNR estimates are based, namely that the sound has a discernible spatial characteristic, holds. A low coherence value indicates that no spatial characteristic can be discerned, so the noise or SNR estimates are likely to be inaccurate.
Other embodiments of the present invention may use binaural sound receiving devices and provide binaural outputs. A bilateral cochlear implant is an example of such an arrangement. In such embodiments, a modified signal-to-noise ratio (SNR) estimator is used.
In this binaural implementation, in addition to the microphone arrays, system 600 also includes a two way communication link 612 between the left and right signal processing sub-systems 600A, 600B. In this example, for each microphone array, 601, 602, a front facing cardioid cf is generated as described above for the monaural implementation. However, instead of using a rear facing cardioid cb, a binaural “
In a similar manner to that described in relation to the monaural implementation, the output 614B derived from one of the microphones on the left side is filtered and subtracted from the other left side signal 614A. For example, input 614B is filtered using the LT filter 618 and the output 619 is subtracted from signal 614A derived from the left microphone 601A. The output of this subtraction is then converted to an energy value at 622 in the same manner as described in relation to the last embodiment, to generate Lcf. Similarly, a common “FIG. 8” output is generated to act as a binaural example of an assumptions based noise estimate. This is performed by subtracting the output 616A, derived from the right microphone 602A, from the output 614A of the left microphone 601A. This signal is converted to an energy value in blocks 624 to generate the “
Next, left and right SNR estimates can be generated as follows. The Lcf signal is divided by the “
This binaural signal-to-noise ratio estimation can be particularly effective because the binaural nature of the output signals is maintained. As with the monaural embodiment, a confidence measure for each noise estimate or SNR estimate can be generated using a correlation method similar to that described in relation to the monaural implementation.
As discussed in connection with
In one embodiment, individual noise or SNR estimates and their associated confidence measures can be combined in a variety of different ways, including, but not limited to: (1) selecting the noise or SNR estimate with the best associated confidence measure; (2) scaling each noise or SNR estimate by its normalized confidence measure (normalized such that the sum of all normalized confidence measures is one) and summing the scaled noise or SNR estimates to obtain a combined estimate; or (3) using the noise or SNR estimates from the estimation technique which produced the greatest (or smallest) noise or SNR estimate at a particular frequency. This selection process can be performed on a channel by channel basis, for groups of channels, or globally across all channels.
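The three combination strategies listed above can be sketched per bin or channel as follows. The function name `combine_estimates` is hypothetical, and the weighted option assumes at least one non-zero confidence per bin.

```python
import numpy as np


def combine_estimates(estimates, confidences, method="best"):
    """Combine noise or SNR estimates from several techniques, per bin/channel.

    estimates, confidences: shape (n_techniques, n_bins).
    """
    est = np.asarray(estimates, dtype=float)
    conf = np.asarray(confidences, dtype=float)
    cols = np.arange(est.shape[1])
    if method == "best":        # (1) estimate with the highest confidence
        return est[np.argmax(conf, axis=0), cols]
    if method == "weighted":    # (2) confidence-weighted sum, weights sum to one
        weights = conf / conf.sum(axis=0, keepdims=True)
        return (weights * est).sum(axis=0)
    if method == "max":         # (3) greatest estimate at each frequency
        return est.max(axis=0)
    if method == "min":         # (3) smallest estimate at each frequency
        return est.min(axis=0)
    raise ValueError(f"unknown method: {method!r}")
```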
The resulting noise or SNR estimate 808 for each signal component, along with corresponding confidence measures 810, are output. The outputs 808 and 810 are then used in further processing stages of the sound processing device (e.g. by subsequent noise reducer 210 or by channel selector 212 in a cochlear implant).
In system 1000 of
The SNR estimate 1006 is used to calculate a gain between 0 and 1 for each frequency bin using a masking function in block 1014. In the simplest case, the gain function used is a binary mask. This mask applies a gain of 0 to each frequency bin having a SNR that is less than a threshold, while a gain of 1 is applied to each frequency bin where the SNR is greater than or equal to the threshold. This has the effect of applying no change to frequency bins with good SNR, while excluding from further processing frequency bins with poor SNR.
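A minimal sketch of the binary mask described above, applied to one frame of complex FFT bins. The threshold is left as a parameter (0 dB by default) because the choice of threshold is discussed further below; the function name is illustrative.

```python
import numpy as np


def apply_binary_mask(spectrum, snr_db, threshold_db=0.0):
    """Keep bins whose estimated SNR reaches the threshold; zero the rest."""
    gain = np.where(np.asarray(snr_db, dtype=float) >= threshold_db, 1.0, 0.0)
    return gain * np.asarray(spectrum), gain
```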
It will be appreciated that coherence can be calculated on a per channel basis, and the confidence scaling is also applied on a per channel basis. This allows one channel to have good confidence while another does not. In addition, the confidence measure can be time-averaged to control the responsiveness of the system.
The inventors have determined that improved system performance, in terms of recipients' speech perception, can be obtained in cochlear implants by carefully selecting the gain curve parameters. As such, alternative masking functions are within the scope of the present invention. Previous mathematically defined gain functions have treated errors of including noise and errors of reducing speech as equal. More recent work with psychometrically motivated gain functions has shown that normal-hearing listeners prefer a negative gain function threshold.
This observation was further supported by ideal binary mask studies, which suggest that the best speech performance can be achieved with a gain threshold between 0 and −12 dB.
One prior art approach is to use an ideal binary mask (IdBM), which removes masker-dominated components and retains target-dominated components of a noisy signal. Studies investigating the gain application threshold (GT) have proposed threshold values between −12 dB and 0 dB, or between −20 dB and 5 dB in the special case where the SNR is known. Outside this threshold range, speech perception is conventionally believed to degrade quickly. Because 0 dB is at the edge of the range, a lower threshold of −6 dB has been proposed to allow the greatest room for error in SNR estimation in real-world systems. A subsequent IdBM study used a GT of −6 dB with normal-hearing and hearing-impaired listeners and showed that this significantly improves speech perception. The underlying premise of these noise reduction thresholds is that they remove half or less of the noise on average to produce maximal speech improvement. This has led those skilled in the art to accept, for cochlear implant applications, a gain function with a threshold SNR value of less than 0 dB.
However, the inventors have recognized that this approach of using a binary mask with a negative GT for cochlear implant noise reduction assumes that the GT is the same for normal-hearing listeners and cochlear implant recipients. Moreover, in practice the true SNR is not known, and therefore the IdBM cannot be calculated.
Experiments performed by the inventors using an estimated SNR (as opposed to a known SNR) show that speech perception improves with a noise reduction system using a binary mask with a GT much higher than previously expected. In this respect, the present inventors propose a positive SNR threshold. More specifically, test results showed improvements in speech perception using a binary mask with a GT above 0 dB and up to 15 dB.
The inventors' experimental results show a preference of cochlear implant recipients for a GT above approximately 0 dB, and more preferably above approximately 1 dB and below approximately 5 dB, for stationary white noise, and around 5 dB for 20-talker babble.
Previous mathematically defined gain functions have treated errors of including noise and errors of reducing speech as equal. Accordingly, some prior art proposes that a Wiener function (threshold = 0 dB) is optimal. Such gain functions, used in known cochlear implant noise reduction algorithms, retain signals with positive SNR and apply different levels of attenuation to signals with negative SNR. More recent prior art using psychometrically motivated gain functions has shown that normal-hearing listeners prefer a negative gain function threshold.
A second study performed by the inventors also supported this view. Specifically, it was determined that the most suitable gain function for noise reduction for cochlear implant recipients, with respect to speech perception and quality, differs from the mathematically optimized gain functions, the psychometrically motivated gain functions for normal-hearing listeners, and the cochlear implant gain functions proposed in the prior art.
In this study, a parametric Wiener gain function was used to describe the gain curve instead of the binary mask. The parametric Wiener gain function is described by
where Gw is the gain applied, ζ is the a priori SNR estimate and α and β are the parametric Wiener variables.
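The equation itself is not reproduced above. The sketch below assumes the commonly used parametric Wiener form G_w = (ζ/(ζ+α))^β, with ζ the linear a priori SNR; it is an illustration under that assumption, and the function name is hypothetical.

```python
import numpy as np


def parametric_wiener_gain(zeta_db, alpha, beta):
    """Gain G_w = (zeta / (zeta + alpha)) ** beta for an a priori SNR given in dB.

    The functional form is an assumption (the original equation is not shown).
    """
    zeta = 10.0 ** (np.asarray(zeta_db, dtype=float) / 10.0)  # dB -> linear
    return (zeta / (zeta + alpha)) ** beta
```

With α = β = 1 this reduces to the classical Wiener gain ζ/(ζ+1). Because the base is below 1 for positive ζ and α, increasing either α or β lowers the gain at every SNR, i.e. makes the attenuation more aggressive.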
Recipients selected a wide range of threshold and slope values as their most preferred settings, giving a wide range of gain curve shapes. In continuous stationary white noise, a gain threshold above approximately 0 dB and up to approximately 5 dB produced the best speech perception. In 20-talker babble, a gain threshold of approximately 5 dB produced the best speech perception. Where only one gain function threshold is selected for all noise conditions, these results suggest that a gain threshold of approximately 5 dB would be most suitable.
As will be appreciated, both the threshold value and the slope value play a part in the overall attenuation outcome. However, if a noise reduction method uses an estimate of the signal noise, such as Spectral Subtraction techniques or SNR-based noise reduction techniques, the inventors have determined that improved performance can be obtained for cochlear implant recipients using a gain function having any section that lies between the parametric Wiener gain function with parameter values α=0.12 and β=20 and the parametric Wiener gain function with parameter values α=1 and β=20, over the instantaneous SNR range of −5 to 20 dB.
Because preferred slope and threshold values vary between recipients, it is also useful to compare gain curves by considering the absolute threshold of the gain curve (as distinct from the Wiener gain function “threshold value” set out above). The absolute threshold can be defined as the level at which the output of the system would be half the power of the input signal, which is approximately the −3 dB knee point.
In this regard, the inventors' testing found that the preferred absolute threshold of the gain curve for cochlear implant recipients is at an instantaneous SNR greater than approximately 3 dB but less than approximately 10 dB, and most preferably between approximately 5 dB and approximately 8 dB. However, the knee point could also lie outside this range, for example between approximately 5 dB and approximately 15 dB.
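Under the assumed parametric Wiener form G = (ζ/(ζ+α))^β, the knee point can be found in closed form: setting ζ/(ζ+α) = r, with r = G_knee^(1/β), gives ζ = α·r/(1 − r). Whether "half the power" corresponds to G = 1/√2 (G acting on amplitude) or G = 0.5 (G acting on power) depends on how the gain is applied, so the sketch leaves it as a parameter. The function name `knee_snr_db` is hypothetical.

```python
import math


def knee_snr_db(alpha, beta, gain_at_knee=1.0 / math.sqrt(2.0)):
    """A priori SNR (dB) at which (zeta / (zeta + alpha)) ** beta == gain_at_knee.

    The default treats G as an amplitude gain, so 1/sqrt(2) is the half-power
    point; pass 0.5 if G is taken to act on power.
    """
    r = gain_at_knee ** (1.0 / beta)      # required value of zeta / (zeta + alpha)
    zeta = alpha * r / (1.0 - r)          # solve zeta / (zeta + alpha) = r
    return 10.0 * math.log10(zeta)
```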
Gain curves 1606 and 1608 define the preferred gain curve region proposed in accordance with embodiments of the present invention. Specifically, curve 1606 defines the “low side” of the preferred region of the operation, while curve 1608 defines the “upper side” of the region.
Additionally, rather than the confidence measure directly scaling the gain curve as previously described, the gain of the signal can be scaled using the confidence measure in the dB domain.
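One possible reading of dB-domain scaling is sketched below: the gain expressed in dB is multiplied by the confidence, so a fully trusted estimate keeps its full attenuation while an untrusted one leaves the signal unchanged (0 dB). This interpretation, the −60 dB floor used to keep zero gains finite, and the function name are all assumptions.

```python
import numpy as np


def confidence_scaled_gain(gain, conf, floor_db=-60.0):
    """Scale a gain in the dB domain by a confidence value in [0, 1].

    conf = 1 keeps the gain unchanged; conf = 0 gives unity gain (0 dB).
    Zero gains are clipped to floor_db so that the logarithm is finite.
    """
    gain = np.maximum(np.asarray(gain, dtype=float), 10.0 ** (floor_db / 20.0))
    gain_db = 20.0 * np.log10(gain)
    return 10.0 ** (np.clip(conf, 0.0, 1.0) * gain_db / 20.0)
```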
More generally the inventors have identified that recipients of electrical stimulation hearing prostheses, including, but not limited to cochlear implant recipients, can understand speech with a fraction of the speech content used to stimulate electrodes, but tend to deal poorly with background noise. This principle is applied in the described embodiments by “over” removing noise from input signals 203. Embodiments could be used in a spectral subtraction noise reduction system where over-subtraction could remove more of the noise (in preference to maximizing the retention of the speech signal). Similarly, embodiments can be used in a modulation detection system that uses strong attenuation when noise is detected. Furthermore, a histogram method or a domain subspace method could use this principle in an auditory stimulation device noise reduction method to ‘over’ remove noise.
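Power spectral subtraction with an over-subtraction factor greater than one is a concrete example of "over" removing noise in a spectral subtraction system. The sketch below is a generic illustration of that technique; the factor, the spectral floor and the function name are assumed values, not parameters from this description.

```python
import numpy as np


def over_subtract(noisy_power, noise_power, over_factor=3.0, spectral_floor=0.01):
    """Power spectral subtraction that removes more than the estimated noise.

    over_factor > 1 trades some speech retention for stronger noise removal;
    spectral_floor keeps a small residual to avoid negative power.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    cleaned = noisy_power - over_factor * noise_power
    return np.maximum(cleaned, spectral_floor * noise_power)
```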
In a more general approach, which is not necessarily constrained to using the SNR to estimate noise as in the embodiments described above, the estimation error $\varepsilon(\omega)$ between a noise-reduced signal and an original clean signal is represented by the equation:
$$\varepsilon(\omega) = \hat{X}(\omega) - X(\omega),$$
where $X(\omega)$ is the clean signal and $\hat{X}(\omega)$ is the noise-reduced signal. This equation is further described in Loizou (2007), Speech Enhancement: Theory and Practice.
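The estimation error $\varepsilon(\omega)$ can be further divided into two components, $\varepsilon_x(\omega)$ and $\varepsilon_d(\omega)$, as illustrated by the equation:
$$\varepsilon(\omega) = \varepsilon_x(\omega) + \varepsilon_d(\omega),$$
where $\varepsilon_x(\omega)$ represents the error in the signal components representing speech, and $\varepsilon_d(\omega)$ represents the error in the components of the signal that represent noise.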
Assuming the two error components are uncorrelated, the overall mean squared estimation error $E[\varepsilon^2(\omega)]$ can then be written as the sum of its two components, namely the distortion of the speech, $E[\varepsilon_x^2(\omega)]$, and the distortion of the noise, $E[\varepsilon_d^2(\omega)]$:
$$E[\varepsilon^2(\omega)] = E[\varepsilon_x^2(\omega)] + E[\varepsilon_d^2(\omega)].$$
This value can also be represented by the following equation:
$$d_T(\omega) = d_x(\omega) + d_D(\omega),$$
where the total distortion is $d_T(\omega) = E[\varepsilon^2(\omega)]$, the speech distortion is $d_x(\omega) = E[\varepsilon_x^2(\omega)]$, and the noise distortion is $d_D(\omega) = E[\varepsilon_d^2(\omega)]$.
A distortion ratio $DR(\omega)$ can then be defined as the speech distortion $d_x(\omega)$ divided by the noise distortion $d_D(\omega)$, as shown in the following equation:
$$DR(\omega) = \frac{d_x(\omega)}{d_D(\omega)}$$
This function describes the relative distortion components in a manner that is not affected by the absolute signal or noise levels. Advantageously, the distortion ratio defined herein can be determined for a sound processing system irrespective of the mechanism used by the system to reduce noise because the distortion ratio is dependent on the clean signal and the noise reduced signal output by the system.
By expressing the distortion ratio in terms of signal power, the speech distortion component $d_x(\omega)$ and the noise distortion component $d_D(\omega)$ can be described respectively by the equations:
$$d_x(\omega) = P_S(\omega)\,\big(H(\omega) - 1\big)^2$$
$$d_D(\omega) = P_D(\omega)\,H(\omega)^2$$
where $P_S$ is the power of the signal, $P_D$ is the power of the noise, and $H(\omega)$ is the parametric Wiener function defined by:
where ζ is the a priori SNR estimate and α and β are the parametric Wiener variables.
In this case, the distortion ratio $DR(\omega)$ can be described as:
$$DR(\omega) = \frac{P_S(\omega)\,\big(H(\omega) - 1\big)^2}{P_D(\omega)\,H(\omega)^2}$$
which allows the distortion ratio to be represented as a function of the a priori SNR through the equation
Prior art systems using a generalized Wiener function (variable=2) generate an output with a distortion ratio along line 1802.
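The distortion ratio can be checked numerically. The sketch below computes DR both from the closed form implied by the d_x and d_D expressions above (with ζ = P_S/P_D) and directly from separate clean-speech and noise spectra passed through a real gain H. The function names are hypothetical.

```python
import numpy as np


def distortion_ratio_closed_form(zeta, H):
    """DR = P_S (H - 1)^2 / (P_D H^2) = zeta (1 - H)^2 / H^2, with zeta = P_S / P_D."""
    zeta = np.asarray(zeta, dtype=float)
    H = np.asarray(H, dtype=float)
    return zeta * (1.0 - H) ** 2 / H ** 2


def distortion_ratio_measured(S, D, H):
    """DR from separate clean-speech (S) and noise (D) spectra of one bin, gain H.

    With output H * (S + D), the speech error is (H - 1) * S and the
    residual noise is H * D.
    """
    S = np.asarray(S)
    D = np.asarray(D)
    d_x = np.mean(np.abs((H - 1.0) * S) ** 2)   # speech distortion
    d_d = np.mean(np.abs(H * D) ** 2)           # noise distortion
    return d_x / d_d
```

For the classical Wiener gain H = ζ/(ζ+1), the closed form reduces to DR = 1/ζ, so any gain that attenuates more strongly than the Wiener gain at a given SNR yields a larger distortion ratio.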
For systems using Spectral Subtraction-based and SNR-based noise suppression methods, embodiments of the present invention should generate output signals that have a distortion ratio lying above that of the generalized Wiener function (variable=2) over most (and preferably all) SNRs above −5 dB. Curves 1804 and 1806 together define a region for SNRs between −5 and 15 dB in which embodiments of the present invention can advantageously operate. The inventors have found that systems having noise reduction characteristics that produce an output signal having a distortion ratio that lies above a curve 1804, defined by
and below a curve 1806 defined by
for at least some, and possibly all, SNR values (ζ) between −5 and 15 dB, provide acceptable speech perception for cochlear implant recipients. Moreover, embodiments in which the noise reduction characteristic of the system produces an output signal having a distortion ratio that lies substantially on the curve 1808, defined by
for at least some, and preferably all, SNR values (ζ) between −5 and 15 dB, may perform particularly well.
Alternative embodiments can be implemented that use different noise suppression techniques. For example, embodiments may also perform noise reduction using one of the following methods: a modulation detection method that applies strong attenuation when noise is detected; a histogram method; a reverberation noise reduction method; a wavelet noise reduction method; or a subspace noise reduction method, where the noise is generated by a source separate from the speech signal, where the noise is an echo or reverberation of the speech signal, or where the noise is a mixture of both.
More particularly embodiments of the invention implemented such that the system output has a distortion ratio that lies between the lines 1902 and 1904 on
and below a curve 1902 defined by the following equation:
for some, and preferably all, SNR values (ζ) between −5 and 15 dB, provide acceptable speech perception for CI recipients.
As noted above, the several embodiments described herein generate output signals having a distortion ratio DR(ω) in the preferred regions described above for signals having an SNR at some (and possibly all) values between −5 and 15 dB. However, it is preferable that the distortion ratio DR(ω) of the output signals lies in the preferred regions for signals having an SNR at some (and possibly all) values between 0 and 10 dB. In some embodiments, at higher SNR values (e.g. SNR greater than 10 dB), the received signal may be clean enough to use less aggressive noise reduction and still retain acceptable speech perception.
While the distortion ratio defines the system behaviour in quantitative terms,
When noise is added to the desired signal, the level (number) of stimulations may increase, and a noise suppression technique can be used to remove this unwanted noise, as described above.
The noise reduction schemes described herein can be performed on a signal representing the full bandwidth of the original sound signal or other input signal, or on a portion of it; e.g. embodiments of the noise reduction scheme can be performed on a signal limited to one or more FFT bins, channels, or arbitrarily selected frequency bands in the input signal. Thus the noise-reduced signal output by the scheme can similarly represent the full bandwidth of the input signal or a portion of it. If the output signal represents only a portion of the input signal, that output signal can be combined with other processed or unprocessed portions of the original signal to generate a control signal to be applied to one or several electrodes of the auditory prosthesis. In one example, a subset of channels having a high psychoacoustic importance can be processed according to an embodiment of the present invention, whereas the remaining channels having a relatively lower psychoacoustic importance can be processed in a conventional manner. The signals for all channels can then be processed together to generate a control signal for controlling stimulation of the array of electrodes of the auditory prosthesis.
Further improvements in noise reduction may be provided by implementing a process for choosing an input signal on which noise reduction will be performed, as illustrated in block 225 of
The chosen input signal then has the determined gain applied, by the gain application stage 1014 to generate a noise reduced output 1016. The noise reduced output 1016 is then used for further processing in the sound processing system.
As discussed above, in connection with channel selector 212 of
Known channel selection algorithms used in cochlear implants typically choose channels based only on the signal energy in each frequency channel. However, the inventors have determined that this approach may be improved by using additional channel selection criteria. Accordingly, other embodiments of the present invention use a measure of a channel's psychoacoustic importance, possibly in combination with other channel parameters, to select the channels that are to be applied to the electrodes of the cochlear implant. For example, in specific embodiments, a very high frequency channel may be present in a signal and have a high SNR level. However, a high frequency signal will not contribute greatly to the speech understanding of a recipient. Therefore, if a suitable channel exists, it may be preferable to select a lower frequency channel having a lower SNR in place of the high frequency channel in order to achieve a better outcome in terms of speech perception for the user.
In one illustrative example, a channel at 2 kHz is more important for speech understanding than a channel at 6 kHz. To address this, a speech importance function, such as that described in ANSI standard S3.5-1997, ‘Methods for Calculation of the Speech Intelligibility Index’, may be used. This speech importance function is illustrated in
When the SNR estimates are weighted with the speech importance function, channels with large amplitudes may still be excluded if their speech-importance-weighted SNR is worse than that of other channels. An amplitude-based criterion can therefore also be incorporated into the channel selection algorithm. To do this, the relative level of each frequency channel can be calculated in block 1109 by dividing the signal energy in each band by the total energy in the signal. The speech-importance-weighted SNR 1110 is then multiplied by the normalized signal value at each frequency, and the channels are sorted in block 1112 to select channels for application to the electrodes of the cochlear implant. As noted above, the channel selection may be part of an n of m selection strategy, as shown in block 1106 of the system 1100, or another strategy not limited to always selecting n of m channels. It should also be appreciated that an approach which simply scales amplitude by signal-to-noise ratio may also be used in channel selection.
The channel selection strategy can be a so-called n of m strategy, in which, in each stimulation time period, up to a maximum of n channels are selected from a total of m available channels. In this case, even if more than n channels have potentially useful signals, only n will be selected. Alternatively, a channel selection strategy may be employed in which all channels that meet certain criteria are selected.
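The scoring and n of m selection described above can be sketched as follows. The sketch assumes linear (not dB) SNR values so that every score is non-negative, and the importance weights are hypothetical inputs (in practice they would be read from a band-importance function such as the one in ANSI S3.5-1997). Channel indices are zero-based, and the name `select_channels` is illustrative.

```python
import numpy as np


def select_channels(snr_linear, importance, energy, n):
    """Pick up to n of m channels by SNR x speech importance x relative level.

    snr_linear: per-channel SNR estimate (linear, so all scores are non-negative).
    importance: per-channel speech importance weight (hypothetical values).
    energy: per-channel signal energy.
    """
    snr_linear = np.asarray(snr_linear, dtype=float)
    importance = np.asarray(importance, dtype=float)
    energy = np.asarray(energy, dtype=float)
    relative_level = energy / energy.sum()          # block 1109: normalised level
    score = snr_linear * importance * relative_level
    ranked = np.argsort(score)[::-1]                # block 1112: sort by score
    return np.sort(ranked[:n]), score
```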
In addition to selecting channels based on factors such as SNR, amplitude and speech importance, the spectral spread of information may also be used in channel selection. Where adjacent channels both meet the criteria for selection, stimulating both may provide no additional information to a recipient because of masking effects. In such cases, one of the channels may be dropped from the stimulation scheme, and one or more other channels picked up as substitutes. The selection of the substitute channel(s) may be based on the criteria described above, but additionally include spectral considerations to avoid masking by adjacent channels. Such an approach may be similar to the MP3000 stimulation strategy used by Cochlear Limited, which determines whether a channel will be effectively masked by a neighboring channel. In that case, the less important of the two channels is treated as masked and is not stimulated. Extending this idea, where a large number of channels containing beneficial information are present, it is also possible to spread the stimulation over time by splitting the stimulation of some electrodes into one temporal group and the stimulation of other electrodes into a second temporal group. For example, if all 22 channels have a positive signal-to-noise ratio but only 8 channels can be stimulated every frame, then rather than discarding 14 potentially useful signals, the channels can be split into a number of groups and each group stimulated in successive frames. For instance, the 8 largest “odd channels” may be placed in one group and the 8 largest “even channels” in another, and each group can then be stimulated in successive frames.
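The odd/even grouping example can be sketched as follows, with channels numbered from 1 and ranked by some per-channel value (for example, the channel importance score). The function name `odd_even_groups` is hypothetical.

```python
def odd_even_groups(channel_values, per_group=8):
    """Split channels (numbered from 1) into odd and even groups of the largest values.

    Returns two sorted lists of channel numbers, to be stimulated in
    alternating frames.
    """
    m = len(channel_values)

    def largest(channels):
        # Rank the candidate channels by value and keep the top per_group.
        ranked = sorted(channels, key=lambda ch: channel_values[ch - 1], reverse=True)
        return sorted(ranked[:per_group])

    odd = largest(range(1, m + 1, 2))
    even = largest(range(2, m + 1, 2))
    return odd, even
```

Successive frames would then alternate between the two groups, so 16 of the 22 channels are conveyed over two frames instead of 8.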
Process 1700 begins at step 1702, by receiving a sound signal at a microphone. The output from each microphone is then used in step 1704 to generate a signal representing the received sound. This is performed in a manner similar to that described in
In the next step 1706, the frequency bins are combined into a predetermined number of signals or channels for further processing. In certain embodiments, there are 22 channels that correspond to the 22 electrodes in a cochlear implant.
In step 1708, a noise estimate for each channel is created using a minimum statistics-based approach in the manner described above in connection with
where all of the terms in the formula have the meanings defined above.
In the next step 1712, for each channel, the SNR estimate is multiplied by the relative speech importance of the central frequency of the channel, and then by the normalized amplitude of the signal in the channel, to generate an overall channel importance value. The relative speech importance of the central frequency of the channel may be derived using the speech importance function described in
In the next step 1714, up to n channels having the highest channel importance value are selected from the m channels. In certain embodiments, n=8 and m=22. The chosen channels are further processed in the cochlear implant to generate stimuli for application to the recipient via the electrodes.
As will be appreciated, the present exemplary process can obtain benefits of at least one aspect of the present invention, but would not require the complexity of the system able to implement all sub-blocks of the functional block diagram of
Process 1800 begins at step 1802 by receiving a sound at a beam forming array of omnidirectional microphones, of the type illustrated in
In step 1810, the directional noise estimate is converted to an SNR estimate, also as described in connection with
In step 1814, at each frequency, a confidence measure is generated for each of the SNR estimates determined in steps 1810 and 1812. At each frequency, the SNR estimate having the highest associated confidence value is selected in step 1816 as the final SNR estimate for the channel. Next, in step 1818, the selected SNR value is used to determine the gain to be applied to the channel, using a binary mask with a threshold of 0 dB.
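Steps 1816 and 1818 can be sketched as a highest-confidence selection followed by a 0 dB binary mask. This assumes each estimator supplies an (SNR in dB, confidence) pair for the frequency in question; how the confidence measure itself is computed is not shown. Treating an SNR of exactly 0 dB as noise-dominated is an arbitrary tie-breaking choice in this sketch.

```python
def select_snr(estimates):
    """Pick the (snr_db, confidence) pair with the highest confidence.

    `estimates` holds one pair per estimator (e.g. directional and
    minimum statistics) for a single frequency or channel.
    """
    return max(estimates, key=lambda pair: pair[1])


def binary_mask_gain(snr_db, threshold_db=0.0):
    """Binary mask: pass (gain 1) above the threshold, block (gain 0) otherwise."""
    return 1.0 if snr_db > threshold_db else 0.0


if __name__ == "__main__":
    snr_db, confidence = select_snr([(5.0, 0.3), (-2.0, 0.8)])
    print(snr_db, confidence, binary_mask_gain(snr_db))
```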
In step 1820, the effect of the gain value determined in step 1818 is varied to account for the confidence level of the SNR estimate on which it is based. This is done by scaling the gain associated with the selected SNR estimate by that estimate's confidence measure, to determine a modified gain value to apply to the signal. The modified gain is applied to the signal in step 1822 to generate a noise-reduced output signal for further processing by the hearing aid.
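One way to read "scaling the gain by its confidence measure" so that low confidence weakens the *effect* of the gain is to scale the gain's departure from unity, i.e. g' = 1 - c(1 - g). This interpolation is an interpretation, not a formula given in the source; the literal product g × c is another possible reading. The sketch assumes confidence lies in [0, 1].

```python
def confidence_scaled_gain(gain, confidence):
    """Pull `gain` toward unity as confidence drops.

    confidence is clipped to [0, 1]: 1 keeps the mask gain unchanged,
    0 yields unity gain (no noise reduction).
    """
    c = min(max(confidence, 0.0), 1.0)
    return 1.0 - c * (1.0 - gain)


def apply_gain(samples, gain):
    """Step 1822: apply the (modified) gain to a block of channel samples."""
    return [gain * x for x in samples]
```

Under this reading, a masked channel (gain 0) whose SNR estimate has confidence 0.25 is attenuated only to 0.75 rather than silenced.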
Again, it can be seen from this example that advantages of certain aspects of the present invention can be obtained without implementing each of the functional blocks of
In alternative embodiments of the present invention, noise estimator 250 shown in
It should be appreciated that the noise and SNR estimation techniques described herein are performed on spectrally limited channels. As noted earlier, similar noise and SNR estimation techniques may be used on a range of different spectrally limited signals. For example, noise and SNR estimation may be performed on an FFT-bin basis, on a channel-by-channel basis, on some predetermined or arbitrarily selected frequency band of the input signal, or on the entire signal.
In embodiments in which noise or SNR estimation is performed on a per-FFT-bin basis, a noise or SNR estimate for a corresponding channel could be calculated from some or all of the FFT bins that contribute to that channel. For example, the noise or SNR estimates of the FFT bins contributing to each channel could be combined by averaging, by selecting the maximum, or by any other form of combination to derive the noise or SNR estimate for the channel.
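The two combination rules named above, averaging and taking the maximum, can be sketched as follows. The sketch combines values on whatever scale they are supplied in; note that averaging dB values and averaging linear values give different results, and the source does not specify which is intended.

```python
from statistics import mean


def channel_estimate(bin_estimates, channel_bins, method="mean"):
    """Combine per-bin noise or SNR estimates into one channel estimate.

    channel_bins lists the indices of the FFT bins contributing to the channel.
    """
    values = [bin_estimates[i] for i in channel_bins]
    if not values:
        raise ValueError("channel has no contributing bins")
    if method == "mean":
        return mean(values)
    if method == "max":
        return max(values)
    raise ValueError(f"unknown combination method: {method}")
```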
It is also possible to perform the noise or SNR estimation on signals having a spectral bandwidth that differs from that of the signal itself. For example, double the number of FFT bins may be used to estimate the noise level or SNR for a channel, e.g. by using surrounding FFT bins as well as contributing FFT bins.
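The "double the number of FFT bins" example can be sketched as averaging over a window centred on the channel's contributing bins. The even split of extra bins on each side, the clipping at the spectrum edges and the plain average are all assumptions of this sketch.

```python
def widened_estimate(bin_estimates, lo, hi):
    """Average bin estimates over a window about twice the channel's width.

    The channel's contributing bins are lo..hi-1. The same number of extra
    bins is split across both sides (the right side gets any odd one),
    and the window is clipped to the available bins.
    """
    width = hi - lo
    if width <= 0:
        raise ValueError("channel must contain at least one bin")
    left = width // 2
    right = width - left
    start = max(0, lo - left)
    stop = min(len(bin_estimates), hi + right)
    window = bin_estimates[start:stop]
    return sum(window) / len(window)
```

For a channel built from bins 4 and 5, the estimate draws on bins 3 to 6.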
Similarly, a noise or SNR estimate for the channel may be derived from only one contributing component. A variation on this scheme allows the noise or SNR estimate from one spectral band to influence the estimate for another spectral band. For example, neighboring bands' estimates can be used to moderate or otherwise alter the noise or SNR estimate of a target frequency band; extreme or otherwise anomalous SNR estimates may be adjusted, or replaced by noise or SNR estimates derived from other, typically adjacent, frequency bands.
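Replacing anomalous estimates with values from adjacent bands can be sketched as a spike filter. Here a band counts as anomalous only if its SNR sits more than a threshold above (or below) both neighbors, in which case it is replaced by the neighbors' mean. The 15 dB threshold and the "both neighbors" rule are assumptions, and edge bands are left unchanged in this sketch.

```python
def replace_spikes(snr_db, max_jump_db=15.0):
    """Replace isolated spikes or dips in per-band SNR with the neighbors' mean.

    Comparisons always use the original values, so one corrected band
    does not influence the test applied to the next band.
    """
    out = list(snr_db)
    for k in range(1, len(snr_db) - 1):
        d_left = snr_db[k] - snr_db[k - 1]
        d_right = snr_db[k] - snr_db[k + 1]
        is_spike = d_left > max_jump_db and d_right > max_jump_db
        is_dip = d_left < -max_jump_db and d_right < -max_jump_db
        if is_spike or is_dip:
            out[k] = 0.5 * (snr_db[k - 1] + snr_db[k + 1])
    return out
```

Requiring a jump relative to both neighbors avoids flattening a genuine edge in the spectrum, where SNR rises and stays high.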
As can be seen from the foregoing, a system as described herein that uses multiple signal-to-noise ratio estimates has the freedom to select which signal-to-noise ratio estimates to use for a given frequency bin, channel or frequency band, and/or how multiple SNR estimates are combined. Moreover, the system can be set up to additionally allow selection of which types of SNR estimates are available in different listening environments. For example, rather than always using a directional signal-to-noise ratio estimate and a minimum statistics derived signal-to-noise ratio estimate, other noise estimation techniques could be used, including but not limited to: maximum noise estimation; minimum noise estimation; average noise estimation; environment specific noise estimation; noise level specific noise estimation; patient input noise estimation; and confidence measure based noise estimation.
For example, in a user-selected “driving” mode, a noise-specific estimate (tuned to estimate road noise) and a minimum statistics noise estimate can be used. In this case a directional measure of noise cancelling may be inappropriate, as it may mask important sounds such as the sirens of emergency vehicles approaching from behind. On the other hand, a “conversation”-specific noise estimation is likely to benefit from the inclusion of a directional SNR estimate.
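Mode-dependent estimator selection can be sketched as a lookup table that restricts which estimates are eligible, followed by the highest-confidence choice. The mode names and estimator labels are hypothetical, and pairing the directional estimate with minimum statistics in the "conversation" mode is an assumption; the source only says that mode is likely to benefit from a directional estimate.

```python
# Hypothetical mode table: which estimators may contribute in each listening mode.
ESTIMATORS_BY_MODE = {
    "driving": ("road_noise", "min_stats"),  # directional estimate deliberately excluded
    "conversation": ("directional", "min_stats"),
}


def best_snr_for_mode(mode, estimates):
    """Return (estimator_name, snr_db) for the most confident allowed estimator.

    `estimates` maps estimator name -> (snr_db, confidence).
    """
    allowed = ESTIMATORS_BY_MODE[mode]
    active = {name: est for name, est in estimates.items() if name in allowed}
    if not active:
        raise ValueError(f"no estimates available for mode {mode!r}")
    name = max(active, key=lambda n: active[n][1])
    return name, active[name][0]
```

Even when the directional estimate is the most confident overall, it is ignored in the "driving" mode.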
It will be understood that the invention disclosed and defined in this specification extends to all alternative combinations of two or more of the individual features mentioned or evident from the text or drawings. All of these different combinations constitute various alternative aspects of the invention.
The invention described and claimed herein is not to be limited in scope by the specific preferred embodiments herein disclosed, since these embodiments are intended as illustrations, and not limitations, of several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. All documents, patents, journal articles and other materials cited in the present application are hereby incorporated by reference.
This application is a Continuation of U.S. application Ser. No. 17/405,328, filed on Aug. 18, 2021, which is a Continuation of U.S. application Ser. No. 16/566,054, filed on Sep. 10, 2019, now U.S. Pat. No. 11,127,412, which is a Continuation of U.S. application Ser. No. 13/287,112, filed on Nov. 1, 2011, now U.S. Pat. No. 10,418,047, which is a Continuation-in-part of U.S. application Ser. No. 13/047,325, filed on Mar. 14, 2011, now U.S. Pat. No. 9,589,580, the contents of which are hereby incorporated by reference herein in their entirety.
Relationship | Application No. | Filing Date | Country
---|---|---|---
Parent | 17405328 | Aug 2021 | US
Child | 18451399 | | US
Parent | 16566054 | Sep 2019 | US
Child | 17405328 | | US
Parent | 13287112 | Nov 2011 | US
Child | 16566054 | | US

Relationship | Application No. | Filing Date | Country
---|---|---|---
Parent | 13047325 | Mar 2011 | US
Child | 13287112 | | US