The present invention relates to sound processing devices in which an acoustic sound input or an electric or digital representation of an acoustic sound input is processed and converted to an acoustic or electric sound output, and in particular relates to the processing of sound in noisy environments to improve speech intelligibility, sound quality, and naturalness of the sound. Sound processing devices of this kind are often used in hearing aids, assistive listening devices (ALDs), and consumer audio devices such as radios, television sets, CD players, MP3 players, stereo systems, headsets, telephones, and mobile phone handsets. The Global Medical Device Nomenclature (GMDN) Agency defines an ALD as an amplifying device, other than a hearing aid, for use by a hard-of-hearing person. In the case of an electric sound output, sound processing devices of this kind are used in cochlear implants.
Sound processing devices, including hearing aids, ALDs, cochlear implants, and consumer audio devices, are being used more frequently in noisy environments. Normally, people make good use of both ears to separate the sounds they want to listen to from the other noises in the environment that they want to ignore. Present-day consumer audio devices, hearing aids, cochlear implants, and ALDs also rely on these internal binaural perceptual processes to be able to function adequately in noisy environments. In addition to the internal perceptual processing, many audio devices include various external noise reduction schemes aimed at improving speech intelligibility, sound quality, and listening comfort in noisy environments. These noise reduction schemes typically use information that is available from a single microphone, or an array of closely-spaced microphones that may be worn on one side of the head. They rely on directional, spectral, and temporal information to separate desired sounds from other noises in the environment. For example, some schemes seek to improve signal-to-noise ratios by expanding the intensity differences between more intense parts of the sound and less intense parts of the sound. A noise reduction scheme based on spectral information may apply more gain to the peaks in the spectrum than to the troughs. A noise reduction scheme based on temporal information may apply more gain at times when the sound is above a certain intensity threshold than when the sound is below this threshold. A noise reduction scheme based on directional information may apply more gain to sounds from the front of the listener than to sounds from other directions. There is clear evidence that directional microphones can improve sound quality, comfort, and intelligibility. It is also clear that spectral and temporal noise reduction improves comfort, but the effects of spectral and temporal noise reduction on intelligibility and sound quality are more controversial.
One potential reason for the uncertainty about the effects of external spectral and temporal noise reduction schemes on intelligibility and sound quality is that they change the spectral and temporal cues that are used by the internal perceptual processes. If these cues are changed differently in the left and right ears, they may also disrupt the internal binaural processes that most listeners rely upon most heavily in noisy situations. At least three perceptual processes are important in binaural sound perception:
a) Integration of information from both ears. This includes integration of information about both the desired sounds and the other noises in the environment.
b) The ability to separate sounds from different sources and to pay attention to the sounds from one ear or the other when it is advantageous to do so.
c) The ability to use small timing and intensity differences between the ears.
Bregman (1990) uses the term “auditory streaming” to describe the perceptual process that separates sounds from different sources and groups together sounds from the same source. A stream is a series of sequential and overlapping sound events that come from the same source. An example of a stream is the speech from a single person speaking. A word or a sentence spoken by this person must be perceived as a connected series of sound events to be understood, while being kept separate from the other sounds in the environment. Important sound events include the onsets and offsets of sounds, and changes in intensity and spectrum. The spectral and temporal noise reduction schemes referred to above introduce onsets and offsets, changes in intensity, and spectral changes in the noise that are correlated with the onsets and offsets and spectral changes in the desired signals. The perceptual effects of introducing these artificial streaming cues are difficult to predict. On one hand, they may emphasize the temporal and spectral characteristics of the desired sounds. On the other hand they will make it more difficult for the internal auditory streaming processes to separate the desired sound events and streams from the noise events and streams. If the artificial streaming cues created by the external noise reduction are different in the two ears, they will add further to the confusion between what is the desired sound stream and what is the noise stream. It is therefore important that the external noise reduction processing operates in a coordinated and consistent manner in the two ears.
In addition to avoiding the creation of artificial events or streaming cues, and avoiding the creation of artificial differences between the ears, an effective binaural sound processing strategy also needs to be able to “pay attention to only one ear when it is advantageous to do so”. This corresponds to point (b) above. In order to emulate this aspect of the internal binaural processing, a binaural sound processing device needs to continuously assess the signal-to-noise ratio in each ear, select the ear with the higher signal-to-noise ratio, and allow that ear to control the noise reduction processing for both ears. This is the essence of the present invention.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
According to a first aspect the present invention provides a method for controlling a sound processing device with a binaural input and binaural output, where “binaural input” means at least one microphone mounted in or near each ear of the device user, and “binaural output” means at least one output signal directed to each ear. The method comprises:
transduction of the sound at each ear by the at least one microphone in or near the ear;
estimation of the signal-to-noise ratio present at each ear;
selection of the ear with the greater signal-to-noise ratio;
control of identical noise reduction processing based on the spectral and temporal information present in the signal at the selected ear;
amplification of the processed signals at each ear; and
presentation of the appropriately processed signals to each ear.
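By way of illustration only, the selection and shared-control steps of the above method may be sketched as follows in Python. The function names `select_ear` and `process_binaural`, the representation of each ear signal as a list of per-channel levels in dB, and the `gain_rule` callback are assumptions made for this example and are not taken from the specification. Resolving ties in favour of the left ear is likewise an assumption. The transduction and amplification steps are omitted.

```python
def select_ear(snr_left_db, snr_right_db):
    """Return the ear with the greater estimated signal-to-noise ratio.

    Ties go to the left ear (an arbitrary choice made for this sketch).
    """
    return "left" if snr_left_db >= snr_right_db else "right"


def process_binaural(left_db, right_db, snr_left_db, snr_right_db, gain_rule):
    """Apply identical per-channel gains, derived from the selected ear, to both ears.

    left_db, right_db: per-channel levels in dB for each ear (hypothetical format).
    gain_rule: hypothetical function mapping a channel level in dB to a gain in dB.
    """
    selected = select_ear(snr_left_db, snr_right_db)
    control = left_db if selected == "left" else right_db
    # The gains are computed once, from the selected ear only.
    gains_db = [gain_rule(level) for level in control]
    # The same gains are applied to both ears, so the level difference
    # between the ears is unchanged in every channel.
    left_out = [level + g for level, g in zip(left_db, gains_db)]
    right_out = [level + g for level, g in zip(right_db, gains_db)]
    return selected, left_out, right_out
```

Because both ears receive the same gain in each channel, the noise reduction stage in this sketch leaves the intensity differences between the ears unchanged.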
According to a second aspect the present invention provides a sound processing device with a binaural input and binaural output, where “binaural input” means at least one microphone mounted in or near each ear of the device user, and “binaural output” means at least one output signal directed to each ear. The device may be comprised of two parts connected by a wired or wireless link. The device comprises:
at least one microphone in or near each ear for the transduction of the sound at each ear;
a signal-to-noise estimation module to estimate the signal-to-noise ratio present at each ear;
a comparison and selection module to compare the signal-to-noise ratios present at the two ears and select the ear with the greater signal-to-noise ratio;
a noise reduction control module that uses the spectral and temporal information from the selected ear signal to control two identical noise reduction modules;
two identical noise reduction modules that process the signals from the two ears, under the control of the control module; and
two output modules that amplify the output signals from the noise reduction modules appropriately for each ear and present the amplified signals as sound or other signals to each ear of the device user.
According to a third aspect the present invention provides a computer program product comprising computer program code means to make a computer execute a binaural noise reduction sound processing procedure, the computer program product comprising:
computer program means for accepting at least one input signal representing sound from each ear of a listener;
computer program means for estimating the signal-to-noise ratio present at each ear;
computer program means for comparing the signal-to-noise ratios present at the two ears and selecting the ear with the greater signal-to-noise ratio;
computer program means for using the spectral and temporal information from the selected ear signal to control two identical noise reduction processes;
computer program means for reducing noise in the signals from the two ears in an identical manner, under the control of the aforesaid computer program noise reduction control means; and
computer program means for amplifying the output signals from the noise reduction means appropriately for each ear and presenting the amplified signals as sound or other signals to each ear of the device user.
The amplifier for each ear preferably comprises a conventional wide dynamic range compression (WDRC) or adaptive dynamic range optimization (ADRO) sound amplifier. See Dillon 2001 for a review of the WDRC prior art. See Blamey et al, U.S. Pat. No. 6,731,767 for a description of the ADRO sound processing. The variable gain in each channel of the amplifier may also be controlled according to the information derived from the ear with the greater signal-to-noise ratio, and the overall gain of the ear with the lower signal-to-noise ratio may be reduced relative to the overall gain of the ear with the higher signal-to-noise ratio.
In one embodiment of the invention, the noise reduction scheme is a multichannel expansion scheme or spectral subtraction scheme which temporarily reduces the gain applied to frequency bands that are thought to be primarily noise, and increases the gain in frequency bands that are thought to be primarily signal. The choice between whether a frequency band contains primarily noise or signal is preferably based on instantaneous amplitude and dynamic range of the sound in that frequency band in the selected ear. The reduction or increase in gain is applied equally and simultaneously to the signal for both ears. The control signals derived from the selected ear signal can be particularly simple in this case, for example, a 32-channel noise reduction scheme can be controlled by sending 32 bits to encode whether each channel is primarily signal (bit value=1) or noise (bit value=0).
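The 32-bit control signal described above may be illustrated by the following Python sketch, which packs one signal/noise decision per channel into a single 32-bit word and recovers the decisions at the other ear. The function names and the bit ordering (channel 0 in the least significant bit) are assumptions made for this example; the specification does not prescribe a particular encoding.

```python
def pack_channel_flags(is_signal):
    """Pack per-channel decisions into one integer.

    is_signal: sequence of booleans, True where the channel is primarily signal.
    Bit i of the result is 1 for signal and 0 for noise (assumed bit order).
    """
    word = 0
    for i, flag in enumerate(is_signal):
        if flag:
            word |= 1 << i
    return word


def unpack_channel_flags(word, num_channels=32):
    """Recover the per-channel decisions from the packed integer."""
    return [bool((word >> i) & 1) for i in range(num_channels)]


if __name__ == "__main__":
    flags = [True, False] * 16           # 32 channels, alternating signal/noise
    word = pack_channel_flags(flags)
    payload = word.to_bytes(4, "little")  # four bytes to send over the link
    assert unpack_channel_flags(int.from_bytes(payload, "little")) == flags
```

In this sketch a complete update of the noise reduction control for 32 channels occupies four bytes, which is consistent with the aim of keeping the data transmission requirements small.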
In a second embodiment of the invention, the gains or gain reductions for each frequency channel are transmitted from the selected ear to the unselected ear and applied simultaneously to the signal for each ear.
In a third embodiment of the invention, the amplitude and dynamic range (or signal-to-noise ratio) for each frequency band are transmitted from the selected ear to the unselected ear and applied in identical noise reduction algorithms in both ears simultaneously.
The changes to the gains in individual frequency channels of the noise reduction processing or in the individual frequency channels of the amplifiers are preferably made slowly enough and over a time scale that is long enough to avoid the generation of artificial sound events and streaming cues. Any faster changes that may be necessary to avoid discomfort or damage to hearing are preferably applied across a broad frequency range and are also applied identically and simultaneously to each ear.
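One possible way of making the gain changes slowly is to limit the amount by which each channel gain may move per update, as in the following Python sketch. The function names and the default step size are hypothetical; the specification does not give a rate, and other smoothing methods could equally be used.

```python
def step_gain_towards(current_db, target_db, max_step_db):
    """Move a gain towards its target by at most max_step_db per update."""
    delta = target_db - current_db
    # Clamp the change so that no single update produces an abrupt jump.
    delta = max(-max_step_db, min(max_step_db, delta))
    return current_db + delta


def update_shared_gains(current_gains_db, target_gains_db, max_step_db=0.05):
    """Update one shared set of channel gains that is applied to both ears."""
    return [
        step_gain_towards(current, target, max_step_db)
        for current, target in zip(current_gains_db, target_gains_db)
    ]
```

In this sketch a single set of gains is maintained and applied to both ears, so that the limited-rate changes remain identical and simultaneous at the two ears.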
The controls on the device, such as a volume control and program selection switch, are preferably linked so that any change initiated by a control is applied to both ears simultaneously in a coordinated manner.
The sound processing in the two signal paths for the two ears is preferably configured to have minimum delay (Dickson and Steele, 2006) and to have equal delay from input to output to preserve fine temporal differences between the ears to the maximum extent possible.
The wired or wireless communication link between the two devices is preferably disabled when the signal-to-noise ratio in each ear is greater than a configurable threshold value, and enabled when the signal-to-noise ratio is below the configurable threshold. The purpose of this refinement is to save power when binaural noise reduction is not required or would not provide any discernible improvement to sound quality or speech intelligibility.
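The link control described above may be sketched in Python as follows. The function name `link_enabled` is hypothetical, and the sketch adopts one reading of the text: the link is disabled only when both ears exceed the threshold, and is enabled otherwise, including when a signal-to-noise ratio is exactly equal to the threshold.

```python
def link_enabled(snr_left_db, snr_right_db, threshold_db):
    """Return True if the link between the two devices should be enabled.

    The link is disabled only when the signal-to-noise ratio at BOTH ears
    exceeds the threshold (one interpretation of the description).
    """
    both_above = snr_left_db > threshold_db and snr_right_db > threshold_db
    return not both_above
```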
If the intensity value is greater than the percentile value, the percentile value is incremented by a small fixed amount, the UP STEP. If the intensity value is less than the percentile value, the percentile value is decremented by a small fixed amount, the DOWN STEP. If the ratio of the UP STEP to the DOWN STEP is 9:1, then the percentile value will tend towards an intensity value at the upper end of the intensity range that is exceeded 10% of the time (9 smaller DOWN STEPS will be balanced by one larger UP STEP). Similarly, if the ratio of the UP STEP to the DOWN STEP is 3:7, then the percentile value will tend towards an intensity value at the lower end of the range that is exceeded 70% of the time (3 larger DOWN STEPS will be balanced by 7 smaller UP STEPS). Other percentages may be selected for the high and low percentile estimators provided that the ratio of the UP STEP to the DOWN STEP is greater for the high percentile estimator than for the low percentile estimator. Assuming that the peaks at the upper end of the intensity range are a measure of the signal level and the valleys at the lower end of the intensity range are a measure of the noise level, the difference 205 between the high percentile value and the low percentile value provides a measure related to the signal-to-noise ratio (SNR). The difference value is smoothed by module 206 to reduce random variations in the SNR estimates.
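The percentile estimators and the SNR-related difference described above may be sketched in Python as follows. The class and function names, the absolute step sizes, the initial value, and the use of a first-order exponential smoother for the smoothing step are assumptions made for this example; only the 9:1 and 3:7 step ratios are taken from the description.

```python
import random


class PercentileEstimator:
    """Running percentile estimate using fixed UP and DOWN steps."""

    def __init__(self, up_step, down_step, initial=0.0):
        self.up_step = up_step
        self.down_step = down_step
        self.value = initial

    def update(self, intensity_db):
        if intensity_db > self.value:
            self.value += self.up_step      # UP STEP
        elif intensity_db < self.value:
            self.value -= self.down_step    # DOWN STEP
        return self.value


def estimate_snr(intensities_db, step_scale=0.005, smoothing=0.01):
    """Return (high percentile, low percentile, smoothed difference) in dB."""
    # UP:DOWN = 9:1 settles at the level exceeded 10% of the time.
    high = PercentileEstimator(9 * step_scale, 1 * step_scale)
    # UP:DOWN = 3:7 settles at the level exceeded 70% of the time.
    low = PercentileEstimator(3 * step_scale, 7 * step_scale)
    smoothed = 0.0
    for x in intensities_db:
        difference = high.update(x) - low.update(x)
        # Exponential smoothing of the difference (assumed smoother).
        smoothed += smoothing * (difference - smoothed)
    return high.value, low.value, smoothed


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.uniform(40.0, 80.0) for _ in range(50000)]
    high, low, snr = estimate_snr(samples)
    print(round(high, 1), round(low, 1), round(snr, 1))
```

For intensities drawn uniformly between 40 and 80 dB, as in the usage example, the level exceeded 10% of the time is 76 dB and the level exceeded 70% of the time is 52 dB, so the two estimates would be expected to settle close to those values and the smoothed difference close to 24 dB.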
Many alternative noise reduction algorithms may be adapted for use in the binaural noise reduction scheme that is the subject of this invention. In one embodiment of the invention, the noise reduction scheme is a multichannel scheme which temporarily reduces the gain applied to frequency channels that are thought to be primarily noise, and increases the gain in frequency channels that are thought to be primarily signal. The decision as to whether a frequency channel contains primarily noise or primarily signal is preferably based on the instantaneous amplitude 402 and signal-to-noise ratio 403 of the sound in that frequency channel in the selected ear. In a preferred embodiment of this type, the 30th and 90th percentiles of the amplitude are calculated in each frequency channel. If the amplitude is below the 30th percentile, or the 90th percentile is less than 2 dB above the 30th percentile, the frequency channel is judged to contain primarily noise; otherwise the frequency channel is judged to contain primarily signal. The reduction in gain for frequency channels that are primarily noise and the increase in gain for frequency channels that are primarily signal are applied equally and simultaneously to the signal for both ears. The control signals derived from the selected ear signal can be particularly simple in this case; for example, a 32-channel noise reduction scheme can be controlled by sending 32 bits to encode whether each channel is primarily signal (bit value=1) or noise (bit value=0). Preferably, a maximum cumulative gain reduction and a maximum cumulative gain increase are applied in each frequency channel.
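The channel decision rule and the cumulative gain limits of this embodiment may be sketched in Python as follows. The 30th and 90th percentiles and the 2 dB criterion are taken from the description; the function names, the step size, and the limits of 6 dB increase and 12 dB reduction are hypothetical values chosen only for the example.

```python
def is_primarily_signal(amplitude_db, p30_db, p90_db, min_range_db=2.0):
    """Judge whether a frequency channel currently contains primarily signal."""
    if amplitude_db < p30_db:
        return False                      # amplitude below the 30th percentile
    if p90_db - p30_db < min_range_db:
        return False                      # 90th percentile < 2 dB above the 30th
    return True


def update_gain_offset(offset_db, signal_flag, step_db=0.1,
                       max_increase_db=6.0, max_reduction_db=12.0):
    """Raise or lower the channel gain offset, within cumulative limits.

    step_db, max_increase_db and max_reduction_db are hypothetical values.
    """
    offset_db += step_db if signal_flag else -step_db
    # Enforce the maximum cumulative gain increase and reduction.
    return max(-max_reduction_db, min(max_increase_db, offset_db))
```

In use, the decision would be made from the selected ear only, and the resulting gain offset for each channel would be applied to the signals for both ears.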
In a second embodiment of the invention, the gains or gain reductions for each frequency channel are calculated in the selected ear in the same manner as for a conventional monaural noise reduction scheme and transmitted from the selected ear to the unselected ear and applied simultaneously to the signal for each ear.
In a third embodiment of the invention, the amplitude and dynamic range (or signal-to-noise ratio) for each frequency band are transmitted from the selected ear to the unselected ear and applied in identical noise reduction algorithms in both ears simultaneously.
The advantages of these embodiments of the present invention comprise: more accurate assessment of signal and noise levels in the unselected ear by utilizing information from the ear with the better SNR; avoidance of the creation of artificial streaming events that could disrupt the normal binaural processing of sounds; emphasis of the signal relative to the noise in such a manner as to improve the signal-to-noise ratio in the unselected ear; minimization of the data transmission requirements and hence of the additional power consumption of the devices; intelligent switching of data transmission from one ear to the other to halve power consumption relative to a device that always transmits data in both directions; and intelligent switching off of data transmission when binaural noise reduction is not required, to reduce battery consumption.
Some portions of this detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent series of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures where data are maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the invention is described in the foregoing context, it is not meant to be limiting as those of skill in the art will appreciate that various of the acts and operations described may also be implemented in hardware.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the description, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Dillon, H., Hearing Aids, Boomerang Press, 2001.
U.S. Pat. No. 6,731,767; Adaptive Dynamic Range of Optimization Sound Processor; Blamey P J, James C J, Wildi K, McDermott H J, Martin L F A.
Bregman, A. S. “Auditory Scene Analysis: The Perceptual Organization of Sound,” MIT Press, Cambridge, Mass., 1990.
PCT/AU2006/001778; Method and Device for Low Delay Sound Processing; Dickson B, Steele B R (2006).
Number | Date | Country | Kind |
---|---|---|---|
2008904473 | Aug 2008 | AU | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/AU2009/001104 | 8/27/2009 | WO | 00 | 10/18/2011 |