The present invention relates to processing a sound signal in order to adjust characteristics of the sound signal to meet respective target levels. Such sound signal processing is of application in hearing aid sound signal processing, telecommunications sound signal processing, and the like.
Processing of sound for audio applications usually requires the sound signal to be amplified or adjusted to fall within a target dynamic range across the audio band, generally considered to be 20 Hz-20 kHz or a sub-range thereof. The target dynamic range is typically determined by the next stage of processing or, where the signal is to be optimised for a listener, by the intensity range that is both audible and comfortable at each frequency to a human listener.
In the case of a sound transmission system such as a telephone or a sound recording system, the target dynamic range will be the operating dynamic range of the transmission line or storage medium. Some or all of the frequency components of the sound signal being processed may fall outside of the target dynamic range, or there may be a mismatch between the dynamic range of the signal and the target dynamic range.
In the case of a human listener, there is usually an additional requirement that the target dynamic range must be matched or set in a controlled manner across the audible frequency band to produce matched loudness or a prescribed frequency response for the system as a whole. Such a prescribed frequency response generally aims to maximise the intelligibility of speech sounds without compromising the comfort of the listener or the quality of the sound. For musical sounds the target dynamic range and/or frequency response may be chosen to achieve a particular tone or balance of high and low pitch sounds, according to the preference of the listener.
Further, for each human listener, the target dynamic range may vary considerably across frequency, and may be narrow in extent between a minimum audible threshold and a maximum comfort threshold, particularly for a listener with impaired hearing. Similarly, the useable or optimal dynamic range for a listener with normal hearing may vary considerably across frequency and may be narrow in extent when there is ambient noise that masks the lower part of the listener's dynamic range.
A simple approach to address such problems is the use of a linear amplifier/attenuator designed to maximise the overlap of the dynamic range of the sound signal with the target dynamic range. A further refinement is to provide a sound processor that provides differing amounts of gain at different frequencies to optimise the match of the output signal to the target dynamic range in each frequency band. A subsequent processing stage may truncate the output dynamic range, for example at an upper end by saturation of an input mechanism, and at a lower end by thresholding or resolution limitations. However, where the signal is for delivery to a human listener, a lack of any truncation of the upper end of the output dynamic range may result in discomfort, trauma, or damage to the auditory system. For these and other reasons, a maximum power output level or other type of limiting mechanism is usually applied to the output of a linear sound processing system.
A more complex solution to the problems described above involves use of a compression scheme. Compression usually applies more gain to softer sounds and less gain to louder sounds such that the output dynamic range is less than or “compressed” relative to the input dynamic range. Thus, compression is a non-linear signal processing scheme. The ratio of the input dynamic range to the output dynamic range is known as the compression ratio. Compression parameters are often described in terms of a fixed input/output function at each frequency, as illustrated by the input/output functions 110, 120, 130 of
The compression ratio is the inverse of the slope of the input/output function. As shown in
Among the most sophisticated signal processing techniques addressing such issues is the adaptive dynamic range optimisation (ADRO) technique set out in U.S. Pat. No. 6,731,767, the content of which is incorporated herein by reference. Rather than focusing on prescriptive gain or gain compression profiles, the ADRO technique defines a target dynamic range which is desired for the output sound signal, and adjusts a gain applied to an input signal in order to maintain a close match of the actual output dynamic range to the target dynamic range. The output level of the ADRO signal processor is thus constrained by a set of processing rules defined by fixed parameters. While the processing rules are satisfied, the signal processor operates as a linear amplifier. Should the processing rules not be satisfied, the gain applied by the processor is adaptively altered until the processing rules are satisfied.
For each frequency band, the ADRO signal processor determines the accuracy of the match of the output dynamic range to the target dynamic range, by taking a statistical measure of percentile estimators. A 30th percentile estimator provides a measurement of a level below which the output signal remains for 30% of the measurement period. Where the signal is being processed for a human listener, the lower end of the target dynamic range is predefined by determining an audibility threshold of the listener. Should the 30th percentile estimator be below the audibility threshold, the gain is increased slowly. A 90th percentile estimator provides a measurement of a level below which the output signal remains for 90% of the measurement period. Again, where the signal is being processed for a human listener, the upper end of the target dynamic range is predefined by determining a boundary comfort level of the listener. Should the 90th percentile estimator be above the boundary comfort level, the gain is decreased slowly. The 30th and 90th percentile estimators are thus of use in determining how well the output dynamic range matches the target dynamic range.
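By way of non-limiting illustration only, a running percentile estimator of the kind referred to above may be sketched as follows. The step-up/step-down update shown is one common approach and is an assumption of this sketch rather than the scheme of U.S. Pat. No. 6,731,767; the class name, the `step_db` parameter and the initial value are hypothetical.

```python
class PercentileEstimator:
    """Running estimate of the level below which a signal stays for a
    given fraction of the time (e.g. 0.30 or 0.90)."""

    def __init__(self, fraction, step_db=0.1, initial_db=0.0):
        self.fraction = fraction      # target fraction, e.g. 0.90 for a 90th percentile
        self.step_db = step_db        # hypothetical step size controlling tracking speed
        self.estimate_db = initial_db

    def update(self, level_db):
        # Step up when the sample exceeds the estimate, down otherwise.
        # Weighting the steps by `fraction` and `1 - fraction` makes the
        # estimate settle where the signal lies below it for `fraction`
        # of the samples.
        if level_db > self.estimate_db:
            self.estimate_db += self.step_db * self.fraction
        else:
            self.estimate_db -= self.step_db * (1.0 - self.fraction)
        return self.estimate_db
```

In such a sketch, a larger `step_db` gives faster tracking at the cost of a noisier estimate, which corresponds loosely to a percentile estimate slew rate.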
Two further rules are imposed in each frequency band when ADRO is applied for a human listener. The maximum output rule compares the magnitude of the output signal with a fixed maximum output limit. If the magnitude of the output signal is greater than the fixed maximum output limit, the magnitude is capped to the maximum output limit. The maximum gain rule compares the gain with a fixed maximum gain limit, and prevents the gain from exceeding the fixed maximum gain limit.
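The four rules described above may, purely as an illustrative sketch, be expressed for a single frequency band as follows. The function names, the fixed step derived from a slew rate and an update interval, and the choice to test the comfort rule before the audibility rule are assumptions of this sketch.

```python
def adro_gain_step(gain_db, p30_db, p90_db, audibility_db, comfort_db,
                   max_gain_db, slew_db_per_s, dt_s):
    """Return the updated gain (dB) for one band after one update interval."""
    step = slew_db_per_s * dt_s
    if p90_db > comfort_db:
        # Comfort rule: output is too loud too often, so reduce gain slowly.
        gain_db -= step
    elif p30_db < audibility_db:
        # Audibility rule: output is too soft too often, so increase gain slowly.
        gain_db += step
    # Maximum gain rule: never exceed the fixed maximum gain limit.
    return min(gain_db, max_gain_db)


def apply_max_output(level_db, max_output_db):
    """Maximum output rule: cap the output level at the fixed limit."""
    return min(level_db, max_output_db)
```

While neither percentile rule is violated, the gain returned is unchanged, which corresponds to the linear amplifier behaviour noted above.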
The ADRO processing scheme has been shown to provide improved audibility of soft sounds, improved intelligibility of speech both in quiet and in noise, and increased comfort and listener preferences relative to linear amplification and compression schemes. The ADRO processing scheme adapts the gain of the amplifier independently in each frequency band to provide optimum listening conditions based on the fixed parameters.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
According to a first aspect, the present invention provides a method of processing at least one input sound signal to meet a target dynamic range, the method comprising:
applying at least one input sound signal-specific gain to the at least one input sound signal to produce a processed sound signal;
measuring a dynamic range of the processed sound signal;
determining a match of the measured dynamic range with the target dynamic range; and
adjusting each input sound signal-specific gain in accordance with at least one input sound signal-specific parameter to improve the match of dynamic range of the processed sound signal to the target dynamic range, wherein the at least one input sound signal-specific parameter is adaptive in response to at least one monitored signal condition.
According to a second aspect, the present invention provides a device for processing at least one input sound signal to meet a target dynamic range, the device comprising:
a gain stage for applying at least one input sound signal-specific gain to the at least one input sound signal to produce a processed sound signal;
an analyser for measuring a dynamic range of the processed sound signal and for determining a match of the measured dynamic range with the target dynamic range; and
a gain controller for adjusting each input sound signal-specific gain in accordance with at least one input sound signal-specific parameter to improve the match of dynamic range of the processed sound signal to the target dynamic range, wherein the at least one input sound signal-specific parameter is adaptive in response to at least one monitored signal condition.
According to a third aspect, the present invention provides a computer program for processing at least one input sound signal to meet a target dynamic range, the computer program comprising:
code for applying at least one input sound signal-specific gain to the at least one input sound signal to produce a processed sound signal;
code for measuring a dynamic range of the processed sound signal;
code for determining a match of the measured dynamic range with the target dynamic range; and
code for adjusting each input sound signal-specific gain in accordance with at least one input sound signal-specific parameter to improve the match of dynamic range of the processed sound signal to the target dynamic range, wherein the at least one input sound signal-specific parameter is adaptive in response to at least one monitored signal condition.
According to a fourth aspect, the present invention provides a computer program element comprising computer program code means to make a computer execute a procedure for processing at least one input sound signal to meet a target dynamic range, the computer program element comprising:
computer program code means for applying at least one input sound signal-specific gain to the at least one input sound signal to produce a processed sound signal;
computer program code means for measuring a dynamic range of the processed sound signal;
computer program code means for determining a match of the measured dynamic range with the target dynamic range; and
computer program code means for adjusting each input sound signal-specific gain in accordance with at least one input sound signal-specific parameter to improve the match of dynamic range of the processed sound signal to the target dynamic range, wherein the at least one input sound signal-specific parameter is adaptive in response to at least one monitored signal condition.
The at least one input sound signal may comprise a single sound signal such as a sound signal obtained from a microphone or a sound signal obtained from a transmission medium. Alternatively, the input sound signal may comprise a transformation of a single sound signal.
Alternatively, the input sound signal may comprise a portion of a sound signal and/or may comprise a transformation of a portion of a sound signal. In such embodiments, a plurality of input sound signals may be processed in accordance with the present invention, each input sound signal corresponding to a unique portion of a single sound signal.
The at least one input sound signal may comprise a portion of a sound signal obtained by frequency domain filtering, such that the at least one input sound signal comprises only those frequency components of the sound signal falling within a constrained frequency band. A plurality of such input sound signals, having a one-to-one correspondence with a plurality of frequency bands, may be processed in accordance with the present invention.
Additionally or alternatively the at least one input sound signal may comprise a portion of a sound signal obtained by a frequency transform approximation, such as a sine wave basis function transform. Additionally or alternatively the at least one input sound signal may comprise a portion of a sound signal obtained by time domain processing. Additionally or alternatively the at least one input sound signal may comprise a portion of a sound signal obtained by use of wavelet functions.
One input sound signal-specific gain may be applied to the or each input sound signal. Alternatively, a plurality of input sound signal-specific gains may be applied to the or each input sound signal.
In some embodiments of the invention, the monitored signal condition may comprise a measurement of a mismatch between the measured dynamic range and the target dynamic range. In such embodiments, the at least one input sound signal-specific parameter preferably comprises a gain slew rate of the gain adjustment, and such embodiments may further comprise controlling the gain slew rate to be larger when the mismatch is larger, and controlling the gain slew rate to be smaller when the mismatch is smaller. Such embodiments may be of use in providing a speedy settle time of the input sound signal-specific gain in response to a mismatch between the output dynamic range and the target dynamic range, even where the mismatch is large. Such embodiments may thus provide both for speedy suppression of overly loud audio signals such as an alarm, and for more measured gain refinements in the absence of a large mismatch.
In embodiments where the at least one input sound signal-specific parameter comprises gain slew rate, the gain slew rate for an increase in gain may be controlled to be different to the slew rate for a decrease in gain. For example, the gain slew rate for a reduction in gain may be permitted to be large, while the gain slew rate for an increase in gain may be limited to a moderate gain slew rate. Such embodiments may provide for swift suppression of audio shock signals such as facsimile tones or alarms, while providing for restrained gain increases, for example to avoid overly hasty gain increases during quiet signal periods.
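As a minimal sketch of such asymmetric control, the following function limits gain increases and gain decreases to different rates. The notion of a `desired_db` gain towards which the processor moves, and the default rates of 3 dB/s and 20 dB/s, are illustrative assumptions only.

```python
def slew_limited_gain(current_db, desired_db, dt_s,
                      up_rate_db_per_s=3.0, down_rate_db_per_s=20.0):
    """Move the gain towards `desired_db`, limiting increases to a moderate
    rate and permitting decreases at a larger rate."""
    delta = desired_db - current_db
    if delta >= 0:
        # Gain increase: restrained, to avoid hasty increases in quiet periods.
        return current_db + min(delta, up_rate_db_per_s * dt_s)
    # Gain decrease: swift, to suppress sudden loud sounds.
    return current_db - min(-delta, down_rate_db_per_s * dt_s)
```

With these assumed defaults, a 30 dB reduction would complete in 1.5 seconds, whereas a 30 dB increase would take 10 seconds.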
In some embodiments of the invention, the at least one monitored signal condition may comprise an ambient noise signal condition. The ambient noise signal condition may be monitored from the same signal as is to be processed by the sound processor. Additionally or alternatively, the ambient noise signal condition may be monitored from at least one other signal, obtained from at least one microphone in the environment of a listener of the processed sound signal. In such embodiments, the at least one input sound signal-specific parameter may comprise one, and preferably comprises both, of a target audibility level and a target comfort level.
In some embodiments of the invention, the monitored signal condition may comprise monitoring for the presence of audio shock, in order to detect facsimile tones, alarms, loud speech and/or other types of audio shock. In such embodiments, the at least one input sound signal-specific parameter may comprise gain slew rate, wherein a large gain reduction slew rate is imposed for gain reduction in response to detection of presence of an audio shock. In such embodiments, the at least one input sound signal-specific parameter may additionally comprise a maximum output limit, wherein the maximum output limit is reduced in response to detection of presence of an audio shock.
In further embodiments of the invention, the gain may be prevented from increasing during periods in which no signal-of-interest is present. Such embodiments of the invention preferably further comprise monitoring an input signal in order to determine periods in which a signal-of-interest is present, and periods in which no signal-of-interest is present.
In embodiments of the invention in which the at least one monitored signal condition comprises ambient noise, the target dynamic range of the at least one input sound signal may be adaptive in response to the ambient noise. In such embodiments, a lower end of the target dynamic range may be increased in response to an increase in ambient noise level, in order to maintain the target dynamic range above the ambient noise level. Additionally or alternatively, in such embodiments an upper end of the target dynamic range may be increased in response to an increase in ambient noise, by an amount corresponding to an increase in listener's comfort level with ambient noise. Such embodiments are advantageous in providing a signal processing scheme whereby the target dynamic range is adaptive to allow for changes in ambient noise level. Further, such embodiments recognise that a listener's comfort level is often higher in the presence of greater ambient noise than in the presence of lesser ambient noise, and thus adapt the target dynamic range accordingly.
Further, in embodiments of the invention in which the at least one monitored signal condition comprises ambient noise, the target dynamic range of at least one high frequency band is preferably raised more than the target dynamic range of at least one low frequency band. Such embodiments recognise that low frequency noise impacts the intelligibility of high frequency components of the signal, that telephone speakers typically have greater high frequency capabilities, and that speech typically shifts towards higher frequencies with increasing volume, and further recognise the high frequency character of Hoth noise.
In embodiments of the invention, one or more of the following parameters relating to one or more input sound signals may be adaptive in response to the at least one monitored signal condition: maximum output limit(s), comfort target(s), audibility target(s), background noise target(s), maximum gain(s), minimum gain(s), increasing gain slew rate(s), decreasing gain slew rate(s), increasing percentile estimate slew rate(s) and decreasing percentile estimate slew rate(s).
In some embodiments of the invention, a plurality of input sound signals may be processed. In such embodiments, the at least one input sound signal-specific parameter of a first input sound signal may differ from the at least one input sound signal-specific parameter of a second input sound signal. For example, where the present invention is implemented in a telephone system with a send signal and a receive signal, the input sound signal-specific parameters of the receive signal may be controlled in response to ambient noise in the send signal. Where the present invention is implemented in a stereo listening device or a pair of hearing aids, the at least one input sound signal-specific parameter may be controlled in response to monitored conditions of two signals.
In preferred embodiments of the invention, a plurality of frequency bands of the sound signal are each processed in accordance with the method of the present invention. In such embodiments, the sound signal is preferably earlier divided by a filter bank into a plurality of frequency bands for separate processing. Alternatively, the present invention may be applied in a single frequency band of the sound signal, for example in embodiments where the sound signal is processed as a single band signal, or in embodiments where only one of a plurality of bands of the signal is desired to be processed in accordance with the present invention. For example, a frequency band encompassing facsimile tone frequencies may be a sole band in which the processing of the present invention is applied in a multi-band processing scheme.
Embodiments of the present invention may be applied in conjunction with the ADRO technique set out in U.S. Pat. No. 6,731,767. However, embodiments of the present invention may be applied in conjunction with any sound processing technique in which a signal is processed to be matched to a parameter-defined target dynamic range.
It is to be appreciated that the phrase “sound signal” is used herein to refer to any signal conveying or storing sound information, and includes an electrical, optical, electromagnetic or digitally encoded signal.
Examples of the invention will now be described with reference to the accompanying drawings in which:
FIGS. 3a and 3b are schematics of a sound processing scheme in a duplex system in which a monitored signal condition of a line out influences signal processing parameters of the line in, in accordance with a fifth embodiment of the present invention;
FIGS. 6a and 6b are graphs of the magnitude of a difference between an upper end and a lower end of a target dynamic range for varying signal to noise ratio (SNR), in bands centred at 250 Hz and 1 kHz respectively, suitable for use as look-up tables to determine signal activity in the first to fourth embodiments of
a illustrates variation of target dynamic range parameters with varying ambient noise in accordance with the fifth embodiment of
b illustrates the variation of sound level resulting from the parameter variation of
c illustrates variation of the signal to noise ratio resulting from the parameter variation of
d illustrates the improved intelligibility resulting from the parameter variation of
e illustrates variation of perceived loudness of the output signal in the presence of ambient noise resulting from the parameter variation of
a is a spectrogram of a sound signal processed by ADRO with fixed gain slew rate in which an alarm commences and then halts;
b is a spectrogram of a sound signal processed by ADRO with adaptive gain slew rate in which an alarm commences and then halts;
c is a plot of gain vs. time for a particular frequency band containing alarm frequency components for both the fixed gain slew rate of
The adaptive parameter processor 286 takes inputs both from the line input 280 and from a secondary source such as an ambient noise microphone 290. For example, in a duplex system the ambient noise microphone signal 292 may be from a headset or handset voice microphone used to obtain a voice signal from the listener. Alternatively, the ambient noise microphone signal 292 may be from another microphone measuring the acoustic environment in the proximity of the listener. The adaptive parameter processor 286 can also use statistics such as output percentile estimates 294 from the ADRO processor 282.
In the fourth embodiment of
An adaptive parameter processor 310 monitors a signal condition of microphone signal 312, and influences signal processing parameters applied by ADRO processor 350 to the line in signal 352. Adaptive parameter processor 310 comprises a signal activity detector 314 which monitors microphone signal 312 to determine whether a signal of interest is present on microphone signal 312, or whether ambient noise is the only signal present on microphone signal 312.
Adaptive parameter processor 310 further comprises an environmental noise estimator 316. Should signal activity detector 314 indicate that a signal of interest is present on microphone signal 312, operation of environmental noise estimator 316 may be paused to ensure that ambient noise measurements are not corrupted by non-noise signals. Environmental noise estimator 316 monitors microphone signal 312 in order to determine properties of the environmental noise in a listener's vicinity. Such properties can include estimates of ambient or environmental noise level, noise dynamic range, noise modulation or other properties useful for adapting the ADRO target levels. Such properties may be determined for the noise signal as a whole, or for noise signal sub-components determined by a frequency or transform domain filter-bank (not shown).
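The pausing of the noise estimate while a signal of interest is present may be sketched, under stated assumptions, as follows. The exponential smoothing used here, the class name and the `smoothing` constant are hypothetical; the activity flag is assumed to be supplied by a separate signal activity detector.

```python
class EnvironmentalNoiseEstimator:
    """Tracks an ambient noise level (dB), holding the estimate whenever
    a signal of interest is reported on the microphone signal."""

    def __init__(self, initial_db=40.0, smoothing=0.05):
        self.noise_db = initial_db
        self.smoothing = smoothing    # hypothetical smoothing constant in (0, 1]

    def update(self, mic_level_db, signal_of_interest_present):
        if not signal_of_interest_present:
            # Only noise is present: move the estimate towards the measured level.
            self.noise_db += self.smoothing * (mic_level_db - self.noise_db)
        # Otherwise the estimate is held, so speech does not corrupt it.
        return self.noise_db
```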
Adaptive parameter processor 310 further comprises an adaptive targets processor 318, which adapts dynamic range target parameters such as comfort and audibility targets in each band in response to an estimate of the environmental noise produced by environmental noise estimator 316. An output error estimator 320 determines an output error by measuring a mismatch between the output dynamic range defined by percentile estimates obtained by percentile estimator 360 and a target dynamic range defined by adaptive targets which are controlled by adaptive targets processor 318.
An adaptive rate processor 322 controls slew rates of variable gain controller 358, in particular a gain slew rate and a percentile estimate slew rate. As discussed in more detail with reference to
A filter signal activity detector 324 monitors the output of filter bank analyser 354 and, with reference to the current percentile estimates obtained by percentile estimator 360, assesses whether a signal-of-interest is present, or whether only noise is present. Such an assessment may then be used to influence the adaptive gain and/or the adaptive gain slew rate. For example, during a period in which filter signal activity detector 324 determines that no signal-of-interest is present, adaptive rate processor 322 may prevent any increase in gain. Such control may prevent processor gain increasing during a pause in input signal, only for the gain to have become excessive by the time the input signal resumes.
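A minimal sketch of such gating is given below. It assumes that a gain change has already been proposed by the gain rules and that a boolean activity decision is available; both names are hypothetical.

```python
def gated_gain_change(proposed_change_db, signal_of_interest_present):
    """Suppress gain increases while no signal of interest is present.
    Gain decreases are always allowed through."""
    if proposed_change_db > 0 and not signal_of_interest_present:
        return 0.0
    return proposed_change_db
```

Gating only the increases means that protective gain reductions remain available during pauses, while the gain cannot creep upward before the input signal resumes.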
An activity estimator 580 uses an output of a 50th percentile estimator 570 and the modulation estimate 560 to provide a signal activity level 590. The modulation criterion is based on a well defined relationship between speech to noise ratio (SNR) and modulation levels as measured by the percentile estimate difference. For example,
The output 590 of a signal activity detector 500, when implemented as signal activity detector 314 in the sound processing apparatus 300 of
The calibration and weighting filters 420 initially applied to the microphone signal 410 measured for environmental noise estimation purposes are used to alter the spectral content of the signal to make it more suitable for processing, and/or to compensate and calibrate for microphone properties. For example, a filter with an ‘A’ weighting response can be used to emphasise those frequencies in the signal that are perceived more loudly by normal hearing listeners. In another example, a differentiator filter (y[n]=x[n]−x[n−1]) can be used to produce a high pass response, and thereby remove low frequency noise and transients commonly present in the microphone signal but not relevant for the noise estimate performed by environmental noise estimator 400/316.
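The differentiator filter y[n]=x[n]−x[n−1] noted above may be sketched as follows. The sample preceding the first input is assumed to be zero, and the function name is hypothetical.

```python
def differentiate(samples):
    """First-difference filter y[n] = x[n] - x[n-1], assuming x[-1] = 0."""
    output = []
    previous = 0.0
    for x in samples:
        output.append(x - previous)
        previous = x
    return output
```

A constant (zero frequency) input produces zero output after the first sample, which illustrates the high pass character of the filter; its magnitude response is 2|sin(ω/2)| for normalised angular frequency ω.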
The output of environmental noise estimator 316 is used by the adaptive target processor 318 to adapt the ADRO target dynamic range, particularly by varying the comfort target, audibility target and maximum output limit parameters. The primary objectives of such parameter variations are to maintain the intelligibility, audibility and comfort of the received signal, despite significant noise level variation. This parameter variation can be based on a simple linear or non-linear relationship between noise and target levels, or a more complex relationship that takes into account priorities for comfort, audibility and/or intelligibility depending on application and/or personal preference.
An increase in MOL and comfort level will usually be acceptable due to the increased ability of the listener to handle loud noise in the presence of substantial ambient noise. An increase in the audibility target parameter maintains the target dynamic range above the ambient noise, even as the ambient noise increases. Above a second threshold, further increases in the target dynamic range parameters are not permitted even with further increases in ambient noise, to prevent hearing damage to the listener by the output sound signal.
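One simple relationship of the kind described, in which a target parameter rises with ambient noise between two thresholds and is held constant above the second threshold, may be sketched as follows. The linear form, the `slope` parameter, and the behaviour below the lower threshold (target left at its base value) are assumptions of this sketch.

```python
def adapt_target(base_target_db, noise_db,
                 lower_threshold_db, upper_threshold_db, slope=1.0):
    """Raise a target level (e.g. audibility, comfort or maximum output limit)
    as ambient noise rises, with no further increase above the upper threshold."""
    # Clamp the noise level into the range over which adaptation is permitted.
    clamped = min(max(noise_db, lower_threshold_db), upper_threshold_db)
    return base_target_db + slope * (clamped - lower_threshold_db)
```

Using a smaller `slope` for the comfort target than for the audibility target would narrow the target dynamic range as noise increases.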
b illustrates the variation of sound level resulting from the parameter variation of
The variation of parameters can be made in common across all ADRO processing channels, or made independently so that the parameter adaptation is customised to each frequency band or filter-bank channel. Further, parameter adaptation can be customised to each frequency band in response to band-specific properties such as a noise level or noise property estimate in the frequency band or filter-bank sub-band. Such band-specific parameter adaptation may cause the target dynamic range variation in one ADRO band to respond to the noise properties at common masking frequencies more than to the noise at other frequencies.
Such a variation with frequency recognises the upward spread of masking, a phenomenon in psychoacoustics which suggests that the threshold for audibility of hearing at one frequency is conditioned by the presence of interfering noise components at lower frequencies more than at higher frequencies. That is, a noise component will tend to mask signals occurring at and above the masking frequency, more than at and below the masking frequency.
The frequency variable adaptation of parameters shown in
The frequency variable adaptation of parameters shown in
Still further, the frequency variable adaptation of parameters shown in
Again, in each frequency band a difference between the comfort level target 940 and the audibility target 930 (high ambient noise) is less than a difference between the comfort level target 920 and the audibility level target 910 (low ambient noise). In general, the response to higher levels of ambient noise can be improved by reducing the target dynamic range and/or compressing the input dynamic range. Additional audibility can be obtained without exceeding comfort limits by compressing the signal or raising the audibility target towards an upper limit such as the comfort target. This is particularly useful at higher frequencies (above 2 kHz), where output levels are closer to comfort or maximum output limits, and where significant information for speech intelligibility in noise is still contained.
When the audibility target is raised towards the comfort target, the ADRO processor will more often increase gain in soft periods of the signal where the comfort target is not active, such that in these periods there is improved audibility over the background noise. With this arrangement, the signal dynamic range is minimally compressed or distorted over the short term, but has improved audibility without violating comfort targets over the longer term.
Alternatively or additionally, the dynamic range of the signal can be directly compressed before application of the ADRO processor rules, based on for example the proximity of the lower end of the signal dynamic range to the middle or upper end of the noise dynamic range. This allows the time constant and ratio of any compression effect to be set independently of the ADRO processor rules, but causes higher distortion due to the increased rates of gain change commonly used with compression systems. This process is therefore most useful only when the audibility of the signal is more important than sound quality of the signal, as can be the case when the ambient noise level is particularly high.
Further, it is noted that adaptive rate processor 322 provides for adaptive control over gain slew rate in response to a monitored signal condition. The present invention recognises that existing implementations of ADRO adapt gain levels at a constant slew rate, typically 3 dB per second. This rate is constant under all circumstances, regardless of the magnitude of change or rate of change in the input audio conditions. While this helps to ensure less distortion and ‘pumping’ of the gain levels in response to small input changes typical in speech, the present invention recognises that such a low constant slew rate causes a slower response than may be required when the changes in input signal are more significant. Thus, adaptive rate processor 322 may be configured to provide for a more rapid gain slew rate in response to sudden large input signal changes. For example, adaptive rate processor 322 may be configured to provide for a more rapid gain slew rate at hearing aid turn-on, such as the initial response to a particularly quiet or loud audio environment. Further, adaptive rate processor 322 may be configured to provide for a more rapid gain slew rate during or after testing, in which an extended high output may otherwise result after maximised gain due to very quiet initial ambient testing levels. Further, adaptive rate processor 322 may be configured to provide for a more rapid gain slew rate to suppress a source of acoustic startle such as an alarm or facsimile tone.
In implementing the adaptive rate processor 322, the present invention applies the following design principles: avoiding unnecessary increases in slew rate, particularly during normal conditions in speech or music; avoiding slew rates so high that they virtually remove or unduly reduce a cue for a change in levels; and avoiding slew rates faster than a ‘slow time constant’ rate (e.g. up to 20 dB/sec) for reasons of sound quality and numerical stability.
In one embodiment, the adaptive rate processor may comprise a non-linear function or look-up table using a measure of ‘distance’ of the current output dynamic range to the target dynamic range, to determine an adjustment to the slew rate. The non-linear function is zero or very small for relatively small distances (e.g. during speech or music), but becomes larger when conditions have changed more strongly and the distance is more significant. Some examples of modelled analytical functions for gain slew rate that include such a non-linear term are given below:
In these equations, k is an index identifying the signal or part of a signal being controlled, |f(k)| is the ‘distance’ metric, and K, q and M are constants that shape and position the non-linear response. A minimum gain slew rate of 3 dB/s is assumed in these particular equations; however, the minimum gain slew rate could be made slower than 3 dB/s. A slower slew rate when the distance is very small could improve sound quality slightly by keeping the gain more stable when the input signal level is at an equilibrium.
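By way of illustration only, a slew rate function with the general character described above can be sketched as follows. The functional form, the default values of K, q and M, and the 20 dB/s ceiling used here are assumptions chosen for the sketch and are not the modelled functions of the specification; the name adaptive_slew_rate is hypothetical.

```python
def adaptive_slew_rate(distance_db, K=0.05, q=2.0, M=10.0,
                       min_rate=3.0, max_rate=20.0):
    """Return a gain slew rate in dB/s for a distance metric f(k) in dB.

    Assumed form (illustrative only):
        rate = min_rate + K * max(0, |f(k)| - M) ** q, capped at max_rate
    M positions the knee, K scales the non-linear term and q shapes it.
    """
    # Distances inside the knee M leave the slew rate at its minimum,
    # so ordinary speech or music does not speed up gain changes.
    excess = max(0.0, abs(distance_db) - M)
    rate = min_rate + K * excess ** q
    # Cap at a 'slow time constant' rate for sound quality and stability.
    return min(rate, max_rate)


if __name__ == "__main__":
    for d in (0.0, 5.0, 20.0, 60.0):
        print(d, adaptive_slew_rate(d))
```

With these assumed constants, distances of up to 10 dB leave the slew rate at 3 dB/s, while a very large mismatch saturates at the 20 dB/s ceiling.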
Sample ‘distance’ metrics which may be applied to determine a magnitude of a mismatch between an output signal dynamic range and a target dynamic range include a measure of difference between the dynamic range targets and the percentile estimates, as these are the pre-existing parameters in the system. For example:
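$f(P_k, T_k) = T_{\mathrm{Comfort},k} - P_{90,k}$
$f(P_k, T_k) = T_{\mathrm{Audibility},k} - P_{30,k}$
$f(P_k, T_k) = (T_{\mathrm{Comfort},k} + T_{\mathrm{Audibility},k}) - (P_{90,k} + P_{30,k})$
$f(P_k, T_k) = (T_{\mathrm{Comfort},k} + T_{\mathrm{Audibility},k})/2 - P_{50,k}$
$f(P_k, T_k) = (2 \cdot T_{\mathrm{Comfort},k} - 25) - (P_{90,k} + P_{30,k})$
where $P_k$ and $T_k$ are the sets of percentile estimates and targets for the kth signal or part of a signal, respectively.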
The temporal behaviour of these distance metrics is dictated by the relative step rates of the percentile estimates. Hence the first two metrics tend to cause asymmetric responses, resulting in faster slew rates to change gain in one direction (up or down) than the other. The third and fourth metrics combine these results to produce a more symmetric slew rate response. The final metric above replaces the audibility target with a ‘comfortable speech 30th percentile target’, $T_{\mathrm{Comfort},k} - 25$, to provide a balanced response, with less bias when at equilibrium.
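The five sample metrics may be evaluated for a single band as in the following minimal sketch. The function name, the dictionary keys and the numerical values in the usage example are hypothetical, and the third metric is taken here to use the 90th percentile estimate.

```python
def distance_metrics(p30, p50, p90, t_comfort, t_audibility):
    """Evaluate the five sample distance metrics for one band.

    p30, p50, p90: percentile estimates of the output level (dB).
    t_comfort, t_audibility: comfort and audibility targets (dB).
    """
    return {
        # Comfort target against the 90th percentile estimate.
        "comfort": t_comfort - p90,
        # Audibility target against the 30th percentile estimate.
        "audibility": t_audibility - p30,
        # Sum of the two metrics above (more symmetric response).
        "sum": (t_comfort + t_audibility) - (p90 + p30),
        # Midpoint of the targets against the median estimate.
        "midpoint": (t_comfort + t_audibility) / 2 - p50,
        # Audibility target replaced by (comfort target - 25).
        "comfort_speech": (2 * t_comfort - 25) - (p90 + p30),
    }


if __name__ == "__main__":
    # Hypothetical band: output sits below both targets.
    print(distance_metrics(p30=40, p50=50, p90=60,
                           t_comfort=70, t_audibility=45))
```

A positive value indicates that the output lies below the corresponding target, and a negative value that it lies above; the magnitude is what a slew rate function would use as |f(k)|.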
Notably, following commencement of the alarm, the fixed slew rate gain plot 1320 decreases only at the allowed fixed rate of 3 dB/s. Consequently, during the period 6.5 seconds to about 20 seconds, the fixed slew rate system of
Thus, in a sound environment of a listener with normal hearing listening to a telephone signal transmitted by a telephone line or a mobile telephone, the present invention recognises that the audibility of the signal will depend on the masking effect of the ambient noise in the listener's noise environment. Accordingly, in the present embodiment the ambient noise of the listener's noise environment is monitored and used as a monitored signal condition for adaptively varying at least one band-specific parameter. The masking effects of such ambient noise may depend on the type of noise and on the level of the noise. One, some or all of the ADRO parameters (including maximum output limit, comfort target, audibility target, maximum gain and gain slew rate, for each band) may be made adaptable to maintain the signal at an audible level relative to the listener's ambient noise conditions, while still maintaining the comfort of the listener. As such parameters are varied over time in response to the ambient noise, the normal adaptive function of ADRO will simultaneously compensate for changes in the input signal to fit the output dynamic range to the target dynamic range, in accordance with the rules specified by the adaptive band-specific parameter(s).
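As one hypothetical illustration of adapting a band-specific parameter to ambient noise, the audibility target of a band might be raised above the monitored noise level while being kept below the comfort target. The rule itself, the function name and the margin values below are assumptions made for this sketch; the specification does not prescribe them.

```python
def adapt_audibility_target(noise_db, base_audibility_db, comfort_db,
                            margin_db=5.0, min_range_db=10.0):
    """Return an audibility target (dB) adapted to ambient noise in one band.

    Assumed rule (illustrative only): keep the target at least margin_db
    above the noise, never below its quiet-condition value, and at least
    min_range_db below the comfort target so that comfort is maintained.
    """
    wanted = max(base_audibility_db, noise_db + margin_db)
    ceiling = max(base_audibility_db, comfort_db - min_range_db)
    return min(wanted, ceiling)


if __name__ == "__main__":
    # Hypothetical band with a 45 dB quiet target and a 70 dB comfort target.
    for noise in (30.0, 50.0, 80.0):
        print(noise, adapt_audibility_target(noise, 45.0, 70.0))
```

In quiet conditions the target is unchanged; as noise rises the target follows it, and in very high noise it stops at the ceiling imposed by the comfort target.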
The detection of acoustic shock or startle frequency components in the presence of speech can be based on a number of criteria, including:
1. Signal level. Only those components with sufficiently high level are usually candidates for causing acoustic shock or startle symptoms. Acoustic shock or startle components often have a higher narrowband level than speech at the same frequency.
2. Modulation or dynamic range. Shock signals characteristically have lower modulation than speech, and this difference can be used to further discriminate between speech and non-speech components. Refer to the plots of average estimate range vs. signal to noise ratio in
3. Spectral shape and peaks. The presence of narrowband shock or startle signals of sufficiently high level typically causes the frequency spectrum of the input signal to have one or several well defined peaks, with higher energy relative to the other components of the spectrum than is usually found in normal speech.
4. Rate of signal level change (attack or onset time). Acoustic shock or startle signals often commence very rapidly, causing a sudden increase in level of frequency components that is not typical of speech. This difference in onset time can be useful in making an early or initial determination regarding the presence of an acoustic shock signal, when other criteria such as the short term modulation are not yet indicative.
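The four criteria above might be combined per frequency band as in the following sketch. The combination logic, the function name and every threshold value are assumptions chosen for illustration; the per-band modulation input is assumed to be a dynamic range estimate such as the spread between percentile estimates.

```python
from statistics import median


def detect_startle_bands(levels_db, prev_levels_db, mod_range_db,
                         level_thresh_db=85.0, mod_thresh_db=10.0,
                         peak_margin_db=15.0, onset_thresh_db=20.0):
    """Return indices of bands flagged as likely shock or startle components.

    levels_db:      current level of each band (dB)
    prev_levels_db: level of each band in the previous frame (dB)
    mod_range_db:   estimated dynamic range of each band (dB)
    All thresholds are hypothetical.
    """
    spectrum_floor = median(levels_db)
    flagged = []
    for k, level in enumerate(levels_db):
        loud = level >= level_thresh_db                        # criterion 1
        steady = mod_range_db[k] <= mod_thresh_db              # criterion 2
        peaky = level - spectrum_floor >= peak_margin_db       # criterion 3
        sudden = level - prev_levels_db[k] >= onset_thresh_db  # criterion 4
        # A fast onset permits an early decision before the modulation
        # estimate has had time to settle.
        if loud and peaky and (steady or sudden):
            flagged.append(k)
    return flagged


if __name__ == "__main__":
    # Hypothetical frame: a tone appears suddenly in band 2.
    print(detect_startle_bands([60, 62, 95, 61],
                               [58, 60, 60, 60],
                               [25, 25, 30, 25]))
```

Requiring both a high level and a spectral peak keeps loud broadband sounds from being flagged, while the onset test covers the interval in which the modulation estimate is not yet indicative.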
Once acoustic shock or startle components are determined to be present in the line input signal, the startle detector 1452 passes frequency location and other information to the shock or startle signal suppressor 1454. This suppression system controls the adaptation of ADRO parameters for the purposes of removal or attenuation of the shock or startle components by the ADRO processor. The suppression can be achieved by an adaptive ADRO slew rate adapter 1456, an adaptive ADRO target adapter 1458 and an adaptive ADRO state information adapter 1460 according to:
With this arrangement the ADRO processor rapidly attenuates the acoustic shock signals to the level of the comfort target, and guarantees no increase of level at the acoustic shock frequencies for a sufficient time span to avoid the potential of shocks at similar frequencies in the near future.
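The behaviour described above, namely rapid attenuation towards the comfort target followed by a period in which gain may not increase, can be sketched for a single band as follows. This is a simplified stand-in for the ADRO rules rather than the adapters 1456, 1458 and 1460 themselves; the function name, the hold time and the slew rates are assumptions.

```python
def update_gain(gain_db, output_db, comfort_db, hold_s, startle, dt,
                normal_rate=3.0, fast_rate=20.0, hold_time_s=5.0):
    """Advance one band's gain by one time step of dt seconds.

    Returns (new_gain_db, new_hold_s). While hold_s is positive, gain
    increases are blocked in this band. All rates and times are hypothetical.
    """
    if startle:
        hold_s = hold_time_s  # restart the no-increase period
    # Use the faster slew rate only while a startle component is flagged.
    step = (fast_rate if startle else normal_rate) * dt
    if output_db > comfort_db:
        # Attenuate towards the comfort target without overshooting it.
        gain_db -= min(step, output_db - comfort_db)
    elif output_db < comfort_db and hold_s <= 0.0:
        # Gain may rise again only once the hold period has expired.
        gain_db += min(step, comfort_db - output_db)
    return gain_db, max(0.0, hold_s - dt)


if __name__ == "__main__":
    # Hypothetical step: a 95 dB tone against a 70 dB comfort target.
    print(update_gain(0.0, 95.0, 70.0, 0.0, True, 0.1))
```

After the startle flag clears, the band returns to the normal slew rate, but the hold timer continues to prevent gain from rising until it reaches zero.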
Similarly to the sound processing apparatus 300 of
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.