1. Field of the Invention
The present invention relates to a system for speech enhancement in a room comprising a microphone for capturing audio signals from a speaker's voice, an audio signal processing unit for processing the captured audio signals and a loudspeaker arrangement located in the room for generating amplified sound according to the processed audio signals.
By using such a system, the speaker's voice can be amplified in order to increase speech intelligibility for persons present in the room, such as the listeners in an audience or pupils/students in a classroom. However, increased amplification does not necessarily result in increased speech intelligibility.
2. Description of Related Art
U.S. Pat. No. 7,333,618 B2 relates to a speech enhancement system comprising, in addition to the speaker's microphone, a second microphone placed in the audience for capturing both the sound generated by the loudspeakers and ambient noise, a variable amplifier and an ambient noise compensation circuit. The output signal of the variable amplifier is compared to the ambient noise level derived from the signals captured by the second microphone, and the gain applied to the signals from the speaker's microphone is adjusted according to the level of the ambient noise.
European Patent Application EP 1 691 574 A2 relates to an FM (frequency modulation) transmission system for a hearing aid, wherein the gain applied to the audio signals captured by the microphone of the FM transmission unit is adjusted in the FM receiver according to the ambient noise level and the voice activity, as detected by analyzing the audio signals captured by the microphone. The gain is automatically increased when it is detected that the speaker is speaking; the gain is also adjusted as a function of the ambient noise level.
It is an object of the invention to provide for a speech enhancement system, whereby speech intelligibility is increased in an efficient manner. It is also an object to provide for a corresponding method of speech enhancement.
According to the invention, these objects are achieved by a speech enhancement method and speech enhancement system as described herein.
The invention is beneficial in that, by determining the gain to be applied to the audio signals captured by the microphone according to a comparison between an estimated ambient noise level and an estimated reverberation level of the sound generated by the loudspeaker arrangement, the signal-to-noise ratio (SNR) can be optimized at any time without applying an unnecessarily high gain, thereby increasing speech intelligibility in an efficient manner.
Preferably, the reverberation level is a late reverberation level corresponding to the level of the components of the sound generated by the loudspeaker arrangement having reverberation times above a reverberation time threshold, which threshold is selected such that the late reverberation sound components are perceivable as a hearing sensation separate from the perception of the respective non-delayed sound. For example, the reverberation time threshold may be about 50 ms.
These and further objects, features and advantages of the present invention will become apparent from the following description when taken in connection with the accompanying drawings which, for purposes of illustration only, show several embodiments in accordance with the present invention.
The purpose of a speech enhancement system in a room is to increase the intelligibility of the speaker's voice. In general, speech intelligibility is affected by the noise level in the room (ambient noise level) and by the reverberation of the useful sound, i.e., the speaker's voice, in the room. At least part of the reverberation deteriorates speech intelligibility. The total reverberation signal may be split into an early reverberation signal (corresponding to reverberation times of, e.g., not more than 50 ms) and a late reverberation signal (corresponding to reverberation times of more than 50 ms). Human hearing integrates the early reverberation signal with the direct sound, i.e., it is not perceived as a separate signal, and therefore it does not deteriorate speech intelligibility. The late reverberation signal, in contrast, is not integrated with the direct sound; it is perceived as a separate signal and therefore has to be treated as part of the noise.
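The split between early and late reverberation can be illustrated on a measured room impulse response. The following Python sketch is illustrative only and not part of the described system; it assumes that the direct sound is the largest-magnitude sample of the response and uses the 50 ms threshold given above. With that threshold, the returned ratio is the clarity index commonly called C50.

```python
import math


def early_late_energy(impulse_response, sample_rate_hz, threshold_s=0.05):
    """Split a room impulse response into early and late energy.

    Assumption: the direct sound is the largest-magnitude sample. Samples within
    threshold_s after it count as early reverberation (integrated with the direct
    sound by hearing); all later samples count as late reverberation.
    """
    direct_idx = max(range(len(impulse_response)),
                     key=lambda i: abs(impulse_response[i]))
    split_idx = direct_idx + int(round(threshold_s * sample_rate_hz))
    early = sum(x * x for x in impulse_response[direct_idx:split_idx])
    late = sum(x * x for x in impulse_response[split_idx:])
    return early, late


def early_to_late_ratio_db(impulse_response, sample_rate_hz, threshold_s=0.05):
    """Early-to-late energy ratio in dB (C50 for a 50 ms threshold)."""
    early, late = early_late_energy(impulse_response, sample_rate_hz, threshold_s)
    if late == 0.0:
        return math.inf  # no late reverberation at all
    return 10.0 * math.log10(early / late)
```

A higher ratio means that a smaller share of the reverberant energy acts as noise.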
Hence, the acoustic field in a room may be separated into three parts: (1) the useful signal, i.e., the direct field of the speaker's voice and the respective early reverberation signal; (2) the late reverberation signal, e.g., the reverberation signal of the speaker's voice corresponding to reverberation times of more than 50 ms; (3) the ambient noise, i.e., the noise from all other sources. Here, “speaker's voice” means the speaker's voice as reproduced by the loudspeaker arrangement 24.
When the gain applied in the audio signal processing unit 20 is increased, both the level of the “useful signal” and the level of the “late reverberation signal” increase, whereas the level of the “ambient noise” is independent of the speaker's voice level and hence does not increase with the gain. Of course, the ambient noise level may nevertheless vary over time, for example when some of the listeners 26 start talking.
As shown in
However, since the level of the late reverberation signal increases in parallel with the level of the useful signal, a further increase in gain will not result in a corresponding increase in SNR once the ambient noise is masked by the late reverberation signal. It can be assumed that such masking of the ambient noise occurs when the level of the late reverberation signal is at least about 3 dB higher than the level of the ambient noise. This situation is shown in
In order to optimize the gain (and hence the SNR), it is beneficial to estimate both the actual level of a reverberation signal, which is preferably the late reverberation signal discussed above, and the actual level of the ambient noise.
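The saturation of the SNR described above can be made concrete with a simple level model. In the Python sketch below, which is an illustration under stated assumptions rather than part of the described system, the useful signal and the late reverberation both rise one-for-one with the gain, the ambient noise does not, and late reverberation and noise add incoherently (power sum). The level values in the example are hypothetical.

```python
import math


def power_sum_db(*levels_db):
    """Incoherent (power) sum of levels given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def effective_snr_db(gain_db, useful_db, late_reverb_db, noise_db):
    """SNR when late reverberation is counted as part of the noise.

    useful_db and late_reverb_db are the levels at 0 dB gain; both scale with
    the gain. noise_db is the ambient noise level, independent of the gain.
    """
    useful = useful_db + gain_db
    late = late_reverb_db + gain_db
    return useful - power_sum_db(late, noise_db)
```

With hypothetical levels of 60 dB (useful signal), 50 dB (late reverberation) and 55 dB (ambient noise), the SNR can never exceed the 10 dB useful-to-late-reverberation ratio, however high the gain. At 8 dB of gain the late reverberation lies 3 dB above the noise and the SNR is already about 8.2 dB, i.e., within roughly 1.8 dB of that ceiling, so further gain yields little.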
The reverberation time threshold above which sound components form part of the (late) reverberation level is preferably selected such that the late reverberation sound components are perceivable as a hearing sensation separate from the perception of the respective non-delayed sound. In practice, the threshold corresponds to the reverberation time at which a sound component starts to create a hearing sensation perceived separately from that of the respective non-delayed signal. Typically, the threshold may be set at around 50 ms.
Whereas the ambient noise level is estimated from the audio signals captured by the microphone 12, the (late) reverberation level may be estimated either from the level of the processed audio signals, namely the level of the audio signals at the input of the power amplifier 22 (closed-loop configuration), or from the level of the audio signals supplied to the audio signal processing unit 20, i.e., from the level of the audio signals prior to processing (open-loop configuration).
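The two configurations differ only in where the level is tapped. The Python sketch below assumes, as a simplification, that the late reverberation level in the room follows the power-amplifier input level by a fixed, room-dependent offset `correction_db` (for example, one obtained in a calibration mode); in the open-loop case the gain applied by the audio signal processing unit 20 is added to the unprocessed level. The function and parameter names are hypothetical.

```python
def estimate_late_reverb_db(level_db, correction_db, applied_gain_db=None):
    """Estimate the late reverberation level from a signal level in dB.

    Closed loop: level_db is the processed level at the power-amplifier input;
    leave applied_gain_db as None.
    Open loop: level_db is the unprocessed level at the input of the audio
    signal processing unit; applied_gain_db is the gain the unit applies, so
    their sum predicts the power-amplifier input level.
    correction_db: assumed fixed offset from amplifier input to late reverberation.
    """
    if applied_gain_db is None:
        amp_input_db = level_db  # closed loop: measured directly
    else:
        amp_input_db = level_db + applied_gain_db  # open loop: predicted
    return amp_input_db + correction_db
```

For example, `estimate_late_reverb_db(70.0, -20.0)` (closed loop) and `estimate_late_reverb_db(60.0, -20.0, applied_gain_db=10.0)` (open loop) describe the same situation and both return 50.0. The open-loop variant needs no tap behind the gain stage but relies on knowing the applied gain exactly.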
Typically, the gain is changed slowly, with time constants on the order of about 5 s.
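A slowly varying gain of this kind can be obtained with first-order (exponential) smoothing toward a target gain. The Python sketch below is one possible realization, not necessarily the one used in the described system; the 0.1 s update period is a hypothetical choice, while the 5 s time constant is taken from the text.

```python
import math


class GainSmoother:
    """First-order smoothing of the gain (in dB) toward a target gain.

    After one time constant tau_s of a constant target, the gain has covered
    about 63 % (1 - 1/e) of the step.
    """

    def __init__(self, initial_gain_db=0.0, tau_s=5.0, update_period_s=0.1):
        self.gain_db = initial_gain_db
        # Per-update smoothing coefficient derived from the time constant.
        self.alpha = 1.0 - math.exp(-update_period_s / tau_s)

    def update(self, target_gain_db):
        """Move the gain a fraction alpha of the way toward the target."""
        self.gain_db += self.alpha * (target_gain_db - self.gain_db)
        return self.gain_db
```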
In
The voice activity detector 32 analyzes the audio signals captured by the microphone 12, determines whether or not the speaker 14 is presently speaking and outputs a corresponding VAD status signal. The ambient noise level estimator 34 is active only when the VAD signal supplied by the voice activity detector 32 indicates that the speaker 14 is presently not speaking. When active, the ambient noise level estimator 34 derives, from the audio signals captured by the microphone 12, an ambient noise compensation (SNC) signal, which is indicative of the present ambient noise level.
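The gating of the ambient noise level estimator 34 by the VAD signal can be sketched as follows. This Python example rests on several assumptions: the VAD decision is taken as given (one boolean per frame), levels are in dB relative to digital full scale rather than calibrated sound pressure level, and the recursive smoothing constant is a hypothetical choice.

```python
import math


class AmbientNoiseEstimator:
    """Ambient noise level tracker that updates only in speech pauses."""

    def __init__(self, initial_level_db=-60.0, smoothing=0.1):
        self.level_db = initial_level_db
        self.smoothing = smoothing  # 0 < smoothing <= 1; 1 means no smoothing

    def update(self, frame, speaking):
        """Process one frame of samples; speaking is the VAD decision."""
        if speaking or not frame:
            return self.level_db  # estimate is frozen while the speaker talks
        power = sum(x * x for x in frame) / len(frame)
        frame_db = 10.0 * math.log10(max(power, 1e-12))  # floor avoids log(0)
        self.level_db += self.smoothing * (frame_db - self.level_db)
        return self.level_db
```

Freezing the estimate during speech prevents the speaker's own voice, and its reverberation, from being mistaken for ambient noise.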
The audio signals captured by the microphone 12, the VAD signal and the SNC signal are supplied to the transmitter 36 for transmission via a radio frequency (RF) link, such as an FM link, to an RF receiver 18. The receiver 18 supplies the received signals to the audio signal processing unit 20, which comprises a feedback canceller 38, an SNR optimizer 40, a late reverberation level estimation unit 42 and an automatic gain control unit 44. The audio signals received by the receiver 18 are supplied via the feedback canceller 38 to the automatic gain control unit 44, which transforms them into processed audio signals; these are supplied as input to the power amplifier 22, which drives the loudspeaker arrangement 24. The late reverberation level estimation unit 42 uses the level of the processed audio signal supplied by the automatic gain control unit 44 to the power amplifier 22 to estimate the late reverberation level, taking into account acoustic room parameters.
In the embodiment of
The feedback canceller 38 analyzes the audio signals received by the receiver 18 in order to determine whether there is a critical feedback level caused by feedback of sound from the loudspeaker arrangement 24 to the microphone 12 (Larsen effect). As a result, the feedback canceller 38 outputs a status signal indicating the presence or absence of critical feedback; this status signal is supplied to the SNR optimizer 40, together with a signal indicative of the late reverberation level estimated by the unit 42 and the SNC and VAD signals received by the receiver 18. Based on these input signals, the SNR optimizer 40 outputs a control signal acting on the automatic gain control unit 44 for controlling the gain in order to optimize the SNR, as will be illustrated by reference to
During times when the VAD signal indicates that the speaker 14 is not speaking, the ambient noise level estimator 34 determines the ambient noise level (SNC signal) from the audio signals presently captured by the microphone 12. This situation is shown in
During times when the VAD signal indicates that the speaker 14 is speaking, the gain is increased until the ambient noise is expected to be masked by the late reverberation. For example, the gain may be increased until the late reverberation level is about 3 dB above the ambient noise level, see
When the ambient noise level estimator 34 determines that the ambient noise level has changed, the SNR optimizer 40 adjusts the gain, with a certain time constant, to suit the presently estimated ambient noise level. In other words, when the ambient noise level is found to decrease, the gain is decreased accordingly, and when the ambient noise level is found to increase, the gain is increased accordingly, see
However, at high ambient noise levels it may be necessary to increase the gain to a value at which the system starts to have feedback problems. Once such a condition is detected by the feedback canceller 38, the SNR optimizer 40 stops any further increase of the gain. Under such conditions, the ambient noise level may become higher than the late reverberation level, so that the SNR will then be lower than at lower ambient noise levels, see
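The behavior of the SNR optimizer 40 described above can be summarized as a target-gain rule. The Python sketch below is a hedged reading of the text, not the actual implementation: it assumes that the late reverberation estimate scales one-for-one with the gain, holds the gain while the VAD reports no speech, uses the 3 dB masking margin mentioned above, allows only gain decreases while critical feedback is reported, and clamps to a hypothetical upper limit `max_gain_db`. The resulting target would then be approached slowly, e.g. with a time constant of a few seconds.

```python
def snr_optimizer_target_gain_db(current_gain_db, late_reverb_db, noise_db,
                                 speaking, critical_feedback,
                                 margin_db=3.0, max_gain_db=30.0):
    """Return the gain (dB) that puts late reverberation margin_db above noise.

    late_reverb_db is the late reverberation level estimated at current_gain_db;
    noise_db is the latest ambient noise estimate (from speech pauses).
    max_gain_db is a hypothetical safety limit, not taken from the text.
    """
    if not speaking:
        # Without speech there is no meaningful late reverberation estimate.
        return current_gain_db
    # Late reverberation rises 1:1 with gain, so shift by the missing margin.
    target = current_gain_db + (noise_db + margin_db - late_reverb_db)
    if critical_feedback:
        # Feedback canceller reports critical feedback: no further increase.
        target = min(target, current_gain_db)
    return min(target, max_gain_db)
```

For example, with a current gain of 10 dB, an estimated late reverberation level of 50 dB and an ambient noise level of 55 dB, the rule returns 18 dB; if the feedback canceller 38 reports critical feedback, it returns 10 dB instead.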
While
In
In the calibration mode, the unit 142 generates a test signal, which is supplied via the power amplifier 22 to the loudspeaker arrangement 24 to reproduce a corresponding test sound. The test sound is captured by the microphone 12 as test audio signals, from which the ambient noise level estimator 34 derives the SNC signal, which here corresponds to the level of the test sound; the SNC signal is supplied to the unit 142. The unit 142 analyzes the SNC signal corresponding to the test signal level, and the ratio of the level of the signal at the input of the power amplifier 22 to the test audio signal level determined by the unit 142 is calculated and stored in a memory 146 connected to the unit 142.
In other words, in the calibration mode, a test signal having a known level is reproduced via the loudspeaker arrangement 24 and captured by the microphone 12, and the correction factor to be applied to the level of the processed audio signals at the input of the power amplifier 22 in order to estimate the late reverberation level is determined from the level of the test audio signals captured by the microphone 12. In the speech enhancement mode of the system, the correction factor is retrieved from the memory 146.
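Expressed in decibels, the stored ratio becomes a simple level difference. The Python sketch below illustrates the two modes under the assumption, made here for illustration, that the late reverberation level is obtained by adding this fixed offset to the power-amplifier input level; in this embodiment the microphone measures the overall test-sound level, so any further room-dependent adjustment lies outside the sketch. The class stands in for the unit 142 and the memory 146; its names are hypothetical, and the sign convention is chosen so that the correction is added.

```python
class LevelCalibration:
    """Stores a level correction (dB) measured in a calibration mode."""

    def __init__(self):
        self.correction_db = None  # plays the role of memory 146

    def calibrate(self, amp_input_test_db, captured_test_db):
        """Calibration mode: a ratio of levels is a difference in dB."""
        self.correction_db = captured_test_db - amp_input_test_db
        return self.correction_db

    def estimate_late_reverb_db(self, amp_input_db):
        """Speech enhancement mode: apply the stored correction."""
        if self.correction_db is None:
            raise RuntimeError("run the calibration mode first")
        return amp_input_db + self.correction_db
```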
The system of
In
In other words, the correction factor to be applied to the level of the processed audio signals at the input of the power amplifier 22 is determined from the level of the late reverberation components of the test audio signals as captured by the microphone 12. To this end, the ratio of the audio signal level at the input of the power amplifier 22 (i.e., the level of the processed test audio signals) to the late reverberation level of the test audio signals as measured by the unit 142 is calculated and stored in the memory 146. In the speech enhancement mode, the value stored in the memory 146 is then used to estimate the late reverberation level from the audio signal level at the input of the power amplifier 22.
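One way to obtain the late reverberation level of the test audio signals, assumed here because the text does not specify the measurement, is to derive the impulse response from the power-amplifier input to the microphone (for example with an impulse-like or swept test signal) and sum the energy arriving more than 50 ms after the direct sound. For a stationary, white input of a given power, the late reverberation power at the microphone equals that input power times this late energy, so its value in dB is exactly the correction to add to the amplifier input level; for speech, which is not white, it is only an approximation. The Python function names are hypothetical.

```python
import math


def late_reverb_correction_db(impulse_response, sample_rate_hz, threshold_s=0.05):
    """Correction (dB) from power-amplifier input level to late reverberation level.

    impulse_response: response from amplifier input to microphone, including all
    electroacoustic gains. The direct sound is assumed to be the largest sample.
    """
    direct_idx = max(range(len(impulse_response)),
                     key=lambda i: abs(impulse_response[i]))
    split_idx = direct_idx + int(round(threshold_s * sample_rate_hz))
    late_energy = sum(x * x for x in impulse_response[split_idx:])
    if late_energy == 0.0:
        return -math.inf  # no late reverberation measured
    return 10.0 * math.log10(late_energy)


def late_reverb_from_amp_level_db(amp_input_db, correction_db):
    """Speech enhancement mode: stored correction applied to the amplifier level."""
    return amp_input_db + correction_db
```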
Although the system of
In
Although the system of
In all embodiments, the transmission unit 16 may be compatible with hearing aids having a wireless audio interface, such as hearing aids having an FM receiver unit connected via an audio shoe to the hearing aid or hearing aids having an integrated FM receiver.
While various embodiments in accordance with the present invention have been shown and described, it is understood that the invention is not limited thereto, and is susceptible to numerous changes and modifications as known to those skilled in the art. Therefore, this invention is not limited to the details shown and described herein, and includes all such changes and modifications as encompassed by the scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2009/064142 | 10/27/2009 | WO | 00 | 4/30/2012 |