1. Field of the Invention
The present invention relates to a system for speech enhancement in a room, comprising a microphone for capturing audio signals from a speaker's voice, an audio signal processing unit for processing the captured audio signals and a loudspeaker arrangement located in the room for generating sound according to the processed audio signal.
2. Description of Related Art
Speech enhancement systems of the initially mentioned type amplify the speaker's voice in order to enhance the intelligibility of the speech for the listeners. U.S. Pat. No. 7,822,212 relates to such a speech enhancement system, wherein the shape of the frequency response curve applied to the audio signals in the audio signal processing unit is selected as a function of the ambient noise level in the room as estimated by the system. For the frequency response curves selected at higher ambient noise levels, the low-frequency cutoff is raised.
HiFi systems often include a function labeled “loudness” or “contour”, which changes the frequency response as a function of the sound level in order to take into account that the frequency response of human hearing depends on the loudness level. In U.S. Pat. No. 7,822,212, the frequency response of the gain function is chosen so as to compensate for the removal of the lower frequency ranges by increasing the gain in the remaining frequency bandwidth, and this compensation may be made according to human hearing perception.
It is an object of the invention to provide a speech enhancement system which allows speech intelligibility to be optimized. It is a further object to provide for a corresponding speech enhancement method.
According to the invention, these objects are achieved by a speech enhancement method and a speech enhancement system as described below.
The invention is beneficial in that, by selecting the frequency response curve applied by the audio signal processing unit according to the estimated overall gain and the acoustic parameters of the room and of the loudspeaker arrangement located in the room, speech intelligibility can be increased; in particular, the frequency response curve may be selected such that the free field frequency response of the speaker's voice is approximated as closely as possible at a listener's position in the room.
These and further objects, features and advantages of the present invention will become apparent from the following description when taken in connection with the accompanying drawings which, for purposes of illustration only, show several embodiments in accordance with the present invention.
In the audio signal processing unit 20, the audio signals captured by the microphone 12 undergo pre-amplification and frequency filtering before being amplified by the power amplifier 22. The system increases the level of the voice of the speaker 14 at the position of the listeners 26 by amplifying the voice captured by the microphone 12, with the goal of enhancing speech intelligibility at the position of the listeners 26. Typical prior-art speech enhancement systems are designed to amplify the voice of the speaker 14 linearly. Such an approach does not take into account (1) that the frequency response of an acoustic source in a room is modified by its power response and by the acoustic absorption of the room, and (2) that the mixing ratio of the direct voice and the voice as amplified by the system varies with the gain of the system. Both phenomena have a negative impact on speech intelligibility.
When a person (speaker) is speaking in the direction of another person (listener) in free field, the sound travels directly from the mouth of the speaker (source) to the listener's ear (listening point) without any modification. In the absence of noise, the speech transmission index (STI) is maximal under such conditions, which are characterized by the absence of reverberation and by a frequency response that is not affected by the directivity of the source.
For the following discussion, the free field frequency response is assumed to be flat from 100 Hz to 10 kHz and is taken as a normalized reference, see
When such a source is placed into a reverberant room, the frequency response of the total reverberant field looks like the power response of the source, because the energy radiated in all directions is acoustically summed due to the reflections at the walls.
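The statement that the reverberant field follows the power response of the source, while the direct field depends on its directivity toward the listener, can be illustrated with the classical diffuse-field (Hopkins-Stryker) model. The sketch below is an illustration under that textbook model, not part of the described system; the room dimensions, the mean absorption coefficient and the two mouth directivity factors are hypothetical example values. It also computes the critical distance referred to for the classroom case.

```python
import math


def room_constant(surface_m2: float, mean_absorption: float) -> float:
    """Room constant R = S * alpha / (1 - alpha) of the diffuse-field model (m^2)."""
    if not 0.0 < mean_absorption < 1.0:
        raise ValueError("mean absorption coefficient must lie in (0, 1)")
    return surface_m2 * mean_absorption / (1.0 - mean_absorption)


def direct_level_db(power_level_db: float, q: float, distance_m: float) -> float:
    """Direct-field SPL: depends on the directivity factor Q toward the listener."""
    return power_level_db + 10.0 * math.log10(q / (4.0 * math.pi * distance_m ** 2))


def reverberant_level_db(power_level_db: float, room_const: float) -> float:
    """Reverberant-field SPL: depends only on the total radiated power, not on Q."""
    return power_level_db + 10.0 * math.log10(4.0 / room_const)


def critical_distance_m(q: float, room_const: float) -> float:
    """Distance at which the direct and reverberant levels are equal."""
    return math.sqrt(q * room_const / (16.0 * math.pi))


# Hypothetical classroom: 10 m x 8 m x 3 m with mean absorption 0.2.
R = room_constant(2 * (10 * 8 + 10 * 3 + 8 * 3), 0.2)
for band, q in [("low", 1.0), ("high", 4.0)]:  # assumed mouth directivity factors
    print(band, "frequencies: critical distance", round(critical_distance_m(q, R), 2), "m")
```

Because Q of the mouth grows with frequency while the reverberant term does not depend on Q, a listener beyond the critical distance receives a field whose spectrum follows the power response, i.e. one that is weak at high frequencies relative to the on-axis (free field) response.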
In addition, the absorption coefficient in a typical room depends on frequency and is usually higher at high frequencies than at low frequencies. A common measure of the absorption of a room is the RT60, which is the time needed for the reverberant field to decay by 60 dB after excitation by an impulsive noise. In
In a standard classroom, most of the students are seated in the reverberant field, where the level of the sum of the reverberant signals is higher than the level of the teacher's direct voice (i.e., the critical distance is shorter than the distance from the source to the listening point). Due to the directivity of the human mouth, this phenomenon is accentuated when the teacher is not speaking in the direction of the students. As can be seen in
When the speech enhancement system uses standard loudspeakers having a flat frequency response at 0° and a directivity coefficient that increases with frequency exactly like that of a human mouth, the speech amplification provided by the system would result only in a level shift of almost the same curve. This often would not actually increase speech intelligibility, since the level of the disturbing late reflections at low frequencies would also increase, see
However, speech intelligibility could be significantly enhanced by amplifying only that part of the signal which is missing or weak in the reverberant field at the listening point. Hence, by selecting the appropriate frequency response curve applied to the audio signals in the audio signal processing unit 20 as a function of the total gain provided by the speech enhancement system, the free field frequency response (i.e., a flat curve in the normalized representation) may be approximated. This is achieved by selecting the frequency response curve such that, when the amplified sound mixes with the direct sound, the total level approaches the flat reference curve of the free field frequency response.
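The principle of amplifying only what is missing can be sketched per frequency band. The sketch assumes that the amplified sound and the speaker's voice add incoherently (as powers) at the listening point and that all levels are normalized so that the free field reference is 0 dB; the band levels of the voice are hypothetical example values, not measurements from the described system.

```python
import math
from typing import List, Optional


def db_to_power(level_db: float) -> float:
    return 10.0 ** (level_db / 10.0)


def power_to_db(power: float) -> float:
    return 10.0 * math.log10(power)


def required_system_levels(
    voice_db: List[float], target_db: float = 0.0
) -> List[Optional[float]]:
    """Per band, the level the system must add so that the incoherent power sum
    of the voice at the listening point and the amplified sound reaches target_db.

    Returns None for bands where the voice alone already reaches the target,
    since adding sound can only raise the total level there."""
    result: List[Optional[float]] = []
    for level in voice_db:
        missing = db_to_power(target_db) - db_to_power(level)
        result.append(power_to_db(missing) if missing > 0.0 else None)
    return result


# Hypothetical normalized voice response at a listening position (dB re free field).
bands_hz = [250, 500, 1000, 2000, 4000]
voice_db = [-1.0, -2.0, -4.0, -7.0, -10.0]
for f, add_db in zip(bands_hz, required_system_levels(voice_db)):
    print(f, "Hz: system contribution", round(add_db, 1), "dB")
```

With a voice response that falls toward high frequencies, the required system contribution rises toward high frequencies, which matches the high-frequency emphasis described for low total gain.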
In
If the total gain of the system is less than 1, it is not possible to approximate the free field frequency response, since the “loss” at higher frequencies in the reverberant field then cannot be fully compensated.
If the gain of the system is increased beyond 1, the loudspeaker arrangement 24 radiates more acoustic power than the speaker's mouth, so that, if the frequency response curve of
In order to achieve the desired approximation of the free field frequency response, it is necessary to select the shape of the frequency response curve applied in the audio signal processing unit 20 as a function of the total gain of the system. With increasing total gain, the level of the low frequencies relative to the level of the higher frequencies has to be progressively increased in order to compensate for the relative lack in low frequency level in the sound radiated by the speaker's mouth compared to the amplified sound, see
In
In
As an optional feature, the system may include a compensation for the level dependence of the equal loudness contours (also called Fletcher-Munson curves). This is shown in
The various threshold values of the total gain of the system thus define a plurality of operation modes:
(1) a first mode, wherein the gain does not significantly exceed a value of 1 and wherein a fixed first frequency response curve is selected, which has a shape so as to selectively increase the level at higher frequencies so as to approximate the free field frequency response of the speaker's voice by mixing sound reproduced by the loudspeaker arrangement with the reverberant sound field of the speaker's voice;
(2) a second mode, wherein the gain is between the first threshold and a second threshold which corresponds to the gain at which the sound from the loudspeaker arrangement is expected to partially mask the sound from the speaker (i.e., the gain at which the reverberant field of the sound from the loudspeaker arrangement is expected to partially mask the reverberant field of the sound from the speaker), and wherein a variable frequency response curve is selected which has a shape so as to progressively increase the level at lower frequencies with increasing overall gain relative to the level at higher frequencies in order to approximate the free field frequency response of the speaker's voice by mixing the sound reproduced by the loudspeaker arrangement with the reverberant sound field of the speaker;
(3) a third mode wherein the gain is between the second threshold and a third threshold corresponding to the gain at which the level of the sound reproduced by the loudspeaker arrangement at a listener's position in the room is expected to completely mask the level of the speaker's voice at the speaker's mouth, wherein a fixed second frequency response curve is selected having a shape so as to approximate, by the sound reproduced only by the loudspeaker arrangement, the free field frequency response of the speaker's voice;
(4) a fourth mode wherein the gain is above the third threshold and wherein a variable frequency response curve is selected having a shape so as to decrease the level at lower frequencies with increasing overall gain relative to the level at higher frequencies in order to compensate for the level dependence of the contours of equal loudness according to the difference between the level of the sound reproduced by the loudspeaker arrangement at the listener's position in the room and the level of the speaker's voice at the speaker's mouth.
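The four operation modes can be summarized as a selection on the estimated overall gain. The sketch below reduces each frequency response curve to a single number, the low-frequency level relative to the high frequencies; the threshold values, the two fixed curve offsets, the logarithmic interpolation in mode (2), the linear loudness slope in mode (4) and the treatment of gain as a power ratio are all hypothetical choices for illustration, not values taken from the description.

```python
import math
from enum import Enum


class Mode(Enum):
    FIXED_HIGH_BOOST = 1       # mode (1): fixed first curve
    PROGRESSIVE_LOW_LIFT = 2   # mode (2): lows raised as gain grows
    FIXED_FREE_FIELD = 3       # mode (3): fixed second curve
    LOUDNESS_COMPENSATION = 4  # mode (4): lows lowered as gain grows


def select_mode(gain: float, t1: float, t2: float, t3: float) -> Mode:
    """Map the estimated overall gain onto the four operation modes."""
    if not 0.0 < t1 < t2 < t3:
        raise ValueError("thresholds must satisfy 0 < t1 < t2 < t3")
    if gain <= t1:
        return Mode.FIXED_HIGH_BOOST
    if gain <= t2:
        return Mode.PROGRESSIVE_LOW_LIFT
    if gain <= t3:
        return Mode.FIXED_FREE_FIELD
    return Mode.LOUDNESS_COMPENSATION


def low_band_offset_db(gain, t1, t2, t3, curve1_low_db, curve2_low_db, loudness_slope):
    """Low-frequency level relative to high frequencies for the selected curve.

    Hypothetical shaping: mode 2 interpolates between the two fixed curves on a
    logarithmic gain axis; mode 4 lowers the lows by loudness_slope dB per dB of
    gain above t3 (gain treated as a power ratio)."""
    mode = select_mode(gain, t1, t2, t3)
    if mode is Mode.FIXED_HIGH_BOOST:
        return curve1_low_db
    if mode is Mode.PROGRESSIVE_LOW_LIFT:
        frac = math.log10(gain / t1) / math.log10(t2 / t1)
        return curve1_low_db + frac * (curve2_low_db - curve1_low_db)
    if mode is Mode.FIXED_FREE_FIELD:
        return curve2_low_db
    excess_db = 10.0 * math.log10(gain / t3)
    return curve2_low_db - loudness_slope * excess_db


for g in (0.8, 2.0, 8.0, 32.0):
    print(g, select_mode(g, 1.0, 4.0, 16.0).name,
          round(low_band_offset_db(g, 1.0, 4.0, 16.0, -6.0, 0.0, 0.3), 2))
```

Interpolating in mode (2) makes the curve continuous at both thresholds, so the transition between modes does not produce an audible jump in tonal balance.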
The shape of the selected frequency response curve is determined according to the estimated overall gain and according to the acoustic parameters of the room and the loudspeaker arrangement. Preferably, the overall gain is estimated from the adjustment position of the gain control element and from the acoustic parameters of the room and the loudspeaker arrangement. The acoustic parameters of the room may be predefined as those of a typical room in which the loudspeaker arrangement is to be used, or they may be determined in situ in a calibration mode of the system before speech enhancement operation starts. In such a calibration mode, a test signal may be supplied from the audio signal processing unit to the loudspeaker arrangement, and the resulting test sound captured by the microphone as test audio signals. The frequency response of the diffuse field and/or the RT60 may be estimated from the test audio signals. The acoustic parameters of the loudspeaker arrangement may be factory-programmed.
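One common way to estimate the RT60 from calibration data is Schroeder backward integration of a room impulse response followed by a line fit to the decay curve, in the style of the T20 evaluation of ISO 3382. The sketch assumes that an impulse response has already been obtained from the test audio signals (for example by deconvolving a sweep test signal, which is not shown) and evaluates a single broadband response; a per-band estimate would first filter the impulse response into octave bands. It is a minimal illustration, not the calibration procedure of the described system.

```python
import math
from typing import List, Sequence


def schroeder_decay_db(ir: Sequence[float]) -> List[float]:
    """Energy decay curve by Schroeder backward integration, 0 dB at t = 0."""
    edc = [0.0] * len(ir)
    acc = 0.0
    for i in range(len(ir) - 1, -1, -1):
        acc += ir[i] * ir[i]
        edc[i] = acc
    total = edc[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def estimate_rt60(ir: Sequence[float], fs: float,
                  start_db: float = -5.0, end_db: float = -25.0) -> float:
    """Fit a least-squares line to the decay between start_db and end_db and
    extrapolate its slope to a 60 dB decay."""
    edc = schroeder_decay_db(ir)
    pts = [(i / fs, d) for i, d in enumerate(edc) if end_db <= d <= start_db]
    if len(pts) < 2:
        raise ValueError("evaluation range not covered by the decay")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    slope_db_per_s = sxy / sxx  # negative for a decaying response
    return -60.0 / slope_db_per_s


# Synthetic check: an exponential decay whose energy falls 60 dB in 0.5 s.
fs, t60 = 8000, 0.5
ir = [10.0 ** (-3.0 * n / (fs * t60)) for n in range(int(2 * t60 * fs))]
print(round(estimate_rt60(ir, fs), 3))
```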
The level of the reverberant field of the speaker's voice may be estimated from the signal level of the audio signals captured by the microphone. The level of the reverberant field of the sound reproduced by the loudspeaker arrangement may be estimated from the levels of the processed audio signals at the input of the power amplifier.
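The two level estimates can be sketched as block RMS levels mapped to room levels by fixed offsets. The sketch assumes that the microphone signal is dominated by the speaker's voice and that each signal level relates to the corresponding reverberant-field level by a constant; the offsets `mic_offset_db` and `amp_offset_db` are hypothetical calibration constants (for example from the calibration mode), not parameters named in the description.

```python
import math
from typing import Sequence, Tuple


def block_level_db(samples: Sequence[float]) -> float:
    """RMS level of one signal block in dB relative to full scale (1.0)."""
    if len(samples) == 0:
        raise ValueError("empty block")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else -math.inf


def reverberant_levels_db(
    mic_block: Sequence[float],
    amp_input_block: Sequence[float],
    mic_offset_db: float,
    amp_offset_db: float,
) -> Tuple[float, float, float]:
    """Estimate the reverberant-field levels of the voice and of the system sound.

    Returns (voice level, system level, system minus voice), all in dB.
    The offsets are hypothetical calibration constants."""
    voice_db = block_level_db(mic_block) + mic_offset_db
    system_db = block_level_db(amp_input_block) + amp_offset_db
    return voice_db, system_db, system_db - voice_db
```

The returned difference indicates how strongly the reverberant field of the system sound dominates that of the voice, which is the quantity the gain thresholds of the operation modes are concerned with.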
A block diagram of a first embodiment of a speech enhancement system according to the invention is shown in
The gain control element 32 may be manually adjustable by the user of the system. Alternatively, it may be realized as an automatic gain control unit 132 (shown in dotted lines) which optimizes the gain of the system according to the presently prevailing use conditions (for example, as a function of the voice level and the ambient noise level) and supplies a corresponding gain adjustment signal to the gain control unit 30.
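A possible rule for the automatic gain control unit 132 is sketched below: the gain is chosen so that the speech level in the listening area reaches a target margin above the ambient noise level, clamped to a permitted range and smoothed over successive updates. The class name, the target margin, the gain limits and the smoothing factor are hypothetical, and the rule simplifies by assuming that the speech level rises by roughly the gain in dB; the description only states that the gain is optimized as a function of conditions such as voice level and ambient noise level.

```python
class AutomaticGainControl:
    """Hypothetical AGC rule operating on levels in dB."""

    def __init__(self, target_snr_db: float = 15.0, min_gain_db: float = 0.0,
                 max_gain_db: float = 20.0, smoothing: float = 0.1) -> None:
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must lie in (0, 1]")
        if min_gain_db > max_gain_db:
            raise ValueError("min_gain_db must not exceed max_gain_db")
        self.target_snr_db = target_snr_db
        self.min_gain_db = min_gain_db
        self.max_gain_db = max_gain_db
        self.smoothing = smoothing
        self.gain_db = min_gain_db  # current (smoothed) gain

    def target_gain_db(self, voice_db: float, noise_db: float) -> float:
        """Gain needed to lift the speech-to-noise margin to the target, clamped."""
        needed = self.target_snr_db - (voice_db - noise_db)
        return min(max(needed, self.min_gain_db), self.max_gain_db)

    def update(self, voice_db: float, noise_db: float) -> float:
        """Move the current gain a fraction of the way toward the target gain."""
        target = self.target_gain_db(voice_db, noise_db)
        self.gain_db += self.smoothing * (target - self.gain_db)
        return self.gain_db


agc = AutomaticGainControl()
for voice, noise in [(60.0, 55.0), (62.0, 50.0), (58.0, 60.0)]:
    print(round(agc.update(voice, noise), 2), "dB")
```

The smoothed gain would then serve as the gain adjustment signal for the gain control unit 30 and, through the overall gain estimate, as the input to the frequency response curve selection.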
An alternative embodiment of a speech enhancement system is shown in
In
In
In this case, the speaker's microphone 12 can be used as the measurement microphone, since it can be easily placed in the listening area of the room 10.
While various embodiments in accordance with the present invention have been shown and described, it is understood that the invention is not limited thereto, and is susceptible to numerous changes and modifications as known to those skilled in the art. Therefore, this invention is not limited to the details shown and described herein, and includes all such changes and modifications as encompassed by the scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP09/64145 | 10/27/2009 | WO | 00 | 4/30/2012 |