The present invention relates to quality evaluation and in particular to a system and method for evaluating the quality of multi-channel audio signals.
Since listening-adapted digital coding methods were standardized, they have been used to an increasing extent. Examples of such applications are the digital compact cassette, the MiniDisc, digital terrestrial radio broadcasting and the digital video disc. When listening-adapted coding methods are used, however, artifacts may occur which did not occur in analog audio signal processing.
To judge or evaluate a specific encoder, listening tests with test persons were carried out in the past. Although the average result of such listening tests is comparatively reliable, a subjective component remains. Furthermore, listening tests with a certain number of test persons are comparatively complicated and therefore comparatively expensive. Hence, measurement methods have been developed for a listening-adapted evaluation of audio signals.
Such a measurement method is described e.g. in DE 196 47 399 C1. The method of listening-adapted quality evaluation described in this publication applies a model of all non-linear hearing effects to both a reference signal and a test signal. The listening-adapted quality evaluation is carried out by a comparison in the cochlear domain, i.e. the excitations caused in the ear by the test signal and by the reference signal are compared. For this purpose, both the audio reference signal and the audio test signal are decomposed into their spectral components by a filter bank. A large number of filters with overlapping frequency ranges ensures sufficient resolution in both time and frequency. In this way, the quality of a mono audio test signal derived from an audio reference signal by coding and subsequent decoding can be evaluated.
The measurement method described in DE 196 47 399 C1 also permits an evaluation of the quality of stereo signals, i.e. two-channel signals. For this purpose, non-linear preprocessing is applied to the left and right channels of both the audio test signal and the audio reference signal; this preprocessing emphasizes transients in a frequency-selective manner and attenuates stationary signal components. In particular, separate error-probability detections are carried out with the following pairs of input signals: the left channels of the audio reference signal and of the audio test signal; the right channels of the audio reference signal and of the audio test signal; the left channels of the preprocessed audio reference signal and of the preprocessed audio test signal; and the right channels of the preprocessed audio reference signal and of the preprocessed audio test signal. Together, these detections yield a measure of the quality of the stereophonic audio test signal.
A disadvantage of the known method for listening-adapted quality evaluation of audio signals is that its stereo capability is limited to headphone reproduction. In other words, the audio test signal entering the listener's ear is compared with the audio reference signal entering the listener's ear. Effects produced by a room, such as reflections from the walls, the ceiling and the floor, multiple reflections, attenuation, etc., are therefore not taken into account. Furthermore, known quality-evaluating methods cannot take into account the directional characteristic of the human ear, i.e. it makes no difference to them whether a signal arrives from the rear, from the front or from the side. Known measurement methods are only applicable to headphone reproduction, in which case the acoustic signal is emitted by the headphone loudspeaker, which is normally arranged directly at the ear, and is fed directly into the ear or into the quality-evaluating process.
The known method is also disadvantageous in that a listening-adapted quality evaluation of multi-channel signals, e.g. 5-channel signals, which are becoming increasingly common and are known under the name “Dolby Surround”, has not been possible at all up to now.
It is the object of the present invention to provide an improved concept for evaluating the quality of audio signals in which room effects are additionally taken into account.
In accordance with a first aspect of the invention, this object is achieved by a system for evaluating the quality of an audio test signal derived from an audio reference signal by coding and decoding, said audio test signal and said audio reference signal each comprising a plurality of channels, each channel being adapted to be made audible by one loudspeaker of a plurality of loudspeakers which are positioned at different positions in an at least fictitious room, and two listening reference points being defined with respect to the positions of the plurality of loudspeakers, said system comprising: a unit for converting the audio reference signal into a first audio reference sum signal at the first reference point and into a second audio reference sum signal at the second reference point and for converting the audio test signal into a first audio test sum signal at the first reference point and into a second audio test sum signal at the second reference point, the audio reference sum signals and the audio test sum signals at the first and second reference points being a superposition of the respective channels, which can be emitted by said plurality of loudspeakers, weighted with a respective transfer function between the respective loudspeaker and the reference point in question; and a unit for evaluating the quality of the audio test sum signals while taking into consideration the audio reference sum signals so as to provide an indication of the quality of the audio test signal.
In accordance with a second aspect of the invention, this object is achieved by a method for evaluating the quality of an audio test signal derived from an audio reference signal by coding and decoding, said audio test signal and said audio reference signal each comprising a plurality of channels, each channel being adapted to be made audible by one loudspeaker of a plurality of loudspeakers which are positioned at different positions in an at least fictitious room, and two reference points being defined with respect to the positions of the plurality of loudspeakers, said method comprising the following steps: converting the audio reference signal into a first audio reference sum signal at the first reference point and into a second audio reference sum signal at the second reference point; converting the audio test signal into a first audio test sum signal at the first reference point and into a second audio test sum signal at the second reference point; weighting the respective channels, which can be emitted by said plurality of loudspeakers, with a respective transfer function between the respective loudspeaker and the reference point in question; superimposing the weighted channels at said first and at said second reference point so as to obtain the audio reference sum signals and the audio test sum signals; and conducting the audio test sum signals and the audio reference sum signals to a unit for evaluating the quality of the audio test sum signals while taking into consideration the audio reference sum signals so as to obtain an indication of the quality of the audio test signal.
The present invention is based on the finding that, although signals comprising an arbitrary number of channels exist, the human listener, who is ultimately what counts, always has only two ears. Directional hearing results from the different impulse responses for different directions of sound incidence into the human ear. In the field of technology, these direction-dependent responses are referred to as “head-related transfer functions” (HRTFs). In reality, there are not only the direct sound paths between the loudspeaker and the ear; reflections from the walls, the ceiling and the floor occur as well. These are summarized by the room impulse response. In combination, the HRTFs and the room impulse response lead to a change of sound which, according to the present invention, can also be evaluated by measurement systems that do not explicitly model binaural effects, such as different masking thresholds for binaural signals compared with monaural signals, perception of phase shifts, precedence effects, etc.
When audio signals are evaluated by means of listening tests, standardized listening rooms, e.g. according to ITU-R BS.1116, are normally used. Such a standard largely specifies the room size, the loudspeaker arrangement and the reverberation time. When a more comprehensive quality evaluation of audio signals is carried out in accordance with the present invention, both the head-related transfer functions (HRTFs) and the room impulse responses can be taken into account. Furthermore, for the listening-adapted quality evaluation according to the present invention it is of no importance whether a signal is a stereo signal emitted by two loudspeakers for the left and right channels, or a multi-channel signal comprising e.g. five channels and emitted by five loudspeakers arranged relative to the listener e.g. at the rear left, front left, rear right, front right and at the front.
The quality-evaluating system according to the present invention comprises for this purpose a unit for converting the audio reference signal into a first audio reference sum signal at a first reference point and into a second audio reference sum signal at a second reference point, and a unit for converting the audio test signal into a first audio test sum signal at the first reference point and into a second audio test sum signal at the second reference point. The audio reference sum signals and the audio test sum signals at the first and second reference points are each a superposition of the respective channels, which can be emitted by the plurality of loudspeakers, weighted with the respective transfer function between the loudspeaker in question and the reference point in question. Finally, the audio reference sum signals and the audio test sum signals are fed into a quality-evaluating unit so as to obtain an indication of the quality of the audio test signal. The quality-evaluating unit can be any known unit, e.g. of the type disclosed in DE 196 47 399 C1 or of the type specified in the international standard ITU-R BS.1387 (PEAQ).
The method according to the present invention is advantageous in that, even when the audio signal is a stereo signal, the influence of the listening room on the signal propagation from each loudspeaker to each reference point, i.e. each ear, can be taken into account.
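The superposition described above can be written as a formula. The symbols are introduced here only for illustration and do not appear in the original: x_i(n) is the signal of channel i (emitted by loudspeaker i), h_{ik}(n) is the impulse response of the transfer function from loudspeaker i to reference point k, M is the filter length, N is the number of channels, and * denotes convolution. Each sum signal is then:

$$
s_k(n) = \sum_{i=1}^{N} \left(h_{ik} * x_i\right)(n) = \sum_{i=1}^{N} \sum_{m=0}^{M-1} h_{ik}(m)\, x_i(n-m), \qquad k \in \{1, 2\}
$$

The same h_{ik} are applied once to the reference channels (giving the audio reference sum signals) and once to the test channels (giving the audio test sum signals). For N = 5 loudspeakers this requires 2N = 10 transfer functions, one per loudspeaker and reference point.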
Another advantage is to be seen in the fact that the method is applicable to audio signals comprising an arbitrary number of channels, since the channels are converted into two sum signals via respective transfer functions modelling the propagation of a signal from one loudspeaker to one ear, in such a way that an arbitrary quality-evaluating method, which is suitable for two channels, can be used.
The individual transfer functions can normally be obtained by measurement, using microphones built into an artificial head or probe microphones placed at the ears of a human listener. The method according to the present invention is, however, particularly advantageous when the head-related transfer functions of arbitrary persons are already known and can e.g. be downloaded from a suitable server via the internet. In this case, the room impulse response of a listening room, which can be measured or simulated, can be convolved with a specific existing HRTF so as to obtain a transfer function. This is advantageous especially where the listening room does not yet exist, i.e. where the acoustic properties of a room are simulated before it is actually constructed, e.g. when concert halls or sound studios are planned, so that the listening room can be optimized before its construction.
In the following, preferred embodiments of the present invention will be explained in detail making reference to the drawings enclosed, in which:
In the following, the conversion unit 19 will be explained. This unit comprises a plurality of transfer functions OF11 to OF52. These are either the HRTFs, when an anechoic room, i.e. a room in which no reflections occur, is considered, or the whole transfer function of the room from one of the loudspeakers to one of the reference points. The input signals for the loudspeakers are weighted with the respective transfer functions. The output signals obtained in this way are added by a first adder 22 so as to obtain the first audio sum signals. Analogously, a second adder 23 is provided for the second reference point 18; it adds the output signals of the transfer functions from the respective loudspeakers 11 to 15 to the second reference point 18 so as to provide the second audio sum signals. Both the audio test signal and the audio reference signal are processed by the conversion unit 19 under identical conditions, so that the unit 20 for evaluating the quality of 2-channel signals measures only the quality of the coding/decoding and no other effects disturb the measurement result.
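The weighting and adding performed by the conversion unit 19 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes that signals and impulse responses are plain Python lists of samples and uses direct-form convolution; the function names are hypothetical.

```python
# Minimal sketch of the conversion unit 19 (assumption: signals and impulse
# responses are plain lists of samples; direct-form convolution for clarity).


def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def to_sum_signals(channels, impulse_responses):
    """Convert an N-channel signal into the sum signals at the two reference points.

    channels[i] is the signal emitted by loudspeaker i.
    impulse_responses[i][k] is the impulse response of the transfer function
    from loudspeaker i to reference point k (OF(i+1)(k+1) in the text).
    """
    sums = []
    for k in range(2):  # reference point 1 (adder 22), reference point 2 (adder 23)
        filtered = [convolve(x, impulse_responses[i][k]) for i, x in enumerate(channels)]
        s = [0.0] * max(len(f) for f in filtered)
        for f in filtered:
            for n, v in enumerate(f):
                s[n] += v
        sums.append(s)
    return sums


def convert_for_evaluation(reference, test, impulse_responses):
    """Apply the identical conversion to reference and test signals, so that the
    2-channel quality evaluator sees only coding/decoding differences."""
    return (to_sum_signals(reference, impulse_responses),
            to_sum_signals(test, impulse_responses))
```

Because the number of channels only affects the length of `channels`, the same code handles stereo and 5-channel signals; the two resulting pairs of sum signals can then be passed to any 2-channel quality evaluator.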
With respect to the notation of the individual transfer functions, note that the first digit always refers to the loudspeaker, whereas the second digit refers to the reference point, i.e. reference point No. 1 (17) or reference point No. 2 (18).
As has already been mentioned, the scenario in
In the following, the determination of the individual transfer functions OF11 to OF52 (
The first possibility is to position the loudspeakers 11 to 15 relative to the reference points 17 and 18 in the manner shown in
The transfer function from the first loudspeaker to the first reference point 17 (OF11 in
If, as has been stated, such measurements take place in a real space with non-absorbing walls, etc., the whole transfer function, which comprises the room impulse response and the head-related transfer functions (HRTFs) for the individual loudspeaker positions, will be determined directly. If such measurements are carried out in an anechoic room, i.e. in a fully sound-absorbing room, the HRTFs can be determined directly in this way; these HRTFs are then the transfer functions OF11 to OF52.
Irrespective of whether the measurement is carried out with two microphones built into an artificial head or with two probe microphones and a test person, such acoustic measurements are complicated and expensive, not least because of the very expensive probe microphones.
If, however, head-related transfer functions (HRTFs) are known for specific persons or for an “average person”, these HRTFs can be convolved with the impulse response of a room, which can also be simulated. In this case, no measurements are necessary for determining the transfer functions OF11 to OF52. A substantial advantage of this approach is that it can also be used to simulate rooms which have not yet been built, e.g. to design a sound studio for optimum sound propagation with specific loudspeaker configurations before it is actually constructed. In this case, therefore, the room in which the quality of a coded and subsequently decoded audio test signal is evaluated need not actually exist. Instead, it exists only in simulated form and is therefore a fictitious room.
Irrespective of whether the room actually exists or exists only as a fictitious room on the basis of a simulation, it is normally assumed that test persons sit or stand at the best possible listening position in such a listening room, which may e.g. be a standardized listening room. However, many test persons move their heads forwards, backwards, to the left or to the right while the test is taking place; this is referred to as translation. In addition, persons will normally deviate slightly from the optimum listening orientation by turning their heads to the left and to the right; this is referred to as bearing movement or rotation. As a result, a centre loudspeaker, if present, i.e. the loudspeaker 13, will no longer be located precisely in the middle. Listeners make such movements because directional perception is often uncertain precisely at the front; in particular, front and back are confused in many cases, which is referred to as “front-back confusion” in the field of technology. Making reference to
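Obtaining a transfer function from an existing HRTF and a measured or simulated room impulse response can be sketched as a convolution of the two impulse responses. This is a minimal sketch under stated assumptions: the head-related impulse response (HRIR, the time-domain form of the HRTF) for the direction of loudspeaker i is assumed to be available as a list of samples, and the numeric values below are hypothetical. Filtering every reflection with the same HRIR is a simplification; a fuller model would apply a separate HRTF for the arrival direction of each reflection. The naive DFT is included only to check that convolution of impulse responses corresponds to multiplication of transfer functions.

```python
import cmath


def convolve(x, h):
    """Full linear convolution of x with h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def transfer_function_ir(room_ir, hrir):
    """Impulse response of one transfer function OFik: the room impulse response
    from loudspeaker i (measured or simulated) convolved with the HRIR for ear k
    and the direction of loudspeaker i.
    Simplification: every reflection is filtered with the same HRIR."""
    return convolve(room_ir, hrir)


def dft(x, size):
    """Naive zero-padded DFT (O(size^2)); only used to check the result."""
    padded = list(x) + [0.0] * (size - len(x))
    return [sum(padded[n] * cmath.exp(-2j * cmath.pi * f * n / size) for n in range(size))
            for f in range(size)]


room_ir = [1.0, 0.0, 0.3]  # hypothetical: direct sound plus one reflection
hrir = [0.9, 0.2]          # hypothetical two-tap HRIR
h = transfer_function_ir(room_ir, hrir)  # approx. [0.9, 0.2, 0.27, 0.06]
```

With a DFT length of at least len(room_ir) + len(hrir) - 1, the spectrum of `h` equals the product of the two spectra, which is why the combination can equally be described as multiplying transfer functions.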
In order to cope with this situation, the quality-evaluating method carried out by the quality-evaluating system shown in
Various possibilities exist for evaluating the different quality indications. On the one hand, an average value can be formed so as to make a general statement, e.g. to the effect that a certain coding/decoding method may be optimal if the position of the head does not change at all, or that it is less advantageous than some other coding method in the case of certain translations or bearing movements (rotations) of the head.
On the other hand, the “worst case” among the individual measurements can be determined so as to establish whether a certain coding/decoding method is sub-optimal for a specific position of the head relative to the five loudspeakers when 5-channel audio signals are processed. It is advantageous to carry out such quality evaluations for a plurality of positions of the reference points 17, 18 close to the optimum reference listening position. In addition, such measurements can also be carried out for other locations away from the reference listening position, so that e.g. other seats in a sound studio can be assessed to find out whether or not coding/decoding errors are audible there.
The above description shows clearly that the system and the method according to the present invention give existing quality-evaluating systems and methods a substantial amount of flexibility: it is not only possible to evaluate the quality of audio signals with more than two channels, but also to carry out quality evaluations for different scenarios of positioning the reference points 17, 18 relative to the loudspeakers 11 to 15. The system and the method according to the present invention can even be used when designing sound studios or other listening rooms, such as cinemas, so as to carry out a listening-adapted evaluation of the quality of specific coding/decoding methods in a specific room. Furthermore, they can be used when designing listening rooms so that the optimum coding method among a large number of possible coding methods can be selected for a specific room.
The transfer functions OF11 to OF52 can be implemented in circuit technology in different ways. Preferably, each impulse response is realized as an FIR filter. Note that for large rooms the FIR filters may be considerably long; at a sampling frequency of 48 kHz, their length may e.g. exceed 100,000 samples. In this case, it is advisable to represent the first milliseconds of the impulse response, where mainly discrete reflections occur, more precisely than the later part towards the end of the filter, where mainly diffuse reflections occur.
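The evaluation over several head positions can be sketched as follows. This is a hypothetical illustration, not the original method: the ear spacing of 0.18 m, the coordinate convention, the position labels and the scores are all assumptions, and the step that derives transfer functions for each pair of reference points and runs the 2-channel evaluator is not shown. Scores are assumed to be “higher is better”, as with the PEAQ Objective Difference Grade (0 to -4), so the worst case is the minimum.

```python
import math
from statistics import mean

EAR_SPACING_M = 0.18  # assumed distance between reference points 17 and 18


def reference_points(center_x, center_y, yaw_deg):
    """Return (left, right) positions of the two reference points for a head
    centred at (center_x, center_y) in metres. yaw_deg = 0 means facing the
    +y direction (towards the front loudspeaker); positive yaw turns the head
    counter-clockwise, i.e. to the left (rotation). Moving the centre is a
    translation."""
    yaw = math.radians(yaw_deg)
    # Unit vector from the head centre towards the right ear, rotated with the head.
    rx, ry = math.cos(yaw), math.sin(yaw)
    half = EAR_SPACING_M / 2
    left = (center_x - half * rx, center_y - half * ry)
    right = (center_x + half * rx, center_y + half * ry)
    return left, right


def summarize(quality_by_position):
    """quality_by_position maps a head-position label to a quality score where
    higher is better (e.g. a PEAQ ODG between -4 and 0).
    Returns (average score, worst score, label of the worst position)."""
    worst = min(quality_by_position, key=quality_by_position.get)
    return mean(quality_by_position.values()), quality_by_position[worst], worst


# Hypothetical head positions near the optimum listening position.
positions = {
    "optimum": reference_points(0.0, 0.0, 0.0),
    "10 cm left": reference_points(-0.1, 0.0, 0.0),
    "turned 15 deg": reference_points(0.0, 0.0, 15.0),
}
```

Reporting both the average and the worst case, as the text suggests, separates “good on average” from “audibly poor at some specific position”.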
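The filter lengths quoted above can be put into numbers. The sketch below assumes direct-form FIR filtering and the 10 filters OF11 to OF52 of the 5-loudspeaker arrangement; the split boundary `early_ms` is a hypothetical free parameter, because the text does not state where the discrete reflections end or how the late part is represented less precisely.

```python
def fir_budget(taps, sample_rate_hz, num_filters):
    """Return (impulse-response duration in s, multiply-accumulate operations per
    second) for direct-form FIR filtering of one multichannel signal."""
    return taps / sample_rate_hz, taps * sample_rate_hz * num_filters


def split_impulse_response(h, sample_rate_hz, early_ms):
    """Split an impulse response into an early part (first early_ms milliseconds,
    mainly discrete reflections) and a late part (mainly diffuse reflections)."""
    n = round(sample_rate_hz * early_ms / 1000)
    return h[:n], h[n:]


# Figures from the text: 48 kHz, 100,000 taps; 10 filters (OF11 ... OF52).
duration_s, macs = fir_budget(100_000, 48_000, 10)
# duration_s is about 2.08 s; macs == 48_000_000_000 (4.8e10 per second)
```

A 100,000-tap filter at 48 kHz thus covers about 2.08 s of reverberation, and direct-form filtering of one 5-channel signal costs about 4.8 × 10^10 multiply-accumulates per second (twice that for reference and test signal together). Loads of this size are one reason block-wise FFT convolution is commonly used for long FIR filters, and why a coarser representation of the diffuse tail is attractive.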
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP99/09979 | 12/15/1999 | WO | 00 | 10/19/2001 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO00/44196 | 7/27/2000 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6118875 | Møller et al. | Sep 2000 | A |
6223090 | Brungart | Apr 2001 | B1 |
6271771 | Seitzer et al. | Aug 2001 | B1 |
Number | Date | Country |
---|---|---|
WO9823130 | May 1998 | DE |
196 47 399 | Nov 1996 | EP |
0165733 | Dec 1985 | JP |