The invention relates to a method for recording sound signals of one or more sound sources located in a recording space and having time-variable directional characteristics and for reproducing the sound signals in an area of reproduction. The invention also relates to a system for carrying out the method.
Various methods are known, which attempt to record and to reproduce the impression of the sound arising in a room. The best known method is the stereo method and the further developments thereof, in which the location of a sound source is detected during the recording process and reproduced during the reproduction process. In the reproduction process however there is only a restricted region in which the location of the recorded sound source is correctly reproduced. Other reproduction methods which synthesise the recorded sound field, such as for example Wave Field Synthesis, can on the other hand reproduce the location of the sound source correctly independently of the position of the listener.
In none of these methods is temporally variable information recorded or reproduced about the direction of emission of a sound source. If sound sources with temporally variable directional characteristics are recorded, information is therefore lost. For transmitting a video conference for example, in which one participant can communicate with different participants and address them specifically, with the known methods this directional information is not detected, recorded or reproduced.
The problem addressed by the invention is to produce a method for the recording, transmission and reproduction of sound, with which the information-bearing properties of the sound sources are reproduced true to life and in particular can be transmitted in real time.
The problem is solved by means of a method for recording sound signals of a sound source located in a recording space with time variable directional characteristics using sound recording means and for reproducing the sound signals in an area of reproduction using sound reproduction means, which is characterised in that the main direction of emission of the sound signals emitted by the sound source is detected in a time-dependent manner and the reproduction takes place in a manner dependent on the detected main direction of emission.
A sound source with time variable directional characteristics can be in particular a participant of a video conference, who can address other participants and therefore speak in different directions. The emitted sound signals are recorded and their main direction of emission simultaneously detected.
The recording of the sound signals can be performed in the conventional manner with microphones or also with one or more microphone arrays. The means for detecting the main direction of emission can be of any type. In particular, acoustic means can be used. To this end, multiple microphones and/or one or more microphone arrays can be used, which detect the level and/or phase differences of the signal in different directions, from which the main direction of emission can be determined by means of a suitable signal processing system. If the position of the acoustic means, the directional characteristics thereof, and/or the position of the sound source are known, this information can be appropriately taken into account by the signal processor in determining the main direction of emission. In the same way, knowledge of the geometry of the environment and its associated sound propagation properties, as well as reflection properties can also be taken into account in determining the main direction of emission. It is particularly advantageous if information on the measured, approximated or simulated directional characteristics of the sound source can also be incorporated in determining the main direction of emission. This applies particularly in cases where the main direction of emission is only to be determined approximately, which is sufficient for many applications.
To detect the main direction of emission however, optical means can also be used, such as e.g. a video detection process with pattern recognition. In the case of participants in a video conference, it can be assumed that the speaking direction corresponds to the viewing direction. Using pattern recognition it can therefore be determined in which direction a participant is looking, and thereby the speaking direction can be determined. In particular, a combination of acoustic and optical means with appropriate signal processing can also be used. If necessary the acoustic means can also be used for recording the sound signals while simultaneously detecting the main direction of emission, and vice versa.
It is often sufficient to detect the main direction of emission approximately. A classification into 3 or 5 categories, e.g. straight, right and left or straight, diagonally to the right, right, diagonally to the left and left, can fully suffice to communicate the essential information.
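By way of illustration only, a coarse classification of this kind could be sketched as follows. The angle convention (0° straight ahead, positive angles to the right, range limited to ±90°), the equal-width bins and the function name `classify_direction` are assumptions made for this sketch and are not specified above.

```python
def classify_direction(angle_deg: float, categories: int = 3) -> str:
    """Map an emission angle to one of 3 or 5 coarse direction labels.

    Assumed convention: 0 = straight ahead, positive = right, negative = left.
    """
    if categories == 3:
        labels = ["left", "straight", "right"]
    elif categories == 5:
        labels = ["left", "diagonal-left", "straight", "diagonal-right", "right"]
    else:
        raise ValueError("categories must be 3 or 5")

    # Clamp to the frontal half-plane (assumption: speaker faces the room).
    clamped = max(-90.0, min(90.0, angle_deg))

    # Split [-90, 90] into equal-width bins (an assumed, not prescribed, split).
    width = 180.0 / categories
    index = min(int((clamped + 90.0) / width), categories - 1)
    return labels[index]
```

For example, `classify_direction(-45, 5)` yields `"diagonal-left"` under these assumed bin edges.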
The main direction of emission can advantageously be the main direction of emission in that frequency range which carries the information. To this end, the frequency range applied to determine the main direction of emission can be restricted, e.g. by using a frequency filter.
The reproduction of the sound signals should take place in accordance with the detected main direction of emission. The purpose of this is to simulate the directed emission of the original source. This can be done either by a real directed emission of the sound signal or by a simulated directed reproduction, which is perceived by the listener as directed reproduction, without it being actually physically directed in the conventional sense. The applicable methods differ among other things in the accuracy with which the directional characteristics can be reconstructed. In practice, the perceptual naturalness of the reconstruction or simulation is crucial. In the following, all such methods are summarized under the term “directed reproduction”.
In the inventive method, the reproduction of the sound signals can be carried out with a first reproduction unit associated with the sound source and at least one second reproduction unit spaced apart from the first reproduction unit. The position of this first reproduction unit in the area of reproduction can correspond to a virtual position of the sound source in the area of reproduction. The second reproduction unit(s) can be used to relay the directional information of the sound reproduction. Preferably, two second reproduction units are used, one of which can be positioned on one side and the other on the other side of the first reproduction unit. Instead of a single second reproduction unit on each side of the first reproduction unit, multiple second reproduction units spaced apart from one another can be arranged on each side, preferably two on each side.
The sound signals of the sound source recorded in the recording space can be reproduced in the area of reproduction with a first reproduction unit, such as e.g. a loudspeaker. This loudspeaker can be placed in the area of reproduction in such a way that it is located at the virtual position of the sound source in the area of reproduction. The sound source is, so to speak, “drawn” into the area of reproduction. The first reproduction unit can however also be generated with multiple loudspeakers, with a group of loudspeakers or with a loudspeaker array. For example it is possible by means of wave field synthesis to place the first reproduction unit as a point source at the virtual position of the sound source in the area of reproduction, such that the sound source is virtually drawn into the area of reproduction. This is advantageous e.g. for video conferences in which as far as possible the impression of an actual conference with the presence of all participants is to be achieved. The sound source would then be a participant in the recording space. The reproduction would be carried out via a first reproduction unit, which would be placed at the point in the area of reproduction at which the participant in the recording space would be virtually present.
The information on the direction of emission can be relayed by the fact that the reproduction with the second reproduction unit(s) takes place with a time delay τ relative to the first reproduction unit. This time delay can be different for each of the second reproduction units. It has been shown that information regarding the direction of emission of a sound source can be communicated to the human ear by a type of echo or reflection of the sound signal being emitted by one or more sound sources spaced apart with a small time delay. The time delay at positions for participants, at which a participant in e.g. a video conference can be placed, should have a value between 2 ms and 100 ms so that the echo or reflection is not processed as a separate sound event. The time delay τ of the second reproduction unit or units can therefore be preferably chosen such that the actual time delay between the sound signals has a value at least in partial regions of the area of reproduction between 2 ms and 100 ms, preferably between 5 ms and 80 ms and in particular between 10 ms and 40 ms.
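The relationship between the chosen delay τ and the actual time delay at a listener position can be illustrated with a short sketch. It assumes free-field propagation at a speed of sound of 343 m/s and two-dimensional positions in metres; the function names and these values are assumptions for illustration and are not taken from the description.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def actual_delay_ms(listener, first_unit, second_unit, tau_ms):
    """Delay between arrival from the second and the first reproduction unit.

    The actual delay is tau plus the extra propagation time caused by the
    difference in path length from each unit to the listener.
    """
    d_first = math.dist(listener, first_unit)
    d_second = math.dist(listener, second_unit)
    return tau_ms + (d_second - d_first) / SPEED_OF_SOUND_M_S * 1000.0


def delay_in_window(delay_ms, low_ms=2.0, high_ms=100.0):
    """Check the delay against the 2 ms to 100 ms range stated above."""
    return low_ms <= delay_ms <= high_ms
```

With a listener at (0, 0), the first unit at (0, 3.43) and a second unit at (0, 6.86), a delay τ of 10 ms gives an actual delay of about 20 ms, which lies within the preferred range of 10 ms to 40 ms.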
The reproduction due to the second reproduction unit(s) can take place in accordance with the spatial characteristics of the area of reproduction with a reduced level, in particular with a level reduced by 1 to 6 dB and preferably by 2 to 4 dB. According to the directional characteristics to be simulated, before the reproduction by the second reproduction unit(s) the sound signal can also be processed with a frequency filter, for example a high-pass, low-pass or band pass filter. The parameters of the frequency filter can be either fixed in advance or be controlled depending on the main direction of emission.
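As a minimal sketch of the level reduction, a reduction stated in dB can be converted to a linear amplitude factor using the standard relation gain = 10^(−dB/20). The function names are hypothetical, and the signal is assumed to be a plain sequence of amplitude samples.

```python
def db_to_gain(reduction_db: float) -> float:
    """Convert a level reduction in dB to a linear amplitude factor."""
    return 10.0 ** (-reduction_db / 20.0)


def attenuate(samples, reduction_db: float):
    """Scale every sample of the feed for a second reproduction unit."""
    gain = db_to_gain(reduction_db)
    return [sample * gain for sample in samples]
```

A reduction of 6 dB corresponds to roughly half the amplitude, and a reduction of 2 to 4 dB to a factor of roughly 0.79 to 0.63.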
The second reproduction unit(s) can, as can the first reproduction unit also, be one or more loudspeakers or a virtual source, which is generated with a group of loudspeakers or with a loudspeaker array, for example using wave field synthesis.
For the best possible true to life reproduction of the information about the direction of emission of a sound source, the reproduction level of the first and second reproduction units can also be adapted depending on the directional characteristics to be simulated. For this purpose the reproduction levels are adjusted such that the perceivable loudness differences resulting from the directional characteristics can be appropriately approximated at different listener positions. The reproduction levels of the individual reproduction units determined in this way can be defined and stored for different main directions of emission. In the case of time variable directional characteristics, the detected main direction of emission then controls the reproduction levels of the individual reproduction units.
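Storing reproduction levels per main direction of emission can be pictured as a simple lookup table that the detected direction indexes at run time. The table below is purely hypothetical: the three directions, the unit names and all dB values are placeholders chosen for this sketch and are not values disclosed in the description.

```python
# Hypothetical attenuation in dB (0 = unattenuated) for each reproduction
# unit, stored per main direction of emission. All values are placeholders.
ATTENUATION_DB = {
    "left": {"first": 2.0, "second_left": 0.0, "second_right": 60.0},
    "straight": {"first": 0.0, "second_left": 3.0, "second_right": 3.0},
    "right": {"first": 2.0, "second_left": 60.0, "second_right": 0.0},
}


def gains_for(direction: str) -> dict:
    """Return linear gains for all units for the detected main direction."""
    return {
        unit: 10.0 ** (-attenuation_db / 20.0)
        for unit, attenuation_db in ATTENUATION_DB[direction].items()
    }
```

Each time the detected main direction changes, a call such as `gains_for("right")` would supply the gains to apply to the individual reproduction units.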
The method described above can of course also be applied to multiple sound sources in the recording space. For the reproduction of multiple sound sources with the described method it is particularly advantageous to have the sound signals of the individual sound sources to be transmitted provided separately from one another. Different methods for recording the sound signals are therefore conceivable. For recording the sound signals, sound recording means can be associated with the individual sound sources. This association can either be 1:1, so that each sound source has its own sound recording means, or so that groups of multiple sound sources are associated to one sound recording means respectively. The position of the active sound source at a given moment can be determined both with conventional localisation algorithms and also with video acquisition and pattern recognition. In synchronous sound emission from more than one sound source, with a grouping of the sound sources to one sound recording means, the sound signals of the individual sound sources can be separated from each other with conventional source separation algorithms such as for example “Blind Source Separation”, “Independent Component Analysis” or “Convolutive Source Separation”. If the position of the sound sources to be recorded is known, as a sound recording means for a group of sound sources a dynamic direction-selective microphone array can also be used, which processes the received sound signals according to the pre-specified positions and combines them together for each sound source separately.
The detection of the main direction of emission of the individual sound sources can be done on the same principles as described for one sound source. To do this, appropriate means can be associated with the individual sound sources. The association can be such that each sound source has its own direction sensing means, or in such a way that groups of multiple sound sources are associated to one direction sensing means. In grouped sound sources the detection of the main direction of emission occurs as for the case of one sound source, when at the given point in time only one sound source is emitting sound. If two or more sound sources emit sound, then in the first processing step of the direction sensing means the received signals (for example sound signals or video signals) are first associated with the corresponding sound sources. In the case of optical means, this can be done using object recognition algorithms. In the case of acoustic means, the sound signals of the sound sources recorded separately with the previously described sound recording means can be used for associating the received signals to the corresponding sound sources. When the position of the sound sources is known, the transmission function between the sound sources and the acoustic direction sensing means can preferably be taken into account, as well as the directional characteristics of both the direction sensing means and the sound recording means. Only after the assignment of the received signals to the relevant sound sources is the main direction of emission determined separately for the individual sound sources, for which purpose the same methods described above for one sound source can be used.
The quality of the reproduction can be improved by suppressing sound signals from a sound source which are received by recording means, or direction sensing means, not associated with the sound source, using acoustic echo cancellation or cross talk cancellation. The minimisation of acoustic reflections and extraneous noises with conventional means can also contribute to improving the reproduction quality.
For reproducing the sound signals, a first reproduction unit can be associated with each sound source. This association can take place either on a 1:1 basis, so that each sound source has its own first reproduction unit, or in such a way that groups of multiple sound sources are associated to one reproduction unit. Depending on the association, the spatial information reproduced in the area of reproduction is more or less accurate.
As an alternative to the above described reproduction technique the reproduction can also be carried out using wave field synthesis. For this purpose, instead of the point source normally used, the directional characteristics of the sound source must be taken into account for synthesising the sound field. The directional characteristics to be used for this are preferably stored in a database ready for use. The directional characteristics can be for example a measurement, an approximation obtained from measurements, or an approximation described by a mathematical function. It is equally possible to simulate the directional characteristics using a model, for example by means of direction dependent filters, multiple elementary sources or a direction dependent excitation. The synthesis of the sound field with the appropriate directional characteristics is controlled using the detected main direction of emission, so that the information on the direction of emission of the sound source is reproduced in a time dependent way. The method described above can of course also be applied to multiple sound sources in the recording space.
As well as the reproduction techniques described up to now, a multi-loudspeaker system (multi-speaker display device) known from the prior art can also be used for the directed reproduction of the sound signals, the reproduction parameters of which are also controlled by the main direction of emission determined in a time dependent way. Instead of controlling the reproduction parameters, control of a rotatable mechanism is also conceivable. If there are multiple sound sources present in the recording space, in the area of reproduction for each sound source a multi-loudspeaker system can be provided.
Other known reproduction methods from the prior art can also be used for the directed reproduction of the sound signals, the reproduction parameters of which in order to do this must be controlled according to the main direction of emission determined in a time dependent manner.
A further problem addressed by the invention is to create a system which facilitates the recording, transmission and true to life reproduction of the information-bearing properties of the sound sources.
The problem is solved using a system for recording sound signals from one or more sound sources with time variable directional characteristics with sound recording means in a recording space and for reproducing the sound signals with sound reproduction means in an area of reproduction, which is characterised in that the system has means for detecting, in a time dependent manner, the main directions of emission of the sound signals emitted by the sound source(s) and means for reproducing the transmitted sound signals in dependence on the detected directions.
The system can have at least two sound recording units associated with a sound source for recording the sound signals emitted by this sound source and the main direction of emission thereof. Alternatively or additionally to this the system can also have optical means for detecting the main direction of emission thereof.
Means for detecting the main direction of emission can be e.g. microphones or microphone arrays or means for video acquisition, in particular with pattern recognition.
The reproduction of the sound signals can be carried out with a first reproduction unit associated with the sound source and at least one second reproduction unit spaced apart from the first reproduction unit. The position of this first reproduction unit in the area of reproduction can correspond to a virtual position of the sound source in the area of reproduction.
Reproduction with the second reproduction unit or units can be done with a time delay τ relative to the first reproduction unit for subjectively generating a directed emission of sound. In the case of multiple second reproduction units an individual time delay can be chosen for each one.
The system can be used for e.g. sound transmission in video conferences. In this case there are specified positions at which participants in the conference remain. Depending on the participants' positions the time delay τ of the second reproduction unit or units can be chosen in such a way that the actual time delay between the sound signals at least at the positions of the respective participants in the area of reproduction lies between 2 ms and 100 ms, preferably between 5 ms and 80 ms and in particular between 10 ms and 40 ms.
The reproduction using the first and/or the second reproduction unit(s) can be carried out at a reduced level, in particular at a level reduced by 1 to 6 dB and preferably by 2 to 4 dB, and/or in particular in accordance with the main direction of emission.
It is self-explanatory that the system for transmitting the sound signals of one sound source can be extended to the transmission of the sound signals of multiple sound sources. This can be done by simply increasing the number of the means previously described. It can be advantageous however to reduce the required means in such a way that certain means are associated with multiple sound sources on the recording side. Alternatively or additionally reproduction means can also have multiple associations on the reproduction side. The association possibilities for the inventive method described above also apply analogously to the system. In particular the number of sound recording units and/or sound reproduction units can correspond to the number of sound sources plus 2.
Additional embodiments of the method and the system are disclosed in the sub claims.
There follows a detailed description of the invention with reference to the attached illustrations and with the aid of selected examples:
The microphone array MA illustrated in
The main direction of emission of a sound source T is determined with a microphone array MA, that is, a plurality of single microphones M connected together. For this purpose the sound source T is surrounded with these microphones M in an arbitrary arrangement, for example in a circle, as shown in
In a first step the position of the sound source T with respect to the microphones M is determined, such that all distances r between sound source T and microphones M are known. The position of the sound source T can be specified for example by measurement or with a conventional localisation algorithm. It can be advantageous for specifying the position to use corresponding filters to consider only those frequency ranges which have no marked preferred direction with respect to the sound emission. In many cases this applies to low frequency ranges, in the case of speech for example below about 500 Hz.
The main direction of emission of the sound source T can be determined from the sound levels detected at the microphones M, wherein the different sound attenuation levels as well as transit time differences due to the different distances r between the individual microphones M and the sound source T are taken into account. With direction selective microphones M, the directional characteristics of the microphones M can also be taken into account when determining the main direction of emission.
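A highly simplified sketch of this determination is given below. It assumes free-field spherical spreading (amplitude proportional to 1/r), levels given as linear amplitudes that have already been time-aligned for the transit time differences, omnidirectional microphones, and selection of the direction by taking the largest compensated level. These are assumptions of the sketch; the description does not prescribe a particular estimator.

```python
def main_direction_deg(mic_angles_deg, mic_levels, mic_distances):
    """Estimate the main direction of emission from per-microphone levels.

    mic_angles_deg: direction of each microphone M as seen from source T.
    mic_levels:     measured linear amplitude levels (assumed time-aligned).
    mic_distances:  distances r between source T and each microphone M.
    """
    if not (len(mic_angles_deg) == len(mic_levels) == len(mic_distances)):
        raise ValueError("all inputs must have the same length")

    # Undo the assumed 1/r attenuation so that levels refer to equal distance.
    compensated = [level * r for level, r in zip(mic_levels, mic_distances)]

    # Take the microphone direction with the highest compensated level.
    best = max(range(len(compensated)), key=compensated.__getitem__)
    return mic_angles_deg[best]
```

In this sketch a microphone that is twice as far away has its level doubled before comparison, so that a distant microphone in the main direction of emission is not outvoted by a nearby microphone off-axis.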
The more directions are detected by microphones, the more precisely the main direction of emission can be determined. Conversely, the number of necessary microphones can be reduced, (a) when the main direction of emission is only to be detected approximately, for example a classification into 3 or 5 categories may be completely sufficient, and accordingly an arrangement of the direction detecting means in these directions is sufficient, or (b) when the main direction of emission is restricted to a limited angular range; for example the speaking direction in teleconferencing will normally be restricted to an angular range in the forward direction.
The microphones can be used as means for direction detection and also as sound recording means for recording the sound signals from the sound source. Using the position of the sound source and where appropriate also using the determined main direction of emission, a weighting can be defined for the microphones, which regulates the contribution of the individual microphones to the recorded sound signal.
Instead of the relatively costly method of
If one uses a highly simplified reference for the directional characteristics in the case of speech signals for example, as shown schematically by way of example in
If the possible main directions of emission are restricted to a specific angular range, then the reference shown in
The approximation of the directional characteristics of speech with one of the two reference patterns described above has proved to be adequate for many applications, in particular for conferencing applications in which a relatively coarse determination of the main direction of emission is adequate for a natural reconstruction. For a more accurate determination of the main direction of emission, in a videoconference application the one or more optical means with pattern recognition can also be used. It is also possible using upstream frequency filters to limit the determination of the main direction of emission to the information-bearing frequency ranges.
As in
The reference sound level can be detected for example with a clip-on microphone M1, which constantly follows the changes in direction of the sound source T, so that the direction of the sound signals detected therewith is always constant and therefore known. It is advantageous if the direction of the reference sound level is the same as the main direction of emission. The microphone M1 which is used for determining the reference sound level can also be used simultaneously as an acoustic means for recording the sound signals.
If for example the approximation shown in
In this method also, the determination of the main direction of emission can be restricted to the information-bearing frequency ranges by using appropriate frequency filters.
In
If, as shown in
In the example illustrated in
The recording of the sound signals of the sound sources in
In
The sound signals TS of a sound source recorded in the recording space can be reproduced in the area of reproduction with a first reproduction unit WE1 assigned to the sound source. The position of the first reproduction unit WE1 can be chosen to be the same as the virtual position of the sound source in the area of reproduction. For a video conference this virtual position can be for example at the point in the room where the visual representation of the sound source is located.
To communicate the directional information of the sound reproduction, at least one second reproduction unit WE2 spaced apart from the first reproduction unit is used. Preferably two second reproduction units are used, one of which can be positioned on one side and the other on the other side of the first reproduction unit WE1. Such a design allows changes in the main direction of emission of the sound source in an angular range of 180° around the first reproduction unit to be simulated, i.e. around the virtual sound source positioned at this point. The information on the direction of emission can be communicated by the fact that the reproduction with the second reproduction units is delayed relative to the first reproduction unit. The time delay τ used should be chosen so that the actual time delay Δt = t_WE2 − t_WE1 between the sound signals has a value at least in sub-regions of the area of reproduction between 2 ms and 100 ms, so that for the receivers, i.e. for example for the receiving participants of the video conference, who are located in these sub-regions, the actual time delay lies between 2 ms and 100 ms.
The main direction of emission HR detected in the recording space controls the reproduction levels at the second reproduction units via an attenuator a. In order to simulate a main direction of emission of the sound source for example, which is directed towards the right side of the room, the sound signals to the second reproduction unit, which is located on the left, are completely attenuated and only reproduced via the right-hand second reproduction unit delayed relative to the first reproduction unit.
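The control described here can be sketched for one first reproduction unit and two second reproduction units as follows. The function name, the three direction labels and the sample-based delay are assumptions of the sketch. In particular, the behaviour for a "straight" direction (both second units reproducing the delayed signal) is an assumption, since only the case of a direction towards one side is described above.

```python
def route_to_second_units(samples, direction, delay_samples):
    """Return the feeds (first, left, right) for WE1 and two second units WE2.

    The second-unit feed is the signal delayed by delay_samples; the unit on
    the side facing away from the main direction is fully attenuated.
    """
    delayed = [0.0] * delay_samples + list(samples)
    silent = [0.0] * len(delayed)
    # Pad the undelayed feed so that all three feeds have the same length.
    first = list(samples) + [0.0] * delay_samples

    if direction == "right":
        left, right = silent, delayed
    elif direction == "left":
        left, right = delayed, silent
    else:
        # Assumption: for a straight direction both second units are active.
        left, right = delayed, list(delayed)
    return first, left, right
```

The delay in samples follows from the delay τ and the sampling rate; at an assumed sampling rate of 48 kHz, a delay of 20 ms corresponds to 960 samples.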
The method described above can of course also be applied to multiple sound sources in the recording space. For this purpose correspondingly more first and second reproduction units must be used.
The first and also the second reproduction units WE1 and WE2 can, as shown in
In
The basic method described in
One possibility is, instead of a second reproduction unit on each side of the first reproduction unit WE1, to use multiple second reproduction units WE2 spaced apart, as shown in
As shown in
For the best possible true to life reproduction of the information about the direction of emission, the reproduction level of the first and second reproduction units can also be adapted depending on the directional characteristics to be simulated. For this purpose the reproduction levels are adjusted using an attenuator a, such that the perceivable loudness differences at different listener positions resulting from the directional characteristics can be appropriately approximated. The attenuations thus determined for the individual reproduction units can be defined and stored for different main directions of emission HR. In the case of a sound source with time variable directional characteristics, the detected main direction of emission then controls the reproduction levels of the individual reproduction units.
In
The method described above can of course also be applied to multiple sound sources in the recording space. For this purpose correspondingly more first and second reproduction units must be used.
In
If multiple sound sources are present in the recording space, the sound signals of the sound sources, as explained in regard to
In
As shown in
As explained with regard to
In
In
The reproduction of the sound signal TS of the sound source takes place via the first reproduction unit. The sound signal TS can either be the sound signal recorded with its own microphone, or it is formed from the sound signals TR90, TR45, TL90 and TL45, e.g. by the largest of these sound signals or the sum of the four sound signals being used. In
It is true that the sound quality of the reproduction method described can be affected by comb filter effects; nevertheless the method can be of great benefit in some applications due to its simplicity.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10 2005 057 406.8 | Nov 2005 | DE | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP2006/011496 | 11/30/2006 | WO | 00 | 5/29/2008 |