The invention relates to a method and a device for sound field reproduction from a first audio input signal using a plurality of loudspeakers, aiming at synthesizing a sound field within a preferred listening area in which none of the loudspeakers are located, said sound field being described as emanating from a virtual source, said method comprising steps of calculating positioning filters using virtual source description data and loudspeaker description data according to a sound field reproduction technique which is derived from a surface integral, and applying positioning filter coefficients to filter the first audio input signal to form second audio input signals.
Sound field reproduction refers to the synthesis of the physical properties of an acoustic wave field within an extended portion of space. This framework makes it possible to overcome the well-known limitation of stereophonic sound reproduction techniques concerning listener positioning, the so-called “sweet spot”. The sweet spot is the small area in which the illusion on which stereophonic principles rely is valid. In the case of two-channel stereophony, the voice of a singer can be located midway between the two loudspeakers if the listener is located on the loudspeakers' midline. This illusion is referred to as phantom source imaging. It is created simply by feeding both loudspeakers with the same signal. However, if the listener moves, the illusion disappears and the voice is heard at the closest loudspeaker. Therefore, no phantom source imaging is possible outside of the “sweet spot”.
It is generally assumed that the listener is located at a distance from each loudspeaker which equals the loudspeaker spacing. This enables one to define so-called “panning laws” to position a virtual source at a given angular position from the listener. However, this can only be experienced if the listener is located exactly at the sweet spot.
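As an illustration of what such a panning law can look like, the following minimal sketch uses the tangent law, one commonly used stereophonic panning law; the text above does not name a specific law, so this choice is an assumption. The function name `tangent_law_gains` and the default half aperture of 30 degrees (which corresponds to the listener being as far from each loudspeaker as the loudspeakers are from each other) are hypothetical names and values chosen for the example.

```python
import math


def tangent_law_gains(source_deg: float, half_aperture_deg: float = 30.0):
    """Return (left_gain, right_gain) for a phantom source.

    Assumption: the tangent law, tan(phi) / tan(phi0) = (gL - gR) / (gL + gR),
    with positive angles toward the left loudspeaker and constant-power
    normalization (gL^2 + gR^2 = 1).
    """
    if abs(source_deg) > half_aperture_deg:
        raise ValueError("source must lie between the two loudspeakers")
    ratio = math.tan(math.radians(source_deg)) / math.tan(
        math.radians(half_aperture_deg)
    )
    # Unnormalized gains that satisfy the tangent law for this ratio.
    left, right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(left, right)
    return left / norm, right / norm
```

With a source angle of 0 degrees both gains are equal, which matches the phantom source on the midline described above; the gains are only meaningful for a listener at the sweet spot.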
Sound field reproduction techniques do not make any assumption about the listener position. Virtual sound imaging is realized by synthesizing a target sound field. There are three methods for describing the target sound field: an object based description, a wave based description, and a surface description.
In the object based description, the target wave field is described as an ensemble of sound sources. Each source is further defined by its position relative to a given reference point and by its radiation characteristics. From this description, the sound field can be estimated at any point of space. In the wave based description, the target sound field is decomposed into so-called “spatially independent wave components” that provide a unique representation of the spatial characteristics of the target sound field. Depending on the chosen coordinate system, the spatially independent wave components are usually:
For an exact description of the sound field, the wave based description requires an infinite number of spatially independent wave components. In practice, a limited number of components are used which provides a description of the sound field which remains valid in a reduced portion of space.
Finally, the surface description relies on the continuous description of the pressure and/or the normal component of the pressure gradient of the target sound field at the boundaries of a subspace Ω. From that description, the target sound field can be estimated in the complete subspace Ω using so-called surface integrals (Rayleigh 1, Rayleigh 2, and Kirchhoff-Helmholtz integrals).
It should be noted that transformations exist to convert a description from one method to another. For example, the object based description can easily be transformed into the surface description by extrapolating the sound field radiated by the acoustical objects to the boundaries of a subspace Ω.
In the past years, several methods have been developed to enable the synthesis of a target wave field in an extended listening area. One such method relies on the recreation of the curvature of the wave front of an acoustic field emitted by a virtual source (object based description) by using a plurality of loudspeakers. This method has been disclosed by A. J. Berkhout in “A holographic approach to acoustic control”, Journal of the Audio Eng. Soc., Vol. 36, pp 977-995, 1988, and is known under the name “Wave Field Synthesis”.
A second method relies on the decomposition of a wave field into spatially independent wave field components such as spherical harmonics or cylindrical harmonics (wave based description). This second method has been disclosed by M. A. Gerzon in “Ambisonic in multichannel broadcasting and video”, Journal of the Audio Engineering Society, vol. 33, pp. 859-871, 1985.
Both methods are mathematically linked as disclosed by Jérôme Daniel, Rozenn Nicol and Sébastien Moreau in “Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging”, Audio Engineering Society, Proceedings of the 114th AES Convention, Amsterdam, The Netherlands, Mar. 22-25, 2003. They are generally referred to as Holophonic methods.
In theory, these methods allow the control of a wave field within a certain listening zone in all three spatial dimensions. However, this is only correct if an infinite number of loudspeakers are used (a continuous distribution of loudspeakers). In practice, a finite number of loudspeakers is used which creates physical inaccuracies in the synthesized sound field.
As an example, Wave Field Synthesis is derived from the Rayleigh 1 integral which requires a continuous planar infinite distribution of ideally omnidirectional secondary sources (loudspeakers). Three successive approximations are used to derive Wave Field Synthesis from the Rayleigh 1 integral assuming that virtual sources and listeners are in the same horizontal plane:
Following these approximations, the loudspeaker array can be regarded as an acoustical aperture through which the incoming sound field (as emanating from a target sound source) propagates into an extended yet limited listening area. Simple geometrical considerations enable one to define a source/loudspeaker visibility area in which the virtual source is “visible” through the loudspeaker array. Here, “visible” means that the straight line joining the virtual source and the listener crosses the line segment on which the loudspeakers are located. This source/loudspeaker visibility area 25 is displayed in
Sources can conversely be located only in a limited zone so that they remain visible from within the entire listening area as disclosed by E. Corteel in “Equalization in extended area using multichannel inversion and wave field synthesis,” Journal of the Audio Engineering Society, vol. 54, no. 12, 2006.
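The geometrical visibility criterion described above can be sketched as a two-dimensional segment intersection test. This is a minimal illustration under stated assumptions: the loudspeaker array is a single straight line segment, the virtual source lies behind the array so that the source-listener line can be treated as a segment, and the function names `is_source_visible` and `_cross` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_source_visible(
    source: Point, listener: Point, array_start: Point, array_end: Point
) -> bool:
    """True if the segment source-listener strictly crosses the array segment.

    Touching an array end point or collinear configurations count as
    not visible in this sketch.
    """
    # Source and listener must lie on opposite sides of the array line ...
    d1 = _cross(array_start, array_end, source)
    d2 = _cross(array_start, array_end, listener)
    # ... and the array end points on opposite sides of the source-listener line.
    d3 = _cross(source, listener, array_start)
    d4 = _cross(source, listener, array_end)
    return d1 * d2 < 0 and d3 * d4 < 0
```

Evaluating this test over a grid of listener positions for a fixed source gives a discrete picture of the source/loudspeaker visibility area; evaluating it over source positions for every point of a listening area gives the zone in which sources remain visible.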
The source positioning area can be extended by adding supplementary loudspeaker arrays around the listening area. With the resulting loudspeaker array geometry, the Rayleigh 1 integral no longer applies. Loudspeaker driving signals are thus derived from the Kirchhoff-Helmholtz integral using similar approximations:
In the original formulation of the Kirchhoff-Helmholtz integral, the secondary source distribution is composed of ideal omnidirectional sources (monopoles) and ideal bi-directional sources (dipoles). However, as disclosed by R. Nicol in «Restitution sonore spatialisée sur une zone étendue: application à la téléprésence», Ph.D. thesis, Université du Maine, Le Mans, France, 1999, the loudspeakers of the array can be split into two categories (relevant and irrelevant loudspeakers) for which:
The sound fields emitted by the monopoles and the dipoles have mostly similar spatio-temporal characteristics. However, relevant monopoles and relevant dipoles are in phase and merely tend to double the sound pressure level, whereas irrelevant monopoles and irrelevant dipoles are out of phase and merely tend to compensate for each other. Therefore, only relevant monopoles could be used for the synthesis of the target sound field. This is useful since most available loudspeakers have more omnidirectional radiation characteristics. A more general class of sound field rendering techniques based on holophonic principles can be defined using simplifications of the “surface integrals” as disclosed by R. Nicol in «Restitution sonore spatialisée sur une zone étendue: application à la téléprésence», Ph.D. thesis, Université du Maine, Le Mans, France, 1999. The proposed simplifications involve:
The previously defined approximations to these “surface integrals” (Rayleigh 1 and Kirchhoff-Helmholtz) introduce inaccuracies in the synthesized sound field compared to the target sound field as disclosed by E. Corteel in “Caractérisation et extensions de la Wave Field Synthesis en conditions réelles”, Université Paris 6, PhD thesis, Paris, 2004. In the case of Wave Field Synthesis, the reduction of the secondary source surface to a linear distribution in the horizontal plane (approximation 1) limits the technique to the reproduction of virtual sources in the horizontal plane (2D reproduction) and modifies the level of the sound field compared to the target. Approximation 2 introduces diffraction artefacts which can be reduced by tapering loudspeakers located at the extremities of the array. Approximations 1 and 2 mainly reduce the capabilities of the rendering system (size of the listening area, positioning of the virtual sources). They hardly modify the quality of the sound field perceived by a listener in terms of coloration or localization accuracy at a given position within the listening area as disclosed by E. Corteel in “Caractérisation et extensions de la Wave Field Synthesis en conditions réelles”, Université Paris 6, PhD thesis, Paris, 2004. Approximation 3 limits the exact reproduction of the target wave field to frequencies below a certain frequency, the Nyquist frequency of the spatial sampling process, which is commonly referred to as the “spatial aliasing frequency”. This spatial sampling introduces inaccuracies that are perceived as artefacts in terms of localization of the virtual source and coloration as disclosed by E. Corteel, K. V. NGuyen, O. Warusfel, T. Caulkins, and R. S. Pellegrini in “Objective and subjective comparison of electrodynamic and MAP loudspeakers for Wave Field Synthesis”, 30th international conference of the Audio Engineering Society, 2007.
This spatial sampling process is mandatory for any sound field reproduction technique based on surface integrals, since no currently available transduction technology is capable of continuously controlling the radiation of an acoustical source (continuous loudspeaker distribution). The surface has to be spatially sampled, and this creates spatial aliasing artefacts that reduce the quality of the synthesized sound field. The spatial sampling process is a key cost factor for sound field reproduction systems since it determines the number of loudspeakers and channels to control independently using digital signal processing techniques.
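To give an order of magnitude for the link between loudspeaker spacing, aliasing frequency and loudspeaker count, the sketch below uses the widely quoted worst-case estimate f = c / (2 Δx) for a regularly spaced linear array. This estimate is an assumption brought in for illustration; it is not a formula given in this text, and it ignores the array length and the listening position. The function names and the speed of sound of 343 m/s are likewise assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def spatial_aliasing_frequency(spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Worst-case aliasing frequency estimate (Hz) for a regular spacing."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


def loudspeakers_needed(
    array_length_m: float, target_frequency_hz: float, c: float = SPEED_OF_SOUND
) -> int:
    """Loudspeaker count for a regular array that reaches the target frequency."""
    if array_length_m <= 0 or target_frequency_hz <= 0:
        raise ValueError("length and frequency must be positive")
    max_spacing = c / (2.0 * target_frequency_hz)
    # One loudspeaker at each end of the array plus the intermediate ones.
    return math.ceil(array_length_m / max_spacing) + 1
```

Under this estimate, halving the spacing doubles the aliasing frequency and roughly doubles the number of loudspeakers and channels, which is why the spatial sampling drives the cost of the system.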
A solution to increase the spatial aliasing frequency for Wave Field Synthesis has been proposed by Evert Start in “Direct Sound Enhancement by Wave Field Synthesis”, PhD thesis, Delft University of Technology, the Netherlands, 1997. It consists in synthesizing virtual sources having a directivity index which is an increasing function of frequency and which depends on the loudspeaker spacing. The proposed method also requires that the loudspeakers have the same radiation characteristics. This method, however, puts constraints on the manipulation of the radiation characteristics of the virtual sources and on the required radiation characteristics of the loudspeakers. The latter is the most problematic aspect since most existing loudspeakers do not have the required radiation pattern.
Another solution to increase the spatial aliasing frequency has been proposed by Etienne Corteel in “On the use of irregularly spaced loudspeaker arrays for Wave Field Synthesis, potential impact on spatial aliasing frequency”, DAFX06, 2006, available at—209.pdf. It consists in using irregularly spaced loudspeaker arrays to increase the spatial aliasing frequency for Wave Field Synthesis. It shows that, with a double logarithmically spaced array, the spatial aliasing frequency can be increased by 20% compared to a regularly spaced loudspeaker array having the same number of loudspeakers and the same length. However, the increase of the aliasing frequency is only effective for sources located outside of the listening area. For sources located within the listening area (alternatively called “focused sources”), this loudspeaker arrangement reduces the spatial aliasing frequency compared to the equivalent regularly spaced array.
Additional rendering inaccuracies are to be expected from the room acoustics of the listening environment as disclosed by E. Corteel and R. Nicol in “Listening room compensation for wave field synthesis. What can be done?”, Proceedings of the 23rd Convention of the Audio Engineering Society, Helsingor, Denmark, June 2003. The sound rendering system always interacts with the listening room, so that the listener does not perceive the target virtual sound field, but a mixture of the latter and the listening room effect. Local reflections and reverberation are added by the listening room to the sound field produced by the loudspeakers, so that the sound field perceived by the listener may differ from the expected result. The most obvious effect stems from the early reflections within the first 10-30 ms, which can produce sound coloration, distance perception distortion, and angular localization errors. For small listening rooms, room modes are also audible at low frequencies, reducing the clarity and producing sound coloration as disclosed by R. S. Pellegrini, “A Virtual Listening Room as an Application of Auditory Virtual Environments”, Ph. D. Thesis, Ruhr-Universität, Bochum, Germany, 2001.
One way to discard the listening room interaction is to consider either an anechoic listening environment or playback over headphones. But these solutions are not really convenient for most applications. A more general way to deal with this problem is the room compensation strategy, which aims at cancelling, or more realistically reducing, the influence of the listening room on the virtual sound field perceived by the listener. Room compensation aims at cancelling out the acoustics of the listening environment using multichannel inverse filtering techniques as disclosed by E. Corteel in “Caractérisation et extensions de la Wave Field Synthesis en conditions réelles”, Université Paris 6, PhD thesis, Paris, 2004. These techniques allow for the reduction of the level of some early reflections within a large listening area. However, they put heavy constraints on the required processing power and they suffer from significant practical and theoretical limitations that reduce their efficiency in realistic situations as disclosed by E. Corteel in “Caractérisation et extensions de la Wave Field Synthesis en conditions réelles”, Université Paris 6, PhD thesis, Paris, 2004.
A formula for the calculation of the spatial aliasing frequency has been proposed by Etienne Corteel in “On the use of irregularly spaced loudspeaker arrays for Wave Field Synthesis, potential impact on spatial aliasing frequency”, DAFX06, 2006, available at—209.pdf. In contrast to previously known formulae, the proposed formula accounts for finite length loudspeaker arrays and for the dependency on listening position. It is based on the arrival times of the loudspeakers' contributions at a given listening position for the synthesis of a virtual source using Wave Field Synthesis. In
Sound field reproduction techniques make no a priori assumption about the position of the listener, enabling the reproduction of the sound field within an extended area. For Wave Field Synthesis, this area may typically span the entire listening room. However, there may be positions in the room where the listeners will never be, because there is furniture or simply because their task or the situation does not require it. Therefore a preferred listening area could be defined in which listeners may preferably stand and where sound reproduction artefacts should be limited.
The aim of the invention is to increase the spatial aliasing frequency within a preferred restricted listening area where the listener may stand, for a given number and spatial arrangement of loudspeakers. It is another aim of the invention to limit the required number of loudspeakers for a given aliasing frequency and a given extension of the listening area, so as to produce a cost effective solution for sound field reproduction. It is also an aim of the present invention to limit the interaction of the reproduction system with the listening room so as to automatically reduce the influence of the listening room acoustics on the sound field perceived by the listeners.
The invention consists in a method and a device in which a ranking of the importance of each loudspeaker for synthesizing a target sound field associated with a virtual source within a restricted preferred listening area is defined. Based on this ranking, the loudspeakers' driving signals derived from a first input signal are modified so as to increase the spatial aliasing frequency by creating a “virtually shorter loudspeaker array” using only loudspeakers that contribute significantly to the synthesis of the target sound field within a restricted preferred listening area.
Instead of using a physically shorter array that would put restrictions on the positioning of the virtual source, the invention proposes to reduce the level of the driving signals of loudspeakers located outside of a source/listener visibility area.
In other words, there is presented a method and a device for sound field reproduction from a first audio input signal using a plurality of loudspeakers aiming at synthesizing a sound field within a preferred listening area in which none of the loudspeakers are located, said sound field being described as emanating from a virtual source. The method comprises steps of calculating positioning filter coefficients using virtual source description data and loudspeaker description data according to a sound field reproduction technique which is derived from a surface integral. The first audio input signal is filtered using the positioning filter coefficients to form second audio input signals. Next, loudspeaker ranking data representing the importance of each loudspeaker for the synthesis of the sound field within the preferred listening area are calculated. Then, the second audio input signals are modified according to the loudspeaker ranking data to form third audio input signals. Finally, the loudspeakers are fed with the third audio input signals and synthesize a sound field.
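The sequence of steps above (positioning filters, second signals, ranking data, third signals) can be sketched as follows. This is only an illustration under strong assumptions, not the ranking or filter definition of the invention: the positioning filters are reduced to one delay and one gain per loudspeaker for a point source behind the array, the preferred listening area is modelled as a circle that does not contain the source, and the ranking is 1 for loudspeakers inside the cone subtended by that circle as seen from the source, falling to 0 over a cosine taper of width `taper_rad`. All function and parameter names are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def positioning_filters(source, speakers):
    """Simplified positioning filters: one delay (s) and one gain per loudspeaker."""
    dist = np.linalg.norm(speakers - source, axis=1)
    return dist / C, 1.0 / np.sqrt(np.maximum(dist, 1e-3))


def loudspeaker_ranking(source, speakers, area_center, area_radius, taper_rad=0.2):
    """Ranking in [0, 1]: 1 inside the source/listener visibility cone, 0 far outside."""
    to_center = area_center - source
    d = np.linalg.norm(to_center)
    # Half opening angle of the cone from the source to the circular area.
    half_angle = np.arcsin(min(1.0, area_radius / d))
    to_spk = speakers - source
    cos_ang = (to_spk @ to_center) / (np.linalg.norm(to_spk, axis=1) * d)
    ang = np.arccos(np.clip(cos_ang, -1.0, 1.0))
    excess = np.clip((ang - half_angle) / taper_rad, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * excess))


def render(first_signal, source, speakers, area_center, area_radius, fs=48000):
    """Return (second signals, ranking data, third signals)."""
    delays, gains = positioning_filters(source, speakers)
    # Integer-sample delays relative to the earliest loudspeaker.
    shifts = np.round((delays - delays.min()) * fs).astype(int)
    n = len(first_signal) + int(shifts.max())
    second = np.zeros((len(speakers), n))
    for i, (shift, gain) in enumerate(zip(shifts, gains)):
        second[i, shift:shift + len(first_signal)] = gain * first_signal
    ranking = loudspeaker_ranking(source, speakers, area_center, area_radius)
    # The ranking scales each loudspeaker signal: a "virtually shorter" array.
    third = ranking[:, None] * second
    return second, ranking, third
```

In this sketch, loudspeakers far outside the cone receive a zero signal, so only the part of the array that lies between the virtual source and the preferred listening area remains active, while the taper avoids an abrupt truncation of the array.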
Furthermore the method may comprise steps wherein the loudspeaker ranking data are defined using the virtual source description data, loudspeaker description data and the listening area description data. And the method may also comprise steps
Moreover, the invention comprises a device for sound field reproduction from a first audio input signal using a plurality of loudspeakers aiming at synthesizing a sound field described as emanating from a virtual source within a preferred listening area in which none of the loudspeakers are located. Said device comprises a positioning filters computation device for calculating a plurality of positioning filters using virtual source description data and loudspeaker description data, and a sound field filtering device to compute second audio input signals from the first audio input signal using the positioning filters. Said device is characterized by a loudspeaker ranking computation device to compute loudspeaker ranking data representing the importance of each loudspeaker for the synthesis of the sound field within the preferred listening area, and a listening area adaptation computation device to modify the second audio input signals according to the loudspeaker ranking and form third audio input signals that feed the loudspeakers.
Furthermore said device may preferably comprise elements:
The invention will be described with more detail hereinafter with the aid of an example and with reference to the attached drawings, in which
In a first embodiment of the invention, the listening area is restricted to a limited area in which listeners are located (e.g. a sofa). In this embodiment, a limited number of loudspeakers can be positioned, for example, in the frontal area in coherence with a projected image. According to the invention, the number of loudspeakers can be reduced compared to the “full room” listening area while keeping the same quality (i.e. aliasing frequency). For example, in a Wave Field Synthesis reproduction system, this reduces the required hardware effort and cost. This embodiment is shown in
In a second embodiment of the invention, listeners may be located at a limited number of pre-defined listening positions (e.g. a sofa, a chair in front of a desk, etc.). According to the invention, the listeners may create presets so as to optimize the sound rendering quality for these pre-defined locations. The presets can then be recalled directly by the listeners or by detecting the presence of the listener in one of the pre-defined zones.
In a third embodiment of the invention, the position of the listeners may be tracked so as to continuously optimize the sound rendering quality within the effective covered listening area.
A fourth embodiment of the invention is a sound field simulation environment. In this embodiment, the listening area is restricted to a very limited zone around the head of the listener where a physically correct sound field reconstruction is targeted over all or most of the audible frequency range (typically 20-20000 Hz or 100-10000 Hz). The usual approach for a physically correct sound reproduction is to use binaural sound reproduction over headphones as described by Jens Blauert in “Spatial hearing: The psychophysics of human sound localization”, revised edition, The MIT press, Cambridge, Mass., 1997. In practice, this simulation approach with headphones using head-related transfer functions (HRTFs) shows several drawbacks. The localization is disturbed by front-back confusions, out-of-head localization is limited and distance perception does not necessarily match the intended real image. The feeling of wearing headphones reduces the feeling of being present in the virtual environment. In the past years, this method with headphones has been widely used since in theory it promises to reproduce physically correct ear input signals in order to create a spatial impression of sound. Practice has shown that the spatial impression provided by this method does not necessarily match the intended spatial sonic image and that strong differences in perception may occur from one listener to another due to mismatches between the HRTFs used in the signal processing and the individual HRTFs of the listener. Such results have been published e.g. by H. Møller, M. F. Sørensen, C. B. Jensen, D. Hammershøi in “Binaural technique: Do we need individual recordings?”, J. Audio Eng. Soc., Vol. 44, No. 6, pp. 451-469, June 1996 as well as by H. Møller, D. Hammershøi, C. B. Jensen, M. F. Sørensen in “Evaluation of artificial heads in listening tests”, J. Audio Eng. Soc., Vol. 47, No. 3, pp. 83-100, March 1999.
The listener's head movements should also be tracked in order to update the binaural sound reproduction, so that the listener does not have the impression that the entire sound scene follows her/him. However, the cost of commercially available head-tracking devices is usually high and the update of headphone signals may also introduce artefacts. In contrast to this, by creating a physically correct sound field around the head of the listener, there is no need either for individual head-related transfer function measurements or for complex compensation of head movements.
Using conventional sound field rendering techniques such as Wave Field Synthesis according to the state of the art, a loudspeaker spacing of about 2 cm would be required to reproduce a physically correct sound field within the required frequency range. This leads to an impractical loudspeaker setup with very small loudspeakers which may be inefficient at low frequencies (typically below 200-300 Hz). According to the invention, a loudspeaker spacing of 12.5 cm may be sufficient (see center positions in
Applications of the invention include, but are not limited to, the following domains: hi-fi sound reproduction, home theatre, interior noise simulation for a car, interior noise simulation for an aircraft, sound reproduction for Virtual Reality, and sound reproduction in the context of perceptual unimodal/crossmodal experiments. It should be clear to those skilled in the art that a plurality of virtual sources, corresponding to a plurality of first audio input signals, could be synthesized according to the invention.
Number | Date | Country | Kind |
07021162 | Oct 2007 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
PCT/EP2008/064500 | 10/27/2008 | WO | 00 | 7/26/2010 |
Publishing Document | Publishing Date | Country | Kind |
WO2009/056508 | 5/7/2009 | WO | A |
Number | Date | Country | |
20100296678 A1 | Nov 2010 | US |