The present invention generally relates to parametric spatial audio processing, and in particular to an apparatus and a method for generating a plurality of parametric audio streams and an apparatus and a method for generating a plurality of loudspeaker signals. Further embodiments of the present invention relate to sector-based parametric spatial audio processing.
In multichannel listening, the listener is surrounded with multiple loudspeakers. A variety of known methods exist to capture audio for such setups. Let us first consider loudspeaker systems and the spatial impression that can be created with them. Without special techniques, common two-channel stereophonic setups can only create auditory events on the line connecting the loudspeakers. Sound emanating from other directions cannot be produced.
Logically, by using more loudspeakers around the listener, more directions can be covered and a more natural spatial impression can be created. The best-known multichannel loudspeaker system and layout is the 5.1 standard (“ITU-R 775-1”), which consists of five loudspeakers at azimuthal angles of 0°, ±30° and ±110° with respect to the listening position. Other systems with a varying number of loudspeakers located at different directions are also known.
In the art, several different recording methods have been designed for the previously mentioned loudspeaker systems, in order to reproduce the spatial impression in the listening situation as it would be perceived in the recording environment. The ideal way to record spatial sound for a chosen multichannel loudspeaker system would be to use the same number of microphones as there are loudspeakers. In such a case, the directivity patterns of the microphones should also correspond to the loudspeaker layout such that sound from any single direction would only be recorded with one, two, or three microphones. The more loudspeakers are used, the narrower the directivity patterns need to be. However, such narrow directional microphones are relatively expensive and typically have a non-flat frequency response, which is not desired. Furthermore, using several microphones with too broad directivity patterns as input to multichannel reproduction results in a colored and blurred auditory perception, because sound emanating from a single direction is usually reproduced with more loudspeakers than is useful. Hence, current microphones are best suited for two-channel recording and reproduction without the goal of a surrounding spatial impression.
Another known approach to spatial sound recording is to record with a large number of microphones which are distributed over a wide spatial area. For example, when recording an orchestra on a stage, the single instruments can be picked up by so-called spot microphones, which are positioned close to the sound sources. The spatial distribution of the frontal sound stage can, for example, be captured by conventional stereo microphones. The sound field components corresponding to the late reverberation can be captured by several microphones placed at a relatively far distance from the stage. A sound engineer can then mix the desired multichannel output by using a combination of all microphone channels available. However, this recording technique implies a very large recording setup and hand-crafted mixing of the recorded channels, which is not always feasible in practice.
Conventional systems for the recording and reproduction of spatial audio based on directional audio coding (DirAC), as described in T. Lokki, J. Merimaa, V. Pulkki: Method for Reproducing Natural or Modified Spatial Impression in Multichannel Listening, U.S. Pat. No. 7,787,638 B2, Aug. 31, 2010 and V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007, rely on a simple global model for the sound field. Therefore, they suffer from some systematic drawbacks, which limit the achievable sound quality and experience in practice.
A general problem of known solutions is that they are relatively complex and typically associated with a degradation of the spatial sound quality.
According to an embodiment, an apparatus for generating a plurality of parametric audio streams from an input spatial audio signal acquired from a recording in a recording space may have: a segmentor for generating at least two input segmental audio signals from the input spatial audio signal; wherein the segmentor is configured to generate the at least two input segmental audio signals depending on corresponding segments of the recording space, wherein the segments of the recording space each represent a subset of directions within a two-dimensional plane or within a three-dimensional space, and wherein the segments are different from each other; and a generator for generating a parametric audio stream for each of the at least two input segmental audio signals to acquire the plurality of parametric audio streams, so that the plurality of parametric audio streams each include a component of the at least two input segmental audio signals and a corresponding parametric spatial information, wherein the parametric spatial information of each of the parametric audio streams includes a direction-of-arrival parameter and/or a diffuseness parameter.
According to another embodiment, an apparatus for generating a plurality of loudspeaker signals from a plurality of parametric audio streams; wherein each of the plurality of parametric audio streams includes a segmental audio component and a corresponding parametric spatial information; wherein the parametric spatial information of each of the parametric audio streams includes a direction-of-arrival parameter and/or a diffuseness parameter; may have: a renderer for providing a plurality of input segmental loudspeaker signals from the plurality of parametric audio streams, so that the input segmental loudspeaker signals depend on corresponding segments of a recording space, wherein the segments of the recording space each represent a subset of directions within a two-dimensional plane or within a three-dimensional space, and wherein the segments are different from each other; wherein the renderer is configured for rendering each of the segmental audio components using the corresponding parametric spatial information to acquire the plurality of input segmental loudspeaker signals; and a combiner for combining the input segmental loudspeaker signals to acquire the plurality of loudspeaker signals.
According to another embodiment, a method for generating a plurality of parametric audio streams from an input spatial audio signal acquired from a recording in a recording space may have the steps of: generating at least two input segmental audio signals from the input spatial audio signal; wherein generating the at least two input segmental audio signals is conducted depending on corresponding segments of the recording space, wherein the segments of the recording space each represent a subset of directions within a two-dimensional plane or within a three-dimensional space, and wherein the segments are different from each other; generating a parametric audio stream for each of the at least two input segmental audio signals to acquire the plurality of parametric audio streams, so that the plurality of parametric audio streams each include a component of the at least two input segmental audio signals and a corresponding parametric spatial information, wherein the parametric spatial information of each of the parametric audio streams includes a direction-of-arrival parameter and/or a diffuseness parameter.
According to another embodiment, a method for generating a plurality of loudspeaker signals from a plurality of parametric audio streams; wherein each of the plurality of parametric audio streams includes a segmental audio component and a corresponding parametric spatial information; wherein the parametric spatial information of each of the parametric audio streams includes a direction-of-arrival parameter and/or a diffuseness parameter; may have the steps of: providing a plurality of input segmental loudspeaker signals from the plurality of parametric audio streams, so that the input segmental loudspeaker signals depend on corresponding segments of a recording space, wherein the segments of the recording space each represent a subset of directions within a two-dimensional plane or within a three-dimensional space, and wherein the segments are different from each other; wherein providing the plurality of input segmental loudspeaker signals is conducted by rendering each of the segmental audio components using the corresponding parametric spatial information to acquire the plurality of input segmental loudspeaker signals; and combining the input segmental loudspeaker signals to acquire the plurality of loudspeaker signals.
According to another embodiment, a computer program including a program code for performing the method according to claim 11 when the computer program is executed on a computer.
According to another embodiment, a computer program including a program code for performing the method according to claim 12 when the computer program is executed on a computer.
The basic idea underlying the present invention is that improved parametric spatial audio processing can be achieved if at least two input segmental audio signals are provided from the input spatial audio signal, wherein the at least two input segmental audio signals are associated with corresponding segments of the recording space, and if a parametric audio stream is generated for each of the at least two input segmental audio signals to obtain the plurality of parametric audio streams. This makes it possible to achieve higher-quality, more realistic spatial sound recording and reproduction using relatively simple and compact microphone configurations.
According to a further embodiment, the segmentor is configured to use a directivity pattern for each of the segments of the recording space. Here, the directivity pattern indicates a directivity of the at least two input segmental audio signals. By the use of the directivity patterns, it is possible to obtain a better model match of the observed sound field, especially in complex sound scenes.
According to a further embodiment, the generator is configured for obtaining the plurality of parametric audio streams, wherein the plurality of parametric audio streams each comprise a component of the at least two input segmental audio signals and a corresponding parametric spatial information. For example, the parametric spatial information of each of the parametric audio streams comprises a direction-of-arrival (DOA) parameter and/or a diffuseness parameter. By providing the DOA parameters and/or the diffuseness parameters, it is possible to describe the observed sound field in a parametric signal representation domain.
According to a further embodiment, an apparatus for generating a plurality of loudspeaker signals from a plurality of parametric audio streams derived from an input spatial audio signal recorded in a recording space comprises a renderer and a combiner. The renderer is configured for providing a plurality of input segmental loudspeaker signals from the plurality of parametric audio streams. Here, the input segmental loudspeaker signals are associated with corresponding segments of the recording space. The combiner is configured for combining the input segmental loudspeaker signals to obtain the plurality of loudspeaker signals.
Further embodiments of the present invention provide methods for generating a plurality of parametric audio streams and for generating a plurality of loudspeaker signals.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Before discussing the present invention in further detail using the drawings, it is pointed out that in the figures identical elements, elements having the same function or the same effect are provided with the same reference numerals so that the description of these elements and the functionality thereof illustrated in the different embodiments is mutually exchangeable or may be applied to one another in the different embodiments.
By the apparatus 100 for generating the plurality of parametric audio streams 125, it is possible to avoid a degradation of the spatial sound quality and to avoid relatively complex microphone configurations. Accordingly, the embodiment of the apparatus 100 in accordance with
In embodiments, the segments Segi of the recording space each represent a subset of directions within a two-dimensional (2D) plane or within a three-dimensional (3D) space.
In embodiments, the segments Segi of the recording space each are characterized by an associated directional measure.
According to embodiments, the apparatus 100 is configured for performing a sound field recording to obtain the input spatial audio signal 105. For example, the segmentor 110 is configured to divide a full angle range of interest into the segments Segi of the recording space. Furthermore, the segments Segi of the recording space may each cover a reduced angle range compared to the full angle range of interest.
In embodiments, the directivity pattern 305, qi(θ), is given by
$$q_i(\theta) = a + b\cos(\theta + \Theta_i) \qquad (1)$$
where a and b denote multipliers that can be modified to obtain desired directivity patterns and wherein θ denotes an azimuthal angle and Θi indicates an advantageous direction of the i'th segment of the recording space. For example, a lies in a range of 0 to 1 and b in a range of −1 to 1.
One useful choice of multipliers a, b may be a=0.5 and b=0.5, resulting in the following directivity pattern:
$$q_i(\theta) = 0.5 + 0.5\cos(\theta + \Theta_i) \qquad (1a)$$
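As a minimal illustrative sketch (not part of the described embodiments), the directivity pattern of Eq. (1) can be evaluated numerically as follows. The function name `directivity` and its argument names are hypothetical. Note that, with the sign convention of Eq. (1), the pattern attains its maximum at θ = −Θi.

```python
import math


def directivity(theta, center, a=0.5, b=0.5):
    """Evaluate q_i(theta) = a + b*cos(theta + center) as in Eq. (1).

    theta  : azimuth angle in radians
    center : segment direction Theta_i in radians
    a, b   : pattern multipliers (defaults give the pattern of Eq. (1a))
    """
    return a + b * math.cos(theta + center)


# Sample the pattern of Eq. (1a) for a segment with Theta_i = 0.
samples = {deg: directivity(math.radians(deg), 0.0) for deg in (0, 90, 180)}
```

With a = b = 0.5 and Θi = 0 the pattern is a cardioid: full sensitivity toward 0°, half sensitivity at 90°, and a null at 180°.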
By the segmentor 110 exemplarily depicted in
In embodiments, the generator 120 may be configured for performing a parametric spatial analysis for each of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) to obtain the corresponding parametric spatial information θi, Ψi.
In embodiments, the parametric spatial information θi, Ψi of each of the parametric audio streams 125 (θi, Ψi, Wi) comprises a direction-of-arrival (DOA) parameter θi and/or a diffuseness parameter Ψi.
In embodiments, the direction-of-arrival (DOA) parameter θi and the diffuseness parameter Ψi provided by the generator 120 exemplarily depicted in
By providing the apparatus 500 of
In embodiments, the renderer 510 is configured for receiving the plurality of parametric audio streams 125 (θi, Ψi, Wi). For example, the plurality of parametric audio streams 125 (θi, Ψi, Wi) each comprise a segmental audio component Wi and a corresponding parametric spatial information θi, Ψi. Furthermore, the renderer 510 may be configured for rendering each of the segmental audio components Wi using the corresponding parametric spatial information 505 (θi, Ψi) to obtain the plurality of input segmental loudspeaker signals 515.
In embodiments, the segmentor 110 exemplarily shown in
The embodiment of
The example loudspeaker signal computation schematically illustrated in
As also shown in the schematic illustration 800 of
In embodiments, the vector base amplitude panning (VBAP) operation by blocks 822, 824 of the first and the second rendering unit 730-1, 730-2 depends on the corresponding direction-of-arrival (DOA) parameters θi. As exemplarily depicted in
In the schematic illustration 900 of
For example, the apparatus 100 may further comprise a modifier 910 for modifying the plurality of parametric audio streams 125 (θi, Ψi, Wi) in a parametric signal representation domain. Furthermore, the modifier 910 may be configured to modify at least one of the parametric audio streams 125 (θi, Ψi, Wi) using a corresponding modification control parameter 905. In this way, a first modified parametric audio stream 916 of a first segment and a second modified parametric audio stream 918 of a second segment may be obtained. The first and the second modified parametric audio streams 916, 918 may constitute a plurality of modified parametric audio streams 915. In embodiments, the apparatus 100 may be configured for transmitting the plurality of modified parametric audio streams 915. In addition, the apparatus 500 may be configured for receiving the plurality of modified parametric audio streams 915 transmitted from the apparatus 100.
By providing the example loudspeaker signal computation according to
In this context, it should be noted that
In the previous embodiments, the apparatus 100 and the apparatus 500 may be configured to be operative in the time-frequency domain.
In summary, embodiments of the present invention relate to the field of high quality spatial audio recording and reproduction. The use of a segment-based or sector-based parametric model of the sound field makes it possible to record even complex spatial audio scenes with relatively compact microphone configurations. In contrast to the simple global model of the sound field assumed by current state-of-the-art methods, the parametric information can be determined for a number of segments into which the entire observation space is divided. Therefore, the rendering for an almost arbitrary loudspeaker configuration can be performed based on the parametric information together with the recorded audio channels.
According to embodiments, for a planar two-dimensional (2D) sound field recording, the entire azimuthal angle range of interest can be divided into multiple sectors or segments covering a reduced range of azimuthal angles. Analogously, in the 3D case the full solid angle range (azimuthal and elevation) can be divided into sectors or segments covering a smaller angle range. The different sectors or segments may also partially overlap.
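The division of the azimuthal range into sectors can be sketched as follows, assuming (purely for illustration) equally wide sectors and a hard nearest-center assignment. The text also permits partially overlapping sectors, which this sketch does not model; the function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def sector_centers(num_sectors, start=0.0):
    """Center azimuths (radians, in [0, 2*pi)) of equally wide sectors."""
    width = TWO_PI / num_sectors
    return [(start + i * width) % TWO_PI for i in range(num_sectors)]


def angular_distance(a, b):
    """Smallest absolute angle between two azimuths in radians."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)


def sector_index(theta, centers):
    """Index of the sector whose center is closest to azimuth theta."""
    return min(range(len(centers)),
               key=lambda i: angular_distance(theta, centers[i]))


# Four sectors centered at 0, 90, 180 and 270 degrees.
centers = sector_centers(4)
```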
According to embodiments, each sector or segment is characterized by an associated directional measure, which can be used to specify or refer to the corresponding sector or segment. The directional measure can, for example, be a vector pointing to (or from) the center of the sector or segment, or an azimuthal angle in the 2D case, or a set of an azimuth and an elevation angle in the 3D case. The segment or sector can thus be regarded as a subset of directions within a 2D plane or within a 3D space. For presentational simplicity, the previous examples were described for the 2D case; however, the extension to 3D configurations is straightforward.
With reference to
Referring to the embodiment of
According to embodiments, for each sector, a DOA parameter (θi) can be determined together with a sector-based diffuseness parameter (Ψi). In a simple realization, the diffuseness parameter (Ψi) may be the same for all sectors. In principle, any advantageous DOA estimation algorithm can be applied (e.g. by the generator 120). For example, the DOA parameter (θi) can be interpreted as the direction opposite to that in which most of the sound energy is traveling within the considered sector. Accordingly, the sector-based diffuseness relates to the ratio of the diffuse sound energy to the total sound energy within the considered sector. It is to be noted that the parameter estimation (such as performed with the generator 120) can be performed time-variantly and individually for each frequency band.
According to embodiments, for each sector, a directional audio stream (parametric audio stream) can be composed including the segmental microphone signal (Wi) and the sector-based DOA and diffuseness parameters (θi, Ψi) which predominantly describe the spatial audio properties of the sound field within the angular range represented by that sector. For example, the loudspeaker signals 525 for playback can be determined using the parametric directional information (θi, Ψi) and one or more of the segmental microphone signals 125 (e.g. Wi). Thereby, a set of segmental loudspeaker signals 515 can be determined for each segment which can then be combined such as by the combiner 520 (e.g. summed up or mixed) to build the final loudspeaker signals 525 for playback. The direct sound components within a sector can, for example, be rendered as point-like sources by applying an example vector base amplitude panning (as described in V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning J. Audio Eng. Soc., Vol. 45, pp. 456-466, 1997), whereas the diffuse sound can be played back from several loudspeakers at the same time.
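The rendering and combining described above can be sketched for a single time-frequency tile as follows. This is a simplified illustration under stated assumptions, not the described implementation: the direct part is panned with a pairwise 2D amplitude panning in the spirit of VBAP, the diffuse part is distributed equally over all loudspeakers without the decorrelation that a practical system would typically add, and the weights sqrt(1 − Ψ) and sqrt(Ψ/N) are assumed. All function names are hypothetical.

```python
import math


def vbap_2d(doa, speaker_az):
    """Pairwise 2D amplitude panning gains for direction doa (radians)."""
    n = len(speaker_az)
    order = sorted(range(n), key=lambda i: speaker_az[i] % (2.0 * math.pi))
    px, py = math.cos(doa), math.sin(doa)
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]  # adjacent loudspeaker pair
        x1, y1 = math.cos(speaker_az[i]), math.sin(speaker_az[i])
        x2, y2 = math.cos(speaker_az[j]), math.sin(speaker_az[j])
        det = x1 * y2 - x2 * y1
        if abs(det) < 1e-12:
            continue  # collinear pair cannot span the plane
        # Solve p = g1 * l1 + g2 * l2 for the two gains.
        g1 = (px * y2 - py * x2) / det
        g2 = (x1 * py - y1 * px) / det
        if g1 >= -1e-9 and g2 >= -1e-9:
            norm = math.hypot(g1, g2)  # normalize to unit energy
            gains = [0.0] * n
            gains[i] = max(g1, 0.0) / norm
            gains[j] = max(g2, 0.0) / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the requested direction")


def render_segment(w, doa, psi, speaker_az):
    """Segmental loudspeaker signals for one tile of one segment."""
    psi = min(max(psi, 0.0), 1.0)
    n = len(speaker_az)
    direct_weight = math.sqrt(1.0 - psi)
    diffuse = math.sqrt(psi / n) * w  # same diffuse share on every channel
    return [g * direct_weight * w + diffuse for g in vbap_2d(doa, speaker_az)]


def combine(segment_outputs):
    """Sum the segmental loudspeaker signals channel by channel."""
    return [sum(channel) for channel in zip(*segment_outputs)]
```

A usage example would call `render_segment` once per segment with that segment's Wi, θi and Ψi, and then pass the list of results to `combine` to obtain the final loudspeaker signals.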
The block diagram in
In embodiments, the segmentor 110 may be configured for performing the generation of the segmental microphone signals 115 from a set of microphone input signals 105. Furthermore, the generator 120 may be configured for performing the application of the parametric spatial signal analysis for each sector such that the parametric audio streams 725-1, 725-2 for each sector will be obtained. For example, each of the parametric audio streams 725-1, 725-2 may consist of at least one segmental audio signal (e.g. W1, W2, respectively) as well as associated parametric information (e.g. DOA parameters θ1, θ2 and diffuseness parameters Ψ1, Ψ2, respectively). The renderer 510 may be configured for performing the generation of the segmental loudspeaker signals 515 for each sector based on the parametric audio streams 725-1, 725-2 generated for the particular sectors. The combiner 520 may be configured for performing the combining of the segmental loudspeaker signals 515 to obtain the final loudspeaker signals 525.
The block diagram in
In
As an example last step, the segmental loudspeaker signals 515 can be combined (e.g. by block 520) to obtain the final output signals 525 for loudspeaker reproduction.
Referring to the embodiment of
An embodiment of a sector-based parameter estimation in the example 2D case performed with the previous embodiments will be described in the following. It is assumed that the microphone signals used for capturing can be converted into so-called second-order B-format signals. Second-order B-format signals can be described by the shape of the directivity patterns of the corresponding microphones:
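$$b_W(\theta) = 1 \qquad (2)$$
$$b_X(\theta) = \cos(\theta) \qquad (3)$$
$$b_Y(\theta) = \sin(\theta) \qquad (4)$$
$$b_U(\theta) = \cos(2\theta) \qquad (5)$$
$$b_V(\theta) = \sin(2\theta) \qquad (6)$$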
where θ denotes the azimuth angle. The corresponding B-format signals (e.g. input 105 of
$$b_{W_i}(\theta) = q_i(\theta) \qquad (7)$$
$$b_{X_i}(\theta) = q_i(\theta)\cos(\theta) \qquad (8)$$
$$b_{Y_i}(\theta) = q_i(\theta)\sin(\theta) \qquad (9)$$
Some examples for the directivity patterns of the described microphone signals in case of an example cardioid pattern qi(θ)=0.5+0.5 cos(θ+Θi) are shown in
Note that for the example case of Θi=0, the signals Wi(m, k), Xi(m, k), Yi(m, k) can be determined from the second-order B-format signals by mixing the input components W, X, Y, U, V according to
$$W_i(m,k) = 0.5\,W(m,k) + 0.5\,X(m,k) \qquad (10)$$
$$X_i(m,k) = 0.25\,W(m,k) + 0.5\,X(m,k) + 0.25\,U(m,k) \qquad (11)$$
$$Y_i(m,k) = 0.5\,Y(m,k) + 0.25\,V(m,k) \qquad (12)$$
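The mixing of Eqs. (10) to (12) can be sketched in code as follows. The generalization to arbitrary Θi, a and b is not stated in the text; it is an assumption obtained here by expanding q_i(θ), q_i(θ)cos(θ) and q_i(θ)sin(θ) with product-to-sum identities, and it reduces to Eqs. (10) to (12) for Θi = 0 and a = b = 0.5. The function name `segment_signals` is hypothetical.

```python
import math


def segment_signals(W, X, Y, U, V, center=0.0, a=0.5, b=0.5):
    """Mix second-order B-format samples into segmental signals Wi, Xi, Yi.

    Implements the patterns q_i, q_i*cos(theta), q_i*sin(theta) with
    q_i(theta) = a + b*cos(theta + center); inputs may be real or complex
    samples of one time-frequency tile.
    """
    c, s = math.cos(center), math.sin(center)
    Wi = a * W + b * (c * X - s * Y)
    Xi = a * X + 0.5 * b * (c * W + c * U - s * V)
    Yi = a * Y + 0.5 * b * (c * V + s * U - s * W)
    return Wi, Xi, Yi


# Center direction 0 reproduces Eqs. (10)-(12) exactly.
Wi, Xi, Yi = segment_signals(1.0, 2.0, 3.0, 4.0, 5.0)
```

For the sample values above, Eqs. (10) to (12) give Wi = 1.5, Xi = 2.25 and Yi = 2.75, which is what the function returns.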
This mixing operation is performed e.g. in
From the segmental microphone signals 115, Wi(m, k), Xi(m, k), Yi(m, k), we can then determine (e.g. by block 120) the DOA parameter θi associated with the i'th sector by computing the sector-based active intensity vector
where Re {A} denotes the real part of the complex number A and * denotes complex conjugate. Furthermore, ρ0 is the air density and c is the sound velocity. The desired DOA estimate θi(m, k), for example represented by the unit vector ei(m, k), can be obtained by
We can further determine the sector-based, sound field energy related quantity
The desired diffuseness parameter Ψi(m, k) of the i'th sector can then be determined by
where g denotes a suitable scaling factor, E{ } is the expectation operator and ∥ ∥ denotes the vector norm. It can be shown that the diffuseness parameter Ψi(m, k) is zero if only a plane wave is present and takes a positive value smaller than or equal to one in the case of purely diffuse sound fields. In general, an alternative mapping function can be defined for the diffuseness which exhibits a similar behavior, i.e. giving 0 for direct sound only, and approaching 1 for a completely diffuse sound field.
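The estimation steps described above can be sketched as follows. Several points are assumptions made only for illustration: the physical constants ρ0 and c are dropped, the energy-related quantity is taken as |Wi|² + |Xi|² + |Yi|², the diffuseness is mapped as 1 − g‖E{I}‖/E{E} with g = 2 (which yields zero for a single plane wave under this normalization), the expectation is approximated by an arithmetic mean over the supplied samples, and the DOA is taken from the averaged vector. Because the signals are defined here through their directivity patterns, the vector Re{Wi* [Xi, Yi]} points toward the arrival direction and no sign flip is applied; a physical intensity vector would use the opposite sign. The function name is hypothetical.

```python
import math


def estimate_parameters(Wi, Xi, Yi, g=2.0):
    """Estimate (DOA in radians, diffuseness) from segmental signal samples.

    Wi, Xi, Yi : equally long sequences of real or complex samples of one
                 frequency band, over which the expectation is averaged.
    """
    n = len(Wi)
    # Intensity-like vector Re{Wi^* [Xi, Yi]} averaged over the samples.
    mean_ix = sum((w.conjugate() * x).real for w, x in zip(Wi, Xi)) / n
    mean_iy = sum((w.conjugate() * y).real for w, y in zip(Wi, Yi)) / n
    # Energy-related quantity averaged over the samples.
    mean_e = sum(abs(w) ** 2 + abs(x) ** 2 + abs(y) ** 2
                 for w, x, y in zip(Wi, Xi, Yi)) / n
    doa = math.atan2(mean_iy, mean_ix)
    if mean_e <= 0.0:
        return doa, 1.0  # no energy: treat as fully diffuse (assumption)
    psi = 1.0 - g * math.hypot(mean_ix, mean_iy) / mean_e
    return doa, min(max(psi, 0.0), 1.0)
```

For a single plane wave the averaged vector has the same norm as every instantaneous vector, so the estimated diffuseness is zero; when the instantaneous vectors point in fluctuating directions they partially cancel in the average and the diffuseness approaches one.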
Referring to the embodiment of
Let γm denote the generalized m-th order microphone signal, defined by the directivity patterns
$$\gamma_m^{(\cos)}:\ \text{pattern } \cos(m\theta)$$
$$\gamma_m^{(\sin)}:\ \text{pattern } \sin(m\theta) \qquad (17)$$
where θ denotes an azimuth angle so that
$$X = \gamma_1^{(\cos)}, \quad Y = \gamma_1^{(\sin)}, \quad U = \gamma_2^{(\cos)}, \quad V = \gamma_2^{(\sin)} \qquad (18)$$
Then, it can be proven that
where j is the imaginary unit, k is the wave number, r and φ are the radius and the azimuth angle defining a polar coordinate system, Jm(·) is the m-order Bessel function of the first kind, and m are the coefficients of the Fourier series of the pressure signal measured on the polar coordinates (r, φ).
Note that care has to be taken in the array design and implementation of the calculation of the (higher order) B-format signals to avoid excessive noise amplification due to the numerical properties of the Bessel function.
Mathematical background and derivations related to the described signal transformation can be found, e.g. in A. Kuntz, Wave field analysis using virtual circular microphone arrays, Dr. Hut, 2009, ISBN: 978-3-86853-006-3.
Further embodiments of the present invention relate to a method for generating a plurality of parametric audio streams 125 (θi, Ψi, Wi) from an input spatial audio signal 105 obtained from a recording in a recording space. For example, the input spatial audio signal 105 comprises an omnidirectional signal W and a plurality of different directional signals X, Y, Z, U, V. The method comprises providing at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) from the input spatial audio signal 105 (e.g. the omnidirectional signal W and the plurality of different directional signals X, Y, Z, U, V), wherein the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) are associated with corresponding segments Segi of the recording space. Furthermore, the method comprises generating a parametric audio stream for each of the at least two input segmental audio signals 115 (Wi, Xi, Yi, Zi) to obtain the plurality of parametric audio streams 125 (θi, Ψi, Wi).
Further embodiments of the present invention relate to a method for generating a plurality of loudspeaker signals 525 (L1, L2, . . . ) from a plurality of parametric audio streams 125 (θi, Ψi, Wi) derived from an input spatial audio signal 105 recorded in a recording space. The method comprises providing a plurality of input segmental loudspeaker signals 515 from the plurality of parametric audio streams 125 (θi, Ψi, Wi), wherein the input segmental loudspeaker signals 515 are associated with corresponding segments Segi of the recording space. Furthermore, the method comprises combining the input segmental loudspeaker signals 515 to obtain the plurality of loudspeaker signals 525 (L1, L2, . . . ).
Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
The described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus like, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The parametric audio streams 125 (θi, Ψi, Wi) can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may operate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
Embodiments of the present invention provide a high quality, realistic spatial sound recording and reproduction using simple and compact microphone configurations.
Embodiments of the present invention are based on directional audio coding (DirAC) (as described in T. Lokki, J. Merimaa, V. Pulkki: Method for Reproducing Natural or Modified Spatial Impression in Multichannel Listening, U.S. Pat. No. 7,787,638 B2, Aug. 31, 2010 and V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007), which can be used with different microphone systems, and with arbitrary loudspeaker setups. The benefit of DirAC is to reproduce the spatial impression of an existing acoustical environment as precisely as possible using a multichannel loudspeaker system. Within the chosen environment, responses (continuous sound or impulse responses) can be measured with an omnidirectional microphone (W) and with a set of microphones that enables measuring the direction-of-arrival (DOA) of sound and the diffuseness of sound. A possible method is to apply three figure-of-eight microphones (X, Y, Z) aligned with the corresponding Cartesian coordinate axes. A way to do this is to use a “SoundField” microphone, which directly yields all the desired responses. It is interesting to note that the signal of the omnidirectional microphone represents the sound pressure, whereas the dipole signals are proportional to the corresponding elements of the particle velocity vector.
From these signals, the DirAC parameters, i.e. DOA of sound and the diffuseness of the observed sound field, can be measured in a suitable time/frequency raster with a resolution corresponding to that of the human auditory system. The actual loudspeaker signals can then be determined from the omnidirectional microphone signal based on the DirAC parameters (as described in V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007). Direct sound components can be played back by only a small number of loudspeakers (e.g. one or two) using panning techniques, whereas diffuse sound components can be played back from all loudspeakers at the same time.
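By way of a non-limiting illustration, the parameter estimation outlined above may be sketched as follows for a single frequency band in the 2D case. The sketch rests on assumptions that are not taken from the description: the dipole signals are normalized such that a plane wave from azimuth φ yields X = W·cos φ and Y = W·sin φ, the temporal averaging is a plain mean over the supplied frames, and the function names dirac_parameters and split_direct_diffuse are hypothetical. The square-root weighting of the direct and diffuse parts follows the energy split commonly used in DirAC synthesis and is shown only as one possible choice.

```python
import cmath
import math


def dirac_parameters(w, x, y):
    """Estimate azimuth DOA (radians) and diffuseness in one frequency band.

    w, x, y are equal-length sequences of complex short-time spectrum
    coefficients (omnidirectional signal and two dipole signals) taken
    over the frames used for temporal averaging.
    """
    n = len(w)
    # Time-averaged intensity-like vector Re{conj(W) * [X, Y]}. Under the
    # assumed normalization it points toward the sound source.
    ix = sum((wk.conjugate() * xk).real for wk, xk in zip(w, x)) / n
    iy = sum((wk.conjugate() * yk).real for wk, yk in zip(w, y)) / n
    # Time-averaged energy-like quantity under the same normalization.
    energy = sum(
        0.5 * (abs(wk) ** 2 + abs(xk) ** 2 + abs(yk) ** 2)
        for wk, xk, yk in zip(w, x, y)
    ) / n
    if energy == 0.0:
        return 0.0, 1.0  # silent band: direction undefined, treated as diffuse
    doa = math.atan2(iy, ix)
    diffuseness = 1.0 - math.hypot(ix, iy) / energy
    return doa, min(1.0, max(0.0, diffuseness))


def split_direct_diffuse(w_coeff, diffuseness):
    """Split one omnidirectional coefficient into direct and diffuse parts."""
    return (
        math.sqrt(1.0 - diffuseness) * w_coeff,
        math.sqrt(diffuseness) * w_coeff,
    )


# Example: a single plane wave arriving from 60 degrees azimuth.
phi = math.radians(60.0)
w = [cmath.rect(1.0, 0.3 * k) for k in range(8)]
x = [wk * math.cos(phi) for wk in w]
y = [wk * math.sin(phi) for wk in w]
doa, psi = dirac_parameters(w, x, y)
direct, diffuse = split_direct_diffuse(w[0], psi)
```

For this single plane wave, the estimated DOA coincides with the true azimuth and the diffuseness is close to zero, so that practically the entire omnidirectional signal is assigned to the direct part, which would then be panned to one or two loudspeakers.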
Embodiments of the present invention based on DirAC represent a simple approach to spatial sound recording with compact microphone configurations. In particular, the present invention avoids some systematic drawbacks which limit the achievable sound quality and experience in practice with conventional technology.
In contrast to conventional DirAC, embodiments of the present invention provide a higher quality parametric spatial audio processing. Conventional DirAC relies on a simple global model for the sound field, employing only one DOA and one diffuseness parameter for the entire observation space. It is based on the assumption that the sound field can be represented by only one single direct sound component, such as a plane wave, and one global diffuseness parameter for each time/frequency tile. It turns out in practice, however, that this simplified assumption about the sound field often does not hold. This is especially true in complex, real world acoustics, e.g. where multiple sound sources such as talkers or instruments are active at the same time. Embodiments of the present invention, on the other hand, avoid such a model mismatch of the observed sound field, so that the corresponding parameter estimates are more accurate. In particular, cases can be avoided in which direct sound components are rendered diffusely and no direction can be perceived when listening to the loudspeaker outputs. In embodiments, decorrelators can be used for generating uncorrelated diffuse sound played back from all loudspeakers (as described in V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding. J. Audio Eng. Soc., Vol. 55, No. 6, pp. 503-516, 2007). In contrast to conventional technology, where decorrelators often introduce an undesired added room effect, it is possible with the present invention to more correctly reproduce sound sources which have a certain spatial extent (as opposed to the case of using the simple sound field model of DirAC which is not capable of precisely capturing such sound sources).
Embodiments of the present invention provide a higher number of degrees of freedom in the assumed signal model, allowing for a better model match in complex sound scenes.
Furthermore, in case of using directional microphones to generate sectors (or any other time-invariant linear, e.g. physical, means), an increased inherent directivity of microphones can be obtained. Therefore, there is less need for applying time-variant gains to avoid vague directions, crosstalk, and coloration. This leads to less nonlinear processing in the audio signal path, resulting in higher quality.
In general, more direct sound components can be rendered as direct sound sources (point sources/plane wave sources). As a consequence, fewer decorrelation artifacts occur, more (correctly) localizable events are perceivable, and a more exact spatial reproduction is achievable.
Embodiments of the present invention provide an increased performance of a manipulation in the parametric domain, e.g. directional filtering (as described in M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Kuech, D. Mahne, R. Schultz-Amling, and O. Thiergart: A Spatial Filtering Approach for Directional Audio Coding, 126th AES Convention, Paper 7653, Munich, Germany, 2009), compared to the simple global model, since a larger fraction of the total signal energy is attributed to direct sound events with a correct DOA associated with it, and a larger amount of information is available. The provision of more (parametric) information makes it possible, for example, to separate multiple direct sound components, or to separate direct sound components from early reflections impinging from different directions.
Specifically, embodiments provide the following features. In the 2D case, the full azimuthal angle range can be split into sectors covering reduced azimuthal angle ranges. In the 3D case, the full solid angle range can be split into sectors covering reduced solid angle ranges. Each sector can be associated with an advantageous angle range. For each sector, segmental microphone signals can be determined from the received microphone signals, which predominantly consist of sound arriving from directions that are assigned to/covered by the particular sector. These microphone signals may also be determined artificially by simulated virtual recordings. For each sector, a parametric sound field analysis can be performed to determine directional parameters such as DOA and diffuseness. For each sector, the parametric directional information (DOA and diffuseness) predominantly describes the spatial properties of the angular range of the sound field that is associated with the particular sector. In case of playback, for each sector, loudspeaker signals can be determined based on the directional parameters and the segmental microphone signals. The overall output is then obtained by combining the outputs of all sectors. In case of manipulation, before computing the loudspeaker signals for playback, the estimated parameters and/or segmental audio signals may also be modified to achieve a manipulation of the sound scene.
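The difference between a single global analysis and a sector-based analysis may be illustrated, purely by way of example, with a simulated virtual recording of two simultaneously active plane waves in the 2D case. The following sketch is not the claimed implementation. It assumes that the segmental signals of a sector are obtained by weighting each plane wave with a cardioid-shaped sector directivity, that the dipole signals are normalized such that a plane wave from azimuth φ yields X = W·cos φ and Y = W·sin φ, and that two sectors centered at 0° and 180° are used. The names analyze, sector_window and segmental_signals are hypothetical.

```python
import cmath
import math


def analyze(w, x, y):
    """Return (azimuth DOA in radians, diffuseness) for one frequency band."""
    n = len(w)
    ix = sum((wk.conjugate() * xk).real for wk, xk in zip(w, x)) / n
    iy = sum((wk.conjugate() * yk).real for wk, yk in zip(w, y)) / n
    energy = sum(
        0.5 * (abs(wk) ** 2 + abs(xk) ** 2 + abs(yk) ** 2)
        for wk, xk, yk in zip(w, x, y)
    ) / n
    if energy == 0.0:
        return 0.0, 1.0
    psi = 1.0 - math.hypot(ix, iy) / energy
    return math.atan2(iy, ix), min(1.0, max(0.0, psi))


def sector_window(phi, center):
    """Assumed cardioid-shaped sector directivity (1 at the sector center)."""
    return 0.5 * (1.0 + math.cos(phi - center))


def segmental_signals(sources, center=None):
    """Simulate the pressure-like and dipole-like signals of plane waves.

    sources is a list of (azimuth, frames) pairs. With center=None no
    sector weighting is applied, which corresponds to the global model.
    """
    n = len(sources[0][1])
    w, x, y = [0j] * n, [0j] * n, [0j] * n
    for phi, frames in sources:
        q = 1.0 if center is None else sector_window(phi, center)
        for k, s in enumerate(frames):
            w[k] += q * s
            x[k] += q * math.cos(phi) * s
            y[k] += q * math.sin(phi) * s
    return w, x, y


# Two simultaneously active sources at 20 and 200 degrees azimuth.
num_frames = 16
sources = [
    (math.radians(20.0), [cmath.rect(1.0, 0.7 * k) for k in range(num_frames)]),
    (math.radians(200.0), [cmath.rect(1.0, 1.9 * k + 0.4) for k in range(num_frames)]),
]

# Global model: one DOA and one diffuseness for the whole observation space.
global_doa, global_psi = analyze(*segmental_signals(sources))

# Sector-based model: one DOA and one diffuseness per sector.
sector_results = {
    center_deg: analyze(*segmental_signals(sources, math.radians(center_deg)))
    for center_deg in (0.0, 180.0)
}
```

In this example, the global analysis yields a diffuseness close to one although the simulated sound field contains only direct sound, whereas each of the two sectors yields a diffuseness close to zero together with the azimuth of the source it predominantly covers. Loudspeaker signals would then be determined per sector and the sector outputs combined, which is not shown here.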
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
13159421.0 | Mar 2013 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2013/073574, filed Nov. 12, 2013, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Application No. 61/726,887, filed Nov. 15, 2012, and European Application No. 13159421.0, filed Mar. 15, 2013, both of which are also incorporated herein by reference in their entirety.
Number | Date | Country |
---|---|---|
61726887 | Nov 2012 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2013/073574 | Nov 2013 | US |
Child | 14712576 | | US |