The disclosure relates to a system and method for generating a sound wave field.
Two-dimensional or three-dimensional audio may be realized using a sound field description with a technique called Higher-Order Ambisonics. Ambisonics is a full-sphere surround sound technique which may cover, in addition to the horizontal plane, sound sources above and below the listener. Unlike other multichannel surround formats, its transmission channels do not carry loudspeaker signals. Instead, they contain a loudspeaker-independent representation of a sound field, which is then decoded to the listener's loudspeaker setup. This extra step allows a music producer to think in terms of source directions rather than loudspeaker positions, and offers the listener a considerable degree of flexibility as to the layout and number of loudspeakers used for playback. Ambisonics can be understood as a three-dimensional extension of mid/side (M/S) stereo, adding additional difference channels for height and depth. In terms of First-Order Ambisonics, the resulting signal set is called B-format. The spatial resolution of First-Order Ambisonics is quite low. In practice, that translates to slightly blurry sources, but also to a comparably small usable listening area or sweet spot.
The resolution can be increased and the sweet spot enlarged by adding groups of more selective directional components to the B-format. In terms of Second-Order Ambisonics these no longer correspond to conventional microphone polar patterns, but look like, e.g., clover leaves. The resulting signal set is then called Second-, Third-, or collectively, Higher-Order Ambisonics (HOA). However, common applications of the HOA technique require, dependent on whether a two-dimensional (2D) or three-dimensional (3D) wave field is processed, specific spatial configurations, notwithstanding whether the wave field is measured or reproduced: Processing of 2D wave fields requires cylindrical configurations and processing of 3D wave fields requires spherical configurations, each with a regular distribution of the microphones or loudspeakers.
An audio system, which is configured to generate a sound wave field around a listening position in a target room, includes a multiplicity of loudspeakers distributed in the target room in an arbitrary fashion. The system further includes at least one modal beamformer module connected upstream of the multiplicity of loudspeakers and downstream of at least one input signal path that receives at least one Ambisonic input signal. The at least one modal beamformer module includes a matrixing module that includes a multiple-input multiple-output filter module.
An audio reproduction method, which is configured to generate a sound wave field around a listening position in a target room, includes generating sound signals to be reproduced at a multiplicity of positions that are distributed in the target room in an arbitrary fashion. The method further includes processing an Ambisonic input signal according to a modal beamforming algorithm to provide the sound signals. The modal beamforming algorithm includes matrixing according to a multiple-input multiple-output filter algorithm.
A computer program product is configured to cause a processor to execute an audio reproduction method to generate a sound wave field around a listening position in a target room. The method includes generating sound signals to be reproduced at a multiplicity of positions that are distributed in the target room in an arbitrary fashion. The method further includes processing an Ambisonic input signal according to a modal beamforming algorithm to provide the sound signals. The modal beamforming algorithm includes matrixing according to a multiple-input multiple-output filtering algorithm.
Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The system and methods may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
The HOA technique may enhance the performance of common audio systems such as home audio systems. However, as mentioned above, the HOA technique requires, dependent on whether a two-dimensional (2D) or three-dimensional (3D) wave field is processed, specific spatial configurations of the microphones or loudspeakers, notwithstanding whether the wave field is measured (encoded) or reproduced (decoded). Accordingly, processing of 2D wave fields requires cylindrical configurations and processing of 3D wave fields requires spherical configurations, each with a regular distribution of the microphones or loudspeakers. This significantly reduces the versatility of audio systems employing HOA. Versatile audio systems described herein utilize a Multiple-Input-Multiple-Output (MIMO) filtering system/method (also referred to as MIMO system) in combination with a Higher-Order-Ambisonic (HOA) system/method to approximate a desired sound field with an arbitrary loudspeaker arrangement as found in, e.g., home applications. Thus, the acoustical performance of already existing home stereo systems can be enhanced by using an advanced signal processing framework, based on a combination of MIMO and HOA techniques adapted to arbitrary wave fields in a versatile way.
Multiple-input multiple-output (MIMO) technology in room acoustics employs multiple transmitters at one end and multiple receivers at another end. In simple audio systems, a single transducer (e.g., microphone, loudspeaker) is used at the source, and another single transducer (e.g., loudspeaker, microphone) is used at the destination. In some cases, this gives rise to problems with multipath effects. When an acoustic wave field meets obstructions such as walls, windows, doors and furniture, the wave fronts are scattered and therefore take many paths to reach the destination. The late arrival of scattered portions of sound causes problems such as echoes, reverberations, cancellations, and intermittent reception. The use of two or more transducers, along with the transmission of multiple signals (one for each transducer) at the source and the destination, eliminates the backlog caused by multipath wave propagation. It has been found that, when combined with HOA in a specific way, the MIMO algorithm provides for directivity in the form of spherical harmonics, particularly their basic functions, and allows for overcoming the restrictions set by the HOA algorithm, such as the limited form of the basic transducer configuration and the necessity of regularly distributed transducers, without reducing the benefits of the HOA algorithm.
By way of the MELMS algorithm, which may be implemented in a MELMS processing module 106, a filter matrix W(z), which is implemented by an equalizing filter module 103, is controlled to change the original input signal x(n) such that the resulting Q output signals, which are supplied to Q loudspeakers and which are filtered by a filter module 104 with a secondary path filter matrix S(z), match the desired signals d(n). Accordingly, the MELMS algorithm evaluates the input signal x(n) filtered with a secondary path filter matrix Ŝ(z), which is implemented in a filter module 102 and outputs Q×K filtered input signals, and K error signals e(n). The error signals e(n) are provided by a subtractor module 105, which subtracts K microphone signals y′(n) from the K desired signals d(n). The K recording channels with K microphone signals y′(n) are the Q output channels with Q loudspeaker signals y(n) filtered with the secondary path filter matrix S(z), which is implemented in filter module 104, representing the acoustical scene. Modules and paths are understood to be at least one of hardware, software and/or acoustical paths.
The MELMS algorithm is an iterative algorithm to obtain the optimum least mean square (LMS) solution. The adaptive approach of the MELMS algorithm allows for in situ design of filters and also provides a convenient method for readjusting the filters whenever a change occurs in the electro-acoustic transfer functions. The MELMS algorithm employs the steepest descent approach to search for the minimum of the performance index. This is achieved by successively updating the filter coefficients by an amount proportional to the negative of the gradient ∇(n), according to which w(n+1)=w(n)+μ(−∇(n)), where μ is the step size that controls the convergence speed and the final misadjustment. An approximation may be made in such LMS algorithms so as to update the vector w using the instantaneous value of the gradient ∇(n) instead of its expected value, leading to the LMS algorithm.
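The update rule w(n+1)=w(n)+μ(−∇(n)) may be illustrated by the following minimal sketch. It is not the MELMS algorithm of the disclosure: it assumes a single channel and no secondary path, so it reduces to a plain LMS system identification. The function name `lms_identify`, the test system `true_h` and the step size are hypothetical choices made for illustration only.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that x filtered by w tracks d (plain LMS)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(w_i * x_i for w_i, x_i in zip(w, buf))
        e_n = d_n - y_n
        # The instantaneous gradient of e(n)^2 with respect to w is -2*e(n)*x(n);
        # the factor 2 is absorbed into the step size mu.
        w = [w_i + mu * e_n * x_i for w_i, x_i in zip(w, buf)]
        errors.append(e_n)
    return w, errors

# Hypothetical 3-tap system to be identified (assumption for illustration).
random.seed(0)
true_h = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [
    sum(h * (x[n - k] if n - k >= 0 else 0.0) for k, h in enumerate(true_h))
    for n in range(len(x))
]
w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
```

In this noise-free setting the weights `w` approach `true_h`. A larger `mu` speeds up convergence at the cost of a larger residual error once noise is present, which is the trade-off the step size μ controls.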
Referring to
Simple Ambisonic panning (or encoding) takes a source signal s and two parameters, the horizontal angle θ and the elevation angle φ. It positions the source at the desired angle by distributing the signal over the Ambisonic components with different gains for the corresponding Ambisonic signals W ($Y_{0,0}^{+1}(\theta, \varphi)$), X ($Y_{1,1}^{+1}(\theta, \varphi)$), Y ($Y_{1,1}^{-1}(\theta, \varphi)$) and Z ($Y_{1,0}^{+1}(\theta, \varphi)$):
w=s·1/√2,
x=s·cos θ·cos φ,
y=s·sin θ·cos φ, and
z=s·sin φ.
Being omnidirectional, the W channel always delivers the same signal, regardless of the listening angle. So that it has more or less the same average energy as the other channels, W is attenuated by w, i.e., by about 3 dB (precisely, divided by the square root of two). The terms for X, Y and Z produce figure-of-eight polar patterns. Taking their weighting values x, y and z at the angles θ and φ and multiplying them with the corresponding Ambisonic signals X, Y and Z, the summed output yields a figure-of-eight radiation pattern that points in the desired direction, given by the azimuth θ and elevation φ used to calculate the weighting values x, y and z, and that has an energy content matching that of the W component, weighted by w.
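The first-order panning described above may be sketched as follows. The sketch uses the gains given in the text, with the W component divided by the square root of two. The function name `encode_b_format` and the use of radians are assumptions made for illustration.

```python
import math

def encode_b_format(s, azimuth, elevation):
    """Return the first-order components (w, x, y, z) for one source sample s.

    Angles are in radians (an assumption of this sketch).
    """
    w = s / math.sqrt(2.0)  # omnidirectional component, attenuated by about 3 dB
    x = s * math.cos(azimuth) * math.cos(elevation)
    y = s * math.sin(azimuth) * math.cos(elevation)
    z = s * math.sin(elevation)
    return w, x, y, z

# A unit source at azimuth 90 degrees in the horizontal plane.
w, x, y, z = encode_b_format(1.0, math.radians(90.0), 0.0)
```

For this source direction nearly all of the directional energy falls into the y component, while x and z are approximately zero.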
The B-format components can be combined to derive virtual radiation patterns that correspond to any first-order polar pattern (omnidirectional, cardioid, hypercardioid, figure-of-eight or anything in between) pointing in any three-dimensional direction. Several such beam patterns with different parameters can be derived at the same time to create coincident stereo pairs or surround arrays.
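One common way to form such a first-order virtual pattern is sketched below. The pattern parameter `p` (1 for omnidirectional, 0.5 for cardioid, 0 for figure-of-eight) and the function name `virtual_microphone` are assumptions of this sketch and are not taken from the disclosure. The W component is assumed to carry the gain 1/√2 used in the panning above.

```python
import math

def virtual_microphone(w, x, y, z, azimuth, elevation, p):
    """Combine B-format samples into a first-order pattern aimed at (azimuth, elevation).

    p = 1.0 gives an omnidirectional pattern, p = 0.5 a cardioid and
    p = 0.0 a figure-of-eight (pattern parameter assumed for this sketch).
    """
    directional = (
        x * math.cos(azimuth) * math.cos(elevation)
        + y * math.sin(azimuth) * math.cos(elevation)
        + z * math.sin(elevation)
    )
    # sqrt(2) undoes the 1/sqrt(2) attenuation applied to W during encoding.
    return p * math.sqrt(2.0) * w + (1.0 - p) * directional

# B-format sample of a unit source straight ahead (azimuth 0, elevation 0).
b_front = (1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0)
on_axis = virtual_microphone(*b_front, azimuth=0.0, elevation=0.0, p=0.5)
rear = virtual_microphone(*b_front, azimuth=math.pi, elevation=0.0, p=0.5)
```

A cardioid aimed at the source picks it up at approximately unit gain (`on_axis`), while the same cardioid aimed in the opposite direction rejects it almost completely (`rear`).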
Simple Ambisonic decoding similarly uses a set of virtual microphones. For perfectly regular layouts, a simplified decoder can be generated by pointing a virtual cardioid microphone in the direction of each loudspeaker. Here is a square:
LF=(2W+X+Y)·√8,
LB=(2W−X+Y)·√8,
RB=(2W−X−Y)·√8, and
RF=(2W+X−Y)·√8.
The signs of the X and Y components are the essential part; the rest are gain factors. The Z component is discarded in the present exemplary case because it is not possible to reproduce height cues with just four loudspeakers in one plane. Beyond the theory outlined above, a real Ambisonic decoder may include a number of psycho-acoustic optimizations.
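The square decoder above may be sketched as follows, reading the common gain factor as √8 and taking the expressions literally from the text. The function name `decode_square` and the test source are assumptions made for illustration. The Z component is ignored, as in the text.

```python
import math

def decode_square(w, x, y):
    """Return loudspeaker feeds for a square layout from horizontal B-format samples."""
    g = math.sqrt(8.0)  # common gain factor as given in the text
    return {
        "LF": (2.0 * w + x + y) * g,
        "LB": (2.0 * w - x + y) * g,
        "RB": (2.0 * w - x - y) * g,
        "RF": (2.0 * w + x - y) * g,
    }

# Unit source panned to the front left (azimuth 45 degrees, elevation 0),
# encoded with the panning gains described earlier.
azimuth = math.radians(45.0)
feeds = decode_square(1.0 / math.sqrt(2.0), math.cos(azimuth), math.sin(azimuth))
```

With this source the front-left feed is the largest, the rear-right feed is approximately zero, and the two remaining feeds are equal. This illustrates that the signs of X and Y determine which loudspeaker dominates.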
The spatial resolution of the exemplary first-order Ambisonics described above is quite low. In practice, that translates to slightly blurry sources, but also to a comparably small usable listening area or sweet spot. The resolution can be increased and the sweet spot enlarged by adding groups of more selective directional components to the B-format. The resulting signal set is then called Second-, Third-, or collectively, Higher-Order Ambisonics. For a given order $\ell$, full-sphere systems require $(\ell+1)^2$ signal components, and $2\ell+1$ components are needed for horizontal-only reproduction.
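The component counts stated above, (order+1)² for full-sphere and 2·order+1 for horizontal-only reproduction, may be computed with a small helper. The function name `hoa_channel_count` is hypothetical.

```python
def hoa_channel_count(order, full_sphere=True):
    """Number of Ambisonic signal components needed for a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2 if full_sphere else 2 * order + 1

first_order_3d = hoa_channel_count(1)                     # W, X, Y, Z
first_order_2d = hoa_channel_count(1, full_sphere=False)  # W, X, Y
third_order_3d = hoa_channel_count(3)
```

The first-order full-sphere case yields the four B-format components W, X, Y and Z discussed above.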
In the target room 313, further loudspeakers, e.g., a rear left (Ls) loudspeaker 315, a sub-woofer (Sub) loudspeaker 316, and a center (C) loudspeaker 317 may be installed. The target room 313 is acoustically very unfavorable as it includes a window 318 and a French door 319 in a left wall and a door 320 in the right wall in an unbalanced configuration. Furthermore, a sofa 321 is disposed at the left wall and extends approximately to the center of the target room 313, and a table 322 is arranged in front of the sofa 321. A television set 323 is arranged at the front wall and in line of sight of the sofa 321. The front left (Lf) loudspeaker 310 and the front right (Rf) loudspeaker 311 are arranged on both sides of the television set 323 and the center (C) loudspeaker 317 is arranged below the television set 323. The sub-woofer (Sub) loudspeaker 316 is disposed in the corner between the front wall and the left wall. The loudspeaker arrangement on the rear wall, including the rear left (Ls) loudspeaker 315 and the rear right (Rs) loudspeaker 312, does not share the same center line as the loudspeaker arrangement on the front wall, including the front left (Lf) loudspeaker 310, the front right (Rf) loudspeaker 311, and the center (C) loudspeaker 317. An exemplary sweet spot 324 is on the sofa 321 with the table 322 and the television set 323 in front. As can be seen, the loudspeaker setup shown in
Modifications of the wave field can be made in a manner that can be seen from the following example, in which a rotational element is introduced while decoding:
$$P(r, \omega) = S(j\omega) \left( \sum_{m=0}^{\infty} j^{m}\, j_{m}(kr) \sum_{0 \le n \le m,\ \sigma = \pm 1} B_{m,n}^{\sigma}\, Y_{m,n}^{\sigma}(\theta, \varphi)\, Y_{m,n}^{\sigma}(\theta_{Des}, \varphi_{Des}) \right),$$
wherein $Y_{m,n}^{\sigma}(\theta_{Des}, \varphi_{Des})$ are modal weighting coefficients that turn the spherical harmonics in the desired direction $(\theta_{Des}, \varphi_{Des})$, $B_{m,n}^{\sigma}$ are the Ambisonic coefficients (weighting coefficients of the Nth spherical harmonic), $Y_{m,n}^{\sigma}(\theta, \varphi)$ is a complex spherical harmonic of mth order, nth grade (real part σ=1, imaginary part σ=−1), $P(r, \omega)$ is the spectrum of the sound pressure at a position r=(r, θ, φ), $S(j\omega)$ is the input signal in the spectral domain, j is the imaginary unit of complex numbers and $j_{m}(kr)$ is the spherical Bessel function of the first kind and of mth order. The complex spherical harmonics $Y_{m,n}^{\sigma}(\theta, \varphi)$ may then be modeled by the MIMO system/method in the target room, i.e., by the corresponding equalizing filter coefficients. The Ambisonic coefficients $B_{m,n}^{\sigma}$ are derived from an analysis of the wave field in the source room or a room simulation.
The exemplary MIMO system/method shown in
The MIMO system/method 300 may be integrated in an exemplary modal beamforming module 400 as depicted in
It is noted that any software, firmware, algorithm and method used herein before for adaptation or in an adaptive process or procedure may be performed or applied in the time domain, frequency domain or wave domain as the case may be.
The description of embodiments has been presented for purposes of illustration and description. Suitable modifications and variations to the embodiments may be performed in light of the above description. The described systems and methods are exemplary in nature, and may include additional elements or steps and/or omit elements or steps. As used in this application, an element or step recited in the singular and preceded by the word “a” or “an” should be understood as not excluding plural of said elements or steps, unless such exclusion is stated. Furthermore, references to “one embodiment” or “one example” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. The terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements or a particular positional order on their objects. A signal flow chart may describe a system, method or software executed by a processor for implementing the method, dependent on the type of realization, e.g., as hardware, software or a combination thereof. A module may be implemented as hardware, software or a combination thereof.
Number | Date | Country | Kind |
---|---|---|---|
16150040.0 | Jan 2016 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2016/081012 | 12/14/2016 | WO | 00 |