This application claims priority under 35 U.S.C. § 119 or 365 to Great Britain Application No. 2109307.5, filed Jun. 28, 2021. The entire teachings of the above application(s) are incorporated herein by reference.
The present disclosure relates to a method of generating audio signals for an array of loudspeakers and a corresponding apparatus and computer program.
Loudspeaker arrays may be used to reproduce a plurality of different audio signals at a plurality of control points. The audio signals that are applied to the loudspeaker array are generated using filters, which may be designed so as to avoid cross-talk. However, the determination of the weights of these filters may be computationally expensive, particularly if the control points are moving and the filter weights thus need to be computed in real-time. This may, for example, be the case if the control points correspond to listeners' positions in an acoustic environment.
A previous approach to determining filter weights for a loudspeaker array is described in WO 2017/158338 A1.
Aspects of the present disclosure are defined in the accompanying independent claims.
Examples of the present disclosure will now be explained with reference to the accompanying drawings.
Throughout the description and the drawings, like reference numerals refer to like parts.
In general terms, the present disclosure relates to a method of generating audio signals for an array of loudspeakers to reproduce a plurality of input audio signals at a respective plurality of control points in a manner that avoids cross-talk, i.e., that reduces the extent to which an audio signal to be reproduced at a first control point is also reproduced at other control points, whilst avoiding latency. A set of filters is applied to the input audio signals to obtain the plurality of output audio signals which are output to the array of loudspeakers. The present disclosure relates primarily to ways of determining those filters.
A method of generating audio signals for the array of loudspeakers is shown in the accompanying drawings.
At step S100, a plurality of input audio signals are received. A respective one of the plurality of input audio signals is to be reproduced, by the array, at each of a plurality of control points in an acoustic environment, e.g., a first input audio signal is to be reproduced at a first control point, and a second input audio signal is to be reproduced at a second control point and a third control point. Each of the plurality of control points is associated with a respective one of a plurality of loudspeaker groups, e.g., the first control point is associated with a first loudspeaker group and the second and third control points are associated with a second loudspeaker group.
At step S110, an estimate of a position of each of the plurality of control points is received, e.g., from a position sensor.
At step S120, each of the loudspeakers in the array is assigned to at least one of the plurality of loudspeaker groups, e.g., a first, second and third loudspeaker may be assigned to the first loudspeaker group, and the third, a fourth and a fifth loudspeaker may be assigned to the second loudspeaker group. The assigning may use the received estimate of the position of each of the plurality of control points.
As will be explained in more detail, the assigning of a particular loudspeaker to a particular loudspeaker group is based on a relative position of the particular loudspeaker with respect to one or more of the at least one control points associated with the particular loudspeaker group. For example, the assigning of the third loudspeaker to a particular loudspeaker group may be based on a relative position of the third loudspeaker with respect to 1) the first control point (the control point associated with the first loudspeaker group) and 2) the second and/or third control points (the control points associated with the second loudspeaker group); if the third loudspeaker is closer to the first control point than to the second and/or third control points, the third loudspeaker may be assigned to the first loudspeaker group.
At step S130, a set of filters may be determined based on the assigning of loudspeakers to groups. The manner in which the set of filters is determined is described in detail below.
At step S140, a respective output audio signal for each of the loudspeakers in the array is determined by applying the set of filters to the plurality of input audio signals. The output audio signal for a particular loudspeaker is generated according to the at least one loudspeaker group to which the particular loudspeaker is assigned.
The set of filters may be applied in the frequency domain. In this case, a transform, such as a fast Fourier transform (FFT), is applied to the input audio signals, the filters are applied, and an inverse transform is then applied to obtain the output audio signals.
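As a rough illustration of this frequency-domain processing, the following Python sketch transforms each input block, applies an L×M matrix of complex gains in every frequency bin, and transforms back. It is a minimal sketch under stated assumptions, not the disclosed implementation: a naive DFT stands in for an FFT, a single zero-padded block is processed (a streaming system would typically use overlap-add or overlap-save), and the names `dft`, `idft`, `apply_filters` and the per-bin callable `H(k)` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def apply_filters(inputs, H, n_fft):
    """Apply a bank of filters in the frequency domain (hypothetical helper).

    inputs: M input signals (lists of samples), each no longer than n_fft.
    H: callable returning, for bin k, an L x M nested list of complex gains.
    n_fft: transform length; it should cover the input length plus the filter
           length minus one, so that circular convolution equals linear convolution.
    Returns L real-valued output signals of length n_fft.
    """
    spectra = [dft(list(d) + [0.0] * (n_fft - len(d))) for d in inputs]
    num_in = len(inputs)
    per_bin = [H(k) for k in range(n_fft)]  # evaluate each L x M matrix once
    num_out = len(per_bin[0])
    # Output spectrum of loudspeaker l: sum over inputs m of H[l][m] * D_m, bin by bin.
    out_spectra = [[sum(per_bin[k][l][m] * spectra[m][k] for m in range(num_in))
                    for k in range(n_fft)] for l in range(num_out)]
    return [[v.real for v in idft(q)] for q in out_spectra]
```

Zero-padding each block to `n_fft` is what keeps the bin-by-bin multiplication equivalent to ordinary (linear) filtering.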
At step S150, the output audio signals may be output to the loudspeaker array.
Steps S100 to S150 may be repeated with another plurality of input audio signals. These steps may be repeated in real time.
As steps S100 to S150 are repeated, the set of filters may remain the same, in which case step S130 need not be repeated, or may change. Similarly, if the position of each of the plurality of control points is known not to, or is assumed not to, change for a particular amount of time, then steps S110 to S130 need not be repeated for that particular amount of time.
As one example, steps S110, S120 and S130 can be performed once, during an initialisation phase, and need not be repeated thereafter. For example, the estimates of the positions of each of the plurality of control points may be based on a model rather than being received from a position sensor, and the group assignment of step S120 and/or the set of filters of step S130 may be pre-computed.
A method of determining a set of filters may be performed using steps S110 to S130. By performing such a method, the set of filters can be pre-computed, for example, when programming a device to perform the method of generating audio signals described above.
Similarly, if the position of each of the plurality of control points changes over time but it is known, or is assumed, that their movement will be such that the assigning of step S120 will not change over time (for example, if each of the plurality of control points is determined to remain within a respective given region of space), then step S120 need not be repeated for that particular amount of time. For example, step S120 can be performed once, during an initialisation phase, and need not be repeated thereafter (unless, for example, it is determined that at least one of the plurality of control points no longer remains within the respective given region of space).
As would be understood by a skilled person, the steps of
A block diagram of an exemplary apparatus 200 for implementing any of the methods described herein, such as the method of
The memory 220, for example a random-access memory (RAM), is arranged to be able to retrieve, store, and provide to the processor 210, instructions and data that have been stored in the memory 220. The network interface 230 is arranged to enable the processor 210 to communicate with a communications network, such as the Internet. The input interface 250 is arranged to receive user inputs provided via an input device (not shown) such as a mouse, a keyboard, or a touchscreen. The processor 210 may further be coupled to a display adapter 240, which is in turn coupled to a display device (not shown). The processor 210 may further be coupled to an audio interface 260 which may be used to output audio signals to one or more audio devices, such as a loudspeaker array 300. The audio interface 260 may comprise a digital-to-analog converter (DAC) (not shown), e.g., for use with audio devices with analog input(s).
Various approaches for determining the set of filters are now described.
The present disclosure relates to the field of audio reproduction systems with loudspeakers and audio digital signal processing. More specifically, the disclosure encompasses systems that perform sound-field control, i.e. that control the sound field at two or more different points in space. This can be used to create personal virtual acoustic images through a plurality of loudspeakers using cross-talk cancellation or beamforming with minimum latency (by controlling the sound pressure at the two ears of the listener), or for multi-zone audio reproduction (two or more different signals delivered to two or more different zones in space).
Consider the case when we want to use an array of L loudspeakers to control the reproduced sound pressure at two or more points in space and deliver an independent signal to each control point. This is achieved by creating a signal processing apparatus that takes the two or more input signals d1, d2, . . . and generates L loudspeaker signals. The signal processing apparatus includes one or more banks of filters. These filters may be non-causal, or may include delays that, in general, affect the input-output latency, hereafter succinctly referred to as latency. The present disclosure proposes a strategy to minimise the latency of the signal processing apparatus.
It is shown below that, in the general case, the control filters are non-causal IIR filters. They can be approximated as causal FIR filters by truncation and by applying a large modelling delay. This, however, comes at the cost of a significant system latency.
It is shown below that the lack of causality of the control filter is caused by the fact that the determinant of the matrix to be inverted for the filter computation is not minimum phase. The present disclosure devises a strategy to ensure the determinant is causal.
Creating audio signal processing strategies to perform sound-field control has been the focus of industry and academia for many years. The motivation is to accurately control sound radiation from a set of speakers to achieve a desired sound-field reproduction pattern that yields a particular sound effect. Examples of such effects are: creating a perceived direction of sound propagation, creating zones of differentiated acoustic pressure inside an environment for the delivery of independent sound content (also known as sound zoning or personal audio), or accurately controlling the sound pressure at the listeners' ears to deliver 3D sound, commonly known as cross-talk cancellation (CTC). The approach of the present disclosure can be used to achieve all these effects.
Sound-field control audio reproduction systems require solving an electro-acoustic problem that is based on the inversion of the electro-acoustic path between the loudspeakers and the listener's ears. The solution of such a problem yields a set of electrical or, in the field of this disclosure, digital filters that, applied to the loudspeaker input signals, yield a given sound propagation pattern. Prior art for creating digital filters for sound-field control requires the digital filters to satisfy certain time and frequency constraints. Considering an audio reproduction system using just two loudspeakers, the first constraint is to control the norm of the digital filters so that these do not produce audible colouration and artefacts and, furthermore, do not excessively boost the loudspeakers with the risk of damaging them. The most common solution to this problem is the use of Tikhonov regularisation. Although this technique can be effective in controlling the filter energy, the use of Tikhonov regularisation introduces the need to apply a modelling delay to the filter time series. Depending on the application, the added modelling delay may not be desirable, as the total system latency of the digital filters depends on the filter length. Techniques exist that can minimise latency for systems using just two loudspeakers; however, the latency problem cannot easily be avoided if more than two loudspeakers are employed in an array, even if no regularisation is used.
Sound-field control systems using more than two loudspeakers have been shown to be desirable, as they minimise the effect of room reflections and also provide better acoustic control over the whole audio-frequency range. The use of more than two loudspeakers, however, requires the introduction of a modelling delay. Previous techniques have shown that the modelling delay can be minimised if the electro-acoustic problem is solved following a time-domain approach rather than a frequency-domain approach. In practice, time-domain techniques require the calculation of very large inverse matrices, which is not feasible in the context of real-time adaptive systems that need to constantly calculate and adapt the digital control filters according to the instantaneous position of the pressure control points. Therefore, new techniques that allow the filter processing latency to be minimised with loudspeaker arrays are required.
The approach of the present disclosure, Technology 3, introduces a strategy to satisfy such needs. By splitting the processing among the loudspeaker array filters, it is possible to reduce the filter latency to “zero” latency in the case of a symmetrically placed listener, or to “quasi-zero” latency when listeners are not placed symmetrically with respect to the loudspeaker array. The approach of the present disclosure is generalised with respect to all loudspeaker array control techniques (Technology 1 and non-Technology 1).
As explained below, the novel signal processing strategy disclosed in this document is based on splitting the loudspeakers into two or more groups. Each group of loudspeakers is associated with one control point. The system takes M signals as input, each of which is intended to be delivered to a given control point, but not to the others (for example, signal d1 is expected to be delivered to the control point at x1 and not to those at x2, x3, etc.). If the system is fed with only one of the M signals, say d1, while d2=d3= . . . =0, the signal processing apparatus will be such that the first group of loudspeakers creates a sound beam to deliver the signal d1 to control point x1, whilst the second group of loudspeakers creates a sound beam to cancel any leakage of signal d1 at control point x2, the third group creates a sound beam to cancel the leakage of signal d1 at control point x3, and so on. As explained below, if the two or more groups of loudspeakers are chosen wisely, the method ensures that all digital filters are causal or require only a very short modelling delay to become causal. This minimises the input-output latency of the system.
Conversely, it is shown below that, when the number of loudspeakers is equal to or larger than 3, the digital filters computed with a conventional approach (i.e. without the method disclosed here) will, in general, be non-causal. This means that the output of the filters depends, in theory, on both past and future values of the input. These filters can be approximated as causal FIR filters, but at the cost of introducing a long modelling delay and therefore increasing the system latency.
In what follows, we first introduce the geometry and variables needed to study this problem. We will then demonstrate with numerical examples that the control filters of implementations common in the state of the art are non-causal, and show that this is caused by the fact that the determinant of the matrix to be inverted is not minimum-phase and is non-causal. We will then disclose our strategy to subdivide the loudspeakers into groups and demonstrate, again with numerical examples, that this approach allows the determinant of the matrix to be minimum-phase and therefore allows the design of causal control filters (if a small modelling delay is applied). For completeness, a mathematical proof is provided of the (lack of) causality of the filters in the simple case of 2 control points and free-field transfer functions.
Consider a system with a reference geometry as reported in the accompanying drawings, comprising L loudspeakers and M control points. $S_{m\ell}(\omega)$ is the electro-acoustical transfer function between the $\ell$-th loudspeaker and the m-th control point, expressed as a function of the angular frequency ω. The reproduced sound pressure signals at the M control points, $\mathbf{p}(\omega)=[p_1(\omega), \dots, p_M(\omega)]^T$, for a given frequency ω are given by $\mathbf{p}(\omega)=\mathbf{S}(\omega)\mathbf{q}(\omega)$, where $\mathbf{q}(\omega)$ is a vector whose L elements are the loudspeaker signals. These are given by $\mathbf{q}(\omega)=\mathbf{H}(\omega)\mathbf{d}(\omega)$, where $\mathbf{d}(\omega)$ is a vector whose M elements are the signals intended to be delivered to the various control points. $\mathbf{H}(\omega)$ is a complex-valued matrix that represents the effect of the signal processing apparatus, hereafter succinctly referred to as “filters”. It should be clear, though, that each element of $\mathbf{H}(\omega)$ is not necessarily a single filter but can be the result of a combination of filters, delays, and other signal processing blocks.
In what follows, the dependency of variables on the frequency ω will be dropped to simplify the notation. We therefore have that
$$\mathbf{p} = \mathbf{S}\mathbf{q} = \mathbf{S}\mathbf{H}\mathbf{d}. \qquad (1)$$
An approach to design the filters is to compute H as the (regularised) inverse or pseudo-inverse of matrix S, or of a model of matrix S, that is
$$\mathbf{H} = \mathbf{G}^H\left(\mathbf{G}\mathbf{G}^H + \mathbf{A}\right)^{-1} e^{-j\omega T}, \qquad (2)$$
where matrix G is our model or estimate of the plant matrix S, A is a regularisation matrix (for example for Tikhonov regularisation), $[\cdot]^H$ is the conjugate-transpose (Hermitian) operator, $j=\sqrt{-1}$, and T is a modelling delay. A straightforward implementation of this expression leads to a signal flow using a bank of M×L filters, as shown in the block diagram in the accompanying drawings.
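To make equation 2 concrete, the sketch below evaluates it at a single angular frequency for M=2 control points. It assumes a free-field monopole model for G (the form given later in the text as equation 13), an assumed speed of sound of 343 m/s, A = reg·I and a closed-form 2×2 inverse; the function names are hypothetical. With A = 0 and T = 0, the product G·H should be the identity, i.e. in the model each input signal reaches only its own control point.

```python
import cmath
import math

C0 = 343.0  # assumed speed of sound in m/s


def free_field_plant(control_points, speakers, omega):
    """M x L model G of ideal monopoles in free field at angular frequency omega."""
    return [[cmath.exp(-1j * omega * math.dist(x, y) / C0) / (4 * math.pi * math.dist(x, y))
             for y in speakers] for x in control_points]


def inv2(a):
    """Inverse of a 2 x 2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def conventional_filters(G, reg, omega, T):
    """Equation 2 for M = 2: H = G^H (G G^H + A)^{-1} e^{-j omega T}, with A = reg * I."""
    M, L = len(G), len(G[0])
    gram = [[sum(G[m][l] * G[n][l].conjugate() for l in range(L)) + (reg if m == n else 0.0)
             for n in range(M)] for m in range(M)]
    inv = inv2(gram)
    phase = cmath.exp(-1j * omega * T)
    # H is L x M: H[l][n] = sum_m conj(G[m][l]) * inv[m][n] * e^{-j omega T}
    return [[sum(G[m][l].conjugate() * inv[m][n] for m in range(M)) * phase
             for n in range(M)] for l in range(L)]
```

Evaluating this on a grid of frequencies and inverse-transforming each element of H is one way to obtain the impulse responses whose non-causal part is discussed next.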
Although designing the filters on the basis of equation 2 allows for an effective delivery of independent signals to the control points, when the number of loudspeakers L is 3 or more the elements of H are, in general, non-causal IIR filters. They can be approximated by causal filters by applying a modelling delay to the elements of H (and by truncating the filters in the time domain, or equivalently by applying a frequency sampling approach), but this comes at the cost of significantly increasing the system latency.
To illustrate this effect, let us consider a simple set-up consisting of an array of loudspeakers and 2 control points located at the ears of a listener, as shown in the accompanying drawings.
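Equation 2 can be rewritten as
$$\mathbf{H} = \mathbf{G}^H \frac{\operatorname{adj}\left(\mathbf{G}\mathbf{G}^H+\mathbf{A}\right)}{\det\left(\mathbf{G}\mathbf{G}^H+\mathbf{A}\right)}\, e^{-j\omega T}. \qquad (3)$$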
Each of the terms of this equation can be studied independently. To simplify the analysis, assume that T=0, that A is a diagonal, real-valued, and frequency-independent matrix, and that all elements of matrix G can be represented as FIR filters. Because of the latter assumption, the elements of $\mathbf{G}^H$ and $\operatorname{adj}(\mathbf{G}\mathbf{G}^H+\mathbf{A})$ are also FIR filters (not necessarily causal), as they are given by products (in the frequency domain) and sums of FIR filters. For the same reason, $\det(\mathbf{G}\mathbf{G}^H+\mathbf{A})$ is an FIR filter. Its inverse, on the other hand, is an IIR filter. Matrix $\mathbf{G}\mathbf{G}^H$ is a Gramian matrix and, as such, it is positive semi-definite; with non-negative diagonal entries in A, $\mathbf{G}\mathbf{G}^H+\mathbf{A}$ is also positive semi-definite, and its eigenvalues and determinant are real and non-negative. This implies that $\det(\mathbf{G}\mathbf{G}^H+\mathbf{A})$, as well as its inverse, are zero-phase filters, whose impulse responses are symmetric with respect to time t=0 and therefore non-causal.
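The zero-phase argument can be checked numerically. The sketch below assumes the same hypothetical free-field monopole model, an 8 kHz sampling rate, a 64-point frequency grid and a small A = reg·I; it samples det(GG^H + A) at positive and negative frequencies and returns its inverse DFT. The determinant should come out real and non-negative, and the impulse response symmetric about t = 0 (sample n−t mirrors sample t), so part of its energy sits at negative times, which is the non-causal pre-ringing described above.

```python
import cmath
import math

C0 = 343.0  # assumed speed of sound in m/s


def free_field_plant(control_points, speakers, omega):
    """M x L model G of ideal monopoles in free field (assumed plant model)."""
    return [[cmath.exp(-1j * omega * math.dist(x, y) / C0) / (4 * math.pi * math.dist(x, y))
             for y in speakers] for x in control_points]


def gram_det(G, reg):
    """det(G G^H + A) for M = 2 and A = reg * I."""
    L = len(G[0])
    g = [[sum(G[m][l] * G[n][l].conjugate() for l in range(L)) + (reg if m == n else 0.0)
          for n in range(2)] for m in range(2)]
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


def det_impulse_response(control_points, speakers, fs, n, reg):
    """Sample det(G G^H + A) on an n-point DFT grid; return (spectrum, impulse response).

    Bins above n/2 are evaluated at negative frequencies, so the inverse DFT is a real
    filter whose samples n-1, n-2, ... represent times -1, -2, ... (in samples).
    """
    spectrum = []
    for k in range(n):
        kk = k if k <= n // 2 else k - n
        omega = 2 * math.pi * kk * fs / n
        spectrum.append(gram_det(free_field_plant(control_points, speakers, omega), reg))
    h = [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
         for t in range(n)]
    return spectrum, h
```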
Plots of the resulting filters for the case of M=2 introduced above are shown in the accompanying drawings. Non-causal pre-ringing is clearly observable in both filters.
An alternative to the signal flow of state-of-the-art MIMO theory is to implement $(\mathbf{G}\mathbf{G}^H+\mathbf{A})^{-1}$ (with some modelling delay) as a bank of M×M filters, hereafter referred to as Independent Filters (IFs), and $\mathbf{G}^H$ (also with added modelling delay) as a bank of M×L filters, referred to as Dependent Filters (DFs), which are generally simpler to compute and implement than the Independent Filters.
The considerations made in the previous section suggest that a strategy that eliminates or significantly reduces the non-causal pre-ringing in the impulse response of $[\det(\mathbf{G}\mathbf{G}^H+\mathbf{A})]^{-1}$ will significantly reduce the required amount of modelling delay and therefore the overall system latency.
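For the sake of explanation, let us consider the geometry and variables introduced in the previous section. We subdivide the loudspeaker array into M subsets $\mathcal{L}_1, \dots, \mathcal{L}_M$. Each subset $\mathcal{L}_m$ is associated with the m-th control point; see the example in the accompanying drawings, which shows subsets $\mathcal{L}_1$ and $\mathcal{L}_2$.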
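After having created the M loudspeaker sets $\mathcal{L}_m$, we create an auxiliary matrix $\tilde{\mathbf{G}}$ given by
$$\tilde{\mathbf{G}} = \boldsymbol{\Gamma} \odot \mathbf{G}, \qquad (4)$$
where ⊙ represents the element-wise (Hadamard) product and Γ is an M×L activation matrix whose coefficients are
$$\Gamma_{m\ell} = \begin{cases} 1 & \text{if } \ell \in \mathcal{L}_m, \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$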
The activation matrix sets to zero the elements in each row m of G that correspond to loudspeakers not belonging to the set $\mathcal{L}_m$ associated with that row.
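In the case of two control points, for example, if the loudspeakers are ordered such that loudspeakers 1, 2, . . . , N belong to $\mathcal{L}_1$ and speakers N, N+1, . . . , L belong to $\mathcal{L}_2$ (note that, in this case, the N-th speaker belongs to both sets), matrix $\tilde{\mathbf{G}}$ is of the form
$$\tilde{\mathbf{G}} = \begin{bmatrix} G_{1,1} & \cdots & G_{1,N-1} & G_{1,N} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & G_{2,N} & G_{2,N+1} & \cdots & G_{2,L} \end{bmatrix}. \qquad (6)$$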
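The construction of the activation matrix Γ and of G̃ = Γ⊙G can be sketched with plain nested lists. Representing each subset as a Python set of 0-based loudspeaker indices is an assumption made for illustration, as are the function names. For L = 5 and N = 3, the shared third loudspeaker keeps its entries in both rows, reproducing the pattern of the matrix above.

```python
def activation_matrix(groups, num_speakers):
    """Gamma (equation 5): Gamma[m][l] = 1 if loudspeaker l is in set m, else 0.

    groups: list of M sets of 0-based loudspeaker indices (hypothetical representation).
    """
    return [[1 if l in group else 0 for l in range(num_speakers)] for group in groups]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized nested lists."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
```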
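The filters can then be designed on the basis of the following equation:
$$\mathbf{H} = \tilde{\mathbf{G}}^H\left(\mathbf{G}\tilde{\mathbf{G}}^H + \mathbf{A}\right)^{-1} e^{-j\omega T}, \qquad (7)$$
where, as above, T is a modelling delay and A is a regularisation matrix.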
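As for equation 2, this equation can be implemented as a bank of Independent Filters (IF) and a bank of Dependent Filters (DF), such that $\mathbf{H} = \mathrm{DF}\cdot\mathrm{IF}$, with
$$\mathrm{DF} = \tilde{\mathbf{G}}^H e^{-j\omega T_1}, \qquad (8)$$
$$\mathrm{IF} = \left(\mathbf{G}\tilde{\mathbf{G}}^H + \mathbf{A}\right)^{-1} e^{-j\omega T_2}. \qquad (9)$$
Note that, in order to ensure causality of both sets of filters, the modelling delay has now been split into two terms T1 and T2 such that T1+T2=T.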
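A single-frequency sketch of this design for M=2 is given below. It assumes the split DF = G̃^H e^{-jωT1}, IF = (GG̃^H + A)^{-1} e^{-jωT2} with H = DF·IF, a free-field monopole model for G and an assumed speed of sound; all function names are hypothetical. With A = 0 and T1 = T2 = 0, G·H reduces to the identity, and two loudspeakers in the same subset receive filters that differ only by their factor G*_{mℓ}, which anticipates the gain-and-delay property discussed later in the text.

```python
import cmath
import math

C0 = 343.0  # assumed speed of sound in m/s


def free_field_plant(control_points, speakers, omega):
    """M x L free-field monopole model of G at angular frequency omega."""
    return [[cmath.exp(-1j * omega * math.dist(x, y) / C0) / (4 * math.pi * math.dist(x, y))
             for y in speakers] for x in control_points]


def inv2(a):
    """Inverse of a 2 x 2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul(a, b):
    """Product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_filters(G, gamma, reg, omega, T1, T2):
    """Dependent filters DF = G~^H e^{-j w T1} (L x M) and independent filters
    IF = (G G~^H + A)^{-1} e^{-j w T2} (M x M), for M = 2 and A = reg * I."""
    M, L = len(G), len(G[0])
    Gt = [[gamma[m][l] * G[m][l] for l in range(L)] for m in range(M)]  # G~ = Gamma (.) G
    psi = [[sum(G[n][l] * Gt[m][l].conjugate() for l in range(L)) + (reg if n == m else 0.0)
            for m in range(M)] for n in range(M)]  # psi[n][m] = (G G~^H + A)_{nm}
    inv = inv2(psi)
    DF = [[Gt[m][l].conjugate() * cmath.exp(-1j * omega * T1) for m in range(M)]
          for l in range(L)]
    IF = [[inv[m][n] * cmath.exp(-1j * omega * T2) for n in range(M)] for m in range(M)]
    return DF, IF
```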
To gain a better understanding of the approach of the present disclosure, consider the diagram in the accompanying drawings. When only the signal intended for control point 1 is non-zero, the loudspeakers of subset $\mathcal{L}_1$ will create a sound beam to deliver the target signal to control point 1, whereas the loudspeakers of subset $\mathcal{L}_2$ will create a sound beam that cancels any “leakage” of the beam created by $\mathcal{L}_1$ to control point 2. Similarly, the speakers of $\mathcal{L}_1$ will cancel the “leakage” of the beam created by $\mathcal{L}_2$ to control point 1.
It is important to clarify that the approach of the present disclosure not only covers the DSP implementation as described in
To demonstrate the effect of this approach, let us again consider the control geometry with a loudspeaker array of L loudspeakers and M=2 pressure control points. The loudspeakers can be divided in various ways. As shown in the accompanying drawings, loudspeakers 1, . . . , N belong to group $\mathcal{L}_1$, whereas loudspeakers N (or N+1), . . . , L belong to group $\mathcal{L}_2$.
The performance of a system with filters created according to the approach of the present disclosure is shown in
which is a dimensionless quantity measured in dB. The results of
An example with more than M=2 control points is now considered, with the geometry shown in the accompanying drawings.
To check the validity of the approach of the present disclosure, performance results for the geometry of
The results of
In conclusion, the pre-ringing of the filters can be eliminated and the modelling delay significantly reduced if the filters are designed on the basis of equation 7 and with an appropriate definition of the loudspeaker groups $\mathcal{L}_m$.
One option (Option 1) is to assign each loudspeaker to a given subset $\mathcal{L}_m$, associated with the m-th control point, if that loudspeaker is “closer” to (or as close as) the control point m than to any other control point.
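The concept of “close” is defined by a distance factor $r_{m\ell}$. The latter can be defined either as the geometrical distance between the $\ell$-th loudspeaker and the m-th control point, i.e. $r_{m\ell}=\|\mathbf{x}_m-\mathbf{y}_\ell\|$, where $\mathbf{y}_\ell$ denotes the position of the $\ell$-th loudspeaker, or as the length of the acoustic path between said loudspeaker and control point. The two definitions are identical in the case of sound propagating in the free field (i.e. no acoustic diffraction). Thus, this first criterion to define whether a given loudspeaker with index $\ell$ belongs to a given set $\mathcal{L}_m$ is mathematically defined as:
$$\ell \in \mathcal{L}_m \iff r_{m\ell} \le r_{n\ell} \quad \forall\, n \ne m. \qquad (12)$$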
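For an easier understanding, see the example in the accompanying drawings, where loudspeakers with $1 \le \ell \le 3$ belong to $\mathcal{L}_1$, those with $3 \le \ell \le 5$ belong to $\mathcal{L}_2$, and those with $5 \le \ell \le L$ belong to $\mathcal{L}_3$.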
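A direct sketch of criterion (12) using geometric (free-field) distances follows. The small tolerance `tol`, which lets a loudspeaker equidistant from two control points join both subsets (like the shared loudspeakers in the example above), and the function name are assumptions for illustration.

```python
import math


def assign_option1(control_points, speakers, tol=1e-12):
    """Criterion (12): loudspeaker l joins subset m if r_ml <= r_nl for every other n.

    Distances are geometric, i.e. the free-field case; ties (within tol) place a
    loudspeaker in every tied subset. Indices are 0-based.
    """
    groups = [set() for _ in control_points]
    for l, y in enumerate(speakers):
        r = [math.dist(x, y) for x in control_points]
        r_min = min(r)
        for m, r_m in enumerate(r):
            if r_m <= r_min + tol:
                groups[m].add(l)
    return groups
```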
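The rationale for that choice is that, under the assumption that the loudspeakers are ideal monopole sources radiating in free field, the elements of matrix G are of the form
$$G_{m\ell} = \frac{e^{-j\omega r_{m\ell}/c_0}}{4\pi r_{m\ell}}, \qquad (13)$$
where $c_0$ is the speed of sound. The elements of $\boldsymbol{\Psi}=\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}$ (assuming again that A is diagonal and real-valued) are of the form
$$\Psi_{nm} = \sum_{\ell=1}^{L} \Gamma_{m\ell}\, G_{n\ell}\, G_{m\ell}^{*} + A_{nm} = \sum_{\ell=1}^{L} \Gamma_{m\ell}\, \frac{e^{-j\omega (r_{n\ell}-r_{m\ell})/c_0}}{16\pi^2 r_{n\ell}\, r_{m\ell}} + A_{nm},$$
where the elements of matrix Γ are as defined in equation (5). In the light of this expression, it is clear that all terms of the sum are either delays (if $\ell \in \mathcal{L}_m$, in which case criterion (12) guarantees $r_{n\ell}-r_{m\ell} \ge 0$) or equal to zero (if $\ell \notin \mathcal{L}_m$). This in turn implies that all terms of matrix Ψ correspond to causal filters, which is not the case with the conventional filter design (eq. 2). Its determinant can also be represented as a causal filter, as it is given by a linear combination of products (in the frequency domain) of causal filters.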
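The causality argument can be checked by listing the delays of the non-zero terms of each Ψ_{nm}, i.e. (r_{nℓ} − r_{mℓ})/c_0 for ℓ in subset m. In the sketch below (free-field geometry, assumed speed of sound, hypothetical names), an Option 1 grouping yields only non-negative delays, whereas putting every loudspeaker in every group, which reproduces the conventional product GG^H, yields some negative delays, i.e. time advances.

```python
import math

C0 = 343.0  # assumed speed of sound in m/s


def psi_lags(control_points, speakers, groups):
    """Delays (s) of the non-zero terms of Psi_nm = sum_l Gamma_ml G_nl G*_ml.

    Under the free-field monopole model each term is a delay of (r_nl - r_ml) / c0;
    a negative value is a time advance, i.e. a non-causal term.
    """
    lags = {}
    for n, xn in enumerate(control_points):
        for m, xm in enumerate(control_points):
            lags[(n, m)] = [(math.dist(xn, y) - math.dist(xm, y)) / C0
                            for l, y in enumerate(speakers) if l in groups[m]]
    return lags
```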
The causality of the determinant is not sufficient to ensure the causality of its inverse: the determinant must also be a minimum-phase filter. Whereas this is difficult to prove mathematically, practice shows that, when designing the filters with the method proposed here, the determinant is a minimum-phase filter (i.e. all its zeros are within the unit circle) for a large variety of cases of practical relevance.
The same criterion to assign loudspeakers to a given group could be extended to the case when a given loudspeaker group is assigned to more than one control point (a group of control points). In this case, a reference control point is defined for each group of control points. This reference control point could coincide with one of the control points in that group, or could be an additional control point created for the sole purpose of assigning loudspeakers to groups (e.g., a centroid of the control points in the group). With this in mind, a loudspeaker with index $\ell$ is assigned to a group $\mathcal{L}_\nu$ based on the following equation:
$$\ell \in \mathcal{L}_\nu \iff \bar{r}_{\nu\ell} \le \bar{r}_{\mu\ell} \quad \forall\, \mu \ne \nu,$$
where $\bar{r}_{\nu\ell}$ (and $\bar{r}_{\mu\ell}$) is the distance from the $\ell$-th loudspeaker to the reference control point of the ν-th group (or μ-th group) of control points. In this case, ν could be group 1 and μ could be group 2.
This operation allows loudspeaker groups to be associated with more than one control point and, in many practical cases, it still ensures that all loudspeakers in a given loudspeaker group are closer to all control points associated with that group than to control points associated with other groups, while reducing the computational cost required for assigning loudspeakers to groups. In this case, the causality of the filters may not always be ensured, but the latency of the system may still be reduced significantly if the positions of the reference control points are chosen wisely.
One practical example where this option of assigning more than one control point to one group may be useful is the case when the system is supposed to deliver independent signals to multiple listeners, each listener is associated with two or more control points (for example, the positions of their ears), and those control points are in turn associated with one loudspeaker group. The reference control point associated with each group can be, for example, the centre of the head of the given listener.
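A sketch of this reference-point variant, using the centroid of each listener's ears as the reference control point (one of the options mentioned above), is given below. The data layout (one list of ear positions per listener) and the function names are hypothetical.

```python
import math


def centroid(points):
    """Centroid of a list of coordinate tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign_by_reference(control_point_groups, speakers):
    """Assign loudspeakers using one reference point per group of control points.

    The reference point is taken here as the centroid of the group; loudspeaker l
    joins group v if its distance to that reference is not larger than to any other.
    Ties place a loudspeaker in every tied group. Indices are 0-based.
    """
    refs = [centroid(group) for group in control_point_groups]
    groups = [set() for _ in refs]
    for l, y in enumerate(speakers):
        dist = [math.dist(ref, y) for ref in refs]
        closest = min(dist)
        for v, d in enumerate(dist):
            if d <= closest:
                groups[v].add(l)
    return groups
```

Only one distance per loudspeaker and listener is computed, rather than one per control point, which reflects the reduced assignment cost mentioned above.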
In the case of two control points, a different option can be chosen for the definition of the loudspeaker sets.
First, we define the path difference
$$\Delta_\ell = r_{2\ell} - r_{1\ell}. \qquad (15)$$
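We then split the loudspeakers into the two sets such that
$$\Delta_\ell \ge \Delta_{\ell'} \quad \forall\, \ell \in \mathcal{L}_1,\ \ell' \in \mathcal{L}_2. \qquad (16)$$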
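Namely, the path difference of any loudspeaker in subset $\mathcal{L}_1$ should be greater than or equal to the path difference of any loudspeaker in subset $\mathcal{L}_2$. Note that criterion (12) (Option 1) being satisfied implies that (16) is satisfied, because under (12) $\Delta_\ell \ge 0$ for every $\ell \in \mathcal{L}_1$ and $\Delta_{\ell'} \le 0$ for every $\ell' \in \mathcal{L}_2$; the opposite, however, is not true. This means that criterion (12) is a stricter condition than criterion (16).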
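The second criterion can be sketched by sorting the loudspeakers by their path difference Δ_ℓ and choosing a split point; the helper names and the `n_first` parameter (how many loudspeakers go into subset 1) are hypothetical. With two ears at ±9 cm and four loudspeakers, placing the three loudspeakers with the largest Δ_ℓ in subset 1 satisfies (16) even though the third of them is closer to control point 2, which illustrates that (16) is looser than (12).

```python
import math


def path_differences(x1, x2, speakers):
    """Delta_l = r_2l - r_1l (equation 15), using geometric distances."""
    return [math.dist(x2, y) - math.dist(x1, y) for y in speakers]


def split_by_path_difference(x1, x2, speakers, n_first):
    """Put the n_first loudspeakers with the largest Delta_l in subset 1 and the rest
    in subset 2; sorting guarantees condition (16) by construction."""
    delta = path_differences(x1, x2, speakers)
    order = sorted(range(len(speakers)), key=lambda l: delta[l], reverse=True)
    return set(order[:n_first]), set(order[n_first:])


def satisfies_condition_16(x1, x2, speakers, set1, set2):
    """Check Delta_l >= Delta_l' for every l in subset 1 and every l' in subset 2."""
    delta = path_differences(x1, x2, speakers)
    return all(delta[a] >= delta[b] for a in set1 for b in set2)
```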
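To understand the rationale of this criterion, we observe that, under the same assumption as in the previous section (i.e. equation 13), the determinant of $(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A})$ is of the form
$$\det\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right) = D - \sum_{\ell\in\mathcal{L}_1}\sum_{\ell'\in\mathcal{L}_2} K_{\ell\ell'}\, e^{-j\omega(\Delta_\ell-\Delta_{\ell'})/c_0}, \qquad (17)$$
where D and $K_{\ell\ell'}$ are real, frequency-independent numbers (their exact definitions, eq. 18 and 19, are not particularly important for the sake of the approach of the present disclosure). If the loudspeaker subsets (i.e. matrix Γ, as defined in equation (5)) have been defined to satisfy condition (16), the arguments of the exponentials in equation (17) will always have zero real part and negative or zero imaginary part (for ω ≥ 0). As a consequence, the inverse of the determinant has an input-output time-domain relation of the form
$$y(t) = \frac{1}{D}\left[x(t) + \sum_{\ell\in\mathcal{L}_1}\sum_{\ell'\in\mathcal{L}_2} K_{\ell\ell'}\, y\!\left(t-\frac{\Delta_\ell-\Delta_{\ell'}}{c_0}\right)\right],$$
where x(t) and y(t) are the input and output of the filter, which is clearly a causal relation if condition (16) is satisfied.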
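The causal recursion above can be illustrated in discrete time by rounding the delays (Δ_ℓ − Δ_ℓ')/c_0 to whole samples. In the sketch below, which is an integer-delay approximation with hypothetical names and made-up coefficients, `inverse_det_filter` runs the recursion and `apply_det` re-applies the FIR determinant to confirm that the input is recovered. A zero-delay term, which arises when Δ_ℓ = Δ_ℓ' (e.g. for a loudspeaker shared by both subsets), only rescales the current output. With the sum of the K coefficients smaller than D, the impulse response decays, in line with the stability argument that follows.

```python
def inverse_det_filter(D, taps, x):
    """Causal recursion y = x / det, with det(z) = D - sum_tau K_tau z^{-tau}.

    D: real constant term; taps: {tau: K_tau} with integer delays tau >= 0 (samples).
    """
    k0 = taps.get(0, 0.0)  # a zero-delay term only rescales the current output
    y = []
    for t, xt in enumerate(x):
        feedback = sum(k * y[t - tau] for tau, k in taps.items() if 0 < tau <= t)
        y.append((xt + feedback) / (D - k0))
    return y


def apply_det(D, taps, y):
    """FIR filtering by det(z), used to check that it undoes inverse_det_filter."""
    return [D * y[t] - sum(k * y[t - tau] for tau, k in taps.items() if tau <= t)
            for t in range(len(y))]
```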
The stability of $[\det(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A})]^{-1}$ is ensured by the Cauchy-Schwarz inequality, by which
where $\tilde{\mathbf{g}}_1$ and $\tilde{\mathbf{g}}_2$ (and $\mathbf{g}_1$, $\mathbf{g}_2$) are the first and second rows of matrix $\tilde{\mathbf{G}}$ (and G). The strict inequality holds if $A_{1,1}, A_{2,2}>0$ or if the pairs ($\tilde{\mathbf{g}}_1$, $\mathbf{g}_2$) and ($\tilde{\mathbf{g}}_2$, $\mathbf{g}_1$) are linearly independent. The latter condition will in general be true, since some of the entries of $\tilde{\mathbf{g}}_1$ are zero whereas the corresponding elements of $\mathbf{g}_2$ are not (and equivalently for $\tilde{\mathbf{g}}_2$ and $\mathbf{g}_1$).
In summary, this second condition will ensure that the inverse determinant $[\det(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A})]^{-1}$ corresponds to a causal and stable filter, which therefore no longer needs to be approximated by an FIR filter with a long modelling delay.
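Consider a given set of M control points, with the loudspeakers divided into M groups. By analogy with Eq. 3, applied to equation 7, it is possible to define the adjoint matrix B, of size M×M, as
$$\mathbf{B} = \operatorname{adj}\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right),$$
with elements $B_{nm}$. For a given set of M input binaural signals $\mathbf{d}=[d_1, d_2, \dots, d_M]^T$, the signal driving a loudspeaker $\ell$ that belongs to the subset $\mathcal{L}_m$ (and to no other subset) is given by
$$q_\ell = \frac{G_{m\ell}^{*}\sum_{n=1}^{M} B_{mn}\, d_n}{\det\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right)}\, e^{-j\omega T}.$$
In the case of ideal monopoles propagating in free field, i.e. eq. 13, this becomes
$$q_\ell = \frac{e^{j\omega r_{m\ell}/c_0}}{4\pi r_{m\ell}}\, \frac{\sum_{n=1}^{M} B_{mn}\, d_n}{\det\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right)}\, e^{-j\omega T}.$$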
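If the loudspeaker belongs to two subsets $\mathcal{L}_m$ and $\mathcal{L}_{m+1}$, the loudspeaker signal becomes
$$q_\ell = \frac{G_{m\ell}^{*}\sum_{n=1}^{M} B_{mn}\, d_n + G_{m+1,\ell}^{*}\sum_{n=1}^{M} B_{m+1,n}\, d_n}{\det\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right)}\, e^{-j\omega T},$$
and, in the case of ideal monopoles in free field,
$$q_\ell = \frac{e^{-j\omega T}}{\det\left(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A}\right)}\left[\frac{e^{j\omega r_{m\ell}/c_0}}{4\pi r_{m\ell}}\sum_{n=1}^{M} B_{mn}\, d_n + \frac{e^{j\omega r_{m+1,\ell}/c_0}}{4\pi r_{m+1,\ell}}\sum_{n=1}^{M} B_{m+1,n}\, d_n\right].$$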
As a consequence of equations 24, under free-field assumptions all signals feeding the speakers that belong to the same subset m (with the possible exception of single speakers that belong to two groups) are identical apart from a gain and a delay that are loudspeaker dependent. In practice, this effect can also be observed in filters created using other plant transfer functions different from free-field.
In the case of a system using the Technology 1 DSP architecture, the loudspeaker signals for a speaker belonging only to speaker set $\mathcal{L}_m$ are
In the case that one loudspeaker belongs to both speaker sets $\mathcal{L}_m$ and $\mathcal{L}_{m+1}$, the loudspeaker signals are
The proofs above were given for the case where the plant matrix G is defined under the assumption that the loudspeakers are ideal monopoles (with a “flat” frequency response) radiating in free field, thus neglecting any effect of acoustic diffraction (ref. eq. 13). This limitation may be relevant especially in the case of cross-talk cancellation, where the control points correspond to the ears of one or more listeners, and the scattering effect of the human head may not be negligible. It can be observed that the elements on the diagonal of $\mathbf{G}\tilde{\mathbf{G}}^H$ represent the sum of the auto-spectra of the transfer functions from all the loudspeakers of a given subset $\mathcal{L}_m$ to the corresponding control point $\mathbf{x}_m$. Those auto-spectra are, by definition, real-valued, i.e. zero-phase. If the transfer functions do not have a “flat” frequency response, then the inverse Fourier transforms of their auto-spectra, i.e. their auto-correlation functions, will be symmetric non-causal signals. This in turn implies that, in general, it cannot be guaranteed that the determinant of $(\mathbf{G}\tilde{\mathbf{G}}^H+\mathbf{A})$ can be represented as a causal filter, as it can in the free-field case shown above.
An example is shown in
Variations—Filter Design with Weighted Norm
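If we neglect the regularisation matrix A, the conventional filter design approach based on eq. 2 can be interpreted as the solution of the constrained optimisation problem
$$\min_{\mathbf{q}}\ \mathbf{q}^H\mathbf{q} \quad \text{subject to} \quad \mathbf{G}\mathbf{q} = \mathbf{d},$$
which is a classical minimum $\ell_2$-norm solution. Noting that the latter is only one of the infinitely many possible solutions of an underdetermined problem, the approach can be made more general by defining a weighted norm
$$\min_{\mathbf{q}}\ \mathbf{q}^H\mathbf{W}\mathbf{q} \quad \text{subject to} \quad \mathbf{G}\mathbf{q} = \mathbf{d},$$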
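where W is a real-valued diagonal matrix with positive entries, which, in the case under consideration, applies a different penalty (weight) to different loudspeakers when computing the solution. In this case equation 2 becomes
$$\mathbf{H} = \mathbf{W}^{-1}\mathbf{G}^H\left(\mathbf{G}\mathbf{W}^{-1}\mathbf{G}^H\right)^{-1} e^{-j\omega T}.$$
This weighted-norm approach can be extended straightforwardly to the approach of the present disclosure. In this case, after having reintroduced the regularisation matrix A, an alternative to equation 7 to be used to design the filters is
$$\mathbf{H} = \mathbf{W}^{-1}\tilde{\mathbf{G}}^H\left(\mathbf{G}\mathbf{W}^{-1}\tilde{\mathbf{G}}^H+\mathbf{A}\right)^{-1} e^{-j\omega T}.$$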
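As a check on the weighted-norm idea, the sketch below computes W^{-1}G^H(GW^{-1}G^H + A)^{-1} for M=2 (with T = 0) on an arbitrary complex test matrix; the names and numbers are illustrative assumptions. With A = 0, the resulting loudspeaker signals q = Hd satisfy Gq = d exactly, and adding any vector from the null space of G can only increase the weighted norm q^H W q. Replacing G^H by G̃^H (and GW^{-1}G^H by GW^{-1}G̃^H) gives the corresponding weighted variant of equation 7.

```python
def inv2(a):
    """Inverse of a 2 x 2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def weighted_filters(G, w, reg):
    """H = W^{-1} G^H (G W^{-1} G^H + A)^{-1} for M = 2, W = diag(w) with w > 0,
    A = reg * I and no modelling delay (T = 0). Returns an L x M nested list."""
    M, L = len(G), len(G[0])
    gram = [[sum(G[m][l] * G[n][l].conjugate() / w[l] for l in range(L))
             + (reg if m == n else 0.0) for n in range(M)] for m in range(M)]
    inv = inv2(gram)
    # A large weight w[l] penalises loudspeaker l, shrinking its contribution.
    return [[sum(G[m][l].conjugate() * inv[m][n] for m in range(M)) / w[l]
             for n in range(M)] for l in range(L)]
```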
The approach presented herein can also be applied to a ‘hybrid’ signal processing architecture (‘Technology 2’). In this case, two models, C and G, of the plant matrix S are used. C is a simple model of the form
where the gain and delay terms are real-valued, frequency-independent scalars. From a signal processing perspective, each element of C is therefore the product of a gain and a delay.
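The delay-gain structure of C can be sketched as follows. For illustration only, the sketch assumes that the gain is the free-field monopole spreading factor 1/(4πr) and the delay is r/c0, with c0 = 343 m/s; the exact expressions of the model are not reproduced here. The function name is hypothetical.

```python
import numpy as np

def delay_gain_model(control_points, speakers, freqs, c0=343.0):
    """Return (C, gain, tau) with C[f, m, l] = gain[m, l] * exp(-j*2*pi*freqs[f]*tau[m, l]).

    gain and tau are real-valued and frequency independent
    (assumed free-field monopole values: 1/(4*pi*r) and r/c0).
    """
    x = np.asarray(control_points, dtype=float)
    y = np.asarray(speakers, dtype=float)
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # (M, L) distances in metres
    gain = 1.0 / (4.0 * np.pi * r)
    tau = r / c0                                                  # delays in seconds
    omega = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None, None]
    C = gain[None, :, :] * np.exp(-1j * omega * tau[None, :, :])
    return C, gain, tau

# One control point 1 m in front of the first of two loudspeakers.
C, gain, tau = delay_gain_model([(0.0, 1.0, 0.0)],
                                [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
                                [0.0, 100.0, 1000.0])
```

Because only a gain and a delay are stored per element, such a model can be updated cheaply when the control points move.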
Matrix G is generally a more complex model of S, which may account for the loudspeaker response, acoustic diffraction, and other factors.
After having defined
the filters can be computed on the basis of the following equation:
In practice, causality and stability of the filters have been found to be ensured provided the delay terms are chosen appropriately.
It is also possible to split the filters into dependent and independent filters, as in equations 8 and 9. In this case
The following considerations on the minimum required modelling delays assume that G is free-field (eq. 13). They can, however, be extended to more general cases, albeit approximately.
The elements of {tilde over (C)} have delay terms of the form hence the delay to ensure causality of the dependent filters should satisfy the relation
Note that this modelling delay does not have a significant impact on latency, since the minimum latency of a dependent filter (DF) is zero and the maximum latency is τmax−τmin. In practice, it may be convenient to choose T1=.
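The causality argument for the dependent filters can be checked numerically with a short sketch. For illustration, it assumes that each dependent filter is the conjugate of a pure-delay element of {tilde over (C)} followed by the modelling delay, i.e., e^(+jωτ)·e^(−jωT1) = e^(−jω(T1−τ)). Such a filter is causal only if T1 ≥ τ. Choosing T1 as the largest delay among the assigned loudspeaker and control point pairs then gives per-filter latencies T1 − τ ranging from zero to τmax − τmin, consistent with the statement above. The function name and the example delays are hypothetical.

```python
import numpy as np

def dependent_filter_latencies(tau, mask):
    """Choose T1 as the largest assigned delay and return (T1, residual delays T1 - tau).

    tau:  (M, L) propagation delays tau(x_m, y_l) in seconds.
    mask: (M, L) booleans, True where loudspeaker l belongs to the group of control point m.
    Residual delays are NaN where the element of C~ is zero (no filter is needed).
    """
    tau = np.asarray(tau, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    T1 = tau[mask].max()                        # smallest T1 with every T1 - tau >= 0
    residual = np.where(mask, T1 - tau, np.nan)
    return T1, residual

tau = np.array([[1.0e-3, 2.0e-3, 3.5e-3],
                [3.0e-3, 2.5e-3, 4.0e-3]])
mask = np.array([[True, True, False],
                 [False, True, True]])
T1, residual = dependent_filter_latencies(tau, mask)
```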
IF is a 2×2 matrix whose elements are
The minimum modelling delay should ensure that
and therefore
Given that
the equation above is rewritten as
If ΔN≥0 and ΔN′≤0 then no modelling delay T2 is required, i.e., T2=0.
The total modelling delay T should therefore satisfy the relation
When =
Since ∥r2,N−r1,N∥≤∥x1−x2∥, a possible, albeit sub-optimal, choice for the total modelling delay is
If the control points x1 and x2 are the two ears of a listener, the system described here is a cross-talk cancellation system. In this case, matrix G is a model of the Head-Related Transfer Functions associated with the loudspeaker array under consideration (it may also be a free-field model, in which case G=C). The factor Δl represents the Interaural Time Difference (ITD) associated with the l-th loudspeaker. Ordering the loudspeakers as in equation (16) corresponds to ordering them based on their ITD. Hence, if x1 is the left ear, y1 will be the location of the leftmost loudspeaker and yL the location of the rightmost one.
Regarding the modelling delay, if the array is split in two and the listener is pointing their nose towards the centre of the array, no modelling delay is required for the Independent Filters, i.e., T2=0. In this case, the filters of the matrix IF look as shown in for any possible system configuration. This corresponds to the maximum Interaural Time Difference. If a free-field model is used for the Head-Related Transfer Function (shadowless head model), namely if G=C, this delay is the physical distance between the two control points divided by the speed of sound. As discussed above, a possible but sub-optimal choice for the total modelling delay is given by equation 48. More generally, removing the free-field assumption,
where maxITD is the maximum possible Interaural Time Difference.
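For the shadowless (free-field) head model, the text above states that the relevant delay is the distance between the two control points divided by the speed of sound. The sketch below evaluates that quantity and rounds it up to a whole number of samples, as a modelling delay in a sampled system would typically require. The 0.18 m ear spacing, c0 = 343 m/s and fs = 48 kHz are illustrative assumptions, not values from the disclosure. The function names are hypothetical.

```python
import math

def free_field_max_itd(x1, x2, c0=343.0):
    """Maximum ITD for a shadowless head model: |x1 - x2| / c0, in seconds."""
    return math.dist(x1, x2) / c0

def delay_in_samples(T, fs):
    """Smallest integer number of samples not shorter than the delay T."""
    return math.ceil(T * fs)

left_ear, right_ear = (-0.09, 2.0, 0.0), (0.09, 2.0, 0.0)
T = free_field_max_itd(left_ear, right_ear)   # about 0.52 ms for a 0.18 m spacing
n = delay_in_samples(T, 48000)
```

With a measured or simulated HRTF model, maxITD would instead be read from that model rather than from this free-field estimate.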
A listener whose head is not pointing towards the centre of the array, together with the required modelling delay, is shown in the top of the corresponding figure, for the loudspeaker subsets [1,2,3,4] and [5,6,7,8]. A close-up of the impulse responses of the IF is shown in the bottom of
Systems using Technology 1 and Technology 2 filters can already achieve very low latencies of 5-10 ms. However, the soundcard input-output latency increases this to a total of 10-20 ms, which may be too much for certain applications. Furthermore, longer filters require a longer modelling delay and therefore incur a longer inherent processing latency, which may not be feasible for some applications. A comparison of the measured latency improvement introduced by the approach of the present disclosure is shown in
The Technology 1 signal processing scheme is distinctive in that it allows a high degree of listener adaptability at low processing cost using scaled delays. The same applies to the Technology 3 approach.
As mentioned above, another way to minimise the system latency is to design the filters using a time-domain approach. However, this approach is computationally very expensive and also introduces phase distortion.
One alternative to the approach of the present disclosure is to use two conventional beamformers based on delays and gains only, each steered to one control point. This corresponds to filters equal to CHe−jωT
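The conventional delay-and-sum alternative can be sketched for a single control point. The sketch assumes free-field delay-gain elements c_l = g_l·e^(−jωτ_l), with g_l = 1/(4πr_l) and τ_l = r_l/c0. Under that assumption, the corresponding column of CHe−jωT feeds loudspeaker l with the input scaled by g_l and delayed by T − τ_l. All contributions therefore arrive at the control point aligned after a common delay T. The function name and geometry are hypothetical.

```python
import numpy as np

def delay_and_sum_steering(control_point, speakers, c0=343.0):
    """Per-loudspeaker gains and delays of C^H e^{-jwT} for one control point.

    Assumes c_l = g_l * exp(-j*w*tau_l) with g_l = 1/(4*pi*r_l) and tau_l = r_l/c0.
    """
    y = np.asarray(speakers, dtype=float)
    r = np.linalg.norm(y - np.asarray(control_point, dtype=float), axis=1)
    tau = r / c0
    gain = 1.0 / (4.0 * np.pi * r)   # real gain, so its complex conjugate is unchanged
    T = tau.max()                    # smallest common delay keeping every feed causal
    feed_delay = T - tau             # conj(c_l) * exp(-j*w*T) = g_l * exp(-j*w*(T - tau_l))
    return gain, feed_delay, tau, T

speakers = [(x, 0.0, 0.0) for x in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)]
gain, feed_delay, tau, T = delay_and_sum_steering((-0.09, 2.0, 0.0), speakers)
```

No inverse of the plant is involved, so such beamformers steer energy towards each control point without explicitly cancelling cross-talk at the other control point.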
In the presented signal processing scheme, the centre loudspeaker signal is the same for both input channels for a symmetrically positioned listener. All signals feeding the loudspeakers that belong to either of the two loudspeaker subsets are identical apart from a gain and a delay; see the magnitude of the control filters shown in
Because this signal processing is substantially different from the conventional filter design method, it would be possible to characterise a system in laboratory conditions and detect the use of the algorithm.
An effect of the present disclosure is to provide a filtering approach with improved stability.
It will be appreciated that the above approaches can be implemented in many ways. There follows a general description of features which may be common to many implementations of the above approaches. It will of course be understood that, unless indicated otherwise, any of the features of the above approaches may be combined with any of the common features listed below.
There is provided a method of generating audio signals for an array of loudspeakers (e.g., a line array of L loudspeakers).
The method may comprise receiving a plurality of input audio signals [e.g., d]. A respective one of the plurality of input audio signals may be to be reproduced, by the array, at each of a plurality of control points (or ‘listening positions’) [e.g., x1, . . . , xM ∈ ℝ³] in an acoustic environment (or ‘acoustic space’).
Each of the plurality of input audio signals may be different.
At least one of the plurality of input audio signals may be different from at least one other one of the plurality of input audio signals.
Each of the plurality of control points may be associated with a respective one of a plurality of loudspeaker groups.
The method may further comprise receiving an estimate of a position of each of the plurality of control points.
The method may further comprise assigning, using the received estimate of the position of each of the plurality of control points, each of the loudspeakers in the array to at least one of the plurality of loudspeaker groups.
The assigning of a particular loudspeaker to a particular loudspeaker group may be based on a relative position of the particular loudspeaker with respect to one or more of the at least one control points associated with the particular loudspeaker group.
The assigning of the particular loudspeaker to the particular loudspeaker group may be based on a length of a path between the particular loudspeaker and one of the at least one control points associated with the particular loudspeaker group, or a path between the particular loudspeaker and a point between the at least one control points associated with the particular loudspeaker group.
The length of the path may be the length of an acoustic path.
The assigning of the particular loudspeaker may comprise:
The assigning of the particular loudspeaker may comprise:
The reference control point of a particular loudspeaker group may be a centroid of the control points associated with the particular loudspeaker group.
The plurality of control points may comprise a first control point associated with a first one of the plurality of loudspeaker groups and a second control point associated with a second one of the plurality of loudspeaker groups, and the assigning may comprise:
Each two of the loudspeaker groups may have at most one loudspeaker in common.
The assigning may comprise assigning each of the loudspeakers in the array to at most two of the plurality of loudspeaker groups.
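One possible path-length rule consistent with the assignment features above can be sketched as follows. It is an illustrative assumption, not the specific procedure of the disclosure. Each loudspeaker joins the group of the nearest control point, with straight-line distance standing in for the acoustic path. A loudspeaker whose distances to the two nearest control points differ by no more than a tolerance joins both groups, so each loudspeaker belongs to at most two groups. The function name and tolerance are hypothetical.

```python
import numpy as np

def assign_loudspeakers(control_points, speakers, tol=1e-3):
    """Return one list of loudspeaker indices per control point (loudspeaker group)."""
    x = np.asarray(control_points, dtype=float)
    y = np.asarray(speakers, dtype=float)
    r = np.linalg.norm(y[:, None, :] - x[None, :, :], axis=-1)  # (L, M) path lengths
    groups = [[] for _ in range(len(x))]
    for l, dists in enumerate(r):
        order = np.argsort(dists)
        groups[order[0]].append(l)                   # nearest control point
        if len(x) > 1 and dists[order[1]] - dists[order[0]] <= tol:
            groups[order[1]].append(l)               # near-equidistant: shared loudspeaker
    return groups

ears = [(-0.09, 2.0, 0.0), (0.09, 2.0, 0.0)]        # left ear listed first
speakers = [(x, 0.0, 0.0) for x in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)]
groups = assign_loudspeakers(ears, speakers)
```

With the left ear listed first, the leftmost loudspeakers fall in the first group. The centre loudspeaker is common to both groups.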
Each of the loudspeaker groups may comprise at least one of the loudspeakers in the array. Each of the loudspeaker groups may comprise at least two of the loudspeakers in the array.
At least two of the loudspeakers in each of the loudspeaker groups may have substantially the same frequency response.
The plurality of input audio signals may comprise:
The plurality of input audio signals may consist of:
The first loudspeaker and the at least one other loudspeaker may have substantially the same frequency response.
The scaling may be frequency-independent.
The method may further comprise generating (or ‘determining’) a respective output audio signal [e.g., Hd or q] for each of the loudspeakers in the array by applying a set of filters [e.g., H] to the plurality of input audio signals [e.g., d].
The set of filters may be determined such that, when the output audio signals are generated by applying the set of filters to the plurality of input audio signals and the output audio signals are fed to the array, substantially only the respective one of the plurality of input audio signals is reproduced at each of the plurality of control points.
The output audio signal for the particular loudspeaker may be based on each of the plurality of input audio signals.
The output audio signal for a particular loudspeaker may be generated according to the at least one loudspeaker group to which the particular loudspeaker is assigned.
The estimate of the position of each of the plurality of control points may be received at a first time and the assigning may be at a second time, and the method may further comprise:
The set of filters may be digital filters. The set of filters may be applied in the frequency domain.
The set of filters may be based on a first plurality of filter elements [e.g., {tilde over (C)} or {tilde over (G)}] comprising a respective filter element for each of the control points and loudspeakers.
For each particular control point and particular loudspeaker:
The reduced value may be zero.
Each one of the first plurality of filter elements [e.g., {tilde over (C)}] may be a frequency-independent delay-gain element [e.g., cm,l=e−jωτ(x
Each one of the first plurality of filter elements [e.g., {tilde over (C)}] may comprise a delay term [e.g. e−jωτ(x
Each one of the first plurality of filter elements may comprise a delay term [e.g., e−jωτ(x
The set of filters may be based on a second plurality of filter elements [e.g., G] comprising a respective filter element for each of the control points and loudspeakers, each filter element comprising an approximation of a respective transfer function between an audio signal applied to a respective one of the loudspeakers and an audio signal received at a respective one of the control points from the respective one of the loudspeakers.
The set of filters may be based on:
The subset may be a strict subset.
A filter element may be a weight of a filter. A plurality of filter elements may be any set of filter weights. A filter element may be any component of a weight of a filter. A plurality of filter elements may be a plurality of components of respective weights of a filter.
The set of filters may comprise:
Generating the respective output audio signal for each of the loudspeakers in the array may comprise:
The output audio signal for a particular loudspeaker may be generated by applying, to a subset of the intermediate audio signals, the one or more filters of the second subset of filters corresponding to the particular loudspeaker and the one or more control points associated with the one or more loudspeaker groups to which the particular loudspeaker is assigned, the subset of the intermediate audio signals comprising the one or more intermediate audio signals for the one or more control points associated with the one or more loudspeaker groups to which the particular loudspeaker is assigned.
The array may comprise L loudspeakers of which Lcommon are assigned to more than one of the plurality of loudspeaker groups, the plurality of control points may comprise M control points, and the first subset of filters [e.g., [G{tilde over (C)}H]−1 or [G{tilde over (G)}H]−1] may comprise M² filters and the second subset of filters [e.g., {tilde over (C)}H or {tilde over (G)}H] may comprise at least L+Lcommon filters and at most L×M filters.
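The two-stage structure and the filter counts above can be illustrated for one frequency bin. The first subset IF (M×M) maps the M input signals to M intermediate signals. The second subset {tilde over (C)}H maps those to the L loudspeaker feeds. Because {tilde over (C)} is zero outside each group, only one second-subset filter per (loudspeaker, group) membership is needed. The matrices below are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def filter_counts(groups):
    """(first-subset count, second-subset count) for a given loudspeaker grouping."""
    M = len(groups)
    memberships = sum(len(g) for g in groups)   # one C~^H filter per (loudspeaker, group) pair
    return M * M, memberships

def apply_two_stage(C_tilde, IF, d):
    """Per-bin processing: intermediate signals v = IF d, loudspeaker feeds q = C~^H v."""
    v = IF @ d
    q = C_tilde.conj().T @ v
    return q, v

groups = [[0, 1, 2, 3], [3, 4, 5, 6]]   # L = 7; loudspeaker 3 is shared, so Lcommon = 1
L, M = 7, len(groups)
mask = np.zeros((M, L), dtype=bool)
for m, g in enumerate(groups):
    mask[m, g] = True

rng = np.random.default_rng(1)
C_tilde = np.where(mask, rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)), 0.0)
IF = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
q, v = apply_two_stage(C_tilde, IF, d)
```

Loudspeaker 0 belongs only to the first group, so its feed depends only on the first intermediate signal.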
The set of filters or the first subset of filters [e.g., [G{tilde over (C)}H]−1 or [G{tilde over (G)}H]−1] may be determined based on an inverse of a matrix [e.g., [G{tilde over (C)}H] or [G{tilde over (G)}H]] containing the first [e.g., {tilde over (C)} or {tilde over (G)}] and second [e.g., G] pluralities of filter elements.
The matrix [e.g., [G{tilde over (C)}H] or [G{tilde over (G)}H]] containing the first [e.g., {tilde over (C)} or {tilde over (G)}] and second [e.g., G] pluralities of filter elements may be regularised prior to being inverted [e.g., by regularisation matrix A].
The matrix [e.g., [G{tilde over (C)}H] or [G{tilde over (G)}H]] containing the first [e.g., {tilde over (C)} or {tilde over (G)}] and second [e.g., G] pluralities of filter elements may be determined based on:
The set of filters may be determined based on:
The set of filters may be determined using an optimisation technique.
The first subset of filters may be determined so as to reduce a difference between a scalar matrix (e.g., an identity matrix I) and a matrix comprising a product of: a matrix [e.g., G] comprising the second plurality of filter elements, a matrix [e.g., {tilde over (C)}] comprising the first plurality of filter elements, and a matrix representing the first subset of filters [e.g., IFs].
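Determining the first subset of filters can be sketched at a single frequency. The sketch makes three assumptions. First, IF is a regularised inverse, IF = [G{tilde over (C)}H + A]−1, with A = βI scaled to the mean diagonal magnitude (this form of A is an assumption). Second, the free-field case G = C applies. Third, {tilde over (C)} is obtained by setting the elements outside each loudspeaker group to zero. The geometry and function names are hypothetical. The check confirms that G{tilde over (C)}H IF is close to the identity matrix, i.e., that each control point receives substantially only its own input signal.

```python
import numpy as np

def free_field_matrix(control_points, speakers, f, c0=343.0):
    """Monopole free-field plant model at one frequency f (Hz), shape (M, L)."""
    x = np.asarray(control_points, dtype=float)
    y = np.asarray(speakers, dtype=float)
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    return np.exp(-1j * 2.0 * np.pi * f * r / c0) / (4.0 * np.pi * r)

def independent_filters(G, C_tilde, beta=1e-6):
    """IF = [G C~^H + A]^-1, with A = beta * (|trace| / M) * I (an assumed choice of A)."""
    P = G @ C_tilde.conj().T                          # M x M
    M = P.shape[0]
    A = beta * (abs(np.trace(P)) / M) * np.eye(M)
    return np.linalg.inv(P + A)

ears = [(-0.09, 2.0, 0.0), (0.09, 2.0, 0.0)]
speakers = [(x, 0.0, 0.0) for x in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)]
mask = np.array([[1, 1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1, 1]], dtype=bool)  # loudspeaker 3 is shared

G = free_field_matrix(ears, speakers, f=1000.0)       # free-field case: G = C
C_tilde = np.where(mask, G, 0.0)                      # reduced value of zero outside each group
IF = independent_filters(G, C_tilde)
H = C_tilde.conj().T @ IF                             # L x M filter matrix at this frequency
residual = np.abs(G @ H - np.eye(2)).max()
```

Only an M×M matrix is inverted per frequency. This keeps the cost low when the filters must be recomputed as the control points move.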
The approximation for the first plurality of filter elements [e.g., {tilde over (C)}] may be a first approximation and the approximation for the second plurality of filter elements [e.g., G] may be a second approximation.
The first and second approximations may be different. The first and second pluralities of filter elements may be based on different approximations of the transfer functions. In particular, the different approximations may be based on different models of the transfer functions.
The first approximation (e.g., that used to determine {tilde over (C)}) may be based on a free-field acoustic propagation model and/or a point-source acoustic propagation model.
The second approximation (e.g., that used to determine G) may account for one or more of reflections, refraction, diffraction or scattering of sound in the acoustic environment. The second approximation may alternatively or additionally account for scattering from a head of one or more listeners. The second approximation may alternatively or additionally account for one or more of a frequency response of each of the loudspeakers or a directivity pattern of each of the loudspeakers.
The second approximation may be based on one or more head-related transfer functions, HRTFs. The one or more HRTFs may be measured HRTFs. The one or more HRTFs may be simulated HRTFs. The one or more HRTFs may be determined using a boundary element model of a head.
The second plurality of filter elements may be determined by measuring the set of transfer functions.
The plurality of control points [e.g., x1, . . . , xM ∈ ℝ³] may be locations of a corresponding plurality of listeners, e.g., when operating in a ‘personal audio’ mode.
The plurality of control points [e.g., x1, . . . , xM ∈ ℝ³] may be locations of ears of one or more listeners, e.g., when operating in a ‘binaural’ mode.
The method may further comprise determining the plurality of control points using a position sensor.
Generating the respective output audio signals [e.g., Hd] may comprise using a filter bank to apply at least a portion of the set of filters in a plurality of frequency subbands.
The first subset of filters [e.g., [G{tilde over (C)}H]−1] and the second subset of filters [e.g., {tilde over (C)}H] may be applied in each of the frequency subbands.
The first subset of filters [e.g., [G{tilde over (C)}H]−1] and the second subset of filters [e.g., {tilde over (C)}H] may be applied within the filter bank.
The first subset of filters [e.g., [G{tilde over (C)}H]−1] may be applied in fullband and the second subset of filters [e.g., {tilde over (C)}H] may be applied in each of the frequency subbands. In other words, the first subset of filters [e.g., [G{tilde over (C)}H]−1] may be applied outside the filter bank and the second subset of filters [e.g., {tilde over (C)}H] may be applied within the filter bank.
Generating a respective output audio signal for each of the loudspeakers in the array may comprise:
The first plurality of filter elements may comprise a first subset of first filter elements for a first one of the plurality of frequency subbands and a second subset of first filter elements for a second one of the plurality of frequency subbands; and/or the second plurality of filter elements may comprise a first subset of second filter elements for the first one of the plurality of frequency subbands and a second subset of second filter elements for the second one of the plurality of frequency subbands.
The first subset of first filter elements and the second subset of first filter elements may be different and/or the first subset of second filter elements and the second subset of second filter elements may be different.
The set of filters [e.g., H] may be time-varying. Alternatively, the set of filters [e.g., H] may be fixed or time-invariant, e.g., when listener positions and head orientations are considered to be relatively static.
The method may further comprise outputting the output audio signals [e.g., Hd or q] to the array of loudspeakers.
The method may further comprise receiving the set of filters [e.g., H], e.g., from another processing device, or from a filter determining module. The method may further comprise determining the set of filters [e.g., H].
At least one of the first plurality of filter elements [e.g., {tilde over (C)}] may be different from a corresponding one of the second plurality of filter elements [e.g., G].
The method may further comprise determining any of the variables listed herein using any of the equations set out herein.
The set of filters may be determined using any of the equations set out herein (e.g., equations 2, 3, 7, 8, 9, 31, 32, 35, 36, 37, etc.).
There is provided an apparatus configured to perform any of the methods described herein.
The apparatus may comprise a digital signal processor configured to perform any of the methods described herein.
The apparatus may comprise the array of loudspeakers.
The apparatus may be coupled, or may be configured to be coupled, to the loudspeaker array.
There is provided a computer program comprising instructions which, when executed by a processing system, cause the processing system to perform any of the methods described herein.
There is provided a (non-transitory) computer-readable medium or a data carrier signal comprising the computer program.
In some implementations, the various methods described above are implemented by a computer program. In some implementations, the computer program includes computer code arranged to instruct a computer to perform the functions of one or more of the various methods described above. In some implementations, the computer program and/or the code for performing such methods is provided to an apparatus, such as a computer, on one or more computer-readable media or, more generally, a computer program product. The computer-readable media are transitory or non-transitory. The one or more computer-readable media could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the one or more computer-readable media could take the form of one or more physical computer-readable media such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, or an optical disk, such as a CD-ROM, CD-R/W or DVD.
In an implementation, the modules, components and other features described herein are implemented as discrete components or integrated in the functionality of hardware components such as ASICs, FPGAs, DSPs or similar devices.
A ‘hardware component’ is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing certain operations and configured or arranged in a certain physical manner. In some implementations, a hardware component includes dedicated circuitry or logic that is permanently configured to perform certain operations. In some implementations, a hardware component is or includes a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. In some implementations, a hardware component also includes programmable logic or circuitry that is temporarily configured by software to perform certain operations.
Accordingly, the term ‘hardware component’ should be understood to encompass a tangible entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
In addition, in some implementations, the modules and components are implemented as firmware or functional circuitry within hardware devices. Further, in some implementations, the modules and components are implemented in any combination of hardware devices and software components, or only in software (e.g., code stored or otherwise embodied in a machine-readable medium or in a transmission medium).
Those skilled in the art will recognise that a wide variety of modifications, alterations, and combinations can be made with respect to the above described examples without departing from the scope of the disclosed concepts, and that such modifications, alterations, and combinations are to be viewed as being within the scope of the present disclosure.
It will be appreciated that, although various approaches above may be implicitly or explicitly described as ‘optimal’, engineering involves tradeoffs and so an approach which is optimal from one perspective may not be optimal from another. Furthermore, approaches which are slightly sub-optimal may nevertheless be useful. As a result, both optimal and sub-optimal solutions should be considered as being within the scope of the present disclosure.
Examples of the present disclosure are set out in the following numbered clauses.
1. A computer-implemented method of generating audio signals for an array of loudspeakers, the method comprising:
2. The method of clause 1, wherein the assigning of the particular loudspeaker to the particular loudspeaker group is based on a length of a path between the particular loudspeaker and one of the at least one control points associated with the particular loudspeaker group, or a path between the particular loudspeaker and a point between the at least one control points associated with the particular loudspeaker group.
3. The method of clause 2, wherein the length of the path is the length of an acoustic path.
4. The method of any of clauses 2 to 3, wherein the assigning of the particular loudspeaker comprises:
5. The method of any of clauses 2 to 3, wherein the assigning of the particular loudspeaker comprises:
6. The method of any of clauses 2 to 3, wherein the plurality of control points comprises a first control point associated with a first one of the plurality of loudspeaker groups and a second control point associated with a second one of the plurality of loudspeaker groups, and the assigning comprises:
7. The method of any preceding clause, wherein the plurality of input audio signals comprises:
8. The method of any preceding clause, wherein the plurality of control points are locations of a plurality of listeners or locations of ears of one or more listeners.
9. The method of any preceding clause, wherein the estimate of the position of each of the plurality of control points is received at a first time and the assigning is at a second time, and wherein the method further comprises:
10. The method of any preceding clause, wherein the set of filters is based on a first plurality of filter elements comprising a respective filter element for each of the control points and loudspeakers, wherein, for each particular control point and particular loudspeaker:
11. The method of clause 10, wherein the set of filters is based on a second plurality of filter elements comprising a respective filter element for each of the control points and loudspeakers, each filter element comprising an approximation of a respective transfer function between an audio signal applied to a respective one of the loudspeakers and an audio signal received at a respective one of the control points from the respective one of the loudspeakers.
12. The method of any of clauses 10 to 11, wherein the approximation for the first plurality of filter elements is based on a free-field acoustic propagation model and/or the approximation for the second plurality of filter elements accounts for one or more of reflection, refraction, diffraction or scattering of sound in the acoustic environment.
13. The method of any preceding clause, wherein generating the respective output audio signal for each of the loudspeakers in the array comprises:
14. The method of clause 13, wherein the output audio signal for a particular loudspeaker is generated by applying, to a subset of the intermediate audio signals, the one or more filters of the second subset of filters corresponding to the particular loudspeaker and the one or more control points associated with the one or more loudspeaker groups to which the particular loudspeaker is assigned, the subset of the intermediate audio signals comprising the one or more intermediate audio signals for the one or more control points associated with the one or more loudspeaker groups to which the particular loudspeaker is assigned.
15. An apparatus configured to perform the method of any preceding clause, or
Those skilled in the art will also recognise that the scope of the invention is not limited by the examples described herein, but is instead defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2109307.5 | Jun 2021 | GB | national |
Number | Date | Country |
---|---|---|
20230007424 A1 | Jan 2023 | US |