The present invention relates generally to sound reproduction systems, and may be viewed as relating to virtual sound systems.
Takeuchi and Nelson [1] first proposed the Optimal Source Distribution (OSD) for achieving binaural sound reproduction for a single listener. The approach has proven to yield excellent subjective results [2] and has since been implemented in a number of products for virtual sound imaging. A remarkable property of the OSD is that the cross-talk cancellation produced at a single centrally placed listener is replicated at all frequencies at a number of other locations in the radiated sound field, a phenomenon that has since been further investigated [3, 4]. An analysis of this has recently been presented by Yairi et al [5], who also show that once a discrete approximation to the hypothetically continuous OSD is introduced, the effectiveness of the cross-talk cancellation is achieved at many, but not all, frequencies at the non-central positions in the radiated sound field. The work presented here provides a framework for the analysis of the multiple listener virtual sound imaging problem based on two methods: a minimum norm solution and a linearly constrained least squares approach. The aim is to enable the exploitation of the fundamental property of the OSD with the objective of producing exact cross-talk cancellation for multiple listener positions at all frequencies. The background to the problem is introduced, and then the new theoretical approach is presented. Although the example given below is explained in terms of the original two-channel OSD system, it should be emphasised that the approach presented is equally applicable to the three-channel extension [6].
We have devised a sound reproduction system in which an approximation of the sound field generated by OSD is reproduced, allowing multiple listeners to simultaneously enjoy the enhanced binaural sound reproduction associated with OSD.
According to one aspect of the invention there is provided a signal processor as claimed in claim 1.
The signal processor may be configured to implement an approximation of the sound field of a two channel OSD system or an approximation of the sound field of a three channel OSD system.
The loudspeaker input signals generated by the signal processor may be representative of a discrete source strength. The loudspeaker input signals may comprise a source strength vector.
The processing which the signal processor is configured to perform may be based on or derived from any of the solutions for producing a source strength signal as set out in the detailed description.
The signal processor may comprise a filter which is arranged to perform at least some of the signal processing.
The processing performed by the signal processor may be in the digital domain.
Optimal Source Distribution, OSD, may be considered as comprising a hypothetically continuous acoustic source distribution, each element of which radiates sound at a specific frequency in order to achieve cross-talk cancellation at the ears of a listener. OSD may also be defined as a symmetric distribution of pairs of point monopole sources whose separation varies continuously as a function of frequency in order to ensure at all frequencies a path length difference of one-quarter of an acoustic wavelength between the source pairs and the ears of the listener. A discretised embodiment of OSD may be described as comprising an array of frequency-distributed loudspeakers in which multiple pairs of loudspeakers are provided, each pair producing substantially the same frequency or substantially the same band of frequencies, wherein those pairs of loudspeakers which produce higher frequencies are placed closer together and those producing lower frequencies are placed further apart. The background to and further details of OSD are contained in the Detailed Description below.
According to another aspect of the invention there is provided a sound reproduction apparatus which comprises the signal processor of the first aspect of the invention, and an array of discretised speakers.
In the case that the invention is implemented by a two-channel system, the array of loudspeakers is divided into two banks or sub-arrays of loudspeakers, each sub-array constituting a channel. In a three-channel system, as an enhancement to a two-channel system, a third loudspeaker is included which emits all of the frequencies (which are emitted by the two-channel system) from substantially one and the same position (i.e. there is substantially no frequency emission distribution in space), and which is located substantially centrally of, or intermediate, the two sub-arrays. Different speakers may be arranged to output different respective frequencies or different frequency bands. The speakers may be arranged to emit different respective frequencies in a distributed frequency manner.
The speakers may be arranged in a spatially distributed manner. The spacing between successive/neighbouring speakers may be substantially in accordance with a logarithmic scale.
A speaker may comprise an electro-acoustic transducer.
Each of the loudspeakers of the array may be considered as a discrete source.
According to a further aspect of the invention there is provided machine-readable instructions to implement the processing of the signal processor of claim 1.
The instructions may be stored on a data carrier or may be realised as a software or firmware product.
The invention may comprise one or more features, either individually or in combination, as disclosed in the description and/or drawings.
Various embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
There are now described a number of solutions for generating an approximation of the OSD sound field using an array of a discrete number of loudspeakers, advantageously obtaining multiple listener positions for binaural sound reproduction. These may be considered as being grouped into two main types, namely a least squares method and a minimum norm method. The solutions yielded by such methods are implemented by a signal processor, which may comprise a suitably configured filter set. In the description which follows some further background information regarding OSD is provided to aid understanding of the invention. As will be evident to the person skilled in the art, the solutions presented by both of the different method types can be implemented as suitable signal processing stages and/or steps which generate the respective input signals for each of the loudspeakers of the array.
The vectors of desired signals, loudspeaker input (source strength) signals and reproduced signals are respectively defined by
dT=[dR dL], vT=[vR vL], wT=[wR wL] (1a,b,c)
and the inverse filter matrix and transmission path matrix are respectively defined by
where all variables are in the frequency domain and a harmonic time dependence of ejωt is assumed. Thus v=Hd, w=Cv, and w=CHd. Assuming the sources are point monopoles radiating into a free field, with volume accelerations respectively given by vR and vL, the transmission path matrix takes the form
where the distances between the assumed point sources and the ears of the listener are as shown in
where g=l1/l2 and Δl=l2−l1. If it is assumed that the target values of the reproduced signals at the listener's ears are given by
it follows that [1] the inverse filter matrix is given by simple inversion of the elements of the matrix C yielding
where the approximation Δl≈Δr sin θ has been used, assuming that l>>Δr. Takeuchi and Nelson [1] present the singular value decomposition of the matrix C (and thus the matrix H) and demonstrate that the two singular values are equal when
kΔr sin θ=nπ/2 (7)
where n is an odd integer (1, 3, 5 . . . ). Under these circumstances, the inversion problem is intrinsically well-conditioned and reproduction is accomplished with minimal error. Note that this condition is equivalent to the difference in path lengths Δl being equal to odd integer multiples of one quarter of an acoustic wavelength λ. Since the angle 2θ is equal to the angular span of the sources (see
This gives a particularly simple form of filter matrix, and through the term −jg, involves simple inversion, 90-degree phase shift, and a small amplitude adjustment of the input signals.
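By way of a short illustrative sketch (an addition, not part of the original disclosure), the condition of equation (7) can be used to compute the source span required at each frequency; the ear spacing Δr and speed of sound c below are assumed values chosen only for illustration:

```python
# Sketch: source half-span angle theta satisfying k*dr*sin(theta) = n*pi/2
# (equation (7)) with n = 1. The values of dr and c are assumed.
import numpy as np

c = 343.0    # speed of sound, m/s (assumed)
dr = 0.18    # effective ear spacing Delta-r, m (assumed)

def osd_half_span(f_hz, n=1):
    """Half-span angle (radians) from k*dr*sin(theta) = n*pi/2, if attainable."""
    k = 2.0 * np.pi * f_hz / c
    s = n * np.pi / (2.0 * k * dr)          # required sin(theta)
    return np.arcsin(s) if s <= 1.0 else np.nan

for f in (500.0, 2000.0, 8000.0):
    print(f"{f:6.0f} Hz -> span 2*theta = {np.degrees(2 * osd_half_span(f)):6.1f} deg")
```

Consistent with the discretised OSD described earlier, the higher the frequency, the smaller the span satisfying the condition: the source pairs handling high frequencies sit closer together.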
As shown by Yairi et al [5] this idealised source distribution has some attractive radiation properties. This is best demonstrated by computing the far field pressure p(r,φ) radiated by a pair of sources with inputs determined by the above optimal inverse filter matrix. Furthermore, if one assumes that the desired signals for reproduction are given by dT=[1 0], then it follows that the source signals are given by
Therefore the pressure field is given by the sum of the two source contributions such that
which, writing h=r1/r2, can be written as
Now note that in the far field, where r1,r2>>d, the distance between the sources, then it follows from the geometry of
and therefore that
The squared modulus of the term in square brackets is given by
|1−jghe−jkd sin φ|2=1+(gh)2−2gh sin(kd sin φ) (14)
and therefore the modulus squared of the pressure field can be written as
Now note that as r increases, such that we can make the approximation r≈r1≈r2, one can also make the approximations g≈h≈1 and this expression becomes
and therefore maxima and minima are produced in the sound field when kd sin φ=nπ/2 where n is odd. The term (1−sin(kd sin φ)) becomes zero at n=1, 5, 9 . . . etc. and is equal to two when n=3, 7, 11 . . . etc. The form of the squared modulus of the sound pressure as a function of the angle φ is illustrated in
It is to be noted that this directivity pattern of the far field pressure is the same at all frequencies. This is because the value of kΔr sin θ=nπ/2 where n is an odd integer, and since from the geometry of
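A brief numerical check (an illustrative addition, not from the original text) confirms the stated pattern of nulls and maxima of the term (1−sin(kd sin φ)) at odd multiples of π/2:

```python
# Sketch: evaluate the frequency-independent directivity term (1 - sin(x))
# at x = k*d*sin(phi) = n*pi/2 for odd n.
import numpy as np

for n in (1, 3, 5, 7, 9, 11):
    val = 1.0 - np.sin(n * np.pi / 2.0)
    print(f"n = {n:2d}: 1 - sin(n*pi/2) = {val:.1f}",
          "(null)" if np.isclose(val, 0.0) else "(maximum)")
```

As stated above, the term vanishes for n = 1, 5, 9, … and equals two for n = 3, 7, 11, …, independently of frequency.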
Reference is made to the case illustrated in
The vector wB is of order P and defines the reproduced signals at a number of pairs of points in the sound field at which cross-talk cancellation is sought. Thus wB=Bv where B defines the P×M transmission path matrix relating the strength of the M sources to these reproduced signals. The vector wA is of order N and defines the reproduced signals sampled at the remaining points in the sound field. Thus N=L−P and the reproduced field at these remaining points can be written as wA=Av, where A is the N×M transmission path matrix between the sources and these points.
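As a purely illustrative sketch of how the matrices B and A might be assembled (the monopole model, geometry, frequency and listener positions below are all assumptions, not taken from the text), each entry can be populated with a free-field point source transfer function of the form e−jkr/4πr, up to a constant factor:

```python
# Sketch: populate the P x M matrix B (ear positions) and the N x M matrix A
# (remaining field points) from free-field monopole transfer functions.
# Geometry and frequency are assumed illustrative values.
import numpy as np

c, f = 343.0, 1000.0
k = 2.0 * np.pi * f / c

sources = np.stack([np.linspace(-1.0, 1.0, 16), np.zeros(16)], axis=1)   # M = 16
ears = np.array([[-0.09, 2.0], [0.09, 2.0],        # listener 1 (right, left)
                 [0.61, 2.0], [0.79, 2.0]])        # listener 2 (right, left)
field = np.stack([np.linspace(-2.0, 2.0, 40), np.full(40, 2.0)], axis=1)  # N = 40

def transfer(points, srcs, k):
    """Free-field monopole transfer functions e^{-jkr}/(4*pi*r)."""
    r = np.linalg.norm(points[:, None, :] - srcs[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

B = transfer(ears, sources, k)    # P x M with P = 4 (two pairs of ears)
A = transfer(field, sources, k)   # N x M
```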
Now note that the desired pressure at the P points in the sound field at which cross-talk cancellation is required can be written as the vector ŵB. This vector can in turn be written as ŵB=Dd, where the matrix D defines the reproduced signals required in terms of the desired signals. As a simple example, suppose that cross-talk cancellation is desired at two pairs of points in the sound field such that
The matrix D has elements of either zero or unity, is of order P×2, and may be extended by adding further pairs of rows if cross-talk cancellation is required at further pairs of points. Similar to the analysis presented above, we assume that the inputs to the sources are determined by operating on the two desired signals defined by the vector d via an M×2 matrix H of inverse filters. The task is to find the source strength vector v that generates cross-talk cancellation at multiple pairs of positions in the sound field, guided by the observation of the directivity pattern illustrated in
Assume that cross-talk cancellation is required at a specific sub-set of all the points in the sound field, where the number P of points in the sound field is smaller than the number of sources M available to reproduce the field. One can then seek to ensure that we make wB=ŵB=Dd whilst minimising the “effort” ∥v∥22 made by the acoustic sources. The problem is thus
min∥v∥22 subject to Bv=ŵB=Dd (19)
where ∥ ∥2 denotes the 2-norm. The solution [8, 9] to this minimum norm problem is given by the optimal vector of source strengths defined by
vopt=BH[BBH]−1Dd (20)
where the superscript H denotes the Hermitian transpose. Thus a possible solution to the problem can be found that requires only specification of the points at which cross-talk cancellation is required in the sound field. A sensible approach might be to use the directivity of the OSD as a guide to the choice of the angular locations of these points. Note that it is also possible to include a regularization factor α into this solution such that
vopt=BH[BBH+αI]−1Dd (21)
where I is the identity matrix.
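A minimal sketch of the regularised minimum norm solution of equation (21) follows; the matrix B here is a random stand-in for the true transmission paths, and the value of α is an assumed illustration (a matrix built as in the earlier geometric sketch could be substituted directly):

```python
# Sketch: v_opt = B^H [B B^H + alpha*I]^{-1} D d  (equation (21)).
# B is a random stand-in for the P x M transmission path matrix.
import numpy as np

rng = np.random.default_rng(0)
M, P = 16, 4
B = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))

# D routes the two desired signals to two pairs of ears (elements 0 or 1).
D = np.array([[1, 0],    # listener 1, right ear <- d_R
              [0, 1],    # listener 1, left ear  <- d_L
              [1, 0],    # listener 2, right ear <- d_R
              [0, 1]])   # listener 2, left ear  <- d_L
d = np.array([1.0, 0.0])             # desired signals, d^T = [1 0]
alpha = 1e-6                         # regularization parameter (assumed)

v_opt = B.conj().T @ np.linalg.solve(B @ B.conj().T + alpha * np.eye(P), D @ d)
print(np.allclose(B @ v_opt, D @ d, atol=1e-3))   # cross-talk cancellation met
```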
A further approach to the exploitation of the known properties of the optimal source distribution is to attempt not only to achieve cross-talk cancellation at multiple pairs of points as in the case above, but also to attempt a “best fit” of the radiated sound field to the known directivity function of the OSD. In this case, the problem is a least squares minimisation with an equality constraint. Thus, as above, a sub-set of desired signals at pairs of points in the sound field is defined by ŵB=Dd. The aim is also to minimise the sum of squared errors between the desired sound field and the reproduced field at the remaining L−P=N points of interest (see
Here it is assumed that the number N of these points is greater than the number of sources M. The desired sound field at these points can be chosen to emulate the directivity of the OSD at angular positions between those at which cross-talk cancellation is sought. One therefore seeks the solution of
min∥Av−ŵA∥22 subject to ŵB=Bv (22)
Before moving to exact solutions, it is worth noting that Golub and Van Loan [9] point out that an approximate solution to the linearly constrained least squares problem is to solve the unconstrained least squares problem given by
min∥Av−ŵA∥22+γ2∥Bv−ŵB∥22 (23)
They demonstrate that the solution to this problem, which is a standard least squares problem, converges to the constrained solution as γ→∞. They also point out, however, that numerical problems may arise as γ becomes large.
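This weighting approach can be sketched as follows (random stand-in matrices; the value of γ is an assumed illustration): the constraint rows are scaled by γ and appended to the least squares system, which is then solved as an ordinary least squares problem:

```python
# Sketch: approximate the constrained problem by weighting (Golub & Van Loan).
# All matrices and vectors are random stand-ins for A, B, w_A-hat, w_B-hat.
import numpy as np

rng = np.random.default_rng(1)
M, P, N = 16, 4, 40
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
B = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
wA = rng.standard_normal(N) + 1j * rng.standard_normal(N)
wB = rng.standard_normal(P) + 1j * rng.standard_normal(P)

gamma = 1e6   # constraint weight: larger -> closer to the constrained solution,
              # but the stacked system becomes increasingly ill-conditioned
v, *_ = np.linalg.lstsq(np.vstack([A, gamma * B]),
                        np.concatenate([wA, gamma * wB]), rcond=None)
print(np.linalg.norm(B @ v - wB))   # small constraint residual
```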
An exact solution to this problem can be deduced by following Golub and Van Loan [9], but working with complex variables and replacing the matrix transpose operation by the complex conjugate transpose. The method relies on the use of the QR-factorisation of the matrix BH where
BH=Q[R; 0] (24)
where BH is M×P, Q is an M×M square matrix having the property QHQ=QQH=I, R is an upper triangular matrix of order P×P and the zero matrix is of order (M−P)×P. Now define
AQ=[A1 A2], QHv=[y; z] (25a,b)
where the matrix A1 is N×P, the matrix A2 is N×(M−P), and the vectors y and z are of order P and M−P respectively. As shown in Appendix 1, the optimal solution to the least squares problem can be written as
zopt=[A2HA2]−1A2H(ŵA−A1RH-1ŵB), yopt=RH-1ŵB (26a,b)
and partitioning the matrix Q such that vopt=Q1yopt+Q2zopt enables the solution to be written as
vopt=Q2A2†ŵA+(Q1RH-1−Q2A2†A1RH-1)ŵB (27)
where A2†=[A2HA2]−1A2H is the pseudo inverse of the matrix A2. This enables the calculation of the optimal source strengths in the discrete approximation to the OSD in terms of the signals ŵB reproduced at the points of cross talk cancellation, and the remaining signals ŵA specified by the directivity of the OSD. Note that it is also possible to include a regularization parameter into the computation of the matrix A2† such that
A2†=[A2HA2+ηI]−1A2H (28)
where η is the regularization parameter.
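A sketch of this QR-based route, with random stand-ins for the matrices and an assumed value of η, might read:

```python
# Sketch: equation (27) via the QR factorisation B^H = Q [R; 0], with the
# regularised pseudo inverse of equation (28).
import numpy as np

rng = np.random.default_rng(2)
M, P, N = 16, 4, 40
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
B = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
wA = rng.standard_normal(N) + 1j * rng.standard_normal(N)
wB = rng.standard_normal(P) + 1j * rng.standard_normal(P)
eta = 1e-8                                            # regularization (assumed)

Q, Rfull = np.linalg.qr(B.conj().T, mode="complete")  # B^H = Q [R; 0]
R, Q1, Q2 = Rfull[:P, :P], Q[:, :P], Q[:, P:]
A1, A2 = A @ Q1, A @ Q2

y = np.linalg.solve(R.conj().T, wB)                   # y_opt = R^{-H} w_B-hat
A2dag = np.linalg.solve(A2.conj().T @ A2 + eta * np.eye(M - P), A2.conj().T)
z = A2dag @ (wA - A1 @ y)                             # z_opt, least squares part
v_opt = Q1 @ y + Q2 @ z

print(np.linalg.norm(B @ v_opt - wB))                 # ~0: exact cancellation
```

Note that the constraint is met exactly for any z, since Bvopt=RHy=ŵB regardless of the least squares fit at the remaining field points.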
Whilst the above solution provides a compact and efficient means for solving the problem at hand, it is shown in Appendix 2 that it is also possible to derive a solution that is expressed explicitly in terms of the matrices A and B. The solution can be derived by using the method of Lagrange multipliers where one seeks to minimise the cost function given by
J=(Av−ŵA)H(Av−ŵA)+(Bv−ŵB)Hμ+μH(Bv−ŵB)+βvHv (29)
where μ is a complex vector of Lagrange multipliers and the term β is used to penalise the “effort” associated with the sum of squared source strengths. As shown in Appendix 2, the minimum of this function is defined by
vopt=[I−ÃBH[BÃBH]−1B]A†ŵA+ÃBH[BÃBH]−1ŵB (30)
where the matrices à and A† are respectively defined by
Ã=[AHA+βI]−1 and A†=[AHA+βI]−1AH (31)
This solution has also been derived by Olivieri et al [10], although the solution presented by those authors differs from that given here by a factor ½ multiplying the first term in square brackets above.
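The closed form of equation (30) can be sketched directly (random stand-in matrices; the value of β is assumed); as noted below, with η=β it reproduces the QR-based result:

```python
# Sketch: the Lagrange multiplier solution of equations (30) and (31).
import numpy as np

rng = np.random.default_rng(3)
M, P, N = 16, 4, 40
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
B = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
wA = rng.standard_normal(N) + 1j * rng.standard_normal(N)
wB = rng.standard_normal(P) + 1j * rng.standard_normal(P)
beta = 1e-8                                     # effort penalty (assumed)

At = np.linalg.inv(A.conj().T @ A + beta * np.eye(M))   # A-tilde of eq. (31)
Adag = At @ A.conj().T                                  # A-dagger of eq. (31)
S = B @ At @ B.conj().T                                 # B A-tilde B^H
v_opt = ((np.eye(M) - At @ B.conj().T @ np.linalg.solve(S, B)) @ (Adag @ wA)
         + At @ B.conj().T @ np.linalg.solve(S, wB))

print(np.linalg.norm(B @ v_opt - wB))           # ~0: constraint satisfied
```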
Various applications of the solution types/methodologies set out above are now described.
In what follows, the geometry chosen for illustrative numerical simulations is that depicted in
A straightforward solution to achieving a good fit to the OSD sound field is to allocate pairs of sources to given frequency ranges and to apply band-pass filters to the signals to be reproduced prior to transmission via each source pair. In order to achieve this it is helpful to allocate the source pairs in a logarithmic spatial arrangement so that there is a higher concentration of sources at the centre of the source array (that transmit higher frequencies) whilst the concentration of sources is reduced towards the ends of the array (where sources transmit lower frequencies). As an example,
Numerical simulation of the minimum norm solution given by equation (21) above shows that this provides a viable method for determining the source strengths necessary to ensure cross-talk cancellation at multiple positions in the sound field.
In all cases it has been found that the minimum norm solution provides satisfactory solutions when a suitable value of the regularization parameter α is chosen.
Three potential methods of solution have been presented above. Extensive numerical simulations have revealed that the extension to the unconstrained solution produces results for the optimal source strength vector that converge to the results produced by the QR factorization method without any regularization. Furthermore, the QR factorization method and the Lagrange multiplier method produce identical results when identical regularization parameters are used (i.e. η and β respectively are chosen to be the same). Finally, it has been found that as the values of the regularization parameters η and β are increased, both the QR factorization method and the Lagrange multiplier method approach the minimum norm solution.
First note that the matrix A is badly conditioned at low frequencies for all four cases as illustrated in
Application of the Linearly Constrained Solution with Enhanced Conditioning
The conditioning of the matrix A can be improved significantly by placing the field points at which ŵA is defined closer to the source array as illustrated in
The regularization methods used in the numerical studies described are of particular use in achieving viable solutions, especially in the constrained least squares approach, although regularization of the minimum norm solution can have particular benefit at low frequencies. It is also possible to refine further these solutions using regularization approaches that promote sparse solutions. That is, the number of sources participating in the solutions is minimized. Full details of these approaches, including the algorithms used to implement them, are described in Appendices 4 and 5. The application of these methods helps to better identify the sources within the array that are most important to the process of ensuring that the required solution is delivered.
In summary, as described above the Optimal Source Distribution (OSD) is a symmetric distribution of pairs of point monopole sources whose separation varies continuously as a function of frequency in order to ensure at all frequencies a path length difference of one-quarter of an acoustic wavelength between the source pairs and the ears of a listener. The field of the OSD has a directivity function that is independent of frequency and that in principle can produce cross-talk cancellation at a number of listener positions simultaneously over a wide frequency range. As demonstrated above, we have shown that the problem of approximating the field of the OSD with a discrete number of transducers can be solved using either a minimum norm solution or a linearly constrained least squares approach. The minimum norm solution is effective in delivering cross-talk cancellation at the required field points with minimum source effort. The constrained least squares solution also delivers the required cross-talk cancellation at the required field points and tends also to produce a replica of the OSD sound field. Sparse solutions can also be beneficially used to better identify the most important sources required. The above embodiments allow multiple listeners to simultaneously experience virtual sound imaging from an array of speakers.
Using the definitions given in the main text, it can be shown that
Bv=[RH 0]QHv=RHy (A1.1)
Using the property QQH=I also shows that
Av=AQQHv=A1y+A2z (A1.2)
This enables the problem to be transformed into a minimisation problem where one seeks to minimise ∥A1y+A2z−ŵA∥2 subject to RHy=ŵB. Then since y=RH-1ŵB the minimisation problem reduces to
min∥A2z−(ŵA−A1y)∥22=min∥A2z−(ŵA−A1RH-1ŵB)∥22 (A1.3)
This can be solved for z to give the following solution for the optimal source strength vector
where the least squares solution to the minimisation problem involving the vector z can be written as
zopt=[A2HA2]−1A2H(ŵA−A1RH-1ŵB) (A1.5)
and the modified constraint equation above gives
yopt=RH-1ŵB (A1.6)
Now note that this solution can be written explicitly in terms of the optimal vector of source strengths by partitioning of the matrix Q such that
vopt=Q1yopt+Q2zopt (A1.7)
Also writing the pseudo inverse of matrix A2 as [A2HA2]−1A2H=A2†, enables the solution to be written as
vopt=Q1RH-1ŵB+Q2A2†(ŵA−A1RH-1ŵB) (A1.8)
and therefore as
vopt=Q2A2†ŵA+(Q1RH-1−Q2A2†A1RH-1)ŵB (A1.9)
It is interesting to note that there may be possibilities for the specification of the reproduced signals ŵA other than the radiation pattern associated with the OSD. Thus note that the expression for the optimal source strength vector given above can be written as
vopt=XŵB+YŵA (A2.1)
where the matrices X and Y are respectively defined by
X=Q1RH-1−Q2A2†A1RH-1, Y=Q2A2† (A2.2)
The effort made by the acoustic sources, given by ∥v∥22, can be written as
∥vopt∥22=voptHvopt=(XŵB+YŵA)H(XŵB+YŵA) (A2.3)
which when expanded shows that
∥vopt∥22=voptHvopt=ŵAHYHYŵA+ŵAHYHXŵB+ŵBHXHXŵB+ŵBHXHYŵA (A2.4)
which is a quadratic function of ŵA minimised by
ŵA=−[YHY]−1YHXŵB (A2.5)
It would therefore be useful to compute these values of the signals at the radiated field points that ensure that the source effort (as exemplified by the sum of squared magnitudes of the source strengths) is minimised, whilst still ensuring that the constraint is satisfied. The feasibility and results of such a computation will of course depend on the properties of the matrices X and Y.
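This computation can be sketched as follows (random stand-ins again); because YHY is singular whenever N exceeds M−P, the inverse in equation (A2.5) is realised here through a least squares solve, which is an assumption about how one might regularise that step:

```python
# Sketch: effort-minimising field-point targets w_A-hat of equation (A2.5),
# with X and Y assembled as in equation (A2.2).
import numpy as np

rng = np.random.default_rng(4)
M, P, N = 16, 4, 40
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
B = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))
wB = rng.standard_normal(P) + 1j * rng.standard_normal(P)

Q, Rfull = np.linalg.qr(B.conj().T, mode="complete")
R, Q1, Q2 = Rfull[:P, :P], Q[:, :P], Q[:, P:]
A1, A2 = A @ Q1, A @ Q2
A2dag = np.linalg.pinv(A2)
RinvH = np.linalg.inv(R.conj().T)

X = Q1 @ RinvH - Q2 @ A2dag @ A1 @ RinvH        # eq. (A2.2)
Y = Q2 @ A2dag

# min over w_A of ||X wB + Y wA||^2, i.e. eq. (A2.5) via a least squares solve:
wA_min = np.linalg.lstsq(Y, -X @ wB, rcond=None)[0]
print(np.linalg.norm(X @ wB + Y @ wA_min))      # minimised source effort
```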
Another method for determining the solution to
min∥Av−ŵA∥22 subject to ŵB=Bv (A3.1)
is to use the method of Lagrange multipliers, which is widely used in the solution of constrained optimisation problems. The analysis presented here is similar to that presented previously by Olivieri et al [10] and by Nelson and Elliott [8]. The analysis begins by defining a cost function J given by
J=(Av−ŵA)H(Av−ŵA)+(Bv−ŵB)Hμ+μH(Bv−ŵB)+βvHv (A3.2)
where μ is a complex vector of Lagrange multipliers and the term βvHv is included, as will become apparent, in order to regularise the inversion of a matrix in the solution. The derivatives of this function with respect to both v and μ are defined by
∂J/∂v=∂J/∂vR+j∂J/∂vI, ∂J/∂μ=∂J/∂μR+j∂J/∂μI (A3.3a,b)
where v=vR+jvI and μ=μR+jμI. The following identities can be deduced from the analysis presented by Nelson and Elliott [8] (see the Appendix):
Expanding the first term in the above expression for the cost function J gives
J=vHAHAv−vHAHŵA−ŵAHAv+ŵAHŵA+(Bv−ŵB)Hμ+μH(Bv−ŵB)+βvHv (A3.5)
and using the above identities shows that the minimum in the cost function is given by
AHAv−AHŵA+BHμ+βv=0 (A3.6)
Bv−ŵB=0 (A3.7)
Note that these equations can also be written in matrix form as
[AHA+βI BH; B 0][v; μ]=[AHŵA; ŵB] (A3.8)
and are sometimes known as the Karush-Kuhn-Tucker (KKT) conditions. Rearranging the first of these equations shows that
[AHA+βI]v=AHŵA−BHμ (A3.9)
and therefore
v=[AHA+βI]−1(AHŵA−BHμ) (A3.10)
The above relationship can be solved for the vector μ of Lagrange multipliers by using Bv=ŵB so that
B[AHA+βI]−1(AHŵA−BHμ)=ŵB (A3.11)
Further manipulation can be simplified by defining the following expressions:
A†=[AHA+βI]−1AH and Ã=[AHA+βI]−1 (A3.12a,b)
These enable the above equation to be written as
BA†ŵA−BÃBHμ=ŵB (A3.13)
It then follows that the expression for the vector of Lagrange multipliers can be written as
μ=[BÃBH]−1(BA†ŵA−ŵB) (A3.14)
Substituting this into the expression for the source strength vector yields the optimal value given by
vopt=A†ŵA−ÃBH[BÃBH]−1(BA†ŵA−ŵB) (A3.15)
A little rearrangement shows that this expression can also be written as
vopt=[I−ÃBH[BÃBH]−1B]A†ŵA+ÃBH[BÃBH]−1ŵB (A3.16)
It is worth noting that in the absence of the linear constraint, such that B and ŵB are set to zero, the solution reduces to the usual regularised least squares solution
vopt=A†ŵA=[AHA+βI]−1AHŵA (A3.17)
In making use of the above solutions, it may be helpful to encourage so called “sparse” solutions that reflect the desirability of using a number of loudspeakers that is as small as possible in order to deliver the desired result. Considerable work in recent years has been aimed at the solution of such problems, one approach to which is the introduction of a term into the cost function to be minimised that, in this case, is proportional to the one norm of the vector v whose solution is sought. The one norm of the vector is defined by
∥v∥1=Σm=1M|vm| (A4.1)
where |vm| is the modulus of the m′th element of the complex vector v. The introduction of such a term gives a cost function whose minimisation is known to promote a sparse solution, a typical example of which is the “least absolute shrinkage and selection operator” (LASSO) [11], which, in terms of the current variables of interest, can be written in the form
min[½∥C̃v−w̃∥22+ν∥v∥1] (A4.2)
where the matrix C̃ and vector w̃ are used to represent the terms in the linearly constrained least squares solution given in section 6 above, and ν is a parameter that is used to trade off the accuracy of the solution against its sparsity. It is worth noting that whilst the term ∥v∥1 itself is not differentiable (where the elements of v are equal to zero), and thus the usual apparatus for the analytical solution of such minimisation problems is not available, the function is nonetheless convex and has a unique minimum. There are many algorithms available to solve this problem when the variables involved are real (see for example, [12]), although less work has been done on the case where the variables are complex (see [13-15] for some examples in the complex case).
A technique that has become popular in recent years for finding the minimum of the above cost function is to use so-called proximal methods [16]. These are particularly suited to problems where the dimensionality is large, although the simplicity of the algorithms involved make them more generally attractive. Note that the function to be minimised can be written as
min F(v)=[½∥C̃v−w̃∥22+ν∥v∥1]=ƒ(v)+g(v) (A4.3)
where ƒ(v) and g(v) are respectively the two-norm and one-norm terms. First consider the minimisation of ƒ(v). The proximal operator for a given function ƒ(v) is defined as [16]
proxτƒ(z)=argminv[ƒ(v)+(1/2τ)∥v−z∥22] (A4.4)
where this function effectively defines a compromise between minimising ƒ(v) and being near to the point z, and the parameter τ in this case is used to quantify the compromise. The minimum of this function can be found easily analytically, and using the definition of the gradient of these functions with respect to the complex vector v, together with other results in Appendix 3, shows that
∇ƒ(v)+(1/τ)(v−z)=0 (A4.5)
where the gradient of the function is ∇ƒ(v)=C̃H(C̃v−w̃). Equating this result for the gradient to zero shows that v=z−τ∇ƒ(v) and that the proximal operator can be written as
proxƒ(z)=z−τ∇ƒ(v) (A4.6)
In this case the “proximal algorithm” for finding the minimum of the function is simply an expression of the gradient descent method (the “proximal gradient method”) where the (k+1)′th iteration for finding the minimum is expressed as
vk+1=vk−τ∇ƒ(vk) (A4.7)
Whilst the minimisation of the function ƒ(v) via proximal methods appears trivial, the minimisation of the function g(v)=ν∥v∥1 is less straightforward. Nonetheless, despite the lack of differentiability of the function, it is possible to derive an appropriate proximal operator [16]. In the case of a real vector variable, the proximal operator is the “soft thresholding” operator applied to each (real) element zi of the (real) vector z in turn:
Sα(zi)=zi−α for zi>α, Sα(zi)=0 for |zi|≤α, Sα(zi)=zi+α for zi<−α (A4.8)
This operator is often referred to as the “shrinkage operator” and can also be written compactly in the form
Sα(zm)=sgn(zm)max(|zm|−α, 0) (A4.9)
where sgn(zm)=zm/|zm| is the sign operator. It turns out that, when written in this form, the same shrinkage operator is applicable to complex vectors where |zm| is the modulus of the complex number zm. A full derivation of this proximal operator in the case of complex variables is given by Maleki et al [16] and has been used by a number of other authors in finding solutions to what is effectively the complex LASSO problem (see for example [14, 15]).
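A minimal sketch of this complex shrinkage operator (an illustrative addition) is as follows:

```python
# Sketch: complex soft-thresholding S_alpha of equation (A4.9); each element
# keeps its phase while its modulus is reduced by alpha and floored at zero.
import numpy as np

def soft_threshold(z, alpha):
    mag = np.abs(z)
    return z * np.maximum(mag - alpha, 0.0) / np.where(mag > 0.0, mag, 1.0)

print(soft_threshold(np.array([3 + 4j, 0.1 - 0.2j, -2.0 + 0j]), 0.5))
# -> [2.7+3.6j  0+0j  -1.5+0j]: the small middle element is shrunk to zero
```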
A particular algorithm for minimising the function F(v)=ƒ(v)+g(v) that makes use of the two proximal operators given above is known as the “iterative soft-thresholding algorithm” (ISTA)[17]. This can be written in the form of a two “half step” process using each of the iterations above such that
vk+1/2=vk−τ∇ƒ(vk) (A4.10)
vk+1=Sα(vk+1/2) (A4.11)
However, the algorithm is very often compactly written in the form of a single operation given by
vk+1=Sτα(vk−τ∇ƒ(vk)) (A4.12)
where the threshold in the above shrinkage operator is given by the product of α and τ. The algorithm is sometimes referred to as a “forward-backward splitting algorithm”, given that it implements a forward gradient step on ƒ(v), followed by a backward step on g(v). It has also been shown that the speed of convergence of this algorithm can be greatly enhanced by some simple modifications to the step size. This results in the “fast iterative shrinkage-thresholding algorithm” (FISTA) [18].
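An ISTA iteration for the complex LASSO might be sketched as below (C̃ and w̃ are random stand-ins; the step size τ, set from the largest squared singular value of C̃, and the weight ν are assumed choices):

```python
# Sketch: iterative soft-thresholding (ISTA), equation (A4.12), for
# min 0.5*||C v - w||_2^2 + nu*||v||_1 with complex variables.
import numpy as np

def soft_threshold(z, alpha):
    mag = np.abs(z)
    return z * np.maximum(mag - alpha, 0.0) / np.where(mag > 0.0, mag, 1.0)

rng = np.random.default_rng(5)
N, M = 40, 16
C = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
nu = 5.0                                  # sparsity weight (assumed)
tau = 1.0 / np.linalg.norm(C, 2) ** 2     # step size from the spectral norm

v = np.zeros(M, dtype=complex)
for _ in range(500):
    v = soft_threshold(v - tau * (C.conj().T @ (C @ v - w)), tau * nu)

print(np.count_nonzero(np.abs(v) > 1e-10), "of", M, "sources remain active")
```

FISTA differs from this loop only in adding a momentum term to the iterate before each gradient step.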
A very similar algorithm to that described above has been derived in order to further promote sparsity of the solution through the addition of a term proportional to the “zero norm” of the vector v. This is simply the number of non-zero elements of the vector and can be written as ∥v∥0. The cost function for minimisation in this case can be written in the form
min[∥C̃v−w̃∥22+ρ∥v∥0] (A5.1)
Whilst the LASSO function described above is known to have a unique minimum, it is also known that this cost function, including the zero norm term, does not have a uniquely defined minimum. Nonetheless a very compact and useful algorithm has been derived by Blumensath and Davies [19, 20] that will find local minima in this cost function using a similar approach to that described above. In this case, the algorithm is known as an “iterative hard-thresholding algorithm” and has the form
vk+1=Hρ(vk−∇ƒ(vk)) (A5.2)
where ∇ƒ(vk)=C̃H(C̃vk−w̃) and Hρ is the hard-thresholding operator defined by
Hρ(zm)=zm for |zm|>ρ0.5, Hρ(zm)=0 for |zm|≤ρ0.5 (A5.3)
A further useful algorithm has been derived [19] that provides a means of finding at least local minima in the cost function defined by
min∥C̃v−w̃∥22 subject to ∥v∥0≤M (A5.4)
where M defines a desired upper bound on the number of non-zero elements of the vector v. The appropriate algorithm in this case is given by
vk+1=HM(vk−∇ƒ(vk)) (A5.5)
where HM is a non-linear operator that only retains the M coefficients with the largest magnitude, defined by
HM(zm)=zm for |zm|≥ρM0.5(v), HM(zm)=0 otherwise (A5.6)
In this case, the threshold ρM0.5(v) is set to the Mth largest absolute value of the elements of vk−C̃H(C̃vk−w̃) and, if fewer than M values are non-zero, ρM0.5(v) is defined to be the smallest absolute value of the non-zero coefficients. This algorithm was described by its originators as the “M-sparse algorithm” [19] and provides another means of finding a solution that limits the number of loudspeakers required to meet the objective of replicating the desired sound field.
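A sketch of the M-sparse iteration follows (random stand-ins; a step size τ has been added for stability of the gradient step, which is an assumption beyond the bare form of equation (A5.5)):

```python
# Sketch: M-sparse iteration of equation (A5.5); after each gradient step the
# operator H_M keeps only the M_s coefficients of largest magnitude.
import numpy as np

rng = np.random.default_rng(6)
N, M = 40, 16
C = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
M_s = 6                                   # allowed number of active sources
tau = 1.0 / np.linalg.norm(C, 2) ** 2     # step size (assumed, for stability)

v = np.zeros(M, dtype=complex)
for _ in range(500):
    u = v - tau * (C.conj().T @ (C @ v - w))     # gradient step on f(v)
    keep = np.argsort(np.abs(u))[-M_s:]          # M_s largest-magnitude entries
    v = np.zeros_like(u)
    v[keep] = u[keep]                            # hard-thresholding H_M

print(np.count_nonzero(v), "active sources, bound =", M_s)
```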
Accepting that only local minima may be identified in the cost functions involving the “zero norm”, it may assist the search for good solutions to repeat the iterative search processes with a range of initial conditions. Other techniques such as simulated annealing [21] may be used in an “outer loop” to the above algorithms; this should enable a controlled, statistically based search process that prevents such algorithms from becoming trapped in local minima.
Number | Date | Country | Kind
---|---|---|---
1916857.4 | Nov 2019 | GB | national