Binaural audio with loudspeakers (BAL), also known as transauralization, aims to reproduce, at the entrance of each of the listener's ear canals, the sound pressure signals recorded on only the ipsilateral channel of a stereo signal. That is, only the sound signal of the left stereo channel is reproduced at the left ear and only the sound signal of the right stereo channel is reproduced at the right ear. For example, if the source signal was encoded with a head-related transfer function (HRTF) of the listener, or includes the proper interaural time difference (ITD) and interaural level difference (ILD) cues, then delivering the signal on each of the channels of the stereo signal to the ipsilateral ear, and only to that ear, would ideally guarantee that the ear-brain system receives the cues it needs to hear an accurate 3-dimensional (3-D) reproduction of a recorded soundfield.
However, an unintended consequence of binaural audio playback through loudspeakers is crosstalk. Crosstalk occurs when the left ear (right ear) hears sounds from the right (left) audio channel, originating from the right speaker (left speaker). In other words, crosstalk occurs when the sound on one of the stereo channels is heard by the contralateral ear of the listener.
Crosstalk corrupts HRTF information and ITD or ILD cues so that a listener may not properly or completely comprehend the soundfield's binaural cues that are embedded in the recording. Therefore, approaching the goal of BAL requires an effective cancellation of this unintended crosstalk, i.e. crosstalk cancellation or XTC for short.
While there are various techniques for effecting some level of crosstalk cancellation (XTC) for a two loudspeaker system, they all have one or more of the following drawbacks:
The above drawbacks can be seen by analyzing XTC using the most fundamental formulation of the XTC problem—that is by looking at the inverse of the system transfer matrix (as will be shown and discussed below) that describes sound propagation from the loudspeakers to the ears of the listener.
While the technique of constant-parameter (non-frequency-dependent) regularization, commonly used in XTC filter design to make the inversion of the system transfer matrix better behaved, may alleviate some of Drawback D3, it inherently introduces spectral artifacts of its own (specifically, at the expense of reducing the amplitude of the spectral peaks in the inverted transfer matrix, constant-parameter regularization results in undesirable narrow-band artifacts at higher frequencies and a rolloff at lower frequencies at the loudspeakers) and does little to alleviate the other two drawbacks (D1 and D2).
Prior art frequency-dependent regularization, even when coupled with an effective optimization scheme, is not enough to do away with Drawbacks D1, D2 and D3.
Previous XTC filter design methods based on system transfer matrix inversion (with or without regularization) strive to maintain a flat amplitude vs. frequency response at the ears of the listener by imposing a non-flat amplitude vs frequency response at the loudspeakers (as explained below), which causes a loss in the dynamic range of the processed sound, and, for reasons that will be explained below, leads to a spectral coloration of the sound as heard by the listener, even if the listener is sitting in the intended sweet spot.
Therefore, while previous methods are useful for designing XTC filters that can inherently correct for non-idealities in the amplitude vs frequency response of the playback hardware and loudspeakers, they do not address all of Drawbacks D1, D2 and D3.
A method and system for calculating the frequency-dependent regularization parameter (FDRP) used in inverting the analytically derived or experimentally measured system transfer matrix for crosstalk cancellation (XTC) filter design is described. The method relies on calculating the FDRP that results in a flat amplitude vs frequency response at the loudspeakers (as opposed to a flat amplitude vs frequency response at the ears of the listener, as inherently done in prior art methods) thus forcing XTC to be effected into the phase domain only and relieving the XTC filter from the drawbacks of audible spectral coloration and dynamic range loss. When the method is used with any effective optimization scheme it results in XTC filters that yield optimal XTC levels over any desired portion of the audio band, impose no spectral coloration on the processed sound beyond the spectral coloration inherent in the playback hardware and/or loudspeakers, and cause no dynamic range loss. XTC filters designed with this method and used in the system are not only optimal but, due to their being free from Drawbacks D1, D2 and D3, allow for a most natural and spectrally transparent 3D audio reproduction of binaural or stereo audio through loudspeakers. The method and system do not attempt to correct the spectral characteristics of the playback hardware, and therefore are best suited for use with audio playback hardware and loudspeakers that are designed to meet a desired spectral fidelity level without the help of additional signal processing for spectral correction.
A more detailed understanding of the present invention may be had from the following detailed description, which should be read in light of the accompanying drawings, wherein:
In order to explain the advantages of the method and system of the present invention an analytical formulation of the fundamental XTC problem in an idealized situation will be described and the “perfect XTC filter” will be defined, which will serve as a benchmark illustrating the severe problem of audible spectral coloration inherent to all XTC filters.
In the following description, for the sake of clarity and to allow analytical insight, an idealized situation will be used consisting of two point sources (idealized loudspeakers) 12, 14 in free space (no sound reflections) and two listening points 16, 18 corresponding to the location of the ears of an idealized listener 20 (no HRTF). However, in the example given following the description of the invention, actual data corresponding to the impulse responses of real loudspeakers in a real room measured at the ear canal entrances of a dummy head will be used.
Formulation of the Fundamental XTC Problem
In the frequency domain, the air pressure at a free-field point located a distance r from a point source (monopole) radiating a sound wave of frequency ω, under the idealizing assumptions that sound propagation occurs in a free field (with no diffraction or reflection from the head and pinnae of the listener or any other physical objects), and that the loudspeakers radiate like point sources, is given by:
where ρ0 is the air density, k=2π/λ=ω/cs is the wavenumber, λ is the wavelength, cs is the speed of sound (340.3 m/s), and q is the source strength (in units of volume per unit time). Defining the mass flow rate of air from the center of the source, V, as:
which is the time derivative of
in the symmetric two-source geometry shown in
Similarly, at the right ear 18 of the listener 20 the following is the sensed pressure:
Here, l1 and l2 are the path lengths between any of the two sources 12, 14 and the ipsilateral and contralateral ear, respectively, as shown in
Throughout this specification, uppercase letters represent frequency-domain variables, lowercase letters represent time-domain variables, uppercase bold letters represent matrices, and lowercase bold letters represent vectors. We define
Δl≡l2−l1 and g≡l1/l2 (3)
as the path length difference and path length ratio, respectively.
Because the contralateral distance in the geometry of
where Δr is the effective distance between the entrances of the ear canals, and l is the distance between either source and the interaural mid-point of the listener. As defined in
defined as the time it takes a sound wave to traverse the path length difference Δl.
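By way of illustration only, the geometric quantities above can be evaluated numerically. The following is a minimal sketch that assumes the path lengths follow from the law of cosines applied to the symmetric geometry (sources at half-span Θ/2 on either side of the median plane, ears separated by Δr); the function name `xtc_geometry` and the example values of l, Δr and Θ are hypothetical and are not taken from the specification.

```python
import math

SPEED_OF_SOUND = 340.3  # m/s, the value quoted in the text


def xtc_geometry(l, delta_r, theta_deg, cs=SPEED_OF_SOUND):
    """Return (l1, l2, g, tau_c) for the symmetric two-source layout.

    l         : distance from either source to the interaural mid-point (m)
    delta_r   : effective distance between the ear-canal entrances (m)
    theta_deg : loudspeaker span seen from the listener (degrees)
    """
    half_span = math.radians(theta_deg) / 2.0
    cross = delta_r * l * math.sin(half_span)
    base = l * l + (delta_r / 2.0) ** 2
    l1 = math.sqrt(base - cross)  # ipsilateral path (assumed law-of-cosines form)
    l2 = math.sqrt(base + cross)  # contralateral path
    g = l1 / l2                   # path length ratio, Eq. (3)
    tau_c = (l2 - l1) / cs        # time to traverse the path length difference
    return l1, l2, g, tau_c


# Hypothetical listening set-up chosen only to exercise the function.
l1, l2, g, tau_c = xtc_geometry(l=1.6, delta_r=0.15, theta_deg=18.0)
print(f"g = {g:.4f}, tau_c = {tau_c * 1e6:.1f} microseconds")
```

With these assumed inputs the path length ratio comes out close to unity, which is the regime referred to as typical in the discussion that follows.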
Using equations (1) and (2), the received signal at the listener's left ear 16 and the received signal at the listener's right ear 18 may be written in vector form as:
which, in the time domain, is a transmission delay (divided by the constant l1) that does not affect the shape of the received signal. The source vector at the loudspeaker comprising a left channel, VL, and a right channel, VR, is written in vector form as v=[VL(iω),VR(iω)]T. v may be obtained from the two channels of “recorded” signals, denoted d=[DL(iω),DR (iω)]T, using the transformation
is the sought 2×2 filter or transformation matrix for XTC. Therefore, from Eq. (7), the following result may be obtained
p=αCHd (11)
where p=[PL(iω),PR(iω)]T is the vector of pressures at the ears, and C is the system's transfer matrix
which is symmetric due to the symmetry of the geometry shown in
In summary, the transformation from the signal d, through the filter H, to the source variables v, then through wave propagation from the loudspeaker sources to pressure, p, at the ears of the listener, can be written as
where the performance matrix, R, is defined as
The diagonal elements of R (i.e., RLL(iω) and RRR(iω)) represent the ipsilateral transmission of the recorded sound signal to the ears, and the off-diagonal elements (i.e., RRL(iω) and RLR(iω)) represent the undesired contralateral transmission, i.e., the crosstalk.
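As an illustrative sketch of the role of the performance matrix, the product R=CH can be formed directly for a single frequency. The code below assumes that the idealized transfer matrix of Eq. (12) has unit diagonal elements and off-diagonal elements g·exp(−iωτc), which is consistent with the symmetry noted above; the helper names are hypothetical. Setting H to the identity matrix corresponds to playback with no XTC filter.

```python
import cmath


def transfer_matrix(g, x):
    """Assumed idealized symmetric C at non-dimensional frequency x = omega * tau_c."""
    c = g * cmath.exp(-1j * x)  # contralateral path relative to the ipsilateral one
    return [[1.0, c], [c, 1.0]]


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


g, x = 0.985, 1.0
identity = [[1.0, 0.0], [0.0, 1.0]]   # H = I, i.e. no XTC filter
R = matmul2(transfer_matrix(g, x), identity)
ipsilateral = abs(R[0][0])            # diagonal element: desired transmission
contralateral = abs(R[0][1])          # off-diagonal element: crosstalk
print(ipsilateral, contralateral)
```

Under this assumption the unfiltered crosstalk is only a factor g below the direct signal, which is why an XTC filter is needed at all.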
Performance Metrics
A set of metrics by which to judge the spectral coloration and performance of XTC filters will now be described. The amplitude spectrum (to a factor α) of a signal fed to only one (either left or right) of the two inputs of the system, as heard at the ipsilateral ear is
Esi∥(ω)≡|RLL(iω)|=|RRR(iω)|
where the subscripts “si” and ∥ stand for “side image” and “ipsilateral ear (with respect to the input signal)”, respectively, since Esi∥, as defined, is the frequency response (at the ipsilateral ear) for the side image that would result from the input being panned to one side. Similarly, at the contralateral ear to the input signal (subscript X), the following is the side-image frequency response:
EsiX(ω)≡|RLR(iω)|=|RRL(iω)|
The system's frequency response at either ear when the same signal is split equally between left and right inputs is another spectral coloration metric:
Here the subscript “ci” stands for “center image” since Eci, as defined, is the frequency response (at either ear) for the center image that would result from the input being panned to the center.
Also of importance are the frequency responses that would be measured at the sources (i.e., the loudspeakers), which are denoted by S and may be obtained from the elements of the filter matrix H:
They are given using the same subscript convention used with the amplitude spectrum above (with “∥” and “X” referring to the loudspeakers that are ipsilateral and contralateral to the input signal, respectively). An intuitive interpretation of the significance of the above metrics is that a signal panned from a single input to both inputs to the system will result in frequency responses going from Esi to Eci at the ears, and Ssi to Sci at the loudspeakers.
Two other spectral coloration metrics are the frequency responses of the system to in-phase and out-of-phase inputs to the system. These two responses are given by:
Si(ω)≡|HLL(iω)+HLR(iω)|=|HRL(iω)+HRR(iω)|
So(ω)≡|HLL(iω)−HLR(iω)|=|HRL(iω)−HRR(iω)|
The subscripts i and o denote the in-phase and out-of-phase responses, respectively. Note that, as defined, Si is double (i.e., 6 dB above) Sci, as the latter describes a signal of amplitude 1 panned to center (i.e., split equally between L and R inputs), while the former describes two signals of amplitude 1 fed in phase to the two inputs of the system.
Since a real signal can comprise various components having different phase relationships, it is useful to combine Si(ω) and So(ω) into a single metric, Ŝ(ω), which is the envelope spectrum that describes the maximum amplitude that could be expected at the loudspeakers, and is given by
Ŝ(ω)≡max[Si(ω),So(ω)].
It is relevant to note that Ŝ(ω) is equivalent to the 2-norm of H, ∥H∥, and that Si and So are the two singular values of H.
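The equivalence between the in-phase and out-of-phase responses and the singular values can be checked numerically for a symmetric filter matrix with equal diagonal elements. The following sketch computes the singular values of a 2×2 complex matrix from the eigenvalues of its Gram matrix; the function name and the sample matrix entries are hypothetical and serve only as a check.

```python
import math


def singular_values_2x2(m):
    """Singular values of a 2x2 complex matrix via the eigenvalues of M^H M."""
    (a, b), (c, d) = m
    # Entries of the Hermitian matrix M^H M
    p = abs(a) ** 2 + abs(c) ** 2
    q = abs(b) ** 2 + abs(d) ** 2
    r = a.conjugate() * b + c.conjugate() * d
    mean = (p + q) / 2.0
    radius = math.sqrt(((p - q) / 2.0) ** 2 + abs(r) ** 2)
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


# Arbitrary symmetric filter matrix with H_LL = H_RR and H_LR = H_RL.
h_ll, h_lr = 0.8 + 0.3j, -0.5 + 0.2j
H = [[h_ll, h_lr], [h_lr, h_ll]]

s_in = abs(h_ll + h_lr)    # in-phase response S_i
s_out = abs(h_ll - h_lr)   # out-of-phase response S_o
largest, smallest = singular_values_2x2(H)
print(max(s_in, s_out), largest)   # envelope versus 2-norm
print(min(s_in, s_out), smallest)
```

For a matrix of this symmetric form the two printed pairs agree, so the envelope Ŝ can be read either as max[Si, So] or as the 2-norm of H.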
Finally, an important metric that will allow for the evaluation and comparison of the XTC performance of various filters is χ(ω), the crosstalk cancellation spectrum:
It is the ratio of the amplitude spectrum at the ipsilateral ear to the amplitude spectrum at the contralateral ear and, therefore, the greater the value of the crosstalk cancellation spectrum, χ(ω), the more effective is the crosstalk cancellation filter. The above definitions give a total of eight metrics, (Esi
Benchmark: Perfect Crosstalk Cancellation
A perfect crosstalk cancellation (P-XTC) filter may be defined as one that, theoretically, yields infinite crosstalk cancellation at the ears of the listener, for all frequencies. Crosstalk cancellation requires that the received signal at each of the two ears be that which would have resulted from the ipsilateral signal alone. Therefore, in order to achieve perfect cancellation of the crosstalk, Eq. (13) requires that R=CH=I, where I is the unity matrix (identity matrix), and thus, as per the definition of R in Eq. (14), the P-XTC filter is the inverse of the system transfer matrix expressed in Eq. (12), and may be expressed exactly:
where the superscript [P] denotes perfect XTC. For this filter, the eight metrics defined above become:
The perfect XTC filter (χ[P]=∞) gives flat frequency responses at the ears (as evidenced by the constant Esi
The extent of spectral coloration at the loudspeakers is plotted in
The peaks in the Ssi
Using the first and second derivatives (with respect to ωτc) of the expressions for the various spectra, the amplitudes and frequencies for the associated peaks, denoted by the superscript ⇑, and minima, denoted by the superscript ↓, are given by:
For a typical listening set-up, g≈1, say, a reference g=0.985 case shown in
(and the peaks in the other spectra,
correspond to boosts of about 30.5 dB.) While these boosts have equal frequency widths across the spectrum, when the spectrum is plotted logarithmically (as is appropriate for human sound perception), the low-frequency boost is most prominent in its perceived frequency extent. This low-frequency boost (i.e., bass boost) has been recognized as an intrinsic problem in XTC. While the high-frequency peaks could, in principle, be pushed out of the audio range by decreasing τc (which, as can be seen from Eqs. (4) to (6), is achieved by increasing l and/or decreasing the loudspeaker span, Θ, as is done in the so-called “Stereo Dipole” configuration, where Θ may be 10°), the “low frequency boost” of the P-XTC filter would remain problematic.
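The size of these boosts can be reproduced with a short numerical sketch. It assumes the idealized symmetric transfer matrix with off-diagonal elements g·exp(−iωτc) and inverts it in closed form; the function names are hypothetical. The filter is evaluated at ωτc=0, one of the frequencies at which the peaks occur.

```python
import cmath
import math


def perfect_xtc(g, x):
    """Closed-form inverse of the assumed idealized C at x = omega * tau_c."""
    c = g * cmath.exp(-1j * x)
    det = 1.0 - c * c
    return [[1.0 / det, -c / det], [-c / det, 1.0 / det]]


def to_db(amplitude):
    """Convert a linear amplitude ratio to decibels."""
    return 20.0 * math.log10(amplitude)


g = 0.985
H0 = perfect_xtc(g, 0.0)
side_peak_db = to_db(abs(H0[0][0]))  # ipsilateral side-image response at the loudspeaker
envelope_peak_db = to_db(max(abs(H0[0][0] + H0[0][1]),
                             abs(H0[0][0] - H0[0][1])))  # max of in-phase, out-of-phase
print(f"side-image peak: {side_peak_db:.1f} dB, envelope peak: {envelope_peak_db:.1f} dB")
```

For g=0.985 the side-image peak evaluates to about 30.5 dB, in agreement with the figure quoted above, and the envelope peak is roughly 6 dB higher.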
The severe spectral coloration associated with these high-amplitude peaks presents three practical problems: 1) it would be heard by a listener outside the sweet spot, 2) it would cause a relative increase (compared to unprocessed sound playback) in the physical strain on the playback transducers, and 3) it would correspond to a loss in the dynamic range.
These penalties might be a justifiable price if infinitely good XTC performance (χ=∞) and perfectly flat frequency response (E[P](ω)=constant) that the perfect XTC filter promises were guaranteed at the ears of a listener in the sweet spot. However, in practice, these theoretically promised benefits are unachievable due to the solution's sensitivity to unavoidable errors. This problem can best be appreciated by evaluating the condition number of the transfer matrix C.
It is well known that in matrix inversion problems the sensitivity of the solution to errors in the system is given by the condition number of the matrix. The condition number κ(C) of the matrix C is given by
κ(C)=∥C∥ ∥C−1∥=∥C∥ ∥H[P]∥.
(It is also, equivalently, the ratio of largest to smallest singular values of the matrix.) Therefore, we have
Using the first and second derivatives of this function, as was done for the previous spectra, the following are the maxima and minima:
First, it is noted that the peaks and minima in the condition number occur at the same frequencies as those of the amplitude envelope spectrum at the loudspeakers, Ŝ[P]. Second, it is noted that the minima have a condition number of unity (the lowest possible value), which implies that the XTC filter resulting from the inversion of C is most robust (i.e., least sensitive to errors in the transfer matrix) at the non-dimensional frequencies
Conversely, the condition number can reach very high values (e.g., κ⇑(C)=132.3 for the typical case of g=0.985) at the non-dimensional frequencies ωτc=0,π,2π,3π . . . . As g→1 the matrix inversion resulting in the P-XTC filter becomes ill-conditioned, or in other words, infinitely sensitive to errors. The slightest misalignment, for instance, of the listener's head, would thus result in a severe loss in XTC control at the ears (at and near these frequencies) which, in turn, causes the severe spectral coloration in Ŝ[P](ω) to be transmitted to the ears.
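The behavior of the condition number can be illustrated numerically using its definition as the ratio of the largest to the smallest singular value. The sketch below assumes the idealized symmetric transfer matrix, whose singular values are then |1+g·exp(−iωτc)| and |1−g·exp(−iωτc)|; the function name is hypothetical.

```python
import cmath
import math


def condition_number(g, x):
    """kappa(C) for the assumed idealized C at x = omega * tau_c."""
    s_plus = abs(1.0 + g * cmath.exp(-1j * x))   # in-phase singular value
    s_minus = abs(1.0 - g * cmath.exp(-1j * x))  # out-of-phase singular value
    return max(s_plus, s_minus) / min(s_plus, s_minus)


g = 0.985
kappa_worst = condition_number(g, 0.0)           # ill-conditioned frequency
kappa_best = condition_number(g, math.pi / 2.0)  # most robust frequency
print(f"kappa at 0: {kappa_worst:.1f}, kappa at pi/2: {kappa_best:.3f}")
```

The value at ωτc=0 reduces to (1+g)/(1−g), which reproduces the figure of 132.3 quoted above, while the value midway between two peaks is unity.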
Deficiencies of Constant-Parameter Regularization
Regularization methods allow controlling the norm of the approximate solution of an ill-conditioned linear system at the price of some loss in the accuracy of the solution. The control of the norm through regularization can be done subject to an optimization prescription, such as the minimization of a cost function. Regularization may be discussed analytically in the context of XTC filter optimization, which may be defined as the maximization of XTC performance for a desired tolerable level of spectral coloration or, equivalently, the minimization of spectral coloration for a desired minimum XTC performance.
A pseudoinverse representing a nearby solution to the matrix inversion problem is sought:
H[β]=[CHC+βI]−1CH (22)
where the superscript H denotes the Hermitian operator, and β is the regularization parameter which essentially causes a departure from H[P], the exact inverse of C. β is taken to be a constant, 0<β<<1. The pseudoinverse matrix, H[β], is the regularized filter, and the superscript [β] is used to denote constant-parameter regularization. The regularization stated in Eq. (22) corresponds to a minimization of a cost function, J (iω),
J(iω)=eH(iω)e(iω)+βvH(iω)v(iω) (23)
where the vector e represents a performance metric that is a measure of the departure from the signal reproduced by the perfect filter. Physically, then, the first term in the sum constituting the cost function represents a measure of the performance error, and the second term represents an “effort penalty,” which is a measure of the power exerted by the loudspeakers. For β>0, Eq. (22) leads to an optimum, which corresponds to the least-square minimization of the cost function J(iω).
Therefore, an increase of the regularization parameter β leads to a minimization of the effort penalty at the expense of a larger performance error and thus to an abatement of the peaks in the norm of H, i.e., the coloration peaks in the S(ω) spectra, at the price of a decrease in XTC performance at and near the frequencies where the system is ill-conditioned.
Using the explicit form for C given by Eq. (12), the frequency response of the constant parameter regularization XTC filter becomes:
The eight metric spectra we defined herein become:
It is worth noting that as β→0, H[β]→H[P] and the spectra of the perfect XTC filter are recovered from the expressions above as expected.
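The regularized inversion of Eq. (22) and its β→0 limit can be sketched with elementary 2×2 complex arithmetic. The code below is an illustration under the assumption of the idealized symmetric transfer matrix; all function names are hypothetical.

```python
import cmath


def transfer_matrix(g, x):
    """Assumed idealized symmetric C at x = omega * tau_c."""
    c = g * cmath.exp(-1j * x)
    return [[1.0, c], [c, 1.0]]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def hermitian(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def inverse2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def regularized_filter(C, beta):
    """H = (C^H C + beta I)^-1 C^H, following Eq. (22)."""
    Ch = hermitian(C)
    A = matmul2(Ch, C)
    A[0][0] += beta
    A[1][1] += beta
    return matmul2(inverse2(A), Ch)


def envelope(H):
    """max(S_i, S_o) for a symmetric filter matrix."""
    return max(abs(H[0][0] + H[0][1]), abs(H[0][0] - H[0][1]))


g = 0.985
C_mid = transfer_matrix(g, 1.0)
R_perfect = matmul2(C_mid, regularized_filter(C_mid, 0.0))   # beta = 0 gives R = I

C_low = transfer_matrix(g, 0.0)
env_perfect = envelope(regularized_filter(C_low, 0.0))
env_regularized = envelope(regularized_filter(C_low, 0.05))
print(env_perfect, env_regularized)
```

With β=0 the product CH returns the identity matrix, and at ωτc=0 a regularization of β=0.05 lowers the envelope at the loudspeakers by roughly two orders of magnitude relative to the perfect filter.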
The envelope spectrum, Ŝ[β](ω), is plotted in
To get a measure of peak attenuation and the conditions for the formation of doublet peaks, the first and second derivatives of Ŝ[β](ω) with respect to ωτc are used to find the conditions for which the first derivative is nil and the second is negative. These conditions are summarized as follows: If β is below a threshold β* defined as
β<β*≡(g−1)² (29)
the peaks are singlets and occur at the same non-dimensional frequencies as for the envelope spectrum peaks of the P-XTC filter (Ŝ[P]⇑), and have the following amplitude:
If the condition
β*≦β≪1 (30)
is satisfied, the maxima are doublet peaks located at the following non-dimensional frequencies:
and have an amplitude
which does not depend on g. (The superscripts ⇑ and ⇑⇑ denote singlet and doublet peaks, respectively.) The attenuation of peaks in the Ŝ[β] spectrum due to regularization can be obtained by dividing the amplitude of the peaks in the P-XTC (i.e., β=0) spectrum by that of peaks in the regularized spectrum. For the case of singlet peaks, the attenuation is
and for doublet peaks, it is given by
For the typical case of g=0.985 illustrated in
to either side of the peaks in the response of the perfect XTC filter. (For an illustrative case of g=0.985, it is found that β*=2.25×10^−4 and Δ(ωτc)≈0.225 for β=0.05.) Due to the logarithmic nature of frequency perception for humans, these doublet peaks are perceived as narrow-band artifacts at high frequencies (i.e., for n=1, 2, 3, . . . ), but the first doublet peak centered at n=0 is perceived as a wide-band low-frequency rolloff of typically many dB, as can be clearly seen in
Since regularization is essentially a deliberate introduction of error into system inversion, it is expected that both the XTC spectrum and the frequency responses at the ears will suffer (i.e., depart from their ideal P-XTC filter levels of ∞ and 0 dB, respectively) with increasing β. The effects of constant-parameter regularization on responses at the ears are illustrated in
The black horizontal bars on the top axis mark the frequency ranges for which an XTC level of 20 dB or higher is reached with β=0.05, and the grey bars represent the same for the case of β=0.005. (Other parameters are the same as for
The black curves in that plot represent the crosstalk cancellation spectra and show that XTC control is lost within frequency bands centered around the frequencies where the system is ill-conditioned (ωτc=nπ with n=0, 1, 2, 3, 4, . . . ) and whose frequency extent widens with increasing regularization. For example, increasing β to 0.05 limits XTC of 20 dB or higher to the frequency ranges marked by black horizontal bars on the top axis of that figure, with the first range extending only from 1.1 to 6.3 kHz and the second and third ranges located above 8.4 kHz. In many practical applications, such high (20 dB) XTC levels may not be needed or achievable (e.g., because of room reflections and/or mismatch between the HRTF of the listener and that used (e.g., dummy head) to design the filter), and the higher values of β needed to tame the spectral coloration peaks below a required level at the loudspeakers may be tolerated.
The
responses at the ears, shown as the bottom curves in
spectrum are given by:
For the typical (g=0.985) example shown in the figure, for
showing that even relatively aggressive regularization results in a spectral coloration at the ears that is quite modest compared to the spectral coloration the perfect XTC filter imposes at the loudspeakers.
In sum, while constant-parameter regularization, a commonly used technique in the design of XTC filters, is effective at reducing the amplitude of peaks (including the “low-frequency boost”) in the envelope spectrum at the loudspeakers, it typically results in undesirable narrow-band artifacts at higher frequencies and a rolloff of the lower frequencies at the loudspeakers. This non-optimal behavior can be avoided if the regularization parameter is allowed to be a function of the frequency, as described herein.
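The singlet and doublet regimes summarized above can be explored with a short sketch. It assumes the idealized symmetric model, for which each singular value σ of C maps to σ/(σ²+β) in the regularized filter, so that doublet peaks sit where σ²=β; the function name and return layout are hypothetical.

```python
import math


def peak_summary(g, beta):
    """Return (beta_star, peak_amplitude, shift, attenuation_db) for constant beta.

    shift is the offset of the envelope peaks, in units of omega * tau_c, from the
    peak positions of the perfect filter (zero for singlet peaks).
    """
    beta_star = (g - 1.0) ** 2
    sigma_min = 1.0 - g                          # smallest singular value of C
    if beta < beta_star:
        # Singlet: the peak stays at the perfect-filter peak frequency.
        peak = sigma_min / (sigma_min ** 2 + beta)
        shift = 0.0
    else:
        # Doublet: peaks where sigma^2 = 1 + g^2 - 2 g cos(x) equals beta.
        peak = 1.0 / (2.0 * math.sqrt(beta))
        shift = math.acos((g * g + 1.0 - beta) / (2.0 * g))
    attenuation_db = 20.0 * math.log10((1.0 / sigma_min) / peak)
    return beta_star, peak, shift, attenuation_db


beta_star, peak, shift, attenuation_db = peak_summary(g=0.985, beta=0.05)
print(f"beta* = {beta_star:.3e}, shift = {shift:.3f}, attenuation = {attenuation_db:.1f} dB")
```

In this sketch the doublet peak amplitude depends only on β, and the shift of about 0.225 for g=0.985 and β=0.05 follows directly from the condition σ²=β.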
Spectral Flattening through Frequency-Dependent Regularization
The method and system of the present invention rely on the use of a specific scheme for calculating the frequency-dependent regularization parameter (FDRP) that would result in the flattening of the amplitude vs frequency spectrum measured at the loudspeakers and not at the ears of the listeners as is implicit in previous XTC filter designs that are based on the inversion of the system transfer matrix.
Flattening of the amplitude vs frequency spectrum measured at the loudspeakers, as opposed to at the ear of the listener, forces XTC to result from phase effects only, and not from amplitude effects, since the amplitude is flat with frequency at the loudspeakers. This means that any inherent spectral (i.e., amplitude vs frequency) coloration in the loudspeaker and/or playback hardware will not be corrected for (as is inherently done in previous inversion-based XTC filter design methods where the XTC filter aims to reproduce at the ears the same amplitude vs frequency response of the recorded signal).
Flattening of the amplitude vs frequency spectrum measured at the loudspeakers results in the listener hearing the same amplitude vs frequency response that would be heard without processing the sound through the XTC filter. This implies that the listener would not hear any spectral coloration beyond that due to the playback hardware and loudspeakers without the filter. Equally important is the fact that such a flat filter response at the loudspeakers also means no dynamic range loss in the processed audio.
In order to explain the method and system of the present invention, an idealized analytical description of how to calculate a frequency-dependent regularization parameter will be described that results in the specific goal of flattening the XTC filter response at the loudspeakers.
Description of the Method of the Present Invention in the Context of the Idealized Model
For the sake of clarity, the same optimization scheme described with respect to the minimization of the cost function expressed in Eq. (23) will be used, keeping in mind that the method and system of the present invention are completely independent of the adopted optimization scheme.
In order to avoid the frequency-domain artifacts discussed above and illustrated in
Ŝ(ω)=γ if Ŝ[P](ω)≧γ (33)
Ŝ(ω)=Ŝ[P](ω) if Ŝ[P](ω)<γ (34)
where the P-XTC envelope spectrum, Ŝ[P](ω), is given by Eq. (16), and
γ=10^(Γ/20) (35)
with Γ given in dB. Since Γ cannot exceed the magnitude of the peaks in the Ŝ[P](ω) spectrum, γ is bounded by:
where the bound is the maximum of the Ŝ[P] spectrum, Ŝ[P]⇑, given by Eq. (18).
The frequency-dependent regularization parameter needed to effect the spectral flattening required by Eq. (33) is obtained by setting Ŝ[β](ω), given by Eq. (27), equal to γ and solving for β(ω), which is now a function of frequency. Since the regularized spectral envelope, Ŝ[β](ω), (which is also ∥H[β]∥, the 2-norm of the regularized XTC filter) is the maximum of two functions, two solutions for β(ω) are obtained:
The first solution, βI(ω), applies for frequency bands where the out-of-phase response of the perfect filter (i.e., the second singular value, which is the second argument of the max function in Eq. (16)) dominates over the in-phase response (i.e., the first argument of that function):
Similarly, regularization with βII(ω) applies for frequency bands where Si[P]≧So[P]. Therefore, we must distinguish between three branches of the optimized solution: two regularized branches corresponding to β=βI(ω) and β=βII(ω), and one non-regularized (perfect-filter) branch corresponding to β=0. We call these Branch I, II and P, respectively, and sum up the conditions associated with each as follows:
Following this three-branch division, the envelope spectrum at the loudspeakers, Ŝ(ω), for the case of frequency-dependent regularization is plotted as the thick black curve in
which is also plotted (light solid curve) as a reference for the corresponding case of constant-parameter regularization. (We call a spectrum obtained with frequency-dependent regularization and one obtained with constant-β regularization “corresponding spectra,” if the peaks in Ŝ[β](ω), whether singlets or doublets, are equal to γ.)
It is seen from that figure that the low-frequency boost and the high-frequency peaks of the perfect XTC spectrum, which would be transformed into a low-frequency roll-off and narrow-band artifacts, respectively, by constant-β regularization, are now flat at the desired maximum coloration level, Γ. The rest of the spectrum, i.e., the frequency bands with amplitude below Γ, is allowed to benefit from the infinite XTC level of the perfect XTC filter and the robustness associated with relatively low condition numbers.
In the method of the present invention, γ is specifically chosen to be at or below the lowest value of the Ŝ[P](ω) spectrum, i.e.,
Ŝ[P]↓≧γ (40)
as this would ensure that the entire spectrum Ŝ[β](ω) is flat (i.e., the inequality in (34) does not hold and Branch P disappears) and XTC would be forced to be effected through phase effects only, resulting in no amplitude coloration due to XTC filtering and no dynamic range loss, all while ensuring the minimization of whatever cost function is prescribed by the adopted optimization scheme (in this particular example, Eq. (23)).
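The flattening described above can be sketched numerically for the idealized model. The sketch assumes the symmetric transfer matrix, for which each singular value σ of C maps to σ/(σ²+β) in the regularized filter; setting that expression equal to γ for the dominant (smaller) singular value gives β(ω)=σ/γ−σ². The function names are hypothetical, and γ is set to the lowest value of the perfect-filter envelope, which in this model is 1/√(1+g²).

```python
import cmath
import math


def singular_values(g, x):
    """In-phase and out-of-phase singular values of the assumed idealized C."""
    c = g * cmath.exp(-1j * x)
    return abs(1.0 + c), abs(1.0 - c)


def fdrp(g, x, gamma):
    """Frequency-dependent beta that sets the loudspeaker envelope to gamma."""
    sigma = min(singular_values(g, x))   # this one dominates the perfect filter
    return max(sigma / gamma - sigma * sigma, 0.0)


def envelope(g, x, beta):
    """max(S_i, S_o) of the regularized filter at x = omega * tau_c."""
    return max(s / (s * s + beta) for s in singular_values(g, x))


g = 0.985
gamma_star = 1.0 / math.sqrt(1.0 + g * g)   # lowest value of the P-XTC envelope
xs = [k * math.pi / 200.0 for k in range(401)]   # omega * tau_c from 0 to 2 pi
flat = [envelope(g, x, fdrp(g, x, gamma_star)) for x in xs]
ripple = max(flat) - min(flat)
print(f"Gamma* = {20.0 * math.log10(gamma_star):.2f} dB, ripple = {ripple:.2e}")
```

In this sketch the two branches βI and βII are selected implicitly by taking the smaller singular value at each frequency, and the resulting envelope is constant to within rounding error across the whole band.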
Generalized Method
The above leads us to a general description of the method of the present invention in terms of specific steps that are taken in the XTC filter design procedure (the steps are also shown schematically in
In step 30, the system's transfer matrix in the frequency domain (i.e. matrix C as in Eq. (12) and the input 28) is inverted, either analytically (if it results from a tractable idealized model) or numerically (if it results from experimental measurements), using zero or a very small constant regularization parameter (large enough to avoid machine inversion problems) to obtain the corresponding perfect XTC filter, H[P].
In Step 34, Γ is set equal to Γ*, the lowest value (in dB) reached by the amplitude vs frequency response at the loudspeakers, Ŝ[P]↓. This value is found either from Eq. (19) (or a similar equation resulting from another tractable analytical model) or from plotting the H[P] spectra (if the inversion was done numerically using actual measurements, as in the example given further below). γ* is then calculated from γ*=10^(Γ*/20) (36).
In Step 38, the frequency-dependent regularization parameter (FDRP) β(ω) that would result in a flat frequency response at the loudspeakers is calculated, so that Ŝ[β](ω)=constant ≦γ* (as, for instance, is done by using Equations (37) and (38)) thus forcing XTC to be caused by phase effects only.
In Step 40, the FDRP thus obtained, β(ω), is used to calculate the pseudo-inverse of the system's transfer matrix (e.g., according to Eq. (22)), which yields the sought regularized optimal XTC filter H[β] that has a flat frequency response at the loudspeakers. Finally, if needed for applying the resulting filter through a time-based convolution (as is often done in practical XTC implementations), a time-domain version (impulse response) of the filter is obtained in step 44 by simply taking the inverse Fourier transform of H[β] (output 42).
It should be noted that in Step 38, if the FDRP is calculated so that Ŝ[β](ω)=constant ≦γ*, the spectral flattening occurs for a side image (i.e., a sound panned to either the left or right channel and thus would be perceived by a listener to be located at or near his or her left or right ear when the XTC level is sufficiently high). However, the same method can be used to flatten the response at the loudspeakers for an image that is not a pure side image by simply requiring that S[β](ω)=constant ≦γ*, where S[β](ω) is the XTC filter's frequency response for an image of a source panned anywhere between the left and right channels. For instance, to flatten for a central image, we set S[β]ci(ω) (given, for instance, by the equation preceding Eq. (27)) to a constant ≦γ*, and proceed with the steps of the method as outlined above. In this context it is relevant to mention that for some applications, for instance pop music recording where the lead vocal audio is panned dead center, it might be desirable to flatten the response for a center image, i.e., Sci(ω) (or an image of any other desired panning), in order to avoid coloration of that image. It should also be noted in that context that since Ŝ[β](ω)≧S[β](ω), only flattening the side image (i.e., setting Ŝ[β](ω)=constant ≦γ*) would result in no dynamic range loss due to the XTC filter. In other words, flattening for anything but the side image would incur a dynamic range loss that must be balanced by the benefit of a reduced spectral coloration for the desired panned image. For instance, for binaural recordings of real acoustic soundfields, which typically contain no dead-center panned images, flattening of the side image is advisable as this leads to no dynamic range loss.
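The final steps can be sketched end to end for the idealized model on a discrete frequency grid. This is an illustration only: it assumes the symmetric transfer matrix, an interaural delay τc equal to a whole number of samples (so that the spectrum is exactly conjugate-symmetric), a short hypothetical grid of 64 bins, and a naive inverse discrete Fourier transform in place of an optimized routine. All names are hypothetical.

```python
import cmath
import math


def flat_filter_bin(g, phase, gamma):
    """H_LL and H_LR of the flat regularized filter at phase = omega * tau_c."""
    c = g * cmath.exp(-1j * phase)
    lam_in, lam_out = 1.0 + c, 1.0 - c               # eigenvalues of the symmetric C
    sigma = min(abs(lam_in), abs(lam_out))
    beta = max(sigma / gamma - sigma * sigma, 0.0)   # FDRP for this bin
    h_in = lam_in.conjugate() / (abs(lam_in) ** 2 + beta)
    h_out = lam_out.conjugate() / (abs(lam_out) ** 2 + beta)
    return (h_in + h_out) / 2.0, (h_in - h_out) / 2.0


def inverse_dft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


N = 64               # number of frequency bins (hypothetical, kept small)
DELAY_SAMPLES = 3    # tau_c expressed in samples (assumed integer)
g = 0.985
gamma = 1.0 / math.sqrt(1.0 + g * g)   # lowest value of the P-XTC envelope

bins = [flat_filter_bin(g, 2.0 * math.pi * k * DELAY_SAMPLES / N, gamma)
        for k in range(N)]
h_ll = inverse_dft([b[0] for b in bins])   # impulse response, ipsilateral element
h_lr = inverse_dft([b[1] for b in bins])   # impulse response, contralateral element
print(max(abs(v.imag) for v in h_ll + h_lr))   # imaginary residue, rounding error only
```

A practical design would typically use a much longer grid, measured transfer functions, and a circular shift or window to obtain a causal filter; those choices are outside the scope of this sketch.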
Example Using a Measured Transfer Function.
An example based on the transfer function of two loudspeakers in a room measured by microphones placed at the ear canal entrances of a dummy head (Neumann KU-100) will now be described. The loudspeakers had a span of 60 degrees at the listening position, which was about 2.5 meters from each loudspeaker.
We contrast these curves with those curves in
The method described herein may be implemented in software or firmware incorporated in a computer-readable storage medium for execution by a general purpose computer or a processor, such as a DSP chipset. Examples of suitable computer-readable storage media include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
Embodiments of the present invention may be represented as instructions and data stored in a computer-readable storage medium. For example, aspects of the present invention may be implemented using Verilog, which is a hardware description language (HDL). When processed, Verilog data instructions may generate other intermediary data (e.g., netlists, GDS data, or the like) that may be used to perform a manufacturing process implemented in a semiconductor fabrication facility. The manufacturing process may be adapted to manufacture semiconductor devices (e.g., processors) that embody various aspects of the present invention.
Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, a graphics processing unit (GPU), a DSP core, a controller, a microcontroller, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), any other type of integrated circuit (IC), and/or a state machine, or combinations thereof.
While the foregoing invention has been described with reference to its preferred embodiments, various alterations and modifications will occur to those skilled in the art. All such alterations and modifications are intended to fall within the scope of the appended claims.
This application claims the benefit of U.S. provisional application No. 61/379,831 entitled “OPTIMAL CROSSTALK CANCELLATION FOR BINAURAL AUDIO WITH TWO LOUDSPEAKERS” filed on Sep. 3, 2010, the contents of which are hereby incorporated by reference herein.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2011/050181 | 9/1/2011 | WO | 00 | 3/1/2013 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2012/036912 | 3/22/2012 | WO | A |
Number | Date | Country | |
---|---|---|---|
20130163766 A1 | Jun 2013 | US |
Number | Date | Country | |
---|---|---|---|
61379831 | Sep 2010 | US |