This invention relates to sound reproduction systems.
The invention is particularly, but not exclusively, concerned with the stereophonic reproduction of sound whereby signals recorded at a plurality of points in the recording space, such as, for example, at the notional ear positions of a head, are reproduced in the listening space by being replayed via three loudspeaker channels, the system being designed with the aim of synthesising at a plurality of points in the listening space an auditory effect obtaining at corresponding points in the recording space.
Binaural technology [1]-[3] is often used to present a virtual acoustic environment to a listener. The principle of this technology is to control the sound field at the listener's ears so that the reproduced sound field coincides with what would be produced when he is in the desired real sound field. One way of achieving this is to use a pair of loudspeakers (electro-acoustic transducers) at different positions in a listening space with the help of signal processing to ensure that appropriate binaural signals are obtained at the listener's ears [4]-[8].
It is also possible to use three channels of loudspeakers for binaural reproduction. It has been experimentally observed by several workers that the addition of a centre channel can improve the cross-talk cancellation achieved with two channel binaural reproduction systems. For example, Miyoshi and Koizumi [9] presented a filter design technique for enhanced cross-talk cancellation when three loudspeakers are used in place of two, this method of design following from that previously presented by Miyoshi and Kaneda [10] for the inversion of room acoustic responses. A similar approach using three loudspeakers was presented by Uto et al [11], who used an adaptive filter design technique. Finally, Cooper and Bauck [12] later disclosed a three channel filter design technique based on analytical frequency domain inversion using the Moore-Penrose pseudo-inverse of the matrix of transfer functions relating the loudspeaker outputs to the listener ear signals.
We discuss hereafter in Section 2 a number of problems which arise from these conventional approaches to system inversion involved in such a binaural synthesis over loudspeakers. A basic analysis with a free field transfer function model illustrates the fundamental difficulties which such systems can have. The amplification required by the system inversion results in loss of dynamic range. The inverse filters obtained are likely to contain large errors around ill-conditioned frequencies. Regularisation is often used to design practical filters but this also results in poor control performance. The performance suffers severely even with small errors in the reproduction stage. The Optimal Source Distribution (OSD) provided the solution for all the above problems by introducing the concept of variable frequency span transducers [13].
A sound reproduction system comprising electro-acoustic transducer means, and transducer drive means for driving the electro-acoustic transducer means in response to a plurality of channels of a sound recording, the transducer drive means comprising filter means which is configured to reproduce at a listener location an approximation to the local sound field that would be present at a listener's ears in recording space, taking into account the characteristics and intended position of the electro-acoustic transducer means relative to the ears of the listener, the electro-acoustic transducer means comprising first sound emitter means which provides an intermediate sound emission channel, second sound emitter means which provides a left sound emission channel and a third sound emitter means which provides a right sound emission channel, the first sound emitter means being located intermediate of second and third sound emitter means, the second and third sound emitter means being such that predominantly higher frequencies are transmitted closer to the first sound emitter means and predominantly lower frequencies are transmitted away from the first sound emitter means.
In a preferred embodiment of the invention we provide three channels of sound emitter means that are each positioned in a different azimuthal region relative to a listener location, and portions of each of the second and third sound emitter means having different azimuth directions emit different frequencies or different frequency ranges of sound.
The sound emitters may be in the form of discrete side-by-side/adjacent transducer units, each unit being substantially in the form of a conventional loudspeaker. For example, each transducer unit may emit sound at a predominant frequency or range of frequencies, or each unit may comprise a plurality of transducer sub-assemblies each of which emits a respective predominant frequency or range of frequencies. Alternatively, the sound emitters may be constituted by area portions of an extended transducer means. Thus, the position of the emitter portions of the extended transducer could be arranged to vary continuously with frequency.
It should be appreciated that the invention does not preclude the use of additional electro-acoustic transducer means such as one or more sub-woofer units or one or more conventional loudspeakers for stereophonic or surround reproduction.
Preferably the operational transducer position-frequency range for the left and right channel of emitters is determined by
where θL and θR are the azimuth spans, with respect to the listener, subtended by the left and centre, and right and centre, channel emitters respectively, and where 0<n<4.
c0: speed of sound (≈340 m/s)
Δr: equivalent distance between the ears
The following equation is a correction factor to the foregoing equations (a), (b), (c) and (d), which are obtained from a free field model, in order to match the frequency-azimuth characteristics to the realistic case in which head diffraction is present.
Δr=Δr0(1+(θL+θR)/π)
Δr0: distance between the ears (≈0.12 to 0.25 m)
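By way of illustration only, the correction factor above can be evaluated as in the following sketch. The function name and the example values are hypothetical, and the angles are assumed to be expressed in radians, since the expression divides by π.

```python
import math


def effective_ear_distance(delta_r0, theta_l, theta_r):
    """Equivalent ear distance: delta_r = delta_r0 * (1 + (theta_l + theta_r) / pi).

    delta_r0 is the physical distance between the ears in metres.
    theta_l and theta_r are the azimuth spans in radians (assumed unit).
    """
    return delta_r0 * (1.0 + (theta_l + theta_r) / math.pi)


# Hypothetical example: 0.12 m ear spacing, emitters 90 degrees either side of centre.
delta_r = effective_ear_distance(0.12, math.pi / 2, math.pi / 2)
print(f"equivalent ear distance: {delta_r:.3f} m")
```

With both spans at 90°, the equivalent distance is twice the physical distance, and with zero span the correction vanishes.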
Note that the signal levels used to define the operational frequency-span range should ideally be monitored at the receiver positions, not at the transducer inputs or outputs. This is because a transducer pair may produce a relatively large output signal level outside its operational frequency range (much smaller than it would be without cross-over filters, but possibly larger than in multi-way conventional stereo reproduction without system inversion). These outputs cancel each other owing to the characteristics of the plant matrix, which results in a small signal level at the ears.
In the foregoing equation (a), a value of n substantially equal to 2 is ideal, and a ‘tolerance’ of, for example, ±2 can be applied to produce a position-frequency range. Thus n=2 can be assigned to around the centre frequency of the desired frequency range.
In one advantageous embodiment we employ 0<n<3.9.
In another advantageous embodiment we employ 0<n<3.7.
In yet another advantageous embodiment we employ 0.1<n<3.9.
In a further advantageous embodiment we employ 0.3<n<3.7.
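The way in which a range of n maps to an operational frequency range for a given azimuth span may be sketched as follows. Equations (a) to (d) are not reproduced in this text, so the sketch ASSUMES the free field relation f = n·c0/(4·Δr·sin θ), which is consistent with the low frequency limits quoted later in the description but should be checked against the original equations. The function names, the span of 30° and the value of Δr are hypothetical.

```python
import math

C0 = 340.0  # speed of sound in m/s, as given in the text


def frequency_for_n(n, theta, delta_r, c0=C0):
    """ASSUMED relation (not taken verbatim from the source):
    f = n * c0 / (4 * delta_r * sin(theta)), with theta in radians."""
    return n * c0 / (4.0 * delta_r * math.sin(theta))


def operating_band(theta, delta_r, n_min=0.3, n_max=3.7):
    """Lower and upper frequency for one emitter span, for n_min < n < n_max."""
    return (frequency_for_n(n_min, theta, delta_r),
            frequency_for_n(n_max, theta, delta_r))


theta = math.radians(30.0)  # hypothetical span between side and centre emitters
delta_r = 0.16              # hypothetical equivalent ear distance in metres

f_centre = frequency_for_n(2.0, theta, delta_r)  # n = 2 is the ideal value
f_low, f_high = operating_band(theta, delta_r)
print(f"centre {f_centre:.0f} Hz, band {f_low:.0f} Hz to {f_high:.0f} Hz")
```

Narrowing the range of n, for example from 0<n<4 to 0.3<n<3.7, narrows the band allotted to each span in proportion.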
An example of a 2-way system will now be described. Cross-over filters may be employed for distributing signals over the appropriate frequency range to the appropriate sound emitters. The cross-over filters may be arranged to respond to the outputs of an inverse filter means (Hh, Hl) of said filter means. Alternatively, the inverse filter means (Hh, Hl) of said filter means may be arranged to be responsive to the outputs (dh, dl) of the cross-over filters.
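A minimal sketch of a 2-way cross-over is given below for illustration. The one-pole low-pass design, the cross-over frequency and the sample rate are assumptions made for this example and are not taken from the description; the high band is formed as the complement of the low band so that the two bands sum back to the input.

```python
import numpy as np


def complementary_crossover(x, fc, fs):
    """Split x into (low, high) bands using a one-pole low-pass (assumed design).

    The high band is x - low, so low + high reconstructs x exactly.
    """
    a = np.exp(-2.0 * np.pi * fc / fs)  # pole position of the one-pole filter
    low = np.empty_like(x, dtype=float)
    state = 0.0
    for i, sample in enumerate(x):
        state = (1.0 - a) * sample + a * state
        low[i] = state
    high = x - low
    return low, high


fs = 48000.0                            # hypothetical sample rate in Hz
t = np.arange(4800) / fs
tone = np.sin(2.0 * np.pi * 100.0 * t)  # 100 Hz test tone
low, high = complementary_crossover(tone, fc=700.0, fs=fs)
```

In this sketch the 100 Hz tone appears mainly in the low band, which would be routed to the emitters allotted to lower frequencies.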
The filter means may be configured to be a minimum norm solution of the inverse problem.
The filter means may be configured to be a pseudoinverse filter.
The filter means may be configured to be adaptive filters.
The filter means may be configured to apply regularisation to the drive output signals in a frequency range at the lower end of the audio range.
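The difference between a pseudoinverse filter and a regularised filter can be illustrated at a single frequency as follows. The plant matrix values and the regularisation parameter beta are hypothetical. They are chosen to mimic the lower end of the audio range, where the paths from the left and right emitters to the two ears are nearly identical, and the regularised form shown is the common Tikhonov expression rather than a form stated in the description.

```python
import numpy as np


def pseudoinverse_filters(C):
    """Minimum norm inverse filters: Moore-Penrose pseudoinverse of the plant."""
    return np.linalg.pinv(C)


def regularised_filters(C, beta):
    """Tikhonov-regularised inverse (assumed form): H = C^H (C C^H + beta I)^-1."""
    Ch = C.conj().T
    return Ch @ np.linalg.inv(C @ Ch + beta * np.eye(C.shape[0]))


# Hypothetical 2x3 plant (2 ears, 3 emitters) at a low frequency.
C = np.array([[1.00, 1.0, 0.99],
              [0.99, 1.0, 1.00]], dtype=complex)

H_pinv = pseudoinverse_filters(C)
H_reg = regularised_filters(C, beta=0.01)

gain_pinv = np.linalg.norm(H_pinv, 2)  # maximum amplification without regularisation
gain_reg = np.linalg.norm(H_reg, 2)    # maximum amplification with regularisation
err_pinv = np.linalg.norm(C @ H_pinv - np.eye(2), 2)
err_reg = np.linalg.norm(C @ H_reg - np.eye(2), 2)
print(gain_pinv, gain_reg, err_pinv, err_reg)
```

Regularisation reduces the required amplification considerably, at the cost of a non-zero reproduction error.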
Sub-woofers may be provided for responding to very low audio frequencies.
When the sound emitters are constituted by area portions of an extended transducer means, the extended transducer means preferably comprises elongated sound emitting members, the sound emitting surfaces of each member having a proximal end and a distal end, the proximal ends of the left and right channel transducers being adjacent to the centre channel, excitation means mounted on said members adjacent to said proximal ends for imparting vibrations to said members in response to the drive output signals, the vibration transmission characteristics of the members being chosen such that the propagation of higher frequency vibrations along the members towards the distal end is inhibited whereby the proximal end of said surfaces is caused to vibrate at higher frequencies than the distal end.
According to another aspect of the invention there is provided an electro-acoustic transducer arrangement comprising a first sound emitter which provides an intermediate sound emission channel, a second sound emitter which provides a left sound emission channel and a third sound emitter which provides a right sound emission channel, the first sound emitter being located intermediate the second and third sound emitters, and at least one of the second and third sound emitters being such that predominantly higher frequencies are transmitted closer to the first sound emitter and predominantly lower frequencies are transmitted away from the first sound emitter.
Yet a further aspect of the invention relates to a transducer drive for driving an electro-acoustic transducer arrangement in response to a plurality of channels of a sound recording, the transducer drive comprising a filter arrangement which is configured to reproduce at a listener location an approximation to the local sound field that would be present at a listener's ears in recording space, taking into account the characteristics and intended position of the electro-acoustic transducer arrangement relative to the ears of the listener, the transducer drive being configured for use with the electro-acoustic transducer arrangement which comprises a first sound emitter which provides an intermediate sound emission channel, a second sound emitter which provides a left sound emission channel and a third sound emitter which provides a right sound emission channel, the first sound emitter being located intermediate the second and third sound emitters, and at least one of the second and third sound emitters being such that predominantly higher frequencies are transmitted closer to the first sound emitter and predominantly lower frequencies are transmitted away from the first sound emitter.
Where the transducer drive comprises a configurable signal processor, machine-readable instructions may be used to suitably configure the transducer drive. The instructions may be provided on a data carrier, such as a CD or DVD, or may be in the form of a signal or data structure.
Various embodiments of the invention will now be described, by way of example only, together with a more detailed presentation of prior art arrangements with reference to the accompanying drawings, which show:
FIG. 1—Block diagram for binaural reproduction over loudspeaker with system inversion,
FIG. 2—The geometry of a 2-source 2-receiver system under investigation,
FIG. 3—The definition of azimuth span,
FIG. 4—Norm and singular values of the inverse filter matrix H as a function of n. a) Logarithmic scale. b) Linear scale,
FIG. 5—Dynamic range loss due to system inversion,
FIG. 6—Condition number κ(C) as a function of n,
FIG. 7—Sound radiation by the control transducer pairs with reference to the receiver directions (0 dB and −∞ dB).
FIG. 8—Principle of the OSD system,
FIG. 9—Relationship between source span and frequency for different odd integer number n,
FIG. 10—Norm and singular values of the inverse filter matrix H of OSD as a function of frequency,
FIG. 11—Sound radiation by the OSD transducer pairs with reference to the receiver directions (0 dB and −∞ dB).
FIG. 12—Singular values of the inverse filter matrix H as a function of n. Optimal point for 2 channel OSD and 3 channel OSD
FIG. 13—Principle of the 3 channel OSD system,
FIG. 14—Relationship between source span and frequency for different integer number of n=2, 6, 10, . . . .
FIG. 15—Block diagram for binaural reproduction over 3 loudspeakers with system inversion,
FIG. 16—The geometry of a 3-source 2-receiver system under investigation,
FIG. 17—Norm and singular values of the inverse filter matrix H of the 3 channel case as a function of n. a) Logarithmic scale. b) Linear scale,
FIG. 18—Norm and singular values of the inverse filter matrix H of the 3 channel case as a function of n when the sensitivity of the centre channel transducer is increased by a factor of 3 dB. a) Logarithmic scale. b) Linear scale,
FIG. 19—Norm and singular values of the inverse filter matrix H of the 3 channel OSD as a function of frequency
FIG. 20—Variable frequency/position transducer,
FIG. 21—Discretised variable frequency/position transducer,
FIG. 22—An example of frequency/azimuth region and discretisation,
FIG. 23—Condition number κ(C) of the 3 channel case as a function of n,
FIG. 24—Condition number κ(C) of the 3 channel case as a function of n, when the sensitivity of the centre channel transducer is increased by a factor of 3 dB, and
The principle of binaural reproduction over loudspeaker is described below and is illustrated in
w=Cv (1)
where C is the plant matrix (a matrix of transfer functions between sources and receivers). The two signals to be synthesised at the receivers are defined by the elements of the complex vector d=[d1(jω) d2(jω)]T. In the case of audio applications, these signals are usually the signals that would produce a desired virtual auditory sensation when fed to the two ears independently. They can be obtained, for example, by recording sound source signals u with a recording head (e.g. a dummy head) or by filtering the signals u with a matrix of synthesised binaural filters A.
Therefore, a filter matrix H which contains inverse filters is introduced (the inverse filter matrix) so that v=Hd where
and thus
w=CHd (3)
The inverse filter matrix H can be designed so that the vector w is a good approximation to the vector d with a certain delay [14][15]. When the independent control at two receivers is perfect, CH becomes the identity matrix I. The inverse filter matrix H can also be designed to be a pseudoinverse of the plant matrix C. The filter matrix H can also consist of adaptive filters.
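The chain v=Hd followed by w=Cv can be sketched at a single frequency as shown below. The numerical values of the plant matrix C and of the desired signals d are hypothetical, and the modelling delay mentioned above is omitted for simplicity.

```python
import numpy as np

# Hypothetical 2x2 plant at one frequency: direct paths of unit gain and
# cross-talk paths that are attenuated and delayed.
cross = 0.6 * np.exp(-1j * 0.8)
C = np.array([[1.0, cross],
              [cross, 1.0]], dtype=complex)

H = np.linalg.inv(C)                 # exact inverse filter matrix at this frequency

d = np.array([1.0 + 0.5j, -0.3 + 0.2j])  # desired signals at the two ears
v = H @ d                            # source (transducer) signals, v = H d
w = C @ v                            # reproduced signals at the ears, w = C H d

print(np.round(C @ H, 12))           # close to the identity matrix
print(np.allclose(w, d))
```

When C is not square, as in a three channel arrangement, `np.linalg.pinv` may be used in place of `np.linalg.inv` to obtain a pseudoinverse.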
However, the system inversion involved gives rise to a number of problems such as, for example, loss of dynamic range and sensitivity to errors. A simple case involving the control of two monopole receivers with two monopole transducers (sources) under free field conditions is first considered here. The fundamental problems with regard to system inversion can be illustrated in this simple case. The geometry is illustrated in
In the free field case, the plant transfer function matrix can be modelled as
where an exp(jωt) time dependence is assumed with k=ω/c0, and where ρ0 and c0 are the density and sound speed.
Now consider the case
i.e., the desired signals are the acoustic pressure signals which would have been produced by the closer sound source and whose values are either D1(jω) or D2(jω) without disturbance due to the other source (cross-talk). In this way the effect of system inversion can be separated from the effects of spherical attenuation due to propagation in space, and a causal solution is ensured. The elements of H can be obtained from the exact inverse of C, and the magnitude of the elements of H (|Hmn(jω)|) shows the necessary amplification of the desired signals produced by each inverse filter in H. The maximum amplification of the source strengths can be found from the 2-norm of H (denoted as ∥H∥), which is the largest of the singular values of H, where these singular values are denoted by σo and σi [13]. Thus
∥H∥=max(σo,σi) (6)
σo corresponds to the amplification factor of the out-of-phase component of the desired signals and σi corresponds to the amplification factor of the in-phase component of the desired signals. Plots of σo, σi, and ∥H∥ with respect to frequency are illustrated in
The singular value σo has peaks at n=0, 4, 8, . . . where the system has difficulty in reproducing the out-of-phase component of the desired signals and σi has peaks at n=2, 6, 10, . . . where the system has difficulty in reproducing the in-phase component. Around these frequencies, sound signals from control sources interfere destructively with each other, leaving little response left at the ears of the listener. In other words, the signals cancel each other. Therefore, the solution for the inverse, i.e., the amplification required to produce the desired sound pressure at each receiver, becomes substantially large.
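The behaviour of σo, σi and ∥H∥ as a function of n can be reproduced numerically with the sketch below. The free field plant matrix is not reproduced in this text, so the sketch ASSUMES a normalised two channel plant with unit direct paths and cross-talk paths g·exp(−jnπ/2), where g (here 0.9) is a hypothetical ratio of the two path lengths; this assumed form is chosen because it yields peaks at the values of n stated above.

```python
import numpy as np


def singular_values(n, g=0.9):
    """sigma_o and sigma_i of H for the ASSUMED normalised plant
    [[1, e], [e, 1]] with e = g * exp(-j * n * pi / 2)."""
    e = g * np.exp(-1j * n * np.pi / 2.0)
    sigma_i = 1.0 / abs(1.0 + e)  # in-phase component
    sigma_o = 1.0 / abs(1.0 - e)  # out-of-phase component
    return sigma_o, sigma_i


def norm_H(n, g=0.9):
    """2-norm of H, i.e. the larger of the two singular values."""
    return max(singular_values(n, g))


def norm_H_svd(n, g=0.9):
    """Cross-check: invert the assumed plant and take its largest singular value."""
    e = g * np.exp(-1j * n * np.pi / 2.0)
    plant = np.array([[1.0, e], [e, 1.0]], dtype=complex)
    return np.linalg.svd(np.linalg.inv(plant), compute_uv=False)[0]


for n in (0, 1, 2, 3, 4):
    s_o, s_i = singular_values(n)
    print(n, round(s_o, 3), round(s_i, 3), round(norm_H(n), 3))
```

Under these assumptions σo is largest at n=0 and n=4, σi is largest at n=2, and the two are equal and small at odd n.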
In practice, since the maximum source output is given by ∥H∥max, this must be within the range of the system in order to avoid clipping of the signals. The required amplification results directly in the loss of dynamic range illustrated in
Eq. (1) implies that the system inversion (which determines v and leads to the design of the filter matrix H) is very sensitive to small errors in the assumed plant C (which is often measured and thus small errors are inevitable) where the condition number of C, κ(C), is large. In addition, the reproduced signals w are less robust to small changes in the real plant matrix C, where κ(C) is large.
The condition number of C is shown in
The calculated inverse filter matrix H is likely to contain large errors due to small errors in the assumed plant matrix C, and this results in large errors in the reproduced signal w at the receiver. This is because such errors are magnified by the inverse filters but are not cancelled in the plant. Even if H does not contain any errors, the reproduction of the signals at the receiver is too sensitive to small errors within the real plant matrix C to be useful.
Such errors include individual differences of HRTFs [16]-[18], misalignment of the head and loudspeakers [19], approximation of filters, and regularisation, where a small error is deliberately introduced to improve the condition of the matrix in order to design practical filters [20]. These errors may seem small but they are far too large in practice where κ(C) is large.
By contrast, κ(C) is small around the frequencies where n is an odd integer in Eq. (7). Around these frequencies, a practical and close to ideal inverse filter matrix H is easily obtained and accurate reproduction of the intended sound signals is possible.
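The dependence of conditioning and error sensitivity on n can be illustrated as follows. The sketch again ASSUMES a normalised two channel plant with cross-talk paths g·exp(−jnπ/2), which is not taken verbatim from the source; the value g=0.9 and the size and form of the plant perturbation are hypothetical.

```python
import numpy as np


def plant(n, g=0.9):
    """ASSUMED normalised free field plant for a given n."""
    e = g * np.exp(-1j * n * np.pi / 2.0)
    return np.array([[1.0, e], [e, 1.0]], dtype=complex)


def reproduction_error(n, eps=0.01, g=0.9):
    """Design H from the assumed plant, then reproduce through a slightly
    different 'real' plant and return the 2-norm of (C_real H - I)."""
    C_assumed = plant(n, g)
    H = np.linalg.inv(C_assumed)
    C_real = C_assumed + eps * np.array([[1.0, 0.0], [0.0, 0.0]])
    return np.linalg.norm(C_real @ H - np.eye(2), 2)


kappa_odd = np.linalg.cond(plant(1))   # well conditioned case
kappa_even = np.linalg.cond(plant(2))  # ill conditioned case
print(kappa_odd, kappa_even)
print(reproduction_error(1), reproduction_error(2))
```

In this sketch the same small plant error produces a much larger reproduction error at n=2, where κ(C) is large, than at n=1.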
In addition, the sound radiated in directions other than that of the receiver has a peaky frequency response due to the response of the inverse filter matrix H, and this normally results in severe coloration. This contributes to coloured reverberation and makes listening at any location other than the one optimal location impractical.
Equation (7) can be rewritten in terms of the source azimuth span Θ as
As seen from the analysis above, frequencies with the source span where n is an odd integer number in Eq. (8) give the best control performance as well as robustness.
The Optimal Source Distribution (OSD) introduced the idea of a pair of conceptual monopole transducers whose span varies continuously as a function of frequency (
As discussed above, the two-channel OSD essentially uses the frequency span region where the two singular values, representing the in-phase and out-of-phase components of the binaural reproduction process, are balanced in order to overcome the fundamental problems of conventional binaural reproduction over loudspeakers. However, a system which aims to improve this further is proposed in what follows. For convenience, we refer to it as the “three channel OSD” system in contrast to the earlier OSD that will henceforth be referred to as the “two channel OSD”.
Now we try to make use of the lowest value (−6 dB, at points B in
Since the condition for the in-phase component has now been relaxed, we now can use the optimal value (points B in
In order to see the effect of this additional transducer, we consider the simple case again where monopole transducers are used for binaural reproduction as in section 2.2 but this time with another transducer added on the median plane. The block diagram and geometry are illustrated in
where an exp(jωt) time dependence is assumed with k=ω/c0, and where ρ0 and c0 are the density and sound speed.
Note that the system is under-determined in that there can be a number of choices of the inverse filter matrix which produce no error [22] [23]. Among them, the minimum norm solution would be the most straightforward choice as well as giving the best performance with regard to the fundamental problems described in Sections 3.1 to 3.3. Therefore, the following examples use the minimum norm solution.
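The sense in which the system is under-determined, and the role of the minimum norm solution, can be sketched at one frequency as follows. The 2×3 plant values are hypothetical. The sketch forms the minimum norm solution H = Cᴴ(CCᴴ)⁻¹, then builds a second error-free solution by adding a component from the null space of C, and compares the required amplification.

```python
import numpy as np

# Hypothetical 2x3 plant at one frequency (2 ears, 3 emitters).
C = np.array([[1.0, 1.0, -0.9],
              [-0.9, 1.0, 1.0]], dtype=complex)

# Minimum norm solution of C H = I.
H_min = C.conj().T @ np.linalg.inv(C @ C.conj().T)

# A vector in the null space of C: the last right singular vector.
_, _, Vh = np.linalg.svd(C)
null_vec = Vh[-1].conj()

# Another exact solution: adding null-space content does not change C H.
H_other = H_min + np.outer(null_vec, [1.0, 0.5])

print(np.allclose(C @ H_min, np.eye(2)), np.allclose(C @ H_other, np.eye(2)))
print(np.linalg.norm(H_min, 2), np.linalg.norm(H_other, 2))
```

Both filter matrices reproduce the desired signals without error, but the minimum norm solution requires the smaller source amplification; it coincides with the pseudoinverse returned by `np.linalg.pinv`.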
The 2-norm of H (∥H∥) and the two singular values σo and σi with respect to frequency are illustrated in
With a third transducer for two point reproduction (i.e. the mathematically under-determined case), the balance between the two singular values σo and σi can be changed independently by changing the relative sensitivity of the transducer of the centre channel with respect to those on the left and right. This is an important property which the three channel OSD possesses and which the two channel OSD does not. If the sensitivity of the centre channel transducer is increased by a factor of √2, the two singular values σo and σi become equal to each other at n=2, 6, 10, . . . and that is shown in
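The effect of the centre channel sensitivity on the balance between σo and σi at n=2 can be checked numerically with the sketch below. The three channel plant matrix is not reproduced in this text, so the sketch ASSUMES a normalised far field model: unit direct paths, cross-talk paths exp(−jnπ/2) of equal magnitude (g=1), and a centre emitter that reaches both ears equally with a hypothetical relative gain `centre_gain`. The minimum norm solution is obtained with `np.linalg.pinv`.

```python
import numpy as np


def three_channel_sigmas(n, centre_gain, g=1.0):
    """sigma_o and sigma_i of the minimum norm H for an ASSUMED 2x3 plant
    [[1, c, e], [e, c, 1]] with e = g * exp(-j * n * pi / 2)."""
    e = g * np.exp(-1j * n * np.pi / 2.0)
    C = np.array([[1.0, centre_gain, e],
                  [e, centre_gain, 1.0]], dtype=complex)
    H = np.linalg.pinv(C)
    in_phase = np.array([1.0, 1.0]) / np.sqrt(2.0)
    out_of_phase = np.array([1.0, -1.0]) / np.sqrt(2.0)
    sigma_i = np.linalg.norm(H @ in_phase)      # gain for in-phase signals
    sigma_o = np.linalg.norm(H @ out_of_phase)  # gain for out-of-phase signals
    return sigma_o, sigma_i


for gain in (1.0, np.sqrt(2.0)):
    s_o, s_i = three_channel_sigmas(2, gain)
    print(f"centre gain {gain:.3f}: "
          f"sigma_o {20 * np.log10(s_o):.2f} dB, sigma_i {20 * np.log10(s_i):.2f} dB")
```

Under these assumptions, unit centre gain leaves σi about 3 dB above σo at n=2, while a gain of √2 (an increase of about 3 dB) makes the two singular values equal.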
The singular value σi at n=0, 4, 8, . . . is always smaller than that at n=2, 6, 10, . . . where all three transducers can contribute to the reproduction of the in-phase component. The 2-norm of H (∥H∥) and the two singular values σo and σi of the 3 channel OSD with respect to frequency are illustrated in
The three channel OSD requires, for the transmission of the left and right channels, monopole type transducers whose position varies substantially continuously as frequency varies, similar to the case with the two channel OSD. This may, for example, be realised by exciting a substantially triangular shaped plate whose width varies along its length. The requirement of such a transducer is that a certain frequency or a certain range of frequencies of vibration is excited most at a particular position having a certain width such that sound of that frequency is radiated mostly from that position (
From Eq. (7), the range of source direction is given by the frequency range of interest as can be seen from
Eq. (7) can also be rewritten in terms of frequency as
The smallest value of n gives the lowest frequency limit for a given source direction. Since sin θ≦1,
i.e., the physically maximum source azimuth of θL=θR=90° gives the low frequency limit, fl, associated with this principle. A smaller value of n gives a lower low frequency limit, so the system given by n=2 is normally the most useful among those with n=2, 6, 10, . . . . The low frequency limit given by n=2 for a system designed for an average human is about fl=700 Hz, which is higher than that for the two channel OSD, where it is about 350 Hz. Below the low frequency limit of the three channel OSD, the performance gradually approaches that of the two channel OSD, becoming identical below the low frequency limit of the two channel OSD.
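The quoted low frequency limits can be checked approximately with the following sketch. Eq. (7) is not reproduced in this text, so the sketch ASSUMES the relation f = n·c0/(4·Δr·sin θ) and sets sin θ = 1 for the maximum azimuth of 90°. The equivalent ear distance of 0.24 m is a hypothetical value for an average listener.

```python
C0 = 340.0  # speed of sound in m/s, as given in the text


def low_frequency_limit(n, delta_r, c0=C0):
    """ASSUMED relation f = n * c0 / (4 * delta_r * sin(theta)) with sin(theta) = 1."""
    return n * c0 / (4.0 * delta_r)


delta_r = 0.24                              # hypothetical equivalent ear distance in metres
f_three = low_frequency_limit(2, delta_r)   # three channel OSD, n = 2
f_two = low_frequency_limit(1, delta_r)     # two channel OSD, n = 1
print(f"three channel: {f_three:.0f} Hz, two channel: {f_two:.0f} Hz")
```

Under these assumptions the limits come out close to the values of about 700 Hz and about 350 Hz stated above, the three channel limit being twice the two channel limit.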
In
The fundamental behaviour is the same for the more realistic case where various other factors such as the Head Related Transfer Function come into effect as in the case with the two channel OSD.
The discretisation of the Optimal Source Distribution can also be used for the three channel OSD in a similar way to the two channel case. In practice, whilst a monopole transducer whose position varies continuously as a function of frequency may not be easily available, it is possible to realise a practical system based on the underlying principle by discretising the transducer span. With a given span, the frequency region where the amplification is relatively small and the plant matrix C is well conditioned is relatively wide around the optimal frequency.
Therefore, by allowing n to have some width, say ±v(0<v<2), a certain transducer span can nevertheless be allocated to cover a certain range of frequencies where control performance and robustness of the system is still reasonably good (
The difference of the slope around the ideal frequency/span relationship has advantages here again in many ways. For the same given tolerance width of n, the error will be much smaller than that in the two channel OSD. So the same level of discretisation gives a better approximation to the ideal case for the three channel OSD. For the same level of approximation, the discretisation can be coarser hence saving resources. The maximum width of n, which is the maximum allowance for v, becomes twice that in the two channel OSD, i.e. 0<v<2. In general, the performance of the discretised three channel OSD is much better due to the fact that the valley in
The condition number for the case shown in
Reference will now be made to
Turning initially to
With reference to
A new binaural reproduction system has been described which overcomes the fundamental problems with system inversion by utilising three channels of transducers with variable position with respect to frequency.
This system can most easily be realised in practice by discretising the theoretical continuously variable transducer span, which results in a multi-way sound control system.
The three channel OSD arrangement finds application in numerous ways and in particular in the field of home audio. A particularly advantageous implementation is in the context of the transducers of portable media devices, such as mobile telephones and portable gaming devices, and so enhances the listener's experience of sound emitted thereby. Some portable media devices (such as MP3 players) are capable of being interfaced with a separate speaker arrangement (sometimes known as a docking station). Such speaker arrangements would benefit from being adapted to implement the three channel OSD arrangement.
Number | Date | Country | Kind |
---|---|---|---|
0712998.4 | Jul 2007 | GB | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB2008/002310 | 7/4/2008 | WO | 00 | 4/15/2010 |