1. Field of the Invention
The present invention relates to virtual speaker sound systems, and more particularly, to digital signal processing and speaker arrays to render rear surround channels.
2. Related Art
Typically, systems that play back surround sound with only a few speakers have employed spatial enhancement techniques. Spatial enhancement techniques that allow surround sound to be played back from a few loudspeakers arranged in front of the listener are presently available from many different vendors. Examples of such applications include 3D sound reproduction in home theatre systems where no rear speakers need to be installed, and surround movie and computer game rendering using small transducers integrated into multimedia monitors or laptops. Usually, the listening experience is less than compelling, because problems arise such as (i) very narrow sweet spots that do not even allow larger head movements, (ii) strong imaging and tonal distortion off axis, and (iii) phasiness and ear pressure felt when listeners turn their heads.
One approach for providing surround sound with only a few speakers employs multiway crosstalk canceller methods during the spatial enhancements. However, this approach requires high order inverse filter matrices with the aim to generate exact ear signals based on accurate head models, which results in degraded sound quality off axis where the listener's head is not at the exact intended position.
A signal processing approach has also been applied where a conventional crosstalk canceller circuit is used prior to crossover filters that connect to two pairs of transducers. This approach has limited success because the crosstalk canceller filters are not optimized for either of the transducer pairs.
Accordingly, a need exists for a speaker array that enables virtual surround rendering and that improves the playing back of surround sound. In particular, it is desirable to improve both the robustness and off-axis coloration of the virtual surround sound.
In view of the above, a digital signal processor is provided that processes a stereo or surround sound audio signal to render virtual surround. The process uses only speakers arranged in front of a listener and results in virtual surround sound that is robust to head movements and has low off-axis coloration. The digital signal processor renders rear surround channels to a speaker array, and extends the width and depth of the stereo front channels, by employing crossover circuits with first order head-related filters, an upmixing matrix and an array of delay lines to generate early reflections. It is to be understood that the features mentioned above and those yet to be explained below may be used not only in the respective combinations indicated but also in other combinations or in isolation without departing from the scope of the invention.
Other devices, apparatus, systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The description below may be better understood by reference to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the figures, like reference numerals designate corresponding parts throughout the different views.
It is to be understood that the following description of various examples is given only for the purpose of illustration and is not to be taken in a limiting sense. The partitioning of examples in function blocks, modules or units shown in the drawings is not to be construed as indicating that these function blocks, modules or units are necessarily implemented as physically separate units. Functional blocks, modules or units shown or described may be implemented as separate units, circuits, chips, functions, modules, or circuit elements. One or more functional blocks or units may also be implemented in a common circuit, chip, circuit element or unit.
In
Turning to
In
The center channel C 304 is added to each of the left and right input channels L 302 and R 306 via an attenuation factor h1. Typically, h1 may be set to h1=0.4, which is approximately −8 dB, in the current example. The summed signals are connected to the inputs IN_L and IN_R (output of combiners 308 and 310) of the 2-in 4-out upmixer 312, which generates main stereo outputs Out_L 314, Out_R 316, and surround outputs Surr_Out_L 318, Surr_Out_R 320. The main outputs are directly added to the signals that feed the outer transducer pair 104 and 110 via two summing nodes or combiners 338 and 340. The surround outputs of the 2-in 4-out upmixer 312 are each multiplied by a factor h3 and added by combiners 324 and 328 to the surround input channels LS 322 and RS 326, which are each multiplied by a scaling factor h2. The resulting summed input signals are connected to the inputs of the surround renderer 302, which generates four signals: a first pair A_L 330 and A_R 332, connected to the outer transducer pair 104 and 110 via summing nodes (combiners 338 and 340), and a second pair B_L 334 and B_R 336, connected to the inner transducer pair 106 and 108.
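By way of illustration only, the per-sample mixing described above may be sketched as follows. The function names are hypothetical, the upmixer outputs are taken as given inputs because the upmixer itself is described separately, and the default values of h2 and h3 are the typical values quoted in this description.

```python
import math

def front_mix(l, c, r, h1=0.4):
    """Add the attenuated center channel to the left and right inputs.

    Returns (IN_L, IN_R), the inputs of the 2-in 4-out upmixer.
    """
    return l + h1 * c, r + h1 * c

def renderer_inputs(ls, rs, surr_out_l, surr_out_r, h2=2.3, h3=1.9):
    """Combine the scaled surround inputs with the scaled upmixer surround outputs."""
    return h2 * ls + h3 * surr_out_l, h2 * rs + h3 * surr_out_r

def to_db(gain):
    """Convert a positive linear gain factor to decibels."""
    return 20.0 * math.log10(gain)

if __name__ == "__main__":
    print(front_mix(0.0, 1.0, 0.0))          # center-only input reaches both sides equally
    print(round(to_db(0.4), 2), "dB")        # attenuation applied to the center channel
```

The sketch operates on single samples; applying it to whole buffers is a matter of calling it per sample or vectorizing it.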
Typical values for the scaling factors employed in the 2-in 4-out mixer 312 may be h2=2.3, h3=1.9, but other values may be used in other implementations depending on the application and the taste of the user. In the case of a computer monitor application, the outer transducers 104 and 110 may be spaced apart by (40 . . . 50) cm, and the inner pair 106 and 108 by (6 . . . 10) cm. This corresponds to angular spans relative to the listener's head of +/−(14 . . . 17)° for the outer pair 104 and 110, and +/−(2 . . . 4)° for the inner pair 106 and 108, at a listening distance of 80 cm. In a home theatre system, where the outer transducers 104 and 110 are located at the edges of a large TV screen, the outer transducers 104 and 110 may be spaced apart by, for example, 150 cm, and the inner transducers 106 and 108 by, for example, 30 cm, leading to similar angular spans at a listening distance of 250-300 cm. The design parameters primarily depend on the angular spans and therefore may stay the same for both example applications.
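The angular spans quoted above follow from simple geometry. The following sketch, given only as an illustration, assumes a listener centered on the axis of a symmetric transducer pair; the function name is hypothetical.

```python
import math

def half_angle_deg(spacing_cm, distance_cm):
    """Angle, in degrees, between the center axis and one transducer of a
    symmetric pair, seen from a listener on the axis at the given distance."""
    return math.degrees(math.atan2(spacing_cm / 2.0, distance_cm))

if __name__ == "__main__":
    cases = [
        ("monitor, outer pair", 40, 80), ("monitor, outer pair", 50, 80),
        ("monitor, inner pair", 6, 80), ("monitor, inner pair", 10, 80),
        ("home theatre, outer pair", 150, 250), ("home theatre, outer pair", 150, 300),
        ("home theatre, inner pair", 30, 250), ("home theatre, inner pair", 30, 300),
    ]
    for label, spacing, distance in cases:
        angle = half_angle_deg(spacing, distance)
        print(f"{label}: {spacing} cm at {distance} cm -> +/-{angle:.1f} deg")
```

Because only the ratio of half-spacing to distance enters the calculation, a 150 cm pair at 300 cm subtends the same angle as a 40 cm pair at 80 cm.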
Turning to
The low-pass filtered signal pair then passes through a non-recursive (first order) crosstalk-canceller section with cross paths modeled by delay sections HD 414 and 416, representing a pure delay of d1 samples, followed by gains g2 418, respectively. The cross-path outputs are subtracted from the respective direct paths by combiners 420 and 422, thereby cancelling signals that reach the left ear from the right transducer, and vice versa. At low frequencies below 700 Hz, inter-aural time differences (ITD) are prominent localization cues, whereas in the frequency range above 700 Hz, inter-aural level differences (ILD) become more dominant. At the specified listening angles, the path differences in the crosstalk paths correspond to delay values of d1=(4 . . . 8) samples at a sampling rate of 48 kHz.
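As a minimal sketch of the non-recursive low-band section described above, each output may be formed as the direct signal minus a delayed, scaled copy of the opposite channel. The function names are hypothetical, the routing of the cross paths (left output receives the delayed right signal, and vice versa) is an assumption drawn from the wording above, and the gain g1 is not modeled here.

```python
def delay(signal, n):
    """Delay a sequence by n samples, zero-padding the start and keeping the length."""
    return ([0.0] * n + list(signal))[:len(signal)]

def low_band_canceller(lp_left, lp_right, d1=4, g2=0.7):
    """First order crosstalk canceller for the low-pass filtered signal pair.

    Each cross path is a pure delay of d1 samples followed by a gain g2,
    and its output is subtracted from the opposite direct path.
    """
    cross_from_right = delay(lp_right, d1)
    cross_from_left = delay(lp_left, d1)
    out_left = [d - g2 * c for d, c in zip(lp_left, cross_from_right)]
    out_right = [d - g2 * c for d, c in zip(lp_right, cross_from_left)]
    return out_left, out_right

if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    silence = [0.0] * 6
    left, right = low_band_canceller(impulse, silence)
    print(left)   # direct path passes unchanged
    print(right)  # cancellation signal appears d1 samples later, scaled by -g2
```

Because the structure is non-recursive, an impulse in one channel produces exactly one cancellation pulse in the other channel.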
The high-pass filtered signal pair is processed by a second crosstalk-canceller section with first order lowpass filters HC 424 and 426 in the cross paths, which are characterized by a −3 dB cutoff frequency ft 428. Empirically determined values for HC 424 and 426 are ft=(3 . . . 4) kHz in the current implementation. No further delay or gain parameters are required in this section. The output of HC 424 is subtracted from the output of HP 408 by combiner 430 and results in output signal B_R. Similarly, the output of HC 426 is subtracted from the output of HP 406 by combiner 428 and results in output signal B_L.
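The high-band section may be sketched in the same manner. The one-pole recursion used below is one common way to approximate a first order lowpass with cutoff ft and is an assumption, as are the function names, the 48 kHz sampling rate, and the assumption that each cross-path filter is fed from the opposite high-pass output.

```python
import math

def one_pole_lowpass(signal, ft_hz=3500.0, fs_hz=48000.0):
    """First order lowpass y[n] = y[n-1] + a * (x[n] - y[n-1]).

    The coefficient a = 1 - exp(-2*pi*ft/fs) is a common approximation that
    places the -3 dB point near ft; it is not taken from the description.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * ft_hz / fs_hz)
    y, out = 0.0, []
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out

def high_band_canceller(hp_left, hp_right, ft_hz=3500.0, fs_hz=48000.0):
    """Subtract the lowpass-filtered opposite channel from each direct path."""
    cross_from_right = one_pole_lowpass(hp_right, ft_hz, fs_hz)
    cross_from_left = one_pole_lowpass(hp_left, ft_hz, fs_hz)
    b_left = [d - c for d, c in zip(hp_left, cross_from_right)]
    b_right = [d - c for d, c in zip(hp_right, cross_from_left)]
    return b_left, b_right

if __name__ == "__main__":
    b_left, b_right = high_band_canceller([1.0, 0.0, 0.0, 0.0], [0.0] * 4)
    print(b_left)
    print(b_right)
```

For identical signals on both inputs, the section behaves as 1−HC, so content well below ft is largely cancelled while content well above ft passes almost unchanged.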
With the described two-way approach, first order head-related models have been used that resemble ITD and ILD localization cues in the respective frequency bands. Thereby, high order head-related filters as taught in the prior art have been avoided, resulting in less off-axis coloration, phasiness and unpleasant feeling of ear pressure.
A useful range for the cross path gain factor is typically g2=(0.3 . . . 0.9). Values close to one result in maximum separation (virtual images along the axis across the listener's ears) but require maximum bass boost, the amount of which can be set by choice of gain factor g1. A typical design example for a computer monitor system may be:
LP, HP=second order BW sections, fc=800 Hz
g1=−3.0,
HD=frequency response of delay d1=4 samples,
g2=0.7,
HC=1st order lowpass, ft=3.5 kHz.
The frequency response at the center position, with mono input, is
g1·LP·(1−g2·HD)+HP·(1−HC).
At an off-axis position, an additional path length difference HD1 between left and right outer transducers leads to the frequency response formula:
g1·LP·(1−g2·HD)·(1+HD1)/2+HP·(1−HC).
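The two response formulas may be evaluated numerically as sketched below, purely as an illustration. Several assumptions are made: g1 is treated as a linear factor, LP and HP are modeled by analog second order Butterworth prototypes rather than the actual digital filters, HC is modeled as an analog first order lowpass, HD1 is modeled as a pure delay given in samples, the off-axis formula is taken to carry the same (1−g2·HD) factor as the center formula, and the sampling rate is 48 kHz. All function names are hypothetical.

```python
import cmath
import math

FS = 48000.0  # sampling rate in Hz (assumed)

def butterworth2(f, fc):
    """Analog second order Butterworth lowpass and highpass responses at f."""
    s = 1j * f / fc
    den = s * s + math.sqrt(2.0) * s + 1.0
    return 1.0 / den, (s * s) / den

def response(f, g1=-3.0, g2=0.7, d1=4, fc=800.0, ft=3500.0, off_axis_delay=None):
    """Mono-input response g1*LP*(1 - g2*HD) + HP*(1 - HC).

    If off_axis_delay (in samples) is given, the low band is additionally
    multiplied by (1 + HD1) / 2, as in the off-axis formula.
    """
    lp, hp = butterworth2(f, fc)
    hd = cmath.exp(-2j * math.pi * f * d1 / FS)
    hc = 1.0 / (1.0 + 1j * f / ft)
    low = g1 * lp * (1.0 - g2 * hd)
    if off_axis_delay is not None:
        hd1 = cmath.exp(-2j * math.pi * f * off_axis_delay / FS)
        low *= (1.0 + hd1) / 2.0
    return low + hp * (1.0 - hc)

if __name__ == "__main__":
    for f in (100.0, 800.0, 3500.0, 10000.0):
        on = 20.0 * math.log10(abs(response(f)))
        off = 20.0 * math.log10(abs(response(f, off_axis_delay=6)))
        print(f"{f:7.0f} Hz: center {on:6.2f} dB, off-axis {off:6.2f} dB")
```

The off-axis delay of 6 samples in the usage example is an arbitrary illustrative value and is not taken from this description.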
In
Turning to
Stereo width adjustment may be accomplished in the stereo width adjustment section 601 with two linear 2×2 matrices with negative cross coefficients b1 602 for the main stereo pair Out_L 314, Out_R 316, and b2 604 for the virtual surround pair Surr_Out_L 318, Surr_Out_R 320, respectively. The useful range of each parameter is the interval [0 . . . 1], with maximum separation for values close to one. Chosen values for the current example implementation are b1=0.04, b2=0.33.
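A 2×2 matrix with negative cross coefficients may be sketched as follows. The unity diagonal (direct-path gain of one) is an assumption, and the function name is hypothetical.

```python
def widen(left, right, b):
    """Apply the matrix [[1, -b], [-b, 1]] to a stereo pair of sample sequences."""
    out_left = [l - b * r for l, r in zip(left, right)]
    out_right = [r - b * l for l, r in zip(left, right)]
    return out_left, out_right

if __name__ == "__main__":
    # Identical (mono) content is attenuated by (1 - b) ...
    print(widen([1.0], [1.0], 0.33))
    # ... while opposite-polarity (side) content is boosted by (1 + b).
    print(widen([1.0], [-1.0], 0.33))
```

With b approaching one, common content is almost fully removed and only the difference between the channels remains, which corresponds to the maximum separation mentioned above.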
Distance of the perceived sound stage may be increased beyond the speaker base by the addition of discrete reflected energy in the distance adjustment section 605. The higher the amplitude of reflections and the closer the reflections are to the direct sound (smaller delay values), the more distant the sound may be perceived. In the current example, four reflections (delayed replicas of the direct sound) have been created and added to the four outputs of the 2-in 4-out upmixer 312. Parameters are the four delay values (d1 606, d2 608, d3 610, and d4 612) and their respective amplitudes (c1 614, c2 616, c3 618, c4 620). Sufficient decorrelation between the reflected signals may be achieved by assigning random values, thereby avoiding phantom imaging (merging of two or more reflections into one) and excessive coloration. An example parameter set for the current implementation may be c1=0.62, c2=0.50, c3=0.71, c4=0.5 (corresponding to −4 dB, −6 dB, −3 dB and −5 dB, respectively) and d1=564, d2=494, d3=776, d4=917 samples.
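The addition of one discrete reflection per output may be sketched as below. It is assumed here, for illustration only, that each reflection is a delayed and scaled copy of the same channel's direct signal and that the four parameter pairs are applied to the four upmixer outputs in the order listed; the description does not fix either point. The function and constant names are hypothetical.

```python
# (delay in samples, amplitude) pairs from the example parameter set
REFLECTIONS = [(564, 0.62), (494, 0.50), (776, 0.71), (917, 0.5)]

def add_reflection(direct, delay_samples, amplitude):
    """Return the direct signal plus one delayed, scaled replica of itself."""
    out = list(direct)
    for n in range(delay_samples, len(direct)):
        out[n] += amplitude * direct[n - delay_samples]
    return out

def render_reflections(channels, params=REFLECTIONS):
    """Add one reflection to each channel, pairing channels and parameters in order."""
    return [add_reflection(ch, d, c) for ch, (d, c) in zip(channels, params)]

if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 999
    for out, (d, c) in zip(render_reflections([impulse] * 4), REFLECTIONS):
        print(f"reflection at sample {d}: {out[d]}")
```

Using a different delay for each output keeps the four reflections from coinciding in time, in line with the decorrelation aim stated above.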
Further, a pair of first order high-shelving filters 622 and 624 may be inserted into the reflection path to simulate natural wall absorption and attenuate transients in the simulated ambient sound field. Typical parameters for the high-shelving filters 622 and 624 are depicted in
Turning to
The methods described with respect to
It will be understood, and is appreciated by persons skilled in the art, that one or more processes, sub-processes, or process steps or modules described in connection with
The foregoing description of implementations has been presented for purposes of illustration and description. It is not exhaustive and does not limit the claimed inventions to the precise form disclosed. Modifications and variations are possible in light of the above description or may be acquired from practicing examples of the invention. The claims and their equivalents define the scope of the invention.