1. Technical Field
This disclosure relates to the creation of a stereo signal with enhanced perceptual quality, or spatialization; and in particular, to how a signal represented by a mid-signal and side-signals can be processed to create a stereo signal with improved characteristics.
2. Background
Recently, it has become feasible to store and play back large amounts of music on portable devices. As a consequence, such devices have become very popular, especially because the musical content can be played back via headphones anywhere. Normally, the content to be played back has been mixed in stereo, i.e., to two independent channels. However, the production has been performed for playback via loudspeakers, using common two-channel stereo equipment. That is, the stereo channels have been mixed in a music studio so as to provide maximum reproduction quality and, as far as possible, the spatial perception of the original auditory scene using two loudspeakers. However, listening to such stereo recordings via headphones leads to in-head localization of the sound: virtual sound sources, which are meant to be localized somewhere between the two loudspeakers, are localized inside the listener's head due to psychoacoustic properties of the human auditory system. This is the case because, with headphones, no crosstalk and no reflections are perceived.
Several methods and devices have been proposed to address this problem by processing the left and right channels prior to playback via headphones. These approaches try to stimulate the human auditory system to localize the sound sources outside the head when playing back music with headphones by simulating the listening situation of loudspeakers in a room. For example, a cross-talk sound path and the reflections of the room's walls may be artificially added to the signal. However, these approaches, for example the use of head-related transfer functions, are computationally very complex. When reasonably good-sounding results are to be achieved with reduced complexity, those models are reduced, for example, to cross-talk and, in some cases, to a very small number of wall reflections, which can be implemented by low-order filtering.
Often, stereo signals are also transformed into a mid-side representation containing a mid-signal (sum-signal) and a side-signal (difference-signal). The sum-signal is formed by summing the right channel and the left channel, and the difference-signal is formed by subtracting the right channel from the left channel. In most musical stereo signals, the virtual sound sources of highest relevance are those localized in front of the listener, since these commonly represent the leading voice or the leading instrument in the recording. As these sound sources are intended to be localized between the loudspeakers of a two-channel setup, these signal components are present in the left channel as well as in the right channel. Therefore, these important signals are mainly represented by the sum-signal (mid-signal) and hardly at all by the difference-signal (side-signal). Consequently, when attempting to achieve localization outside the listener's head, such a mid-side representation has to be processed with great care.
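The sum and difference described above can be illustrated with a minimal sketch. The function names and the factor of 0.5 used to invert the transform are illustrative assumptions; the disclosure specifies only the sum and the difference.

```python
def to_mid_side(left, right):
    """Return (mid, side) with mid = left + right and side = left - right."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert to_mid_side; the factor 0.5 undoes the unnormalized sum and difference."""
    left = [0.5 * (m + s) for m, s in zip(mid, side)]
    right = [0.5 * (m - s) for m, s in zip(mid, side)]
    return left, right


# A source panned to the center is identical in both channels,
# so it appears only in the mid-signal and the side-signal is zero.
center = [0.5, -0.25, 0.125]
mid, side = to_mid_side(center, center)
```

This shows why a centered lead voice is carried almost entirely by the mid-signal.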
Given the conventional generation of stereo-signals and the changed playback habits, the need exists to provide a concept for the generation of a stereo signal with enhanced spatialization that can be efficiently implemented.
The method disclosed here can be logically divided into three parts. In the first part, the left and right signals are decorrelated to increase the differences between them. The rest of the method enhances those differences, so the more difference there is to begin with, the better the rest of the method will work. The second part of the method converts the left and right signals to a decorrelated side signal. The left, right, and side signals are processed separately to further enhance the left and right signals compared to the side signal. Finally, in the third part, the left, right, and side signals are mixed to recreate the left and right stereo signals.
First, the left decorrelator block (110) and the right decorrelator block (120) increase the differences between the left and right signals so that the processing that follows is more effective. The decorrelator blocks (110, 120) are preferably sixth-order infinite-impulse response (IIR) all-pass filters with arbitrary group delay. By making the group delay characteristics different between the left decorrelator (110) and the right decorrelator (120), the difference between the left and right signals is enhanced as the signals are passed through the decorrelators (110, 120). Since the filters are all-pass, the frequency responses of the left and right signals are not changed.
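As a rough sketch of such a decorrelator, a sixth-order all-pass filter can be built as a cascade of six first-order all-pass sections. The cascade structure, the restriction to real coefficients, and the coefficient values below are assumptions chosen for illustration; the disclosure does not specify them.

```python
import cmath


def allpass_section(x, a):
    """First-order all-pass H(z) = (a + z^-1) / (1 + a z^-1), stable for |a| < 1."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def decorrelate(x, coeffs):
    """Cascade one all-pass section per coefficient (six coefficients give sixth order)."""
    for a in coeffs:
        x = allpass_section(x, a)
    return x


def magnitude(coeffs, w):
    """Magnitude response of the cascade at normalized frequency w (radians/sample)."""
    z_inv = cmath.exp(-1j * w)
    h = 1.0 + 0j
    for a in coeffs:
        h *= (a + z_inv) / (1.0 + a * z_inv)
    return abs(h)


# Hypothetical coefficient sets: different values give the two channels
# different group delay characteristics.
LEFT_COEFFS = (0.9, -0.7, 0.5, -0.3, 0.2, -0.1)
RIGHT_COEFFS = (-0.8, 0.6, -0.4, 0.35, -0.25, 0.15)

impulse = [1.0] + [0.0] * 31
left_out = decorrelate(impulse, LEFT_COEFFS)
right_out = decorrelate(impulse, RIGHT_COEFFS)
```

The magnitude response of each cascade is unity at every frequency, while the two impulse responses differ, which is the property the decorrelators (110, 120) rely on.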
The IIR filters provide an efficient decorrelation filter, but the decorrelation can be improved by using finite-impulse-response (FIR) filters. The phase characteristic can be better controlled in FIR filters, so that a perfect decorrelation can be achieved if the filters are long enough and carefully designed. However, even relatively short decorrelation filters, on the order of 15 taps, can be used to achieve very good results.
After the left and right signals are decorrelated, the left and right and side signals are calculated as shown in
After the left and right and side signals are created, they are independently gained. The left and right signals would typically be gained more than the side signal. This enhances the left and right signals compared to the side signal, making the stereo content more easily heard.
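The independent gain stage might be sketched as follows. The decibel values are hypothetical and chosen only to show the left and right signals receiving more gain than the side signal.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(signal, gain_db):
    """Scale every sample by the linear equivalent of gain_db."""
    g = db_to_linear(gain_db)
    return [g * v for v in signal]


# Hypothetical settings: left and right are gained more than the side signal.
LEFT_GAIN_DB = 3.0
RIGHT_GAIN_DB = 3.0
SIDE_GAIN_DB = 0.0
```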
Next, the signals are colored independently in color filters, preferably using second-order IIR filters. The left and right signals would typically have their high frequencies enhanced using shelf filters or even high-pass filters (190, 210). The side signal would typically have its low frequencies enhanced using a shelf or low-pass filter (200). The ear determines spatial location using high frequencies, so enhancing those frequencies on the left and right signals should increase the spatial sense. Enhancing the low frequencies of the side signal ensures that the bass response of the complete audio is maintained.
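One possible realization of the second-order shelf filters is sketched below using the widely published Audio EQ Cookbook biquad formulas. The choice of those formulas, the corner frequencies (4 kHz and 200 Hz), the 6 dB boost, and the 44.1 kHz sample rate are all assumptions for illustration, not values taken from this disclosure.

```python
import cmath
import math


def shelf_coeffs(kind, fs, f0, gain_db, slope=1.0):
    """Return normalized biquad coefficients (b, a) for a 'high' or 'low' shelf."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    cw = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(A) * alpha
    if kind == "high":
        b = [A * ((A + 1) + (A - 1) * cw + k),
             -2 * A * ((A - 1) + (A + 1) * cw),
             A * ((A + 1) + (A - 1) * cw - k)]
        a = [(A + 1) - (A - 1) * cw + k,
             2 * ((A - 1) - (A + 1) * cw),
             (A + 1) - (A - 1) * cw - k]
    elif kind == "low":
        b = [A * ((A + 1) - (A - 1) * cw + k),
             2 * A * ((A - 1) - (A + 1) * cw),
             A * ((A + 1) - (A - 1) * cw - k)]
        a = [(A + 1) + (A - 1) * cw + k,
             -2 * ((A - 1) + (A + 1) * cw),
             (A + 1) + (A - 1) * cw - k]
    else:
        raise ValueError("kind must be 'high' or 'low'")
    return [c / a[0] for c in b], [c / a[0] for c in a]


def biquad(x, b, a):
    """Direct-form I second-order IIR filter; a[0] is assumed to be 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def gain_at(b, a, w):
    """Magnitude response at normalized frequency w (radians/sample)."""
    z1 = cmath.exp(-1j * w)
    z2 = z1 * z1
    return abs((b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2))


# Hypothetical settings: boost highs on left/right, boost lows on the side signal.
hi_b, hi_a = shelf_coeffs("high", 44100.0, 4000.0, 6.0)
lo_b, lo_a = shelf_coeffs("low", 44100.0, 200.0, 6.0)
```

With these settings the high shelf leaves low frequencies at unity gain and raises the top of the band by the requested amount, while the low shelf does the reverse.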
Finally, a delay (220, 230) can be optionally added to the left and right signals. These delays (220, 230) are meant to emulate the delay caused when sound from the side of the head arrives at each ear at different times. The delay would typically be about 0.5 milliseconds.
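To make the delay concrete, the sketch below converts the 0.5 millisecond figure into a whole number of samples and applies it as a simple sample delay. The sample rates and the rounding to the nearest sample are assumptions; a fractional-delay filter could be used instead.

```python
def delay_in_samples(delay_seconds, sample_rate_hz):
    """Round a delay in seconds to the nearest whole number of samples."""
    return int(round(delay_seconds * sample_rate_hz))


def delay_signal(x, n):
    """Delay x by n samples, padding with zeros and keeping the original length."""
    if n <= 0:
        return list(x)
    return ([0.0] * n + list(x))[:len(x)]


# About 0.5 ms at two common sample rates (assumed values).
n_44k = delay_in_samples(0.0005, 44100)
n_48k = delay_in_samples(0.0005, 48000)
```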
After the left and right and side signals have been thus enhanced, the left and right stereo channels are recreated using: L=L+S and R=R−S in fourth and fifth adders (240, 250) as shown in
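The recombination L=L+S and R=R−S can be written directly, as in this minimal sketch. The function name is illustrative, and the inputs are assumed to be the already gained, colored, and optionally delayed signals.

```python
def recombine(left, right, side):
    """Recreate the stereo pair: left_out = left + side, right_out = right - side."""
    left_out = [l + s for l, s in zip(left, side)]
    right_out = [r - s for r, s in zip(right, side)]
    return left_out, right_out
```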
In a second embodiment (300), the components of the circuit are re-arranged as shown in
In the second embodiment (300), the output of the right decorrelator (320) is subtracted in adder (330) from the output of the left decorrelator (310). The side signal S thus created is passed through a first color filter (340) as described above in the discussion of the first embodiment (100). The side signal is passed through a delay block (350) for the purposes also discussed above and independently gained in a first gain block (360). The left and right channels, bypassing decorrelation, are passed through second and third color filters (370, 380) as described in the discussion of the first embodiment (100) and independently gained in second and third gain blocks (390, 400).
The side signal is added in a second adder (430) to the separately colored and gained left signal to recreate the left stereo channel. The side signal is inverted in an inverter (420) and added in a third adder (440) to the separately colored and gained right signal to recreate the right stereo channel.
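The signal flow of the second embodiment (300) might be sketched as below. Each processing stage is passed in as a callable so that the wiring is visible; the identity defaults, parameter names, and unit gains are placeholders assumed for illustration, not part of the disclosure.

```python
def identity(x):
    """Placeholder stage that passes the signal through unchanged."""
    return list(x)


def second_embodiment(left, right,
                      decorr_l=identity, decorr_r=identity,
                      color_side=identity, delay_side=identity,
                      color_l=identity, color_r=identity,
                      gain_side=1.0, gain_l=1.0, gain_r=1.0):
    # Decorrelators (310, 320) feed adder (330): side = left_dec - right_dec.
    dl, dr = decorr_l(left), decorr_r(right)
    side = [a - b for a, b in zip(dl, dr)]
    # Side path: color filter (340), delay block (350), gain block (360).
    side = [gain_side * s for s in delay_side(color_side(side))]
    # Left and right bypass decorrelation: color filters (370, 380),
    # then gain blocks (390, 400).
    l = [gain_l * v for v in color_l(left)]
    r = [gain_r * v for v in color_r(right)]
    # Adder (430) forms the left output; inverter (420) and adder (440)
    # form the right output.
    inverted = [-s for s in side]
    left_out = [a + s for a, s in zip(l, side)]
    right_out = [a + s for a, s in zip(r, inverted)]
    return left_out, right_out
```

With all stages left as identity and unit gains, the outputs reduce to 2L−R and 2R−L, which makes the widening effect of the added side signal easy to see.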
The gain of the side signal can be automatically controlled by an optional dynamic processing (DP) block (450) that maintains the gain relationship between the main signal and the side signal. The DP block analyzes the energy of the side signal and the main signal and varies the gain of the side signal accordingly. The DP block works like an automatic gain control (AGC), increasing the gain of the side signal when its energy is low and decreasing the gain when its energy is high. The DP block allows the spatialization method to work with many different types of music.
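A block-based sketch of such AGC-like behavior follows. The RMS energy measure, the target side-to-main ratio, and the gain limits are hypothetical choices; the disclosure does not specify how the energies are measured or mapped to a gain.

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not block:
        return 0.0
    return math.sqrt(sum(v * v for v in block) / len(block))


def side_gain(main_block, side_block, target_ratio=0.5,
              min_gain=0.25, max_gain=4.0, eps=1e-12):
    """Gain that steers the side level toward target_ratio times the main level.

    A quiet side signal yields a larger gain and a loud one a smaller gain;
    the result is clamped to [min_gain, max_gain]. All defaults are assumed.
    """
    wanted = target_ratio * rms(main_block) / max(rms(side_block), eps)
    return min(max_gain, max(min_gain, wanted))
```

In practice the per-block gain would usually be smoothed over time to avoid audible steps; that smoothing is omitted here for brevity.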
The reader will see that the methods disclosed can be tuned to specific applications by varying the gain, filtering and delay characteristics of the various components shown, all of which implementations are intended to be covered by the claims.
None of the description in this application should be read as implying that any particular element, step, or function is an essential element which must be included in the claim scope; the scope of patented subject matter is defined only by the allowed claims. Moreover, none of these claims are intended to invoke paragraph six of 35 U.S.C. Section 112 unless the exact words “means for” are used, followed by a gerund. The claims as filed are intended to be as comprehensive as possible, and no subject matter is intentionally relinquished, dedicated, or abandoned.