The technical field of this invention is stereo synthesis from monaural input signals.
When listening to sounds from a monaural source, widening the sound image over the entire frequency range using a stereo synthesizer does not always satisfy the listener's preference. For example, the vocal of a song is often best localized at the center. Conventional stereo synthesis does not do this.
This invention uses strictly complementary linear phase FIR filters to separate the incoming audio signal into at least two frequency regions. Stereo synthesis is performed on fewer than all of these frequency regions.
This invention permits any magnitude response curve for the band separation filters. This enables selection of one or more frequency bands on which to perform stereo synthesis. This differs from conventional methods, which either widen the monaural signal over the entire frequency range or merely place the crossover frequencies at the formant frequencies of the human voice.
This invention lets a selected instrument or vocal sound be localized at the center, while the other instruments are perceived in a wider sound space.
These and other aspects of this invention are illustrated in the drawings, in which:
A monaural audio signal is perceived at the center of a listener's head in a binaural system and at the midpoint between two loudspeakers in a two-loudspeaker system. A stereo synthesizer produces a simulated stereo signal from the monaural signal so that the sound image becomes ambiguous and thus wider. This widened sound image is often preferred to a plain monaural sound image.
Much work has been done on stereo synthesizers. The technique commonly employed is to delay the monaural signal and add it to, or subtract it from, the original signal. From a digital signal processing standpoint, this is called a comb filter due to its frequency response. When the notches of the comb filters are allocated to different frequencies for the left and right channels, the outputs of the two channels become uncorrelated. This causes the sound image to be ambiguous and accordingly wider than listening to the plain monaural signal.
The comb filter solution works well for producing a wider sound image from a monaural signal. However, widening the total sound sometimes causes a problem. When listening to pop music, listeners generally expect the vocal to be localized at the center. The other instruments are expected to fill the stereophonic sound image. This preference is similar to that served by many multichannel speaker systems, which include a center speaker that centralizes human voices.
To overcome this problem, one example of this invention separates the incoming monaural signal into two frequency regions using a pair of strictly complementary (SC) linear phase finite impulse response (FIR) filters. The invention applies a comb filter stereo synthesizer to just one of the two frequency regions. This invention uses SC linear phase FIR filters because of their low computational cost. This invention does not need to implement synthesis filters that reconstruct the original signal. This invention needs to calculate only one of the filter outputs, because the other filter output can be calculated as the difference between the delayed input signal and the calculated filter output.
For the particular problem of centralizing the voice signal, the frequency separation should be achieved with band pass and band stop filters. The pass band and stop band are placed at the voice band. However, this invention is not limited to band pass and band stop filters. Any type of filter pair, such as low pass and high pass, is applicable depending on which frequency regions are desired to be inside or outside the stereo synthesis. This depends upon the instrument(s) to be centralized. This flexibility makes this invention more attractive than the prior art method, which merely places the crossover frequencies at the formant frequencies of the human voice.
Stereo synthesis is typically achieved using FIR comb filters. These comb filters are embodied by adding a delayed weighted signal to the original signal.
C0(z) = (1 + α·z^(−D))/(1 + α)
C1(z) = (1 − α·z^(−D))/(1 + α)   (1)
where: D is a delay that controls the stride of the notches of the comb; and α controls the depth of the notches, where typically 0 < α ≤ 1. The magnitude responses are given by:
|C0(e^(jω))| = |1 + α·e^(−jωD)|/(1 + α)
|C1(e^(jω))| = |1 − α·e^(−jωD)|/(1 + α)   (2)
Equation (2) shows that both filters have peaks and notches with constant stride of 2π/D. The peak of one filter is placed at the notches of the other filter and vice versa. These responses de-correlate the output channels. The sound image becomes ambiguous and thus wider.
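The comb filter pair of equation (1) amounts to a delay line and a weighted sum. The following is a minimal NumPy sketch of this idea; the function name comb_pair and the use of a delay d expressed in samples are illustrative assumptions, not part of the specification.

```python
import numpy as np

def comb_pair(x, alpha=0.7, d=352):
    """Apply the comb filter pair of equation (1) to a mono signal x.

    C0 adds a delayed, weighted copy of the signal; C1 subtracts it, so
    the peaks of one response fall on the notches of the other and the
    two outputs become largely de-correlated.
    """
    delayed = np.zeros_like(x)
    delayed[d:] = x[:len(x) - d]                 # x(n - D)
    y0 = (x + alpha * delayed) / (1.0 + alpha)   # C0(z) = (1 + a*z^-D)/(1 + a)
    y1 = (x - alpha * delayed) / (1.0 + alpha)   # C1(z) = (1 - a*z^-D)/(1 + a)
    return y0, y1

# Example: a mono noise burst, D = 352 samples (about 8 ms at 44.1 kHz)
x = np.random.randn(44100)
left, right = comb_pair(x, alpha=0.7, d=352)
```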
The equalization (EQ) filter 213, Q(z), may optionally be inserted in order to compensate for the harmony that might be distorted by the notches of the comb filters. Since EQ filter 213 does not affect the width of the sound image, but only the sound quality, it will not be described in detail.
The outputs of strictly complementary (SC) finite impulse response (FIR) filters 210 and 211 sum to a delayed replica of the input signal. For the SC pair used in this example, with H0(z) a linear phase FIR filter of order N:
H1(z) = z^(−N/2) − H0(z)   (4)
But since H0(z) is linear phase, the frequency response can be written as:
H1(e^(jω)) = e^(−jωN/2)·(1 − |H0(e^(jω))|)   (5)
From equation (5), it is clear that:
|H1(e^(jω))| = 1 − |H0(e^(jω))|   (6)
From the computational cost viewpoint, equation (4) suggests the benefit of using the SC linear phase FIR filters. The output from H0(z) can be calculated by letting h0(n) be the impulse response as follows:
y0(n) = Σ h0(k)·x(n − k),  k = 0, 1, …, N   (7)
Then the other filter output can be calculated as follows:
y1(n) = x(n − N/2) − y0(n)   (8)
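Expressed in code, equations (7) and (8) mean that only one convolution is needed per input sample. The sketch below assumes h0 is an even-order linear phase FIR impulse response and x is a floating point mono signal; the function name sc_filter_pair is illustrative.

```python
import numpy as np

def sc_filter_pair(x, h0):
    """Split x into two strictly complementary bands using one FIR filter.

    Only H0(z) is actually convolved (equation (7)); the complementary
    output is the input delayed by N/2 samples minus y0 (equation (8)),
    so H1(z) never has to be implemented.
    """
    n = len(h0) - 1                           # filter order N (assumed even)
    y0 = np.convolve(x, h0)[:len(x)]          # y0(n) = sum_k h0(k)*x(n-k)
    x_delayed = np.zeros_like(x)
    x_delayed[n // 2:] = x[:len(x) - n // 2]  # x(n - N/2)
    y1 = x_delayed - y0                       # y1(n) = x(n - N/2) - y0(n)
    return y0, y1
```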
The following describes an example stereo synthesizer according to this invention. The input was sampled at a frequency of 44.1 kHz. The first SC FIR filter is an order 64 FIR band pass filter H0(z) based on a least squares error prototype. The cutoff frequencies were chosen to be 0.5 kHz and 3 kHz. This frequency range covers the lower formant frequencies of the human voice. The complementary filter H1(z) was calculated according to equation (4).
For the comb filters: α was selected as 0.7; and D was selected as 8 ms. This delay D implies a comb filter of 352 taps.
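As a concrete illustration of this example, the following sketch designs the order 64 least squares band pass filter, forms its strict complement per equation (8), and applies the comb filter pair only to the non-voice band. The transition band widths and the use of scipy.signal.firls are assumptions made for the sketch; the specification only fixes the filter order, the 0.5 kHz and 3 kHz cutoffs, α = 0.7 and D = 8 ms.

```python
import numpy as np
from scipy.signal import firls

FS = 44100  # sampling frequency in Hz

def synthesize_stereo(x, alpha=0.7, delay_ms=8.0):
    """Mono-to-stereo sketch following the example in the text.

    The voice band (0.5-3 kHz) is passed identically to both channels so
    it stays centered; only the complementary band is widened by the
    comb filter pair of equation (1).
    """
    # Order 64 (65 tap) least squares band pass H0(z) for the voice band.
    h0 = firls(65, [0, 400, 500, 3000, 3100, FS / 2],
               [0, 0, 1, 1, 0, 0], fs=FS)

    # Strictly complementary split, equations (7) and (8).
    half = (len(h0) - 1) // 2
    voice = np.convolve(x, h0)[:len(x)]
    x_delayed = np.zeros_like(x)
    x_delayed[half:] = x[:len(x) - half]
    rest = x_delayed - voice

    # Comb filter pair of equation (1), applied to the non-voice band only.
    d = int(round(delay_ms * 1e-3 * FS))
    rest_delayed = np.zeros_like(rest)
    rest_delayed[d:] = rest[:len(rest) - d]
    wide_l = (rest + alpha * rest_delayed) / (1 + alpha)
    wide_r = (rest - alpha * rest_delayed) / (1 + alpha)

    # The centered voice band is added equally to both output channels.
    return voice + wide_l, voice + wide_r
```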
In this example, equalization filter 213 includes first order low and high shelving filters that boost the low and high frequency sound. This achieves better sound quality. In this example, the equalization filter 213 includes a low shelving gain of 6 dB at the band edge of 0.3 kHz and a high shelving gain of 6 dB at the band edge of 6 kHz.
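One way to sketch such a first order shelving section is to map an analog prototype to a digital filter with the bilinear transform. The prototype transfer functions below and the omission of frequency pre-warping are simplifying assumptions made for illustration; the specification only states the 6 dB gains and the 0.3 kHz and 6 kHz band edges.

```python
import numpy as np
from scipy.signal import bilinear, lfilter

FS = 44100

def first_order_shelf(fc, gain_db, kind, fs=FS):
    """First order shelving filter (illustrative design).

    kind='low'  : gain_db below fc, unity above;  H(s) = (s + G*wc)/(s + wc)
    kind='high' : gain_db above fc, unity below;  H(s) = (G*s + wc)/(s + wc)
    """
    g = 10.0 ** (gain_db / 20.0)
    wc = 2.0 * np.pi * fc
    if kind == 'low':
        b, a = [1.0, g * wc], [1.0, wc]
    else:
        b, a = [g, wc], [1.0, wc]
    return bilinear(b, a, fs)  # map the analog prototype to a digital filter

def eq_213(x):
    """Cascade: +6 dB low shelf at 0.3 kHz, +6 dB high shelf at 6 kHz."""
    b_lo, a_lo = first_order_shelf(300.0, 6.0, 'low')
    b_hi, a_hi = first_order_shelf(6000.0, 6.0, 'high')
    return lfilter(b_hi, a_hi, lfilter(b_lo, a_lo, x))
```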
A brief listening test on the stereo synthesizer of this example resulted in centralization of everything in the range between 0.5 kHz and 3 kHz. In the listening test this included the vocal sounds. However, the sound image was widened in the other frequency ranges. Therefore this example stereo synthesizer can relatively centralize the voice sound. This confirmed realization of the object of this example: simulating stereo sound while centralizing the voice band.
The compressed digital music system illustrated in
Direct memory access (DMA) unit 704 controls data movement throughout the whole system. This primarily includes movement of compressed digital music data from hard disk drive 721 to external system memory 730 and to digital signal processor 714. Data movement by DMA 704 is controlled by commands from CPU 702. However, once the commands are transmitted, DMA 704 operates autonomously without intervention by CPU 702.
System bus 710 serves as the backbone of system-on-chip 700. Major data movement within system-on-chip 700 occurs via system bus 710.
Hard drive controller 711 controls data movement to and from hard drive 721. Hard drive controller 711 moves data from hard disk drive 721 to system bus 710 under control of DMA 704. This data movement would enable recall of digital music data from hard drive 721 for decompression and presentation to the user. Hard drive controller 711 moves data from digital input 720 and system bus 710 to hard disk drive 721. This enables loading digital music data from an external source to hard disk drive 721.
Keypad interface 712 mediates user input from keypad 722. Keypad 722 typically includes a plurality of momentary contact key switches for user input. Keypad interface 712 senses the condition of these key switches of keypad 722 and signals CPU 702 of the user input. Keypad interface 712 typically encodes the input key in a code that can be read by CPU 702. Keypad interface 712 may signal a user input by transmitting an interrupt to CPU 702 via an interrupt line (not shown). CPU 702 can then read the input key code and take appropriate action.
Dual digital to analog (D/A) converter and analog output 713 receives the decompressed digital music data from digital signal processor 714. This provides a stereo analog signal to headphones 723 for listening by the user. Digital signal processor 714 receives the compressed digital music data and decompresses this data. There are several known digital music compression techniques. These typically employ similar algorithms. It is therefore possible that digital signal processor 714 can be programmed to decompress music data according to a selected one of plural compression techniques.
Display controller 715 controls the display shown to the user via display 725. Display controller 715 receives data from CPU 702 via system bus 710 to control the display. Display 725 is typically a multiline liquid crystal display (LCD). This display typically shows the title of the currently playing song. It may also be used to aid in the user specifying playlists and the like.
External system memory 730 provides the major volatile data storage for the system. This may include the machine state as controlled by CPU 702. Typically data is recalled from hard disk drive 721 and buffered in external system memory 730 before decompression by digital signal processor 714. External system memory 730 may also be used to store intermediate results of the decompression. External system memory 730 is typically commodity DRAM or synchronous DRAM.
The portable music system illustrated in
This application is related to contemporaneously filed U.S. patent application Ser. No. ______ (TI-36520) LOW COMPUTATION MONO TO STEREO CONVERSION USING INTRA-AURAL DIFFERENCES and U.S. patent application Ser. No. ______ (TI-37099) STEREO SYNTHESIZER USING COMB FILTERS AND INTRA-AURAL DIFFERENCES.