The technical field of this invention is stereophonic audio synthesis applied to enhancing the presentation of both music and voice for more pleasant sound quality.
Currently, most commercial audio equipment has stereophonic (stereo) sound playback capability. Stereo sound provides a more natural and pleasant quality than monaural (mono) sound. Nevertheless there are still some situations which employ mono sound signals including telephone conversations, TV programs, old recordings, radios, and so forth. Stereo synthesis creates artificial stereo sounds from plain mono sounds attempting to reproduce a more natural and pleasant quality.
The present inventors have previously described two distinctly different synthesis algorithms. The first of these [TI-36290] applies comb filters [referred to in the disclosure as complementary linear phase FIR filters] to a selected range of frequencies. Comb filters are commonly used in signal processing. The basic comb filter includes a network producing a delayed version of the incoming signal and a summing function that combines the un-delayed version with the delayed version, causing phase cancellations in the output and a spectrum that resembles a comb. Stated another way, the composite output spectrum has notches in amplitude at selected frequencies. When separate comb filters are arranged to place their notches at different frequencies for the left and right channels, the outputs from the two channels become uncorrelated. This causes the band-selected sound image to be ambiguous and thus wider. Typically, the purpose of band selection is to keep just the human voices centralized. The second earlier invention [TI-36520] describes the use of an Intra-Aural Time Difference (ITD) and an Intra-Aural Intensity Difference (IID). This simulates the cultural fact that, in many live orchestras and some rock bands, the low instruments tend to be located toward the right and the high instruments toward the left. To do this, the incoming mono signal is split into three frequency bands and then sent to left and right channels with different delays and gains for each channel, so that the band signals add up to the original, but with ITD and IID in the low and high bands respectively.
This invention is a new method for creating a stereophonic sound image out of a monaural signal. The method combines two synthesis techniques. In the first technique comb filters de-correlate the left and right channel signals. The second technique applies intra-aural difference cues. Specifically this invention applies intra-aural time difference (ITD) and intra-aural intensity difference (IID) cues. The present invention performs a three-frequency band separation on the incoming monaural signal using strictly complementary (SC) linear phase FIR filters. Comb filters and ITD/IID are applied to the low and high frequency bands to create a simulated stereo sound image for instruments other than human voice. Listening tests indicate that the method of this invention provides a wider stereo sound image than previous methods, while retaining human voice centralization. Since the comb filter computation and ITD/IID computation can share the same filter bank, the invention does not increase the computational cost compared to the previous method.
These and other aspects of this invention are illustrated in the drawings, in which:
The stereo synthesizer of this invention combines the best features of two techniques employed in the prior art. Comb filters provide a wider sound image, and the combination of ITD and IID gives a sound quality that more faithfully reproduces the character of the original mono signal. This application describes a composite method that combines the two algorithms, creating a wider sound image than either method provides individually. Since the two algorithms can share the same filter bank, which consists of three strictly complementary (SC) linear phase FIR filters, the integrated system can maintain a simple structure and the computational cost does not unduly increase.
In
$$H_l(z) + H_m(z) + H_h(z) = c\,z^{-N_0} \qquad (1)$$
is satisfied, where c=1, in particular. Thus simply adding all these filter outputs perfectly reconstructs the original signal, apart from a delay. It is also important that these FIR filters be linear phase with an even order N. With the choice N0=N/2, equation (1) can be written as:
$$H_l(z) + H_m(z) + H_h(z) = z^{-N/2} \qquad (2)$$
Substituting $z=e^{j\omega}$ and recognizing that $H_l(e^{j\omega})$, $H_m(e^{j\omega})$ and $H_h(e^{j\omega})$ are linear phase, with phase terms given by $e^{-j\omega N/2}$, we have the frequency response relationship among the three filters as:
$$|H_l(e^{j\omega})| + |H_m(e^{j\omega})| + |H_h(e^{j\omega})| = 1 \qquad (3)$$
Let Hl(z) be the low pass filter (LPF) and Hh(z) be the high pass filter (HPF). Then Hm(z) will be a band pass filter (BPF). The output from low pass filter (Hl(z)) 201 is calculated as:
and the output from high pass filter (Hh(z)) 203 is calculated as:
with hl(n) and hh(n) designating the respective impulse responses. Then the other output can be calculated just from:
$$y_m(n) = x(n - N/2) - y_l(n) - y_h(n) \qquad (5)$$
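The low pass and high pass output equations, cited later as equation (4), are not reproduced in this text. Assuming the standard direct-form convolution for an FIR filter of order N, which is consistent with the impulse responses hl(n) and hh(n) named above, they would plausibly take the following form. The exact notation of the original is an assumption.

```latex
y_l(n) = \sum_{k=0}^{N} h_l(k)\, x(n-k), \qquad
y_h(n) = \sum_{k=0}^{N} h_h(k)\, x(n-k)
```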
Both equation (3) and equation (5) illustrate the benefit of using the SC linear phase FIR filters. Implementing a low pass filter and a high pass filter and just subtracting their outputs from the input signal gives a band pass filter output. This means that the major computational cost is for calculating only two filter outputs out of the three.
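The saving described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the invention: the helper names `fir` and `three_band_split` are hypothetical, and the order-4 filter pair is a toy example chosen only because it is linear phase with an even order.

```python
def fir(h, x):
    """Direct-form FIR: y(n) = sum_k h(k) * x(n - k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def three_band_split(x, h_l, h_h):
    """Return (y_l, y_m, y_h); y_m follows equation (5), so only two filters run."""
    half = (len(h_l) - 1) // 2                          # N/2, the common group delay
    y_l = fir(h_l, x)
    y_h = fir(h_h, x)
    delayed = [0.0] * half + list(x[:len(x) - half])    # x(n - N/2)
    y_m = [d - a - b for d, a, b in zip(delayed, y_l, y_h)]
    return y_l, y_m, y_h


# Toy order-4 linear phase pair (illustrative assumption, not the design example).
H_L = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]      # low pass
H_H = [1 / 16, -4 / 16, 6 / 16, -4 / 16, 1 / 16]    # high pass

x = [1.0, -2.0, 3.0, 0.5, 0.0, 4.0, -1.0, 2.0]
y_l, y_m, y_h = three_band_split(x, H_L, H_H)

# Summing the three bands returns the input delayed by N/2 = 2 samples.
recon = [a + b + c for a, b, c in zip(y_l, y_m, y_h)]
```

Only two convolutions are evaluated; the middle band costs one delay and two subtractions per sample.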
where: D is a delay that controls the stride of the notches of the comb; and α controls the depth of the notches. Typically 0<α≦1. The magnitude responses are given by:
The applicable magnitude response depends on the sign of the multiplier that is applied to the delayed, weighted path. Equations (7A) and (7B) show that both filters have peaks and notches with a constant stride of 2π/D. The peaks of one filter are placed at the notches of the other filter and vice versa. This de-correlates the output channels, so that the sound image becomes ambiguous and thus wider.
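The comb filter equations are not reproduced in this text, so the following sketch assumes the plain feed-forward form C(z) = 1 ± αz^(−D) with no normalizing gain; the original may scale it differently. Under that assumption the magnitude is sqrt(1 + α² ± 2α cos(ωD)), and the code checks that the peaks of one filter fall on the notches of the other. The function names and the values α = 0.7, D = 8 are illustrative only.

```python
import cmath
import math


def comb(x, alpha, delay, sign):
    """Time domain comb: y(n) = x(n) + sign * alpha * x(n - delay)."""
    return [x[n] + sign * alpha * (x[n - delay] if n >= delay else 0.0)
            for n in range(len(x))]


def comb_response(alpha, delay, omega, sign):
    """Frequency response of C(z) = 1 + sign * alpha * z^-delay at z = e^{j omega}."""
    return 1 + sign * alpha * cmath.exp(-1j * omega * delay)


def comb_magnitude(alpha, delay, omega, sign):
    """Closed form: sqrt(1 + alpha^2 + sign * 2 * alpha * cos(omega * delay))."""
    value = 1 + alpha ** 2 + sign * 2 * alpha * math.cos(omega * delay)
    return math.sqrt(max(0.0, value))


alpha, D = 0.7, 8
w_even = 2 * math.pi / D    # multiples of 2*pi/D: peak of "+", notch of "-"
w_odd = math.pi / D         # odd multiples of pi/D: notch of "+", peak of "-"

peak_plus = abs(comb_response(alpha, D, w_even, +1))     # 1 + alpha
notch_minus = abs(comb_response(alpha, D, w_even, -1))   # 1 - alpha
notch_plus = abs(comb_response(alpha, D, w_odd, +1))     # 1 - alpha
peak_minus = abs(comb_response(alpha, D, w_odd, -1))     # 1 + alpha
```

With α = 1 the notches reach zero, while smaller α gives shallower notches, which matches the role of α described above.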
In spatial hearing, a sound coming from the left side of a listener arrives at the listener's right ear later than at the left ear. The left side sound is also more attenuated at the right ear than at the left ear. The intra-aural time difference (ITD) and intra-aural intensity difference (IID) provide sound localization cues that make use of these spatial hearing mechanisms.
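As a rough illustration of these two cues, the sketch below places a mono band toward the left by delaying and attenuating the copy sent to the right channel. The function name `apply_itd_iid`, the delay of 2 samples, and the gain of 0.5 are hypothetical values chosen for the example; this text does not state the delays and gains actually used.

```python
def apply_itd_iid(x, delay_samples, far_gain):
    """Return (left, right) with the source placed toward the left.

    The right (far) channel receives the signal delay_samples later (ITD)
    and scaled by far_gain < 1 (IID). Output length equals input length.
    """
    d = min(delay_samples, len(x))
    left = list(x)
    right = [0.0] * d + [far_gain * v for v in x[:len(x) - d]]
    return left, right


left, right = apply_itd_iid([1.0, 2.0, 3.0, 4.0], delay_samples=2, far_gain=0.5)
```

Swapping the two returned lists would place the same band toward the right instead.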
Referring back to
The following is a description of a design example. In this example, a sampling frequency of 44.1 kHz was chosen. The SC FIR filters were designed using MATLAB. This example uses order 32 FIR filters Hl(z) and Hh(z) based on a least-squares error prototype. The cut off frequency of the low pass filter Hl(z) was chosen as 300 Hz and the cut off frequency of the high pass filter Hh(z) was chosen as 3 kHz. These selections put the lower formant frequencies of the human voice in their stop bands. The band pass filter Hm(z) was calculated using equation (5). This was confirmed as providing a band pass filter magnitude response. The low and high pass filters were implemented using equation (4).
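A comparable filter bank can be sketched without MATLAB. The code below is not the least-squares design described above: it substitutes a Hamming-windowed sinc low pass and derives the high pass by spectral inversion, both of which are assumptions made for illustration. Only the sampling frequency, order, and cut off frequencies are taken from the example; all function names are hypothetical.

```python
import math


def windowed_sinc_lowpass(order, cutoff_hz, fs):
    """Hamming-windowed sinc low pass with order + 1 taps and unit gain at DC."""
    fc = cutoff_hz / fs                     # normalized cutoff in cycles per sample
    mid = order // 2
    taps = []
    for n in range(order + 1):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def delta(order):
    """Impulse response of the pure delay z^-(order/2)."""
    return [1.0 if n == order // 2 else 0.0 for n in range(order + 1)]


FS, ORDER = 44100.0, 32
h_l = windowed_sinc_lowpass(ORDER, 300.0, FS)

# High pass by spectral inversion of a 3 kHz low pass (assumed design method).
h_h = [d - t for d, t in zip(delta(ORDER), windowed_sinc_lowpass(ORDER, 3000.0, FS))]

# Band pass from strict complementarity: h_m(n) = delta(n - N/2) - h_l(n) - h_h(n).
h_m = [d - a - b for d, a, b in zip(delta(ORDER), h_l, h_h)]
```

All three impulse responses are symmetric about tap N/2 = 16, so each is linear phase, and their sum is the pure delay required by equation (2).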
The comb filters were designed as follows. Comb filters 208 C1,0 and C1,1 for the low channel:
Comb filters 218 Ch,0 and Ch,1 for the low channel:
where: D=8 milliseconds, corresponding to 352 filter taps, was selected for all the comb filters. The purpose of flipping the sign of the multiplier between the low band and the high band was to make the notches cancel each other in the transition region of the LPF and HPF. This contributed to further centralizing the human voice, while the sound image for the other instruments was unaffected. In this example only intra-aural intensity differences (IID) were implemented. The intensity difference w was 1.4.
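The figures quoted above can be checked with a short calculation. The conversion from milliseconds to taps assumes truncation, since 8 ms at 44.1 kHz is 352.8 samples, and the decibel value assumes that w is an amplitude ratio; neither assumption is stated in the text.

```python
import math

FS_HZ = 44100        # sampling frequency from the design example
DELAY_S = 0.008      # comb delay D of 8 milliseconds
W = 1.4              # intensity difference w, treated here as an amplitude ratio

delay_samples = int(DELAY_S * FS_HZ)        # 352.8 truncated to 352 taps
notch_stride_hz = FS_HZ / delay_samples     # the stride 2*pi/D expressed in Hz
iid_db = 20 * math.log10(W)                 # level difference in decibels
```

Under these assumptions the comb notches repeat roughly every 125 Hz, and w = 1.4 corresponds to a level difference of about 2.9 dB.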
Brief listening confirmed that this method provides a wider sound image than the two previous methods, while the voice band signals remained centralized just as with those methods.
Referring back to
This invention is a stereo synthesis method that combines two previous methods, the comb filter method and intra-aural difference method. Through listening tests it has been confirmed that this method provides a wider stereo sound image than previous methods, while the human voice centralization property is retained. The computational cost of the present invention is almost the same as the previous methods.