The proposed technology generally relates to sound or audio reproduction, and more specifically to a method for decoding and a corresponding audio decoder, especially for use with earphones, a sound reproduction system comprising such an audio decoder and a computer program for decoding.
Music is normally produced and mixed for loudspeaker reproduction. As a result, the listening experience is less than optimal when such music is played back through earphones.
The process of music production and music reproduction can together be said to consist of a sound encoding part and a sound decoding part. The encoding part entails music production and storage of the music material on a designated format, e.g. the CD format. The decoding part is the sound reproduction part which entails the whole procedure of reading the music signal from the storage format to the signal processing that enables presenting the music to the ears of the listeners. The decoding part normally entails sound reproduction by either loudspeaker or earphone listening.
A stereo music signal has information encoded in it that, when played back over loudspeakers in a listening room, results in psychoacoustic cues being presented to the listener that give a certain spatial impression of the sound. By spatial impression is meant aspects of the sound that have to do with e.g. the location and size of each instrument in the sound image and what kind of acoustical space is perceptually associated with each instrument.
These spatial psychoacoustic cues become either strongly distorted or totally missing when earphones are used in the reproduction system.
An often used solution for making the perceived sound field more natural in earphones when reproducing a stereo signal is to use a cross-feed network to feed some of the left signal to the right ear, and some of the right signal to the left ear. See for example references [1], [2], and [3].
In some implementations only the frequency-dependent head shadowing is simulated and the interaural time difference (ITD) is kept at zero. A side effect is that the sound stage loses ambience and becomes too narrow. If a time delay is inserted in the cross-feed signal paths HRL and HLR, the sound stage proportions can be simulated properly, but another problem arises: center-panned sounds, which are correlated between the left and right input channels, undergo strong comb filtering when the direct-path and cross-feed-path signals are added. This comb filtering colors the spectrum of the sound.
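By way of a non-limiting illustration, the comb filtering of a center-panned sound can be checked numerically. The sketch below assumes a frequency-independent cross-feed gain and a pure cross-feed delay; both values, and the function name `comb_magnitude`, are hypothetical and are not taken from the cited references. For a signal that is identical in both input channels, each ear receives the sum of the direct signal and the delayed, attenuated cross-feed signal.

```python
import numpy as np

def comb_magnitude(freqs_hz, delay_s, gain):
    """Magnitude of 1 + gain*exp(-j*2*pi*f*delay): direct path plus delayed cross-feed."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return np.abs(1.0 + gain * np.exp(-2j * np.pi * freqs_hz * delay_s))

# Hypothetical values: 0.25 ms cross-feed delay, cross-feed gain 0.7.
delay_s = 0.25e-3
gain = 0.7

first_notch_hz = 1.0 / (2.0 * delay_s)  # the two paths arrive in antiphase
first_peak_hz = 1.0 / delay_s           # the two paths arrive in phase again

notch = comb_magnitude(first_notch_hz, delay_s, gain)  # equals 1 - gain
peak = comb_magnitude(first_peak_hz, delay_s, gain)    # equals 1 + gain
```

With these assumed values the response alternates between 1 - gain and 1 + gain as frequency increases, which is the spectral coloration referred to above.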
The proposed technology overcomes these and other drawbacks of the prior art arrangements.
It is an object to provide a decoding method and a corresponding decoder, also referred to as an audio or sound decoder or a spatial decoder, or a binaural decoder.
It is also an object to provide a sound reproduction system comprising an audio decoder.
Yet another object is to provide a computer program for decoding, when executed by a processor, input signals representative of at least two audio input channels.
It is another object to provide a carrier comprising such a computer program.
These and other objects are met by embodiments of the proposed technology.
In a first aspect, the proposed technology provides an audio decoder configured to receive input signals representative of at least two audio input channels. The audio decoder is configured to provide direct signal paths and cross-feed signal paths for the input signals. The audio decoder is configured to apply head shadowing filters in the direct signal paths and cross-feed signal paths for simulating head shadowing of loudspeakers placed at different angles to an intended listener. The audio decoder is also configured to apply phase shift filters in the direct signal paths and cross-feed signal paths for introducing a phase difference between the direct signal paths and the cross-feed signal paths representing a phase difference occurring between the ears of the intended listener. The audio decoder is further configured to sum the direct and cross-feed signal paths to provide output signals.
In a second aspect, the proposed technology provides a method of decoding input signals representative of at least two audio input channels, where direct signal paths and cross-feed signal paths are provided for the input signals. The method comprises the step of applying head shadowing filters in the direct signal paths and cross-feed signal paths for simulating head shadowing of loudspeakers placed at different angles to an intended listener. The method also comprises the step of applying phase shift filters in the direct signal paths and cross-feed signal paths for introducing a phase difference between the direct signal paths on the one hand and the cross-feed signal paths on the other hand. The phase difference between the direct signal paths and the cross-feed signal paths represents the phase difference occurring between the ears of the intended listener when a signal is input on either of the input channels. The method further comprises the step of summing the direct and cross-feed signal paths to provide output signals.
In a third aspect, the proposed technology provides a sound reproduction system comprising an audio decoder according to the first aspect.
In a fourth aspect, the proposed technology provides a computer program for decoding, when executed by a processor, input signals representative of at least two audio input channels. The computer program comprises instructions which, when executed by the processor, cause the processor to:
In a fifth aspect, the proposed technology provides a carrier comprising the computer program.
In a sixth aspect, the proposed technology provides an audio decoder configured to receive input signals representative of at least two audio input channels. The audio decoder comprises a representation module for providing a computer representation of direct signal paths and cross-feed signal paths for the input signals. The audio decoder also comprises a first filtering module for applying head shadowing filters in the direct signal paths and cross-feed signal paths for simulating head shadowing of loudspeakers placed at different angles to an intended listener. The audio decoder comprises a second filtering module for applying phase shift filters in the direct signal paths and cross-feed signal paths for introducing a phase difference between the direct signal paths and the cross-feed signal paths representing a phase difference occurring between the ears of the intended listener. The audio decoder further comprises a summing module for summing the direct and cross-feed signal paths to provide output signals.
There is also provided a network client comprising an audio decoder as defined herein, and a network server comprising an audio decoder as defined herein.
For the particular application with earphones, the proposed technology provides a method of decoding the spatial cues present in a stereo signal (or in general a sound signal with more than one channel, i.e. L channels, where L>1) correctly for enabling earphone listening and adding missing spatial cues before the music signal is sent to the earphones.
In particular, the proposed technology aims at reproducing/simulating the perceived sound field proportions properly while not introducing a comb filtering effect.
Other advantages will be appreciated when reading the detailed description.
The proposed technology, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
Throughout the drawings, the same reference numbers are used for similar or corresponding elements.
The method basically comprises the steps of:
By way of example, the step S2 of applying phase shift filters in the direct signal paths and cross-feed signal paths is performed for introducing a frequency-dependent phase difference that mimics a phase difference occurring between the ears of the intended listener due to different arrival times of sound at the ears from the loudspeakers positioned with different angles to the head of the intended listener, so-called ITDs.
It should be understood that the order of the steps S1 and S2 may be interchanged if desired, provided the steps are designed to be time-invariant.
Reference can also be made to the schematic diagram of
Preferably, the frequency-dependent phase difference is introduced for frequencies below a threshold frequency. As an example, the threshold frequency is around 1 kHz.
In this example, the method optionally further comprises the step S2′ of applying, before the summing step S3, decorrelating filters in the direct signal paths and cross-feed signal paths for introducing or adjusting a phase difference between the direct signal paths and the cross-feed signal paths to be around 90 degrees above a threshold frequency. By way of example, the threshold frequency is around 1 kHz.
This allows for decorrelation of the signals in the summation where the direct signal paths and cross-feed signal paths are summed to produce one output signal.
It should be understood that the order of the steps S1, S2 and S2′ may be interchanged if desired, provided the steps are designed to be time-invariant.
By way of example, the head shadowing filters may be based on Head Related Transfer Function, HRTF, responses with ITDs removed.
Preferably, the method is applied to pairs of channels in case of more than two input channels.
There is also provided a corresponding audio decoder configured to receive input signals representative of at least two audio input channels.
It should be understood that the order of the filter blocks 20 and 30 in
It should be understood that the order of the filter blocks 20, 30 and 35 in
It should be understood that the order of the filter blocks 20, 30 and 35 in
As exemplified in
Optionally, as indicated by the dashed lines in
As an example, the audio decoder 100 may be configured to apply phase shift filters in the direct signal paths and cross-feed signal paths by introducing a frequency-dependent phase difference that mimics a phase difference occurring between the ears of the intended listener due to different arrival times of sound at the ears from the loudspeakers positioned with different angles to the head of the intended listener, so-called ITDs.
Preferably, the frequency-dependent phase difference is modeled for frequencies below a threshold frequency. By way of example, the threshold frequency is around 1 kHz.
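As a rough numerical illustration of why the ITD is conveniently expressed as a phase difference at low frequencies, the sketch below converts a pure time delay into the phase lag it produces at a given frequency. The ITD value of 0.25 ms and the function name `itd_phase_deg` are assumptions made for illustration only; the actual ITD depends on the loudspeaker angles and the listener.

```python
def itd_phase_deg(freq_hz, itd_s):
    """Phase lag in degrees produced at freq_hz by a pure delay of itd_s seconds."""
    return 360.0 * freq_hz * itd_s

ITD_S = 0.25e-3  # hypothetical interaural time difference in seconds

# Phase difference at a few frequencies up to the threshold of around 1 kHz.
phases = {f: itd_phase_deg(f, ITD_S) for f in (250, 500, 1000)}
```

Under this assumption the phase difference grows linearly with frequency and reaches 90 degrees at 1 kHz; a pure delay would continue towards 180 degrees at 2 kHz, where summed direct and cross-feed signals start to cancel.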
In a particular example, as illustrated in
As indicated above, the audio decoder 100 may be configured to provide the direct signal paths and cross-feed signal paths by means of a cross-feed network 10. In a particular example, the audio decoder 100 is further configured to apply head shadowing filters by means of an individual head shadowing filter arranged in each of the direct signal paths and cross-feed signal paths. The audio decoder 100 may also be configured to apply phase shift filters by means of a first all-pass filter arranged in each of the direct signal paths and a second different all-pass filter arranged in each of the cross-feed signal paths to provide a phase difference between the signals of the direct signal paths on the one hand and the signals of the cross-feed signal paths on the other hand.
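The signal flow described above may be sketched as follows for a stereo input. This is a minimal sketch under stated assumptions, not the actual implementation: every filter is represented by a finite impulse response (all-pass filters would normally be recursive), the function name `decode_stereo` and all parameter names are hypothetical, and the two head shadowing responses, as well as the two phase filter responses, are assumed to have equal lengths so that the paths can be summed sample by sample.

```python
import numpy as np

def decode_stereo(left, right, hs_direct, hs_cross, ap_direct, ap_cross):
    """Filter the direct and cross-feed paths, then sum them for each ear.

    hs_direct, hs_cross: head shadowing impulse responses (equal length).
    ap_direct, ap_cross: phase shift filter impulse responses (equal length).
    """
    def path(x, head_shadow_ir, phase_ir):
        # Linear time-invariant filters, so the order of the two
        # convolutions may be interchanged without changing the result.
        return np.convolve(np.convolve(x, head_shadow_ir), phase_ir)

    out_left = path(left, hs_direct, ap_direct) + path(right, hs_cross, ap_cross)
    out_right = path(right, hs_direct, ap_direct) + path(left, hs_cross, ap_cross)
    return out_left, out_right
```

Feeding an impulse into one input channel only returns the direct path response at the same-side output and the cross-feed path response at the opposite output, which is a convenient way to inspect the filters.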
For example, the head shadowing filters may be based on HRTF responses with ITDs removed. By way of example, the HRTFs may be obtained in any suitable way, e.g. based on HRTF modelling, accessed through public HRTF databases, and/or through HRTF measurements.
If there are more than two input channels, the audio decoder 100 is typically configured to operate on pairs of channels.
In a particular application, the output signals are intended to be sent to a set of earphones 130.
As indicated, a particular example of the audio decoder 100 is a stereo decoder. It should be understood, however, that the invention is not limited thereto.
It should also be understood that the decoder may be implemented in a server-client scenario, on the client side and/or on the server side. Naturally, the audio decoder 100 may be implemented in a network client, which may be a wired and/or wireless device including any type of user equipment such as mobile phones, smart phones, personal computers, laptops, pads and so on. Alternatively, the audio decoder 100 may be implemented in a network server, which is then configured to decode the audio signals and send the decoded audio signals in compressed or uncompressed form to the client, which in turn effectuates the play-back. The audio signals may be decoded by the network server and transferred to the client in real-time, e.g. as streaming media files. Alternatively, the decoded audio signals are stored by the network server as pre-processed audio files, which may subsequently be transferred to the client. The pre-processed audio files include the decoded audio signals or suitable representations thereof.
In a particular example, the decoder has two input channels and two output channels. As indicated above, the decoder may however be configured for more than two channels, and more generally for L channels, where L>1. For example, the decoder may be configured (duplicated) to apply to pairs of channels if the audio source has more than two channels.
In the following, however, a stereo input signal is assumed for convenience.
The head shadow block (1) splits up the signal into direct and cross-feed signals in the same way as depicted in
The Phase Equalizer (EQ) block (2) applies phase shift filters to the direct and cross-feed signals, designed so that low-frequency ITD is simulated with the corresponding phase shift between the direct and cross-feed signals and there is no comb-filtering effect when the direct and cross-feed signals are summed inside the block. ITD is more important for localization at low frequencies than at high frequencies, so the ITD does not need to be simulated in the frequency range where it gives rise to annoying comb filtering effects.
The Reverberation block (3) is optional and adds reverberation ambience to the sound; such ambience is always present when listening to loudspeakers in a real room.
Below, examples of the signal processing blocks depicted in
Example of Block 1—Head Shadow
An example of a head shadow block simulates head shadowing at the ears corresponding to sound incident from two loudspeakers placed at different angles to the listener. In this example, the filters used for head shadowing correspond to average HRTF responses for a number of listeners but with ITDs removed. Preferably, this is done by aligning the start of the impulse responses corresponding to the head shadowing filters applied in the direct and cross-feed signal paths, respectively. For more information on the concepts of HRTF, ITD and relevant psychoacoustics, see reference [5].
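One possible way of aligning the start of two impulse responses is sketched below. The onset detector used here (first sample that reaches a fraction of the peak magnitude) and the names `onset_index`, `align_onsets` and `rel_threshold` are assumptions chosen for illustration; other onset criteria may be equally suitable.

```python
import numpy as np

def onset_index(ir, rel_threshold=0.1):
    """Index of the first sample whose magnitude reaches rel_threshold times the peak."""
    mag = np.abs(np.asarray(ir, dtype=float))
    return int(np.argmax(mag >= rel_threshold * mag.max()))

def align_onsets(ir_direct, ir_cross, rel_threshold=0.1):
    """Advance the later-starting response so that both responses start together."""
    ir_direct = np.asarray(ir_direct, dtype=float)
    ir_cross = np.asarray(ir_cross, dtype=float)
    start = min(onset_index(ir_direct, rel_threshold),
                onset_index(ir_cross, rel_threshold))

    def shift(ir):
        lead = onset_index(ir, rel_threshold) - start
        # Drop the leading samples and zero-pad the tail to keep the length.
        return np.concatenate([ir[lead:], np.zeros(lead)])

    return shift(ir_direct), shift(ir_cross)
```

After alignment the two responses differ only in their frequency-dependent amplitude and residual phase, so the time-of-arrival difference no longer contributes to the head shadowing filters.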
As can be seen in
For head shadowing, an important design variable is the amount of head shadow as a function of frequency, i.e. the frequency-dependent amplitude difference occurring between the ears of an intended listener when a signal is applied at one of the inputs.
Another important design variable is how the head shadow filters influence the perceived timbre of the sound. Under certain conditions, frequency response correction through equalization can be performed to adjust the perceived timbral characteristics of the sound.
Example of Block 2—Phase EQ
An example of the design of the Phase EQ block is depicted in
For general information on all-pass filters and basic signal processing, see reference [4].
Example of Phase EQ Part 1—LF Interaural Phase Difference
For example, the first part 30 of the Phase EQ block may introduce a phase shift between at least two signals, such as the left and right ear signals, by applying an all-pass filter HIAP1 to the direct path signals and a different all-pass filter HIAP2 to the cross-feed signals. An important design parameter is, for example, the frequency dependency of the phase difference between HIAP1 and HIAP2. The phase difference is achieved by designing HIAP1 and HIAP2 with slightly different filter coefficients.
By way of example, the phase difference applied mimics the phase difference occurring between the ears naturally due to the different arrival times (ITD) of sound at the ears from a pair of loudspeakers positioned with different angles to the head. Thus, the perceived sound stage width becomes more natural compared to just simulating head shadowing. The ITD phase difference is modeled up to a maximum frequency of around 1 kHz. Above this frequency the phase difference between the HIAP1 and HIAP2 filters approaches zero to avoid comb filtering effects in the summation of the direct and cross-feed signal paths at the output.
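The principle of obtaining a phase difference from two all-pass filters with slightly different coefficients can be illustrated with first-order digital all-pass sections. The filter order, the coefficient values and the sampling rate below are hypothetical; the description does not specify them, and a practical design would choose order and coefficients to match the desired ITD phase curve.

```python
import numpy as np

def allpass_response(coeff, freqs_hz, fs_hz):
    """Frequency response of the first-order all-pass H(z) = (c + z^-1) / (1 + c*z^-1)."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz, dtype=float) / fs_hz)
    return (coeff + z_inv) / (1.0 + coeff * z_inv)

FS_HZ = 48000.0
freqs = np.array([100.0, 300.0, 1000.0, 5000.0])

h1 = allpass_response(-0.95, freqs, FS_HZ)  # hypothetical stand-in for HIAP1
h2 = allpass_response(-0.90, freqs, FS_HZ)  # hypothetical stand-in for HIAP2

# Both filters leave the magnitude untouched; only their phases differ.
phase_diff_deg = np.degrees(np.angle(h2 / h1))
```

Both responses have unit magnitude at every frequency, so the pair changes only the relative phase of the direct and cross-feed paths. For these particular coefficients the phase difference is smaller at 5 kHz than at 1 kHz, in line with the requirement that it diminishes at high frequencies.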
Example of Phase EQ Part 2—HF Crosstalk Decorrelation
For example, the second part 35 of the Phase EQ block may implement decorrelating all-pass filters, HDC1 and HDC2, between the direct and cross-feed signal paths in a structure similar to part 1. The purpose of HDC1 and HDC2 is to bring the phase difference between the direct and cross-feed signal paths close to 90 degrees at high frequencies, for example above 1 kHz, while the phase difference between HDC1 and HDC2 approaches zero at low frequencies. The reason is that if the phase difference between the direct and cross-feed signal paths is too small, the stereo difference signal (the signal produced by taking L-R) is strongly weakened in a way that does not happen at the ears of a listener in regular loudspeaker listening.
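The effect of the phase difference on the stereo sum and difference signals can be worked through with a simple single-frequency model. The sketch assumes a frequency-independent cross-feed gain (in practice the head shadowing makes it frequency dependent); the gain value and the function name `sum_and_difference_gain` are hypothetical. A sum signal (L equal to R) is scaled by 1 + g·e^(jφ) and a difference signal (L equal to -R) by 1 - g·e^(jφ), where g is the cross-feed gain and φ the phase difference between the cross-feed and direct paths.

```python
import cmath
import math

def sum_and_difference_gain(crossfeed_gain, phase_deg):
    """Output gains for the L+R and L-R components after cross-feed summation."""
    x = crossfeed_gain * cmath.exp(1j * math.radians(phase_deg))
    return abs(1 + x), abs(1 - x)

GAIN = 0.7  # hypothetical cross-feed gain

in_phase = sum_and_difference_gain(GAIN, 0.0)     # sum boosted, difference weakened
quadrature = sum_and_difference_gain(GAIN, 90.0)  # both components treated alike
```

With zero phase difference the difference signal is attenuated to 1 - g while the sum signal is boosted to 1 + g. At 90 degrees both components pass with the same gain, sqrt(1 + g²), so the difference signal is no longer weakened relative to the sum signal.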
Example of Block 3—Reverberation
For example, the reverberation signal processing part is optional and applies reverberation filters to the signal. The reverb impulse response can for example be designed to be statistically similar to that found at the ears of a listener in a listening room with a perfectly diffuse sound field.
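A very simple stand-in for such a reverberation response is exponentially decaying Gaussian noise, generated independently for each ear. This is only a sketch under strong assumptions: the reverberation time, the length and the function name `decaying_noise_ir` are hypothetical, and fully independent left and right noise ignores the frequency-dependent interaural correlation of a real diffuse field.

```python
import numpy as np

def decaying_noise_ir(rt60_s, fs_hz, length_s, seed=0):
    """Two-channel noise impulse response whose amplitude falls 60 dB in rt60_s seconds."""
    rng = np.random.default_rng(seed)
    n = int(round(length_s * fs_hz))
    t = np.arange(n) / fs_hz
    envelope = 10.0 ** (-3.0 * t / rt60_s)  # -60 dB amplitude at t = rt60_s
    return rng.standard_normal((2, n)) * envelope

# Hypothetical values: 0.3 s reverberation time, 48 kHz sampling rate.
reverb_ir = decaying_noise_ir(rt60_s=0.3, fs_hz=48000.0, length_s=0.6)
```

Each row of the returned array can be convolved with the corresponding ear signal and mixed in at a low level to add ambience.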
Implementation and Usage Examples
Different implementations and usages of the decoder are possible, for example:
In general, the proposed technology can be implemented in software, hardware, firmware or any combination thereof.
For example, the steps, functions, procedures and/or blocks described above may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
Alternatively, at least some of the steps, functions, procedures and/or blocks described above may be implemented in software for execution by a suitable computer or processing device such as a microprocessor, Digital Signal Processor (DSP) and/or any suitable programmable logic device such as a Field Programmable Gate Array (FPGA) device, a Graphics Processing Unit (GPU) and a Programmable Logic Controller (PLC) device.
It should also be understood that it may be possible to re-use the general processing capabilities of any conventional unit. It may also be possible to re-use existing software, e.g. by reprogramming of the existing software or by adding new software components.
The flow diagram or diagrams presented herein may therefore be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding apparatus may be defined as a group of function modules, where each step performed by the processor corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor.
In the following, an example of a computer implementation will be described with reference to
The processor(s) 140 and memory 150 are interconnected to each other to enable normal software execution. An optional input/output device may also be interconnected to the processor(s) 140 and/or the memory 150 to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
In particular, the memory 150 comprises instructions executable by the processor 140, whereby the audio decoder 100 is operative to apply the head shadowing filters, to apply the phase shift filters and to sum the direct and cross-feed signal paths to provide output signals.
The term ‘computer’ should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
In a particular embodiment, the computer program 155/165 comprises instructions which, when executed by the processor 140, cause the processor 140 to:
The proposed technology also provides a carrier 150/160 comprising the computer program 155/165, wherein the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
The software may be realized as a computer program product, which is normally carried on a computer-readable medium, for example a CD, DVD, USB memory, hard drive or any other conventional memory device. The software may thus be loaded into the operating memory of a computer/processor for execution by the processor of the computer. The computer/processor does not have to be dedicated to only execute the above-described steps, functions, procedures and/or blocks, but may also execute other software tasks.
As indicated herein, the audio decoder may alternatively be defined as a group of function modules, where the function modules are implemented as a computer program running on at least one processor.
The computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein. An example of such function modules is illustrated in
The representation module 170 is adapted for providing a computer representation of direct signal paths and cross-feed signal paths for the input signals. The first filtering module 175 is adapted for applying head shadowing filters in the direct signal paths and cross-feed signal paths for simulating head shadowing of loudspeakers placed at different angles to an intended listener. The second filtering module 180 is adapted for applying phase shift filters in the direct signal paths and cross-feed signal paths for introducing a phase difference between the direct signal paths and the cross-feed signal paths representing a phase difference occurring between the ears of the intended listener. The summing module 185 is adapted for summing the direct and cross-feed signal paths to provide output signals.
In a particular example, the audio decoder 100 further comprises a third optional filtering module for applying decorrelating filters in the direct signal paths and cross-feed signal paths for adjusting the phase difference between the direct signal paths and cross-feed signal paths to be constant around 90 degrees above a threshold frequency.
The embodiments described above are merely given as examples, and it should be understood that the proposed technology is not limited thereto. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.
[2] Thomas, Martin V., “Improving the Stereo Headphone Sound Image”, Journal of the Audio Engineering Society, Volume 25 Issue 7/8 pp. 474-478; August 1977.
[3] Linkwitz, Siegfried, “Improved Headphone Listening”, Audio, North American Publishing Company, pp. 42-43; December 1971.
[4] Proakis, John G. and Manolakis, Dimitris K., “Digital Signal Processing”, Prentice Hall, 4th edition, 2006.
[5] Blauert, Jens, “Spatial hearing: the psychophysics of human sound localization”, MIT Press, October, 1996.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SE2014/050434 | 4/8/2014 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61818522 | May 2013 | US |