The present invention relates to the generation of a room reflection and/or reverberation related contribution of a binaural signal, the generation of a binaural signal itself, and the forming of an inter-similarity decreasing set of head-related transfer functions.
The human auditory system is able to determine the direction or directions from which perceived sounds arrive. To this end, it evaluates certain differences between the sound received at the right ear and the sound received at the left ear. The latter information comprises, for example, the so-called inter-aural cues, which refer to differences between the sound signals at the two ears. Inter-aural cues are the most important means for localization. The pressure level difference between the ears, namely the inter-aural level difference (ILD), is the most important single cue for localization. When sound arrives from the horizontal plane with a non-zero azimuth, it has a different level at each ear: the shadowed ear receives a naturally attenuated sound compared to the unshadowed ear. Another very important property for localization is the inter-aural time difference (ITD). The shadowed ear is farther from the sound source and thus receives the sound wave front later than the unshadowed ear. The significance of the ITD is emphasized at low frequencies, which are attenuated little on their way to the shadowed ear; the ITD is less important at higher frequencies, where the wavelength of the sound approaches the distance between the ears. In other words, localization exploits the fact that sound traveling from the sound source to the left and right ear, respectively, is subject to different interactions with the head, ears, and shoulders of the listener.
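The dependence of the ITD on the direction of incidence can be illustrated with the classical Woodworth spherical-head approximation; this is merely an illustrative textbook model, and the head radius and speed of sound below are typical values, not parameters of the invention:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate inter-aural time difference (ITD) in seconds for a
    source in the horizontal plane, using Woodworth's spherical-head
    model: ITD = (r/c) * (sin(theta) + theta)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(theta) + theta)

# A source straight ahead produces no ITD; a source at 90 degrees
# azimuth produces the maximum ITD of roughly 0.65 ms.
print(woodworth_itd(0.0))   # 0.0
print(woodworth_itd(90.0))  # roughly 6.6e-4 seconds
```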
Problems occur when a person listens over headphones to a stereo signal that is intended for reproduction by a loudspeaker setup. It is very likely that the listener will regard the sound as unnatural, awkward, and disturbing, because the sound sources appear to be located inside the head. This phenomenon is often referred to in the literature as "in-the-head" localization, and long-term listening to such sound may lead to listening fatigue. It occurs because the information on which the human auditory system relies when positioning sound sources, i.e. the inter-aural cues, is missing or ambiguous.
In order to render stereo signals, or even multi-channel signals with more than two channels, for headphone reproduction, directional filters may be used in order to model these interactions. For example, the generation of a headphone output from a decoded multi-channel signal may comprise filtering each signal after decoding by means of a pair of directional filters. These filters typically model the acoustic transmission from a virtual sound source in a room to the ear canal of a listener, the so-called binaural room transfer function (BRTF). A BRTF performs time, level and spectral modifications, and models room reflections and reverberation. The directional filters may be implemented in the time or frequency domain.
However, since many filters are required, namely N×2 with N being the number of decoded channels, and since these directional filters are rather long, such as 20,000 filter taps at 44.1 kHz, the process of filtering is computationally demanding. Therefore, the directional filters are sometimes reduced to a minimum: the so-called head-related transfer functions (HRTFs), which contain the directional information including the inter-aural cues. A common processing block is then used to model the room reflections and reverberation. This room processing module can be a reverberation algorithm in the time or frequency domain, and may operate on a one- or two-channel input signal obtained by summing the channels of the multi-channel input signal. Such a structure is, for example, described in WO 99/14983 A1. As just described, the room processing block implements room reflections and/or reverberation. Room reflections and reverberation are essential to localizing sounds, especially with respect to distance and externalization, i.e. the perception of sounds as originating outside the listener's head. The aforementioned document also suggests implementing the directional filters as a set of FIR filters operating on differently delayed versions of the respective channel, so as to model the direct path from the sound source to the respective ear as well as distinct reflections. Moreover, in describing several measures for providing a more pleasant listening experience over a pair of headphones, this document also suggests delaying a mixture of the center channel and the front left channel, and of the center channel and the front right channel, respectively, relative to a sum and a difference of the rear left and rear right channels, respectively.
However, the listening results achieved thus far still suffer to a large extent from a reduced spatial width of the binaural output signal and from a lack of externalization. Further, it has been realized that despite the abovementioned measures for rendering multi-channel signals for headphone reproduction, voice portions in movie dialogs and music are often perceived as unnaturally reverberant and spectrally unequal.
According to an embodiment, a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker configuration having a virtual sound source position associated to each channel may have: a similarity reducer for differently processing, and thereby reducing a similarity between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a center and a non-center channel of the plurality of channels, in order to obtain an inter-similarity reduced set of channels; a plurality of directional filters for modeling an acoustic transmission of a respective one of the inter-similarity reduced set of channels from a virtual sound source position associated with the respective channel of the inter-similarity reduced set of channels to a respective ear canal of a listener; a first mixer for mixing outputs of the directional filters modeling the acoustic transmission to the first ear canal of the listener to obtain a first channel of the binaural signal; a second mixer for mixing outputs of the directional filters modeling the acoustic transmission to the second ear canal of the listener to obtain a second channel of the binaural signal; a downmix generator for forming a mono or stereo downmix of the plurality of channels represented by the multi-channel signal; a room processor for generating a room-reflections/reverberation related contribution of the binaural signal, including a first channel output and a second channel output, by modeling room reflections/reverberations based on the mono or stereo signal; a first adder configured to add the first channel output of the room processor to the first channel of the binaural signal; and a second adder configured to add the second channel output of the room processor to the second channel of the binaural signal.
According to another embodiment, a device for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker configuration having a virtual sound source position associated to each channel may have: a similarity reducer for causing a relative delay between, and/or performing—in a spectrally varying sense—a phase and/or magnitude modification differently between at least two channels of the plurality of channels, in order to obtain an inter-similarity reduced set of channels; a plurality of directional filters for modeling an acoustic transmission of a respective one of the inter-similarity reduced set of channels from a virtual sound source position associated with the respective channel of the inter-similarity reduced set of channels to a respective ear canal of a listener; a first mixer for mixing outputs of the directional filters modeling the acoustic transmission to the first ear canal of the listener to obtain a first channel of the binaural signal; a second mixer for mixing outputs of the directional filters modeling the acoustic transmission to the second ear canal of the listener to obtain a second channel of the binaural signal; a downmix generator for forming a mono or stereo downmix of the plurality of channels represented by the multi-channel signal; a room processor for generating a room-reflections/reverberation related contribution of the binaural signal, including a first channel output and a second channel output, by modeling room reflections/reverberations based on the mono or stereo signal; a first adder configured to add the first channel output of the room processor to the first channel of the binaural signal; and a second adder configured to add the second channel output of the room processor to the second channel of the binaural signal.
According to another embodiment, a device for forming an inter-similarity decreasing set of HRTFs for modeling an acoustic transmission of a plurality of channels from a virtual sound source position associated with the respective channel to ear canals of a listener may have: an HRTF provider for providing an original plurality of HRTFs implemented as FIR filters, by looking-up or computing filter taps for each of the original plurality of HRTFs responsive to a selection or change of the virtual sound source positions; and an HRTF processor for causing impulse responses of the HRTFs modeling the acoustic transmissions of a predetermined pair of channels to be delayed relative to each other, or differently modifying—in a spectrally varying sense—phase and/or magnitude responses thereof, the pair of channels being one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a center and a non-center channel of the plurality of channels.
According to another embodiment, a method for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker configuration having a virtual sound source position associated to each channel may have the steps of: differently processing, and thereby reducing a correlation between, at least one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a center and a non-center channel of the plurality of channels, in order to obtain an inter-similarity reduced set of channels; subjecting the inter-similarity reduced set of channels to a plurality of directional filters for modeling an acoustic transmission of a respective one of the inter-similarity reduced set of channels from a virtual sound source position associated with the respective channel of the inter-similarity reduced set of channels to a respective ear canal of a listener; mixing outputs of the directional filters modeling the acoustic transmission to the first ear canal of the listener to obtain a first channel of the binaural signal; mixing outputs of the directional filters modeling the acoustic transmission to the second ear canal of the listener to obtain a second channel of the binaural signal; forming a mono or stereo downmix of the plurality of channels represented by the multi-channel signal; generating a room-reflections/reverberation related contribution of the binaural signal, including a first channel output and a second channel output, by modeling room reflections/reverberations based on the mono or stereo signal; adding the first channel output of the room processor to the first channel of the binaural signal; and adding the second channel output of the room processor to the second channel of the binaural signal.
According to another embodiment, a method for generating a binaural signal based on a multi-channel signal representing a plurality of channels and intended for reproduction by a speaker configuration having a virtual sound source position associated to each channel may have the steps of: performing—in a spectrally varying sense—a phase and/or magnitude modification differently between at least two channels of the plurality of channels, in order to obtain an inter-similarity reduced set of channels; subjecting the inter-similarity reduced set of channels to a plurality of directional filters for modeling an acoustic transmission of a respective one of the inter-similarity reduced set of channels from a virtual sound source position associated with the respective channel of the inter-similarity reduced set of channels to a respective ear canal of a listener; mixing outputs of the directional filters modeling the acoustic transmission to the first ear canal of the listener to obtain a first channel of the binaural signal; mixing outputs of the directional filters modeling the acoustic transmission to the second ear canal of the listener to obtain a second channel of the binaural signal; forming a mono or stereo downmix of the plurality of channels represented by the multi-channel signal; generating a room-reflections/reverberation related contribution of the binaural signal, including a first channel output and a second channel output, by modeling room reflections/reverberations based on the mono or stereo signal; adding the first channel output of the room processor to the first channel of the binaural signal; and adding the second channel output of the room processor to the second channel of the binaural signal.
According to another embodiment, a method for forming an inter-similarity decreasing set of head-related transfer functions for modeling an acoustic transmission of a plurality of channels from a virtual sound source position associated with the respective channel to ear canals of a listener may have the steps of: providing an original plurality of HRTFs implemented as FIR filters, by looking-up or computing filter taps for each of the original plurality of HRTFs responsive to a selection or change of the virtual sound source positions; and differently modifying—in a spectrally varying sense—phase and/or magnitude responses of impulse responses of the HRTFs modeling the acoustic transmissions of a predetermined pair of channels such that the group delays of a first one of the HRTFs relative to another one of the HRTFs show, for Bark bands, a standard deviation of at least an eighth of a sample, the pair of channels being one of a left and a right channel of the plurality of channels, a front and a rear channel of the plurality of channels, and a center and a non-center channel of the plurality of channels.
Another embodiment may have a computer program having instructions for performing, when running on a computer, the inventive methods.
The first idea underlying the present application is that a more stable and pleasant binaural signal for headphone reproduction may be achieved by differently processing, and thereby reducing the similarity between, at least one of a left and a right channel of the plurality of input channels, a front and a rear channel of the plurality of input channels, and a center and a non-center channel of the plurality of channels, thereby obtaining an inter-similarity reduced set of channels. This inter-similarity reduced set of channels is then fed to a plurality of directional filters followed by respective mixers for the left and the right ear, respectively. By reducing the inter-similarity of channels of the multi-channel input signal, the spatial width of the binaural output signal may be increased and the externalization may be improved.
A further idea underlying the present application is that a more stable and pleasant binaural signal for headphone reproduction may be achieved by performing—in a spectrally varying sense—a phase and/or magnitude modification differently between at least two channels of the plurality of channels, thereby obtaining the inter-similarity reduced set of channels which, in turn, may then be fed to a plurality of directional filters followed by respective mixers for the left and the right ear, respectively. Again, by reducing the inter-similarity of channels of the multi-channel input signal, the spatial width of the binaural output signal may be increased and the externalization may be improved.
The abovementioned advantages are also achievable when forming an inter-similarity decreasing set of head-related transfer functions by causing the impulse responses of an original plurality of head-related transfer functions to be delayed relative to each other, or by differently modifying—in a spectrally varying sense—the phase and/or magnitude responses of the original plurality of head-related transfer functions relative to each other. The formation may be done offline as a design step, or online during binaural signal generation by the device using the head-related transfer functions as directional filters, for example responsive to an indication of the virtual sound source locations to be used.
Another idea underlying the present application is that some portions in movies and music result in a more naturally perceived headphone reproduction when the mono or stereo downmix of the channels of the multi-channel signal, which is to be subjected to the room processor for generating the room-reflections/reverberation related contribution of the binaural signal, is formed such that the plurality of channels contribute to the mono or stereo downmix at levels differing among at least two channels of the multi-channel signal. For example, the inventors realized that voices in movie dialogs and music are typically mixed mainly to the center channel of a multi-channel signal, and that the center-channel signal, when fed to the room processing module, often results in an unnaturally reverberant and spectrally unequal perceived output. The inventors discovered, however, that these deficiencies may be overcome by feeding the center channel to the room processing module with a level reduction such as, for example, an attenuation of 3-12 dB, or specifically, 6 dB.
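As a sketch, the weighted downmix just described could look as follows; the 5-channel layout and the plain summation are illustrative assumptions, and only the reduced center weight (here the suggested 6 dB, i.e. a gain of about 0.5) reflects the text:

```python
def downmix_mono(left, right, center, rear_l, rear_r, center_att_db=6.0):
    """Mono downmix for the room processor: all channels are summed,
    but the center channel is attenuated (e.g. by 6 dB) so that dialog
    mixed to the center does not sound unnaturally reverberant.
    Channel names and the unweighted summation of the other channels
    are illustrative; the text only requires that at least two
    channels contribute at different levels."""
    g_c = 10.0 ** (-center_att_db / 20.0)  # 6 dB -> gain of about 0.5
    return [l + r + g_c * c + sl + sr
            for l, r, c, sl, sr in zip(left, right, center, rear_l, rear_r)]
```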
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
FIGS. 4a and 4b show block diagrams of the room processor of
The similarity reducer 12 is configured to turn the multi-channel signal 18, representing the plurality of channels 18a-18d, into an inter-similarity reduced set 20 of channels 20a-20d. The number of channels 18a-18d represented by the multi-channel signal 18 may be two or more. For illustration purposes only, four channels 18a-18d have explicitly been shown in
According to the embodiment of
As will be outlined in more detail below, the similarity reducer 12 may, for example, achieve the different processing by causing the respective pairs of channels to be delayed relative to each other, or by subjecting the respective pairs of channels to delays of different amounts in, for example, each of a plurality of frequency bands, thereby obtaining an inter-correlation reduced set 20 of channels. There are, of course, other possibilities for decreasing the correlation between the channels. In other words, the similarity reducer 12 may have a transfer function according to which the spectral energy distribution of each channel remains the same, i.e. a transfer function having a magnitude of one over the relevant audio spectrum range, wherein, however, the similarity reducer 12 differently modifies the phases of subbands or frequency components thereof. For example, the similarity reducer 12 could be configured such that it causes a phase modification on all of, or on one or several of, the channels 18 such that the signal of a first channel is, for a certain frequency band, delayed relative to another one of the channels by at least one sample. Further, the similarity reducer 12 could be configured such that it causes the phase modification such that the group delays of a first channel relative to another one of the channels show, for a plurality of frequency bands, a standard deviation of at least one eighth of a sample. The frequency bands considered could be the Bark bands, a subset thereof, or any other frequency band sub-division.
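A minimal sketch of the simplest of these measures, a broadband one-sample relative delay, shows how even such a small modification reduces the lag-zero correlation between two otherwise identical channels; the sine test signal is, of course, only an illustration:

```python
import math

def delay(signal, n):
    """Delay a discrete-time signal by n whole samples (zero padding)."""
    return [0.0] * n + signal[:len(signal) - n]

def corr0(a, b):
    """Normalized cross-correlation of two signals at lag zero."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

# Two identical channels are maximally correlated; delaying one of
# them by a single sample already lowers the lag-zero correlation.
ch = [math.sin(2 * math.pi * 0.1 * n) for n in range(200)]
print(corr0(ch, ch))            # close to 1.0
print(corr0(ch, delay(ch, 1)))  # noticeably below 1.0
```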
Reducing the correlation is not the only way to prevent the human auditory system from in-the-head localization. Rather, correlation is only one of several possible measures by means of which the human auditory system assesses the similarity of the sound arriving at both ears and, thus, the in-bound direction of the sound. Accordingly, the similarity reducer 12 may also achieve the different processing by subjecting the respective pairs of channels to level reductions of different amounts in, for example, each of a plurality of frequency bands, thereby obtaining an inter-similarity reduced set 20 of channels in a spectrally formed way. The spectral forming may, for example, exaggerate the relative spectrally formed level reduction occurring, for example, for rear channel sound relative to front channel sound due to the shadowing by the earlap. Accordingly, the similarity reducer 12 may subject the rear channel(s) to a spectrally varying level reduction relative to the other channels. In this spectral forming, the similarity reducer 12 may have a phase response that is constant over the relevant audio spectrum range, wherein, however, the similarity reducer 12 differently modifies the magnitudes of subbands or frequency components thereof.
The way in which the multi-channel signal 18 represents the plurality of channels 18a-18d is, in principle, not restricted to any specific representation. For example, the multi-channel signal 18 could represent the plurality of channels 18a-18d in a compressed manner, using spatial audio coding. According to spatial audio coding, the plurality of channels 18a-18d could be represented by means of a downmix signal down to which the channels are downmixed, accompanied by downmix information revealing the mixing ratio according to which the individual channels 18a-18d have been mixed into the downmix channel or downmix channels, and by spatial parameters describing the spatial image of the multi-channel signal by means of, for example, level/intensity differences, phase differences, time differences and/or measures of correlation/coherence between the individual channels 18a-18d. The output of the similarity reducer 12 is divided up into the individual channels 20a-20d. The latter channels may, for example, be output as time signals or as spectrograms, for example spectrally decomposed into subbands.
The directional filters 14a-14h are configured to model an acoustic transmission of a respective one of channels 20a-20d from a virtual sound source position associated with the respective channel to a respective ear canal of the listener. In
As will be described in more detail below with the respective embodiments, further contributions may be added to signals 22a and 22b, in order to take into account room reflections and/or reverberation. By this measure, the complexity of the directional filters 14a-14h may be reduced.
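Conceptually, the directional filters 14a-14h and the two mixers implement the following operation; the direct-form FIR filtering and the data layout below are illustrative assumptions (real implementations would use much longer filters and, typically, FFT-based convolution):

```python
def fir(signal, taps):
    """Direct-form FIR filtering (convolution truncated to the signal length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out

def binauralize(channels, hrtf_pairs):
    """Filter every (similarity-reduced) channel with its pair of
    directional filters and mix the results per ear.  hrtf_pairs[i] is
    a (left_taps, right_taps) tuple for channel i; the filter taps
    themselves are assumed to be given."""
    length = len(channels[0])
    left = [0.0] * length
    right = [0.0] * length
    for ch, (h_l, h_r) in zip(channels, hrtf_pairs):
        for n, v in enumerate(fir(ch, h_l)):
            left[n] += v
        for n, v in enumerate(fir(ch, h_r)):
            right[n] += v
    return left, right
```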
In the device of
Before turning to the next embodiment,
It should be noted that most of the measures performed by the similarity reducer 12 to reduce the similarity between channels of the plurality 18 of channels 18a-18d could also be achieved by removing the similarity reducer 12 and concurrently modifying the directional filters so that they perform not only the aforementioned modeling of the acoustic transmission, but also introduce the dissimilarity, such as the decorrelation, just mentioned. Accordingly, the directional filters would, for example, not model HRTFs, but modified head-related transfer functions.
The HRTF provider 32 is configured to provide an original plurality of HRTFs. The provision may comprise measurements using a standard dummy head, in order to measure the head-related transfer functions from certain sound source positions to the ear canals of a standard dummy listener.
Alternatively, the HRTF provider 32 may be configured to simply look up or load the original HRTFs from a memory. As a further alternative, the HRTF provider 32 may be configured to compute the HRTFs according to a predetermined formula, depending on, for example, the virtual sound source positions of interest. Accordingly, the HRTF provider 32 may be configured to operate in a design environment for designing a binaural output signal generator, or may be part of such a binaural output signal generator itself, in order to provide the original HRTFs online, for example responsive to a selection or change of the virtual sound source positions. For example, device 30 may be part of a binaural output signal generator which is able to accommodate multi-channel signals intended for different speaker configurations having different virtual sound source positions associated with their channels. In this case, the HRTF provider 32 may be configured to provide the original HRTFs in a way adapted to the currently intended virtual sound source positions.
The HRTF processor 34, in turn, is configured to cause the impulse responses of at least a pair of the HRTFs to be displaced relative to each other, or to modify—in a spectrally varying sense—the phase and/or magnitude responses thereof differently relative to each other. The pair of HRTFs may model the acoustic transmission of one of left and right channels, front and rear channels, and center and non-center channels. In effect, this may be achieved by one or a combination of the following techniques applied to one or several channels of the multi-channel signal, namely delaying the HRTF of a respective channel, modifying the phase response of a respective HRTF and/or applying a decorrelation filter such as an all-pass filter to the respective HRTF, thereby obtaining an inter-correlation reduced set of HRTFs, and/or modifying—in a spectrally varying sense—the magnitude response of a respective HRTF, thereby obtaining an, at least, inter-similarity reduced set of HRTFs. In either case, the resulting decorrelation/dissimilarity between the respective channels may support the human auditory system in externally localizing the sound source and thereby prevent in-the-head localization from occurring. For example, the HRTF processor 34 could be configured such that it causes a modification of the phase response of all of, or of one or several of, the channels' HRTFs such that a group delay of a first HRTF for a certain frequency band is introduced—or a certain frequency band of a first HRTF is delayed—relative to another one of the HRTFs by at least one sample. Further, the HRTF processor 34 could be configured such that it causes the modification of the phase response such that the group delays of a first HRTF relative to another one of the HRTFs show, for a plurality of frequency bands, a standard deviation of at least an eighth of a sample. The frequency bands considered could be the Bark bands, a subset thereof, or any other frequency band sub-division.
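The group-delay criterion just mentioned can be checked numerically; the finite-difference phase evaluation below is a sketch, and the band centers merely stand in for Bark-band centers:

```python
import cmath
import math

def group_delay(taps, omegas):
    """Numerical group delay (in samples) of an FIR filter at the given
    normalized angular frequencies, via a finite difference of the phase."""
    d = 1e-4
    gds = []
    for w in omegas:
        h0 = sum(t * cmath.exp(-1j * w * k) for k, t in enumerate(taps))
        h1 = sum(t * cmath.exp(-1j * (w + d) * k) for k, t in enumerate(taps))
        dphi = cmath.phase(h1 / h0)  # small phase increment, no unwrapping needed
        gds.append(-dphi / d)
    return gds

def gd_std(taps_a, taps_b, band_centers):
    """Standard deviation, across bands, of the group delay of filter A
    relative to filter B; the criterion above asks for at least 1/8 sample."""
    rel = [a - b for a, b in zip(group_delay(taps_a, band_centers),
                                 group_delay(taps_b, band_centers))]
    m = sum(rel) / len(rel)
    return math.sqrt(sum((r - m) ** 2 for r in rel) / len(rel))
```

A pure broadband delay produces a constant relative group delay and hence a standard deviation of zero, so it would not satisfy the criterion; a spectrally varying phase modification does.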
The inter-similarity decreasing set of HRTFs resulting from the HRTF processor 34 may be used for setting the HRTFs of the directional filters 14a-14h of the device of
As already described above, the device of
The idea underlying the room processor 44 is that the room reflections/reverberation occurring in, for example, a room may be modeled, in a manner transparent to the listener, based on a downmix such as a simple sum of the channels of the multi-channel signal 18. Since the room reflections/reverberation occur later than sounds traveling along the direct path or line of sight from the sound source to the ear canals, the room processor's impulse response is representative of, and substitutes for, the tail of the impulse responses of the directional filters shown in
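As a sketch of such a room processing block, a Schroeder-style bank of parallel feedback comb filters may serve as the reverberation algorithm; the delay lengths and gains below are arbitrary illustrative values, not parameters of any embodiment:

```python
def comb(signal, delay_samples, feedback):
    """Feedback comb filter y[n] = x[n] + g*y[n-D], a minimal building
    block of a Schroeder-style reverberator."""
    buf = [0.0] * delay_samples  # circular buffer of past outputs
    out = []
    for i, x in enumerate(signal):
        y = x + feedback * buf[i % delay_samples]
        buf[i % delay_samples] = y
        out.append(y)
    return out

def simple_room_processor(mono_downmix):
    """Sum of a few parallel combs with mutually prime delay lengths,
    standing in for the room processor's late reverberation."""
    combs = [(1116, 0.84), (1188, 0.83), (1277, 0.82), (1356, 0.81)]
    out = [0.0] * len(mono_downmix)
    for d, g in combs:
        for i, v in enumerate(comb(mono_downmix, d, g)):
            out[i] += 0.25 * v
    return out
```

Feeding an impulse through this sketch yields a sparse, exponentially decaying train of echoes, i.e. the kind of late-response tail that the room processor contributes instead of the directional filters.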
FIGS. 4a and 4b show possible implementations of the room processor's internal structure. According to
Although it has been described that the downmix generator 42 may simply sum the channels of the multi-channel signal 18—weighting each channel equally—this is not exactly the case with the embodiment of
For example, the downmix generator 42 of
The effect of the level reduction with respect to the center channel is that the binaural output signal obtained via contributions 56a and 56b is—at least in some circumstances, which are discussed in more detail below—more naturally perceived by listeners than without the level reduction. In other words, the downmix generator 42 forms a weighted sum of the channels of the multi-channel signal 18, with the weighting value associated with the center channel being reduced relative to the weighting values of the other channels.
The level reduction of the center channel is especially advantageous during voice portions of movie dialogs or music. The improvement in audio impression obtained during these voice portions over-compensates the minor penalties the level reduction incurs during non-voice phases. However, according to an alternative embodiment, the level reduction is not constant. Rather, the downmix generator 42 may be configured to switch between a mode where the level reduction is switched off and a mode where the level reduction is switched on. In other words, the downmix generator 42 may be configured to vary the amount of level reduction in a time-varying manner. The variation may be of a binary or analog nature, between zero and a maximum value. The downmix generator 42 may be configured to perform the mode switching or level reduction amount variation dependent on information contained within the multi-channel signal 18. For example, the downmix generator 42 may be configured to detect voice phases or distinguish these voice phases from non-voice phases, or may assign a voice content measure, being of at least ordinal scale, to consecutive frames of the center channel. For example, the downmix generator 42 may detect the presence of voice in the center channel by means of a voice filter and determine whether the output level of this filter exceeds some threshold. However, the detection of voice phases within the center channel by the downmix generator 42 is not the only way to make the aforementioned mode switching or level reduction amount variation time-dependent. For example, the multi-channel signal 18 could have side information associated therewith which is especially intended for distinguishing between voice phases and non-voice phases, or for measuring the voice content quantitatively. In this case, the downmix generator 42 would operate responsive to this side information.
Another possibility would be that the downmix generator 42 performs the aforementioned mode switching or level reduction amount variation dependent on a comparison between, for example, the current levels of the center channel, the left channel, and the right channel. In case the level of the center channel exceeds the levels of the left and right channels, either individually or relative to their sum, by more than a certain threshold ratio, the downmix generator 42 may assume that a voice phase is currently present and act accordingly, i.e. perform the level reduction. Similarly, the downmix generator 42 may use the level differences between the center, left and right channels in order to realize the abovementioned dependencies.
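The level-comparison variant just described might be sketched per frame as follows; the RMS level measure, the threshold ratio, and the hard on/off switching are illustrative assumptions (the text also allows a gradual variation of the attenuation):

```python
import math

def rms(frame):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0

def center_gain(center_frame, left_frame, right_frame,
                threshold_ratio=2.0, att_db=6.0):
    """Per-frame center-channel gain for the downmix: when the center
    level dominates the left and right levels by more than the
    threshold ratio, a voice phase is assumed and the attenuation is
    switched on; otherwise the center passes through unchanged."""
    c, l, r = rms(center_frame), rms(left_frame), rms(right_frame)
    if c > threshold_ratio * max(l, r, 1e-12):
        return 10.0 ** (-att_db / 20.0)  # voice phase: attenuate center
    return 1.0                           # non-voice phase: no reduction
```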
Besides this, the downmix generator 42 may be responsive to spatial parameters used to describe the spatial image of the multiple channels of the multi-channel signal 18. This is shown in
Although a level reduction of the center channel has been described in order to exemplify the weighted summation of the plurality of channels, such that the channels contribute to the mono or stereo downmix at levels differing among at least two channels of the multi-channel signal 18, there are also other examples where other channels are advantageously level-reduced or level-amplified relative to another channel or other channels, because some sound source content present in this channel or these channels is to be subjected to the room processing not at the same level as other contents in the multi-channel signal, but at a reduced or increased level.
It should be noted that the aforementioned embodiments may be combined with each other. Some combination possibilities have already been mentioned above. Further possibilities will be mentioned in the following with respect to the embodiments of
Similarly, downmix generator 42 may be configured to suitably combine the spatial parameters 64 and the level reduction amount to be achieved for the center channel in order to derive the mono or stereo downmix 48 intended for the room processor 44.
In order to ease the understanding of the following description of the device of
The device of
Typically, multi-channel sound is produced such that the dominating sound energy is contained in the front channels, i.e. left front, right front, center. Voices in movie dialogs and music are typically mixed mainly to the center channel. If center channel signals are fed to the room processing module 122, the resulting output is often perceived unnaturally reverberant and spectrally unequal. Therefore, according to the embodiment of
Thus, according to the embodiment of
Similarly to
As described above, device 30 could operate responsive to the change in the loudspeaker configuration for which the bitstream at bitstream input 126 is intended.
The embodiments of
The general structure of a spatial audio decoder for headphone output is given in
Internally, the subband modifier 202 comprises an analysis filterbank 208, a matrixing unit or linear combiner 210 and a synthesis filterbank 212 connected in the order mentioned between the downmix signal input and the output of subband modifier 202. Further, the subband modifier 202 comprises a parameter converter 214 which is fed by the spatial parameters 206 and a modified set of HRTFs as obtained by device 30.
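The chain of analysis filterbank 208, matrixing unit 210, and synthesis filterbank 212 may be sketched as follows. A plain FFT pair stands in for the actual (hybrid QMF) filterbanks, and the per-bin 2x2 matrices are taken as given; in the described decoder they would be derived by the parameter converter 214 from the spatial parameters 206 and the HRTF set:

```python
import numpy as np

def subband_modify(stereo_frame, matrices):
    """Per-subband matrixing: map a stereo downmix frame to a binaural frame
    by applying a complex 2x2 matrix to each frequency bin.
    stereo_frame: shape (2, N); matrices: shape (N//2 + 1, 2, 2)."""
    # analysis filterbank (modeled as a real FFT per channel)
    spec = np.fft.rfft(stereo_frame, axis=1)            # shape (2, bins)
    # matrixing unit / linear combiner: y_k = M_k @ x_k for each bin k
    out = np.einsum('kij,jk->ik', matrices, spec)
    # synthesis filterbank (inverse FFT)
    return np.fft.irfft(out, n=stereo_frame.shape[1], axis=1)
```

With identity matrices the chain is transparent; the binaural effect comes entirely from the matrices supplied by the parameter converter.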
In
Thus, the output of the matrixing unit 210 is a modified spectrogram as shown in
In case of
As
The parameters 270 for the weighting stages 264a-264e are, as described above, selected such that the above-described center channel level reduction in the stereo downmix 48 is achieved resulting, as described above, in the advantages with respect to natural sound perception.
Thus, in other words,
The EQ parameters 270 fed into the modified spatial audio subband modifier 234 may have the following properties. Firstly, the center channel signal may be attenuated by at least 6 dB. Further, the center channel signal may have a low-pass characteristic. Even further, the difference signal of the remaining channels may be boosted at low frequencies. In order to compensate for the lower level of the center channel 242 relative to the other channels 244 and 246, the gain of the HRTF parameters for the center channel used in the binaural spatial audio subband modifier 202 should be increased accordingly.
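A per-band gain set with these properties may be sketched as follows. The exact attenuation, cutoff, and boost values are assumptions; the text only requires at least 6 dB center attenuation, a low-pass character for the center, a low-frequency boost of the difference signal, and a compensating HRTF gain for the center in the binaural path:

```python
import numpy as np

def eq_gains(freqs, center_att_db=-6.0, lp_cutoff=4000.0,
             diff_boost_db=3.0, boost_below=500.0):
    """Illustrative EQ gains per frequency band: attenuated, low-passed
    center; low-frequency-boosted difference signal; and the HRTF gain
    compensation for the center channel."""
    freqs = np.asarray(freqs, dtype=float)
    center = np.full_like(freqs, 10 ** (center_att_db / 20.0))
    center[freqs > lp_cutoff] = 0.0            # crude low-pass for the center
    diff = np.ones_like(freqs)
    diff[freqs < boost_below] = 10 ** (diff_boost_db / 20.0)
    # compensating HRTF gain for the center channel in the binaural path
    hrtf_center_comp = 10 ** (-center_att_db / 20.0)
    return center, diff, hrtf_center_comp
```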
The main goal of the setting of the EQ parameters is the reduction of the center channel signal in the output for the room processing module. However, the center channel should only be suppressed to a limited extent: the center channel signal is subtracted from the left and the right downmix channels inside the TTT box. If the center level is reduced, artifacts in the left and right channel may become audible. Therefore, center level reduction in the EQ stage is a trade-off between suppression and artifacts. Finding a fixed setting of EQ parameters is possible, but may not be optimal for all signals. Accordingly, according to an embodiment, an adaptive algorithm or module 274 may be used to control the amount of center level reduction by one, or a combination, of the following parameters:
The spatial parameters 206 used to decode the center channel 242 from the left and right downmix channel 204 inside the TTT box 262 may be used as indicated by dashed line 276.
The level of center, left and right channels may be used as indicated by dashed line 278.
The level differences between center, left and right channels 242-246 may be used as also indicated by dashed line 278.
The output of a signal-type detection algorithm, such as a voice activity detector, may be used as also indicated by dashed line 278.
Lastly, static or dynamic metadata describing the audio content may be used in order to determine the amount of center level reduction as indicated by dashed line 280.
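Combining such cues into an amount of center level reduction may be sketched as follows. The mapping below, which uses only two of the listed cues (channel levels and a voice-activity flag), is a hypothetical stand-in for the adaptive module 274, not the disclosed algorithm:

```python
import numpy as np

def center_reduction_db(level_c, level_l, level_r,
                        voice_activity, max_reduction_db=12.0):
    """Scale the center level reduction with center dominance, clipped to a
    maximum to limit TTT-box subtraction artifacts; apply it only during
    voice phases. All constants are illustrative."""
    dominance = level_c / (level_l + level_r + 1e-12)   # center dominance
    amount = np.clip(6.0 * dominance, 0.0, max_reduction_db)
    return float(amount) if voice_activity else 0.0
```

Clipping the reduction reflects the trade-off noted above: past a certain point, further suppression of the center buys little and risks audible artifacts in the left and right channels.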
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, wherein a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus such as a part of an ASIC, a sub-routine of a program code or a part of a programmed programmable logic.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2009/005548, filed Jul. 30, 2009, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. application Ser. No. 61/085,286, filed Jul. 31, 2008, which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
4359605 | Haramoto et al. | Nov 1982 | A |
5371799 | Lowe et al. | Dec 1994 | A |
6236730 | Cowieson et al. | May 2001 | B1 |
6614912 | Yamada et al. | Sep 2003 | B1 |
7394903 | Herre et al. | Jul 2008 | B2 |
7945054 | Kim | May 2011 | B2 |
8027479 | Villemoes | Sep 2011 | B2 |
8284946 | Moon et al. | Oct 2012 | B2 |
20030014136 | Wang et al. | Jan 2003 | A1 |
20030044032 | Irwan et al. | Mar 2003 | A1 |
20050100171 | Reilly et al. | May 2005 | A1 |
20050157883 | Herre et al. | Jul 2005 | A1 |
20060115091 | Kim et al. | Jun 2006 | A1 |
20070172086 | Dickins et al. | Jul 2007 | A1 |
20070223708 | Villemoes et al. | Sep 2007 | A1 |
20080025519 | Yu et al. | Jan 2008 | A1 |
20080037796 | Jot et al. | Feb 2008 | A1 |
20080091436 | Breebaart et al. | Apr 2008 | A1 |
20080273708 | Sandgren et al. | Nov 2008 | A1 |
Number | Date | Country |
---|---|---|
1879450 | Dec 2006 | CN |
H09-247799 | Sep 1997 | JP |
10336800 | Dec 1998 | JP |
H11-275696 | Oct 1999 | JP |
2000-069598 | Mar 2000 | JP |
2001517050 | Oct 2001 | JP |
2003-333698 | Nov 2003 | JP |
2006217210 | Aug 2006 | JP |
2008507184 | Mar 2008 | JP |
10-2006-0120109 | Nov 2006 | KR |
2323551 | Apr 2008 | RU |
2329548 | Jul 2008 | RU |
2330390 | Jul 2008 | RU |
WO9914983 | Mar 1999 | WO |
WO2005048653 | May 2005 | WO |
WO-2007031906 | Mar 2007 | WO |
WO-2007106553 | Sep 2007 | WO |
WO2008003881 | Jan 2008 | WO |
Entry |
---|
Christof Faller, "Binaural Cue Coding-Part II: Schemes and Applications," p. 527. |
Breebaart, J. et al.: "Multi-channel goes mobile: MPEG surround binaural rendering," AES International Conference, Audio for Mobile and Handheld Devices, Seoul, Korea, Sep. 2, 2006, pp. 1-13, XP007902577, figure 1. |
Int'l Search Report and Written Opinion, mailed Jan. 27, 2010, in related PCT patent application No. PCT/EP2009/005548, 20 pages. |
Int'l Preliminary Report on Patentability, mailed Sep. 14, 2010, in related PCT patent application No. PCT/EP2009/005548, 15 pages. |
Number | Date | Country | |
---|---|---|---|
20110211702 A1 | Sep 2011 | US |
Number | Date | Country | |
---|---|---|---|
61085286 | Jul 2008 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2009/005548 | Jul 2009 | US |
Child | 13015335 | US |