The present application is related to an apparatus and a method for mapping first and second input channels to at least one output channel and, in particular, an apparatus and a method suitable to be used in a format conversion between different loudspeaker channel configurations.
Spatial audio coding tools are well-known in the art and are standardized, for example, in the MPEG-surround standard. Spatial audio coding starts from a plurality of original input channels, e.g., five or seven input channels, which are identified by their placement in a reproduction setup, e.g., as a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low frequency enhancement (LFE) channel. A spatial audio encoder may derive one or more downmix channels from the original channels and, additionally, may derive parametric data relating to spatial cues such as interchannel level differences, interchannel coherence values, interchannel phase differences, interchannel time differences, etc. The one or more downmix channels are transmitted together with the parametric side information indicating the spatial cues to a spatial audio decoder, which decodes the downmix channels and the associated parametric data in order to finally obtain output channels which are an approximated version of the original input channels. The placement of the channels in the output setup may be fixed, e.g., a 5.1 format, a 7.1 format, etc.
Also, spatial audio object coding tools are well-known in the art and are standardized, for example, in the MPEG SAOC standard (SAOC=spatial audio object coding). In contrast to spatial audio coding starting from original channels, spatial audio object coding starts from audio objects which are not automatically dedicated for a certain rendering reproduction setup. Rather, the placement of the audio objects in the reproduction scene is flexible and may be set by a user, e.g., by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information may be transmitted as additional side information or metadata; rendering information may include information at which position in the reproduction setup a certain audio object is to be placed (e.g. over time). In order to obtain a certain data compression, a number of audio objects is encoded using an SAOC encoder which calculates, from the input objects, one or more transport channels by downmixing the objects in accordance with certain downmixing information. Furthermore, the SAOC encoder calculates parametric side information representing inter-object cues such as object level differences (OLD), object coherence values, etc. As in SAC (SAC=Spatial Audio Coding), the inter object parametric data is calculated for individual time/frequency tiles. For a certain frame (for example, 1024 or 2048 samples) of the audio signal a plurality of frequency bands (for example 24, 32, or 64 bands) are considered so that parametric data is provided for each frame and each frequency band. For example, when an audio piece has 20 frames and when each frame is subdivided into 32 frequency bands, the number of time/frequency tiles is 640.
A desired reproduction format, i.e. an output channel configuration (output loudspeaker configuration) may differ from an input channel configuration, wherein the number of output channels is generally different from the number of input channels. Thus, a format conversion may be necessitated to map the input channels of the input channel configuration to the output channels of the output channel configuration.
It is the object underlying the invention to provide for an apparatus and a method which permit an improved sound reproduction, in particular in case of a format conversion between different loudspeaker channel configurations.
An embodiment may have an apparatus for mapping a first input channel and a second input channel of an input channel configuration to at least one output channel of an output channel configuration, wherein each input channel and each output channel has a direction in which an associated loudspeaker is located relative to a central listener position, wherein the apparatus is configured to: map the first input channel to a first output channel of the output channel configuration; and at least one of a) map the second input channel to the first output channel, including processing the second input channel by applying at least one of an equalization filter and a decorrelation filter to the second input channel; and b) despite the fact that an angle deviation between a direction of the second input channel and a direction of the first output channel is less than an angle deviation between a direction of the second input channel and the second output channel and/or is less than an angle deviation between the direction of the second input channel and the direction of the third output channel, map the second input channel to the second and third output channels by panning between the second and third output channels.
According to another embodiment, a method for mapping a first input channel and a second input channel of an input channel configuration to at least one output channel of an output channel configuration, wherein each input channel and each output channel has a direction in which an associated loudspeaker is located relative to a central listener position, may have the steps of: mapping the first input channel to a first output channel of the output channel configuration; and at least one of a) mapping the second input channel to the first output channel, including processing the second input channel by applying at least one of an equalization filter and a decorrelation filter to the second input channel; and b) despite the fact that an angle deviation between a direction of the second input channel and a direction of the first output channel is less than an angle deviation between a direction of the second input channel and the second output channel and/or is less than an angle deviation between the direction of the second input channel and the direction of the third output channel, mapping the second input channel to the second and third output channels by panning between the second and third output channels.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the inventive method when said computer program is run by a computer.
Embodiments of the invention provide for an apparatus for mapping a first input channel and a second input channel of an input channel configuration to at least one output channel of an output channel configuration, wherein each input channel and each output channel has a direction in which an associated loudspeaker is located relative to a central listener position, wherein the apparatus is configured to:
map the first input channel to a first output channel of the output channel configuration; and at least one of
a) map the second input channel to the first output channel, comprising processing the second input channel by applying at least one of an equalization filter and a decorrelation filter to the second input channel; and
b) despite the fact that an angle deviation between a direction of the second input channel and a direction of the first output channel is less than an angle deviation between a direction of the second input channel and the second output channel and/or is less than an angle deviation between the direction of the second input channel and the direction of the third output channel, map the second input channel to the second and third output channels by panning between the second and third output channels.
Embodiments of the invention provide for a method for mapping a first input channel and a second input channel of an input channel configuration to at least one output channel of an output channel configuration, wherein each input channel and each output channel has a direction in which an associated loudspeaker is located relative to a central listener position, comprising:
mapping the first input channel to a first output channel of the output channel configuration; and at least one of
a) mapping the second input channel to the first output channel, comprising processing the second input channel by applying at least one of an equalization filter and a decorrelation filter to the second input channel; and
b) despite the fact that an angle deviation between a direction of the second input channel and a direction of the first output channel is less than an angle deviation between a direction of the second input channel and the second output channel and/or is less than an angle deviation between the direction of the second input channel and the direction of the third output channel, mapping the second input channel to the second and third output channels by panning between the second and third output channels.
Embodiments of the invention are based on the finding that an improved audio reproduction can be achieved even in case of a downmixing process from a number of input channels to a smaller number of output channels if an approach is used which is designed to attempt to preserve the spatial diversity of at least two input channels which are mapped to at least one output channel. According to embodiments of the invention, this is achieved by processing one of the input channels mapped to the same output channel by applying at least one of an equalization filter and a decorrelation filter. In embodiments of the invention, this is achieved by generating a phantom source for one of the input channels using two output channels, at least one of which has an angle deviation from the input channel which is larger than an angle deviation from the input channel to another output channel.
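The phantom-source alternative described above may be illustrated by the following minimal sketch. It is not taken from the embodiments: the loudspeaker directions, the channel names and the choice of equal panning gains are assumptions made purely for illustration, and azimuth is assumed to be positive towards the left. The sketch computes the angle deviation between an elevated input channel and three horizontal output channels, confirms that the centre output is the nearest one, and nevertheless assigns the input channel to the left and right outputs.

```python
import math

def angle_deviation_deg(dir_a, dir_b):
    """Angle between two (azimuth, elevation) directions, all in degrees."""
    az_a, el_a = (math.radians(v) for v in dir_a)
    az_b, el_b = (math.radians(v) for v in dir_b)
    cos_d = (math.sin(el_a) * math.sin(el_b)
             + math.cos(el_a) * math.cos(el_b) * math.cos(az_a - az_b))
    # Clamp against rounding before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

# Hypothetical output directions (azimuth, elevation) in degrees.
outputs = {"left": (30.0, 0.0), "centre": (0.0, 0.0), "right": (-30.0, 0.0)}
elevated_input = (0.0, 35.0)  # hypothetical elevated centre input channel

deviations = {name: angle_deviation_deg(elevated_input, d)
              for name, d in outputs.items()}
nearest = min(deviations, key=deviations.get)

# Pan between left and right although the centre output is nearer.
# Equal gains of 1/sqrt(2) are an assumption; they keep the sum of the
# squared gains at one and place the phantom source midway.
g = 1.0 / math.sqrt(2.0)
panning_gains = {"left": g, "centre": 0.0, "right": g}
```

In this sketch the centre output remains reserved for the first input channel, so the two input channels are not collapsed onto a single loudspeaker.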
In embodiments of the invention, an equalization filter is applied to the second input channel and is configured to boost a spectral portion of the second input channel, which is known to give the listener the impression that sound comes from a position corresponding to the position of the second input channel. In embodiments of the invention, an elevation angle of the second input channel may be larger than an elevation angle of the one or more output channels the input channel is mapped to. For example, a loudspeaker associated with the second input channel may be at a position above a horizontal listener plane, while loudspeakers associated with the one or more output channels may be at a position in the horizontal listener plane. The equalization filter may be configured to boost a spectral portion of the second channel in a frequency range between 7 kHz and 10 kHz. By processing the second input signal in this manner, a listener may be given the impression that the sound comes from an elevated position even if it actually does not come from an elevated position.
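As a minimal sketch of such an equalization, the following example assigns a gain to each frequency band and boosts only the bands whose centre frequency lies between 7 kHz and 10 kHz. The boost of 3 dB, the band centre frequencies and the function names are hypothetical; the text above specifies only the frequency range.

```python
def elevation_eq_gains(band_centers_hz, boost_db=3.0,
                       lo_hz=7000.0, hi_hz=10000.0):
    """Linear gain per band: boosted inside [lo_hz, hi_hz], unity elsewhere."""
    boost = 10.0 ** (boost_db / 20.0)  # decibel value to linear amplitude gain
    return [boost if lo_hz <= f <= hi_hz else 1.0 for f in band_centers_hz]

def apply_band_gains(band_samples, gains):
    """Weight one sample per band with the corresponding gain."""
    return [g * x for g, x in zip(gains, band_samples)]

# Hypothetical band centre frequencies in Hz.
centers = [500.0, 2000.0, 8000.0, 12000.0]
gains = elevation_eq_gains(centers)
weighted = apply_band_gains([1.0, 1.0, 1.0, 1.0], gains)
```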
In embodiments of the invention, the second input channel is processed by applying an equalization filter configured to process the second input channel in order to compensate for timbre differences caused by different positions of the second input channel and the at least one output channel which the second input channel is mapped to. Thus, the timbre of the second input channel, which is reproduced by a loudspeaker at a wrong position may be manipulated so that a user may get the impression that the sound stems from another position closer to the original position, i.e. the position of the second input channel.
In embodiments of the invention, a decorrelation filter is applied to the second input channel. Applying a decorrelation filter to the second input channel may also give a listener the impression that sound signals reproduced by the first output channel stem from different input channels located at different positions in the input channel configuration. For example, the decorrelation filter may be configured to introduce frequency dependent delays and/or randomized phases into the second input channel. In embodiments of the invention, the decorrelation filter may be a reverberation filter configured to introduce reverberation signal portions into the second input channel, so that a listener may get the impression that the sound signals reproduced via the first output channel stem from different positions. In embodiments of the invention, the decorrelation filter may be configured to convolve the second input channel with an exponentially decaying noise sequence in order to simulate diffuse reflections in the second input signal.
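The convolution with an exponentially decaying noise sequence may be sketched as follows. The filter length, the decay constant, the Gaussian noise source and the unit-energy normalization are assumptions chosen for illustration only.

```python
import math
import random

def decaying_noise(length, decay_samples, seed=0):
    """Gaussian noise shaped by an exponential decay, scaled to unit energy."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) * math.exp(-n / decay_samples)
         for n in range(length)]
    norm = math.sqrt(sum(v * v for v in h))
    return [v / norm for v in h]

def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

h = decaying_noise(length=64, decay_samples=16.0)
x = [1.0] + [0.0] * 15          # unit impulse as a test signal
y = convolve(x, h)              # the filter's impulse response, zero-padded
```

For real signals a fast convolution would normally replace the direct form; the direct form is used here only to keep the sketch free of dependencies.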
In embodiments of the invention, coefficients of the equalization filter and/or the decorrelation filter are set based on a measured binaural room impulse response (BRIR) of a specific listening room or are set based on empirical knowledge about room acoustics (which may also take into consideration a specific listening room). Thus, the respective processing, which takes the spatial diversity of the input channels into consideration, may be adapted to the specific scenario, such as the specific listening room, in which the signal is to be reproduced by means of the output channel configuration.
Embodiments of the invention will be detailed below referring to the accompanying figures, in which:
Before describing embodiments of the inventive approach in detail, an overview of a 3D audio codec system in which the inventive approach may be implemented is given.
The encoding/decoding system depicted in
The pre-renderer/mixer 102 may be optionally provided to convert a channel plus object input scene into a channel scene before encoding. Functionally, it is identical to the object renderer/mixer that will be described in detail below. Pre-rendering of objects may be desired to ensure a deterministic signal entropy at the encoder input that is basically independent of the number of simultaneously active object signals. With pre-rendering of objects, no object metadata transmission is necessitated. Discrete object signals are rendered to the channel layout that the encoder is configured to use. The weights of the objects for each channel are obtained from the associated object metadata (OAM).
The USAC encoder 116 is the core codec for loudspeaker-channel signals, discrete object signals, object downmix signals and pre-rendered signals. It is based on the MPEG-D USAC technology. It handles the coding of the above signals by creating channel and object mapping information based on the geometric and semantic information of the input channel and object assignment. This mapping information describes how input channels and objects are mapped to USAC channel elements, like channel pair elements (CPEs), single channel elements (SCEs), low frequency effects (LFEs) and quad channel elements (QCEs), and the corresponding information is transmitted to the decoder. All additional payloads like SAOC data 114, 118 or object metadata 126 are considered in the encoder's rate control. The coding of objects is possible in different ways, depending on the rate/distortion requirements and the interactivity requirements for the renderer. In accordance with embodiments, the following object coding variants are possible:
The SAOC encoder 112 and the SAOC decoder 220 for object signals may be based on the MPEG SAOC technology. The system is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data, such as OLDs, IOCs (Inter Object Coherence), DMGs (Down Mix Gains). The additional parametric data exhibits a significantly lower data rate than necessitated for transmitting all objects individually, making the coding very efficient. The SAOC encoder 112 takes as input the object/channel signals as monophonic waveforms and outputs the parametric information (which is packed into the 3D-Audio bitstream 128) and the SAOC transport channels (which are encoded using single channel elements and are transmitted). The SAOC decoder 220 reconstructs the object/channel signals from the decoded SAOC transport channels 210 and the parametric information 214, and generates the output audio scene based on the reproduction layout, the decompressed object metadata information and optionally on the basis of the user interaction information.
The object metadata codec (see OAM encoder 124 and OAM decoder 224) is provided so that, for each object, the associated metadata that specifies the geometrical position and volume of the objects in the 3D space is efficiently coded by quantization of the object properties in time and space. The compressed object metadata cOAM 126 is transmitted to the receiver 200 as side information.
The object renderer 216 utilizes the compressed object metadata to generate object waveforms according to the given reproduction format. Each object is rendered to a certain output channel 218 according to its metadata. The output of this block results from the sum of the partial results. If both channel based content as well as discrete/parametric objects are decoded, the channel based waveforms and the rendered object waveforms are mixed by the mixer 226 before outputting the resulting waveforms 228 or before feeding them to a postprocessor module like the binaural renderer 236 or the loudspeaker renderer module 232.
The binaural renderer module 236 produces a binaural downmix of the multichannel audio material such that each input channel is represented by a virtual sound source. The processing is conducted frame-wise in the QMF (Quadrature Mirror Filterbank) domain, and the binauralization is based on measured binaural room impulse responses.
The loudspeaker renderer 232 converts between the transmitted channel configuration 228 and the desired reproduction format. It may also be called “format converter”. The format converter performs conversions to lower numbers of output channels, i.e., it creates downmixes.
A possible implementation of a format converter 232 is shown in
Embodiments of the present invention relate to an implementation of the loudspeaker renderer 232, i.e. apparatus and methods for implementing part of the functionality of the loudspeaker renderer 232.
Reference is now made to
In the following, the low frequency enhancement channel is not considered since the exact position of the loudspeaker (subwoofer) associated with the low frequency enhancement channel is not important.
The channels are arranged at specific directions with respect to a central listener position P. The direction of each channel is defined by an azimuth angle α and an elevation angle β, see
The elevation angle β of a channel defines the angle between the horizontal listener plane 300 and the direction of a virtual connection line between the central listener position and the loudspeaker associated with the channel. In the configuration shown in
The position of a particular channel in space (i.e. the loudspeaker position associated with the particular channel) is given by the azimuth angle, the elevation angle and the distance of the loudspeaker from the central listener position. It is to be noted that the term “position of a loudspeaker” is often described by those skilled in the art by referring to the azimuth angle and the elevation angle only.
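The relation between the azimuth angle, the elevation angle, the distance and a position in space may be sketched as follows. The axis convention (x towards the front, y towards the left, z upwards, listener at the origin) and the function name are assumptions and are not prescribed by the text.

```python
import math

def loudspeaker_position(azimuth_deg, elevation_deg, distance=1.0):
    """Cartesian position relative to the central listener position."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)  # towards the front (assumed)
    y = distance * math.cos(el) * math.sin(az)  # towards the left (assumed)
    z = distance * math.sin(el)                 # above the horizontal plane
    return (x, y, z)

front = loudspeaker_position(0.0, 0.0, distance=2.0)
overhead = loudspeaker_position(0.0, 90.0)
```

When only the azimuth angle and the elevation angle are given, the default distance of one yields a unit direction vector.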
Generally, a format conversion between different loudspeaker channel configurations is performed as a downmixing process that maps a number of input channels to a number of output channels, wherein the number of output channels is generally smaller than the number of input channels, and wherein the output channel positions may differ from the input channel positions. One or more input channels may be mixed together to the same output channel. At the same time, one or more input channels may be rendered over more than one output channel. This mapping from the input channels to the output channels is typically determined by a set of downmix coefficients, or alternatively formulated as a downmix matrix. The choice of downmix coefficients significantly affects the achievable downmix output sound quality. Bad choices may lead to an unbalanced mix or bad spatial reproduction of the input sound scene.
Each channel has associated therewith an audio signal to be reproduced by the associated loudspeaker. The teaching that a specific channel is processed (such as by applying a coefficient, by applying an equalization filter or by applying a decorrelation filter) means that the corresponding audio signal associated with this channel is processed. In the context of this application, the term “equalization filter” is meant to encompass any means to apply an equalization to the signal such that a frequency dependent weighting of portions of the signal is achieved. For example, an equalization filter may be configured to apply frequency-dependent gain coefficients to frequency bands of the signal. In the context of this application, the term “decorrelation filter” is meant to encompass any means to apply a decorrelation to the signal, such as by introducing frequency dependent delays and/or randomized phases to the signal. For example, a decorrelation filter may be configured to apply frequency dependent delay coefficients to frequency bands of the signal and/or to apply randomized phase coefficients to the signal.
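A decorrelation filter in the sense of this definition may be sketched, for one frame of complex subband samples, as a set of randomized phase coefficients with one coefficient per frequency band. The subband values, the uniform phase distribution and the function names are hypothetical; frequency-dependent delays are omitted for brevity.

```python
import cmath
import math
import random

def random_phase_coefficients(num_bands, seed=0):
    """One unit-magnitude complex coefficient with a random phase per band."""
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi))
            for _ in range(num_bands)]

def apply_band_coefficients(subband_frame, coefficients):
    """Multiply each complex subband sample by its band coefficient."""
    return [c * s for c, s in zip(coefficients, subband_frame)]

# Hypothetical complex subband samples of one frame (one value per band).
frame = [1 + 0j, 0.5 + 0.5j, -0.25j, 2 + 0j]
phases = random_phase_coefficients(len(frame))
decorrelated = apply_band_coefficients(frame, phases)
```

Because every coefficient has unit magnitude, the magnitude of each band is left unchanged and only its phase is altered.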
In embodiments of the invention, mapping an input channel to one or more output channels includes applying at least one coefficient to be applied to the input channel for each output channel to which the input channel is mapped. The at least one coefficient may include a gain coefficient, i.e. a gain value, to be applied to the input signal associated with the input channel, and/or a delay coefficient, i.e. a delay value to be applied to the input signal associated with the input channel. In embodiments of the invention, mapping may include applying frequency selective coefficients, i.e. different coefficients for different frequency bands of the input channels. In embodiments of the invention, mapping the input channels to the output channels includes generating one or more coefficient matrices from the coefficients. Each matrix defines a coefficient to be applied to each input channel of the input channel configuration for each output channel of the output channel configuration. For output channels, which the input channel is not mapped to, the respective coefficient in the coefficient matrix will be zero. In embodiments of the invention, separate coefficient matrices for gain coefficients and delay coefficients may be generated. In embodiments of the invention, a coefficient matrix for each frequency band may be generated in case the coefficients are frequency selective. In embodiments of the invention, mapping may further include applying the derived coefficients to the input signals associated with the input channels.
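The application of a gain coefficient matrix to the input signals may be sketched as follows. The matrix values, the number of channels and the sample values are hypothetical; delay coefficients and frequency-selective coefficients are not covered by this sketch.

```python
def apply_gain_matrix(gain_matrix, input_frames):
    """Mix input frames into output frames.

    gain_matrix[o][i] is the gain from input channel i to output channel o;
    input_frames[i] is the list of samples of input channel i.
    """
    num_samples = len(input_frames[0])
    outputs = []
    for row in gain_matrix:
        out = [0.0] * num_samples
        for gain, frame in zip(row, input_frames):
            if gain != 0.0:  # zero means the input is not mapped to this output
                for n, sample in enumerate(frame):
                    out[n] += gain * sample
        outputs.append(out)
    return outputs

# Hypothetical matrix: four input channels, three output channels.
# The fourth input channel is mapped to the second output channel only.
gains = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
inputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.25]]
outputs = apply_gain_matrix(gains, inputs)
```

With frequency-selective coefficients, one such matrix per frequency band would be applied to the corresponding subband signals.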
To obtain good downmix coefficients, an expert (e.g. a sound engineer) may manually tune the coefficients, taking into account his expert knowledge. Another possibility is to automatically derive downmix coefficients for a given combination of input and output configurations by treating each input channel as a virtual sound source whose position in space is given by the position in space associated with the particular channel, i.e. the loudspeaker position associated with the particular input channel. Each virtual source can be reproduced by a generic panning algorithm like tangent-law panning in 2D or vector base amplitude panning (VBAP) in 3D, see V. Pulkki: “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of the Audio Engineering Society, vol. 45, pp. 456-466, 1997. Another proposal for a mathematical, i.e. automatic, derivation of downmix coefficients for a given combination of input and output configurations has been made by A. Ando: “Conversion of Multichannel Sound Signal Maintaining Physical Properties of Sound in Reproduced Sound Field”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, August 2011.
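Tangent-law panning between a symmetric loudspeaker pair may be sketched as follows. The tangent law relates the gains by tan(φ)/tan(φ0) = (gL − gR)/(gL + gR), where φ is the azimuth of the virtual source and φ0 is half the angle between the loudspeakers. The normalization to a unit sum of squared gains, the sign convention (positive azimuth towards the left) and the function name are assumptions.

```python
import math

def tangent_law_gains(source_az_deg, half_aperture_deg):
    """Gains (g_left, g_right) for a source between loudspeakers at +/- half_aperture_deg."""
    ratio = (math.tan(math.radians(source_az_deg))
             / math.tan(math.radians(half_aperture_deg)))
    ratio = max(-1.0, min(1.0, ratio))  # keep the source inside the pair
    g_left, g_right = 1.0 + ratio, 1.0 - ratio  # satisfies the tangent law
    norm = math.hypot(g_left, g_right)          # scale to unit power
    return g_left / norm, g_right / norm

centre_gains = tangent_law_gains(0.0, 30.0)   # source midway between the pair
left_gains = tangent_law_gains(30.0, 30.0)    # source at the left loudspeaker
```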
Accordingly, existing downmix approaches are mainly based on three strategies for the derivation of downmix coefficients. The first strategy is a direct mapping of discarded input channels to output channels at the same or comparable azimuth position. Elevation offsets are neglected. For example, it is a common practice to render height channels directly with horizontal channels at the same or comparable azimuth position, if the height layer is not present in the output channel configuration. A second strategy is the usage of generic panning algorithms, which treat the input channels as virtual sound sources and preserve azimuth information by introducing phantom sources at the position of discarded input channels. Elevation offsets are neglected. In state of the art methods panning is only used if there is no output loudspeaker available at the desired output position, for example at the desired azimuth angle. A third strategy is the incorporation of expert knowledge for the derivation of optimal downmix coefficients in empirical, artistic or psychoacoustic sense. Separate or combined application of different strategies may be used.
Embodiments of the invention provide for a technical solution allowing to improve or optimize a downmixing process such that higher quality downmix output signals can be obtained than without utilizing this solution. In embodiments, the solution may improve the downmix quality in cases where the spatial diversity inherent to the input channel configuration would be lost during downmixing without applying the proposed solution.
To this end, embodiments of the invention allow preserving the spatial diversity that is inherent to the input channel configuration and that is not preserved by a straightforward downmix (DMX) approach. In downmix scenarios, in which the number of acoustic channels is reduced, embodiments of the invention mainly aim at reducing the loss of diversity and envelopment, which implicitly occurs when mapping from a higher to a lower number of channels.
The inventors recognized that, dependent on the specific configuration, the inherent spatial diversity and the spatial envelopment of an input channel configuration is often considerably decreased or completely lost in the output channel configuration. Furthermore, if auditory events are simultaneously reproduced from several speakers in the input configuration, they get more coherent, condensed and focused in the output configuration. This may lead to a perceptually more pressing spatial impression, which often appears to be less enjoyable than the input channel configuration. Embodiments of the invention aim for an explicit preservation of spatial diversity in the output channel configuration for the first time. Embodiments of the invention aim at preserving the perceived location of an auditory event as close as possible compared to the case of using the original input channel loudspeaker configuration.
Accordingly, embodiments of the invention provide for a specific approach of mapping a first input channel and a second input channel, which are associated with different loudspeaker positions of an input channel configuration and therefore comprise a spatial diversity, to at least one output channel. In embodiments of the invention, the first and second input channels are at different elevations relative to a horizontal listener plane. Thus, elevation offsets between the first input channel and the second input channel may be taken into consideration in order to improve the sound reproduction using the loudspeakers of the output channel configuration.
In the context of this application, diversity can be described as follows. Different loudspeakers of an input channel configuration result in different acoustic channels from loudspeakers to ears, such as ears of the listener at position P. There is a number of direct acoustic paths and a number of indirect acoustic paths, also known as reflections or reverberation, which emerge from a diverse listening room excitation and which add additional decorrelation and timbre changes to the perceived signals from different loudspeaker positions. Acoustic channels can be fully modeled by BRIRs, which are characteristic for each listening room. The listening experience of an input channel configuration is strongly dependent on a characteristic combination of different input channels and diverse BRIRs, which correspond to specific loudspeaker positions. Thus, diversity and envelopment arise from diverse signal modifications, which are inherently applied to all loudspeaker signals by the listening room.
The need for downmix approaches which preserve the spatial diversity of an input channel configuration is now explained. An input channel configuration may utilize more loudspeakers than an output channel configuration or may use at least one loudspeaker not present in the output loudspeaker configuration. Merely for illustration purposes, an input channel configuration may utilize loudspeakers LC, CC, RC, ECC as shown in
In the following, embodiments of the invention are described referring to the specific scenario shown in
Input channel configuration: four loudspeakers LC, CC, RC and ECC at positions x1=(α1, β1), x2=(α2, β1), x3=(α3, β1) and x4=(α4, β2), wherein α2≈α4 or α2=α4.
Output channel configuration: three loudspeakers at position x1=(α1, β1), x2=(α2, β1) and x3=(α3, β1), i.e. the loudspeaker at position x4 is discarded in the downmix. α represents the azimuth angle and β represents the elevation angle.
As explained above, a straightforward DMX approach would prioritize the preservation of directional azimuth information and just neglect any elevation offset. Thus, signals from loudspeaker ECC at position x4 would be simply passed to loudspeaker CC at position x2. However, when doing so, several characteristics are lost. Firstly, timbre differences due to different BRIRs, which are inherently applied at the reproduction positions x2 and x4, are lost. Secondly, the spatial diversity of the input signals, which are reproduced at different positions x2 and x4, is lost. Thirdly, an inherent decorrelation of input signals due to different acoustic propagation paths from positions x2 and x4 to the listener's ears is lost.
Embodiments of the invention aim at a preservation or emulation of one or more of the described characteristics by applying the strategies explained herein separately or in combination for the downmixing process.
It is clear to those skilled in the art that the apparatuses explained and described in the present application may be implemented by means of respective computers or processors configured and/or programmed to obtain the functionality described. Alternatively, the apparatuses may be implemented as other programmed hardware structures, such as field programmable gate arrays and the like.
The first input channel 12 in
In embodiments of the invention, a decorrelation filter is configured to preserve an inherent decorrelation of input signals due to different acoustic propagation paths from the different loudspeaker positions associated with the first and second input channels to the listener's ears.
In an embodiment of the invention, an equalization filter is applied to the second input channel, i.e. the audio signal associated with the second input channel at position x4, if it is downmixed to the loudspeaker CC at the position x2. The equalization filter compensates for timbre changes of different acoustical channels and may be derived based on empirical expert knowledge and/or measured BRIR data or the like. For example, it is assumed that the input channel configuration provides a Voice of God (VoG) channel at 90° elevation. If the output channel configuration only provides loudspeakers in one layer and the VoG channel is discarded, e.g. with a 5.1 output configuration, a simple and straightforward approach is to distribute the VoG channel to all output loudspeakers to preserve the directional information of the VoG channel at least in the sweet spot. However, the original VoG loudspeaker is perceived quite differently due to a different BRIR. By applying a dedicated equalization filter to the VoG channel before the distribution to all output loudspeakers, the timbre difference can be compensated.
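The distribution of the VoG channel to all output loudspeakers may be sketched as follows. The sketch assumes that the signal has already been passed through the dedicated equalization filter, and it uses an equal gain of 1/sqrt(N) per loudspeaker so that the sum of the squared gains is one; this normalization, the number of loudspeakers and the sample values are assumptions, since the text above does not specify the gains.

```python
import math

def distribute_equally(signal, num_outputs):
    """Copy one signal to num_outputs channels with gain 1/sqrt(num_outputs)."""
    gain = 1.0 / math.sqrt(num_outputs)
    return [[gain * s for s in signal] for _ in range(num_outputs)]

# Hypothetical, already equalized VoG samples and five full-range outputs.
equalized_vog = [0.5, -0.25, 1.0]
feeds = distribute_equally(equalized_vog, num_outputs=5)
```

Whether this normalization is appropriate depends on how the identical loudspeaker signals add up at the listening position, which the sketch does not model.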
In embodiments of the invention, the equalization filter may be configured to perform a frequency-dependent weighting of the corresponding input channel to take into consideration psychoacoustic findings about directional perception of audio signals. An example of such findings is the so-called Blauert bands, which represent direction-determining bands.
In embodiments of the invention, the equalization filter is configured utilizing this recognition. In other words, the equalization filter may be configured to apply higher gain coefficients (boost) to frequency bands which are known to give a user the impression that sound comes from a specific direction, when compared to the other frequency bands. To be more specific, in case an input channel is mapped to a lower output channel, a spectral portion of the input channel in the frequency band 1200, ranging between 7 kHz and 10 kHz, may be boosted when compared to other spectral portions of the second input channel so that the listener may get the impression that the corresponding signal stems from an elevated position. Likewise, the equalization filter may be configured to boost other spectral portions of the second input channel as shown in
In embodiments of the invention, the apparatus is configured to apply a decorrelation filter to the second input channel. For example, a decorrelation/reverberation filter may be applied to the input signal associated with the second input channel (associated with the loudspeaker at position x4), if it is downmixed to a loudspeaker at the position x2. Such a decorrelation/reverberation filter may be derived from BRIR measurements or empirical knowledge about room acoustics or the like. If the input channel is mapped to multiple output channels, the filter signal may be reproduced over the multiple loudspeakers, where for each loudspeaker different filters may be applied. The filter(s) may also only model early reflections.
In embodiments of the invention, filter 32 may be a decorrelation or a reverberation filter in order to model the additional room effect perceived when two separate acoustic channels are present. Decorrelation may have the additional benefit that DMX cancellation artifacts may be reduced by this modification. In embodiments of the invention, filter 32 may be an equalization filter and may be configured to perform a timbre equalization. In other embodiments of the invention, an equalization filter and a decorrelation filter may be applied in order to apply timbre equalization and decorrelation before downmixing the signal of the elevated loudspeaker. In embodiments of the invention, filter 32 may be configured to combine both functionalities, i.e. timbre equalization and decorrelation.
In embodiments of the invention, the decorrelation filter may be implemented as a reverberation filter introducing reverberations into the second input channel. In embodiments of the invention, the decorrelation filter may be configured to convolve the second input channel with an exponentially decaying noise sequence. In embodiments of the invention, any decorrelation filter may be used that decorrelates the second input channel in order to preserve the impression for a listener that the signals from the first input channel and the second input channel stem from loudspeakers at different positions.
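The convolution with an exponentially decaying noise sequence mentioned above may be illustrated by the following minimal sketch. The function names, the sequence length, the decay constant and the unit-energy normalization are assumptions made for illustration only and are not taken from the description.

```python
import math
import random


def decaying_noise_ir(length, decay_samples, seed=0):
    """Gaussian noise shaped by an exponential envelope, scaled to unit energy.

    length, decay_samples and seed are hypothetical illustration parameters.
    """
    rng = random.Random(seed)
    ir = [rng.gauss(0.0, 1.0) * math.exp(-n / decay_samples) for n in range(length)]
    norm = math.sqrt(sum(h * h for h in ir))
    return [h / norm for h in ir]


def convolve_same_length(x, ir):
    """Direct-form convolution, truncated to the length of the input signal."""
    y = []
    for n in range(len(x)):
        taps = min(n + 1, len(ir))
        y.append(sum(ir[k] * x[n - k] for k in range(taps)))
    return y


def correlation(a, b):
    """Normalized cross-correlation of two signals at lag zero."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


# White-noise test signal standing in for the second input channel.
rng = random.Random(1)
second_channel = [rng.gauss(0.0, 1.0) for _ in range(4000)]

ir = decaying_noise_ir(length=256, decay_samples=64.0)
decorrelated = convolve_same_length(second_channel, ir)

# The filtered signal is only weakly correlated with the unfiltered one.
print(round(correlation(second_channel, decorrelated), 3))
```

A shorter or faster-decaying sequence would model only early reflections, whereas a longer one would behave more like a reverberation filter.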
When considering the scenery in
In the embodiments described with respect to
In embodiments of the invention, in addition to panning, an equalization filter may be applied to compensate for possible timbre changes due to different BRIRs.
An embodiment of an apparatus 60 implementing the panning approach is shown in
In embodiments of the invention, panning may be achieved using generic panning algorithms, such as tangent-law panning in 2D or vector base amplitude panning in 3D, see V. Pulkki: “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of the Audio Engineering Society, vol. 45, pp. 456-466, 1997; these algorithms need not be described in more detail herein. The panning gains of the applied panning law determine the gains that are applied when mapping the input channels to the output channels. The respective signals obtained are added to the second and third output channels 42 and 44, see adder blocks 64 in
In alternative embodiments, block 62 may be modified in order to additionally provide for the functionality of an equalization filter in addition to the panning functionality. Thus, possible timbre changes due to different BRIRs can be compensated for in addition to preserving spatial diversity by the panning approach.
As shown in
Some of the rules 400 may be designed so that the signal processing unit 420 implements an embodiment of the invention. Exemplary rules for mapping an input channel to one or more output channels are given in Table 1.
The labels used in Table 1 for the respective channels are to be interpreted as follows: Characters “CH” stand for “Channel”. Character “M” stands for “horizontal listener plane”, i.e. an elevation angle of 0°. This is the plane in which loudspeakers are located in a normal 2D setup such as stereo or 5.1. Character “L” stands for a lower plane, i.e. an elevation angle < 0°. Character “U” stands for a higher plane, i.e. an elevation angle > 0°, such as 30° for an upper loudspeaker in a 3D setup. Character “T” stands for top channel, i.e. an elevation angle of 90°, which is also known as “voice of god” channel. Located after one of the labels M/L/U/T is a label for left (L) or right (R) followed by the azimuth angle. For example, CH_M_L030 and CH_M_R030 represent the left and right channels of a conventional stereo setup. The azimuth angle and the elevation angle for each channel are indicated in Table 1, except for the LFE channels and the last empty channel.
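The labeling scheme described above may be illustrated by a small parser. This is a sketch only: the function name, the returned dictionary and the sign convention (left azimuths positive, right azimuths negative) are assumptions, and labels without a side letter, such as CH_U_000, are treated as lying at the stated azimuth.

```python
import re

# Plane letters as explained in the text.
PLANES = {
    "M": "horizontal listener plane (elevation 0 deg)",
    "L": "lower plane (elevation < 0 deg)",
    "U": "upper plane (elevation > 0 deg)",
    "T": "top channel (elevation 90 deg)",
}

# CH_<plane>_<optional L/R><three-digit azimuth>
LABEL = re.compile(r"^CH_([MLUT])_([LR]?)(\d{3})$")


def parse_channel_label(label):
    """Split a channel label into plane, side and signed azimuth."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError("label does not follow the CH_<plane>_<side><azimuth> scheme: " + label)
    plane, side, digits = match.groups()
    azimuth = int(digits)
    if side == "R":
        azimuth = -azimuth  # assumed convention: right-hand azimuths are negative
    return {
        "plane": plane,
        "plane_description": PLANES[plane],
        "side": side or None,
        "azimuth_deg": azimuth,
    }


for name in ("CH_M_L030", "CH_M_R030", "CH_U_000"):
    print(name, parse_channel_label(name))
```

The exact elevation angle of a channel cannot be derived from the label alone and would have to be looked up in Table 1.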
Table 1 shows a rules matrix in which one or more rules are associated with each input channel (source channel). As can be seen from Table 1, each rule defines one or more output channels (destination channels), which the input channel is to be mapped to. In addition, each rule defines a gain value G in the third column thereof. Each rule further defines an EQ index indicating whether an equalization filter is to be applied or not and, if so, which specific equalization filter (EQ index 1 to 4) is to be applied. Mapping of the input channel to one output channel is performed with the gain G given in column 3 of Table 1. Mapping of the input channel to two output channels (indicated in the second column) is performed by applying panning between the two output channels, wherein panning gains g1 and g2 resulting from applying the panning law are additionally multiplied by the gain given by the respective rule (column three in Table 1). Special rules apply for the top channel. According to a first rule, the top channel is mapped to all output channels of the upper plane, indicated by ALL_U, and according to a second (less prioritized) rule, the top channel is mapped to all output channels of the horizontal listener plane, indicated by ALL_M.
When considering the rules indicated in Table 1, the rules defining mapping of channel CH_U_000 to left and right channels represent an implementation of an embodiment of the invention. In addition, the rules defining that equalization is to be applied represent implementations of embodiments of the invention.
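The prioritized use of such rules may be sketched as follows. The sketch assumes that the rules of a source channel are tried in order and that the first rule whose destination channels exist in the output configuration is selected. The label CH_T_000 for the top channel, the output channel labels, the gain values and the EQ indices are hypothetical placeholders and are not the entries of Table 1.

```python
# Hypothetical rules: destinations follow the text, gains and EQ indices are placeholders.
# An EQ index of 0 is assumed here to mean "no equalization filter".
RULES = {
    "CH_U_000": [
        {"dest": ["CH_M_L030", "CH_M_R030"], "gain": 1.0, "eq": 1},
    ],
    "CH_T_000": [
        {"dest": ["ALL_U"], "gain": 1.0, "eq": 0},  # first rule: upper plane
        {"dest": ["ALL_M"], "gain": 1.0, "eq": 1},  # second, less prioritized rule
    ],
}


def expand(dest, output_channels):
    """Resolve ALL_U / ALL_M and check that explicit destinations exist."""
    expanded = []
    for d in dest:
        if d in ("ALL_U", "ALL_M"):
            prefix = "CH_" + d[-1] + "_"
            expanded.extend(c for c in output_channels if c.startswith(prefix))
        elif d in output_channels:
            expanded.append(d)
        else:
            return []  # a required destination channel is missing
    return expanded


def select_rule(source, output_channels):
    """Return (destinations, gain, eq index) of the first applicable rule, or None."""
    for rule in RULES[source]:
        dest = expand(rule["dest"], output_channels)
        if dest:
            return dest, rule["gain"], rule["eq"]
    return None


# Illustrative single-layer output configuration (labels follow the naming scheme).
five_one = ["CH_M_L030", "CH_M_R030", "CH_M_000", "CH_M_L110", "CH_M_R110"]
print(select_rule("CH_T_000", five_one))
print(select_rule("CH_U_000", five_one))
```

With an output configuration that contains no channels in the upper plane, the first rule for the top channel yields no destinations, so the second rule distributing the channel over the horizontal listener plane is selected.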
As can be seen from Table 1, one of equalizer filters 1 to 4 is applied if an elevated input channel is mapped to one or more lower channels. Equalizer gain values GEQ may be determined as follows based on normalized center frequencies given in Table 2 and based on parameters given in Table 3.
GEQ consists of gain values per frequency band k and equalizer index e. Five predefined equalizers are combinations of different peak filters. As can be seen from Table 3, equalizers GEQ,1, GEQ,2 and GEQ,5 include a single peak filter, equalizer GEQ,3 includes three peak filters and equalizer GEQ,4 includes two peak filters. Each equalizer is a serial cascade of one or more peak filters and a gain:
where band(k) is the normalized center frequency of frequency band k, specified in Table 2, fs is the sampling frequency, and function peak( ) is for negative G
and otherwise
The parameters for the equalizers are specified in Table 3. In the above Equations 1 and 2, b is given by band(k)·fs/2, Q is given by PQ for the respective peak filter (1 to n), G is given by Pg for the respective peak filter, and f is given by Pf for the respective peak filter.
As an example, the equalizer gain values GEQ,4 for the equalizer having the index 4 are calculated with the filter parameters taken from the corresponding row of Table 3. Table 3 lists two parameter sets for peak filters for GEQ,4, i.e. sets of parameters for n=1 and n=2. The parameters are the peak frequency Pf in Hz, the peak filter quality factor PQ, the gain Pg (in dB) that is applied at the peak frequency, and an overall gain g in dB that is applied to the cascade of the two peak filters (cascade of filters for parameters n=1 and n=2).
Thus
The equalizer definition as stated above defines zero-phase gains GEQ,4 independently for each frequency band k. Each band k is specified by its normalized center frequency band(k), where 0≤band(k)≤1. Note that the normalized frequency band(k)=1 corresponds to the unnormalized frequency fs/2, where fs denotes the sampling frequency. Therefore, band(k)·fs/2 denotes the unnormalized center frequency of band k in Hz.
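The calculation of zero-phase equalizer gains per band, as a cascade of peak filters followed by an overall gain, may be sketched as follows. The closed-form magnitude used for peak( ) is one common expression for a second-order peak filter and is an assumption of this sketch; it is not asserted to be identical to Equations 1 and 2. The parameter values are hypothetical and are not the entries of Table 3.

```python
import math


def peak(b, f, Q, G):
    """Assumed zero-phase magnitude of a peak filter.

    b: frequency at which the gain is evaluated (Hz)
    f: peak frequency (Hz), Q: quality factor, G: gain at the peak (dB)
    """
    if G < 0:
        num = b**4 + (1.0 / Q**2 - 2.0) * f**2 * b**2 + f**4
        den = b**4 + (10.0 ** (-G / 10.0) / Q**2 - 2.0) * f**2 * b**2 + f**4
    else:
        num = b**4 + (10.0 ** (G / 10.0) / Q**2 - 2.0) * f**2 * b**2 + f**4
        den = b**4 + (1.0 / Q**2 - 2.0) * f**2 * b**2 + f**4
    return math.sqrt(num / den)


def equalizer_gain(band_k, fs, peak_params, g_db):
    """Linear gain for one band: overall gain times the cascade of peak filters.

    band_k: normalized center frequency (0..1), so band_k * fs / 2 is in Hz
    peak_params: list of (Pf, PQ, Pg) tuples, one per peak filter
    g_db: overall gain in dB
    """
    b = band_k * fs / 2.0
    gain = 10.0 ** (g_db / 20.0)
    for Pf, PQ, Pg in peak_params:
        gain *= peak(b, Pf, PQ, Pg)
    return gain


fs = 48000.0
# Hypothetical two-filter equalizer: (Pf in Hz, PQ, Pg in dB), overall gain in dB.
params = [(5000.0, 1.0, 4.5), (1100.0, 0.8, 1.8)]
overall_gain_db = -3.1

# Evaluate the equalizer at a few normalized band center frequencies.
for band in (0.05, 0.2, 0.5):
    print(band, equalizer_gain(band, fs, params, overall_gain_db))
```

With this assumed form, a single peak filter evaluated exactly at its peak frequency yields a linear gain of 10^(G/20), i.e. G dB, and approaches unity far away from the peak frequency.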
Thus, different equalizer filters that may be used in embodiments of the invention have been described. It is, however, clear that the description of these equalization filters is for illustrative purposes and that other equalization filters or decorrelation filters may be used in other embodiments.
Table 4 shows exemplary channels having associated therewith a respective azimuth angle and elevation angle.
In embodiments of the invention, panning between two destination channels may be achieved by applying tangent law amplitude panning. In panning a source channel to a first and second destination channel, a gain coefficient G1 is calculated for the first destination channel and a gain coefficient G2 is calculated for the second destination channel:
G1=(value of Gain column in Table 1)*g1, and
G2=(value of Gain column in Table 1)*g2.
Gains g1 and g2 are computed by applying tangent law amplitude panning in the following way:
In other embodiments, different panning laws may be applied.
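The tangent-law amplitude panning referred to above may be sketched as follows. The sketch uses the generic tangent law, tan(φ)/tan(φ0) = (g1−g2)/(g1+g2), with the gains normalized to unit energy; the angle handling (azimuths in degrees, source located between the two destination channels) and the function names are assumptions and do not reproduce the individual computation steps of the description.

```python
import math


def tangent_law_gains(az_src, az1, az2):
    """Panning gains (g1, g2) for a source between two destination channels.

    All azimuth angles are given in degrees. g1 belongs to the channel at az1.
    """
    center = 0.5 * (az1 + az2)
    half_aperture = 0.5 * abs(az1 - az2)
    if not 0.0 < half_aperture < 90.0:
        raise ValueError("destination channels must be less than 180 degrees apart")
    # Orient the angle so that positive values point towards channel 1.
    phi = (az_src - center) * (1.0 if az1 > az2 else -1.0)
    if abs(phi) > half_aperture:
        raise ValueError("source azimuth lies outside the destination pair")
    ratio = math.tan(math.radians(phi)) / math.tan(math.radians(half_aperture))
    g1, g2 = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g1, g2)  # normalize so that g1^2 + g2^2 = 1
    return g1 / norm, g2 / norm


def destination_gains(rule_gain, az_src, az1, az2):
    """Gain coefficients G1 and G2: rule gain multiplied by the panning gains."""
    g1, g2 = tangent_law_gains(az_src, az1, az2)
    return rule_gain * g1, rule_gain * g2


# Source at 0 degrees panned between channels at +30 and -30 degrees.
print(destination_gains(1.0, 0.0, 30.0, -30.0))
```

A source located exactly at the azimuth of one destination channel is reproduced by that channel only, and a source midway between both channels receives equal gains.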
In principle, embodiments of the invention aim at modeling a higher number of acoustic channels in the input channel configuration by means of changed channel mappings and signal modifications in the output channel configuration. Compared to straightforward approaches, which are often reported to be spatially more pressing, less diverse and less enveloping than the input channel configuration, the spatial diversity and the overall listening experience may be improved and made more enjoyable by employing embodiments of the invention.
In other words, in embodiments of the invention two or more input channels are mixed together in a downmixing application, wherein a processing module is applied to one of the input signals to preserve the different characteristics of the different transmission paths from the original input channels to the listener's ears. In embodiments of the invention, the processing module may involve filters that modify the signal characteristics, e.g. equalizing filters or decorrelation filters. Equalizing filters may in particular compensate for the loss of different timbres of input channels with different elevation assigned to them. In embodiments of the invention, the processing module may route at least one of the input signals to multiple output loudspeakers to generate a different transmission path to the listener, thus preserving spatial diversity of the input channels. In embodiments of the invention, filter and routing modifications may be applied separately or in combination. In embodiments of the invention, the processing module output may be reproduced over one or multiple loudspeakers.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, for example a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus. In embodiments of the invention, the methods described herein are processor-implemented or computer-implemented.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, programmed to, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
13177360 | Jul 2013 | EP | regional |
13189243 | Oct 2013 | EP | regional |
This application is a continuation of copending U.S. patent application Ser. No. 16/178,228 filed Nov. 1, 2018, which is a continuation of U.S. patent application Ser. No. 15/002,094, filed Jan. 20, 2016 (U.S. Pat. No. 10,154,362 issued Dec. 11, 2018), which in turn is a continuation of copending International Application No. PCT/EP2014/065153, filed Jul. 15, 2014, which are both incorporated herein by reference in their entirety, and additionally claims priority from European Application No. 13177360.8, filed Jul. 22, 2013, and from European Application No. 13189243.2, filed Oct. 18, 2013, which are also incorporated herein by reference in their entirety.
Entry |
---|
“Universal Mobile Telecommunications System (UMTS); Mandatory Speech Codec speech processing functions AMR Wideband speech codec; Transcoding functions”, ETSI TS 126 190 V5.1.0 (3GPP TS 26.190 version 5.1.0 Release 5), Dec. 2001, 55 pp. |
Ando, Akio, “Conversion of Multichannel Sound Signal Maintaining Physical Properties of Sound in Reproduced Sound Field”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, No. 6, pp. 1467-1474. |
Blauert, Jens, “Ein neuartiges Präsenzfilter”, Fernseh- und Kinotechnik, Nr. 3. Retrieved from the Internet: URL:http://www.sengpielaudio.com/Blauert-Filter.pdf, pp. 75-78. |
Pulkki, Ville, “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of Audio Eng. Soc. vol. 45, No. 6., pp. 456-466. |
Number | Date | Country | |
---|---|---|---|
20200396557 A1 | Dec 2020 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 16178228 | Nov 2018 | US |
Child | 16912228 | US | |
Parent | 15002094 | Jan 2016 | US |
Child | 16178228 | US | |
Parent | PCT/EP2014/065153 | Jul 2014 | US |
Child | 15002094 | US |