The present invention is related to audio encoding/decoding and, in particular, to spatial audio coding and spatial audio object coding.
Spatial audio coding tools are well-known in the art and are, for example, standardized in the MPEG Surround standard. Spatial audio coding starts from original input channels such as five or seven channels which are identified by their placement in a reproduction setup, i.e., a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low frequency enhancement channel. A spatial audio encoder typically derives one or more downmix channels from the original channels and, additionally, derives parametric data relating to spatial cues such as interchannel level differences, interchannel coherence values, interchannel phase differences, interchannel time differences, etc. The one or more downmix channels are transmitted, together with the parametric side information indicating the spatial cues, to a spatial audio decoder which decodes the downmix channels and the associated parametric data in order to finally obtain output channels which are an approximated version of the original input channels. The placement of the channels in the output setup is typically fixed and is, for example, a 5.1 format, a 7.1 format, etc.
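To make the nature of such spatial cues concrete, the following Python sketch derives a passive mono downmix and per-band interchannel level differences from one frame of a stereo pair. This is a minimal sketch only: a real spatial audio encoder operates in a filterbank (e.g., QMF) domain and quantizes and entropy codes the cues, and the function name, the FFT-based band split and the band count are illustrative assumptions, not part of any standard.

```python
import numpy as np

def downmix_and_ild(left, right, n_bands=32, eps=1e-12):
    """Illustrative mono downmix plus per-band interchannel level
    differences (ILDs) for one frame of a stereo channel pair."""
    # Frequency analysis of both channels (a real codec uses a QMF bank).
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    downmix = 0.5 * (left + right)                # passive mono downmix

    # Split the spectrum into coarse parameter bands.
    edges = np.linspace(0, len(L), n_bands + 1, dtype=int)
    ild_db = np.empty(n_bands)
    for b in range(n_bands):
        band = slice(edges[b], edges[b + 1])
        p_l = np.sum(np.abs(L[band]) ** 2) + eps  # band power, left
        p_r = np.sum(np.abs(R[band]) ** 2) + eps  # band power, right
        ild_db[b] = 10.0 * np.log10(p_l / p_r)    # level difference in dB
    return downmix, ild_db
```

The decoder would invert this relation, distributing the downmix onto the output channels such that the signaled per-band level differences are restored.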
Additionally, spatial audio object coding tools are well-known in the art and are standardized in the MPEG SAOC standard (SAOC=spatial audio object coding). In contrast to spatial audio coding, which starts from original channels, spatial audio object coding starts from audio objects which are not automatically dedicated to a certain rendering reproduction setup. Instead, the placement of the audio objects in the reproduction scene is flexible and can be determined by the user by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information, i.e., information on the position in the reproduction setup at which a certain audio object is to be placed, typically varying over time, can be transmitted as additional side information or metadata. In order to obtain a certain data compression, a number of audio objects are encoded by an SAOC encoder which calculates, from the input objects, one or more transport channels by downmixing the objects in accordance with certain downmixing information. Furthermore, the SAOC encoder calculates parametric side information representing inter-object cues such as object level differences (OLD), object coherence values, etc. As in SAC (SAC=Spatial Audio Coding), the inter-object parametric data is calculated for individual time/frequency tiles, i.e., for a certain frame of the audio signal comprising, for example, 1024 or 2048 samples, 24, 32 or 64, etc., frequency bands are considered so that, in the end, parametric data exists for each frame and each frequency band. As an example, when an audio piece has 20 frames and when each frame is subdivided into 32 frequency bands, then the number of time/frequency tiles is 640.
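The per-tile inter-object cues can be sketched in the same spirit. The hedged example below computes object level differences for one frame as band powers normalized to the strongest object in each band; windowing, the actual filterbank and quantization are omitted, and the band partition is an assumption for illustration.

```python
import numpy as np

def object_level_differences(objects, n_bands=32, eps=1e-12):
    """Object level differences (OLDs) for one frame: per-band object
    powers normalized to the strongest object in each band.

    objects: (n_objects, frame_len) array of time samples.
    Returns an (n_objects, n_bands) array of OLDs in (0, 1].
    """
    spectra = np.fft.rfft(objects, axis=1)
    edges = np.linspace(0, spectra.shape[1], n_bands + 1, dtype=int)
    power = np.stack(
        [np.sum(np.abs(spectra[:, edges[b]:edges[b + 1]]) ** 2, axis=1)
         for b in range(n_bands)], axis=1) + eps
    return power / power.max(axis=0, keepdims=True)

# With 20 frames and 32 bands, this yields 20 x 32 = 640 parameter
# tiles, one OLD set per tile, matching the example above.
```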
Up to now, no flexible technology exists that combines channel coding on the one hand and object coding on the other hand such that acceptable audio quality is obtained at low bit rates.
According to an embodiment, an audio decoder for decoding encoded audio data may have: an input interface configured for receiving the encoded audio data, the encoded audio data having either a plurality of encoded audio channels and a plurality of encoded audio objects and compressed metadata related to the plurality of audio objects, or a plurality of encoded audio channels without any encoded audio objects; a core decoder configured for decoding the plurality of encoded audio channels received by the input interface and the plurality of encoded audio objects received by the input interface to obtain a plurality of decoded audio channels and a plurality of decoded audio objects, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, or decoding the plurality of encoded audio channels received by the input interface to obtain a plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels without any encoded audio objects; a metadata decompressor configured for decompressing the compressed metadata to obtain decompressed metadata, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects; an object processor configured for processing the plurality of decoded audio objects using the decompressed metadata and the plurality of decoded audio channels to obtain a number of output audio channels having audio data from the plurality of decoded audio objects and the plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects; and a post-processor configured for converting the number of output audio channels into an output format, wherein the audio decoder is configured to either bypass the object processor and to feed the plurality of decoded audio channels as the output audio channels into the post-processor, when the encoded audio data has the plurality of encoded audio channels without any audio objects, or to feed the plurality of decoded audio objects and the plurality of decoded audio channels into the object processor, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects.
According to another embodiment, a method of decoding encoded audio data may have the steps of: receiving the encoded audio data, the encoded audio data having either a plurality of encoded audio channels and a plurality of encoded audio objects and compressed metadata related to the plurality of audio objects, or a plurality of encoded audio channels without any encoded audio objects; core decoding the encoded audio data to obtain a plurality of decoded audio channels and a plurality of decoded audio objects, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, or the plurality of encoded audio channels to obtain a plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels without any encoded audio objects; decompressing the compressed metadata to obtain decompressed metadata, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, processing the plurality of decoded audio objects using the decompressed metadata, and the plurality of decoded audio channels to obtain a number of output audio channels having audio data from the plurality of decoded audio objects and the plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects; and converting the number of output audio channels into an output format, wherein, in the method of decoding the encoded audio data, either the processing the plurality of decoded audio objects is bypassed and the plurality of decoded audio channels obtained by the core decoding is fed, as the output audio channels, into the converting, when the encoded audio data has the plurality of encoded audio channels without any audio objects, or the plurality of decoded audio objects and the plurality of decoded audio channels obtained by the core decoding are fed into processing the plurality of decoded audio objects, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects.
Still another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method of decoding encoded audio data having the steps of: receiving the encoded audio data, the encoded audio data having either a plurality of encoded audio channels and a plurality of encoded audio objects and compressed metadata related to the plurality of audio objects, or a plurality of encoded audio channels without any encoded audio objects; core decoding the encoded audio data to obtain a plurality of decoded audio channels and a plurality of decoded audio objects, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, or the plurality of encoded audio channels to obtain a plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels without any encoded audio objects; decompressing the compressed metadata to obtain decompressed metadata, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, processing the plurality of decoded audio objects using the decompressed metadata, and the plurality of decoded audio channels to obtain a number of output audio channels having audio data from the plurality of decoded audio objects and the plurality of decoded audio channels, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects; and converting the number of output audio channels into an output format, wherein, in the method of decoding the encoded audio data, either the processing the plurality of decoded audio objects is bypassed and the plurality of decoded audio channels obtained by the core decoding is fed, as the output audio channels, into the converting, when the encoded audio data has the plurality of encoded audio channels without any audio objects, or the plurality of decoded audio objects and the plurality of decoded audio channels obtained by the core decoding are fed into processing the plurality of decoded audio objects, when the encoded audio data has the plurality of encoded audio channels and the plurality of encoded audio objects and the compressed metadata related to the plurality of encoded audio objects, when said computer program is run by a computer.
The present invention is based on the finding that an optimum system, which is flexible on the one hand and provides good compression efficiency at good audio quality on the other hand, is achieved by combining spatial audio coding, i.e., channel-based audio coding, with spatial audio object coding, i.e., object-based coding. In particular, providing a mixer for mixing the objects and the channels already on the encoder side provides good flexibility, particularly for low bit rate applications, since any object transmission can then be unnecessary, or the number of objects to be transmitted can be reduced. On the other hand, flexibility may be useful so that the audio encoder can be controlled in two different modes, i.e., one mode in which the objects are mixed with the channels before being core-encoded, and another mode in which the object data on the one hand and the channel data on the other hand are directly core-encoded without any mixing in between.
This ensures that the user can either keep the processed objects and channels separate on the encoder side, so that full flexibility is available on the decoder side, albeit at the price of an increased bit rate. On the other hand, when bit rate requirements are more stringent, the present invention allows performing a mixing/pre-rendering already on the encoder side, i.e., some or all audio objects are already mixed with the channels so that the core encoder only encodes channel data, and any bits that would otherwise be used for transmitting audio object data, either in the form of a downmix or in the form of parametric inter-object data, are not required.
On the decoder side, the user again has high flexibility due to the fact that the same audio decoder allows operation in two different modes, i.e., a first mode in which individual or separate channel and object coding takes place and the decoder has full flexibility of rendering the objects and mixing them with the channel data. On the other hand, when a mixing/pre-rendering has already taken place on the encoder side, the decoder is configured to perform a post-processing without any intermediate object processing. On the other hand, the post-processing can also be applied to the data in the other mode, i.e., when the object rendering/mixing takes place on the decoder side. Thus, the present invention provides a framework of processing tasks which allows a great re-use of resources not only on the encoder side but also on the decoder side. The post-processing may refer to downmixing and binauralizing or any other processing to obtain a final channel scenario such as an intended reproduction layout.
Furthermore, in the case of very low bit rate requirements, the present invention provides the user with enough flexibility to react to those requirements, i.e., by pre-rendering on the encoder side. Then, at the price of some flexibility, very good audio quality is nevertheless obtained on the decoder side, since the bits saved by no longer providing any object data from the encoder to the decoder can be used for better encoding the channel data, such as by quantizing the channel data more finely, or by other means for improving the quality or for reducing the encoding loss when enough bits are available.
In an embodiment of the present invention, the encoder additionally comprises an SAOC encoder and furthermore not only allows encoding objects input into the encoder but also allows SAOC encoding channel data in order to obtain a good audio quality at even lower bit rates. Further embodiments of the present invention provide a post-processing functionality which comprises a binaural renderer and/or a format converter. Furthermore, it is advantageous that the whole processing on the decoder side first takes place for a certain high number of loudspeakers, such as a 22- or 32-channel loudspeaker setup. However, when the format converter, for example, determines that only a 5.1 output is required, i.e., an output for a reproduction layout having a lower number of channels than the maximum number, then it is advantageous that the format converter controls either the USAC decoder or the SAOC decoder or both devices to restrict the core decoding operation and the SAOC decoding operation, so that any channels which would, in the end, nevertheless be downmixed in the format conversion are not generated in the decoding. Typically, the generation of upmixed channels uses decorrelation processing, and each decorrelation processing introduces some level of artifacts. Therefore, by controlling the core decoder and/or the SAOC decoder by the output format that is finally required, a great deal of decorrelation processing is saved compared to a situation in which this interaction does not exist, which not only results in an improved audio quality but also in a reduced complexity of the decoder and, in the end, in a reduced power consumption, which is particularly useful for mobile devices housing the inventive encoder or the inventive decoder. The inventive encoders/decoders, however, cannot only be used in mobile devices such as mobile phones, smartphones, notebook computers or navigation devices but can also be used in straightforward desktop computers or any other non-mobile appliances.
The above implementation, i.e., not generating some channels, may not be optimum, since some information may be lost (such as the level difference between the channels that will be downmixed). This level difference information may not be critical, but it may result in a different downmix output signal if the downmix applies different downmix gains to the upmixed channels. An improved solution therefore only switches off the decorrelation in the upmix, but still generates all upmix channels with the correct level differences (as signaled by the parametric SAC data). The second solution results in a better audio quality, but the first solution results in a greater complexity reduction.
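The trade-off between the two solutions can be illustrated with a strongly simplified one-to-two (OTT) upmix sketch. Under the first solution this call would be skipped entirely; under the improved solution it is run with decorrelate=False, so the signaled level difference is preserved while the decorrelator artifacts are avoided. The decorrelator here is a random-FIR stand-in, not the standardized one, and all parameters are illustrative.

```python
import numpy as np

def upmix_one_to_two(mono, level_diff_db, decorrelate=True, seed=0):
    """Simplified OTT upmix of one downmix channel into two channels
    with a signaled level difference (in dB)."""
    r = 10.0 ** (level_diff_db / 20.0)       # amplitude ratio left/right
    g_l = r / np.sqrt(1.0 + r * r)           # energy-preserving gains
    g_r = 1.0 / np.sqrt(1.0 + r * r)
    left, right = g_l * mono, g_r * mono
    if decorrelate:
        # Stand-in decorrelator: a short normalized random FIR whose
        # output is added/subtracted to widen the channel pair.
        rng = np.random.default_rng(seed)
        fir = rng.standard_normal(64)
        fir /= np.linalg.norm(fir)
        diff = np.convolve(mono, fir)[: len(mono)]
        left, right = left + 0.1 * diff, right - 0.1 * diff
    return left, right
```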
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Furthermore, the encoder comprises a core encoder 300 for core encoding core encoder input data and a metadata compressor 400 for compressing the metadata related to the one or more of the plurality of audio objects. Furthermore, the encoder can comprise a mode controller 600 for controlling the mixer, the core encoder and/or an output interface 500 in one of several operation modes, wherein, in a first mode, the core encoder is configured to encode the plurality of audio channels and the plurality of audio objects received by the input interface 100 without any interaction by the mixer, i.e., without any mixing by the mixer 200. In a second mode, however, in which the mixer 200 was active, the core encoder encodes the plurality of mixed channels, i.e., the output generated by block 200. In this latter case, it is advantageous not to encode any object data anymore. Instead, the metadata indicating positions of the audio objects is already used by the mixer 200 to render the objects onto the channels as indicated by the metadata. In other words, the mixer 200 uses the metadata related to the plurality of audio objects to pre-render the audio objects, and then the pre-rendered audio objects are mixed with the channels to obtain mixed channels at the output of the mixer. In this embodiment, no objects are necessarily transmitted, and this also applies to the compressed metadata as output by block 400. However, if not all objects input into the interface 100 are mixed but only a certain number of objects is mixed, then the remaining non-mixed objects and the associated metadata are nevertheless transmitted to the core encoder 300 or the metadata compressor 400, respectively.
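The pre-rendering performed by the mixer 200 can be pictured as follows. This minimal sketch assumes that the positional metadata has already been turned into a per-channel gain vector for each object (e.g., by a panning law); the gain derivation itself and any timing/interpolation of the metadata are outside the sketch.

```python
import numpy as np

def prerender_objects(channels, objects, gains):
    """Mix pre-rendered objects into the channel bed.

    channels: (n_ch, n_samples) channel signals
    objects:  (n_obj, n_samples) object waveforms
    gains:    (n_obj, n_ch) rendering gains derived from the metadata
    Returns the (n_ch, n_samples) mixed channels.
    """
    mixed = channels.copy()
    for obj, g in zip(objects, gains):
        mixed += g[:, None] * obj[None, :]  # object contribution per channel
    return mixed
```

After this step only channel signals remain, which is exactly why, in the second mode, neither object waveforms nor object metadata need to be transmitted.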
Furthermore, as illustrated in
The
In addition to the first and the second modes as discussed in the context of
Finally, the SAOC encoder 800 can encode, when the encoder is configured in the fourth mode, the channels plus pre-rendered objects as generated by the pre-renderer/mixer. Thus, in the fourth mode the lowest bit rate applications will provide good quality due to the fact that the channels and objects have completely been transformed into individual SAOC transport channels and associated side information as indicated in
The decoder comprises a metadata decompressor 1400, a core decoder 1300, an object processor 1200, a mode controller 1600 and a postprocessor 1700.
Specifically, the audio decoder is configured for decoding encoded audio data, and the input interface is configured for receiving the encoded audio data, the encoded audio data comprising, in a certain mode, a plurality of encoded channels, a plurality of encoded objects and compressed metadata related to the plurality of objects.
Furthermore, the core decoder 1300 is configured for decoding the plurality of encoded channels and the plurality of encoded objects and, additionally, the metadata decompressor is configured for decompressing the compressed metadata.
Furthermore, the object processor 1200 is configured for processing the plurality of decoded objects as generated by the core decoder 1300 using the decompressed metadata to obtain a predetermined number of output channels comprising object data and the decoded channels. These output channels as indicated at 1205 are then input into a postprocessor 1700. The postprocessor 1700 is configured for converting the number of output channels 1205 into a certain output format which can be a binaural output format or a loudspeaker output format such as a 5.1, 7.1, etc., output format.
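The overall decoder flow, including the bypass of the object processor discussed below, can be summarized in a short dispatch sketch. The container fields and function arguments are illustrative stand-ins, not the actual 3D-Audio bitstream syntax, and core decoding and metadata decompression are elided.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DecodedStream:
    """Illustrative stand-in for the core decoder output."""
    channels: np.ndarray                   # decoded channel bed
    objects: Optional[np.ndarray] = None   # None => pre-rendered stream
    oam: Optional[np.ndarray] = None       # decompressed object metadata

def decode_to_output(stream: DecodedStream,
                     object_processor: Callable,
                     postprocess: Callable) -> np.ndarray:
    """Feed channels directly to the post-processor when no objects
    are present (mode 2); otherwise run the object processor (mode 1)."""
    if stream.objects is None:                 # mode indication: mode 2
        return postprocess(stream.channels)    # object processor bypassed
    mixed = object_processor(stream.objects, stream.oam, stream.channels)
    return postprocess(mixed)                  # mode 1
```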
Advantageously, the decoder comprises a mode controller 1600 which is configured for analyzing the encoded data to detect a mode indication. To this end, the mode controller 1600 is connected to the input interface 1100 in
Advantageously, the indication whether mode 1 or mode 2 is to be applied is included in the encoded audio data, and the mode controller 1600 then analyzes the encoded data to detect a mode indication. Mode 1 is used when the mode indication indicates that the encoded audio data comprises encoded channels and encoded objects, and mode 2 is applied when the mode indication indicates that the encoded audio data does not contain any audio objects, i.e., only contains pre-rendered channels obtained by mode 2 of the
Furthermore, the postprocessor 1700 can be implemented as a binaural renderer 1710 or a format converter 1720. Alternatively, a direct output of data 1205 of
In an embodiment of the present invention, the object processor 1200 comprises the SAOC decoder 1800, and the SAOC decoder is configured for decoding one or more transport channels output by the core decoder and associated parametric data, using decompressed metadata, to obtain the plurality of rendered audio objects. To this end, the OAM output is connected to box 1800.
Furthermore, the object processor 1200 is configured to render decoded objects output by the core decoder which are not encoded in SAOC transport channels but which are individually encoded, typically in single channel elements, as indicated by the object renderer 1210.
Furthermore, the decoder comprises an output interface corresponding to the output 1730 for outputting an output of the mixer to the loudspeakers.
In a further embodiment, the object processor 1200 comprises a spatial audio object coding decoder 1800 for decoding one or more transport channels and associated parametric side information representing encoded audio objects or encoded audio channels, wherein the spatial audio object coding decoder is configured to transcode the associated parametric information and the decompressed metadata into transcoded parametric side information usable for directly rendering the output format, as for example defined in an earlier version of SAOC. The postprocessor 1700 is configured for calculating audio channels of the output format using the decoded transport channels and the transcoded parametric side information. The processing performed by the postprocessor can be similar to the MPEG Surround processing or can be any other processing such as BCC processing.
In a further embodiment, the object processor 1200 comprises a spatial audio object coding decoder 1800 configured to directly upmix and render channel signals for the output format using the decoded (by the core decoder) transport channels and the parametric side information.
Furthermore, and importantly, the object processor 1200 of
The mixer 1220 is connected to the output interface 1730, the binaural renderer 1710 and the format converter 1720. The binaural renderer 1710 is configured for rendering the output channels into two binaural channels using head related transfer functions or binaural room impulse responses (BRIRs). The format converter 1720 is configured for converting the output channels into an output format having a lower number of channels than the output channels 1205 of the mixer, and the format converter 1720 may use information on the reproduction layout, such as a 5.1 speaker setup.
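A hedged sketch of the binaural renderer 1710 is given below: each loudspeaker channel is convolved with a measured left/right impulse response (HRIR or BRIR) and the results are summed into the two ear signals. The actual renderer works frame-wise in the QMF domain with fast convolution, as described later; plain time-domain convolution is used here only for clarity.

```python
import numpy as np

def binauralize(channels, hrirs):
    """Binaural downmix of a multichannel signal.

    channels: (n_ch, n_samples) loudspeaker channel signals
    hrirs:    (n_ch, 2, ir_len) measured impulse responses per
              channel and ear (left = 0, right = 1)
    Returns a (2, n_samples + ir_len - 1) binaural signal.
    """
    n_ch, n_samples = channels.shape
    out = np.zeros((2, n_samples + hrirs.shape[2] - 1))
    for ch in range(n_ch):
        for ear in range(2):
            out[ear] += np.convolve(channels[ch], hrirs[ch, ear])
    return out
```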
The
Furthermore, a vector base amplitude panning (VBAP) stage 1810 is provided which receives, from the SAOC decoder, information on the reproduction layout and which outputs a rendering matrix to the SAOC decoder so that the SAOC decoder can, in the end, provide rendered channels in the high channel format of 1205, i.e., 32 loudspeakers, without any further operation of the mixer.
The VBAP block advantageously receives the decoded OAM data to derive the rendering matrices. More generally, it may use geometric information not only on the reproduction layout but also on the positions at which the input signals should be rendered on the reproduction layout. This geometric input data can be OAM data for objects or channel position information for channels that have been transmitted using SAOC.
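For one source direction and one loudspeaker triplet, the VBAP gain computation (Pulkki) reduces to solving a small linear system, as in the following sketch; triangulation of the full layout and spreading are omitted, and the example triplet is an assumption.

```python
import numpy as np

def vbap_gains(source_dir, speaker_triplet):
    """Vector base amplitude panning over one loudspeaker triplet.

    source_dir:      3-D direction vector of the source
    speaker_triplet: (3, 3) matrix, one loudspeaker unit vector per row
    Returns the three power-normalized panning gains.
    """
    p = np.asarray(source_dir, dtype=float)
    L = np.asarray(speaker_triplet, dtype=float)
    g = p @ np.linalg.inv(L)          # solve p = g L for the gain row g
    if np.any(g < 0):
        raise ValueError("source lies outside this triplet")
    return g / np.linalg.norm(g)      # power normalization

# Example: front-left, front-right and an elevated loudspeaker.
spk = np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
spk /= np.linalg.norm(spk, axis=1, keepdims=True)
print(vbap_gains([1.0, 0.0, 0.1], spk))  # mostly front pair, slight height
```

Stacking such gain vectors for all objects (or SAOC-transmitted channels) yields the rendering matrix handed to the SAOC decoder or the object renderer.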
However, if only a specific output format is required, then the VBAP stage 1810 can already provide the rendering matrix required for, e.g., the 5.1 output. The SAOC decoder 1800 then performs a direct rendering from the SAOC transport channels, the associated parametric data and the decompressed metadata into the required output format without any interaction of the mixer 1220. However, when a certain mix between modes is applied, i.e., where several channels are SAOC encoded but not all channels are SAOC encoded, or where several objects are SAOC encoded but not all objects are SAOC encoded, or when only a certain number of pre-rendered objects with channels are SAOC decoded and the remaining channels are not SAOC processed, then the mixer will put together the data from the individual input portions, i.e., directly from the core decoder 1300, from the object renderer 1210 and from the SAOC decoder 1800.
Subsequently,
In accordance with the first coding mode, the mixer 200 in the
In the second mode, the mixer 200 in
Then, in the third coding mode, the SAOC encoder of
In a fourth coding mode as illustrated in
Furthermore, a fifth coding mode exists, which can be any mix of modes 1 to 4. In particular, a mixed coding mode will exist when the mixer 1220 in
Each input portion of the mixer 1220 can then, for example, receive up to the full number of channels, such as 32, as indicated at 1205. Thus, basically, the mixer could receive 32 channels from the USAC decoder and, additionally, 32 pre-rendered/mixed channels from the USAC decoder and, additionally, 32 “channels” from the object renderer and, additionally, 32 “channels” from the SAOC decoder, where each “channel” between blocks 1210 and 1218 on the one hand and block 1220 on the other hand carries the contribution of the corresponding objects in a corresponding loudspeaker channel, and the mixer 1220 then mixes, i.e., adds up, the individual contributions for each loudspeaker channel.
In an embodiment of the present invention, the encoding/decoding system is based on an MPEG-D USAC codec for coding of channel and object signals. To increase the efficiency of coding a large number of objects, MPEG SAOC technology has been adapted. Three types of renderers perform the tasks of rendering objects to channels, rendering channels to headphones or rendering channels to a different loudspeaker setup. When object signals are explicitly transmitted or parametrically encoded using SAOC, the corresponding object metadata information is compressed and multiplexed into the encoded output data.
In an embodiment, the pre-renderer/mixer 200 is used to convert a channel plus object input scene into a channel scene before encoding. Functionally, it is identical to the object renderer/mixer combination on the decoder side as illustrated in
As a core encoder/decoder for loudspeaker channel signals, discrete object signals, object downmix signals and pre-rendered signals, USAC technology is advantageously used. It handles the coding of the multitude of signals by creating channel and object mapping information (the geometric and semantic information of the input channel and object assignment). This mapping information describes how input channels and objects are mapped to USAC channel elements as illustrated in
The coding of objects is possible in different ways, depending on the rate/distortion requirements and the interactivity requirements for the renderer. The following object coding variants are possible:
The SAOC encoder and decoder for object signals are based on MPEG SAOC technology. The system is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data (OLDs, IOCs (Inter-Object Coherences), DMGs (Downmix Gains)). The additional parametric data exhibits a significantly lower data rate than would be required for transmitting all objects individually, making the coding very efficient.
The SAOC encoder takes as input the object/channel signals as monophonic waveforms and outputs the parametric information (which is packed into the 3D-Audio bitstream) and the SAOC transport channels (which are encoded using single channel elements and transmitted).
The SAOC decoder reconstructs the object/channel signals from the decoded SAOC transport channels and parametric information, and generates the output audio scene based on the reproduction layout, the decompressed object metadata information and optionally on the user interaction information.
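Conceptually, the parametric reconstruction and rendering can be collapsed into one matrix operation per time/frequency tile, as in the following hedged sketch. The MMSE-style estimator shown here reflects the general SAOC principle; the exact standardized processing (decorrelation, energy compensation, tile handling) is omitted.

```python
import numpy as np

def saoc_render(transport, D, E, R, eps=1e-9):
    """Parametric SAOC-style rendering for one time/frequency tile.

    transport: (n_dmx, n_samples) decoded SAOC transport channels
    D: (n_dmx, n_obj) downmix matrix (from the DMGs)
    E: (n_obj, n_obj) object covariance model (from OLDs and IOCs)
    R: (n_out, n_obj) rendering matrix (e.g., from VBAP and the OAM)
    """
    # G maps transport channels to object estimates (MMSE-like upmix).
    G = E @ D.T @ np.linalg.inv(D @ E @ D.T + eps * np.eye(D.shape[0]))
    return R @ (G @ transport)        # rendered output channels
```

Note how R can absorb the decompressed object metadata and even the reproduction layout, which is what enables the direct, transcoding-free rendering discussed above.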
For each object, the associated metadata that specifies the geometrical position and volume of the object in 3D space is efficiently coded by quantization of the object properties in time and space. The compressed object metadata (cOAM) is transmitted to the receiver as side information. The volume of the object may comprise information on a spatial extent and/or information on the signal level of the audio signal of this audio object.
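A possible shape of this quantization is sketched below. The step sizes and the logarithmic radius coding are assumptions chosen for illustration; they are not the standardized OAM grids, and the subsequent differential/entropy coding stage is omitted.

```python
import numpy as np

def quantize_oam(azimuth_deg, elevation_deg, radius_m, gain_db):
    """Quantize one object metadata sample to integer indices
    (illustrative step sizes, not the standardized ones)."""
    return {
        "azimuth":   int(np.round(azimuth_deg / 1.5)),    # 1.5 deg steps
        "elevation": int(np.round(elevation_deg / 3.0)),  # 3 deg steps
        "radius":    int(np.round(8 * np.log2(max(radius_m, 0.5)))),
        "gain":      int(np.round(gain_db / 0.5)),        # 0.5 dB steps
    }
```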
The object renderer utilizes the decompressed object metadata to generate object waveforms according to the given reproduction format. Each object is rendered to certain output channels according to its metadata. The output of this block results from the sum of the partial results.
If both channel-based content and discrete/parametric objects are decoded, the channel-based waveforms and the rendered object waveforms are mixed before outputting the resulting waveforms (or before feeding them to a postprocessor module like the binaural renderer or the loudspeaker renderer module).
The binaural renderer module produces a binaural downmix of the multichannel audio material, such that each input channel is represented by a virtual sound source. The processing is conducted frame-wise in QMF (Quadrature Mirror Filterbank) domain.
The binauralization is based on measured binaural room impulse responses.
As illustrated in the context of
Advantageously, the “shortcut” as illustrated by control line 1727 comprises controlling the decoder 1300 to decode to a lower number of channels, i.e., skipping the complete OTT processing block in the decoder, or a format conversion to a lower number of channels and, as illustrated in
In a further embodiment, an efficient interfacing between processing blocks may be used. Particularly in
Subsequently, reference is made to
Furthermore, it is advantageous to perform an enhanced noise filling procedure to enable uncompromised full-band (18 kHz) coding at 1200 kbps.
The encoder has been operated in a ‘constant rate with bit-reservoir’ fashion, using a maximum of 6144 bits per channel as rate buffer for the dynamic data.
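The reservoir mechanism can be summarized in a few lines. The sketch below is a generic constant-rate bit-reservoir account, not the exact USAC rate control: a hard frame may borrow saved bits, an easy frame pays them back, and the reservoir never exceeds its per-channel maximum.

```python
def frame_bit_budget(target_bits, reservoir, demand, reservoir_max=6144):
    """One frame of 'constant rate with bit-reservoir' accounting
    (per channel). Returns (bits_spent, new_reservoir_fill)."""
    available = target_bits + reservoir
    spent = min(demand, available)       # never overdraw the reservoir
    leftover = available - spent
    # Bits above the cap cannot be carried over; a real encoder would
    # spend them, e.g., as stuffing or finer quantization.
    return spent, min(leftover, reservoir_max)
```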
All additional payloads like SAOC data or object metadata have been passed through extension elements and have been considered in the encoder's rate control.
In order to take advantage of the SAOC functionalities also for 3D audio content, the following extensions to MPEG SAOC have been implemented:
The binaural renderer module produces a binaural downmix of the multichannel audio material, such that each input channel (excluding the LFE channels) is represented by a virtual sound source. The processing is conducted frame-wise in QMF domain.
The binauralization is based on measured binaural room impulse responses. The direct sound and early reflections are imprinted onto the audio material via a convolutional approach in a pseudo-FFT domain using a fast convolution on top of the QMF domain.
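The fast-convolution idea can be illustrated with a generic overlap-add partitioned convolution on plain time-domain blocks; the actual renderer performs the analogous operation inside the QMF bands (the “pseudo-FFT domain”), which this sketch does not reproduce. It assumes the impulse response fits into one block.

```python
import numpy as np

def fast_convolve(signal, impulse_response, block=1024):
    """Overlap-add FFT convolution (assumes len(impulse_response) <= block)."""
    n_fft = 2 * block
    H = np.fft.rfft(impulse_response, n_fft)     # IR spectrum, zero-padded
    out = np.zeros(len(signal) + len(impulse_response) - 1)
    for start in range(0, len(signal), block):
        seg = signal[start:start + block]
        y = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
        end = min(start + n_fft, len(out))
        out[start:end] += y[: end - start]       # overlap-add the tails
    return out
```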
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending U.S. patent application Ser. No. 16/277,851, filed Feb. 15, 2019, which in turn is continuation of copending U.S. patent application Ser. No. 15/002,148 filed Jan. 20, 2016, which is a continuation of International Application No. PCT/EP2014/065289, filed Jul. 16, 2014, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 13177378.0, filed Jul. 22, 2013, which is also incorporated herein by reference in its entirety.