This patent application is a U.S. National Stage application of International Patent Application Number PCT/FI2019/050766 filed Oct. 28, 2019, which is hereby incorporated by reference in its entirety, and claims priority to GB 1817887.1 filed Nov. 1, 2018.
Examples of the disclosure relate to apparatus, methods and computer programs for encoding spatial metadata. Some relate to apparatus, methods and computer programs for encoding spatial metadata associated with spatial audio content.
Spatial audio content may be used in immersive audio applications such as mediated reality content applications which could be virtual reality, augmented reality, mixed reality, extended reality or any other suitable type of applications. Spatial metadata may be associated with the spatial audio content. The spatial metadata may contain information which enables the spatial properties of the spatial audio content to be recreated.
According to various, but not necessarily all, examples of the disclosure there may be provided an apparatus comprising means for: obtaining spatial metadata associated with spatial audio content; obtaining a configuration parameter indicative of a source format of the spatial audio content; and using the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
The configuration parameter may be used to select a codebook to compress the spatial metadata associated with the spatial audio content.
The configuration parameter may be used to enable a codebook for compressing the spatial metadata to be created.
The codebook may be used for encoding and decoding the spatial metadata.
The source format indicated by the configuration parameter may indicate a format of spatial audio that was used to obtain the spatial metadata.
The spatial metadata may comprise data indicative of spatial parameters of the spatial audio content.
The method of compression may be selected independently of the content of the obtained spatial audio content.
The means may be configured to obtain the spatial audio content.
The source configuration parameter may be obtained with the spatial audio content.
The source configuration parameter may be obtained separately to the spatial audio content.
According to various, but not necessarily all, examples of the disclosure there may be provided an apparatus comprising processing circuitry; and memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, cause the apparatus to: obtain spatial metadata associated with spatial audio content; obtain a configuration parameter indicative of a source format of the spatial audio content; and use the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
According to various, but not necessarily all, examples of the disclosure there may be provided an encoding device comprising an apparatus as described above and one or more transceivers configured to transmit at least the spatial metadata to a decoding device.
According to various, but not necessarily all, examples of the disclosure there may be provided a method comprising: obtaining spatial metadata associated with spatial audio content; obtaining a configuration parameter indicative of a source format of the spatial audio content; and using the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
The configuration parameter may be used to select a codebook to compress the spatial metadata associated with the spatial audio content.
According to various, but not necessarily all, examples of the disclosure there may be provided a computer program comprising computer program instructions that, when executed by processing circuitry, cause: obtaining spatial metadata associated with spatial audio content; obtaining a configuration parameter indicative of a source format of the spatial audio content; and using the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
The configuration parameter may be used to select a codebook to compress the spatial metadata associated with the spatial audio content.
According to various, but not necessarily all, examples of the disclosure there may be provided a physical entity embodying the computer program as described above.
According to various, but not necessarily all, examples of the disclosure there may be provided an electromagnetic carrier signal carrying the computer program as described above.
According to various, but not necessarily all, examples of the disclosure there may be provided an apparatus comprising means for: receiving spatial audio content; receiving spatial metadata associated with the spatial audio content; and receiving information indicative of a method used to compress the spatial metadata associated with the spatial audio content wherein the method used to compress the spatial metadata is selected based on a source format of the spatial audio content.
The information indicative of the method used to compress the spatial metadata may comprise a source configuration parameter.
The information indicative of the method used to compress the spatial metadata may comprise a codebook that has been selected using a source configuration parameter.
According to various, but not necessarily all, examples of the disclosure there may be provided an apparatus comprising processing circuitry; and memory circuitry including computer program code, the memory circuitry and the computer program code configured to, with the processing circuitry, cause the apparatus to: receive spatial audio content; receive spatial metadata associated with the spatial audio content; and receive information indicative of a method used to compress the spatial metadata associated with the spatial audio content wherein the method used to compress the spatial metadata is selected based on a source format of the spatial audio content.
According to various, but not necessarily all, examples of the disclosure there may be provided a decoding device comprising an apparatus as described above and one or more transceivers configured to receive the spatial audio content and the spatial metadata from an encoding device.
According to various, but not necessarily all, examples of the disclosure there may be provided a method comprising: receiving spatial audio content; receiving spatial metadata associated with the spatial audio content; and receiving information indicative of a method used to compress the spatial metadata associated with the spatial audio content wherein the method used to compress the spatial metadata is selected based on a source format of the spatial audio content.
The information indicative of the method used to compress the spatial metadata may comprise a source configuration parameter.
According to various, but not necessarily all, examples of the disclosure there may be provided a computer program comprising computer program instructions that, when executed by processing circuitry, cause: receiving spatial audio content; receiving spatial metadata associated with the spatial audio content; and receiving information indicative of a method used to compress the spatial metadata associated with the spatial audio content wherein the method used to compress the spatial metadata is selected based on a source format of the spatial audio content.
The information indicative of the method used to compress the spatial metadata may comprise a source configuration parameter.
According to various, but not necessarily all, examples of the disclosure there may be provided a physical entity embodying the computer program as described above.
According to various, but not necessarily all, examples of the disclosure there may be provided an electromagnetic carrier signal carrying the computer program as described above.
Some example embodiments will now be described with reference to the accompanying drawings in which:
The figures illustrate an apparatus 101 comprising means for obtaining spatial metadata associated with spatial audio content. The spatial audio content may represent immersive audio content or any other suitable type of content. The means may also be configured for obtaining a configuration parameter indicative of a source format of the spatial audio content; and using the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
The apparatus 101 may be for recording and/or processing captured audio signals.
In the example of
As illustrated in
The processor 105 is configured to read from and write to the memory 107. The processor 105 may also comprise an output interface via which data and/or commands are output by the processor 105 and an input interface via which data and/or commands are input to the processor 105.
The memory 107 is configured to store a computer program 109 comprising computer program instructions (computer program code 111) that control the operation of the apparatus 101 when loaded into the processor 105. The computer program instructions, of the computer program 109, provide the logic and routines that enable the apparatus 101 to perform the methods illustrated in
The apparatus 101 therefore comprises: at least one processor 105; and at least one memory 107 including computer program code 111, the at least one memory 107 and the computer program code 111 configured to, with the at least one processor 105, cause the apparatus 101 at least to perform: obtaining 201 spatial metadata associated with spatial audio content; obtaining 203 a configuration parameter indicative of a source format of the spatial audio content; and using 205 the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
As illustrated in
The computer program 109 comprises computer program instructions for causing an apparatus 101 to perform at least the following: obtaining 201 spatial metadata associated with spatial audio content; obtaining 203 a configuration parameter indicative of a source format of the spatial audio content; and using 205 the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content.
The computer program instructions may be comprised in a computer program 109, a non-transitory computer readable medium, a computer program product, a machine readable medium. In some but not necessarily all examples, the computer program instructions may be distributed over more than one computer program 109.
Although the memory 107 is illustrated as a single component/circuitry it may be implemented as one or more separate components/circuitry some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/dynamic/cached storage.
Although the processor 105 is illustrated as a single component/circuitry it may be implemented as one or more separate components/circuitry some or all of which may be integrated/removable. The processor 105 may be a single core or multi-core processor.
References to “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc. or a “controller”, “computer”, “processor” etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
The method comprises, at block 201, obtaining spatial metadata associated with spatial audio content. In some examples the spatial metadata could be obtained with the spatial audio content. In other examples the spatial metadata could be obtained separately to the spatial audio content. For instance, the apparatus 101 could obtain the spatial audio content and could then separately process the spatial audio content to obtain the spatial metadata.
The spatial audio content comprises content which can be rendered so that a user can perceive spatial properties of the audio content. For example, the spatial audio content may be rendered so that the user can perceive the direction of origin and the distance from an audio source. The spatial audio may enable an immersive audio experience to be provided to a user. The immersive audio experience could comprise a virtual reality, augmented reality, mixed reality or extended reality experience or any other suitable experience.
The spatial metadata that is associated with the spatial audio content comprises information relating to the spatial properties of a sound space represented by the spatial audio content. The spatial metadata may comprise information such as the direction of arrival of audio, distances to an audio source, direct-to-total energy ratios, diffuse-to-total energy ratios or any other suitable information. The spatial metadata may be provided in frequency bands.
At block 203 the method comprises obtaining a configuration parameter indicative of a source format of the spatial audio content. The configuration parameter may indicate the format of the spatial audio that has been used to obtain spatial metadata. In some examples the source format may indicate a configuration of the microphones that have been used to capture the spatial audio content that is then used to obtain spatial metadata.
The source format could be any suitable type of format. Examples of different source formats comprise configurations such as three dimensional spatial microphone configurations, two dimensional spatial microphone configurations, mobile phones with four or more microphones configured for three dimensional audio capture, mobile phones with three or more microphones configured for two dimensional audio capture, mobile phones with two microphones, surround sound formats such as a 5.1 mix or a 7.1 mix, or any other suitable type of source format. The different source formats will produce spatial audio content which has associated spatial metadata. The spatial metadata associated with the different source formats may have different characteristics.
The configuration parameter could comprise bits of data which indicate the source format. For instance, in some examples the configuration parameter could comprise eight bits of data which enables 256 different combinations for indicating the source format. Other numbers of bits could be used in other examples of the disclosure.
In such examples the bits of data could be configured in a predefined format. For instance, where the configuration parameter comprises eight bits, the first two bits could define the overall source type. The overall source type could indicate whether the source is a microphone array, a channel-based source, a mobile device or a mixture. A mixture source could comprise audio captured by a microphone array mixed with a channel-based source. For instance, a microphone array could be used to capture spatial audio and then a channel-based music track is added as background audio. The channel-based track could be provided from an audio file selected via a user interface or by any other suitable control means. It is to be appreciated that other mixture sources could be used in other examples of the disclosure.
The third bit could indicate whether or not the source contains elevation. For example, the third bit could indicate true or false depending on whether or not the source contains elevation.
The remaining five bits could comprise more detailed information about the source format. The more detailed information about the source format could be the type of microphone array which could indicate the number of microphones and the relative positions of the microphones or any other suitable type of format. In some examples the more detailed information about the source format could define a channel configuration such as 5.1, 7.1, 7.1+4, 22.2, 2.0 or any other suitable type of channel configuration. In some examples the more detailed information about the source format could indicate the type of mobile device that has been used to capture the spatial audio. For instance, it could indicate that the device was a specific six microphone mobile device, a generic four microphone device, a generic three microphone device or any other suitable type of device. In some examples the more detailed information about the source type could define a combination of different source types. For instance, it could comprise a 5.1 channel based format and one or more mobile devices or any other type of combination.
It is to be appreciated that other arrangements of the bits could be used in other examples of the disclosure. For instance, in some examples it may be possible to determine whether or not the source contains elevation from the indication of the source format and so in such cases the third bit indicating whether or not the source contains elevation might not be needed. For instance, if the source format is indicated as 5.1 then it would be inherent that this is a source format with no elevation while if the source format is indicated as 7.1+4 then it would be inherent that this is a source format with elevation.
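Purely for illustration, the eight-bit layout described above could be packed and unpacked as follows. The field widths and category values are assumptions for this sketch rather than a normative format.

```python
def pack_config(source_type, has_elevation, detail):
    """Pack an illustrative 8-bit source configuration parameter.

    Bits 7-6: overall source type (e.g. 0=microphone array,
              1=channel-based, 2=mobile device, 3=mixture)
    Bit  5:   elevation flag (1 if the source contains elevation)
    Bits 4-0: detailed source-format index (0..31)
    """
    assert 0 <= source_type <= 3 and 0 <= detail <= 31
    return (source_type << 6) | (int(has_elevation) << 5) | detail


def unpack_config(byte):
    """Recover the three fields from the packed parameter."""
    return (byte >> 6) & 0x3, bool((byte >> 5) & 0x1), byte & 0x1F
```

Eight bits give the 256 combinations noted above; other field widths would simply shift the masks.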
In some examples a list of source formats could be used and the source configuration parameter could be indicative of a source format from the list.
At block 205 the method comprises using the configuration parameter to select a method of compression of the spatial metadata associated with the spatial audio content. For example, a plurality of compression methods may be available and the configuration parameter may be used to select one of the available methods.
In some examples the configuration parameter may be used to select a codebook to compress the spatial metadata associated with the spatial audio content. The codebook could be any suitable spatial metadata compression codebook that can be used both for encoding and decoding the spatial metadata. The codebook may comprise a look-up table of values that can be used to compress and then reconstruct the spatial metadata. In some examples the codebook could comprise a combination of look-up tables and algorithms and any other suitable methods. In some examples a switching system could be used which could enable switching between different types of codebooks.
In some examples the configuration parameter may be used to select one or more algorithms. The algorithms could then be used to generate a codebook or other method of compression. For instance, in some examples the configuration parameter could enable the selection of an algorithm that enables values to be computed based on a transmitted index value.
Where the configuration parameter enables selection of a codebook, the codebook could be prepared in advance based on statistics of a set of input samples that represent the category of source format. The correct codebook could then be selected from the prepared codebooks based, at least partly, on the source configuration parameter.
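Selecting a prepared codebook on the basis of the source configuration parameter could be sketched as follows; the codebook contents and the fallback choice here are hypothetical.

```python
# Hypothetical pre-trained codebooks, one per source-format category.
# Each maps a quantized parameter index to a variable-length code word.
CODEBOOKS = {
    "mic_array_3d": {0: "0", 1: "10", 2: "110", 3: "111"},
    "channel_5_1":  {0: "00", 1: "01", 2: "10", 3: "11"},
}


def select_codebook(source_format, default="channel_5_1"):
    """Pick the codebook prepared for this source format; fall back
    to a default when the indicated format has no dedicated codebook."""
    return CODEBOOKS.get(source_format, CODEBOOKS[default])
```

The same table would be held at both the encoding and decoding devices so that the selection can be mirrored.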
In some examples the configuration parameter could be used to enable a codebook for compressing the spatial metadata to be created. The source configuration parameter could provide some information about the statistics of the parameters and this information could be used to create a new codebook and/or modify an existing codebook.
Information indicative of the codebook that has been selected may be transmitted from an encoding device to a decoding device. The information indicative of the codebook that has been selected could be transmitted as a dynamic value within a metadata stream. In other examples the information indicative of the codebook that has been selected could be transmitted through a separate channel at the start of a transmission or at specific time points during the transmission.
The encoding device 303 may be any device which is configured to obtain spatial metadata associated with spatial audio content. In some examples the encoding device 303 could be configured to encode the spatial audio content and spatial metadata.
In the example of
In some examples the analysis processor 105A may be configured to analyse the input audio signal 311 to obtain spatial audio content and spatial metadata. It is to be appreciated that in other examples the analysis processor 105A could receive both the spatial audio content and the spatial metadata. In such examples it would not be necessary for the analysis processor 105A to analyse the spatial audio content to obtain the spatial metadata.
The analysis processor 105A is configured to create the transport signals 313 for the spatial audio content and spatial metadata. The analysis processor 105A may be configured to encode both the spatial audio content and the spatial metadata to provide the transport signal 313.
In the example system 301 shown in
In the example of
The synthesis processor 105B uses the spatial metadata to create the spatial properties of the spatial audio content so as to provide to a listener spatial audio content that represents the spatial properties of the captured sound scene. The spatial audio may enable immersive audio to be provided to a user. The spatial audio output signals 315 could be a multichannel loudspeaker signal, a binaural signal, a spherical harmonic signal or any other suitable type of signal.
The spatial audio output signals 315 can be provided to any suitable rendering device such as one or more loudspeakers, a head set or any other suitable rendering device.
The transport audio signal generator 401 receives the input audio signal 311 comprising spatial audio content. The transport audio signal generator 401 is configured to generate the transport audio signal 411 from the received input audio signal 311. The source format of the spatial audio content may be used to generate the transport audio signal. For instance, in order to generate a stereo transport audio signal, if the spatial audio content was captured by a microphone array such as a spherical microphone grid, then two opposite microphones could be selected as the transport signals. Equalization or other suitable processing may be applied to the transport signals.
The transport audio signal 411 could comprise a mono signal, a stereo signal, a binauralized stereo signal, or any other suitable signal, e.g. a FOA signal.
The spatial analyser 403 also receives the input audio signal 311 comprising spatial audio content. The spatial analyser 403 is configured to analyse the spatial audio content to provide spatial parameters which form spatial metadata. The spatial parameters represent the spatial properties of a sound space represented by the spatial audio content. The spatial parameters may comprise information such as the direction of arrival of audio, distances to an audio source, direct-to-total energy ratios, diffuse-to-total energy ratios or any other suitable parameters. The spatial analyser 403 may analyse different frequency bands of the spatial audio content so that the spatial metadata may be provided in frequency bands. For instance, a suitable set of frequency bands would be 24 frequency bands that follow the Bark scale. Other sets of frequency bands could be used in other examples of the disclosure.
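As one possible sketch of such a banding, frequencies could be mapped to 24 bands using Traunmüller's approximation of the Bark scale; the exact band edges an encoder uses may differ.

```python
import math


def bark(f_hz):
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def band_index(f_hz, n_bands=24):
    """Map a frequency (Hz) to one of n_bands Bark-spaced bands.

    This is a sketch: one band per integer Bark value, clamped to
    the available range; a real analyser may choose its edges
    differently.
    """
    return min(max(int(math.floor(bark(f_hz))), 0), n_bands - 1)
```

With this mapping the 24 bands cover roughly the audible range, e.g. 1 kHz falls in band 8 and 20 kHz in the top band.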
The spatial analyser 403 provides one or more output signals comprising spatial metadata. In the example shown in
The multiplexer 405 is configured to receive the transport audio signal 411 and the spatial metadata outputs 415, 417 and combine these to generate the transport signal 313.
In the example of
In the example of
The multiplexer 405 is configured to encode the spatial audio content and also the spatial metadata. The source configuration parameter is used to select the method of compression of the spatial metadata. For instance, the source configuration parameter may be configured to select a codebook to use to encode the spatial metadata.
In the example of
The multiplexer also comprises a datastream generator/combiner module 425. The datastream generator/combiner module 425 is configured to combine the compressed transport audio signal and the compressed spatial metadata into a transport signal 313 which is provided as an output of the encoding device 303.
In the example shown in
The demultiplexer 501 receives the transport signal 313 comprising the encoded spatial audio content and the encoded spatial metadata as an input. The transport signal may comprise the configuration parameter. The demultiplexer 501 is configured to receive the transport signal 313 and separate this into two or more separate components. In the example in
In the example of
The demultiplexer 501 also comprises a transport audio signal decompressor/decoder module 523. The transport audio signal decompressor/decoder module 523 is configured to receive the component comprising the audio content from the datastream receiver/splitter module 521 and decompress the audio content. The transport audio signal decompressor/decoder module 523 then provides the decoded transport audio signal 511 as an output.
In the example shown in
In the example of
The prototype signal 541 from the prototype signal generator module 531 is provided to both the direct stream generator module 505 and the diffuse stream generator module 507. In the example shown in
In the example shown in
The diffuse stream 545 and the direct stream 543 are provided to the stream combiner module 509. The stream combiner module 509 is configured to combine the direct stream 543 and the diffuse stream 545 to provide spatial audio output signals 315. The spatial metadata relating to the energy ratios may be used to combine the direct stream 543 and the diffuse stream 545.
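One common, energy-preserving way of applying a direct-to-total energy ratio when combining the two streams could be sketched as follows; this illustrative formulation is an assumption rather than necessarily the exact combiner used by the stream combiner module 509.

```python
import math


def combine(direct, diffuse, ratio):
    """Energy-preserving mix of direct and diffuse streams.

    'ratio' is the direct-to-total energy ratio from the spatial
    metadata (0.0 = fully diffuse, 1.0 = fully direct). Square-root
    amplitude gains are used so that the two energies sum to the
    total energy.
    """
    g_dir = math.sqrt(ratio)
    g_dif = math.sqrt(1.0 - ratio)
    return [g_dir * d + g_dif * f for d, f in zip(direct, diffuse)]
```

In practice the ratio, and hence the gains, would typically vary per frequency band and per time frame.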
The spatial audio output signals 315 could be provided to a rendering device such as one or more loudspeakers, a headset or any other suitable device which is configured to convert the electronic spatial audio output signals 315 into audible signals.
In the example shown in
At block 601 a source configuration is selected. The source configuration is the format that is used for capturing audio signals. The selecting of the source configuration could comprise selecting the microphone arrangement that is to be used to capture the audio signals, selecting the devices that are to be used to capture the audio signals, selecting the pre-mixed channel format, or any other selections.
At block 603 spatial audio content is obtained. The spatial audio content that is obtained is captured using the source configuration that is selected at block 601. The spatial audio content could comprise a representative set of audio samples. The representative set of samples could comprise a standard set of acoustic signals that can be used for the purposes of creating a codebook for compression of the spatial metadata. The representative set of samples could comprise one or more acoustic samples with different spatial properties.
At block 605 spatial analysis is performed on the obtained spatial audio content. The spatial analysis determines one or more spatial parameters of the spatial audio content. The spatial parameters could be direction parameters, energy ratio parameters, coherence parameters or any other suitable parameters. The spatial analysis that is performed could be the same spatial analysis process that is performed by the spatial analyser 403 of the encoding device 303 to obtain spatial metadata. Where the obtained spatial audio content comprises a representative set of samples the same spatial analysis may be performed on each of the samples within the set.
At block 607 the statistics of the spatial parameters obtained at block 605 are analyzed. The analysis enables the probability of occurrence for each parameter value to be determined. The analysis could comprise counting each occurrence of a parameter value from the obtained spatial audio. The occurrences could be counted using a histogram or any other suitable means.
At block 609 the method comprises using the statistics obtained at block 607 to design a codebook. For instance, the codebook could be designed so that the most probable parameters have the shortest code values while the least probable parameters are assigned longer code values. This may be achieved by ordering the parameter values from the highest occurrence to the lowest occurrence and then assigning code values to the ordered parameter values, starting with the parameter value with the highest occurrence, which is assigned the shortest available code value. This ensures that, on average, the compressed spatial metadata uses fewer bits per value. The codebook that this creates could comprise look-up tables, or any other suitable information. In some examples one or more algorithms could be used to generate the codebook.
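A codebook design of this kind could be sketched with a standard Huffman construction, in which the most frequent parameter values receive the shortest code words; this is one possible design rather than the only one.

```python
import heapq
from collections import Counter


def design_codebook(observed_values):
    """Build a prefix codebook from observed quantized parameter
    values: the most frequent values receive the shortest codes."""
    counts = Counter(observed_values)
    if len(counts) == 1:                       # degenerate: one symbol
        return {next(iter(counts)): "0"}
    # Heap items: (count, tiebreak, {value: code-so-far}).
    # The tiebreak keeps tuple comparison away from the dicts.
    heap = [(n, i, {v: ""}) for i, (v, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)        # two least frequent
        n2, _, c2 = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in c1.items()}
        merged.update({v: "1" + code for v, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

Applied to a representative sample set, the resulting look-up table could then be stored at block 611 for use in compression and decompression.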
At block 611 the codebook is stored. The codebook could be stored in a memory of the encoding device 303 or in any other suitable storage location. The codebook is stored so that it can be accessed during compression and decompression of the spatial metadata.
The method of
At block 701 the multiplexer 405 obtains audio content. The audio content may be obtained in transport audio signals 411. The transport audio signal 411 could be obtained from a transport audio signal generator 401 as shown in
At block 703 the multiplexer 405 obtains spatial metadata. The spatial metadata may comprise outputs 415, 417 from a spatial analyser 403. The spatial metadata may be provided in a parametric format which comprises values for one or more spatial parameters of the spatial audio content that is provided within the transport signal 411. The spatial metadata could be obtained from a spatial analyser 403 as shown in
At block 705 the multiplexer 405 obtains a source configuration parameter. The source configuration parameter indicates the source format that was used to capture the spatial audio, or an equivalent description of the source configuration. The source configuration parameter could be received as an input from the capturing device or could be received in response to a user input via a user interface or by any other suitable means. The source configuration parameter could be obtained as part of the spatial metadata package. In such examples obtaining the source configuration parameter could comprise reading the parameter from the spatial metadata package.
At block 707 the spatial audio content is compressed. The spatial audio content may be compressed using any suitable technique. In the example shown in
At block 709 the method of compression for the spatial metadata is selected. The obtained source configuration parameter is used to select the method of compression of the spatial metadata. Selecting the method of compression could comprise selecting a pre-formed codebook which corresponds to the source format for the captured spatial audio. The pre-formed codebook could be stored in a memory of the encoding device 303 or in any memory which is accessible by the encoding device 303. In some examples selecting the method of compression could comprise selecting a computable or algebraic codebook, where the codebook is based on an algorithm.
Once the pre-formed codebook has been retrieved from the memory it may be passed to a spatial metadata encoding module 423 so that at block 711 the codebook can be used to compress the spatial metadata. The method of compressing the spatial metadata could be any method of compression which uses the codebook. For instance, the method could comprise Huffman coding or any other suitable process.
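The codebook selection and compression described above can be illustrated with a minimal sketch. The source format names, the prefix-code tables and the function names below are illustrative assumptions rather than part of the disclosure; a real encoder would use codebooks trained on the statistics of each source format.

```python
# Illustrative pre-formed prefix-code (Huffman-style) codebooks, one per
# source format. The format names and code tables are assumptions, not
# taken from any standard. Shorter codes are assigned to the quantized
# parameter indices that are most probable for that format: e.g. a planar
# (no-elevation) capture concentrates direction indices near index 0.
CODEBOOKS = {
    "planar_mic_array": {0: "0", 1: "10", 2: "110", 3: "111"},
    "spherical_mic_array": {0: "00", 1: "01", 2: "10", 3: "11"},
}

def select_codebook(source_config: str) -> dict:
    """Select a pre-formed codebook using the source configuration parameter."""
    return CODEBOOKS[source_config]

def compress_metadata(indices, source_config: str) -> str:
    """Concatenate the prefix codes for a sequence of quantized parameter indices."""
    codebook = select_codebook(source_config)
    return "".join(codebook[i] for i in indices)
```

With these illustrative tables, metadata whose indices cluster near 0, as would be expected for a planar capture, compresses to fewer bits under the planar codebook than under the spherical one.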
In some examples, before the spatial metadata is compressed, a quantization process may be performed. The quantization process may comprise quantizing the parameter values of the parametric spatial metadata so that each parameter value has a corresponding code value. In some examples the source configuration parameter could also be used for the quantization process, as the optimal quantization may also depend on the source format. For instance, a spherically uniform quantization could be applied to a direction parameter when the source format includes elevation, so as to obtain a more uniform, and perceptually better, quantized direction distribution than would be achieved with other quantization processes.
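One way to sketch a spherically uniform direction quantization is to reduce the number of azimuth steps as the elevation magnitude increases, so that the quantization points are spaced roughly evenly on the sphere. The step sizes, rounding scheme and function name below are illustrative assumptions, not the quantizer of the disclosure.

```python
import math

def quantize_direction(azimuth_deg: float, elevation_deg: float,
                       elev_step: float = 30.0):
    """Quantize a direction to (azimuth index, elevation index).

    Elevation is quantized uniformly; the azimuth grid then shrinks with
    cos(elevation) so that neighbouring points keep roughly the same
    angular spacing on the sphere (fewer azimuth points near the poles).
    """
    elev_idx = round(elevation_deg / elev_step)
    elev_q = elev_idx * elev_step
    # Number of azimuth points at this elevation ring; at least 1 at the pole.
    n_azi = max(1, round((360.0 / elev_step) * math.cos(math.radians(elev_q))))
    azi_step = 360.0 / n_azi
    azi_idx = round(azimuth_deg / azi_step) % n_azi
    return azi_idx, elev_idx
```

At the equator this illustrative grid has 12 azimuth points, while at the pole it collapses to a single point, since azimuth is perceptually meaningless there.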
In some examples the source configuration parameter can be used to determine the quantization process that is used. In such cases it might not be necessary to provide a separate indication of the source configuration parameter to a decoder device 305, as the correct source configuration and/or method of compression could be inferred from the quantization process.
At block 713 the compressed spatial audio content and the compressed spatial metadata are encoded together to form an encoded transport signal 313. The combining of the compressed spatial audio content and the compressed spatial metadata could be performed by a datastream generator/combiner module 425 or any other suitable module. In some examples the combining of the compressed spatial audio content and the compressed spatial metadata could also comprise further compression such as run-length encoding or any other lossless encoding.
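Run-length encoding, mentioned above as one example of further lossless compression of the combined stream, can be sketched as follows. The byte-oriented framing and the output representation are illustrative assumptions.

```python
def run_length_encode(data: bytes) -> list:
    """Collapse runs of identical bytes into [value, count] pairs.

    Lossless: the original stream is recovered by expanding each pair,
    which suits streams with long runs of repeated values such as
    unchanging quantized metadata between frames.
    """
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([b, 1])     # start a new run
    return runs
```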
At block 801 spatial audio is captured. The spatial audio is captured using a source format.
At block 805 the captured spatial audio is processed to form an audio transport signal 411. The audio transport signal 411 comprises the audio content. The processing of the captured spatial audio to form an audio transport signal 411 may be performed by a transport audio signal generator 401 or any other suitable component.
At block 807 spatial analysis is performed on the spatial audio content to obtain the spatial metadata. The spatial analysis could be performed by a spatial analyser 403 as shown in the figures.
At block 803 a source configuration parameter is obtained. The input source configuration parameter indicates the source format that was used to capture the spatial audio. The source configuration parameter could be stored in the memory of the audio capturing device or could be received in response to a user input via a user interface or by any other suitable means.
At block 809 the audio transport signals 411 comprising the spatial audio content are compressed. The audio transport signals 411 may be compressed using any suitable technique, for example as shown in the figures.
At block 811 the method of compression for the spatial metadata is selected. The obtained source configuration parameter is used to select the method of compression of the spatial metadata. As in the method described above, selecting the method of compression could comprise selecting a pre-formed codebook which corresponds to the source format of the captured spatial audio.
Once the pre-formed codebook has been retrieved from the memory it may be passed to a spatial metadata encoding module 423 so that at block 813 the codebook can be used to compress the spatial metadata. The method of compressing the spatial metadata could be any method of compression which uses the codebook. For instance the method could comprise Huffman coding or any other suitable process. A quantization process may be applied to the spatial metadata before the spatial metadata is compressed.
At block 815 the compressed spatial audio content and the compressed spatial metadata are encoded together to form an encoded transport signal 313. The combining of the compressed spatial audio content and the compressed spatial metadata could be performed by a datastream generator/combiner module 425 or any other suitable module. In some examples the combining of the compressed spatial audio content and the compressed spatial metadata could also comprise further compression such as run-length encoding or any other lossless encoding.
At block 901 the received encoded transport signal 313 is decoded into a separate transport audio stream and spatial metadata stream. The transport audio stream comprises the audio content and the spatial metadata stream comprises parametric values relating to the spatial properties of the transport audio stream.
At block 903 the spatial audio content from the transport audio stream is decompressed. Any suitable process may be used for the decompression of the spatial audio content. At block 905 a prototype signal 541 is formed. The prototype signal 541 may be formed by a prototype signal generator module 531 as shown in the figures.
At block 907 the source configuration parameter is obtained. In some examples the source configuration parameter could be received with the encoded transport signal 313. For instance the source configuration parameter could be encoded into the spatial metadata stream. In such examples the source configuration parameter could be provided as the first value in the spatial metadata stream or any other defined value in the spatial metadata stream. Providing the source configuration parameter with the spatial metadata stream could allow for updating of the source configuration for different signal frames which can help to increase the efficiency of the compression.
In other examples the source configuration parameter could be received separately from the encoded transport signal 313. It could be provided through a signaling channel separate from that used for the spatial metadata or the spatial audio content. For instance, the source configuration parameter could be provided separately from the bitstream that transmits the audio content and the spatial metadata.
At block 909 the source configuration parameter is used to select a method of decompression for the spatial metadata. Selecting the method of decompression could comprise selecting a codebook based on the source configuration parameter.
At block 911 the selected method of decompression is used to decompress the spatial metadata and provide spatial metadata parameters to the synthesizer. The decompression of the spatial metadata may be an inverse of the process which was used to compress the spatial metadata. For example, decompressing the spatial metadata may comprise reading code values from the spatial metadata stream and retrieving a corresponding parameter value from the selected codebook. In other examples the code values from the spatial metadata stream could be used in an algorithm that provides the corresponding parameter value via computational means. In some examples such algorithms could be used instead of a look-up table, while in other examples they could be used in addition to the look-up tables.
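Reading code values and mapping them back to parameter indices can be sketched as a prefix-code decoder driven by the selected codebook. The codebook contents and function name are illustrative assumptions; because the codes form a prefix code, the first codebook match while scanning the bitstring is always a complete symbol.

```python
def decompress_metadata(bits: str, codebook: dict) -> list:
    """Decode a prefix-coded bitstring back to quantized parameter indices.

    `codebook` maps parameter index -> code string (the encoder's table);
    decoding uses its inverse as a look-up table.
    """
    inverse = {code: idx for idx, code in codebook.items()}
    indices, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # prefix property: first match is a symbol
            indices.append(inverse[buf])
            buf = ""
    return indices
```

With an illustrative table {0: "0", 1: "10", 2: "110", 3: "111"}, the bitstring "0010110" decodes back to the indices [0, 0, 1, 2], the inverse of the corresponding encoding step.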
At block 913 the spatial metadata and the prototype signal 541 are synthesized into spatial audio output signals.
In the example method shown in the figures the source configuration parameter is therefore used both for the compression and for the decompression of the spatial metadata.
Examples of the disclosure therefore provide apparatus, methods and computer programs for efficiently encoding spatial metadata by enabling an appropriate compression method to be used for the spatial metadata. This can be done as a process separate from the encoding of the audio content.
The above described examples find application as enabling components of: automotive systems; telecommunication systems; electronic systems including consumer electronic products; distributed computing systems; media systems for generating or rendering media content including audio, visual and audio visual content and mixed, mediated, virtual and/or augmented reality; personal systems including personal health systems or personal fitness systems; navigation systems; user interfaces also known as human machine interfaces; networks including cellular, non-cellular, and optical networks; ad-hoc networks; the internet; the internet of things; virtualized networks; and related software and services.
The term “comprise” is used in this document with an inclusive not an exclusive meaning. That is any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use “comprise” with an exclusive meaning then it will be made clear in the context by referring to “comprising only one . . . ” or by using “consisting”.
In this description, reference has been made to various examples. The description of features or functions in relation to an example indicates that those features or functions are present in that example. The use of the term “example” or “for example” or “can” or “may” in the text denotes, whether explicitly stated or not, that such features or functions are present in at least the described example, whether described as an example or not, and that they can be, but are not necessarily, present in some of or all other examples. Thus “example”, “for example”, “can” or “may” refers to a particular instance in a class of examples. A property of the instance can be a property of only that instance or a property of the class or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example, can where possible be used in that other example as part of a working combination but does not necessarily have to be used in that other example.
Although embodiments have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the claims.
Features described in the preceding description may be used in combinations other than the combinations explicitly described above.
Features described in relation to different embodiments (for example different methods with different flow charts) may be combined with one another.
Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not.
Although features have been described with reference to certain embodiments, those features may also be present in other embodiments whether described or not.
The term “a” or “the” is used in this document with an inclusive not an exclusive meaning. That is any reference to X comprising a/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use “a” or “the” with an exclusive meaning then it will be made clear in the context. In some circumstances the use of “at least one” or “one or more” may be used to emphasise an inclusive meaning but the absence of these terms should not be taken to imply an exclusive meaning.
The presence of a feature (or combination of features) in a claim is a reference to that feature or (combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features). The equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way. The equivalent features include, for example, features that perform substantially the same function, in substantially the same way to achieve substantially the same result.
In this description, reference has been made to various examples using adjectives or adjectival phrases to describe characteristics of the examples. Such a description of a characteristic in relation to an example indicates that the characteristic is present in some examples exactly as described and is present in other examples substantially as described.
Whilst endeavoring in the foregoing specification to draw attention to those features believed to be of importance it should be understood that the Applicant may seek protection via the claims in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not emphasis has been placed thereon.
The international application was published as WO 2020/089523 A1 on May 7, 2020.
The national stage application was published as US 2022/0115024 A1 in April 2022.