APPARATUS AND METHOD FOR ENCODING A PLURALITY OF AUDIO OBJECTS OR APPARATUS AND METHOD FOR DECODING USING TWO OR MORE RELEVANT AUDIO OBJECTS

Information

  • Patent Application
  • 20230298602
  • Publication Number
    20230298602
  • Date Filed
    April 06, 2023
  • Date Published
    September 21, 2023
Abstract
Apparatus for encoding a plurality of audio objects, having: an object parameter calculator configured for calculating, for one or more frequency bins of a plurality of frequency bins related to a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, and an output interface for outputting an encoded audio signal having information on the parameter data for the at least two relevant audio objects for the one or more frequency bins.
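The per-bin selection described in the abstract can be illustrated with a minimal sketch. It assumes that signal power is the selection measure and that the parameter data consists of object identifications plus power ratios, as in claims 5 and 6; the function name `select_relevant_objects` and the `powers[object][bin]` layout are hypothetical and not taken from the application.

```python
from typing import List, Tuple


def select_relevant_objects(
    powers: List[List[float]], num_relevant: int = 2
) -> List[Tuple[List[int], List[float]]]:
    """For each frequency bin, pick the strongest objects and their power ratios.

    powers[i][k] is the signal power of object i in frequency bin k
    (hypothetical layout). Returns one (object_ids, ratios) pair per bin.
    """
    num_objects = len(powers)
    # The number of relevant objects is at least two and lower than the total.
    if not 2 <= num_relevant < num_objects:
        raise ValueError("need 2 <= num_relevant < number of objects")

    num_bins = len(powers[0])
    result = []
    for k in range(num_bins):
        # Order the objects by power in this bin, strongest first.
        ranked = sorted(range(num_objects), key=lambda i: powers[i][k], reverse=True)
        ids = ranked[:num_relevant]
        total = sum(powers[i][k] for i in ids)
        if total > 0.0:
            # Ratio of each relevant object's power to the sum over relevant objects.
            ratios = [powers[i][k] / total for i in ids]
        else:
            # Silent bin: fall back to equal ratios (an assumption of this sketch).
            ratios = [1.0 / num_relevant] * num_relevant
        result.append((ids, ratios))
    return result


# Three objects, two frequency bins.
example_powers = [
    [4.0, 0.5],  # object 0
    [1.0, 3.0],  # object 1
    [0.0, 1.0],  # object 2
]
for bin_index, (ids, ratios) in enumerate(select_relevant_objects(example_powers)):
    print(bin_index, ids, ratios)
```

In this example, bin 0 selects objects 0 and 1 while bin 1 selects objects 1 and 2, so the group of relevant objects differs between bins. Because the ratios of one bin sum to one, the last ratio could be omitted from the bitstream and recomputed at the decoder.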
Claims
  • 1. An apparatus for encoding a plurality of audio objects, comprising: an object parameter calculator configured for calculating, for one or more frequency bins of a plurality of frequency bins related to a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the object parameter calculator is configured for performing a selection of the number of the at least two relevant audio objects and for not indicating the total number of audio objects as being relevant, and an output interface for outputting an encoded audio signal comprising information on the parameter data for the at least two relevant audio objects.
  • 2. The apparatus of claim 1, wherein the object parameter calculator is configured to convert each audio object of the plurality of audio objects into a spectral representation comprising the plurality of frequency bins, to calculate a selection information from each audio object for the one or more frequency bins, and to derive object identifications as the parameter data indicating the at least two relevant audio objects, based on the selection information, and wherein the output interface is configured to introduce information on the object identifications into the encoded audio signal.
  • 3. The apparatus of claim 1, wherein the object parameter calculator is configured to quantize and encode one or more amplitude related measures or one or more combined values derived from the amplitude related measures of the relevant audio objects in the one or more frequency bins as the parameter data, and wherein the output interface is configured to introduce the quantized one or more amplitude related measures or the quantized one or more combined values into the encoded audio signal.
  • 4. The apparatus of claim 2, wherein the selection information is an amplitude-related measure such as an amplitude value, a power value or a loudness value or an amplitude raised to a power being different from one for the audio object, and wherein the object parameter calculator is configured to calculate a combined value such as a ratio from an amplitude related measure of a relevant audio object and a sum of two or more amplitude related measures of the relevant audio objects, and wherein the output interface is configured to introduce an information on the combined value into the encoded audio signal, wherein a number of information items on the combined values in the encoded audio signal is equal to at least one and is lower than the number of relevant audio objects for the one or more frequency bins.
  • 5. The apparatus of claim 2, wherein the object parameter calculator is configured to select the object identifications based on an order of the selection information of the plurality of audio objects in the one or more frequency bins.
  • 6. The apparatus of claim 2, wherein the object parameter calculator is configured to calculate a signal power as the selection information, to derive the object identifications for the two or more audio objects comprising the greatest signal power values in the corresponding one or more frequency bins for each frequency bin separately, to calculate a power ratio between the sum of the signal powers of the two or more audio objects comprising the greatest signal power values and the signal power of each of the audio objects comprising the derived object identifications as the parameter data, and to quantize and encode the power ratio, and wherein the output interface is configured to introduce the quantized and encoded power ratio into the encoded audio signal.
  • 7. The apparatus of claim 1, wherein the output interface is configured to introduce, into the encoded audio signal, one or more encoded transport channels, as the parameter data, two or more encoded object identifications for the relevant audio objects for each one of the one or more frequency bins of the plurality of frequency bins in the time frame, and one or more encoded combined values or encoded amplitude related measures, and quantized and encoded direction data for each audio object in the time frame, the direction data being constant for all frequency bins of the one or more frequency bins.
  • 8. The apparatus of claim 1, wherein the object parameter calculator is configured to calculate the parameter data for at least the most dominant object and the second most dominant object in the one or more frequency bins, wherein the most dominant object and the second most dominant object represent the relevant objects, or wherein a number of audio objects of the plurality of audio objects is three or more, the plurality of audio objects comprising a first audio object, a second audio object and a third audio object, and wherein the object parameter calculator is configured to calculate for a first one of the one or more frequency bins, as the relevant audio objects, only a first group of audio objects such as the first audio object and the second audio object, and to calculate, as the relevant audio objects for a second frequency bin of the one or more frequency bins, only a second group of audio objects, such as the second audio object and the third audio object or the first audio object and the third audio object, wherein the first group of audio objects is different from the second group of audio objects at least with respect to one group member.
  • 9. The apparatus of claim 1, wherein the object parameter calculator is configured to calculate raw parametric data with a first time or frequency resolution and to combine the raw parametric data into combined parametric data comprising a second time or frequency resolution being lower than the first time or frequency resolution, and to calculate the parameter data for the at least two relevant audio objects with respect to the combined parametric data comprising the second time or frequency resolution, or to determine parameter bands comprising a second time or frequency resolution being different from a first time or frequency resolution used in a time or frequency decomposition of the plurality of audio objects, and to calculate the parameter data for the at least two relevant audio objects for the parameter bands comprising the second time or frequency resolution.
  • 10. The apparatus of claim 1, wherein the plurality of audio objects comprise related metadata indicating direction information on the plurality of audio objects, and wherein the apparatus further comprises: a downmixer for downmixing the plurality of audio objects to acquire one or more transport channels, wherein the downmixer is configured to downmix the plurality of audio objects in response to the direction information on the plurality of audio objects; and a transport channel encoder for encoding one or more transport channels to acquire one or more encoded transport channels; and wherein the output interface is configured to introduce the one or more transport channels into the encoded audio signal.
  • 11. The apparatus of claim 10, wherein the downmixer is configured to generate two transport channels as two virtual microphone signals arranged at the same position and comprising different orientations or at two different positions with respect to a reference position or orientation such as a virtual listener position or orientation, or to generate three transport channels as three virtual microphone signals arranged at the same position and comprising different orientations or at three different positions with respect to a reference position or orientation such as a virtual listener position or orientation, or to generate four transport channels as four virtual microphone signals arranged at the same position and comprising different orientations or at four different positions with respect to a reference position or orientation such as a virtual listener position or orientation, or wherein the virtual microphone signals are virtual first order microphone signals, or virtual cardioid microphone signals, or virtual figure of 8 or dipole or bidirectional microphone signals, or virtual directional microphone signals, or virtual subcardioid microphone signals, or virtual unidirectional microphone signals, or virtual hypercardioid microphone signals, or virtual omnidirectional microphone signals.
  • 12. The apparatus of claim 10, wherein the downmixer is configured to derive, for each audio object of the plurality of audio objects, a weighting information for each transport channel using the direction information for the corresponding audio object; to weight the corresponding audio object using the weighting information for the audio object for a specific transport channel to acquire an object contribution for the specific transport channel, and to combine the object contributions for the specific transport channel from the plurality of audio objects to acquire the specific transport channel.
  • 13. The apparatus of claim 10, wherein the downmixer is configured to calculate the one or more transport channels as one or more virtual microphone signals arranged at the same position and comprising different orientations or at different positions with respect to a reference position or orientation such as a virtual listener position or orientation, to which the direction information is related, wherein the different positions or orientations are on or to a left side of a center line and on or to a right side of the center line, or wherein the different positions or orientations are equally or non-equally distributed to horizontal positions or orientations such as +90 degrees or -90 degrees with respect to the center line or -120 degrees, 0 degrees and +120 degrees with respect to the center line, or wherein the different positions or orientations comprise at least one position or orientation being directed upwards or downwards with respect to a horizontal plane in which a virtual listener is placed, wherein the direction information on the plurality of audio objects is related to the virtual listener position or reference position or orientation.
  • 14. The apparatus in accordance with claim 10, further comprising: a parameter processor for quantizing the metadata indicating the direction information on the plurality of audio objects to acquire quantized direction items for the plurality of audio objects, wherein the downmixer is configured to operate in response to the quantized direction items as the direction information, and wherein the output interface is configured to introduce information on the quantized direction items into the encoded audio signal.
  • 15. The apparatus of claim 10, wherein the downmixer is configured to perform an analysis of the direction information on the plurality of audio objects and to place one or more virtual microphones for the generation of the transport channels depending on a result of the analysis.
  • 16. The apparatus of claim 10, wherein the downmixer is configured to downmix using a downmixing rule being static over the plurality of time frames, or wherein the direction information is variable over a plurality of time frames, and wherein the downmixer is configured to downmix using a downmixing rule being variable over the plurality of time frames.
  • 17. The apparatus of claim 10, wherein the downmixer is configured to downmix in a time domain using a sample-by-sample weighting and combining of samples of the plurality of audio objects.
  • 18. A decoder for decoding an encoded audio signal comprising one or more transport channels and direction information for a plurality of audio objects, and, for one or more frequency bins of a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the number of the at least two relevant audio objects is a selection from the total number of audio objects, wherein the total number of audio objects is not indicated as being relevant, the decoder comprising: an input interface for providing the one or more transport channels in a spectral representation comprising, in the time frame, the plurality of frequency bins; and an audio renderer for rendering the one or more transport channels into a number of audio channels using the direction information, so that a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects is accounted for, or wherein the audio renderer is configured to calculate, for each one of the one or more frequency bins, a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects.
  • 19. The decoder of claim 18, wherein the audio renderer is configured to ignore, for the one or more frequency bins, a direction information of an audio object different from the at least two relevant audio objects.
  • 20. The decoder of claim 18, wherein the encoded audio signal comprises an amplitude related measure for each relevant audio object or a combined value related to at least two relevant audio objects in the parameter data, and wherein the audio renderer is configured to determine a quantitative contribution of the one or more transport channels in accordance with the amplitude-related measure or the combined value.
  • 21. The decoder of claim 20, wherein the encoded signal comprises the combined value in the parameter data, and wherein the audio renderer is configured to determine the contribution of the one or more transport channels using the combined value for one of the relevant audio objects and the direction information for the one relevant audio object, and wherein the audio renderer is configured to determine the contribution for the one or more transport channels using a value derived from the combined value for another of the relevant audio objects in the one or more frequency bins and the direction information of the other relevant audio object.
  • 22. The decoder of claim 18, wherein the audio renderer is configured to calculate a direct response information from the relevant audio objects per each frequency bin of the plurality of frequency bins and the direction information associated with the relevant audio objects in the frequency bins.
  • 23. The decoder of claim 22, wherein the audio renderer is configured to determine a diffuse signal per each frequency bin of the plurality of frequency bins using a diffuseness information such as a diffuseness parameter included in the metadata or a decorrelation rule and to combine a direct response as determined by the direct response information and the diffuse signal to acquire a spectral domain rendered signal for a channel of the number of channels, or to calculate a synthesis information using the direct response information and an information on the number of audio channels, and to apply the covariance synthesis information to the one or more transport channels to acquire the number of audio channels, or wherein the direct response information is a direct response vector for each relevant audio object, and wherein the covariance synthesis information is a covariance synthesis matrix, and wherein the audio renderer is configured to perform a matrix operation per frequency bin in applying the covariance synthesis information.
  • 24. The decoder of claim 22, wherein the audio renderer is configured to derive, in the calculation of the direct response information, a direct response vector for each relevant audio object and to calculate, for each relevant audio object, a covariance matrix from each direct response vector, to derive, in the calculation of the covariance synthesis information, a target covariance information from the covariance matrices from each one of the relevant audio objects, a power information on the respective relevant audio object, and a power information derived from the one or more transport channels.
  • 25. The decoder of claim 24, wherein the audio renderer is configured to derive, in the calculation of the direct response information, a direct response vector for each relevant audio object and to calculate, for each relevant audio object, a covariance matrix from each direct response vector, to derive an input covariance information from the transport channels, and to derive a mixing information from the target covariance information, the input covariance information and the information on the number of channels, and to apply the mixing information to the transport channels for each frequency bin in the time frame.
  • 26. The decoder of claim 25, wherein a result of the application of the mixing information for each frequency bin in the time frame is converted into a time domain to acquire the number of audio channels in the time domain.
  • 27. The decoder of claim 22, wherein the audio renderer is configured to only use main diagonal elements of an input covariance matrix derived from the transport channels in a decomposition of the input covariance matrix, or to perform a decomposition of a target covariance matrix using a direct response matrix and a matrix of powers of the objects or transport channels, or to perform a decomposition of the input covariance matrix by taking the root of each main diagonal element of the input covariance matrix, or to calculate a regularized inverse of a decomposed input covariance matrix, or to perform a singular value decomposition in calculating an optimum matrix to be used in an energy compensation without an extended identity matrix.
  • 28. A method of encoding a plurality of audio objects, comprising: calculating, for one or more frequency bins of a plurality of frequency bins related to a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the calculating comprises performing a selection of the number of the at least two relevant audio objects and not indicating the total number of audio objects as being relevant, and outputting an encoded audio signal comprising information on the parameter data for the at least two relevant audio objects.
  • 29. A method of decoding an encoded audio signal comprising one or more transport channels and direction information for a plurality of audio objects, and, for one or more frequency bins of a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the number of the at least two relevant audio objects is a selection from the total number of audio objects, wherein the total number of audio objects is not indicated as being relevant, the method of decoding comprising: providing the one or more transport channels in a spectral representation comprising, in the time frame, the plurality of frequency bins; and audio rendering the one or more transport channels into a number of audio channels using the direction information, wherein the audio rendering comprises calculating, for each one of the one or more frequency bins, a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects, or so that a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects is accounted for.
  • 30. A non-transitory digital storage medium having stored thereon a computer program for performing a method of encoding a plurality of audio objects, comprising: calculating, for one or more frequency bins of a plurality of frequency bins related to a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the calculating comprises performing a selection of the number of the at least two relevant audio objects and not indicating the total number of audio objects as being relevant, and outputting an encoded audio signal comprising information on the parameter data for the at least two relevant audio objects, when said computer program is run by a computer.
  • 31. A non-transitory digital storage medium having stored thereon a computer program for performing a method of decoding an encoded audio signal comprising one or more transport channels and direction information for a plurality of audio objects, and, for one or more frequency bins of a time frame, parameter data for at least two relevant audio objects, wherein a number of the at least two relevant audio objects is lower than a total number of the plurality of audio objects, wherein the number of the at least two relevant audio objects is a selection from the total number of audio objects, wherein the total number of audio objects is not indicated as being relevant, the method of decoding comprising: providing the one or more transport channels in a spectral representation comprising, in the time frame, the plurality of frequency bins; and audio rendering the one or more transport channels into a number of audio channels using the direction information, wherein the audio rendering comprises calculating, for each one of the one or more frequency bins, a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects, or so that a contribution from the one or more transport channels in accordance with a first direction information associated with a first one of the at least two relevant audio objects and in accordance with a second direction information associated with a second one of the at least two relevant audio objects is accounted for, when said computer program is run by a computer.
  • 32. An encoded audio signal comprising information on the parameter data for at least two relevant audio objects for one or more frequency bins of a plurality of frequency bins related to a time frame, wherein a number of the at least two relevant audio objects is a selection from a total number of audio objects, wherein the total number of audio objects is not indicated as being relevant.
  • 33. The encoded audio signal of claim 32, further comprising: one or more encoded transport channels, as the information on the parameter data, two or more encoded object identifications for the relevant audio objects for each one of the one or more frequency bins of the plurality of frequency bins in a time frame, and one or more encoded combined values or encoded amplitude related measures, and quantized and encoded direction data for each audio object in the time frame, the direction data being constant for all frequency bins of the one or more frequency bins.
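The direction-dependent downmix of claims 11 to 13 and 17 can be sketched as follows. This is only an illustration under stated assumptions: directions are reduced to an azimuth in degrees, the two transport channels are virtual cardioid microphones pointing to +90 and -90 degrees, and the names `cardioid_gain` and `downmix` are hypothetical rather than taken from the application.

```python
import math
from typing import List, Sequence


def cardioid_gain(object_azimuth_deg: float, mic_azimuth_deg: float) -> float:
    """Weight of an object for one virtual cardioid microphone (assumed pattern)."""
    delta = math.radians(object_azimuth_deg - mic_azimuth_deg)
    # Cardioid: gain 1 on axis, 0.5 at 90 degrees off axis, 0 from behind.
    return 0.5 + 0.5 * math.cos(delta)


def downmix(
    objects: Sequence[Sequence[float]],
    azimuths_deg: Sequence[float],
    mic_azimuths_deg: Sequence[float] = (90.0, -90.0),
) -> List[List[float]]:
    """Time-domain downmix of audio objects into one channel per virtual microphone.

    objects[i][n] is sample n of object i; azimuths_deg[i] is its direction.
    """
    num_samples = len(objects[0])
    channels = []
    for mic_azimuth in mic_azimuths_deg:
        # Derive one weight per object from its direction information.
        weights = [cardioid_gain(az, mic_azimuth) for az in azimuths_deg]
        # Weight each object and sum the contributions sample by sample.
        channel = [
            sum(w * obj[n] for w, obj in zip(weights, objects))
            for n in range(num_samples)
        ]
        channels.append(channel)
    return channels


# Two objects: one fully at +90 degrees, one fully at -90 degrees.
left, right = downmix([[1.0, 2.0], [1.0, 1.0]], [90.0, -90.0])
print(left, right)
```

With these inputs each object lands almost entirely in the channel whose virtual microphone faces it. If the direction information changes from frame to frame, the weights would be recomputed per frame, which corresponds to the variable downmixing rule of claim 16.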
Priority Claims (3)
Number      Date      Country  Kind
20201633.3  Oct 2020  EP       regional
20215651.9  Dec 2020  EP       regional
21184367.7  Jul 2021  EP       regional
Continuations (1)
Relation  Number             Date      Country
Parent    PCT/EP2021/078217  Oct 2021  WO
Child     18296523                     US