The invention relates to the field of audio signal processing. More specifically, the invention relates to apparatuses and methods for encoding and decoding a multichannel audio signal on the basis of the Karhunen-Loève Transform (KLT).
In the field of multichannel spatial audio coding the two following challenges will likely become more prominent in the future: (i) processing an input audio signal with an arbitrary number of recorded audio channels and (ii) handling a plurality of arbitrarily placed microphones, in particular with respect to angles. One reason for this development is the current trend of providing more and more advanced audio recording devices, such as the Eigenmike. Moreover, another current trend is the use of various conventional recording devices at the same time for producing a multichannel audio signal. Thus, there is a need for a generic audio coding scheme that is able to meet the challenges mentioned above.
Currently, activities in multichannel audio coding for streaming and storage purposes are gaining popularity due to the many possible new applications in the field of immersive sound, such as applications for cinemas, virtual reality, telepresence and the like. Exemplary current multichannel audio codecs are Dolby Atmos, which uses multichannel object-based coding, and MPEG-H 3D Audio, which incorporates channel-based, object-based and Ambisonics-based coding. These existing multichannel codecs, however, are still limited to specific numbers of audio channels, such as 5.1, 7.1 or 22.2 channels, as required by industrial standards, such as ITU-R BS.2159-4.
Thus, there is a need for an improved generic audio coding scheme that allows, in particular, processing audio signals with an arbitrary number of audio channels as well as multichannel audio signals acquired on the basis of arbitrary arrangements of the audio recording devices.
It is an object of embodiments of the invention to provide improved apparatuses and methods for encoding and decoding a multichannel audio signal.
The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect the invention relates to an apparatus for encoding an input audio signal, wherein the input audio signal is a multichannel audio signal, i.e. comprises a plurality of input audio channels. The apparatus comprises a pre-processor based on the Karhunen-Loève transformation (KLT), i.e. a KLT-based pre-processor. The KLT-based pre-processor is configured to transform the plurality of input audio channels into a plurality of eigenchannels (also referred to as transform coefficients) and to provide metadata associated with the plurality of eigenchannels, wherein each eigenchannel is associated with an eigenvalue and an eigenvector and wherein the metadata allows reconstructing the plurality of input audio channels on the basis of the plurality of eigenchannels. The apparatus further comprises a selector configured to select a subset of the plurality of eigenvectors corresponding to a plurality of selected eigenchannels on the basis of a geometric mean of the eigenvalues and an eigenchannel encoder configured to encode the plurality of selected eigenchannels. Moreover, the apparatus may comprise a metadata encoder configured to encode the metadata. The selector can be implemented as part of the KLT-based pre-processor.
In a first implementation form of the apparatus according to the first aspect as such the number P of selected eigenchannels is less than or equal to the number Q of input audio channels.
In a second implementation form of the apparatus according to the first aspect as such or the first implementation form thereof, the metadata comprises one or more of the following: a covariance matrix associated with the plurality of input audio channels and eigenvectors of a covariance matrix associated with the plurality of input audio channels.
In a third implementation form of the apparatus according to the first aspect as such or the first or second implementation form thereof, the selector is configured to select a subset of the plurality of eigenvectors by selecting those eigenvectors whose eigenvalues are greater than the geometric mean of the eigenvalues that are greater than a first threshold value. In an implementation form the first threshold value is zero or approximately zero.
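The selection rule of the third implementation form can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `select_by_geometric_mean` and the default value of `t1` are assumptions, and the geometric mean is computed in the log domain purely as a numerical convenience.

```python
import math

def select_by_geometric_mean(eigenvalues, t1=1e-12):
    """Return the indices of eigenvalues above the geometric mean.

    Only eigenvalues greater than the first threshold t1 contribute to
    the geometric mean (t1 is an assumed, near-zero default).
    """
    significant = [lam for lam in eigenvalues if lam > t1]
    if not significant:
        return []
    # Geometric mean via the mean of logarithms.
    geo_mean = math.exp(sum(math.log(lam) for lam in significant) / len(significant))
    return [i for i, lam in enumerate(eigenvalues) if lam > t1 and lam > geo_mean]

# Hypothetical eigenvalues: the geometric mean of 16, 4 and 0.25 is about 2.52.
selected = select_by_geometric_mean([16.0, 4.0, 0.25, 0.0])
```

In this hypothetical example the two largest eigenvalues exceed the geometric mean, so two of the four eigenvectors would be kept.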
In a fourth implementation form of the apparatus according to the third implementation form of the first aspect, the selector is configured to select a subset of the plurality of eigenvectors by selecting only the eigenvector with the largest eigenvalue if the absolute difference between the geometric mean of the eigenvalues that are greater than the first threshold value and the arithmetic mean of the eigenvalues that are greater than the first threshold value is less than a second threshold value.
In a fifth implementation form of the apparatus according to the fourth implementation form of the first aspect, the input audio signal comprises a plurality of frequency bands and the selector is configured to allow the second threshold value to be different for different frequency bands. I.e., each of the frequency bands can have its own threshold value. In an implementation form each frequency band can be divided into a plurality of frequency bins, wherein the second threshold value can be different for different frequency bins.
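A per-band second threshold could be organized as a simple lookup, as sketched below. The band names, threshold values, default threshold and the function name `single_channel_bands` are all hypothetical; the sketch only shows that identical eigenvalue statistics can lead to different decisions in different bands.

```python
def single_channel_bands(band_means, band_thresholds, default_t2=0.1):
    """Return the bands in which only the largest eigenvalue would be selected.

    band_means maps a band name to (arithmetic mean, geometric mean) of the
    eigenvalues above the first threshold; band_thresholds maps a band name
    to its own second threshold. default_t2 is an assumed fallback value.
    """
    collapsed = []
    for band, (arith_mean, geo_mean) in band_means.items():
        t2 = band_thresholds.get(band, default_t2)
        if abs(arith_mean - geo_mean) < t2:
            collapsed.append(band)
    return collapsed

# Hypothetical numbers: identical means, different per-band thresholds.
means = {"low": (1.75, 1.0), "high": (1.75, 1.0)}
thresholds = {"low": 0.5, "high": 1.0}
result = single_channel_bands(means, thresholds)
```

Here the difference of 0.75 is below the threshold of the "high" band only, so only that band falls back to a single eigenchannel.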
In a sixth implementation form of the apparatus according to the first aspect as such or any one of the first to fifth implementation form thereof, the selector is further configured to normalize the eigenvalues that are greater than the first threshold value on the basis of the smallest eigenvalue that is greater than the first threshold value.
In a seventh implementation form of the apparatus according to the first aspect as such or any one of the first to sixth implementation form thereof, the apparatus further comprises a control unit configured to choose on the basis of a pre-defined bitrate threshold between a first encoding mode and a second encoding mode, wherein in the first encoding mode the input audio signal is encoded by encoding the plurality of selected eigenchannels and the metadata and wherein in the second encoding mode the input audio signal is encoded by encoding the plurality of input audio channels.
In an eighth implementation form of the apparatus according to the seventh implementation form of the first aspect, the control unit is configured to estimate a bitrate associated with encoding the plurality of selected eigenchannels and the metadata and to choose the first encoding mode if the estimated bitrate is less than the pre-defined bitrate threshold.
According to a second aspect the invention relates to an apparatus for decoding an input audio signal, wherein the input audio signal comprises a plurality of encoded eigenchannels and encoded metadata. The apparatus comprises an eigenchannel decoder configured to decode the plurality of encoded eigenchannels, wherein each eigenchannel is associated with an eigenvalue and an eigenvector, a metadata decoder configured to decode the encoded metadata, a selector configured to select a subset of the plurality of eigenvectors on the basis of a geometric mean of the eigenvalues, and a KLT-based post-processor configured to transform the decoded eigenchannels into a plurality of output audio channels on the basis of the selected eigenvectors.
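The post-processing step on the decoder side can be illustrated with a small sketch. It assumes the standard inverse KLT, in which each output channel is a weighted sum of the decoded eigenchannels with weights taken from the selected (orthonormal) eigenvectors; the text does not spell out this formula, and the function name `klt_inverse` and the data layout are hypothetical.

```python
import math

def klt_inverse(eigenchannels, eigenvectors):
    """Rebuild Q output channels from P eigenchannels.

    eigenchannels: P lists of N samples each.
    eigenvectors:  P lists of length Q (assumed orthonormal), one per eigenchannel.
    Returns Q lists of N samples.
    """
    q = len(eigenvectors[0])
    n = len(eigenchannels[0])
    out = [[0.0] * n for _ in range(q)]
    for samples, vec in zip(eigenchannels, eigenvectors):
        for ch in range(q):
            for t in range(n):
                out[ch][t] += vec[ch] * samples[t]
    return out

# Hypothetical two-channel case with a single selected eigenvector.
s = 1.0 / math.sqrt(2.0)
channels = klt_inverse([[math.sqrt(2.0), 2.0 * math.sqrt(2.0)]], [[s, s]])
```

With one eigenvector pointing equally at both channels, both output channels receive the same reconstructed waveform.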
According to a first implementation form of the apparatus according to the second aspect as such, the selector is configured to select a subset of the plurality of eigenvectors by selecting the eigenvectors whose eigenvalues are greater than the geometric mean of the eigenvalues that are greater than a first threshold value.
Further implementation forms of the decoding apparatus according to the second aspect of the invention follow directly from the corresponding implementation forms of the encoding apparatus according to the first aspect of the invention.
According to a third aspect the invention relates to a method for encoding an input audio signal, wherein the input audio signal comprises a plurality of input audio channels. The method comprises the steps of transforming the plurality of input audio channels into a plurality of eigenchannels and providing metadata associated with the plurality of eigenchannels, wherein each eigenchannel is associated with an eigenvalue and an eigenvector and wherein the metadata allows reconstructing the plurality of input audio channels on the basis of the plurality of eigenchannels, selecting a subset of the plurality of eigenchannels on the basis of a geometric mean of the eigenvalues, encoding the plurality of selected eigenchannels, and encoding the metadata.
The encoding method according to the third aspect of the invention can be performed by the encoding apparatus according to the first aspect of the invention. Further features of the encoding method according to the third aspect of the invention result directly from the functionality of the encoding apparatus according to the first aspect of the invention and its different implementation forms.
According to a fourth aspect the invention relates to a method for decoding an input audio signal, wherein the input audio signal comprises a plurality of encoded eigenchannels and encoded metadata. The method comprises the steps of decoding the plurality of encoded eigenchannels, wherein each eigenchannel is associated with an eigenvalue and an eigenvector, decoding the encoded metadata, selecting a subset of the plurality of eigenvectors on the basis of a geometric mean of the eigenvalues, and transforming the decoded eigenchannels into a plurality of output audio channels on the basis of the selected eigenvectors.
The decoding method according to the fourth aspect of the invention can be performed by the decoding apparatus according to the second aspect of the invention. Further features of the decoding method according to the fourth aspect of the invention result directly from the functionality of the decoding apparatus according to the second aspect of the invention and its different implementation forms.
According to a fifth aspect the invention relates to a computer program comprising program code for performing the encoding method according to the third aspect of the invention or the decoding method according to the fourth aspect of the invention when executed on a computer.
The invention can be implemented in hardware and/or software.
Further embodiments of the invention will be described with respect to the following figures, wherein:
In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.
In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the invention may be practiced. It will be appreciated that the invention may be practiced in other aspects and that structural or logical changes may be made without departing from the scope of the invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the invention is defined by the appended claims.
For instance, it will be appreciated that a disclosure in connection with a described method will generally also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.
Moreover, in the following detailed description as well as in the claims, embodiments with functional blocks or processing units are described, which are connected with each other or exchange signals. It will be appreciated that the invention also covers embodiments which include additional functional blocks or processing units that are arranged between the functional blocks or processing units of the embodiments described below.
Finally, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
The apparatus 110 for encoding an input audio signal consisting of Q input audio channels comprises a KLT-based pre-processor 111 configured to transform the Q input audio channels into P eigenchannels and to provide metadata associated with the P eigenchannels, which allows reconstructing the Q input audio channels on the basis of the P eigenchannels. Each eigenchannel is associated with an eigenvalue and an eigenvector. In an embodiment, the metadata can comprise the non-redundant elements of a covariance matrix associated with the Q input audio channels and/or the eigenvectors of the covariance matrix associated with the Q input audio channels.
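One plausible reading of "non-redundant elements" is the upper triangle of the covariance matrix, which for a symmetric Q-by-Q matrix holds Q(Q+1)/2 values. The following sketch assumes real-valued signals (hence a symmetric matrix) and uses the hypothetical helper names `pack_upper` and `unpack_upper`; the source does not prescribe a particular packing order.

```python
def pack_upper(cov):
    """Flatten the upper triangle (including the diagonal) row by row."""
    q = len(cov)
    return [cov[i][j] for i in range(q) for j in range(i, q)]

def unpack_upper(packed, q):
    """Rebuild the full symmetric matrix from its packed upper triangle."""
    cov = [[0.0] * q for _ in range(q)]
    k = 0
    for i in range(q):
        for j in range(i, q):
            cov[i][j] = cov[j][i] = packed[k]
            k += 1
    return cov

# Hypothetical 3-channel covariance matrix: 6 values instead of 9.
cov = [[2.0, 1.0, 0.0],
       [1.0, 3.0, 4.0],
       [0.0, 4.0, 5.0]]
packed = pack_upper(cov)
restored = unpack_upper(packed, 3)
```

For Q = 32 channels this packing would carry 528 values instead of 1024.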
The apparatus 110 further comprises a selector 114, embodiments of which will be described in more detail under reference to
Moreover, the apparatus 110 comprises an eigenchannel encoder 113 configured to encode the P eigenchannels selected by the selector 114 on the basis of a geometric mean of the eigenvalues as well as a metadata encoder 115 configured to encode the metadata provided by the KLT-based pre-processor 111.
As can be taken from
The unit 112 for covariance and subspace estimation provides the Q eigenvectors determined on the basis of the Q input audio channels to the selector 114. As already described above, the selector 114 is configured to select P selected eigenvectors from the Q eigenvectors on the basis of a geometric mean of the eigenvalues. A process for selecting the P eigenvectors on the basis of a geometric mean of the eigenvalues, which in an embodiment is implemented in the selector 114, will be described in the context of
In a step 303 the selector 114, 124 determines the minimum “non-zero” eigenvalue and sets the index m of this eigenvalue as the maximum index (m<=Q) and as the maximum dimension of eigenvalues. In an embodiment, the selector 114, 124 can be configured to determine the minimum “non-zero” eigenvalue by determining the smallest eigenvalue that is greater than or equal to a first positive non-zero threshold value T1.
In a step 305 the selector 114, 124 discards the eigenvalues that have indices larger than m and which therefore are less than the first threshold value T1, i.e. zero or close to zero.
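Steps 303 and 305 can be sketched together. The sketch assumes that the eigenvalues are indexed in descending order, which is what the phrase "indices larger than m and which therefore are less than the first threshold value" suggests; the function name `truncate_eigenvalues` and the example threshold are hypothetical.

```python
def truncate_eigenvalues(eigenvalues, t1):
    """Keep the m eigenvalues that are greater than or equal to t1.

    Returns the retained eigenvalues in descending order together with m,
    the index (1-based) of the smallest "non-zero" eigenvalue.
    """
    ordered = sorted(eigenvalues, reverse=True)
    m = sum(1 for lam in ordered if lam >= t1)
    return ordered[:m], m

# Hypothetical eigenvalues of a 4-channel signal with two negligible entries.
kept, m = truncate_eigenvalues([5.0, 0.0, 2.0, 1e-15], t1=1e-9)
```

In this example two of the four eigenvalues survive, so m = 2.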
In a step 307 the selector 114, 124 can normalize the remaining m eigenvalues on the basis of the smallest remaining eigenvalue λm resulting in m normalized eigenvalues
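The text states only that the normalization in step 307 is based on the smallest remaining eigenvalue. A minimal sketch, assuming that each remaining eigenvalue is divided by that smallest eigenvalue (so that the smallest normalized eigenvalue equals one), could look as follows; the function name is hypothetical.

```python
def normalize_by_smallest(eigenvalues):
    """Divide every eigenvalue by the smallest one (assumed normalization)."""
    smallest = min(eigenvalues)
    if smallest <= 0.0:
        raise ValueError("expects eigenvalues above the first threshold")
    return [lam / smallest for lam in eigenvalues]

# Hypothetical remaining eigenvalues after step 305.
normalized = normalize_by_smallest([8.0, 2.0, 0.5])
```

Under this assumption the normalized values express how many times larger each eigenvalue is than the weakest retained one.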
In a step 309a and a step 309b the selector 114, 124 can determine the arithmetic mean μλ and the geometric mean ηλ of the m normalized eigenvalues, respectively.
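The two means of steps 309a and 309b can be computed with the Python standard library, as in the short sketch below. The input values are hypothetical. By the inequality of arithmetic and geometric means, the arithmetic mean of positive values is never smaller than their geometric mean, and the two coincide only when all values are equal, which is why their difference serves as a measure of how similar the eigenvalues are.

```python
from statistics import fmean, geometric_mean

# Hypothetical normalized eigenvalues.
normalized = [16.0, 4.0, 1.0]

mu = fmean(normalized)            # arithmetic mean
eta = geometric_mean(normalized)  # geometric mean
gap = mu - eta                    # non-negative for positive inputs
```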
In a step 311 the selector 114, 124 checks whether the absolute difference between the arithmetic mean μλ and the geometric mean ηλ of the m normalized eigenvalues is less than a second threshold value T. If this is the case the selector 114, 124 will select one eigenvalue (and the corresponding eigenvector), namely the largest eigenvalue (see steps 313, 321 and 323). This makes sure that in case the eigenvalues are very similar at least one eigenvalue (and the corresponding eigenvector and eigenchannel) is selected by the selector 114, 124.
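The check of step 311 and the single-eigenvalue fallback can be sketched as follows. The function name `step_311`, the return convention (a list with one index, or `None` when the loop would be entered instead) and the threshold value are assumptions made for illustration.

```python
import math

def step_311(normalized, t2):
    """Return [index of largest eigenvalue] if the means are close, else None.

    None signals that the eigenvalues differ significantly and that the
    selection would continue in the loop described next in the text.
    """
    arith = sum(normalized) / len(normalized)
    geo = math.exp(sum(math.log(v) for v in normalized) / len(normalized))
    if abs(arith - geo) < t2:
        largest = max(range(len(normalized)), key=normalized.__getitem__)
        return [largest]
    return None

# Hypothetical cases: nearly equal eigenvalues versus clearly spread ones.
similar = step_311([1.1, 1.05, 1.0], t2=0.01)
spread = step_311([16.0, 4.0, 1.0], t2=0.01)
```

In the first case the means differ by less than 0.001, so only the largest eigenvalue is kept; in the second case they differ by about 3, so the fallback does not apply.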
In case the selector 114, 124 determines in step 311 that the absolute difference between the arithmetic mean μλ and the geometric mean ηλ of the m normalized eigenvalues is not less than the second threshold value T (which implies that the eigenvalues are significantly different), the selector 114, 124 enters the loop consisting of the steps 315, 317 and 319. The loop starts from the largest normalized eigenvalue
In an embodiment, the selection process shown in
In an embodiment, the control unit 119 is configured to choose on the basis of a pre-defined bitrate threshold between the first encoding mode and the second encoding mode. In an embodiment, the control unit 119 is configured to estimate a bitrate associated with encoding the P selected eigenchannels and the metadata and to choose the first encoding mode if the estimated bitrate is less than the pre-defined bitrate threshold.
More specifically, in the embodiment shown in
In an embodiment, current state-of-the-art encoders, which generally support mono or stereo channel input and are known to deliver excellent audio quality, can be used for the eigenchannel encoder 113 and/or the baseline encoder 113′. Moreover, currently available proprietary multichannel audio codecs can be implemented in the eigenchannel encoder 113 and/or the baseline encoder 113′ as well.
For illustrating the control unit 119 of the encoding apparatus 110 shown in
In a first scenario the control unit 119 is configured to select, from the first encoding scheme and the second encoding scheme, the one that provides the best quality while keeping the overall bitrate below the maximum transmission rate. To this end, the control unit 119 first calculates the baseline maximum bitrate per channel: 1.2 Mbps/32 channels=37.5 kbps per channel. Since this bitrate is not supported, the bitrate of 32 kbps per channel is taken, resulting in a baseline maximum bitrate of 32 kbps*32 channels=1.024 Mbps. Based on the output of the KLT-based pre-processor 111, which outputs the number P as well as metadata bitrate estimates, the control unit 119 calculates the corresponding KLT dedicated audio bitrate per channel: X=(1.2 Mbps−Metadata bitrate)/P. Thus, in an embodiment the control unit 119 will choose KLT-based encoding (i.e. node B) if X is greater than or equal to the calculated baseline maximum bitrate per channel, i.e., 32 kbps/channel.
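The arithmetic of the first scenario can be retraced in a few lines. The list of supported per-channel bitrates, the value of P and the metadata bitrate below are hypothetical; only the 1.2 Mbps limit, the 32 channels and the resulting 32 kbps per channel come from the text.

```python
def choose_for_best_quality(max_rate_kbps, num_channels, supported_kbps,
                            p, metadata_kbps):
    """Return (mode, baseline kbps per channel, KLT kbps per channel)."""
    cap = max_rate_kbps / num_channels
    # Highest supported per-channel bitrate that does not exceed the cap.
    baseline = max(rate for rate in supported_kbps if rate <= cap)
    klt = (max_rate_kbps - metadata_kbps) / p
    mode = "klt" if klt >= baseline else "baseline"
    return mode, baseline, klt

# Assumed codec bitrates, P = 8 eigenchannels and 100 kbps of metadata.
decision = choose_for_best_quality(1200, 32, [16, 24, 32, 48, 64],
                                   p=8, metadata_kbps=100)
```

With these assumed numbers each eigenchannel could be coded at 137.5 kbps, well above the 32 kbps available per channel in the baseline mode, so KLT-based encoding would be chosen.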
In a second scenario the control unit 119 is configured to select, from the first encoding scheme and the second encoding scheme, the one that provides the lowest possible bitrate at the quality set by the acceptable baseline quality. Firstly, since the lowest acceptable baseline quality bitrate is 16 kbps per channel, the control unit 119 determines the following bitrate: 16 kbps*32 channels=512 kbps baseline maximum bitrate. Based on the output of the KLT-based pre-processor 111, which outputs the number P and metadata bitrate estimates, the control unit 119 calculates the corresponding overall KLT-based bitrate: X=16 kbps*P+Metadata bitrate. Thus, in an embodiment the control unit 119 will choose KLT-based encoding (i.e. node B) if X is lower than or equal to the calculated baseline maximum bitrate, i.e., 512 kbps.
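The comparison of the second scenario can be sketched in the same way. The values of P and of the metadata bitrate are hypothetical, and the function name is an assumption; the 16 kbps per channel and the 32 channels are taken from the text.

```python
def choose_for_lowest_bitrate(quality_kbps, num_channels, p, metadata_kbps):
    """Return (mode, baseline total kbps, KLT total kbps)."""
    baseline_total = quality_kbps * num_channels
    klt_total = quality_kbps * p + metadata_kbps
    mode = "klt" if klt_total <= baseline_total else "baseline"
    return mode, baseline_total, klt_total

# Assumed P = 8 eigenchannels and 100 kbps of metadata.
few_channels = choose_for_lowest_bitrate(16, 32, p=8, metadata_kbps=100)
# Assumed P = 30: almost no reduction, so the metadata overhead dominates.
many_channels = choose_for_lowest_bitrate(16, 32, p=30, metadata_kbps=100)
```

The second call illustrates that the KLT-based mode only pays off when P is sufficiently smaller than Q to offset the metadata bitrate.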
While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “include”, “have”, “with”, or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprise”. Also, the terms “exemplary”, “for example” and “e.g.” are merely meant as an example, rather than the best or optimal. The terms “coupled” and “connected”, along with derivatives may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless whether they are in direct physical or electrical contact, or they are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.
This application is a continuation of International Application No. PCT/EP2016/065395, filed on Jun. 30, 2016, the disclosure of which is hereby incorporated by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
20070297499 | de Victoria | Dec 2007 | A1 |
20150154971 | Boehm | Jun 2015 | A1 |
20150221313 | Purnhagen | Aug 2015 | A1 |
20160148618 | Huang | May 2016 | A1 |
20160155448 | Purnhagen | Jun 2016 | A1 |
Entry |
---|
Torres-Guijarro et al., “Multichannel Audio Decorrelation for Coding,” Proc. of the 6th Int. Conference on Digital Audio Effects(DAFX-03), London, UK, XP055339531 (2003). |
Valjamae “A feasibility study regarding implementation of holographic audio rendering techniques over broadcast networks,” XP002529548 (Apr. 15, 2003). |
Yang et al., “An Exploration of Karhunen-Loeve Transform for Multichannel Audio Coding,” XP055339543 (2000). |
Yang et al., “High-Fidelity Multichannel Audio Coding With Karhunen-Loève Transform,” IEEE Transactions on Speech and Audio Processing, vol. 11, No. 4, pp. 365-380, Institute of Electrical and Electronics Engineers, New York, New York (Jul. 2003). |
“Frequently asked Questions about Dolby Digital,” Dolby, pp. 1-16 (2000). |
Valin et al., “High-Quality, Low-Delay Music Coding in the Opus Codec,” 135th AES Convention, New York, USA, Audio Engineering Society (Oct. 17-20, 2013). |
Neuendorf et al., “The ISO/MPEG Unified Speech and Audio Coding Standard—Consistent High Quality for all Content Types and at all Bit Rates,” J. Audio Eng. Soc., vol. 61, No. 12, pp. 956-977 (Dec. 2013). |
“Figures” 3GPP TS 26.445 V13.1.0, pp. 1-15, 3rd Generation Partnership Project, Valbonne, France (Mar. 2016). |
“3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description(Release 13),” 3GPP TS 26.445 V13.1.0, pp. 1-655, 3rd Generation Partnership Project, Valbonne, France (Mar. 2016). |
“Multichannel sound technology in home and broadcasting applications,” Report ITU-R BS. 2159-4, BS Series, Broadcasting service(sound), International Telecommunication Union, Geneva, Switzerland (May 2012). |
“Em32 Eigenmike® microphone array release notes (v18.0), Notes for setting up and using the mh acoustics em32 Eigenmike® microphone array,” mh acoustics (Jun. 18, 2014). |
Herre et al., “MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, No. 5, pp. 770-779, Institute of Electrical and Electronics Engineers, New York, New York (Aug. 2015). |
“Dolby® Atmos® Next-Generation Audio for Cinema,” Issue 3, Dolby (2014). |
Number | Date | Country |
---|---|---|
20190147892 A1 | May 2019 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2016/065395 | Jun 2016 | US |
Child | 16229921 | | US |