The present invention is related to audio processing and in particular to a concept for decoding an encoded audio signal using reduced computational resources.
The “Unified speech and audio coding” (USAC) standard [1], standardizes a harmonic bandwidth extension tool, HBE, employing a harmonic transposer, and which is an extension of the spectral band replication (SBR) system, standardized in [1] and [2], respectively.
SBR synthesizes high frequency content of bandwidth limited audio signals by using the given low frequency part together with given side information. The SBR tool is described in [2], enhanced SBR, eSBR, is described in [1]. The harmonic bandwidth extension HBE which employs phase vocoders is part of eSBR and has been developed to avoid the auditory roughness which is often observed in signals subjected to copy-up patching, as it is carried out in the regular SBR processing. The main scope of HBE is to preserve harmonic structures in the synthesized high frequency region of the given audio signal while applying eSBR.
Whereas an encoder can select the usage of the HBE tool, a decoder which is conform to [1] shall provide decoding and applying HBE related data.
Listening tests [3] have shown that using HBE will improve perceptual audio quality of decoded bitstreams according to [1].
The HBE tool replaces the simple copy-up patching of the legacy SBR system by advanced signal processing routines. These necessitate a considerable amount of processing power and memory for filter states and delay lines. On the contrary the complexity of the copy-up patching is negligible.
The observed complexity increase with HBE is not a problem for personal computer devices. However, chip manufactures designing decoder chips are demanding rigid and low complexity constraints regarding computational workload and memory consumption. Otherwise, HBE processing is desired in order to avoid auditory roughness.
USAC-bitstreams are decoded as described in [1]. This implies necessarily the implementation of a HBE decoder tool, as described in [1], 7.5.3. The tool can be signaled in all codec operating points which contain eSBR processing. For decoder devices which fulfill profile and conformance criteria of [1] this means that the overall worst case of computational workload and memory consumption increases significantly.
The actual increase in computational complexity is implementation and platform dependent. The increase in memory consumption per audio channel is, in the current memory optimized implementation, at least 15 kWords for the actual HBE processing.
According to an embodiment, an apparatus for decoding an encoded audio signal having bandwidth extension control data indicating either a first harmonic bandwidth extension mode or a second non-harmonic bandwidth extension mode may have: an input interface for receiving the encoded audio signal having the bandwidth extension control data indicating either the first harmonic bandwidth extension mode or the second non-harmonic bandwidth extension mode; a processor for decoding the audio signal using the second non-harmonic bandwidth extension mode; and a controller for controlling the processor to decode the audio signal using the second non-harmonic bandwidth extension mode, even when the bandwidth extension control data indicates the first harmonic bandwidth extension mode for the encoded signal.
According to an embodiment, a method of decoding an encoded audio signal having bandwidth extension control data indicating either a first harmonic bandwidth extension mode or a second non-harmonic bandwidth extension mode may have the steps of: receiving the encoded audio signal having the bandwidth extension control data indicating either the first harmonic bandwidth extension mode or the second non-harmonic bandwidth extension mode; decoding the audio signal using the second non-harmonic bandwidth extension mode; controlling the decoding of the audio signal so that the second non-harmonic bandwidth extension mode is used in the decoding, even when the bandwidth extension control data indicates the first harmonic bandwidth extension mode for the encoded signal.
An embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method of decoding an encoded audio signal having bandwidth extension control data indicating either a first harmonic bandwidth extension mode or a second non-harmonic bandwidth extension mode, having the steps of: receiving the encoded audio signal having the bandwidth extension control data indicating either the first harmonic bandwidth extension mode or the second non-harmonic bandwidth extension mode; decoding the audio signal using the second non-harmonic bandwidth extension mode; and controlling the decoding of the audio signal so that the second non-harmonic bandwidth extension mode is used in the decoding, even when the bandwidth extension control data indicates the first harmonic bandwidth extension mode for the encoded signal, when said computer program is run by a computer.
The present invention is based on the finding that an audio decoding concept necessitating reduced memory resources is achieved when an audio signal consisting of portions to be decoded using an harmonic bandwidth extension mode and additionally containing portions to be decoded using a non-harmonic bandwidth extension mode is decoded, throughout the whole signal, with the non-harmonic bandwidth extension mode only. In other words, even when a signal comprises portions or frames which are signaled to be decoded using a harmonic bandwidth extension mode, these portions or frames are nevertheless decoded using the non-harmonic bandwidth extension mode. To this end, a processor for decoding the audio signal using the non-harmonic bandwidth extension mode is provided and additionally a controller is implemented within the apparatus or a controlling step is implemented within a method for decoding for controlling the processor to decode the audio signal using the second non-harmonic bandwidth extension mode even when the bandwidth extension control data included in the encoded audio signal indicates the first—i.e. harmonic—bandwidth extension mode for the audio signal. Thus, the processor only has to be implemented with corresponding hardware resources such as memory and processing power to only cope with the computationally very efficient non-harmonic bandwidth extension mode. On the other hand, the audio decoder is nevertheless in the position to accept and decode an encoded audio signal necessitating a harmonic bandwidth extension mode with an acceptable quality. Stated differently, for low computational resource demanding applications, the controller is configured for controlling the processor to decode the whole audio signal with the non-harmonic bandwidth extension mode, even though the encoded audio signal itself necessitates, due to the included bandwidth extension control data, that at least several portions of this signal are decoded using the harmonic bandwidth extension mode. Thus, a good compromise between computational resources on the one hand and audio quality on the other hand is obtained, while the full backward compatibility is maintained to encoded audio signals necessitating both bandwidth extension modes. The present invention is advantageous due to the fact that it lowers the computational complexity and memory demand of particularly a USAC decoder. Furthermore, in embodiments, the predetermined or standardized non-harmonic bandwidth extension mode is modified using harmonic bandwidth extension mode data transmitted in the bitstream in order to reuse bandwidth extension mode data which are basically not necessary for the non-harmonic bandwidth extension mode as far as possible in order to even improve the audio quality of the non-harmonic bandwidth extension mode. Thus, an alternative decoding scheme is provided in this embodiment, in order to mitigate the impairment of perceptual quality caused by omitting the harmonic bandwidth extension mode which is typically based on phase-vocoder processing as discussed in the USAC standard [1].
In an embodiment, the processor has memory and processing resources being sufficient for decoding the encoded audio signal using the second non-harmonic bandwidth extension mode, wherein the memory or processing resources are not sufficient for decoding the encoded audio signal using the first harmonic bandwidth extension mode, when the encoded audio signal is an encoded stereo or multichannel audio signal. Contrary thereto the processor has memory and processing resources being sufficient for decoding the encoded audio signal using the second non-harmonic bandwidth extension mode and using the first harmonic bandwidth extension mode, when the encoded audio signal is an encoded mono signal, since the resources for mono decoding are reduced compared to the resources for stereo or multichannel decoding. Hence, the available resources depend on the bit-stream configuration, i.e. combination of tools, sampling rate etc. For example it may be possible that resources are sufficient to decode a mono bit-stream using harmonic BWE but the processor lacks resources to decode a stereo bit-stream using harmonic BWE.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
In particular, the additional line in the high level syntax indicated at 600, 700, 702, 704 indicates that irrespective of the value sbrPatchingMode as read from the bitstream in 602, the sbrPatchingMode flag is nevertheless set to one, i.e. signaling, to the further process in the decoder, that a non-harmonic bandwidth extension mode is to be performed. Importantly, the syntax line 600 is placed subsequent to the decoder-side reading in of the specific harmonic bandwidth extension data consisting of sbrOversampllingFlag, sbrPitchInBinsFlag and sbrPitchInBins indicated at 604. Thus, as illustrated in
As indicated in
The additional harmonic bandwidth extension data is, in the USAC example, as discussed in the context of
Specifically, as indicated in the USAC standard, the data sbrPitchInBins controls the addition of cross-product terms in the SBR harmonic transposer. sbrPitchInBins is an integer value in the range between 0 and 127 and represents the distance measured in frequency bins for a 1536-DFT acting on the sampling frequency of the core coder. In particular, it has been found that using the sbrPitchInBins information, the pitch or harmonic grid can be determined. This is illustrated in the formula (1) in
Naturally, other indications of the harmonic grid, the pitch or the fundamental tone defining the harmonic grid can be included in the bitstream. This data is used for controlling the first harmonic bandwidth extension mode and can, in one embodiment of the present invention, be discarded so that the non-harmonic bandwidth extension mode without any modifications is performed. In other embodiments, however, the straightforward non-harmonic bandwidth extension mode is modified using the control data for the harmonic bandwidth extension mode as illustrated in
In the further embodiment, the additional payload data 304 for the first harmonic bandwidth extension mode comprises information on a harmonic characteristic of the encoded audio signal, and this harmonic characteristic can be sbrPitchInBins data, other harmonic grid data, fundamental tone data or any other data, from which a harmonic grid or a fundamental tone or a pitch of the corresponding portion of the encoded audio signal can be derived. The controller 104 is configured for modifying a patching buffer content of a patching buffer used by the processor 102 to perform a patching operation in decoding the encoded audio signal so that a harmonic characteristic of a patch signal is closer to the harmonic characteristic than a signal patched without modifying the patching buffer.
To this end, reference is made to
In step 314, the patching source band is modified within the frequency borders, i.e. the patch borders of the patch source are not changed compared to the transmitted data. This can be done either before patching, i.e. when the patch data is with respect to the core or decoded spectrum before patching indicated at 902 or when the patch content has already been transposed into the higher frequency range, i.e. as illustrated in
This patching 914 or “copy-up”, is a non-harmonic patching which can be seen in
The modification is performed in such a way that a frequency portion in the patching source band coinciding with the harmonic grid is located, after patching, in a target frequency portion coinciding with the harmonic grid.
Preferably, as illustrated in
To this end, reference is made to
As already discussed in the context of
The input interface 100 of
In an embodiment of the present invention, and as illustrated by the syntax changes in
Thus, the present invention provides a lower computational complexity and lower memory consumption necessitating processor together with a new decoding procedure. The bitstream syntax of eSBR as defined in [1] shares a common base for both HBE [1] and legacy SBR decoding [2]. In case of HBE, however, additional information is encoded into the bitstream. The “low complexity HBE” decoder in an embodiment of the present invention decodes the USAC encoded data according to [1] and discards all HBE specific information. Remaining eSBR data is then fed to and interpreted by the legacy SBR [2] algorithm, i.e. the data is used to apply copy-up patching [2] instead of harmonic transposition. The modification of the eSBR decoding mechanics is, with respect to the syntax changes, illustrated in
With legacy USAC encoded bitstream data the sbrPitchInBins value might be transmitted within a USAC frame. This value reflects a frequency value which was determined by an encoder to transmit information describing the harmonic structure of the current USAC frame. In order to exploit this value without using the standard HBE functionality, the following inventive method should be applied step by step:
The flowchart in
Subsequently,
To summarize, as illustrated in
In the QMF buffer shift, the harmonic structure of the signal is reconstructed according to the transmitted sbrPitchInBins information even though a non-harmonic bandwidth extension procedure has been performed.
Although some aspects have been described in the context of an apparatus for encoding or decoding, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a Hard Disk Drive (HDD), a DVD, a Blu-Ray, a CD, a ROM, a PROM, and EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the invention method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver .
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
[1] ISO/IEC 23003-3:2012: “Unified speech and audio coding”
[2] ISO/IEC 14496-3:2009: “Audio”
[3] ISO/IEC JTCI/SC29/WG11 MPEG2011/N12232: “USAC Verification Test Report”
Number | Date | Country | Kind |
---|---|---|---|
13196305 | Dec 2013 | EP | regional |
This application is a continuation of U. S. patent application Ser. No. 15/177,265, filed Jun. 8, 2016, which is a continuation of International Application No. PCT/EP2014/076000, filed Nov. 28, 2014, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 13196305.0, filed Dec. 9, 2013, which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
20020143527 | Gao et al. | Oct 2002 | A1 |
20110216918 | Nagel et al. | Sep 2011 | A1 |
20120010880 | Nagel | Jan 2012 | A1 |
20120158409 | Nagel et al. | Jun 2012 | A1 |
20130090934 | Nagel et al. | Apr 2013 | A1 |
Number | Date | Country |
---|---|---|
2169670 | Mar 2010 | EP |
2011520146 | Jul 2011 | JP |
2011109670 | Sep 2012 | RU |
Entry |
---|
Song J et al.; Enhanced Long-Term Predictor for Unified Speech and Audio Coding; 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Year 2011, pp. 504-508. |
“Study on ISO/IEC 23003-3:201x/DIS of Unified Speech and Audio Coding”, ISO/IEC JTC1/SC29/WG11 N12013; Coding of Moving Pictures and Audio; International Organisation for Standardisation; Geneva, Switzerland, Mar. 2011, 274 pages. |
ISO/IEC FDIS 23003-3:2011(E); “Information technology—MPEG audio technologies—Part 3: Unified speech and audio coding”, ISO/IEC JTC 1/SC 29/WG 11; STD Version 2.1c2, Sep. 20, 2011, 291 pages. |
“USAC Verification Test Report”, ISO/IEC JTC1/SC29/WG11 MPEG2011/N12232; Coding of Moving Pictures and Audio; Torino, Italy, Jul. 2011, pp. 1-41. |
ISO/IEC 14496-3:2009(E), “Information technology—Coding of audio-visual objects, Part 3 Audio”, International Standard, Forth edition, 2009, 1416 pages. |
Liu, et al., “Blind bandwidth extension of audio signals based on harmonic mapping in phase space”, IEEE, Feb. 2, 2013, pp. 454-458. |
Nagel, F. et al., “A Continuous Modulated Single Sideband Bandwidth Extension”, IEEE Intl Conf. on Acoustics, Speech and Signal Processing, Mar. 2010, pp. 257-360. |
Number | Date | Country | |
---|---|---|---|
20170278522 A1 | Sep 2017 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15177265 | Jun 2016 | US |
Child | 15621938 | US | |
Parent | PCT/EP2014/076000 | Nov 2014 | US |
Child | 15177265 | US |