The present invention is concerned with an audio codec supporting time-domain and frequency-domain coding modes.
Recently, the MPEG USAC codec has been finalized. USAC (Unified Speech and Audio Coding) is a codec which codes audio signals using a mix of AAC (Advanced Audio Coding), TCX (Transform Coded Excitation) and ACELP (Algebraic Code-Excited Linear Prediction). In particular, MPEG USAC uses a frame length of 1024 samples and allows switching between AAC-like frames of 1024 or 8×128 samples, TCX 1024 frames, or, within one frame, a combination of ACELP frames (256 samples), TCX 256 and TCX 512 frames.
Disadvantageously, the MPEG USAC codec is not suitable for applications necessitating low delay, such as two-way communication applications. Owing to the USAC frame length of 1024 samples, USAC is not a candidate for these low-delay applications.
In WO 2011147950, it has been proposed to render the USAC approach suitable for low-delay applications by restricting the coding modes of the USAC codec to TCX and ACELP modes only. Further, it has been proposed to make the frame structure finer so as to meet the low-delay requirement imposed by low-delay applications.
However, there is still a need for providing an audio codec enabling low coding delay at an increased efficiency in terms of rate/distortion ratio. Advantageously, the codec should be able to efficiently handle audio signals of different types such as speech and music.
Thus, it is an objective of the present invention to provide an audio codec offering low delay for low-delay applications, but at an increased coding efficiency in terms of, for example, rate/distortion ratio compared to USAC.
According to an embodiment, an audio decoder may have: a time-domain decoder; a frequency-domain decoder; and an associator configured to associate each of consecutive frames of a data stream, each of which represents a corresponding one of consecutive portions of an audio signal, with one out of a mode dependent set of a plurality of frame coding modes, wherein the time-domain decoder is configured to decode frames having one of a first subset of one or more of the plurality of frame coding modes associated therewith, and the frequency-domain decoder is configured to decode frames having one of a second subset of one or more of the plurality of frame coding modes associated therewith, the first and second subsets being disjoint to each other, and wherein the associator is configured to perform the association dependent on a frame mode syntax element associated with the frames in the data stream, and operate in an active one of a plurality of operating modes with selecting the active operating mode out of the plurality of operating modes depending on the data stream and/or an external control signal, and changing the dependency of the performance of the association depending on the active operating mode.
According to another embodiment, an audio encoder may have: a time-domain encoder; a frequency-domain encoder; and an associator configured to associate each of consecutive portions of an audio signal with one out of a mode dependent set of a plurality of frame coding modes, wherein the time-domain encoder is configured to encode portions having one of a first subset of one or more of the plurality of frame coding modes associated therewith into a corresponding frame of a data stream, wherein the frequency-domain encoder is configured to encode portions having one of a second subset of one or more of the plurality of frame coding modes associated therewith into a corresponding frame of the data stream, and wherein the associator is configured to operate in an active one of a plurality of operating modes such that, if the active operating mode is a first operating mode, the mode dependent set of the plurality of frame coding modes is disjoint to the first subset and overlaps with the second subset, and if the active operating mode is a second operating mode, the mode dependent set of the plurality of frame coding modes overlaps with the first and second subsets.
According to another embodiment, an audio decoding method using a time-domain decoder, and a frequency-domain decoder, may have the steps of: associating each of consecutive frames of a data stream, each of which represents a corresponding one of consecutive portions of an audio signal, with one out of a mode dependent set of a plurality of frame coding modes; decoding frames having one of a first subset of one or more of the plurality of frame coding modes associated therewith, by the time-domain decoder; and decoding frames having one of a second subset of one or more of the plurality of frame coding modes associated therewith, by the frequency-domain decoder, the first and second subsets being disjoint to each other, wherein the association is dependent on a frame mode syntax element associated with the frames in the data stream, and wherein the association is performed in an active one of a plurality of operating modes with selecting the active operating mode out of the plurality of operating modes depending on the data stream and/or an external control signal, such that the dependency of the performance of the association changes depending on the active operating mode.
According to still another embodiment, an audio encoding method using a time-domain encoder and a frequency-domain encoder may have the steps of: associating each of consecutive portions of an audio signal with one out of a mode dependent set of a plurality of frame coding modes; encoding portions having one of a first subset of one or more of the plurality of frame coding modes associated therewith into a corresponding frame of a data stream by the time-domain encoder; and encoding portions having one of a second subset of one or more of the plurality of frame coding modes associated therewith into a corresponding frame of the data stream by the frequency-domain encoder, wherein the association is performed in an active one of a plurality of operating modes such that, if the active operating mode is a first operating mode, the mode dependent set of the plurality of frame coding modes is disjoint to the first subset and overlaps with the second subset, and if the active operating mode is a second operating mode, the mode dependent set of the plurality of frame coding modes overlaps with the first and second subsets.
Another embodiment may have a computer program having a program code for performing, when running on a computer, an audio decoding method or an audio encoding method as mentioned above.
A basic idea underlying the present invention is that an audio codec supporting both time-domain and frequency-domain coding modes, which has low delay and an increased coding efficiency in terms of rate/distortion ratio, may be obtained if the audio encoder is configured to operate in different operating modes such that, if the active operating mode is a first operating mode, a mode dependent set of available frame coding modes is disjoint from a first subset of time-domain coding modes and overlaps with a second subset of frequency-domain coding modes, whereas, if the active operating mode is a second operating mode, the mode dependent set of available frame coding modes overlaps with both subsets, i.e. the subset of time-domain coding modes as well as the subset of frequency-domain coding modes. For example, the decision as to which of the first and second operating modes is entered may be made depending on the transmission bitrate available for transmitting the data stream. For example, the decision may be such that the second operating mode is entered in case of lower available transmission bitrates, while the first operating mode is entered in case of higher available transmission bitrates. In particular, providing the encoder with these operating modes makes it possible to prevent the encoder from choosing any time-domain coding mode whenever the coding circumstances, such as the available transmission bitrate, are such that choosing a time-domain coding mode would very likely cause a loss of coding efficiency in terms of rate/distortion ratio on a long-term basis.
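The two operating modes can be illustrated with a minimal Python sketch. The mode labels "A", "B" and "C", the names `select_operating_mode` and `mode_dependent_set`, and the 32 kbps switching point are hypothetical; the text only distinguishes higher from lower available bitrates and does not fix which modes belong to which subset.

```python
from enum import Enum


class OperatingMode(Enum):
    FIRST = 1   # time-domain frame coding modes are excluded
    SECOND = 2  # time-domain and frequency-domain modes are available


# Hypothetical labels: "A" is a time-domain mode (first subset),
# "B" and "C" are frequency-domain modes (second subset).
TIME_DOMAIN_MODES = frozenset({"A"})
FREQUENCY_DOMAIN_MODES = frozenset({"B", "C"})

# Hypothetical switching point; the text only speaks of higher/lower bitrates.
BITRATE_THRESHOLD_BPS = 32_000


def select_operating_mode(available_bitrate_bps: int) -> OperatingMode:
    """Higher bitrates -> first operating mode, lower bitrates -> second."""
    if available_bitrate_bps >= BITRATE_THRESHOLD_BPS:
        return OperatingMode.FIRST
    return OperatingMode.SECOND


def mode_dependent_set(op_mode: OperatingMode) -> frozenset:
    """Frame coding modes the encoder may choose from in the given operating mode."""
    if op_mode is OperatingMode.FIRST:
        # Disjoint from the time-domain subset, overlaps the frequency-domain subset.
        return FREQUENCY_DOMAIN_MODES
    # Overlaps both subsets.
    return TIME_DOMAIN_MODES | FREQUENCY_DOMAIN_MODES


print(sorted(mode_dependent_set(select_operating_mode(64_000))))  # ['B', 'C']
print(sorted(mode_dependent_set(select_operating_mode(16_000))))  # ['A', 'B', 'C']
```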
To be more precise, the inventors of the present application found that suppressing the selection of any time-domain coding mode in case of a relatively high available transmission bandwidth results in increased coding efficiency: while, on a short-term basis, a time-domain coding mode may appear to be of advantage compared to the frequency-domain coding modes, this assumption very likely turns out to be incorrect when the audio signal is analyzed over a longer period. Such a longer analysis or look-ahead is, however, not possible in low-delay applications; accordingly, preventing the encoder from accessing any time-domain coding mode beforehand increases the coding efficiency.
In accordance with an embodiment of the present invention, the above idea is exploited further so as to reduce the data stream bitrate: synchronously controlling the operating mode of encoder and decoder is quite inexpensive in terms of bitrate, or does not cost any bitrate at all if the synchronicity is provided by some other means. Hence, the fact that encoder and decoder operate and switch between the operating modes synchronously may be exploited to reduce the signaling overhead for signaling the frame coding modes associated with the individual frames of the data stream, which represent consecutive portions of the audio signal. In particular, while a decoder's associator may be configured to associate each of the consecutive frames of the data stream with one out of the mode dependent set of the plurality of frame coding modes dependent on a frame mode syntax element associated with the frames of the data stream, the associator may change the dependency of this association depending on the active operating mode. In particular, the dependency change may be such that, if the active operating mode is the first operating mode, the mode dependent set is disjoint from the first subset and overlaps with the second subset, and if the active operating mode is the second operating mode, the mode dependent set overlaps with both subsets. However, less strict solutions, which reduce the bitrate by exploiting knowledge about the circumstances associated with the currently active operating mode, are also feasible.
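The bitrate saving can be sketched as follows, assuming hypothetical mode labels (one time-domain mode "A", two frequency-domain modes "B" and "C") and a plain fixed-length code for the frame mode syntax element; all function names are illustrative only. Because encoder and decoder both know the active operating mode, the same syntax element value may stand for different frame coding modes, and fewer values need to be distinguished in the first operating mode.

```python
import math

# Hypothetical frame coding modes: "A" is a time-domain mode, "B" and "C" are
# frequency-domain modes. The position in each tuple is the syntax element value.
VALUE_TO_MODE = {
    "first": ("B", "C"),        # disjoint from the time-domain subset
    "second": ("A", "B", "C"),  # overlaps both subsets
}


def bits_per_frame(op_mode: str) -> int:
    """Length of a fixed-length code for the frame mode syntax element."""
    n = len(VALUE_TO_MODE[op_mode])
    return math.ceil(math.log2(n)) if n > 1 else 0


def decode_frame_mode(value: int, op_mode: str) -> str:
    """Decoder side: the same value may denote different modes per operating mode."""
    table = VALUE_TO_MODE[op_mode]
    if not 0 <= value < len(table):
        raise ValueError(f"value {value} is not allowed in the {op_mode} operating mode")
    return table[value]


def encode_frame_mode(mode: str, op_mode: str) -> int:
    """Encoder side: inverse mapping; raises ValueError for modes outside the set."""
    return VALUE_TO_MODE[op_mode].index(mode)
```

With a 20 ms framing (50 frames per second), this sketch spends 1 bit per frame, i.e. 50 bit/s, on the frame mode syntax element in the first operating mode instead of 2 bits per frame, i.e. 100 bit/s, in the second. Note that `decode_frame_mode(0, "first")` returns "B", whereas `decode_frame_mode(0, "second")` returns "A".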
Embodiments of the present invention are described in more detail below with respect to the figures.
With regard to the description of the figures, it is noted that descriptions of elements in one figure shall equally apply to elements having the same reference sign associated therewith in another figure, unless explicitly taught otherwise.
To be more precise, the associator 16 is connected between an input 28 of decoder 10 on the one hand, and inputs of time-domain decoder 12 and frequency-domain decoder 14 on the other hand so as to provide same with associated frames 18a-c in a manner described in more detail below.
The time-domain decoder 12 is configured to decode frames having one of a first subset 30 of one or more of the plurality 22 of frame-coding modes associated therewith, and the frequency-domain decoder 14 is configured to decode frames having one of a second subset 32 of one or more of the plurality 22 of frame-coding modes associated therewith. The first and second subsets are disjoint from each other as illustrated in
As is shown in
Prior to proceeding further with the description of the embodiment of
As will be outlined in more detail below, the associator 16 is configured to associate the consecutive frames 18a-c of the data stream 20 with the frame-coding modes A-C in a manner which avoids the usage of a time-domain coding mode where such usage is inappropriate. This is the case, for example, at high available transmission bitrates, where time-domain coding modes are likely to be less efficient in terms of rate/distortion ratio than frequency-domain coding modes, so that using a time-domain frame-coding mode for a certain frame 18a-18c would very likely decrease the coding efficiency.
Accordingly, the associator 16 is configured to perform the association of the frames to the frame coding modes dependent on a frame mode syntax element associated with the frames 18a-c in the data stream 20. For example, the syntax of the data stream 20 could be configured such that each frame 18a-c comprises such a frame mode syntax element 38 for determining the frame-coding mode, which the corresponding frame 18a-c belongs to.
Further, the associator 16 is configured to operate in an active one of a plurality of operating modes, or to select a current operating mode out of a plurality of operating modes. Associator 16 may perform this selection depending on the data stream or dependent on an external control signal. For example, as will be outlined in more detail below, the decoder 10 changes its operating mode synchronously to the operating mode change at the encoder and in order to implement the synchronicity, the encoder may signal the active operating mode and the change in the active one of the operating modes within the data stream 20. Alternatively, encoder and decoder 10 may be synchronously controlled by some external control signal such as control signals provided by lower transport layers such as EPS or RTP or the like. The control signal externally provided may, for example, be indicative of some available transmission bitrate.
In order to instantiate or realize the avoidance of inappropriate selections or an inappropriate usage of time-domain coding modes as outlined above, the associator 16 is configured to change the dependency of the performance of the association of the frames 18 to the coding modes depending on the active operating mode. In particular, if the active operating mode is a first operating mode, the mode dependent set of the plurality of frame coding modes is, for example, the one shown at 40, which is disjoint to the first subset 30 and overlaps the second subset 32, whereas if the active operating mode is a second operating mode, the mode dependent set is, for example, as shown at 42 in
That is, in accordance with the embodiment of
In order to explain the change in the dependency of the performance of the association of the associator 16 in more detail, reference is made to
In any case, depending on the way the frame mode syntax element 38 has been inserted into data stream 20, there is a mapping 44 between the frame mode syntax element 38 as contained and transmitted in data stream 20, and a set 46 of possible values of the frame mode syntax element 38. For example, the frame mode syntax element 38 may be inserted into data stream 20 directly, i.e. using a binary representation such as PCM, or using a variable length code and/or entropy coding such as Huffman or arithmetic coding. Thus, the associator 16 may be configured to extract 48, such as by decoding, the frame mode syntax element 38 from data stream 20 so as to derive one of the set 46 of possible values, wherein the possible values are representatively illustrated in
That is, each possible value which the frame mode syntax element 38 may possibly assume, i.e. each possible value within the possible value range 46 of frame mode syntax element 38, is associated with a certain one of the plurality of frame coding modes A, B and C. In particular, there is a bijective mapping between the possible values of set 46 on the one hand, and the mode dependent set of frame coding modes on the other hand. The mapping, illustrated by the double-headed arrow 52 in
However, even the number of possible values within set 46 may change. This is indicated by the triangle drawn with a dashed line in
Stated differently, the following is noted. Internally, the value of the frame mode syntax element 38 may be represented by some binary value, the possible value range of which accommodates the set 46 of possible values independent of the currently active operating mode. To be even more precise, associator 16 internally represents the value of the frame mode syntax element 38 with a binary value of a binary representation. Using these binary values, the possible values of set 46 are sorted into an ordinal scale so that they remain comparable to each other even in case of a change of the operating mode. The first possible value of set 46 in accordance with this ordinal scale may, for example, be defined to be the one associated with the highest probability among the possible values of set 46, the second one being the one with the next lower probability, and so forth. Accordingly, the possible values of frame mode syntax element 38 remain comparable to each other despite a change of the operating mode. In the latter example, it may occur that domain and co-domain of bijective mapping 52, i.e. the set 46 of possible values and the mode dependent set of frame coding modes, remain the same despite the active operating mode changing between the first and second operating modes, while the bijective mapping 52 changes the association between the frame coding modes of the mode dependent set on the one hand, and the comparable possible values of set 46 on the other hand. In the latter embodiment, the decoder 10 of
The probabilities just mentioned, which are associated with the possible values of set 46 and optionally used for encoding/decoding these values, may be static or adaptively changed. Different sets of probability estimates may be used for different operating modes. In case of adaptively changing probabilities, context-adaptive entropy coding may be used.
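The ordinal scale and the adaptive, per-operating-mode probabilities described above can be illustrated with the following sketch. It is hypothetical throughout: the class `ModeRanker`, the initial counts and the count-based estimate are assumptions, and a truncated unary code is swapped in as a simple stand-in for the Huffman or arithmetic coding mentioned in the text. Value 0 always denotes the currently most probable mode, so the association between values and modes changes with the operating mode and over time, while the sets themselves stay the same.

```python
from collections import Counter

MODES = ("A", "B", "C")  # hypothetical frame coding modes; "A" is time-domain


def truncated_unary(rank: int, num_values: int) -> str:
    """Rank r -> r ones plus a terminating zero (omitted for the last rank)."""
    return "1" * rank + ("0" if rank < num_values - 1 else "")


class ModeRanker:
    """Hypothetical count-based probability estimate, kept per operating mode.

    The possible syntax element values form an ordinal scale: value 0 denotes the
    currently most probable mode, value 1 the next most probable one, and so on.
    """

    def __init__(self, initial_counts):
        # One independent table per operating mode, e.g. {"first": {...}, ...}.
        self.counts = {op: Counter(c) for op, c in initial_counts.items()}

    def ranking(self, op_mode: str) -> tuple:
        table = self.counts[op_mode]
        # Decreasing count; ties broken by name so encoder and decoder agree.
        return tuple(sorted(MODES, key=lambda m: (-table[m], m)))

    def encode(self, mode: str, op_mode: str) -> str:
        bits = truncated_unary(self.ranking(op_mode).index(mode), len(MODES))
        self.counts[op_mode][mode] += 1  # adapt after coding the frame
        return bits

    def decode(self, bits: str, op_mode: str) -> str:
        ranking = self.ranking(op_mode)
        rank = 0
        while rank < len(ranking) - 1 and bits[rank] == "1":
            rank += 1
        mode = ranking[rank]
        self.counts[op_mode][mode] += 1  # identical update keeps both sides in sync
        return mode
```

For example, with hypothetical initial counts `{"first": {"A": 1, "B": 4, "C": 3}, "second": {"A": 4, "B": 3, "C": 1}}`, the time-domain mode "A" is initially coded as "11" in the first operating mode but as "0" in the second.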
As illustrated in
As will be outlined in more detail below with respect to
For example, see
As shown in
For further details regarding a possible implementation of the CELP decoder of
Similarly,
As shown in
As already mentioned above, the frequency-domain decoder 14 of
The embodiments for an audio decoder described above were especially designed to take advantage of an audio encoder which operates in different operating modes, namely so as to change the selection among frame coding modes between these operating modes to the extent that time-domain frame coding modes are not selected in one of these operating modes, but merely in the other. It should be noted, however, that the embodiments for an audio encoder described below would also, at least as far as a subset of these embodiments is concerned, fit an audio decoder which does not support different operating modes. This is at least true for those encoder embodiments according to which the data stream generation does not change between these operating modes. In other words, in accordance with some of the embodiments for an audio encoder described below, the restriction of the selection of frame coding modes to frequency-domain coding modes in one of the operating modes is not reflected within the data stream, where the operating mode changes are, insofar, transparent (except for the absence of time-domain frame coding modes while one of these operating modes is active). However, the especially dedicated audio decoders according to the various embodiments outlined above form, along with respective embodiments for an audio encoder outlined below, audio codecs which take additional advantage of the frame coding mode selection restriction during a special operating mode corresponding, for example, to special transmission conditions, as outlined above.
The associator 102 is configured to associate each of consecutive portions 116a to 116c which correspond to the aforementioned portions 24 of the audio signal 112, with one out of a mode dependent set of a plurality of frame coding modes (see 40 and 42 of
The time-domain encoder 104 is configured to encode portions 116a to 116c having one of a first subset 30 of one or more of the plurality 22 of frame coding modes associated therewith, into a corresponding frame 118a to 118c of the data stream 114. The frequency-domain encoder 106 is likewise responsible for encoding portions having any frequency-domain coding mode of set 32 associated therewith into a corresponding frame 118a to 118c of data stream 114.
The associator 102 is configured to operate in an active one of a plurality of operating modes. To be more precise, the associator 102 is configured such that exactly one of the plurality of operating modes is active, but the selection of the active one of the plurality of operating modes may change during sequentially encoding portions 116a to 116c of audio signal 112.
In particular, the associator 102 is configured such that if the active operating mode is a first operating mode, the mode dependent set behaves like set 40 of
As outlined above, the functionality of the audio encoder of
It should be noted, however, that the control signal 120 may also be provided by some other entity such as, for example, a speech detector which analyzes the audio signal to be encoded, i.e. 112, so as to distinguish between speech phases, i.e. time intervals during which a speech component is predominant within the audio signal 112, and non-speech phases, during which other audio sources such as music are predominant within audio signal 112. The control signal 120 may indicate the changes between speech and non-speech phases, and the associator 102 may be configured to change between the operating modes accordingly. For example, in speech phases the associator 102 could enter the aforementioned “second operating mode”, while the “first operating mode” could be associated with non-speech phases, thereby accounting for the fact that choosing time-domain frame coding modes during non-speech phases very likely results in less efficient compression.
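On the encoder side, the restriction amounts to minimizing some cost over the currently allowed modes only. A minimal sketch, assuming hypothetical mode labels, a boolean speech/non-speech decision standing in for control signal 120, and a hypothetical per-portion cost (for example a rate/distortion estimate) supplied by the caller:

```python
# Hypothetical labels: "A" is a time-domain mode, "B" and "C" are frequency-domain modes.
TIME_DOMAIN_MODES = frozenset({"A"})
FREQUENCY_DOMAIN_MODES = frozenset({"B", "C"})


def allowed_modes(speech_phase: bool) -> frozenset:
    """Speech phase -> second operating mode (all modes); else first (FD modes only)."""
    if speech_phase:
        return TIME_DOMAIN_MODES | FREQUENCY_DOMAIN_MODES
    return FREQUENCY_DOMAIN_MODES


def choose_mode(costs: dict, speech_phase: bool) -> str:
    """Return the lowest-cost frame coding mode among the currently allowed ones.

    `costs` is a hypothetical per-portion estimate (lower is better) for every mode.
    """
    return min(allowed_modes(speech_phase), key=lambda m: costs[m])
```

With costs `{"A": 1.0, "B": 1.2, "C": 1.5}`, the sketch picks the time-domain mode "A" during a speech phase but falls back to "B" during a non-speech phase, even though "A" would look cheaper on this short-term estimate.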
While the associator 102 may be configured to encode a frame mode syntax element 122 (compare syntax element 38 in
However, in terms of bitrate overhead, it may be of advantage if the data stream 114 is generated by the audio encoder 100 of
Accordingly, in accordance with an embodiment of the audio encoder 100 of
In order to explain a possible implementation for time-domain encoder 104 and frequency-domain encoder 106, reference is made to
Coming back to the description of
As far as the time-domain encoder 104 is concerned, it comprises, besides the LPC analyzer 130, an LP analysis filter 144 and a code-based excitation signal approximator 146, both serially connected between common input 140 and an output 148 of time-domain encoder 104. A linear prediction coefficient input of LP analysis filter 144 is connected to the output of LPC analyzer 130.
In encoding the audio signal 112 entering at input 140, the LPC analyzer 130 continuously determines linear prediction coefficients for each portion 116a to 116c of the audio signal 112. The LPC determination may involve determining the autocorrelation of consecutive (overlapping or non-overlapping) windowed portions of the audio signal and performing LPC estimation on the resulting autocorrelations (optionally after subjecting the autocorrelations to lag windowing), for example using a (Wiener-)Levinson-Durbin algorithm, a Schur algorithm or another algorithm.
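The LPC determination described above can be sketched in plain Python as follows. The frame is assumed to be windowed already; the Gaussian lag window and its 60 Hz bandwidth are assumed, commonly used choices rather than values taken from this text, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption of the sketch.

```python
import math


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of an (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def lag_window(r, sample_rate, bandwidth_hz=60.0):
    """Gaussian lag window; the 60 Hz bandwidth is an assumed, common choice."""
    return [
        r[k] * math.exp(-0.5 * (2.0 * math.pi * bandwidth_hz * k / sample_rate) ** 2)
        for k in range(len(r))
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient list [1, a1, ..., ap] and the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k      # prediction error shrinks with every stage
    return a, err
```

For the autocorrelation sequence [1.0, 0.9, 0.81] of a first-order autoregressive process with coefficient 0.9, `levinson_durbin` of order 2 returns approximately a = [1, -0.9, 0] and a prediction error of 0.19.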
As described with respect to
The time-domain encoder 104 may operate as follows. The LP analysis filter 144 may filter the time-domain coding mode portions of the audio signal 112 depending on the linear prediction coefficients output by LPC analyzer 130. At the output of LP analysis filter 144, an excitation signal 150 is thus derived. The excitation signal is approximated by approximator 146. In particular, approximator 146 sets a code, such as codebook indices or other parameters, to approximate the excitation signal 150, for example by minimizing or maximizing some optimization measure defined, for example, by the deviation between excitation signal 150 on the one hand and the synthetically generated excitation signal as defined by the codebook index on the other hand, measured in the synthesized domain, i.e. after applying the respective synthesis filter according to the LPCs to the respective excitation signals. The optimization measure may optionally emphasize deviations in perceptually more relevant frequency bands. The innovation excitation determined by the code set by the approximator 146 may be called innovation parameter.
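The analysis-by-synthesis principle of approximator 146 can be illustrated with a deliberately small sketch: LP analysis and synthesis filters with zero initial state, and an exhaustive search over a toy codebook with a least-squares gain per codevector. The codebook, the plain squared-error criterion (without perceptual weighting) and all function names are assumptions; practical CELP coders use structured algebraic codebooks, adaptive codebooks and filter memories.

```python
def lp_analysis_filter(x, a):
    """e[n] = x[n] + sum_j a[j] x[n-j], i.e. filtering with A(z); zero initial state."""
    p = len(a) - 1
    return [
        x[n] + sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0)
        for n in range(len(x))
    ]


def lp_synthesis_filter(e, a):
    """y[n] = e[n] - sum_j a[j] y[n-j], i.e. filtering with 1/A(z); zero initial state."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, p + 1) if n - j >= 0))
    return y


def search_codebook(target, codebook, a):
    """Toy analysis-by-synthesis search in the synthesized domain.

    Each codevector is passed through 1/A(z), scaled by its least-squares gain and
    compared with the target; the index and gain with the smallest squared error win.
    """
    best = None
    for index, c in enumerate(codebook):
        s = lp_synthesis_filter(c, a)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue  # an all-zero codevector cannot be scaled to fit
        gain = sum(t * v for t, v in zip(target, s)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, s))
        if best is None or error < best[2]:
            best = (index, gain, error)
    if best is None:
        raise ValueError("codebook contains no usable codevector")
    return best[0], best[1]
```

Here the target is the excitation signal passed through the synthesis filter, which with zero initial state equals the original portion. For example, with a = [1.0, -0.9] and the portion [1.0, 0.9, 0.81, 0.729], the codevector [1, 0, 0, 0] is selected with a gain of about 1, since its synthesized version reproduces the target.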
Thus, approximator 146 may output one or more innovation parameters per time-domain frame coding mode portion, to be inserted into the corresponding frames that have a time-domain coding mode associated therewith via, for example, frame mode syntax element 122.
The frequency-domain encoder 106, in turn, may operate as follows. The transformer 132 transforms frequency-domain portions of the audio signal 112 using, for example, a lapped transform so as to obtain one or more spectra per portion. The resulting spectrogram at the output of transformer 132 enters the frequency-domain noise shaper 136, which shapes the sequence of spectra representing the spectrogram in accordance with the LPCs. To this end, the LPC converter 134 converts the linear prediction coefficients of LPC analyzer 130 into frequency-domain weighting values so as to spectrally weight the spectra. This time, the spectral weighting is performed such that the transfer function of an LP analysis filter results. That is, an ODFT may, for example, be used to convert the LPC coefficients into spectral weights by which the spectra output by transformer 132 are then divided, whereas multiplication is used at the decoder side.
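The conversion of LPCs into spectral weights and their use in the frequency-domain noise shaper can be sketched as follows. Evaluating A(z) at the odd-DFT frequencies pi(k + 0.5)/N corresponds to an ODFT of the zero-padded coefficients; the number of bins, the function names and the omission of any interpolation between consecutive LPC sets are assumptions of this sketch.

```python
import cmath
import math


def lpc_to_spectral_weights(a, num_bins):
    """Sample the LPC envelope 1/|A(e^jw)| at w_k = pi * (k + 0.5) / num_bins.

    a = [1, a1, ..., ap] describes A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    weights = []
    for k in range(num_bins):
        w = math.pi * (k + 0.5) / num_bins
        a_resp = sum(coef * cmath.exp(-1j * w * j) for j, coef in enumerate(a))
        weights.append(1.0 / abs(a_resp))
    return weights


def fdns_encode(spectrum, weights):
    """Encoder: dividing by the envelope applies |A|, i.e. the LP analysis response."""
    return [x / g for x, g in zip(spectrum, weights)]


def fdns_decode(shaped, weights):
    """Decoder: multiplying by the envelope undoes the encoder-side division."""
    return [x * g for x, g in zip(shaped, weights)]
```

For a = [1.0, -0.9], the weights are large at low frequencies and small at high frequencies, so the encoder-side division flattens (whitens) a correspondingly tilted spectrum before quantization.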
Thereafter, quantizer 138 quantizes the resulting excitation spectrum output by frequency-domain noise shaper 136 into transform coefficient levels 60 for insertion into the corresponding frames of data stream 114.
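The text does not specify quantizer 138. As a minimal sketch, assuming a plain uniform scalar quantizer with a single, hypothetical step size (no dead zone, no rate control), the transform coefficient levels could be obtained and reconstructed like this:

```python
def quantize(spectrum, step):
    """Uniform scalar quantizer: level = x / step rounded to the nearest integer."""
    return [round(x / step) for x in spectrum]


def dequantize(levels, step):
    """Decoder-side reconstruction of the shaped spectrum from the levels."""
    return [q * step for q in levels]
```

Python's `round` resolves exact halves to the nearest even integer; any consistent rounding rule works here, since the decoder only needs `dequantize`, and the reconstruction error per coefficient stays within half a step.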
In accordance with the embodiments described above, an embodiment of the present invention may be derived by modifying the USAC codec discussed in the introductory portion of the specification of the present application, namely by modifying the USAC encoder to operate in different operating modes so as to refrain from choosing the ACELP mode in case of a certain one of the operating modes. In order to achieve a lower delay, the USAC codec may be further modified in the following way: for example, independent of the operating mode, only TCX and ACELP frame coding modes may be used, and the frame length may be reduced so as to reach a framing of 20 milliseconds. In particular, in rendering a USAC codec more efficient in accordance with the above embodiments, the operation modes of USAC, namely narrowband (NB), wideband (WB) and super-wideband (SWB), may be amended such that merely a proper subset of the overall available frame coding modes is available within the individual operation modes in accordance with the subsequently explained table:
As the above table makes clear, in the embodiments described above, the decoder's operation mode need not be determined exclusively from an external signal or exclusively from the data stream, but may also be determined based on a combination of both. For example, in the above table, the data stream may indicate to the decoder a main mode, i.e. NB, WB, SWB or FB, by way of a coarse operation mode syntax element which is present in the data stream at some rate which may be lower than the frame rate. The encoder inserts this syntax element in addition to syntax elements 38. The exact operation mode, however, may necessitate the inspection of an additional external signal indicative of the available bitrate. In case of SWB, for example, the exact mode depends on whether the available bitrate is below 48 kbps, is equal to or greater than 48 kbps but lower than 96 kbps, or is equal to or greater than 96 kbps.
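The combination of an in-band main mode and an external bitrate indication can be sketched as follows. Since the table itself is not reproduced here, the returned labels are hypothetical placeholders, and only SWB is refined by the bitrate, mirroring the example in the text.

```python
from typing import Optional

MAIN_MODES = ("NB", "WB", "SWB", "FB")


def exact_operation_mode(main_mode: str, available_bitrate_kbps: Optional[float] = None) -> str:
    """Refine the in-band main mode with an external bitrate indication.

    Only SWB is refined here, following the example in the text; the returned
    labels are hypothetical placeholders for the rows of the mode table.
    """
    if main_mode not in MAIN_MODES:
        raise ValueError(f"unknown main mode {main_mode!r}")
    if main_mode != "SWB":
        return main_mode
    if available_bitrate_kbps is None:
        raise ValueError("SWB needs the externally signaled bitrate")
    if available_bitrate_kbps < 48:
        return "SWB below 48 kbps"
    if available_bitrate_kbps < 96:
        return "SWB 48 to below 96 kbps"
    return "SWB 96 kbps and above"
```

The coarse syntax element only needs to be transmitted at a low rate, while the bitrate-dependent refinement costs no bits in the data stream at all, because both sides obtain it from the external signal.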
Regarding the above embodiments, it should be noted that, although it is of advantage if the set of all frame coding modes with which the frames/time portions of the information signal are associatable consists exclusively of time-domain or frequency-domain frame coding modes, this may be different in accordance with alternative embodiments, so that there may also be one or more frame coding modes which are neither time-domain nor frequency-domain coding modes.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2012/052461, filed Feb. 14, 2012, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Provisional Application No. 61/442,632, filed Feb. 14, 2011, which is also incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61442632 | Feb 2011 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/EP2012/052461 | Feb 2012 | US
Child | 13966048 | | US