The present invention is concerned with a codec supporting a time-domain aliasing cancellation transform coding mode and a time-domain coding mode as well as forward aliasing cancellation for switching between both modes.
It is favorable to mix different coding modes in order to code general audio signals representing a mix of audio signals of different types such as speech, music or the like. The individual coding modes may be adapted to particular audio types, and thus, a multi-mode audio encoder may take advantage of changing the encoding mode over time corresponding to the change of the audio content type. In other words, the multi-mode audio encoder may decide, for example, to encode portions of the audio signal having speech content using a coding mode especially dedicated to coding speech, and to use another coding mode in order to encode different portions of the audio content representing non-speech content such as music. Time-domain coding modes, such as codebook excitation linear prediction coding modes, tend to be more suitable for coding speech content, whereas transform coding modes tend to outperform time-domain coding modes as far as the coding of music is concerned, for example.
There have already been solutions addressing the problem of coping with the coexistence of different audio types within one audio signal. The currently emerging USAC standard, for example, suggests switching between a frequency-domain coding mode largely complying with the AAC standard, and two further linear prediction modes similar to sub-frame modes of the AMR-WB+ standard, namely an MDCT (Modified Discrete Cosine Transform) based variant of the TCX (TCX=transform coded excitation) mode and an ACELP (adaptive codebook excitation linear prediction) mode. To be more precise, in the AMR-WB+ standard, TCX is based on a DFT transform, but in USAC TCX has an MDCT transform base. A certain framing structure is used in order to switch between the FD coding domain similar to AAC and the linear prediction domain similar to AMR-WB+. The AMR-WB+ standard itself uses its own framing structure forming a sub-framing structure relative to the USAC standard. The AMR-WB+ standard allows for a certain sub-division configuration sub-dividing the AMR-WB+ frames into smaller TCX and/or ACELP frames. Similarly, the AAC standard uses a basic framing structure, but allows for the use of different window lengths in order to transform code the frame content. For example, either a long window and an associated long transform length may be used, or eight short windows with associated transforms of shorter length.
MDCT causes aliasing. This is, thus, also true at TCX and FD frame boundaries. In other words, just as in any frequency-domain coder using the MDCT, aliasing occurs at the window overlap regions and is cancelled with the help of the neighbouring frames. That is, for any transition between two FD frames, between two TCX (MDCT) frames, or between FD and TCX in either direction, there is an implicit aliasing cancellation by the overlap/add procedure within the reconstruction at the decoding side. Then, there is no more aliasing after the overlap-add. However, in case of transitions with ACELP, there is no inherent aliasing cancellation. Then, a new tool has to be introduced which may be called FAC (forward aliasing cancellation). FAC is to cancel the aliasing coming from the neighbouring frames if they are different from ACELP.
In other words, aliasing cancellation problems occur whenever transitions between the transform coding mode and a time-domain coding mode, such as ACELP, occur. In order to perform the transformation from the time domain to the spectral domain as effectively as possible, time-domain aliasing cancellation transform coding is used, such as MDCT, i.e. a coding mode using an overlapped transform where overlapping windowed portions of a signal are transformed using a transform according to which the number of transform coefficients per portion is less than the number of samples per portion so that aliasing occurs as far as the individual portions are concerned, with this aliasing being cancelled by time-domain aliasing cancellation, i.e. by adding the overlapping aliasing portions of neighboring re-transformed signal portions. MDCT is such a time-domain aliasing cancellation transform. Disadvantageously, the TDAC (time-domain aliasing cancellation) is not available at transitions between the transform coding (TC) mode and the time-domain coding mode.
In order to solve this problem, forward aliasing cancellation (FAC) may be used, according to which the encoder signals additional FAC data within a current frame of the data stream whenever a change in the coding mode from transform coding to time-domain coding occurs. This, however, necessitates that the decoder compare the coding modes of consecutive frames in order to ascertain whether the currently decoded frame comprises FAC data within its syntax or not. This, in turn, means that there may be frames for which the decoder may not be sure whether it has to read or parse FAC data from the current frame or not. In other words, in case one or more frames were lost during transmission, the decoder does not know for the immediately succeeding (received) frames whether a coding mode change occurred or not, and whether the bit stream of the current frame's encoded data contains FAC data or not. Accordingly, the decoder has to discard the current frame and wait for the next frame. Alternatively, the decoder may parse the current frame by performing two decoding trials, one assuming that FAC data is present, and another assuming that FAC data is not present, with subsequently deciding which of the two alternatives fails. The decoding process would, however, most likely make the decoder crash in one of the two conditions. That is, in reality, the latter possibility is not a feasible approach. The decoder should at any time know how to interpret the data and not rely on its own speculation on how to treat the data.
According to an embodiment, a decoder for decoding a data stream having a sequence of frames into which time segments of an information signal are coded, respectively, may have a parser configured to parse the data stream, wherein the parser is configured to, in parsing the data stream, read a first syntax portion and a second syntax portion from a current frame; and a reconstructor configured to reconstruct a current time segment of the information signal associated with the current frame based on information acquired from the current frame by the parsing, using a first selected one of a Time-Domain Aliasing Cancellation transform decoding mode and a time-domain decoding mode, the first selection depending on the first syntax portion, wherein the parser is configured to, in parsing the data stream, perform a second selected one of a first action of expecting the current frame to have, and thus reading forward aliasing cancellation data from the current frame and a second action of not-expecting the current frame to have, and thus not reading forward aliasing cancellation data from the current frame, the second selection depending on the second syntax portion, wherein the reconstructor is configured to perform forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame using the forward aliasing cancellation data.
According to another embodiment, an encoder for encoding an information signal into a data stream such that the data stream has a sequence of frames into which time segments of the information signal are coded, respectively, may have a constructor configured to code a current time segment of the information signal into information of the current frame using a first selected one of a Time-Domain Aliasing Cancellation transform coding mode and a time-domain coding mode; and an inserter configured to insert the information into the current frame along with a first syntax portion and a second syntax portion, wherein the first syntax portion signals the first selection, wherein the constructor and inserter are configured to determine forward aliasing cancellation data for forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame and insert the forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using different ones of the Time-Domain Aliasing Cancellation transform coding mode and the time-domain coding mode, and to refrain from inserting any forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using equal ones of the Time-Domain Aliasing Cancellation transform coding mode and the time-domain coding mode, wherein the second syntax portion is set depending on whether the current frame and the previous frame are encoded using equal or different ones of the Time-Domain Aliasing Cancellation transform coding mode and the time-domain coding mode.
According to another embodiment, a method for decoding a data stream having a sequence of frames into which time segments of an information signal are coded, respectively, may have the steps of parsing the data stream, wherein parsing the data stream has reading a first syntax portion and a second syntax portion from a current frame; and reconstructing a current time segment of the information signal associated with the current frame based on information acquired from the current frame by the parsing, using a first selected one of a Time-Domain Aliasing Cancellation transform decoding mode and a time-domain decoding mode, the first selection depending on the first syntax portion, wherein, in parsing the data stream, a second selected one of a first action of expecting the current frame to have, and thus reading forward aliasing cancellation data from the current frame and a second action of not-expecting the current frame to have, and thus not reading forward aliasing cancellation data from the current frame is performed, the second selection depending on the second syntax portion, wherein the reconstructing includes performing forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame using the forward aliasing cancellation data.
According to another embodiment, a method for encoding an information signal into a data stream such that the data stream has a sequence of frames into which time segments of the information signal are coded, respectively, may have the steps of coding a current time segment of the information signal into information of the current frame using a first selected one of a Time-Domain Aliasing Cancellation transform encoding mode and a time-domain encoding mode; and inserting the information into the current frame along with a first syntax portion and a second syntax portion, wherein the first syntax portion signals the first selection, determining forward aliasing cancellation data for forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame and inserting the forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using different ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode, and refraining from inserting any forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using equal ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode, wherein the second syntax portion is set depending on whether the current frame and the previous frame are encoded using equal or different ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode.
According to another embodiment, a data stream may have a sequence of frames into which time segments of an information signal are coded, respectively, each frame having a first syntax portion, a second syntax portion, and information into which a time segment associated with the respective frame is coded using a first selected one of a Time-Domain Aliasing Cancellation transform coding mode and a time-domain coding mode, the first selection depending on the first syntax portion of the respective frame, wherein each frame includes forward aliasing cancellation data or not depending on the second syntax portion of the respective frame, wherein the second syntax portion indicates that the respective frame has forward aliasing cancellation data if the respective frame and the previous frame are coded using different ones of the Time-Domain Aliasing Cancellation transform coding mode and the time-domain coding mode so that forward aliasing cancellation using the forward aliasing cancellation data is possible at the boundary between the respective time segment and a previous time segment associated with the previous frame.
According to another embodiment, a computer program may have a program code for performing, when running on a computer, a method for decoding a data stream having a sequence of frames into which time segments of an information signal are coded, respectively, which may have the steps of parsing the data stream, wherein parsing the data stream includes reading a first syntax portion and a second syntax portion from a current frame; and reconstructing a current time segment of the information signal associated with the current frame based on information acquired from the current frame by the parsing, using a first selected one of a Time-Domain Aliasing Cancellation transform decoding mode and a time-domain decoding mode, the first selection depending on the first syntax portion, wherein, in parsing the data stream, a second selected one of a first action of expecting the current frame to include, and thus reading forward aliasing cancellation data from the current frame and a second action of not-expecting the current frame to include, and thus not reading forward aliasing cancellation data from the current frame is performed, the second selection depending on the second syntax portion, wherein the reconstructing includes performing forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame using the forward aliasing cancellation data.
According to another embodiment, a computer program may have a program code for performing, when running on a computer, a method for encoding an information signal into a data stream such that the data stream has a sequence of frames into which time segments of the information signal are coded, respectively, which may have the steps of coding a current time segment of the information signal into information of the current frame using a first selected one of a Time-Domain Aliasing Cancellation transform encoding mode and a time-domain encoding mode; and inserting the information into the current frame along with a first syntax portion and a second syntax portion, wherein the first syntax portion signals the first selection, determining forward aliasing cancellation data for forward aliasing cancellation at a boundary between the current time segment and a previous time segment of a previous frame and inserting the forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using different ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode, and refraining from inserting any forward aliasing cancellation data into the current frame in case the current frame and the previous frame are encoded using equal ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode, wherein the second syntax portion is set depending on whether the current frame and the previous frame are encoded using equal or different ones of the Time-Domain Aliasing Cancellation transform encoding mode and the time-domain encoding mode.
The present invention is based on the finding that a more error-robust or frame-loss-robust codec supporting switching between a time-domain aliasing cancellation transform coding mode and a time-domain coding mode is achievable if a further syntax portion is added to the frames, depending on which the parser of the decoder may select between a first action of expecting the current frame to include, and thus reading, forward aliasing cancellation data from the current frame and a second action of not expecting the current frame to include, and thus not reading, forward aliasing cancellation data from the current frame. In other words, while a bit of coding efficiency is lost due to the provision of the second syntax portion, it is exactly this second syntax portion which provides for the ability to use the codec over a communication channel subject to frame loss. Without the second syntax portion, the decoder would not be capable of decoding any data stream portion after a loss and would crash in trying to resume parsing. Thus, in an error-prone environment, the introduction of the second syntax portion prevents the coding efficiency from vanishing.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
For ease of explanation of the below-outlined embodiments, it is assumed that the information signal 18 is an audio signal. However, it should be noted that the information signal could also be any other signal, such as a signal output by a physical sensor or the like, such as an optical sensor or the like. In particular, signal 18 may be sampled at a certain sampling rate and the time segments 16a to 16c may cover immediately consecutive portions of this signal 18 equal in time and number of samples, respectively. A number of samples per time segment 16a to 16c may, for example, be 1024 samples.
The decoder 10 comprises a parser 20 and a reconstructor 22. The parser 20 is configured to parse the data stream 12 and, in parsing the data stream 12, read a first syntax portion 24 and a second syntax portion 26 from a current frame 14b, i.e. a frame currently to be decoded. In
Naturally, each frame 14a to 14c also has further information incorporated therein which is for representing the associated time segment 16a to 16c in a way outlined in more detail below. This information is indicated in
The reconstructor 22 is configured to reconstruct the current time segment 16b of the information signal 18 associated with the current frame 14b based on the further information 28 using a selected one of the time-domain aliasing cancellation transform decoding mode and a time-domain decoding mode. The selection depends on the first syntax portion 24. Both decoding modes differ from each other by the presence or absence of a transition from the spectral domain back to the time-domain using a re-transform. The re-transform (along with its corresponding transform) introduces aliasing as far as the individual time segments are concerned, which aliasing is, however, compensable by a time-domain aliasing cancellation as far as the transitions at boundaries between consecutive frames coded in the time-domain aliasing cancellation transform coding mode are concerned. The time-domain decoding mode does not necessitate any re-transform. Rather, the decoding remains in the time-domain. Thus, generally speaking, the time-domain aliasing cancellation transform decoding mode of reconstructor 22 involves a re-transform being performed by reconstructor 22. This re-transform maps a first number of transform coefficients as obtained from information 28 of the current frame 14b (being of the TDAC transform decoding mode) onto a re-transformed signal segment having a sample length of a second number of samples which is greater than the first number, thereby causing aliasing. The time-domain decoding mode, in turn, may involve a linear prediction decoding mode according to which the excitation and linear prediction coefficients are reconstructed from the information 28 of the current frame which, in that case, is of the time-domain coding mode.
Thus, as became clear from the above discussion, in the time-domain aliasing cancellation transform decoding mode, reconstructor 22 obtains from information 28 a signal segment for reconstructing the information signal at the respective time segment 16b by a re-transform. The re-transformed signal segment is longer than the current time segment 16b actually is and participates in the reconstruction of the information signal 18 within a time portion which includes and extends beyond time segment 16b.
However, in cases where the previous or succeeding frame 14a or 14c is coded in the time-domain coding mode, a transition between different coding modes results at the leading or trailing edge of the current time segment 16b and, in order to account for respective aliasing, the data stream 12 comprises forward aliasing cancellation data within the respective frame immediately following the transition for enabling the decoder 10 to compensate for the aliasing occurring at this respective transition. For example, it may happen that the current frame 14b is of the time-domain aliasing cancellation transform coding mode, but decoder 10 does not know as to whether the previous frame 14a was of the time-domain coding mode. For example, frame 14a may have got lost during transmission and decoder 10 has no access thereto, accordingly. However, depending on the coding mode of frame 14a, the current frame 14b comprises forward aliasing cancellation data in order to compensate for the aliasing occurring at aliasing portion 323 or not. Similarly, if the current frame 14b was of the time-domain coding mode, and the previous frame 14a has not been received by decoder 10, then the current frame 14b has forward aliasing cancellation data incorporated into it or not depending on the mode of the previous frame 14a. In particular, if the previous frame 14a was of the other coding mode, i.e. time-domain aliasing cancellation transform coding mode, then forward aliasing cancellation data would be present in the current frame 14b in order to cancel the aliasing otherwise occurring at boundary between time segments 16a and 16b. However, if the previous frame 14a was of the same coding mode, i.e. time-domain coding mode, then parser 20 would not have to expect forward aliasing cancellation data to be present in the current frame 14b.
Accordingly, the parser 20 exploits the second syntax portion 26 in order to ascertain whether forward aliasing cancellation data 34 is present in the current frame 14b or not. In parsing the data stream 12, parser 20 may select one of a first action of expecting the current frame 14b to comprise, and thus reading, forward aliasing cancellation data 34 from the current frame 14b and a second action of not expecting the current frame 14b to comprise, and thus not reading, forward aliasing cancellation data 34 from the current frame 14b, the selection depending on the second syntax portion 26. If present, the reconstructor 22 is configured to perform forward aliasing cancellation at the boundary between the current time segment 16b and the previous time segment 16a of the previous frame 14a using the forward aliasing cancellation data.
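Merely as an illustration of this parsing decision, the following sketch shows how a parser might act on the second syntax portion 26 alone, without any knowledge of the previous frame. The field names, bit widths and the reader interface are assumptions made for the sake of the example and do not reproduce any actual standardized syntax.

```python
FAC_LENGTH = 8   # illustrative length of the FAC data; the real length depends on the frame setup

class Reader:
    """Tiny stand-in for a bitstream reader, merely so the sketch runs."""
    def __init__(self, values):
        self.values = list(values)
    def read(self):
        return self.values.pop(0)

def parse_frame(reader):
    frame = {}
    # First syntax portion 24: selects TDAC transform decoding vs. time-domain decoding.
    frame["core_mode"] = reader.read()                  # e.g. 0 = FD, 1 = LPD
    # Second syntax portion 26: signals, independently of the previous frame,
    # whether forward aliasing cancellation data 34 is contained in this frame.
    if reader.read():                                   # first action: expect and read FAC data
        frame["fac_data"] = [reader.read() for _ in range(FAC_LENGTH)]
    else:                                               # second action: no FAC data is read
        frame["fac_data"] = None
    return frame

# Even if the previous frame was lost, the parser can decide from the current
# frame alone whether FAC data has to be read:
frame = parse_frame(Reader([1, 1] + [0] * FAC_LENGTH))
print(frame["fac_data"] is not None)                    # True
```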
Thus, compared to the situation where the second syntax portion is not present, the decoder of
Before describing more detailed embodiments further below, an encoder able to generate the data stream 12 of
In the following, an embodiment is described according to which a codec, to which a decoder and an encoder of the above-described embodiments belong, supports a special type of frame structure according to which the frames 14a to 14c are themselves subject to sub-framing, and two distinct versions of the time-domain aliasing cancellation transform coding mode exist. In particular, according to these embodiments further described below, the first syntax portion 24 associates the respective frame from which same has been read with a first frame type called FD (frequency domain) coding mode in the following, or a second frame type called LPD coding mode in the following, and, if the respective frame is of the second frame type, associates sub-frames of a sub-division of the respective frame, composed of a number of sub-frames, with a respective one of a first sub-frame type and a second sub-frame type. As will be outlined in more detail below, the first sub-frame type may involve the corresponding sub-frames being TCX coded while the second sub-frame type may involve the respective sub-frames being coded using ACELP, i.e. Adaptive Codebook Excitation Linear Prediction. Alternatively, any other codebook excitation linear prediction coding mode may be used as well.
The reconstructor 22 of
Switch 50 has an input at which the information 28 of the currently decoded frame 14b enters, and a control input via which switch 50 is controllable depending on the first syntax portion 24 of the current frame. Switch 50 has two outputs, one of which is connected to the input of decoding module 54 responsible for FD decoding (FD=frequency domain), and the other one of which is connected to the input of sub-switch 52, which also has two outputs, one of which is connected to an input of decoding module 56 responsible for transform coded excitation linear prediction decoding, and the other one of which is connected to an input of module 58 responsible for codebook excitation linear prediction decoding. All decoding modules 54 to 58 output signal segments reconstructing the respective time segments associated with the respective frames and sub-frames from which these signal segments have been derived by the respective decoding mode, and a transition handler 60 receives the signal segments at respective inputs thereof in order to perform the transition handling and aliasing cancellation described above and in more detail below, and in order to output the reconstructed information signal at its output. Transition handler 60 uses the forward aliasing cancellation data 34 as illustrated in
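The routing performed by switch 50 and sub-switch 52 may be pictured by the following minimal sketch. The function names and the layout of the first syntax portion are purely illustrative assumptions, and the decoding modules are replaced by placeholders.

```python
def fd_decode(info):    return ("FD segment", info)      # placeholder for module 54
def tcx_decode(info):   return ("TCX segment", info)     # placeholder for module 56
def acelp_decode(info): return ("ACELP segment", info)   # placeholder for module 58

def route_frame(first_syntax_portion, info):
    """Dispatch the information 28 of a frame the way switch 50 / sub-switch 52 do."""
    if first_syntax_portion["frame_type"] == "FD":
        return [fd_decode(info)]                         # switch 50, FD output
    segments = []                                        # LPD frame: follow the sub-framing structure
    for sub_type, sub_info in zip(first_syntax_portion["sub_frame_types"], info):
        if sub_type == "TCX":
            segments.append(tcx_decode(sub_info))        # sub-switch 52, TCX output
        else:
            segments.append(acelp_decode(sub_info))      # sub-switch 52, ACELP output
    return segments                                      # handed over to transition handler 60

print(route_frame({"frame_type": "LPD", "sub_frame_types": ["TCX", "ACELP"]},
                  ["tcx payload", "acelp payload"]))
```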
According to the embodiment of
The reconstructed signal segments output by modules 54 to 58 are put together by transition handler 60 in the correct (presentation) time order with performing the respective transition handling and overlap-add and time-domain aliasing cancellation processing as described above and described in more detail below.
In particular, the FD decoding module 54 may be constructed as shown in
Re-transformer 72 then performs a re-transform on the de-quantized transform coefficient information to obtain a re-transformed signal segment 78 extending, in time, over and beyond the time segment 16b associated with the current frame 14b. As will be outlined in more detail below, the re-transform performed by re-transformer 72 may be an IMDCT (Inverse Modified Discrete Cosine Transform) involving a DCT IV followed by an unfolding operation, whereafter a windowing is performed using a re-transform window which might be equal to, or deviate from, the transform window used in generating the transform coefficient information 74. The transform coefficient information 74, in turn, has been generated by performing the afore-mentioned steps in the inverse order, namely a windowing followed by a folding operation, followed by a DCT IV, followed by the quantization, which may be steered by psychoacoustic principles in order to keep the quantization noise below the masking threshold.
It is worthwhile to note that the amount of transform coefficient information 28 is, due to the TDAC nature of the re-transform of re-transformer 72, lower than the number of samples over which the reconstructed signal segment 78 extends. In case of the IMDCT, the number of transform coefficients within information 74 is rather equal to the number of samples of time segment 16b. That is, the underlying transform may be called a critically sampled transform necessitating time-domain aliasing cancellation in order to cancel the aliasing occurring due to the transform at the boundaries, i.e. the leading and trailing edges, of the current time segment 16b.
As a minor note it should be mentioned that, similar to the sub-frame structure of LPD frames, the FD frames could be the subject of a sub-framing structure, too. For example, FD frames could be of a long window mode in which a single window is used to window a signal portion extending beyond the leading and trailing edges of the current time segment in order to code the respective time segment, or of a short window mode in which the respective signal portion extending beyond the borders of the current time segment of the FD frame is sub-divided into smaller sub-portions, each of which is subject to a respective windowing and transform individually. In that case, FD decoding module 54 would output a re-transformed signal segment for each sub-portion of the current time segment 16b.
After having described a possible implementation of the FD coding module 54, a possible implementation of the TCX LP decoding module and the codebook excitation LP decoding module 56 and 58, respectively, is described with respect to
In order to deal with the TCX sub-frames, the TCX LP decoding module 56 comprises a spectral weighting derivator 94, a spectral weighter 96 and a re-transformer 98. For illustration purposes, the first sub-frame 90a is shown to be a TCX sub-frame, whereas the second sub-frame 90b is assumed to be an ACELP sub-frame.
In order to process the TCX sub-frame 90a, derivator 94 derives a spectral weighting filter from LPC information 104 within information 28 of the current frame 14b, and spectral weighter 96 spectrally weights the transform coefficient information within the respective sub-frame 90a using the spectral weighting filter received from derivator 94 as shown by arrow 106.
Re-transformer 98, in turn, re-transforms the spectrally weighted transform coefficient information to obtain a re-transformed signal segment 108 extending, in time t, over and beyond the sub-portion 92a of the current time segment. The re-transform performed by re-transformer 98 may be the same as the one performed by re-transformer 72. In effect, re-transformers 72 and 98 may have hardware, a software routine or a programmable hardware portion in common.
The LPC information 104 comprised by the information 28 of the current LPD frame 14b may represent LPC coefficients for one time instant within time segment 16b or for several time instants within time segment 16b, such as one set of LPC coefficients for each sub-portion 92a to 92c. The spectral weighting filter derivator 94 converts the LPC coefficients into spectral weighting factors spectrally weighting the transform coefficients within information 90a according to a transfer function which is derived from the LPC coefficients by derivator 94 such that same substantially approximates the LPC synthesis filter or some modified version thereof. Any de-quantization performed beyond the spectral weighting by weighter 96 may be spectrally invariant. Thus, differing from the FD decoding mode, the quantization noise according to the TCX coding mode is spectrally shaped using LPC analysis.
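One conceivable way of turning the LPC coefficients into per-bin spectral weighting factors, as performed by derivator 94, is to evaluate the magnitude response of the LPC synthesis filter 1/A(z) on a frequency grid matching the transform bins. The following sketch only shows this basic principle; an actual codec may operate on weighted and/or interpolated LPC sets, so the function name, grid and example values are assumptions.

```python
import numpy as np

def lpc_to_spectral_weights(lpc, num_bins):
    """Approximate the LPC synthesis filter magnitude 1/|A(e^jw)| on num_bins
    equally spaced frequencies (a sketch of the principle of derivator 94)."""
    a = np.concatenate(([1.0], lpc))                     # A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    spectrum = np.fft.rfft(a, 2 * num_bins)[:num_bins]   # A evaluated on the upper half of the unit circle
    return 1.0 / np.maximum(np.abs(spectrum), 1e-9)      # invert to obtain synthesis-like weights

# Example: weight de-quantized TCX transform coefficients (values are illustrative).
lpc = np.array([-1.6, 0.64])                             # simple 2nd-order all-pole model
weighted_bins = np.ones(32) * lpc_to_spectral_weights(lpc, 32)
```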
Due to the use of the re-transform, however, the re-transformed signal segment 108 suffers from aliasing. By using the same re-transform, however, the re-transformed signal segments 78 and 108 of consecutive frames and sub-frames, respectively, may have their aliasing cancelled out by transition handler 60 merely by adding the overlapping portions thereof.
In processing the (A)CELP sub-frames 90b, the excitation signal derivator 100 derives an excitation signal from excitation update information within the respective sub-frame 90b and the LPC synthesis filter 102 performs LPC synthesis filtering on the excitation signal using the LPC information 104 in order to obtain an LP synthesized signal segment 110 for the sub-portion 92b of the current time segment 16b.
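The LPC synthesis filtering of block 102 amounts to the usual all-pole recursion driven by the decoded excitation. The following minimal sketch illustrates this; the excitation and coefficient values as well as the function name are chosen purely for illustration.

```python
import numpy as np

def lpc_synthesis(excitation, lpc, memory=None):
    """Run the excitation through 1/A(z), as performed by LPC synthesis filter 102.
    lpc holds a1..ap of A(z); memory holds the last p output samples of the
    preceding segment, most recent first (zeros if not given)."""
    p = len(lpc)
    state = np.zeros(p) if memory is None else np.array(memory, float)
    out = np.empty(len(excitation))
    for n, e in enumerate(excitation):
        y = e - np.dot(lpc, state)                 # y[n] = e[n] - sum_k a_k * y[n-k]
        out[n] = y
        state = np.concatenate(([y], state[:-1]))  # shift the filter memory
    return out

# Example: a sparse, codebook-like excitation synthesized into a signal segment.
excitation = np.zeros(64)
excitation[::16] = 1.0
segment_110 = lpc_synthesis(excitation, lpc=np.array([-1.6, 0.64]))
```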
Derivators 94 and 100 may be configured to perform some interpolation in order to adapt the LPC information 104 within the current frame 14b to the varying position of the current sub-frame corresponding to the current sub-portion within the current time segment 16b.
Commonly describing
However, the situation changes whenever an FD frame or TCX sub-frame (both representing a transform coding mode variant) precedes an ACELP sub-frame (representing a form of time-domain coding mode). In that case, transition handler 60 derives a first forward aliasing cancellation synthesis signal from the forward aliasing cancellation data of the current frame and adds the first forward aliasing cancellation synthesis signal to the re-transformed signal segment 108 or 78 of the immediately preceding time segment to reconstruct the information signal across the respective boundary. If the boundary falls into the inner of the current time segment 16b because a TCX sub-frame and an ACELP sub-frame within the current frame define the boundary between the associated time segment sub-portions, transition handler 60 may ascertain the existence of the respective forward aliasing cancellation data for these transitions from the first syntax portion 24 and the sub-framing structure defined therein. The syntax portion 26 is not needed. The previous frame 14a may have got lost or not.
However, in case of the boundary coinciding with the boundary between consecutive time segments 16a and 16b, parser 20 has to inspect the second syntax portion 26 within the current frame in order to determine whether the current frame 14b has forward aliasing cancellation data 34, the FAC data 34 being for cancelling aliasing occurring at the leading end of the current time segment 16b, because either the previous frame is an FD frame or the last sub-frame of the preceding LPD frame is a TCX sub-frame. At least, parser 20 needs to know syntax portion 26 in case the content of the previous frame got lost.
Similar statements apply for transitions into the other direction, i.e. from ACELP sub-frames to FD frames or TCX frames. As long as the respective boundaries between the respective segments and segment sub-portions fall within the inner of the current time segment, the parser 20 has no problem in determining the existence of the forward aliasing cancellation data 34 for these transitions from the current frame 14b itself, namely from the first syntax portion 24. The second syntax portion is not needed and is even irrelevant. However, if the boundary occurs at, or coincides with, a boundary between the previous time segment 16a and the current time segment 16b, parser 20 needs to inspect the second syntax portion 26 in order to determine as to whether forward aliasing cancellation data 34 is present for the transition at the leading end of the current time segment 16b or not—at least in case of having no access to the previous frame.
In case of transitions from ACELP to FD or TCX, the transition handler 60 derives a second forward aliasing cancellation synthesis signal from the forward aliasing cancellation data 34 and adds the second forward aliasing cancellation synthesis signal to the re-transformed signal segment within the current time segment in order to reconstruct the information signal across the boundary.
After having described embodiments with regard to
Window switching in USAC has several purposes. It mixes FD frames, i.e. frames encoded with frequency-domain coding, and LPD frames which are, in turn, structured into ACELP (sub-)frames and TCX (sub-)frames. ACELP frames (time-domain coding) apply a rectangular, non-overlapping windowing to the input samples while TCX frames (frequency-domain coding) apply a non-rectangular, overlapping windowing to the input samples and then encode the signal using a time-domain aliasing cancellation (TDAC) transform, namely the MDCT, for example. To harmonize the overall windows, TCX frames may use centered windows with homogeneous shapes, and to manage the transitions at ACELP frame boundaries, explicit information for cancelling the time-domain aliasing and windowing effects of the harmonized TCX windows is transmitted. This additional information can be seen as forward aliasing cancellation (FAC). FAC data is quantized in the following embodiment in the LPC weighted domain so that the quantization noise of the FAC and of the decoded MDCT are of the same nature.
Thus, the vertical dotted lines in
Line 1 of
Note, however, that the transitions at LPC1 and LPC2 in
Line 2 of
To obtain line 3 of
The further processing at the encoder side regarding frame 120 is explained in the following with respect to line 3 of
The first contribution 130 is a windowed and time-reversed (or folded) version of the last ACELP synthesis samples, i.e. the last samples of signal segment 110 shown in
The second contribution 132 is a windowed zero-input response (ZIR) of the LPC1 synthesis filter with the initial states taken as the final states of this filter at the end of the ACELP synthesis 110, i.e. at the end of frame 122. The window length and shape of this second contribution may be the same as for the first contribution 130.
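The second contribution 132 may be computed as sketched below: the synthesis filter is run with zero input, starting from its final states at the end of the ACELP synthesis, and the result is windowed. The window shape and length, the ordering of the filter states and the example values are assumptions made for this sketch.

```python
import numpy as np

def windowed_zir(lpc, final_states, length):
    """Zero-input response of the LPC1 synthesis filter, started from its final
    states at the end of the ACELP synthesis 110, then windowed (contribution 132)."""
    state = np.array(final_states, float)            # [y[-1], y[-2], ..., y[-p]], most recent first
    zir = np.empty(length)
    for n in range(length):
        y = -np.dot(lpc, state)                      # zero input, only the recursive part remains
        zir[n] = y
        state = np.concatenate(([y], state[:-1]))
    window = np.cos(0.5 * np.pi * np.arange(length) / length) ** 2   # decaying half-window (assumed shape)
    return zir * window

contribution_132 = windowed_zir(np.array([-1.6, 0.64]), final_states=[0.5, 0.3], length=32)
```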
With new line 3 in
Note that if the decoder were to use only the synthesis signals of line 3 in
Before proceeding to describe the encoding process in order to obtain the forward aliasing cancellation data, reference is made to
In transitioning from the time-domain to the transform-domain, the TDAC transform involves a windowing 150 applied to an interval 152 of the signal to be transformed which extends beyond the time segment 154 for which the later resulting transform coefficients are actually transmitted within the data stream. The window applied in the windowing 150 is shown in
A re-transform does the reverse. That is, following a de-quantization 164, an IMDCT 166 is performed involving, firstly, a DCT−1 IV 167 so as to obtain time samples the number of which equals the number of samples of the time segment 154 to be reconstructed. Thereafter, an unfolding process 168 is performed on the inversely transformed signal portion received from module 167, thereby expanding the time interval or the number of time samples of the IMDCT result by doubling the length of the aliasing portions. Then, a windowing is performed at 170, using a re-transform window 172 which may be the same as the one used by windowing 150, but may also be different. The remaining blocks in
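The effect of this transform chain can be reproduced with the small sketch below. Instead of the windowing/folding/DCT IV factorization described above, it evaluates the MDCT and IMDCT directly from their definitions, which is mathematically equivalent; it then checks that each re-transformed segment is aliased on its own while the aliasing cancels in the overlap-add of two consecutive segments. The sine window, segment length and random test signal are chosen merely for the demonstration.

```python
import numpy as np

def mdct(x, window):
    """2N windowed samples in, N transform coefficients out."""
    N = len(x) // 2
    n, k = np.arange(2 * N), np.arange(N)
    phases = np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5)
    return (window * x) @ np.cos(phases)

def imdct(coeffs, window):
    """N coefficients in, 2N windowed (and aliased) samples out."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    phases = np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5)
    return window * (2.0 / N) * (np.cos(phases) @ coeffs)

N = 8
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # satisfies w[n]^2 + w[n+N]^2 = 1
signal = np.random.randn(3 * N)

seg_a = imdct(mdct(signal[0:2 * N], window), window)          # each segment alone is aliased
seg_b = imdct(mdct(signal[N:3 * N], window), window)
overlap = seg_a[N:] + seg_b[:N]                               # time-domain aliasing cancellation
assert np.allclose(overlap, signal[N:2 * N])                  # perfect reconstruction in the overlap
```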
The description of
To compensate for the windowing and time-domain aliasing effects around marker LPC1, the processing is described in
Now, the processing for the windowing and time-domain aliasing correction at the end of the TC frame 120 (before marker LPC2) is described. To this end, reference is made to
The error signal at the end of the TC frame 120 on line 4 in
The processing in
Note again, the FAC data 34 may relate to such a transition occurring inside the current time segment, in which case the existence of the FAC data 34 is derivable for parser 20 solely from syntax portion 24, whereas parser 20 needs to, in case of the previous frame having got lost, exploit the syntax portion 26 in order to determine whether FAC data 34 exists for such transitions at the leading edge of the current time segment 16b.
1. One step is to decode the MDCT-encoded TC frame and position the thus obtained time-domain signal between markers LPC1 and LPC2 as shown in line 2 of
2. Another step in the processing of the transition handler 60 is the generation of the FAC synthesis signal according to
3. As far as the situation of
4. The contributions of lines 1, 2 and 3 of
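Summarizing the steps above, the last step may be pictured as a simple sample-wise summation over the beginning of the TC frame; the alignment and lengths of the individual contributions are assumptions of this sketch and would, in an actual implementation, follow the figure.

```python
import numpy as np

def reconstruct_boundary(tc_segment, fac_synthesis, folded_acelp_tail, windowed_zir):
    """Sketch of the final summation for an ACELP -> TC transition: add the FAC
    synthesis signal and the two ACELP-derived contributions 130/132 onto the
    first samples of the decoded TC segment."""
    out = np.array(tc_segment, float)
    n = len(fac_synthesis)
    out[:n] += fac_synthesis + folded_acelp_tail + windowed_zir
    return out

# Illustrative call with dummy signals of matching lengths.
out = reconstruct_boundary(np.zeros(128), np.ones(16), np.zeros(16), np.zeros(16))
```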
Thus,
In the following, specific possibilities will be mentioned as to how the second syntax portion 26 may be implemented.
For example, in order to handle the occurrence of lost frames, the syntax portion 26 may be embodied as a 2-bit field prev_mode that signals within the current frame 14b explicitly the coding mode that was applied in the previous frame 14a according to the following table:
In other words, this 2-bit field may be called prev_mode and may thus indicate a coding mode of the previous frame 14a. In case of the just-mentioned example, four different states are differentiated, namely:
1) The previous frame 14a is an LPD frame, the last sub-frame of which is an ACELP sub-frame;
2) the previous frame 14a is an LPD frame, the last sub-frame of which is a TCX coded sub-frame;
3) the previous frame is an FD frame using a long transform window and
4) the previous frame is an FD frame using short transform windows.
The possibility of potentially using different window lengths in the FD coding mode has already been mentioned above with respect to the description of
In any case, based on the above-outlined 2-bit field, the parser 20 is able to decide as to whether FAC data for the transition between the current time segment and the previous time segment 16a is present within the current frame 14b or not. As will be outlined in more detail below, parser 20 and reconstructor 22 are even able to determine based on prev_mode as to whether the previous frame 14a has been an FD frame using a long window (FD_long) or as to whether the previous frame has been an FD frame using short windows (FD_short) and as to whether the current frame 14b (if the current frame is an LPD frame) succeeds an FD frame or an LPD frame which differentiation is needed according to the following embodiment in order to correctly parse the data stream and reconstruct the information signal, respectively.
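Merely to illustrate how a parser might act on such a 2-bit field, the following sketch maps the four states onto the decision of whether FAC data 34 for the leading frame boundary has to be read. The concrete bit assignments, constant names and function names are assumptions and do not reproduce the actual table.

```python
# Illustrative encoding of the four prev_mode states (bit values are assumptions).
PREV_ACELP, PREV_TCX, PREV_FD_LONG, PREV_FD_SHORT = 0, 1, 2, 3

def fac_data_expected(current_is_lpd, first_subframe_is_acelp, prev_mode):
    """FAC data is needed whenever the coding mode changes across the leading
    frame boundary (TDAC transform coding on one side, time-domain coding on the other)."""
    if current_is_lpd:
        if first_subframe_is_acelp:
            # Time-domain start of the current frame: FAC only after a transform-coded frame.
            return prev_mode in (PREV_TCX, PREV_FD_LONG, PREV_FD_SHORT)
        # Transform-coded (TCX) start of the current frame: FAC only after ACELP.
        return prev_mode == PREV_ACELP
    # Current frame is an FD frame: FAC only if the previous frame ended with ACELP.
    return prev_mode == PREV_ACELP

print(fac_data_expected(current_is_lpd=True, first_subframe_is_acelp=True,
                        prev_mode=PREV_FD_LONG))          # True
```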
Thus, in accordance with the just-mentioned possibility of using a 2-bit identifier as the syntax portion 26, each frame 14a to 14c would be provided with an additional 2-bit identifier in addition to the syntax portion 24 which defines the coding mode of the current frame to be a FD or LPD coding mode and the sub-framing structure in case of LPD coding mode.
For all of the above embodiments, it should be mentioned that other inter-frame dependencies should be avoided as well. For example, the decoder of
It is worthwhile to note for all the above-described embodiments that the parser 20 could be configured to buffer at least the currently decoded frame 14b within a buffer, passing all the frames 14a to 14c through this buffer in a FIFO (first in, first out) manner. In buffering, parser 20 could perform the removal of frames from this buffer in units of frames 14a to 14c. That is, the filling and removal of the buffer of parser 20 could be performed in units of frames 14a to 14c so as to obey the constraints imposed by the maximally available buffer space which, for example, accommodates merely one, or more than one, frame of maximum size at a time.
An alternative signaling possibility for syntax portion 26 with reduced bit consumption will be described next. According to this alternative, a different construction of the syntax portion 26 is used. In the embodiment described before, the syntax portion 26 was a 2-bit field which is transmitted in every frame 14a to 14c of the encoded USAC data stream. Since, for the FD part, it is only important for the decoder to know whether it has to read FAC data from the bit stream in case the previous frame 14a was lost, these two bits can be divided into two 1-bit flags, one of which is signaled within every frame 14a to 14c as fac_data_present. This bit may be introduced in the single_channel_element and channel_pair_element structures accordingly, as shown in the tables of
The other 1-bit flag prev_frame_was_lpd is then only transmitted in the current frame if same was encoded using the LPD part of USAC, and signals whether the previous frame was encoded using the LPD path of the USAC as well. This is shown in the table of
The table of
If, however, the current frame is an LPD frame with the preceding frame also being an LPD frame, i.e. if a transition between TCX and CELP sub-frames occurs between the current frame and the previous frame, FAC data is read at 206 without the gain adjustability option, i.e. without the FAC data 34 including the FAC gain syntax element fac_gain. Further, the position of the FAC data read at 206 differs from the position at which FAC data is read at 202 in case of the current frame being an LPD frame and the previous frame being an FD frame. While the reading 202 occurs at the end of the current LPD frame, the reading of the FAC data at 206 occurs before the reading of the sub-frame specific data, i.e. the ACELP or TCX data depending on the modes of the sub-frames of the sub-frame structure, at 208 and 210, respectively.
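The reading order just described may be summarized by the following sketch. The flag names are taken from the text, whereas the reader interface, the payload layout and the placeholder return values are assumptions introduced only to make the sketch self-contained.

```python
class StubReader:
    """Stand-in for a bitstream reader, merely so the sketch runs."""
    def __init__(self, bits): self.bits = list(bits)
    def read_bit(self): return self.bits.pop(0)
    def read_fac_data(self, with_gain): return {"fac_gain": 1.0 if with_gain else None, "coeffs": []}
    def read_subframe_data(self): return ["ACELP or TCX payload"]

def read_lpd_channel_stream(reader, fac_data_present):
    """fac_data_present is assumed to have been read from the enclosing channel element."""
    frame = {"mode": "LPD"}
    prev_frame_was_lpd = reader.read_bit()               # transmitted in LPD frames only
    if fac_data_present and prev_frame_was_lpd:
        # TCX/ACELP change at the frame boundary: FAC data without fac_gain,
        # read before the sub-frame specific data (reading 206).
        frame["fac"] = reader.read_fac_data(with_gain=False)
    frame["subframes"] = reader.read_subframe_data()
    if fac_data_present and not prev_frame_was_lpd:
        # Previous frame was an FD frame: FAC data, here assumed to include
        # fac_gain, read at the end of the LPD frame (reading 202).
        frame["fac"] = reader.read_fac_data(with_gain=True)
    return frame

frame = read_lpd_channel_stream(StubReader([0]), fac_data_present=True)
```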
In the example of
For completeness only, the syntax structure of the LPD frame according to
For sake of completeness only,
Thus, the 1-bit flag prev_frame_was_lpd is only transmitted if the current frame is encoded using the LPD part of USAC and signals whether the previous frame was encoded using the LPD path of the USAC codec (see Syntax of lpd_channel_stream( ) in
Regarding the embodiment of
1) The FAC data 34 mentioned in the previous figures was primarily meant to denote the FAC data present in the current frame 14b in order to enable forward aliasing cancellation at the transition between the previous frame 14a and the current frame 14b, i.e. between the corresponding time segments 16a and 16b. However, further FAC data may be present. This additional FAC data, however, deals with the transitions between TCX coded sub-frames and CELP coded sub-frames positioned internally to the current frame 14b in case the same is of the LPD mode. The presence or absence of this additional FAC data is independent of the syntax portion 26. In
2) Further, the syntax portion 26 may be composed of more than one syntax element, as described above. The flag fac_data_present indicates whether fac_data for the boundary between the previous frame and the current frame is present or not. This flag is present in LPD frames as well as FD frames. A further flag, in the above embodiment called prev_frame_was_lpd, is transmitted in LPD frames only in order to denote whether the previous frame 14a was of the LPD mode or not. In other words, this second flag included in the syntax portion 26 indicates whether the previous frame 14a was an FD frame. The parser 20 expects and reads this flag merely in case of the current frame being an LPD frame. In
3) By dividing up the second syntax portion 26 into the just-mentioned three flags, it is possible to transmit merely one flag or bit to signal the second syntax portion 26 in case of the current frame being an FD frame, and merely two flags or bits in case of the current frame being an LPD frame and the previous frame being an LPD frame, too. Merely in case of a transition from an FD frame to a current LPD frame, a third flag has to be transmitted in the current frame. Alternatively, as stated above, the second syntax portion 26 may be a 2-bit indicator transmitted for every frame and indicating the mode of the frame preceding this frame to the extent needed for the parser to decide whether FAC data 34 has to be read from the current frame or not, and if so, from where and how long the FAC synthesis signal is. That is, the specific embodiment of
A syntax portion 26 could also merely have three different possible values in case FD frames use only one possible window length.
A slightly differing, but very similar, syntax structure to that described above with respect to FIGS. 15 to 19 is shown in
With regard to the embodiments described with respect to
With regard to the embodiments described with respect to
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2011/061521, filed Jul. 7, 2011, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Patent Application No. 61/362,547, filed Jul. 8, 2010 and U.S. Patent Application No. 61/372,347, filed Aug. 10, 2010, all of which are incorporated herein by reference in their entirety.