This application claims priority under 35 U.S.C. 119 to German Application No. 10102154.2, filed Jan. 18, 2001, and PCT Application PCT/EP02/00295, the disclosure of each of which is expressly incorporated herein by reference in its entirety.
The present invention relates to scalable encoders and decoders and in particular to the generation of scalable data streams through which a bit savings bank may be signalized.
Scalable encoders are shown in EP 0 846 375 B1. In general, scalability is understood as the possibility of decoding a partial section of a bit stream representing an encoded data signal, e.g. an audio signal or a video signal into a useful signal. This property is particularly desirable when e.g. a data transmission channel fails to provide the complete bandwidth necessary for transmitting a complete bit stream. On the other hand, an incomplete decoding is possible on a decoder with reduced complexity. Generally, different discrete scalability layers are defined in practice.
An example of a scalable encoder as defined in Subpart 4 (General Audio) of Part 3 (Audio) of the MPEG-4 Standard (ISO/IEC 14496-3; 1999 Subpart 4) is shown in
The scalable audio encoder further includes some further elements. First, there exists a delay stage 24 in the AAC branch and a delay stage 26 in the Celp branch. With both delay stages it is possible to set an optional delay for the respective branch. A downsampling stage 28 is downstream of the delay stage 26 of the Celp branch to adjust the sampling rate of the input signal s(t) to the sampling rate requested by the Celp encoder. An inverse Celp decoder 30 is downstream to the Celp encoder 12, wherein the Celp encoded/decoded signal is then supplied to an upsampling stage 32. The upsampled signal is then supplied to a further delay stage 34, which is termed “Core Coder Delay” in the MPEG-4 Standard.
The stage CoreCoderDelay 34 has the following function. If the delay is set to zero, the first encoder 14 and the second encoder 12 process exactly the same samples of the audio input signal in a so-called superframe. A superframe might e.g. consist of three AAC frames, which together represent a certain number of samples No. x to No. y of the audio signal. The superframe further includes e.g. 8 CELP blocks, which represent the same number of samples and also the same samples No. x to No. y if CoreCoderDelay=0.
If, however, a CoreCoderDelay D is set as a time value other than zero, the three blocks of AAC frames nevertheless represent the same samples No. x to No. y. The eight blocks of CELP frames, in contrast, represent the samples No. x − Fs·D to No. y − Fs·D, wherein Fs is the sampling frequency of the input signal.
The current time sections of the input signal in a superframe for the AAC blocks and the CELP blocks can thus be either identical, when CoreCoderDelay D=0, or be shifted relative to each other by CoreCoderDelay, when D is not equal to zero. For the following implementations, however, it will be assumed, on the grounds of simplicity and without restriction of generality, that CoreCoderDelay=0, so that the current time section of the input signal for the first encoder and the current time section for the second encoder are identical. In general, however, the only requirement for a superframe is, that the AAC block(s) and the CELP block(s) in a superframe represent the same number of samples, wherein it is not necessary for the samples themselves to be identical to one another, but they may also be shifted relative to each other by CoreCoderDelay.
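The relation between the two sample ranges of a superframe can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name and its parameters are hypothetical, the delay D is taken in seconds, and the product Fs·D is rounded to a whole number of samples.

```python
def superframe_sample_ranges(x, y, fs_hz, core_coder_delay_s):
    """Return the sample ranges covered by the AAC frames and the CELP blocks.

    x, y: first and last sample number represented by the AAC frames.
    fs_hz: sampling frequency Fs of the input signal.
    core_coder_delay_s: CoreCoderDelay D as a time value (assumed seconds).
    """
    # The shift in samples is Fs * D (rounded here, which is an assumption).
    shift = round(fs_hz * core_coder_delay_s)
    aac_range = (x, y)
    celp_range = (x - shift, y - shift)
    return aac_range, celp_range


# With D = 0 both encoders see identical samples; otherwise the CELP range
# is shifted, but both ranges still contain the same number of samples.
same = superframe_sample_ranges(3072, 6143, 48000, 0.0)
shifted = superframe_sample_ranges(3072, 6143, 48000, 0.01)
```

In both cases the two ranges have the same length, which is the only general requirement for a superframe stated above.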
It should be noted that the Celp encoder, depending on the configuration, may process a section of the input signal s(t) faster than the AAC encoder 14. In the AAC branch, a block decision stage 26 is downstream of the optional delay stage 24 and establishes, among other things, whether short or long windows should be used for windowing the input signal s(t). Short windows must be chosen for strongly transient signals, while long windows are preferred for less transient signals, since the ratio of payload data to side information is better than for short windows.
The block decision stage 26 introduces a fixed delay of e.g. ⅝ of a block in the present example. This is referred to as a look-ahead function in the art. The block decision stage must look ahead a certain time to be able to determine whether there are transient signals in the future that must be encoded with short windows. After that, the corresponding signal in the Celp branch as well as the signal in the AAC branch are fed to means for converting the time-domain representation to a spectral representation, which is designated as MDCT 36 or 38, respectively, in
At this point, samples belonging together regarding time must be present, i.e. the delay must be identical in both branches.
The following block 44 determines whether it is more favorable to supply the input signal itself to the AAC encoder 14. This is enabled via the bypass branch 42. If it is determined, however, that the differential signal at the output of the subtracter 40 is smaller regarding energy than the signal output by the MDCT block 38, then not the original signal but the differential signal is taken to be encoded by the AAC encoder 14 to finally form the second scaling layer 18. This comparison may be performed band by band, which is indicated by frequency-selective switching means (FSS) 44. The exact functions of the individual elements are known in the art and are described for example in the MPEG-4 standard as well as in further MPEG standards.
One main feature in the MPEG-4 standard and in other encoder standards, respectively, is that the transmission of the compressed data signal is to be performed with a constant bit rate via a channel. All high-quality audio codecs operate based on blocks, i.e. they process blocks of audio data (on the order of 480 to 1024 samples) into pieces of a compressed bit stream, which are also referred to as frames. The bit stream format must be set up so that a decoder without a priori information about where a frame starts is able to recognize the beginning of a frame in order to start the output of decoded audio signal data with the lowest possible delay. Thus, each header or determining data block of a frame starts with a certain synchronization word which may be searched for in a continuous bit stream. Further common components within the data stream apart from the determining data block are the main data or “payload data” of the individual layers in which the actual compressed audio data is contained.
Generally, the bit savings bank represents a buffer of bits which may be used to provide more bits for encoding a block of time samples than is actually allowed by the constant output data rate. The technology of the bit savings bank takes into account that some blocks of audio samples may be encoded with fewer bits than predetermined by the constant transmission rate, so that the bit savings bank is filled by these blocks, while other blocks of audio samples have psychoacoustic characteristics which do not allow such a high compression, so that for these blocks the available bits would actually not be enough for a low-interference or interference-free encoding, respectively.
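The search for a synchronization word in a continuous bit stream can be sketched as follows. This is a minimal illustration, not the procedure of any particular standard: the two-byte, byte-aligned value `SYNC_WORD` and the function name are hypothetical, and real formats define their own synchronization patterns, which need not be byte-aligned.

```python
SYNC_WORD = b"\xff\xf1"  # hypothetical byte-aligned synchronization word


def find_frame_starts(stream, sync_word=SYNC_WORD):
    """Return every byte offset at which the synchronization word occurs."""
    starts = []
    pos = stream.find(sync_word)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(sync_word, pos + 1)
    return starts


# A decoder joining mid-stream skips bytes until the first candidate offset.
data = b"\x00\x01" + SYNC_WORD + b"abc" + SYNC_WORD + b"de"
candidates = find_frame_starts(data)
```

The offsets found this way are only candidates, since payload data may happen to contain the same pattern; a practical decoder would typically confirm a candidate, for example by checking that the next synchronization word follows at the expected distance.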
The additional bits needed are taken from the bit savings bank so that the bit savings bank is emptied with such blocks.
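The filling and emptying of the bit savings bank can be followed in a small simulation. It is a simplified sketch under assumptions: the function and parameter names are hypothetical, bit demands are given per block, and savings that exceed the maximum size of the bank are simply lost.

```python
def simulate_bit_savings_bank(demands, bits_per_frame, max_bank):
    """Track a bit savings bank over a sequence of blocks.

    demands: number of bits each block would ideally need.
    bits_per_frame: bits per block allowed by the constant output data rate.
    max_bank: maximum size of the bit savings bank in bits.
    Returns (granted, levels): bits actually granted per block and the
    level of the bank after each block.
    """
    bank = 0
    granted, levels = [], []
    for need in demands:
        available = bits_per_frame + bank      # constant rate plus savings
        used = min(need, available)            # a block cannot overdraw the bank
        bank = min(available - used, max_bank)  # unused bits refill the bank
        granted.append(used)
        levels.append(bank)
    return granted, levels


granted, levels = simulate_bit_savings_bank(
    [800, 600, 1400, 1300], bits_per_frame=1000, max_bank=500
)
```

In this run the first two blocks need fewer bits than the constant rate provides and fill the bank, the third block draws 400 bits from it, and the fourth block can only be granted 1100 of the 1300 bits it asks for because the bank is then empty.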
Such an audio signal may, however, be also transmitted by a format with a variable frame length, as it is shown in
It is to be noted that the above-mentioned encoders are not scalable encoders but include only one single audio encoder.
In MPEG 4 the combination of different encoder/decoders to a scalable encoder/decoder is provided. It is therefore possible and sensible to combine one CELP voice encoder as the first encoder with an AAC encoder for the further scaling layer(s) and pack the same into one bit stream. The purpose of this combination is that the possibility remains open either to decode all scaling layers and therefore reach a best possible audio quality, or parts of the same, maybe even only the first scaling layer, with the correspondingly restricted audio quality. Reasons for only decoding the lowest scaling layer may be that due to a bandwidth of the transmission channel which is too small, the decoder only received the first scaling layer of the bit stream. Because of this the parts of the first scaling layer in the bit stream are favored over the second and the further scaling layers in the transmission, whereby the transmission of the first scaling layer is guaranteed with capacity bottlenecks in the transmission network, while the second scaling layer may be lost completely or in part.
A further reason may be that a decoder wants to achieve the lowest possible codec delay and therefore decodes only the first scaling layer. It is to be noted that the codec delay of a Celp codec is generally significantly smaller than the delay of the AAC codec.
In MPEG 4 version 2 the transport format LATM is standardized, which may among other things also transmit scalable data streams.
In the following, reference is made to
One superframe may comprise several ratios of number of AAC frames to number of CELP frames, as it is illustrated in tabular form in MPEG 4. Thus, a superframe may for example comprise one AAC block and 1 to 12 CELP blocks, or 3 AAC blocks and 8 CELP blocks, but also, for example, more AAC blocks than CELP blocks, depending on the configuration. An LATM frame which comprises an LATM determining data block includes a superframe or also several superframes.
The generation of the LATM frame opened by the header 1 is described as an example. First, the output data blocks 11, 12, 13, 14 of the Celp encoder 12 (
A disadvantage of the known bit stream formats illustrated in
A further disadvantage of the known bit stream formats is that no bit stream format exists for a scalable data stream, so that the bit savings bank function currently cannot be made usable for scalable data streams with output data of encoders having a different time basis, in particular for the combination of AAC encoders and celp encoders of a scalable encoding device. Since a constant transmission rate is required, while the AAC encoder outputs blocks of a different length depending on the characteristics of the encoded signal, the case may well occur that the AAC encoder requires more bits for the encoding of one section of the time signal than predetermined by the transmission rate, while it requires fewer bits for a different section than predetermined by the output data rate. Thus, the AAC encoder of the scalable encoding device will run out of bits in the latter case, while in the first case the AAC encoder of the scalable encoding device will not be able to avoid introducing audible interference into the encoded and again decoded signal in order to maintain the constant output data rate.
It is the object of the present invention to provide a method and a device for generating a scalable data stream suitable for the use of a bit savings bank function for a scaling layer, and to provide a method and a device for decoding a scalable data stream.
In accordance with a first aspect of the invention, this object is achieved by a method for generating a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, wherein the one or the several blocks of output data of the first encoder together represent a number of samples of the input signal for the first encoder forming the current section of the input signal for the first encoder, and wherein the one block or the several blocks of output data of the second encoder together represent a number of samples of the input signal for the second encoder, wherein the number of samples for the second encoder represent a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoders are identical or shifted to each other by a period of time, comprising: writing a determining data block for the current section of the input signal for the first or the second encoder; writing output data of the second encoder representing a preceding section of the input signal for the second encoder, in transmission direction from an encoder to a decoder after the determining data block; writing output data of the second encoder representing the current section of the input signal for the second encoder, when the output data of the second encoder for the preceding section of the input signal are written; writing buffer information into the scalable data stream, wherein the buffer information indicates how far the output data of the second encoder for the preceding section extend beyond the determining data block for the second encoder; and writing the one or the several blocks of output data of the first encoder into the scalable data stream.
In accordance with a second aspect of the invention, this object is achieved by a device for generating a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, wherein the one or the several blocks of output data of the first encoder together represent a number of samples of the input signal for the first encoder, forming a current section of the input signal for the first encoder, and wherein the one block or the several blocks of output data of the second encoder together represent a number of samples of the input signal for the second encoder, wherein the number of samples for the second encoder form a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoders are identical or shifted from each other by a period of time, comprising: means for writing a determining data block for the current section of the input signal for the first or the second encoder; means for writing output data of the second encoder representing a preceding section of the input signal for the second encoder, in transmission direction from an encoder to a decoder after the determining data block; means for writing output data of the second encoder representing the current section of the input signal for the second encoder when the output data of the second encoder for the preceding section of the input signal are written; means for writing buffer information into the scalable data stream, wherein the buffer information indicates how far the output data of the second encoder for the preceding section extend beyond the determining data block for the second encoder; and means for writing the one or the several blocks of output data of the first encoder into the scalable data stream.
In accordance with a third aspect of the invention, this object is achieved by a method for decoding a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, wherein the one or the several blocks of output data of the first encoder together represent a number of samples of the input signal for the first encoder, forming a current section of the input signal for the first encoder, and wherein the one block or the several blocks of output data of the second encoder together represent a number of samples of the input signal for the second encoder, wherein the number of samples for the second encoder form a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal, and wherein the current sections for the first and the second encoder are identical or shifted to each other by a period of time, wherein the scalable data stream comprises a determining data block for the current section for the first or the second encoder, output data of the second encoder for a preceding section of the input signal in transmission direction after the determining data block, and buffer information, indicating how far the output data of the second encoder for the preceding section extend beyond the determining data block, comprising the following steps: reading the determining data block for the current section of the input signal for the first or the second encoder; reading the output data of the first encoder for the current section of the first encoder; reading the buffer information; reading the output data of the second encoder for the current section starting from a position in the scalable data stream indicated by the buffer information; and decoding the output data of the second encoder and the output data of the first encoder to obtain a decoded signal.
In accordance with a fourth aspect of the invention, this object is achieved by a device for decoding a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, wherein the one or the several blocks of output data of the first encoder together represent a number of samples of the input signal for the first encoder, forming a current section of the input signal for the first encoder, and wherein the one block or the several blocks of output data of the second encoder together represent a number of samples of the input signal for the second encoder, wherein the number of samples for the second encoder form a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal, and wherein the current sections for the first and the second encoder are identical or shifted to each other by a period of time, wherein the scalable data stream comprises a determining data block for the current section for the first or the second encoder, output data of the second encoder for a preceding section of the input signal in transmission direction after the determining data block, and buffer information, indicating how far the output data of the second encoder for the preceding section extend beyond the determining data block, comprising: a bit stream demultiplexer, adapted to be able to perform the following steps: reading the determining data block for the current section of the input signal for the first or the second encoder; reading the output data of the first encoder for the current section of the first encoder; reading the buffer information; reading the output data of the second encoder for the current section starting from a position in the scalable data stream indicated by the buffer information; and means for decoding the output data of the second encoder and the output data of the first encoder to obtain a decoded signal.
The present invention is based on the findings that the known concept illustrated in
The decoder may then easily determine based on a determining data block and using the buffer information, where the output data of the second encoder end and where the output data of the second encoder for the current time section begin, so that the decoder is able to bring the corresponding output data blocks of the first encoder in connection with the corresponding output data blocks of the second encoder to decode the signal again in all layers, wherein the term “corresponding” relates to the fact that the respective data of the first and the second encoder are related to the same section of the input signal in case of CoreCoderDelay equal zero (see
In an inventive method for generating a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, a determining data block is therefore written for a current section of the input signal. In addition, the output data of the second encoder representing a preceding section of the input signal are written in transmission direction from an encoder to a decoder after the determining data block. The output data of the second encoder relating to the current section of the input signal, i.e. which actually belong to the determining data block, may then be written when the output data of the second encoder for the preceding section are completely written. In addition, buffer information is written into the scalable data stream, wherein the buffer information indicates how far the output data of the second encoder for the preceding section extend beyond the determining data block for the current section. The output data of the first encoder may either be written equidistantly or not at all into the scalable data stream, wherein it is, however, desired, in order to facilitate a low-delay decoding of the first scaling layer alone, i.e. only of the output data blocks of the first encoder, to write these data blocks in an equidistant and delay-optimized way.
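The writing steps described above, together with the matching reading steps, can be tried out on a toy byte format. The sketch below is an illustration under stated assumptions and not the LATM syntax: the sync word, the header layout, the fixed slot size per frame, the padding byte and all function names are hypothetical. In particular, the toy header carries the length of the current second-encoder block so that padding can be stripped, whereas the buffer information alone is what tells the reader where the data of the current section begin.

```python
import struct

SYNC = b"\xa5\x5a"               # hypothetical synchronization word
HEADER = struct.Struct(">2sHH")  # sync, buffer information, block length (toy field)


def write_stream(first_blocks, second_blocks, slot_size):
    """Write one frame per section: header, first-encoder block, then a
    fixed-size slot filled with second-encoder bytes."""
    out = bytearray()
    backlog = b""  # second-encoder bytes of the preceding section still unwritten
    for first, second in zip(first_blocks, second_blocks):
        if len(backlog) > slot_size:
            raise ValueError("preceding section would span more than one frame")
        # Buffer information: how far preceding data extend past this header.
        out += HEADER.pack(SYNC, len(backlog), len(second))
        out += first
        queue = backlog + second
        slot, backlog = queue[:slot_size], queue[slot_size:]
        out += slot.ljust(slot_size, b"\x00")  # pad if the slot is not filled
    return bytes(out), backlog


def read_stream(stream, first_len, slot_size):
    """Recover the first-encoder and second-encoder blocks per section."""
    frame_len = HEADER.size + first_len + slot_size
    firsts, seconds = [], []
    partial, expected = None, 0
    for off in range(0, len(stream), frame_len):
        sync, pointer, cur_len = HEADER.unpack_from(stream, off)
        if sync != SYNC:
            raise ValueError("lost synchronization")
        body = off + HEADER.size
        firsts.append(stream[body:body + first_len])
        slot = stream[body + first_len:body + first_len + slot_size]
        if partial is not None:
            # The first `pointer` bytes complete the preceding section.
            seconds.append(partial + slot[:pointer])
        # Data of the current section start right after the pointer.
        partial, expected = slot[pointer:pointer + cur_len], cur_len
    if partial is not None and len(partial) == expected:
        seconds.append(partial)
    return firsts, seconds


firsts = [b"C0", b"C1", b"C2"]
seconds = [b"a" * 6, b"b" * 3, b"c" * 3]
stream, leftover = write_stream(firsts, seconds, slot_size=4)
decoded = read_stream(stream, first_len=2, slot_size=4)
```

In this example the six-byte block of the first section does not fit into its four-byte slot, so two bytes are written after the next header and the buffer information of that header is 2. The first-encoder blocks sit at fixed positions directly after each header, so a reader interested only in the first scaling layer can extract them without evaluating the buffer information at all.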
Usually, a bit savings bank is defined among others by the maximum size of the bit savings bank, wherein this value is designated by “max bufferfullness” in
Independent of the functionality of the bit savings bank the inventive format further facilitates, however, to transmit output data blocks of a varying length of the second encoder in an equidistant grid of determining data blocks. It may therefore be sensible to choose the grid for the determining data blocks and the grid for the output data blocks of the first encoder equidistantly and in particular to select the same so that a determining data block is always followed by an output data block of the first encoder. The output data block of the second encoder is then written into the remaining gaps, wherein it is signalized by the buffer information how many data of the second encoder behind a determining data block belong to a time section which the determining data block refers to or which still count among the preceding time section of the input signal, so that the decoder may definitely and undoubtedly provide an association between output data blocks of the first encoder and an output data block of the second encoder for a time section of the input signal.
It is a further advantage of the present invention that the signalizing of the output data block after the determining data block may easily be combined with a signalizing of output data blocks of the first encoder before the determining data block for the current time section in order to facilitate a low-delay decoding only of the first scaling layer.
The inventive scalable data stream is particularly useful for real-time applications but may also be used for non-real-time applications.
In the following, preferred embodiments of the present invention are explained in more detail referring to the accompanying drawings, in which:
a shows a schematic illustration of an input signal which is divided into successive time sections;
b shows a schematic illustration of an input signal which is divided into successive time sections, wherein the relation of the block length of the first encoder to the block length of the second encoder is illustrated;
c shows a schematic illustration of a scalable data stream having a high delay in the decoding of the first scaling layer;
d shows a schematic illustration of a scalable data stream having a low delay in the decoding of the first scaling layer;
e shows a bit stream format according to the present invention, in which after the determining data block for a current section only output data of the second encoder from a preceding time section is arranged;
In the following, reference is made to
Further, in contrast to
For the case of core frame offset = zero, the bit stream designated in
Through this bit stream set-up it is possible for the celp encoder to transmit the generated celp block directly after encoding. In this case, no additional delay is added to the celp encoder by the bit stream multiplexer (20). Thus, the scalable combination adds no additional delay to the celp delay, so that the delay is minimal.
It is indicated, that the case illustrated in
In the extreme case (1:12 for MPEG 4 CELP/AAC), this means that for the same time section of the input signal for which the AAC encoder generates an output data block, the celp encoder generates twelve output data blocks. The delay advantage by the data stream illustrated in
In the following, reference is made to
To this end, data from the output data block of the second encoder of the preceding section designated by “0” in the
According to the invention, in the case of the application for a scalable integer it is preferred to provide no inherent side information for signalizing the buffer information but to use the value bufferfullness already transmitted in the bit stream to this end, wherein the length of the pointer designated by “buffer information” in
In the following, reference is made to
As it may be seen from
core coder delay = tdip − celp encoder delay − downsampling delay = 600 − 120 − 117 = 363 samples.
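The arithmetic of this example can be checked with a few lines. The function and parameter names are hypothetical and simply mirror the three quantities named in the equation, all counted in samples.

```python
def core_coder_delay(tdip, celp_encoder_delay, downsampling_delay):
    """Core coder delay in samples: tdip minus the delays of the CELP
    encoder and of the downsampling stage."""
    return tdip - celp_encoder_delay - downsampling_delay


# Values from the example: 600 - 120 - 117 samples.
delay = core_coder_delay(600, 120, 117)
```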
For the case without a bit savings bank function or for the case, respectively, that the bit savings bank (bit mux output buffer) is full, which is indicated by the variable bufferfullness=max, the case indicated in
The present invention may simply be combined with the bit savings bank function, as it is illustrated in the last line of
The bit savings bank level is transmitted by the variable “bufferfullness” according to MPEG 4 in the element StreamMuxConfig. The variable bufferfullness is calculated as the variable bit reservoir divided by 32 times the number of audio channels currently present.
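The calculation of bufferfullness can be written out as a short sketch. The function and parameter names are hypothetical, and the use of integer division is an assumption, since the text only states that the bit reservoir is divided by 32 times the channel number.

```python
def buffer_fullness(bit_reservoir_bits, num_channels):
    """Bit reservoir divided by 32 times the number of audio channels.

    Integer division is assumed here so that the result fits an integer field.
    """
    if num_channels < 1:
        raise ValueError("at least one audio channel is required")
    return bit_reservoir_bits // (32 * num_channels)


# Example: a reservoir of 6144 bits in a two-channel configuration.
level = buffer_fullness(6144, 2)
```

With these example values the transmitted level is 96, i.e. one step of the variable corresponds to 32 bits per channel.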
It is to be noted that the pointer designated with the reference numeral 314 in
It is further to be noted that the pointer 314 is deliberately drawn in an interrupted way below the celp block 2, as it does not consider the length of the celp block 2 or the length of the celp block 1, as this data has of course nothing to do with the bit savings bank of the AAC encoder. Further, no header data and bits of possibly present further layers are considered.
In the decoder, first of all an extraction of the celp frames from the bit stream is performed which is easily possible as the same are for example arranged equidistantly and have a fixed length.
In the LATM header, length and distance of all celp blocks may be signalized, so that in every case a direct decoding is possible.
Thereby, the parts of the output data of the AAC encoder of the directly preceding time section, which were in effect separated by the celp block 2, may be joined again, and the LATM header 306 is in effect moved to the beginning of the pointer 314. The decoder, knowing the length of the pointer 314, thus knows when the data of the directly preceding time section end, and is able to decode the directly preceding time section together with the celp blocks present for the same with full audio quality once this data is completely read in.
In contrast to the case illustrated in
Number | Date | Country | Kind |
---|---|---|---|
101 02 154 | Jan 2001 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP02/00295 | 1/14/2002 | WO | 00 | 12/19/2003 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO02/058051 | 7/25/2002 | WO | A |
Number | Date | Country |
---|---|---|
20040107289 A1 | Jun 2004 | US |