The invention relates to a method and to an apparatus for the bitrate-reducing encoding and decoding of information, in particular digital audio signals.
The digital representation of analog audio signals has a time structure that originates from the sampling process. Digital audio signals represented in PCM format consist of a sequence of values, wherein the temporal distance between successive values is the reciprocal of the sampling frequency. That distance is the shortest element of the signal by which the signal can be defined in the time domain. Digital signals can only have a length that is an integer multiple of this time element.
Encoders and decoders that reduce the bitrate of a digital audio signal (like MPEG1/2/4-Audio, Dolby Digital AC-3, mp3, ATRAC, Windows Media Audio WMA or Real Audio) typically operate with short-time frequency-domain representations of the signal. In order to convert the signal into this domain, a number of signal elements (e.g. 128, 256, 512, 1024 or 1152) are typically grouped together into units denoted as frames or blocks, and thereafter transformed into the frequency domain. When encoding a signal of arbitrary length, a typical audio coder either discards some part of the audio signal at its end or fills up the audio signal with a number of zero-valued samples (stuffing bits). As a result, the length (i.e. the quantity of samples or coefficients) of any encoded or decoded audio signal can only be a multiple of the frame or block length required by the encoding or decoding process, which is itself a multiple of the initial time element mentioned above. Therefore encoded/decoded digital audio signals rarely have the same length as the original audio signal. This difference in lengths can be very annoying when audio signals are to be edited or combined with precise timing.
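By way of a non-limiting illustration, the effect of block-based processing on the signal length can be sketched as follows. The function names and the example length of 100000 samples are assumptions chosen for illustration only; the frame length of 1152 is one of the values mentioned above.

```python
def padded_length(num_samples: int, frame_length: int) -> int:
    """Length after filling up with zero-valued samples to whole frames."""
    frames = (num_samples + frame_length - 1) // frame_length  # integer ceiling
    return frames * frame_length


def truncated_length(num_samples: int, frame_length: int) -> int:
    """Length after discarding the incomplete part at the end of the signal."""
    return (num_samples // frame_length) * frame_length


# Hypothetical signal of 100000 samples, coded with frames of 1152 samples.
original = 100_000
frame = 1152
print(padded_length(original, frame))     # 100224 (224 samples too long)
print(truncated_length(original, frame))  # 99072 (928 samples too short)
```

In either case the coded length differs from the original length unless the original length happens to be a multiple of the frame length.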
A problem to be solved by the invention is to provide a block-based encoded/decoded audio signal that has the original arbitrary length or quantity of sample values, in order to enable exact cutting or splicing.
According to the invention, information about the exact length of the original signal is transferred together with the encoded audio information when broadcasting or when recording on or replay from a storage medium. This length value information is available during the encoding process and is inserted into the encoded audio bit stream. Insertion is made using e.g. the ancillary data field as defined in the MPEG Audio standard ISO/IEC 11172-3. The length information sent can have different forms:
Additionally, an information value can be transferred that represents the total encoder and/or decoder delay.
The decoder can extract these items of information and adjust the length and the beginning of the decoded signal by cutting off samples at the start and/or at the end of the program or track or decoding unit output.
The invention allows decoding an audio or other information signal with a length that matches exactly the original length of the audio or information signal, thereby enabling exact cutting and splicing of the audio or information signal.
In principle, the inventive encoding method is applied to a digital information signal—e.g. an audio signal—having an arbitrary number of original sample values for a specific program or track and thus having an arbitrary length, wherein the encoding operation is based on value blocks related to said sample values, said value blocks each containing multiple values, wherein the encoded digital information signal is output as a code that, when correspondingly decoded, represents a decoded digital information signal having a total length of multiple units corresponding to the length or lengths of said value blocks, and wherein data representing said original sample values arbitrary-length number
In principle, the inventive decoding method is applied to an encoded digital information signal—e.g. an audio signal—having an arbitrary number of original sample values for a specific program or track and thus having an arbitrary original length, wherein the decoding operation is based on value blocks related to said sample values, said value blocks each containing multiple values, wherein the encoded digital information signal is input as a code that after decoding represents a decoded digital information signal having a length of multiple units corresponding to the length or lengths of said value blocks, and wherein data representing said original sample values arbitrary-length number and supplementing frames of the encoded digital information signal input code, for example the last frame or the penultimate frame of said encoded digital information signal, or being repeatedly arranged in said encoded digital information signal, are used for limiting the block unit based total length of the decoded digital information signal to said arbitrary original length.
In principle the inventive apparatus for encoding a digital information signal—e.g. an audio signal—having an arbitrary number of original sample values for a specific program or track and thus having an arbitrary length, said value blocks each containing multiple values, includes:
In principle the inventive apparatus for decoding an encoded digital information signal—e.g. an audio signal—having an arbitrary number of original sample values for a specific program or track and thus having an arbitrary original length, includes:
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
In studio sound or audio processing, the available analog audio signals (e.g. at the output of microphone amplifiers) are converted into digital signals by applying the principles of sampling and quantisation. ‘Sampling’ means that signal amplitude values are taken at regular intervals. The reciprocal of the temporal interval is the sampling rate. According to the Nyquist or sampling theorem, the original content of the sampled signals can be recovered error-free, provided they contain only frequencies below half the sampling rate. Typical sampling rates used in audio processing are e.g. 44.1 kHz or 48 kHz, which correspond to sampling intervals or clocks of approximately 22.68 μs or 20.83 μs, respectively. ‘Quantisation’ means that a reduced quantity of amplitude values is assigned to the basically finely resolved signal sample values, according to a quantisation characteristic. Thereby the resolution of the amplitude values becomes limited, and an irreversible loss of information detail in the correspondingly inverse-quantised values cannot be avoided. For example, a 16-bit amplitude value range extends from −32768 to +32767; this is also called 16-bit quantisation or 16-bit PCM (pulse code modulation). A two-channel audio signal sampled at 44.1 kHz and quantised with 16 bits leads to 1411200 bits per second to be processed. 16 bits correspond to 2 bytes, a value which can be easily handled in typical computers or microprocessors. Due to the byte-based processing and the relatively high sampling frequency and thus high time resolution, cut and insert processing can be carried out without problems when editing such digital audio signals.
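The figures given above can be checked with a short calculation. The following is merely a sketch; the function names are assumptions introduced for illustration.

```python
def sampling_interval_us(sample_rate_hz: float) -> float:
    """Sampling interval in microseconds (reciprocal of the sampling rate)."""
    return 1_000_000.0 / sample_rate_hz


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_value_range(bits_per_sample: int) -> tuple[int, int]:
    """Smallest and largest value of a signed two's-complement sample."""
    return -(2 ** (bits_per_sample - 1)), 2 ** (bits_per_sample - 1) - 1


for rate in (44_100, 48_000):
    print(rate, "Hz:", sampling_interval_us(rate), "microseconds per sample")

print(pcm_bitrate(44_100, 16, 2))  # 1411200
print(pcm_value_range(16))         # (-32768, 32767)
```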
The disadvantage of the high data quantities to be processed is apparent when transferring and storing such signals.
Therefore the above-mentioned data reducing methods are applied, which perform suppression of redundant as well as irrelevant signal components, based on psycho-acoustic laws. Data reduction factors of 10 or more can be achieved.
The data reduction effect is achieved more effectively, if the signals are represented and processed in the frequency domain that is entered either by short time frequency transformation (e.g. short time fast Fourier transformation FFT) or by multi-frequency band filtering called subband filtering. The result of both kinds of operations is a representation of the audio signal as a temporal sequence of short time spectra. In the decoder, a corresponding inverse transformation or inverse subband filtering, respectively, is carried out in order to re-enter the time domain.
The transformation is usually carried out on input sample blocks having lengths that fully or partly correspond to an integral power of ‘2’, e.g. 128, 256, 512, 1024 or 1152 values as mentioned above, because of computational simplification. Most data reduction coder and decoder types further operate with blocks overlapping in the time domain.
When using overlapping blocks, the total length values possible are an integral multiple of a section of the block length, e.g. an integral multiple of one half of the block length.
In subband coders a split into e.g. 32 frequency bands is carried out, and blocks of sampling values are likewise formed. E.g. MPEG Audio Layer3 (mp3) codecs use a block length of 1152 sampling values, corresponding to a time period of 24 ms at 48 kHz sampling rate.
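As an illustration of the block durations involved, the following sketch computes the time period of one block and the smallest length unit that results when blocks overlap by one half. The function names and the `hop_divisor` parameter are assumptions made for this example.

```python
def block_duration_ms(block_length: int, sample_rate_hz: int) -> float:
    """Duration of one block of samples in milliseconds."""
    return 1000.0 * block_length / sample_rate_hz


def length_unit_samples(block_length: int, hop_divisor: int = 1) -> int:
    """Smallest total-length step in samples.

    hop_divisor=1 models non-overlapping blocks; hop_divisor=2 models
    blocks overlapping by one half, where the total length is a multiple
    of half the block length.
    """
    return block_length // hop_divisor


print(block_duration_ms(1152, 48_000))             # 24.0
print(length_unit_samples(1152))                   # 1152
print(length_unit_samples(1152, hop_divisor=2))    # 576
```

Since one sample is the cutting unit of a PCM signal, these values of 1152 or 576 samples also express how much coarser the block-based length unit is.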
The resulting coded signal representations are arranged in corresponding frames according to standardized rules, whereby the frames contain strongly signal-dependent binary signals. These frames usually contain sections with important control information (e.g. data packet header information with side information) and sections with less important but strongly signal-adaptive frequency coefficient information called ‘main information’. Because the quantity of information to be transmitted varies strongly depending on the audio signal characteristic and practically never completely fills the capacity of the frames, the frames can also contain parts that represent no standardized useful information. These parts are called for instance ‘ancillary data’ and can be used freely for different purposes.
One task of the encoder is therefore controlling the coding such that the amount of coded data just fits the frames, i.e. does not exceed the given maximum datarate but makes full use of it. This is mainly achieved by adjusting the coding quality, e.g. the coarseness of the quantisation. The coder can be controlled such that a desired amount of the total datarate is kept for ancillary data.
When decoding (after storage or transfer) the correspondingly inverse processing takes place on the frames/blocks.
When applying above coding/decoding principles, two problems arise that strongly limit in particular the use of the decoded sound signal for editing:
If the above-described coding procedures are used in continuously operating transmission circuits, e.g. in broadcasting or in microwave links between broadcasting studios, the basic delay and the blocked structure do not impose a serious problem. However, if the audio signals are stored in coded form on data carriers with certain data lengths (as ‘files’), both problems are particularly unfavourable when cutting and editing the audio signals. Contrary to the short cutting/editing time units of approximately 20 μs available with PCM Audio signals, here only time units are present that are about 500 or 1000 times longer. Thereby the typical cutting and editing processes can be carried out in a limited fashion only.
To solve these problems, the following is supposed to be known:
According to the inventive solution, the basic delay value and the total length value are signalled to the decoder. This signalling can be performed by any means, for instance in a separate file or channel, preferably however together with the encoded data in the same data stream or data file, e.g. as ‘ancillary data’ or additional header data.
The decoder is designed such that, at the start of decoding, it calculates a certain number of samples (corresponding to the above basic delay value) in the usual way but does not output these samples.
Furthermore the decoder is designed such that it initially calculates the audio signal at the end of the program or track in the usual way, but thereafter the output audio signal is limited in its total length corresponding to the transferred information on the total length value.
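The two decoder measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `trim_decoded`, the toy frame length of 4 and the delay of 3 samples are hypothetical, and the decoded signal is simulated by a plain list.

```python
def trim_decoded(decoded: list, basic_delay: int, total_length: int) -> list:
    """Drop the first `basic_delay` samples and limit the output to
    `total_length` samples, as signalled to the decoder."""
    end = basic_delay + total_length
    if basic_delay < 0 or total_length < 0 or end > len(decoded):
        raise ValueError("signalled values do not fit the decoded signal")
    return decoded[basic_delay:end]


# Toy example: 10 original samples, frame length 4, codec delay 3 samples.
original = list(range(1, 11))
delay, frame = 3, 4

# Simulated block-based decoder output: delayed signal, padded to whole frames.
decoded = [0] * delay + original
decoded += [0] * (-len(decoded) % frame)

print(len(decoded))                                          # 16
print(trim_decoded(decoded, delay, len(original)) == original)  # True
```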
Advantageously, the transfer of the additional information, i.e. the basic delay value and the total length value, occurs within the ancillary data area. If necessary, the encoder must be controlled such that it reserves enough data capacity for the additional information.
Advantageously, the information about the basic delay is transmitted in the first frame or in one of the first frames. It is advisable to transmit it as a quantity of samples that are to be removed at the beginning. Transmitting this information repeatedly can also be an advantage.
The information about the total length value can be sent in different ways and at different locations within the data stream or file, e.g. as a quantity of samples that are to be removed from the initially calculated end, or as a quantity of relevant samples within the last data frame, or as an absolute quantity of samples for the total length. This information can be transmitted in the first frame or in one of the first frames or within a later frame, e.g. the last or the second-to-last frame. Transmitting this information repeatedly can also be an advantage.
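The three forms of the total length information mentioned above are equivalent in that each determines where the decoder output is to end. The following sketch converts each form into an end index within the initially calculated decoder output. The function and parameter names are hypothetical, and the exact counting conventions (e.g. that the last data frame is counted before removal of the basic delay) are assumptions made for this example.

```python
def output_end_index(decoded_len, frame_length, basic_delay, *,
                     remove_at_end=None, valid_in_last_frame=None,
                     total_length=None):
    """Return the index one past the last sample to be output.

    Exactly one of the three keyword arguments must be given:
      remove_at_end       - samples to remove from the calculated end
      valid_in_last_frame - relevant samples within the last frame
      total_length        - absolute quantity of original samples
    """
    given = [v is not None
             for v in (remove_at_end, valid_in_last_frame, total_length)]
    if sum(given) != 1:
        raise ValueError("exactly one form of length information is required")
    if remove_at_end is not None:
        end = decoded_len - remove_at_end
    elif valid_in_last_frame is not None:
        end = decoded_len - frame_length + valid_in_last_frame
    else:
        end = basic_delay + total_length
    if not basic_delay <= end <= decoded_len:
        raise ValueError("length information inconsistent with decoded signal")
    return end


# Toy values: 4 frames of 4 samples, delay 3, original length 10.
print(output_end_index(16, 4, 3, remove_at_end=3))        # 13
print(output_end_index(16, 4, 3, valid_in_last_frame=1))  # 13
print(output_end_index(16, 4, 3, total_length=10))        # 13
```

The samples to be output are then those from index `basic_delay` up to, but not including, the returned end index.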
Advantageously, the basic delay value and/or the total length value are preceded or initiated by an identification data pattern, and are protected by error protection data, e.g. a CRC check.
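One conceivable byte layout for such a protected data item is sketched below. The layout is purely hypothetical and is defined neither by this description nor by the MPEG Audio standard: the identification pattern `ALEN`, the 32-bit delay field, the 64-bit length field and the use of CRC-32 are all assumptions made for illustration.

```python
import struct
import zlib

MAGIC = b"ALEN"                  # hypothetical identification data pattern
_BODY = struct.Struct(">4sIQ")   # pattern, basic delay, total length (big-endian)
_CRC = struct.Struct(">I")       # CRC-32 over the body


def pack_length_info(basic_delay: int, total_length: int) -> bytes:
    """Build the data item to be placed into the ancillary data area."""
    body = _BODY.pack(MAGIC, basic_delay, total_length)
    return body + _CRC.pack(zlib.crc32(body))


def find_length_info(ancillary: bytes):
    """Search ancillary data for a valid item.

    Returns (basic_delay, total_length), or None if no item with a
    matching identification pattern and CRC is found.
    """
    pos = ancillary.find(MAGIC)
    while pos != -1:
        end = pos + _BODY.size + _CRC.size
        if end <= len(ancillary):
            body = ancillary[pos:pos + _BODY.size]
            (crc,) = _CRC.unpack(ancillary[pos + _BODY.size:end])
            if zlib.crc32(body) == crc:
                _, delay, total = _BODY.unpack(body)
                return delay, total
        pos = ancillary.find(MAGIC, pos + 1)
    return None


item = pack_length_info(1105, 100_000)
print(find_length_info(b"\x00\x01" + item + b"\xff"))  # (1105, 100000)
```

A decoder that finds no valid item can simply fall back to the usual block-based output, so that streams without the additional information remain decodable.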
Instead of a digital audio signal any other information signal can be processed, e.g. a digital video signal.
Number | Date | Country | Kind |
---|---|---|---|
02090083 | Mar 2002 | EP | regional |
Number | Date | Country |
---|---|---|
20030167165 A1 | Sep 2003 | US |