The invention generally relates to a reference picture refresh delay during encoding and decoding of video sequences. Particularly, the present invention relates to a method and apparatus for predictive encoding and decoding of video sequences employing multiple reference images and a repetitive reference picture refresh in order to allow random access.
The transmission of motion pictures requires a substantial amount of data to be sent through conventional transmission channels of limited available frequency bandwidth. To transmit digital data through a channel of limited bandwidth, it is necessary to compress or reduce the volume of the video data to be transmitted. Video coding standards have been developed for reducing the amount of video data. They are denoted H.26x for ITU-T standards and MPEG-x for ISO/IEC standards.
The underlying coding approach of most of the video coding standards consists of the following main stages. First, each video frame of a sequence of video frames is divided into blocks of pixels, and the following processing of video frames is conducted at a block level. The quantity of video data is then reduced by analysing the video data in spatial and temporal respect. Spatial redundancies are reduced within a video frame by subjecting the video data of each block to transformation, quantization and entropy coding.
Temporal dependencies between blocks of subsequent frames are exploited in order to only transmit differences between subsequent frames. This is accomplished by employing a motion estimation and compensation technique. For any given block, a search is performed in previously coded frames to determine a motion vector. The determined motion vector is utilized by the encoder and decoder to predict the image data of a block.
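The block-based search described above can be sketched as an exhaustive block-matching routine. This is only an illustrative sketch: the sum of absolute differences (SAD) as matching criterion, the function names `sad`, `get_block` and `estimate_motion`, and the block size and search range are assumptions made for the example and are not taken from the description.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(current, reference, top, left, size=4, search_range=2):
    """Return ((dy, dx), cost) for the best match of the current block.

    The vector points from the block position in the current frame to the
    best matching block in the previously coded reference frame.
    """
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best_vector = (0, 0)
    best_cost = sad(target, get_block(reference, top, left, size))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that would leave the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

Both encoder and decoder can then form the prediction of the block by reading the reference frame at the position displaced by the returned vector, so that only the vector and the prediction error need to be transmitted.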
An example of a video encoder configuration is illustrated in
The operation of the video encoder of
The prediction 125 is based on the decoding result 165 (the “currently decoded image”) of previously encoded images at the encoder side. This is accomplished by a decoding unit 160 being incorporated into video encoder 100. Decoding unit 160 performs the encoding steps in a reverse manner, i.e. decoding unit 160 comprises an inverse quantizing unit Q−1, an inverse transform unit T−1, and an adder for adding the decoded differences to the prediction 125. In the same manner, a separate decoder (not shown in the drawings) receiving the encoded sequence 180 of video images will decode the received data stream and output decoded images 165.
The motion compensated DPCM, conducted by the video encoder of
Based on the results of motion estimation, motion compensation provides a prediction utilizing the determined motion vector. The information contained in a prediction error block, resulting from the differences between the current and the predicted block, is then transformed into transform coefficients by transform unit 130. Generally, a two-dimensional transform (T), for instance a Discrete Cosine Transform or an Integer Transform, is employed for this purpose. The resulting transform coefficients are quantized and finally entropy encoded (VLC) in entropy encoding unit 150.
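The transform and quantization of a prediction error block, together with the inverse steps performed by the decoding unit, can be sketched as follows. The sketch assumes an orthonormal floating-point DCT-II and a single uniform quantization step; the function names and the step size are illustrative assumptions, and entropy coding is omitted.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transform_and_quantize(residual, step):
    """Forward 2-D transform (C * X * C^T) followed by uniform quantization."""
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(v / step) for v in row] for row in coeffs]


def dequantize_and_inverse(levels, step):
    """Inverse quantization followed by the inverse transform (C^T * Y * C)."""
    c = dct_matrix(len(levels))
    coeffs = [[v * step for v in row] for row in levels]
    return matmul(matmul(transpose(c), coeffs), c)
```

Adding the output of `dequantize_and_inverse` to the prediction gives the reconstructed block, which is the same value a separate decoder would obtain from the transmitted levels.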
The transmitted stream of compressed video data 180 is received by a decoder (not shown), which reconstructs the sequence of video images from the received bit stream. The decoder configuration corresponds to that of decoding unit 160 described in connection with
The prediction between subsequent fields or frames, which is performed in order to take advantage of temporal redundancies between subsequent images, is conducted in the form of either unidirectional or bi-directional motion estimation and compensation. When the reference frame selected in motion estimation is a previously encoded frame, the frame to be encoded is referred to as a P-picture. When both a previously encoded frame and a future frame are chosen as reference frames, the frame to be encoded is referred to as a B-picture.
Latest video encoding standards offer the option of having multiple reference frames for inter-picture encoding. The use of multiple reference frames results in a more efficient coding of images. For this purpose, motion estimation and compensation utilizes a multi-frame buffer for providing several reference pictures. The motion vector is accompanied by additional information indicating the individual reference image used.
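A search over several reference pictures, returning a reference index alongside the motion vector, might look like the following sketch. The SAD cost, the function names and the tiny default block size are assumptions made for illustration; all frames are assumed to have the same dimensions.

```python
def block_cost(current, reference, top, left, dy, dx, size):
    """SAD between the current block and a displaced block of one reference."""
    return sum(abs(current[top + r][left + c]
                   - reference[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def search_multi_reference(current, references, top, left, size=2, search_range=1):
    """Return (cost, ref_index, (dy, dx)) of the best match over all references.

    ref_index plays the role of the additional information that accompanies
    the motion vector and identifies the reference picture used.
    """
    height, width = len(current), len(current[0])
    best = None
    for ref_index, reference in enumerate(references):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference picture.
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue
                cost = block_cost(current, reference, top, left, dy, dx, size)
                if best is None or cost < best[0]:
                    best = (cost, ref_index, (dy, dx))
    return best
```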
The internal configuration of a motion estimation and compensation unit 190 of
Other images of the encoded video sequence, which are denoted as I-pictures, only reduce spatial redundancies within the image and do not exploit any temporal information.
According to the emerging H.264 video encoding standard, instantaneous decoder refresh (IDR) pictures are additionally provided. Like I-pictures, such IDR pictures do not exploit any temporal information. In addition, an IDR picture resets the multi-frame buffer in order to break inter-dependencies on any picture decoded prior to the IDR-picture. For this purpose, the coding/decoding process marks all current reference pictures in the multi-frame buffer 200 as “unused for reference” immediately before encoding/decoding the IDR-picture. Marking all reference pictures as “unused for reference” ensures that subsequent pictures in the encoding/decoding order are processed without inter-prediction from pictures prior to the IDR-picture. Hence, the use of IDR-pictures reduces the processing effort for random access to any of the encoded images of the video sequence. IDR-pictures make it possible to jump to a temporal position within the encoded bit stream and to decode the subsequent pictures without decoding any of the previous images.
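The conventional behaviour described above, in which all reference pictures are marked immediately before the IDR picture is processed, can be sketched with a toy buffer model. The class `MultiFrameBuffer`, the function `decode_conventional` and the picture names are hypothetical and serve only to illustrate the marking; for simplicity every picture is stored as a reference.

```python
class MultiFrameBuffer:
    """Toy multi-frame buffer; each entry is [name, used_for_reference]."""

    def __init__(self):
        self.entries = []

    def store(self, name):
        self.entries.append([name, True])

    def mark_all_unused(self):
        """Mark every stored picture as "unused for reference"."""
        for entry in self.entries:
            entry[1] = False

    def usable(self):
        return [name for name, used in self.entries if used]


def decode_conventional(sequence):
    """Map each picture to the references usable while it is decoded.

    sequence is a list of (name, kind) tuples in decoding order, where kind
    is "I", "P", "B" or "IDR".
    """
    buffer = MultiFrameBuffer()
    available = {}
    for name, kind in sequence:
        if kind == "IDR":
            # Conventional scheme: refresh immediately before the IDR picture.
            buffer.mark_all_unused()
        available[name] = [] if kind in ("I", "IDR") else buffer.usable()
        buffer.store(name)
    return available
```

In this model, any picture decoded after the IDR picture sees only the IDR picture and later pictures as references, which is what makes the IDR picture a random access point.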
The encoding of video images employing IDR-pictures will be explained in more detail with reference to
One of the images of the sequence of
The first of the above features of IDR-pictures, namely to only intra encode video data, is similar to that of former video encoding standards like MPEG-1 or MPEG-2 utilizing I-type frames.
The second of the above features has no antecedent in former video encoding standards. These former standards only apply a predetermined prediction scheme including a maximum of one reference frame prior to the currently encoded/decoded frame and another one following the current frame. Recent video coding standards like H.263++ and H.264/AVC apply a plurality of reference images for motion compensated prediction. A single I-picture can therefore no longer break inter-prediction to previous frames. For this reason, a “breakpoint” 470 is introduced into the encoding/decoding process in order to start any inter-prediction anew utilizing an IDR picture as shown in
The use of IDR-pictures causes a number of problems. One of the main problems is that the coding efficiency is reduced.
Accordingly, it is the object of the present invention to provide an encoding method, an encoder, a decoding method and a decoder which enable a more efficient compressing of a video sequence.
This is achieved for an encoding method by the features as set forth in claim 1, for an encoder by the features as set forth in claim 13, for a decoding method by the features as set forth in claim 23, and for a decoder by the features as set forth in claim 33.
According to a first aspect of the present invention, a method for predictive encoding of a sequence of video images is provided. The encoding method employs motion estimation for determining motion vectors between each of a plurality of image areas of an image to be encoded and image areas of a plurality of reference images, said reference images being previously encoded images of said image sequence. During encoding, the method subjects all images of said image sequence to motion estimation except predetermined individual images thereof. In addition, the method disables current reference images from being reference images, wherein all current reference images except the predetermined image not subjected to motion estimation are disabled after lapse of a predetermined delay after the predetermined image not subjected to motion estimation has been encoded.
According to a second aspect, an encoder for predictive encoding of a sequence of video images is provided. The encoder comprises a multi-frame buffer, a motion estimation unit and a buffer controller. The multi-frame buffer stores a plurality of reference images, the reference images being previously encoded images of said image sequence. The motion estimation unit determines a motion vector between each of a plurality of image areas of an image to be encoded and image areas of a plurality of said reference images, and is adapted to subject all images of said image sequence to be encoded to motion estimation except predetermined individual images thereof. The buffer controller disables current reference images from being reference images, wherein said buffer controller disables all current reference images except the predetermined image not subjected to motion estimation after lapse of a predetermined delay after encoding the predetermined individual image not subjected to motion estimation.
According to a third aspect of the present invention, a method for decoding a sequence of encoded video images is provided. The decoding method performs motion compensation based on motion vectors for predicting image areas of an image to be decoded from image areas of a plurality of reference images, said reference images being previously decoded images of said image sequence. The method subjects all images of said image sequence to motion compensation during decoding except predetermined individual images thereof. Further, the method disables all current reference images from being reference images except the predetermined image not subjected to motion compensation. The reference images are disabled after lapse of a predetermined delay after decoding the predetermined individual image not subjected to motion compensation.
According to a fourth aspect, a decoder for decoding a sequence of encoded video images is provided. The decoder comprises a multi-frame buffer, a motion compensation unit and a buffer controller. The multi-frame buffer stores a plurality of reference images, the reference images being previously decoded images of said sequence of encoded video images. The motion compensation unit predicts image areas of an image to be decoded from image areas of a plurality of the reference images and subjects all images of the sequence of encoded images to motion compensation except predetermined individual video images thereof. The buffer controller disables all current reference images from being reference images except the predetermined image not subjected to motion compensation, wherein the buffer controller disables said reference images after lapse of a predetermined delay after decoding the predetermined video image not subjected to motion compensation.
It is the particular approach of the present invention that during encoding/decoding of IDR images, i.e. images not subjected to motion estimation and compensation, the reference images stored in the multi-frame buffer are not immediately marked as unused for reference, as is done in the prior art encoding/decoding standards. Instead, a reference picture refresh, i.e. a marking of all previous reference images as “unused”, is delayed by a predefined delay. Consequently, B-type images positioned prior to an IDR-type image can be encoded/decoded dependent on the subsequent IDR-type image. Hence, the coding efficiency can be improved without reducing the advantages of IDR-type images. In particular, the backward reference to prior B-type images does not affect the random access capabilities.
Preferably, the predetermined delay is defined by a particular number of pictures indicating the number of pictures after decoding the IDR picture before performing a decoder reference refresh by disabling reference images. Such a number of pictures may be set in advance for all IDR images or individually transmitted for each of the IDR images.
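A delay expressed as a number of pictures can be sketched as follows. The function name `decode_with_delayed_refresh`, the picture names and the simplification that B-type pictures are not stored as references are assumptions made for this example only.

```python
def decode_with_delayed_refresh(sequence, delay):
    """Map each picture to the references usable while it is decoded.

    sequence lists (name, kind) tuples in decoding order. After an IDR
    picture, `delay` further pictures are decoded before all reference
    pictures except that IDR picture are disabled.
    """
    references = []            # pictures currently usable for reference
    available = {}
    idr_name, remaining = None, 0
    for name, kind in sequence:
        if idr_name is not None and remaining == 0:
            # Delayed refresh: keep only the IDR picture as reference.
            references = [idr_name]
            idr_name = None
        available[name] = [] if kind in ("I", "IDR") else list(references)
        if idr_name is not None:
            remaining -= 1     # one more picture decoded since the IDR
        if kind == "IDR":
            idr_name, remaining = name, delay
        if kind != "B":        # assumption: B-pictures are not stored
            references.append(name)
    return available
```

With a delay of two pictures, two B-type pictures decoded directly after the IDR picture can still refer both to the IDR picture and to earlier reference pictures. With a delay of zero, the next picture after the IDR picture already sees the IDR picture as its only reference.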
Alternatively, a separate refresh-flag is inserted into the encoded stream of data at that temporal position for performing the disabling at the decoder side. Such a flag enables an individual adaptation of the disabling position of reference images to the encoding process of video data, i.e. to the employed image type coding structure.
According to a further alternative, the disabling is performed immediately before encoding/decoding a P-type image following an IDR-type image.
Further preferred embodiments of the present invention are the subject matter of dependent claims.
Other embodiments and advantages of the present invention will become more apparent from the following description of preferred embodiments given in conjunction with accompanying drawings, in which:
Referring to
Images 640, 650 subsequent to the IDR-type image 630 are encoded in accordance with the conventional encoding/decoding structure as shown by the respective arrows 660. As from the IDR-type image 630 on, the referencing of images corresponds to that shown in the example structure of
The effect of the inventive encoding/decoding approach on the display order and decoding order (which is identical to the encoding order) is illustrated in
The shifting of the reference image disabling is illustrated in
Although there are several ways to implement a reference picture refresh, preferably one of the following alternatives is used for this purpose:
Firstly, the predetermined delay value is submitted together with the IDR-image data by an encoder to a decoder. The submitted delay value determines the number of pictures after the IDR-picture for disabling the reference images.
Alternatively, a separate refresh flag is transmitted at the temporal position 830 for performing a reference picture refresh. The flag indicates that all reference pictures except the last IDR picture have to be refreshed immediately, i.e. marked as “unused”.
According to a further alternative embodiment, the reference picture refresh is executed immediately before the first P-type picture following an IDR-picture. This embodiment advantageously avoids transmitting any additional information to the decoding side.
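The flag-based alternative and the alternative tied to the first P-type picture can be compared in a small sketch. The function `available_references`, the mode names `"flag"` and `"first_p"`, and the pseudo-picture kind `"FLAG"` standing for the transmitted refresh flag are hypothetical; B-type pictures are again assumed not to be stored as references.

```python
def available_references(stream, mode):
    """Map each picture to the references usable while it is decoded.

    mode "flag":    refresh when a ("...", "FLAG") entry occurs in the stream.
    mode "first_p": refresh immediately before the first P-picture that
                    follows an IDR picture; no extra signalling is needed.
    """
    references, available = [], {}
    last_idr, refresh_pending = None, False
    for name, kind in stream:
        trigger = ((mode == "flag" and kind == "FLAG")
                   or (mode == "first_p" and kind == "P" and refresh_pending))
        if trigger and last_idr is not None:
            # Disable all reference pictures except the last IDR picture.
            references = [last_idr]
            refresh_pending = False
        if kind == "FLAG":
            continue           # the flag itself carries no picture data
        available[name] = [] if kind in ("I", "IDR") else list(references)
        if kind == "IDR":
            last_idr, refresh_pending = name, True
        if kind != "B":        # assumption: B-pictures are not stored
            references.append(name)
    return available
```

In both modes the B-type pictures decoded between the IDR picture and the refresh can still use earlier reference pictures, while every picture after the refresh depends only on the IDR picture and later pictures.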
The process of marking reference images in multi-frame buffer 200 as “unused” is illustrated in FIGS. 9 to 12.
Although the present invention starts from a multi-frame buffer 200 configuration as shown in
Summarising, the present invention delays a reference image refresh from the temporal position of an IDR-type image within a sequence of encoded images in order to enable an inter-prediction to images to be displayed prior to the IDR-type image.
| Number | Date | Country | Kind |
|---|---|---|---|
| 03015477.7 | Jul 2003 | EP | regional |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP04/05575 | 5/24/2004 | WO | | 1/6/2006 |