The invention relates to a method for encoding and decoding video data. When a video signal is encoded to make it suitable for digital handling, such as transmission or storage, the video data is compressed to make optimal use of the available bandwidth and storage capacity. Good compression results are obtained with lossy encoding, in which information of the original signal cannot be fully recovered in the decoding stage.
Although good results can be obtained with lossy encoding, it is an object of the invention to provide an encoding method with which better compression results can be obtained. Better performance may mean either that better decoded results are obtained at a similar compression rate or bandwidth, or that similar decoded results are obtained at a better compression rate or smaller bandwidth. To achieve this object, a method for encoding a video signal is provided according to claim 1.
From a video stream to be encoded, a decimated frame sequence is formed by removing a number of frames of the video stream. The decimated frame sequence is then temporally interpolated in order to obtain a good estimate of the decimated (i.e. skipped) frames. Subsequently, areas of the skipped-estimated frames are detected in which the estimate is inadequate, in that it does not meet a predetermined standard. These areas can be detected by comparing the skipped frames, which are still available in the encoder, with the skipped-estimated frames, and residual information can be determined for them. Only the decimated frame sequence and the residual data for the detected areas are then encoded and inserted in an encoded bitstream. Preferably, the temporal interpolation is performed on locally decoded frames of the encoded decimated frame sequence, so that the interpolation operates on frames that are also available in a decoder.
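The decimation and estimation steps above can be sketched as follows. This is a minimal illustration only, assuming frames are flat lists of pixel values and using plain linear blending between neighbouring base frames as a stand-in for the motion-compensated interpolation of the embodiments; the names `decimate` and `estimate_skipped` are hypothetical.

```python
from typing import List

Frame = List[float]  # assumption: a frame is a flat list of pixel values


def decimate(frames: List[Frame], factor: int) -> List[Frame]:
    """Keep every `factor`-th frame (indices 0, factor, 2*factor, ...)."""
    return frames[::factor]


def estimate_skipped(base: List[Frame], factor: int) -> List[Frame]:
    """Rebuild a full-rate sequence from the decimated base frames.

    Skipped frames between two base frames are estimated by linear blending,
    a simple stand-in for motion-compensated temporal interpolation.
    Frames after the last base frame are not reconstructed.
    """
    out: List[Frame] = []
    for prev, nxt in zip(base, base[1:]):
        for k in range(factor):
            w = k / factor  # temporal position between prev (0) and nxt (1)
            out.append([(1 - w) * p + w * n for p, n in zip(prev, nxt)])
    out.append(list(base[-1]))  # last base frame has no successor
    return out
```

In an encoder following the preferred variant, `estimate_skipped` would be fed locally decoded base frames rather than the originals, so that encoder and decoder interpolate from identical input.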
An encoded bitstream is decoded according to the invention by extracting the residual data from the main bitstream. Subsequently, the main bitstream data is interpolated using an interpolation process similar to the one used for the encoding. The residual data is then added to the interpolated frame sequence.
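A matching decoder-side sketch, under the same illustrative assumptions (flat frames, linear blending as the interpolator shared with the encoder). The residual is assumed to arrive as a hypothetical mapping from full-rate frame index to `(pixel_offset, delta)` pairs that are added to the interpolated frame.

```python
from typing import Dict, List, Tuple

Frame = List[float]
Residual = Dict[int, List[Tuple[int, float]]]  # frame index -> (offset, delta)


def interpolate(base: List[Frame], factor: int) -> List[Frame]:
    """Same interpolator as the encoder: linear blending between base frames."""
    out: List[Frame] = []
    for prev, nxt in zip(base, base[1:]):
        for k in range(factor):
            w = k / factor
            out.append([(1 - w) * p + w * n for p, n in zip(prev, nxt)])
    out.append(list(base[-1]))  # copy so later edits do not alias `base`
    return out


def decode(base: List[Frame], factor: int, residual: Residual) -> List[Frame]:
    """Interpolate the main stream, then add the residual where present."""
    frames = interpolate(base, factor)
    for idx, corrections in residual.items():
        for offset, delta in corrections:
            frames[idx][offset] += delta
    return frames
```

With an empty residual mapping, `decode` reduces to plain interpolation of the main stream.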
By using the encoding/decoding system according to the invention, a better quality/bandwidth ratio can be obtained, because only relevant residual data is incorporated into the encoded signal.
The invention further relates to a method for decoding, an encoder, a decoder, an audiovisual device, a data container device, a computer program and a data carrier device on which a computer program is stored.
Particularly advantageous elaborations of the invention are set forth in the dependent claims. Further objects, elaborations, modifications, effects and details of the invention appear from the following description, in which reference is made to the drawing, in which
In
The encoder 20 also produces a fully encoded data stream, that is, one in which no frames are discarded for temporal interpolation. This data stream is sent to a decoder 30 suitable for decoding the encoded data stream, which in this example means an MPEG decoder. The decoded data stream 35 is a 50 Hz signal, as no frames were dropped in the encoding process. The data stream 35 is provided to an IP selector 40; the selector 40 performs the same temporal elimination as the encoder 20 performs on the original video input. The result is again a 12.5 Hz signal. This reduced signal is fed to a motion estimator 50, which in this example is embodied as a natural motion estimator. The estimator 50 performs an upconversion from 12.5 Hz to 50 Hz by estimating additional frames, and it performs the same upconversion as the decoder will later perform when decoding the coded data stream. Any motion estimation method can be employed according to the invention. Particularly good results can be obtained with motion estimation based on natural or true motion estimation, as used for example in frame-rate conversion methods. A very cost-efficient implementation is, for example, three-dimensional recursive search (3DRS), which is suitable for consumer applications; see for example U.S. Pat. Nos. 5,072,293, 5,148,269, and 5,212,548. The motion vectors estimated using 3DRS tend to be equal to the true motion, and the motion-vector field exhibits a high degree of spatial and temporal consistency. Thus, the vector inconsistency seldom exceeds the threshold and, consequently, the amount of residual data transmitted is reduced compared to non-true motion estimation.
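3DRS itself is beyond a short example. To illustrate what any motion estimator in this position delivers, the sketch below uses exhaustive (full-search) block matching with a sum-of-absolute-differences criterion. This is a different and far more expensive technique than 3DRS and does not favour true motion; frames are assumed to be 2-D lists of luminance values, and `sad` and `best_vector` are hypothetical names.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # assumption: rows of luminance samples


def sad(prev: Frame, cur: Frame, y: int, x: int, dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the block of `cur` at (y, x) and
    the block of `prev` displaced by (dy, dx)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[y + r][x + c] - prev[y + r + dy][x + c + dx])
    return total


def best_vector(prev: Frame, cur: Frame, y: int, x: int,
                size: int, search: int) -> Tuple[int, int]:
    """Full-search block matching: return the (dy, dx) within +/-search that
    minimises SAD. (dy, dx) points from the current block to its best match
    in `prev`; candidates falling outside `prev` are skipped."""
    h, w = len(prev), len(prev[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= y + dy and y + dy + size <= h
                    and 0 <= x + dx and x + dx + size <= w):
                continue
            cost = sad(prev, cur, y, x, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

A recursive-search estimator such as 3DRS instead evaluates only a few candidate vectors taken from spatially and temporally neighbouring blocks, which is what makes it cheap and spatially consistent.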
The upconverted signal 55 is sent to an evaluation unit 60 (as indicated with a minus sign). The full data stream 35 is also sent to the evaluation unit (as indicated with a plus sign). The evaluation unit 60 compares the interpolated frames determined by the motion estimator 50 with the actual frames, and from this comparison determines where the estimated frames differ from the actual frames. The differences in the respective frames are evaluated; if the differences meet certain thresholds, the differential data is selected as residual data. The thresholds can for example be related to how noticeable the differences are; such threshold criteria are known per se in the art. In this example the residual data is described in the form of meta blocks. The residual data stream 120 in the form of meta blocks is then fed to an MPEG encoder 70. The residual data can be encoded using a private data channel, as provided for within an MPEG environment.
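The block-wise evaluation can be sketched as follows, assuming 2-D frames, square blocks that tile the frame, and mean absolute difference as the error measure. The threshold value and the name `select_residual_blocks` are hypothetical; a perceptual criterion, as mentioned above, could be substituted.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]


def select_residual_blocks(actual: Frame, estimated: Frame, size: int,
                           threshold: float) -> Dict[Tuple[int, int], Frame]:
    """Return {(block_row, block_col): actual - estimated} for every block
    whose mean absolute difference exceeds `threshold`."""
    residual: Dict[Tuple[int, int], Frame] = {}
    height, width = len(actual), len(actual[0])
    for by in range(0, height, size):
        for bx in range(0, width, size):
            diff = [[actual[y][x] - estimated[y][x]
                     for x in range(bx, min(bx + size, width))]
                    for y in range(by, min(by + size, height))]
            n = sum(len(row) for row in diff)
            mad = sum(abs(v) for row in diff for v in row) / n
            if mad > threshold:
                residual[(by // size, bx // size)] = diff
    return residual
```

Blocks with small, unnoticeable errors are dropped, so only the selected blocks contribute to the residual data stream.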
Finally, the main stream of the data and the residual data stream are combined by means of the multiplexer 80 to form a single output data stream 90. The output stream 90 can be transmitted (using for example a (wireless) data transmission connection) or stored or used otherwise.
In
In addition to the upconverted signal, the decoded residual data from the decoder 140 is also forwarded to the combiner 160. The combiner 160 combines the information of the main data stream with the residual data stream. Such an operation is known per se in the art and comprises replacing information, such as meta blocks, in the main data stream with the respective residual information, such as meta blocks. The output signal of the combiner 160 is a video data stream with a 50 Hz frame rate.
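The replacement performed by the combiner can be sketched as below. The sketch assumes the decoded residual arrives as a hypothetical mapping from block coordinates to the block's actual pixel values, and that blocks are square with side `size`; `combine` is an illustrative name.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]


def combine(interpolated: Frame, blocks: Dict[Tuple[int, int], Frame],
            size: int) -> Frame:
    """Copy `interpolated` and overwrite each transmitted block with its
    residual pixel values."""
    out = [row[:] for row in interpolated]  # leave the input frame untouched
    for (brow, bcol), block in blocks.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                out[brow * size + r][bcol * size + c] = value
    return out
```

Blocks that are not transmitted keep their interpolated values.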
If the decoder that receives the data stream 90 is not equipped to detect the residual data stream, only the main stream is decoded. A usable video signal can therefore be decoded even with a decoder that is not fully compliant with the residual data signal. However, the decoded signal is then not as good as the signal obtainable with the residual data correction.
The invention may be applied in various devices. For example, a data transmission device, such as a radio transmitter or a computer network router, that includes input signal receiver means and transmitter means for transmitting a coded signal, such as an antenna or an optical fibre, may be provided with an image encoder device according to the invention that is connected to the input signal receiver means and the transmitter means. Furthermore, a decoder according to the invention can be implemented in, for example, a DVD recorder or a PVR (HDD) recorder. An encoding and decoding system according to the invention can be implemented in, for example, internet video streaming services and in-home (wireless) networks.
Good results can be obtained for a temporal decimation of 1 out of 2; typically less than 5-10% of the area of the skipped-estimated frames is detected as needing residual information. A decimation of 1 out of 4 frames also yields good results. Even more frames can be skipped using the invention in applications that do not require the highest image quality.
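As a rough reading of these figures, one can count how much picture area must actually be coded. The sketch assumes that a decimation of "1 out of N" keeps one frame in every N (as in the 50 Hz to 12.5 Hz example, where N = 4) and that residual blocks cover a fixed fraction of each skipped frame. It counts area only, not bits, since base frames and residual blocks are not coded at equal cost. The 10% residual share used for N = 4 is purely illustrative; the text gives the 5-10% figure only for 1 out of 2.

```python
def coded_area_fraction(n: int, residual_fraction: float) -> float:
    """Fraction of the full-rate picture area that is coded when one frame in
    every `n` is kept and `residual_fraction` of each skipped frame is resent."""
    kept = 1 / n
    skipped = 1 - kept
    return kept + skipped * residual_fraction


# 1 out of 2, with 10% of each skipped frame needing residual data:
print(round(coded_area_fraction(2, 0.10), 3))  # 0.55
# 1 out of 4, assuming (hypothetically) the same residual share:
print(round(coded_area_fraction(4, 0.10), 3))  # 0.325
```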
The invention also relates to an encoder and a decoder for performing the above illustrated coding and decoding methods. In
In
In
The encoding of the video stream is generally similar in both the first and second embodiment. In the second embodiment (see
In parallel with the high-accuracy interpolation, the data is also supplied to a simple temporal interpolator 210 of the type employed by the eventual decoder. The simple interpolator 210 yields a medium-accuracy data stream that is provided to the above-mentioned evaluator 220. The evaluator 220 compares the high- and medium-accuracy interpolations and supplies a corrected vector stream to the multiplexer, to be included in the residual information, for example in the private data channel. The vector stream is also provided to a combiner 230 that combines the vector data with the medium-accuracy interpolation result of the simple temporal interpolator 210. The combined signal is fed to the natural motion estimator 50′, which uses the information to adjust the interpolated frames. The subsequent residual data determination is similar to that of the first embodiment.
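The vector correction path can be illustrated with a minimal sketch. It assumes one integer motion vector per block and a hypothetical tolerance below which the simple interpolator's vector is accepted; only the deviating blocks generate correction data. The function names are illustrative.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, int]


def correction_vectors(accurate: List[Vector], simple: List[Vector],
                       tolerance: int) -> Dict[int, Vector]:
    """Evaluator: map block index -> accurate vector wherever the simple
    interpolator's vector deviates by more than `tolerance` (L1 distance)."""
    corrections: Dict[int, Vector] = {}
    for i, (a, s) in enumerate(zip(accurate, simple)):
        if abs(a[0] - s[0]) + abs(a[1] - s[1]) > tolerance:
            corrections[i] = a
    return corrections


def apply_corrections(simple: List[Vector],
                      corrections: Dict[int, Vector]) -> List[Vector]:
    """Combiner: substitute the transmitted vectors into the simple field."""
    return [corrections.get(i, v) for i, v in enumerate(simple)]
```

Because only the deviating vectors are sent, the extra bandwidth grows with the number of blocks the simple interpolator gets wrong.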
The resulting encoded data stream comprises the main stream data, the residual data, and the correction vector information. The bandwidth used is therefore slightly larger than in the first embodiment, but better quality results are obtained.
In decoding, shown in
In the examples of devices and methods described above, the residual data stream is encoded or decoded using the same type of encoding or decoding as the main data stream. It is likewise possible to encode or decode the residual data using a different type of encoding or decoding. For example, the encoding or decoding of the residual data stream may be specifically adapted to the residual data. In that case, a more efficient encoding may be obtained than when the same encoding or decoding is used for both the main data stream and the residual data stream. The increase in coding efficiency may for example be caused by the difference in correlation between the residual data and the main data, since in general there will be less correlation between consecutive frames in the residual data stream than between consecutive frames in the main data stream.
The encoder for the residual data may be some special or proprietary coding scheme, which may take into account the characteristics of the visual content of the residual data stream. For example, scattered non-empty blocks in the residual data could first be clustered in a larger group.
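One possible way to realise such clustering is sketched below: treat the residual as a set of non-empty block coordinates, group 4-connected blocks by breadth-first search, and describe each cluster by its bounding box so it can be coded together. The representation and function names are assumptions for illustration, not a scheme given in the text.

```python
from collections import deque
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (block_row, block_col)


def cluster_blocks(nonempty: Set[Block]) -> List[List[Block]]:
    """Group non-empty block coordinates into 4-connected clusters (BFS)."""
    remaining = set(nonempty)
    clusters: List[List[Block]] = []
    while remaining:
        start = min(remaining)  # deterministic starting point
        remaining.remove(start)
        queue, cluster = deque([start]), [start]
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
                    cluster.append(nb)
        clusters.append(sorted(cluster))
    return clusters


def bounding_box(cluster: List[Block]) -> Tuple[int, int, int, int]:
    """(top, left, bottom, right) block coordinates enclosing a cluster."""
    rows = [r for r, _ in cluster]
    cols = [c for _, c in cluster]
    return min(rows), min(cols), max(rows), max(cols)
```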
The encoder of
The video encoder 520 codes the video signal in a specific digital format, in this example an MPEG format. The encoder 520 also provides a fully encoded data stream, that is, one in which no frames are discarded for temporal interpolation. This data stream is sent to a decoder 540 suitable for decoding the encoded data stream; in this example the decoder 540 is an MPEG decoder. The decoded data stream 535 is a 50 Hz signal, as no frames were dropped in the encoding process. The data stream 535 is provided to an IP selector 550; the selector 550 performs the same temporal elimination as the encoder 520 performs on the original video input. The result is again a 12.5 Hz signal. This reduced signal is fed to a motion estimator 570, which in this example is embodied as a natural motion estimator.
The estimator 570 performs an upconversion from 12.5 Hz to 50 Hz by estimating additional frames; it performs the same upconversion as the decoder will perform when decoding the coded data stream. The upconverted signal 555 is sent to an evaluation unit 560 (as indicated with a minus sign). The full data stream 535 is also sent to the evaluation unit 560 (as indicated with a plus sign). The evaluation unit 560 compares the interpolated frames determined by the motion estimator 570 with the actual frames, and from this comparison determines where the estimated frames differ from the actual frames. The comparison may for example consist of checking the difference between the estimated frame and the actual frame against predetermined criteria.
The differences in the respective frames are evaluated; if the differences meet certain thresholds, the evaluation unit 560 transmits a reformat code to the video encoder 520, indicating how the encoder should rebuild the respective frame. When the estimated frames are similar to the actual frames, the evaluation unit 560 transmits a skip code to the video encoder 520. The video encoder 520 interleaves the data from the evaluation unit 560 with the main data during coding. Thereby high coding efficiency can be achieved while the same components, e.g. the MPEG-2 coder and decoder, are used to encode or decode both the residual data and the main data. Furthermore, the actual frames and the skip code can easily be detected.
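The decision and interleaving logic of this embodiment can be sketched as follows. The code names `MAIN`, `SKIP` and `REFORMAT`, the per-frame decision, the flat frame layout and the maximum-difference criterion are all assumptions for illustration; the text leaves open exactly what a reformat code contains.

```python
from typing import List, Optional, Tuple

Frame = List[int]
Item = Tuple[str, Optional[Frame]]


def interleave(actual: List[Frame], estimated: List[Frame], factor: int,
               threshold: int) -> List[Item]:
    """Emit one item per full-rate frame, in display order.

    Base frames (every `factor`-th) are passed through as main data; each
    skipped frame becomes a SKIP code if its estimate is close enough, or a
    REFORMAT code carrying the actual frame otherwise.
    """
    stream: List[Item] = []
    for i, (act, est) in enumerate(zip(actual, estimated)):
        if i % factor == 0:
            stream.append(("MAIN", act))
        elif max(abs(a - e) for a, e in zip(act, est)) <= threshold:
            stream.append(("SKIP", None))
        else:
            stream.append(("REFORMAT", act))
    return stream
```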
In
The video decoder 630 may decode an encoded data stream and is specifically suited to decoding a data stream encoded with the encoder of
When the encoder and/or decoder of
Furthermore, a coded block pattern (CBP) code, as known from section 8.4.5 of Haskell et al., “Digital video: an introduction to MPEG-2”, Kluwer, 1997, may be used. Such a CBP indicates which blocks in a macroblock are empty, that is, in MPEG, which blocks have all-zero discrete cosine transform coefficients. Thereby, if only part of a macroblock or a frame is to be replaced with the actual frame or (macro)block, the other parts may be indicated with the CBP, whereby the amount of data is reduced.
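A sketch of deriving such a pattern is given below. It assumes 4:2:0 sampling (four luminance and two chrominance 8x8 blocks per macroblock) with block 0 mapped to the most significant of six bits; this ordering is our reading of the MPEG-2 convention and should be checked against ISO/IEC 13818-2 before being relied on.

```python
from typing import List, Sequence

Block = Sequence[int]  # quantised DCT coefficients of one 8x8 block


def coded_block_pattern(blocks: List[Block]) -> int:
    """6-bit CBP for a 4:2:0 macroblock (blocks Y0..Y3, Cb, Cr).

    Bit (5 - i) is set when block i has at least one non-zero coefficient,
    so block 0 maps to the most significant bit.
    """
    if len(blocks) != 6:
        raise ValueError("4:2:0 macroblock expected: 6 blocks")
    cbp = 0
    for i, block in enumerate(blocks):
        if any(coeff != 0 for coeff in block):
            cbp |= 1 << (5 - i)
    return cbp
```

For example, a macroblock in which only the second luminance block and the Cr block carry coefficients yields the pattern `0b010001`; the four all-zero blocks then need no coefficient data at all.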
If the invention is used in an MPEG context, an efficient choice for coding the base frames (i.e. the decimated data stream) is IPP-frame encoding; for the skipped frames, B-frame coding is an effective choice, although other choices could be made as well.
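For illustration, the picture-type assignment suggested above can be generated per display position. The sketch assumes one base frame in every `factor` frames, a single I-frame at the start of the sequence, and B-frames for all skipped positions; the function name is hypothetical, and a practical encoder would also insert periodic I-frames.

```python
def frame_types(n_frames: int, factor: int) -> str:
    """Picture types in display order: base frames as I/P (IPP...),
    skipped frames as B."""
    types = []
    for i in range(n_frames):
        if i % factor:
            types.append("B")  # skipped (estimated) frame
        elif i == 0:
            types.append("I")  # first base frame
        else:
            types.append("P")  # later base frames, predicted from the previous one
    return "".join(types)


print(frame_types(9, 4))  # IBBBPBBBP
```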
In an advantageous embodiment, the full frame video sequence is obtained by temporally interpolating a relatively low frame rate video sequence, such as a 24 Hz progressive movie sequence, with a further interpolator of higher quality or accuracy than the interpolator used for interpolating the decimated frame sequence. The further interpolator may be, e.g., the above-described complex temporal interpolator or complex natural motion estimator, or a higher-accuracy 2-3 pulldown algorithm. The further interpolator is preferably a non-real-time, offline interpolator. By using, in the above embodiments, a higher-quality further interpolator for interpolating a relatively low frame rate movie sequence, a movie temporal enhancement layer is created. In a decoder, the movie temporal enhancement layer is used to obtain decoded video with reduced movie judder. The decimation of the full frame video sequence can be performed efficiently by taking the low frame rate video sequence directly as the decimated video sequence. The movie temporal enhancement layer can also be combined with a spatial enhancement layer, so that a backwards-compatible bitstream is created with a spatial and a temporal enhancement layer for improved video quality.
The invention is not limited to implementation in the disclosed examples of physical devices, but can likewise be applied in another device. In particular, the invention is not limited to physical devices but can also be applied in logical devices of a more abstract kind or in software performing the device functions. Furthermore, the devices may be physically distributed over a number of apparatuses, while logically regarded as a single device. Also, devices logically regarded as separate devices may be integrated in a single physical device.
The invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a computer system, or enabling a general purpose computer system to perform functions of a computer system according to the invention. Such a computer program may be provided on a data carrier, such as a CD-ROM or diskette, stored with data loadable in a memory of a computer system, the data representing the computer program. The data carrier may further be a data connection, such as a telephone cable or a wireless connection transmitting signals representing a computer program according to the invention.
Number | Date | Country | Kind |
---|---|---|---|
01205131.4 | Dec 2001 | EP | regional |
02077035.0 | May 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB02/05500 | 12/16/2002 | WO | | 6/16/2004 |