The present invention relates to interframe wavelet video coding. More particularly, the present invention relates to a method and apparatus for interframe wavelet video coding that achieve a good video compression rate and scalability, thereby improving the scalability of the video compression and the performance of interframe wavelet video coding at low bit-rates.
As is known, the bitstream obtained by the related art of interframe wavelet video coding comprises two kinds of information: 1. motion information (mainly the motion vectors) and 2. the wavelet transform coefficients and their related information. At present, only the second kind of information is scalable, so the performance at low bit-rates is poor.
Because the video scalability of the related art concerns mainly the transform and wavelet coefficients, which is insufficient at low bit-rates while the motion information (MI) still occupies a part of the whole bitstream, the present invention makes the MI scalable so as to improve the performance of interframe wavelet video coding at low bit-rates.
In addition, there are mainly three kinds of video scalability: spatial scalability, temporal scalability, and SNR scalability. SNR scalability uses bit-plane features to achieve gradual adjustment of the quality of the video frame.
Therefore, the main purpose of the present invention is to obtain a good video compression rate and scalability in video coding, so as to improve the scalability of the video compression.
Another purpose of the present invention is to make the motion information (MI) scalable so as to improve the performance of interframe wavelet video coding at low bit-rates.
To achieve the above purposes, the present invention comprises an encoder, a decoder, and a puller, providing a scalable video compression device that partitions and encodes the MI to achieve scalability, and transfers the partitioned MI to a terminal according to a scalability request. The MI is partitioned and coded according to the spatial precision, the temporal precision, and the numerical precision; upon a scalability request, the corresponding MI data are transferred after properly adjusting these three precisions. As a result, the present invention obtains a good video compression rate and scalability in video coding, improving the scalability of the video compression and the performance of interframe wavelet video coding at low bit-rates.
The present invention will be better understood from the following detailed description of preferred embodiments of the invention, taken in conjunction with the accompanying drawings, in which
The following description of the preferred embodiments is provided for understanding the features and structures of the present invention.
Please refer to
The encoder 1, which receives the video input, comprises the following components.
A Motion Compensated Temporal Filtering (MCTF) analyzer 11 analyzes each video frame along the temporal axis and decomposes it into high-pass frames of high frequency and low-pass frames of low frequency by using motion vectors obtained from a motion estimator 15, so that temporal high-pass frames and temporal low-pass frames are output from the input of the original video frames.
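To illustrate the temporal filtering step, the following is a minimal sketch of a one-level Haar-style MCTF analysis/synthesis pair, written without motion compensation for brevity; the function names are hypothetical, frames are modeled as flat lists of pixel values, and a real MCTF analyzer would first warp one frame toward the other along the estimated motion vectors before filtering.

```python
def mctf_analyze(frame_a, frame_b):
    """Decompose a frame pair into a temporal low-pass and high-pass frame.

    Haar-style lifting: the high-pass frame holds the temporal detail
    (difference), the low-pass frame holds the temporal average.
    """
    high = [b - a for a, b in zip(frame_a, frame_b)]
    low = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    return low, high

def mctf_synthesize(low, high):
    """Invert the analysis step to recover the original frame pair."""
    frame_a = [l - h / 2 for l, h in zip(low, high)]
    frame_b = [l + h / 2 for l, h in zip(low, high)]
    return frame_a, frame_b
```

The synthesis step is the exact inverse of the analysis step, which is what allows the decoder's MCTF synthesizer to rebuild the original frames from the temporal subbands.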
A spatial analyzer 12 is connected to the MCTF analyzer 11 and decomposes the temporal high-pass frames and the temporal low-pass frames into spatial high-pass frames and spatial low-pass frames through the Discrete Wavelet Transform (DWT), so that spatial high-pass frames and spatial low-pass frames are output from the input of the temporal high-pass frames and the temporal low-pass frames.
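As an illustration of the spatial decomposition, the sketch below performs a one-level 2D Haar DWT on a small frame; the Haar filter is chosen here only for simplicity (practical codecs typically use longer wavelet filters), and the function names are not taken from the specification.

```python
def haar_1d(row):
    """One-level 1D Haar analysis: pairwise averages (low-pass)
    followed by pairwise half-differences (high-pass)."""
    low = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    high = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    return low + high

def dwt_2d(frame):
    """One-level 2D Haar DWT: filter each row, then each column,
    yielding the LL / HL / LH / HH subbands packed into one array."""
    rows = [haar_1d(r) for r in frame]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

On a 2x2 frame the top-left output sample is the LL coefficient (the local average), and the remaining samples are the HL, LH, and HH detail coefficients.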
A DWT coefficient encoder 13 is connected to the spatial analyzer 12 and compressively encodes the spatial high-pass frames and the spatial low-pass frames obtained by the spatial analyzer 12, so that a compressed video content bitstream is output from the input of the spatial high-pass frames and the spatial low-pass frames obtained through the DWT.
A packetizer 14 is connected to the DWT coefficient encoder 13 and bundles the compressed video content bitstream and a compressed MI into a single compound compressed bitstream, so that the single compound compressed bitstream is output from the input of the compressed video content bitstream and the compressed MI.
A motion estimator 15 is connected to the MCTF analyzer 11 and searches for the motion vector of each partition of the video frame, continuously searching through all partitions (as shown in
And an MI encoder 16 is connected to the packetizer 14 and the motion estimator 15 and splits the motion vectors of all partitions into a base layer and one or more enhancement layers, applying entropy coding to the base layer and the enhancement layers (as shown in
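One plausible way to split a motion vector into a base layer and an enhancement layer is by numerical precision, and the following sketch illustrates this under the assumption that motion vectors are stored as integers in quarter-pel units; the base layer keeps the integer-pel part and the enhancement layer carries the two fractional bits. The function names and the quarter-pel convention are illustrative assumptions, not taken from the specification.

```python
def split_mv(mv_quarter_pel):
    """Split quarter-pel motion vectors into an integer-pel base layer
    and a 2-bit fractional enhancement layer.

    Arithmetic shifts and masks handle negative components correctly:
    x >> 2 floors toward -inf and x & 3 yields the non-negative remainder.
    """
    base = [(x >> 2, y >> 2) for x, y in mv_quarter_pel]
    enh = [(x & 3, y & 3) for x, y in mv_quarter_pel]
    return base, enh

def merge_mv(base, enh):
    """Recombine base and enhancement layers into full-precision vectors."""
    return [(bx * 4 + ex, by * 4 + ey)
            for (bx, by), (ex, ey) in zip(base, enh)]
```

A decoder that receives only the base layer can still perform motion compensation at integer-pel accuracy, which is the degraded-but-usable behavior that MI scalability aims for at low bit-rates.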
Therein, the MI encoder 16 applies partitioned coding to the MI according to three precisions: the spatial precision, the temporal precision, and the numerical precision.
The spatial precision is the partitioned motion block; the temporal precision is the number of frames per second; and the numerical precision is the precision of the arithmetic expression of a motion vector. The MI encoder 16 also helps compress the related information of the motion estimator 15.
The decoder 2, which produces the video output, comprises the following components.
A de-packetizer 21 is connected to the puller 3 and splits the compound compressed bitstream into a compressed video content bitstream and a compressed MI.
A DWT coefficient decoder 22 is connected to the de-packetizer 21 and decodes the compressed representation of the spatial high-pass frames and the spatial low-pass frames produced by the spatial analyzer 12, so that the spatial high-pass frames and the spatial low-pass frames are output from the input of the compressed video content bitstream.
A spatial synthesizer 23 is connected to the DWT coefficient decoder 22 and rebuilds the temporal high-pass frames and the temporal low-pass frames from the spatial high-pass frames and the spatial low-pass frames through the Inverse Discrete Wavelet Transform (IDWT), so that the temporal high-pass frames and the temporal low-pass frames are output from the input of the spatial high-pass frames and the spatial low-pass frames.
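For illustration, the following sketch inverts a one-level 2D Haar DWT, assuming the analysis convention low = (a + b) / 2, high = (a - b) / 2 so that a = low + high and b = low - high; as before, the Haar filter and the function names are simplifying assumptions rather than the specification's choices.

```python
def haar_inv_1d(coeffs):
    """One-level 1D inverse Haar: interleave low + high and low - high."""
    n = len(coeffs) // 2
    low, high = coeffs[:n], coeffs[n:]
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

def idwt_2d(coeffs):
    """One-level 2D inverse Haar DWT: undo the column filtering,
    then undo the row filtering."""
    cols = [haar_inv_1d(list(c)) for c in zip(*coeffs)]
    rows = [list(r) for r in zip(*cols)]
    return [haar_inv_1d(r) for r in rows]
```

Because the synthesis filters exactly invert the analysis filters, the spatial synthesizer recovers the temporal subband frames losslessly (up to any quantization applied by the coefficient encoder).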
An MCTF synthesizer 24 is connected to the spatial synthesizer 23 and synthesizes the temporal high-pass frames and the temporal low-pass frames into video frames by using the motion vectors, so that the video frames are output from the input of the temporal high-pass frames and the temporal low-pass frames obtained through the IDWT.
And an MI decoder 25 is connected to the de-packetizer 21 and the MCTF synthesizer 24 and applies entropy decoding to the compressed MI, combining a base layer and one or more enhancement layers to form the motion vectors, so that the MI is output from the input of the compressed MI.
The puller 3 is connected to the encoder 1 and the decoder 2 and reads the bit-rate/frame-rate/image-size information to partition the compressed video content bitstream; decides whether one or more enhancement layers are needed for the given bit-rate/frame-rate/image-size; sends the MI of the base layer; and combines the partitioned compressed video content bitstream with a partitioned MI, obtained by partitioning the MI of the enhancement layers according to the bit-rate/frame-rate/image-size, to form a compressed bitstream (as shown in
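The puller's layer-selection logic can be sketched as follows, under the assumption that each layer's size in the bitstream is known; the `pull` function, the layers-as-strings model, and the rate bookkeeping are all hypothetical simplifications of the selection the puller performs against the bit-rate/frame-rate/image-size request.

```python
def pull(base_layer, enh_layers, target_rate, layer_rates):
    """Select the base layer plus as many enhancement layers as the
    requested rate budget allows.

    layer_rates[0] is the base layer's cost; layer_rates[1:] are the
    costs of the enhancement layers, in refinement order.
    """
    selected = [base_layer]                      # base layer is always sent
    budget = target_rate - layer_rates[0]
    for layer, rate in zip(enh_layers, layer_rates[1:]):
        if budget < rate:                        # next layer does not fit
            break
        selected.append(layer)
        budget -= rate
    return selected
```

At a high requested rate the base layer and several enhancement layers are selected; at a low rate only the base layer survives, which matches the scalable behavior described above.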
Therein, the method and apparatus partition the MI for scalability and transfer a partition of the MI to a terminal to achieve the scalability.
The present method and apparatus for interframe wavelet video coding partition the MI to achieve scalability: the MI encoder 16 applies partitioned encoding according to the three precisions of spatial precision, temporal precision, and numerical precision, and the data corresponding to the MI are transferred after properly tuning these three precisions, achieving scalability of the MI.
Therein, the spatial precision is the partitioned motion block; the temporal precision is the number of frames per second; the numerical precision is the precision of the arithmetic expression of a motion vector; and the scalability is the capability of accepting demands according to one or more factors among bit-rate/frame-rate/image-size and the above three precisions.
And the MI comprises a motion vector together with the related data that help rebuild the motion vector.
And the video compression method can be an interframe wavelet video coding method or any video encoding method that uses motion information.
Accordingly, a novel method and apparatus for interframe wavelet video coding is obtained.
Please refer to
The second step is partitioned encoding by the MI encoder 16. The motion vectors for the various levels obtained in the previous step are partitioned and encoded here. To achieve scalability, in the pull process the puller 3 decides the data size to be transferred according to the requested data amount (e.g., based on the bit-rate/frame-rate/image-size request). The motion vectors are thus partitioned, and the total number of levels to be transferred is decided according to the data amount needed. As shown in the example of
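One concrete realization of such level-by-level partitioning is bit-plane splitting of a motion vector component, in the spirit of the SNR-scalability idea mentioned earlier: the base layer carries the most significant bits and each enhancement level adds one bit plane. The sketch below is illustrative only and assumes non-negative values with a known bit width; the names and the one-bit-per-level granularity are assumptions.

```python
def encode_bitplanes(value, base_bits, total_bits):
    """Split a non-negative value into a base-layer approximation
    (the top base_bits bits) and one enhancement bit per remaining
    bit plane, most significant first."""
    base = value >> (total_bits - base_bits)
    planes = [(value >> (total_bits - base_bits - 1 - i)) & 1
              for i in range(total_bits - base_bits)]
    return base, planes

def decode_bitplanes(base, planes, total_bits, base_bits):
    """Rebuild an approximation from the base layer and whichever
    enhancement bit planes were actually transmitted."""
    value = base
    for bit in planes:
        value = (value << 1) | bit
    # zero-fill the bit planes that were not transmitted
    value <<= total_bits - base_bits - len(planes)
    return value
```

Decoding with fewer transmitted planes yields a coarser but still usable value, so truncating the enhancement levels trades motion-vector precision directly for bit-rate.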
The third step is to write the partitioned motion vectors into the compressed bitstreams. Taking step 2 as an example, the motion vectors of the base layer and the one or more enhancement layers are encoded separately and written to the bitstreams.
The pull process of the puller 3 comprises the following steps.
First, the compressed bitstreams are partitioned according to the bit-rate/frame-rate/image-size provided by the system: if the bit-rate/frame-rate/image-size is high, the base layer and several enhancement layers are transferred; if it is low, only the base layer is transferred. In this way, scalability is achieved as requested by the system.
Second, the partitioned bitstreams are combined to form a new compressed bitstream: the partitioned motion vector bitstream and the partitioned compressed video content bitstream are combined into a new bitstream that conforms to the data amount requested by the system.
After the pull process of the puller 3, the motion vectors obtained are read for decoding. In the present invention, the decoder reads the motion vectors after the pull process, which may comprise the base layer alone or the base layer together with one or more enhancement layers.
Accordingly, the present invention is capable of achieving the following:
The preferred embodiments disclosed herein are not intended to unnecessarily limit the scope of the invention. Therefore, simple modifications or variations that belong to the equivalents of the scope of the claims and the disclosure herein are all within the scope of the present invention.