This disclosure is based upon European Application No. 02406109.5 filed Dec. 17, 2002, and International Application No. PCT/IB2003/005852, filed Dec. 9, 2003, the contents of which are incorporated by reference.
The present invention relates to a method of selecting, among N “Spatial Video CODECs” where N is an integer greater than 1, the optimum “Spatial Video CODEC” for the same input signal I. In this new technique for digital video coding (hereafter referred to as “Dynamic Coding”), a “Spatial Video CODEC” is understood as the combination of any transform of the input signal, followed by a quantization of the transform coefficients and a corresponding entropic coder.
Video coding is an important issue in all application fields where digital video information has to be stored on a digital support or transmitted over digital networks. Several solutions have been proposed in the last 20 years and standardization efforts have been undertaken to define a unified syntax.
Standard video coding schemes have a rigid structure. They take into account the context of specific, well-defined applications requiring video coding, and propose an optimized, albeit limited, solution. This explains the number of existing international recommendations that have been defined for specific applications. For example, the ITU-T H.261 standard is designed for tele-conferencing and video-telephony applications, MPEG-1 for storage on CD-ROM, MPEG-2 for wideband TV broadcast, MPEG-4 for low-bitrate coding with multimedia functionalities and H.264 for very low bit-rate video coding.
The strategy adopted by classical video coding schemes is prompted by the fact that no single universal coding technique can be applied with optimum results in every context. In fact, the performance of a “Spatial Video CODEC” depends on several application-specific parameters, such as: the type of the data to be compressed (still pictures, video, stereo imagery, and so on), the nature of the visual data (natural, synthetic, text, medical, graphics, hybrid), the target bitrate, the maximum acceptable delay (ranging from a few milliseconds to off-line), the minimum acceptable quality (spatial and temporal resolution), the type of communication (point to point, broadcast, multi-point to point, etc.), and the set of functionalities required (scalability, progressiveness, etc.). Often these parameters, such as the nature of the input signal or the available bandwidth, may change over time, with a consequent variation in the performance of the selected “Spatial Video CODEC”. In the following table, major specifications for a few of the most critical applications for video coding are listed. In the table, “Mbps” stands for megabits per second, “Kbps” for kilobits per second, “fps” for frames per second, “MP2MP” for multipoint to multipoint, “P2P” for point to point, and “P2MP” for point to multipoint.
Given the wide variations in the requirements from application to application, it is clear that a coding scheme tuned for a specific application will be superior, for that application, to a universal coding technique that attempts to find a suitable compromise among different constraints. However, even the optimum “Spatial Video CODEC” for a specific set of constraints may be a sub-optimal solution when the parameters of the application are allowed to change over time. For example, in several multimedia scenarios, the video input combines static scenes with highly dynamic ones. Moreover, the sequences may be natural images, synthetic scenes, or a combination of the two (text, graphs, and natural images), with a consequent variation of the statistical properties of the input signal.
The present invention proposes a method which is a suitable solution to achieve an optimum video coding despite the changes of the above discussed properties of the input signal.
The newly proposed paradigm for video coding is based on the following idea: dynamically evaluate the performance of several coders given the input data and the external constraints of the application. This evaluation is performed block by block. The input image can be organized as a set of rectangular blocks whose size can range from 2 by 2 pixels to the size of the input image. The results of such an evaluation are used to select the best performing among the available “Spatial Video CODECs” for that given input data. A prototype implementing this strategy has proved its superiority over standard approaches.
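As a minimal sketch of the block organization described above, the following Python generator tiles a frame into rectangular blocks. The function name `iter_blocks`, the list-of-rows frame representation, and the treatment of edge blocks (simply truncated at the frame border) are assumptions made for illustration; the text does not prescribe them.

```python
def iter_blocks(frame, block_h, block_w):
    """Yield (top, left, block) for each rectangular block of a 2-D frame.

    frame is a list of equally long rows; blocks at the right and bottom
    borders are truncated when the frame size is not a multiple of the
    block size (an illustrative choice, not taken from the text).
    """
    if block_h < 1 or block_w < 1:
        raise ValueError("block dimensions must be positive")
    height = len(frame)
    width = len(frame[0]) if height else 0
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            rows = frame[top:top + block_h]
            yield top, left, [row[left:left + block_w] for row in rows]


# A 4x4 frame split into 2x2 blocks gives four blocks.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = list(iter_blocks(frame, 2, 2))
```

Each yielded block could then be handed to every candidate codec for evaluation; choosing the block size equal to the frame size reduces this to a frame-based evaluation.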
The invention will now be disclosed with the help of the accompanying drawings.
In order to efficiently encode any digital signal it is important to exploit all the redundancies that the signal contains. When the input signal is a video signal there are three different kinds of redundancy that can be exploited: the spatial, the temporal and the statistical redundancy. All existing video coding schemes apply different techniques to exploit more or less all these redundancies. Based on these observations the general scheme of any video coding is represented in
Because of practical constraints, it is not feasible to evaluate all the possible solutions. Thus the approach adopted by the standards is to fix one combination that provides the best compromise in some specific scenario. This constrains the efficiency of the “Spatial Video CODEC”, but simplifies its implementation. What is lacking in order to dynamically adapt the scheme to the external constraints is an efficient prediction of the performance of each tool.
We will explain how it is possible to evaluate several tools to dynamically select the one that optimally exploits the spatial redundancy of the signal, by taking into account the properties of the previous tool used to exploit the temporal redundancy and the following tool used to exploit the statistical redundancy. This evaluation is fast and efficient and improves on the coding performance of standard approaches.
Hereafter, we refer to both the spatial and entropic redundancy modules as “Spatial Video CODEC”. In this document, as previously mentioned, “Spatial Video CODEC” is understood as the combination of any transform of the input signal, followed by a quantization of the transform coefficients and a corresponding entropic coder.
A simplified version of MPEG-like standard video encoder is shown in
The technique proposed in this document can be applied in the same context as the one described above. The scheme is the same as the one in
In order to optimally exploit the spatial redundancies, it is possible to evaluate not only one single transform (as the DCT in the standard approaches) associated to a unique quantization and entropic coder, but an arbitrary number of other “Spatial Video CODECs” composed of any possible transform with corresponding quantization and entropic coder. As in a standard approach, a rate-distortion algorithm is used to provide to the Dynamic Coding block an indication of the expected distortion and rate. In our implementation, this indication comes in the form of a quality parameter Q that defines the quality of the encoded frames. When the quality parameter Q is fixed, the rate varies according to the statistical properties of the input signal. Alternatively, Q can be chosen so as to provide a coded video stream with a constant bitrate.
The basic “Dynamic Coding” block is illustrated in
In this document we propose a new procedure that is able to efficiently compare the performance of different “Spatial Video CODECs” in two steps: Normalization and Evaluation. The normalization step is performed offline, while the evaluation step is performed on the output of each SCn. In the normalization step, all the “Spatial Video CODECs” that are to be evaluated are aligned in terms of the quality parameter Q. In the evaluation step, the rate-distortion performance of each normalized “Spatial Video CODEC” is predicted and the one with the best rate-distortion ratio is selected. The exact normalization and evaluation procedures are detailed hereafter.
The normalization step requires the definition of a quality parameter Q. The parameter Q must have a minimum value and a maximum value. In the preferred implementation, Q is any integer from 0 to 100. Q is the result of a rate-distortion algorithm that controls the encoding process to obtain one of the following:
All the “Spatial Video CODECs” are normalized or “aligned” as a function of the same parameter Q (quality) so as to provide the same distortion for the same value of Q, i.e., for the same input and the same Q parameter, all normalized “Spatial Video CODECs” should provide a compressed frame with similar distortion, but possibly differing rate.
In the proposed implementation, the “Spatial Video CODECs” are aligned according to the MSE (Mean Square Error) distortion measure. The alignment is performed by defining the following relationship between Q and the MSE distortion measure:
where ƒ(Q) is a function of the quality parameter Q. As described by Mallat in “Analysis of Low Bit Rate Image Transform Coding” (In IEEE Transactions on Signal Processing, VOL. 46, No. 4, April 1998), for Spatial CODECs using a uniform quantization, the relationship between the quality parameter Q and the quantization step Δ can be expressed as: ƒ(Q)=Δ. In case of non-uniform quantization the relationship between MSE and Q has to be respected for each Q.
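The link between a uniform quantization step and the MSE can be checked numerically. The sketch below assumes the standard high-resolution approximation MSE ≈ Δ²/12 for a uniform quantizer with rounding; this is an assumption about the form of the relationship, not a formula quoted from the text, and the function name and the uniformly distributed test signal are likewise illustrative.

```python
import random


def uniform_quantization_mse(samples, step):
    """Measure the MSE introduced by rounding each sample to a multiple of step."""
    error = 0.0
    for x in samples:
        reconstructed = round(x / step) * step
        error += (x - reconstructed) ** 2
    return error / len(samples)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
samples = [rng.uniform(-1000.0, 1000.0) for _ in range(100_000)]
step = 8.0

measured = uniform_quantization_mse(samples, step)
predicted = step ** 2 / 12  # assumed high-resolution approximation
print(measured, predicted)
```

The two printed values should agree closely when the step is small compared with the spread of the signal; for coarse steps or strongly peaked coefficient distributions the approximation is looser.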
In a preferred implementation, in order to perform the normalization, the “Spatial Video CODECs” are uniformly quantized with a step Δ defined as:
Δ=2(C
By combining equations (1) and (2) we obtain that the distortion, expressed as the MSE, is a function of Q defined by:
where C1 controls the minimal and maximal quality and C2 the variation of the distortion according to Q. In particular the following values have been chosen: C1=5 and C2=24. This means that the distortion is doubled for each decrease of 24 of the Q parameter.
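To make the roles of C1 and C2 concrete, the sketch below maps Q to a quantization step. It assumes that the step has the form Δ = 2^(C1 − Q/C2), which is only one plausible reading of equation (2); the function name `quantization_step` and the range check are hypothetical. Only the values C1 = 5 and C2 = 24 and the range 0 to 100 for Q come from the text.

```python
C1 = 5   # value given in the text
C2 = 24  # value given in the text


def quantization_step(q, c1=C1, c2=C2):
    """Quantization step for quality q, ASSUMING step = 2 ** (c1 - q / c2)."""
    if not 0 <= q <= 100:
        raise ValueError("Q is expected to lie between 0 and 100")
    return 2.0 ** (c1 - q / c2)


# Print the assumed step for a few quality values.
for q in (0, 24, 48, 100):
    print(q, quantization_step(q))
```

Under this assumption C1 fixes the coarsest step (at Q = 0) and C2 sets how quickly the step, and with it the distortion, grows as Q decreases: the step doubles for every decrease of C2 in Q.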
The proposed normalization procedure is not exact, but as we show in
Moreover, we claim that the error $E_n$ in the alignment of the n-th “Spatial Video CODEC” is small compared to the predicted distortion:
$D_n = f'(Q) + E_n, \quad |E_n| \ll |D_n|$ (4)
We have statistically evaluated equation (4) and the performances of the proposed “Spatial Video CODEC” alignment and we report the results in
where the real distortion measure is $D_n$ and the predicted distortion measure is MSE. It turns out that in most cases (more than 98% of the evaluated tests) the predicted distortion deviates from the exact distortion by less than 20%.
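A statistic of this kind can be computed as follows. The sketch assumes that the approximation is measured as the relative error |Dn − MSE| / Dn, which is an assumed reading of the comparison between real and predicted distortion; the function name and the sample values are made up for illustration and are not the reported results.

```python
def fraction_within_tolerance(real, predicted, tolerance=0.20):
    """Share of tests whose relative prediction error is below tolerance.

    The relative error is taken as |real - predicted| / real, so every
    real distortion must be strictly positive.
    """
    if len(real) != len(predicted) or not real:
        raise ValueError("need two non-empty lists of equal length")
    hits = 0
    for d_real, d_pred in zip(real, predicted):
        relative_error = abs(d_real - d_pred) / d_real
        if relative_error < tolerance:
            hits += 1
    return hits / len(real)


# Hypothetical measurements, for illustration only.
real_distortion = [10.0, 20.0, 40.0, 80.0]
predicted_mse = [11.0, 19.0, 50.0, 82.0]
share = fraction_within_tolerance(real_distortion, predicted_mse)
```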
The evaluation step requires the computation of the rate R and of the distortion D for each Normalized “Spatial Video CODEC” given a selected quality parameter Q and the current input block to be coded.
According to these values a decision is made on which “Spatial Video CODEC” has the best rate-distortion performance. This decision may be taken on the rate alone or on the distortion alone. In the first case, the “Spatial Video CODEC” with the minimum rate is selected; in the second, the “Spatial Video CODEC” with the minimum distortion is selected. However, a better decision is obtained if both the rate and the distortion are taken into account. This is possible by applying a Lagrangian optimization of the two values:
$L_n = R_n + \lambda D_n$, (6)
with $n \in [1, N]$ representing the index of an evaluated “Spatial Video CODEC” among the N “Spatial Video CODECs” and λ representing the Lagrangian multiplier that provides the relative weight between rate and distortion in selecting the best compromise. In this context, the selection of the best “Spatial Video CODEC” is done by choosing the one with the minimum $L_n$.
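The selection rule of equation (6) can be sketched in a few lines of Python. The function name `select_codec`, the zero-based indexing, and the rate and distortion figures are illustrative assumptions; only the cost Ln = Rn + λDn and the choice of its minimum come from the text.

```python
def select_codec(rates, distortions, lam):
    """Return (index, cost) of the codec minimizing L_n = R_n + lam * D_n.

    Indices are zero-based; ties are resolved in favour of the lowest index.
    """
    if len(rates) != len(distortions) or not rates:
        raise ValueError("need one rate and one distortion per codec")
    costs = [r + lam * d for r, d in zip(rates, distortions)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


# Hypothetical rates (bits) and distortions (MSE) of three codecs on one block.
rates = [1200.0, 900.0, 1500.0]
distortions = [30.0, 45.0, 20.0]

low_weight = select_codec(rates, distortions, lam=10.0)
high_weight = select_codec(rates, distortions, lam=40.0)
```

With a small λ the rate dominates and the cheapest codec wins; with a larger λ the codec with the lowest distortion is preferred, which illustrates how λ sets the compromise between the two.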
In order to perform a Lagrangian optimization, it is necessary to compute the rate, the distortion and the optimal λ for each “Spatial Video CODEC”. In the following we describe the procedure adopted in this invention.
The Lagrange multiplier is responsible for weighting the contribution of the rate and of the distortion in the overall evaluation of the performances of a “Spatial Video CODEC”. Its value should be computed according to the rate-distortion curves of each “Spatial Video CODEC” as a function of a given input. This is clearly too computationally expensive to be considered in a practical application. In our approach we find an approximation for λ that is a function of the quality parameter Q. The starting point is the model of high rate proposed by Mallat. This model states that:
$D = k \cdot 2^{-2R}$, (7)
where k is a constant depending on the “Spatial Video CODEC” and the input signal. From equation (7) we define the relationship between rate and distortion as:
Merging (8) and (6) we obtain:
We can now minimize L as a function of the distortion D alone. To do this, we differentiate with respect to D and find the roots:
The final solution is:
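As a worked sketch of the derivation just described, the steps can be written out starting only from equations (6) and (7). The sketch assumes that the rate is measured in bits, so that the logarithm is taken in base 2, and it drops the index n; the equation numbers follow the references made in the text.

```latex
\begin{align}
R &= \frac{1}{2}\log_2\frac{k}{D} \tag{8}\\
L &= \frac{1}{2}\log_2\frac{k}{D} + \lambda D \tag{9}\\
\frac{\partial L}{\partial D} &= -\frac{1}{2\,D\ln 2} + \lambda = 0 \tag{10}\\
\lambda &= \frac{1}{2\,D\ln 2} \tag{11}
\end{align}
```

The second derivative of L with respect to D, $1/(2 D^2 \ln 2)$, is positive for any D > 0, so the root found in (10) is indeed a minimum. The multiplier is inversely proportional to the distortion: the lower the distortion targeted, the more weight the distortion term receives in equation (6).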
If the model of distortion of equation (3) is assumed valid, equation (11) may be written as:
Equation (12) defines the optimum λ as a function of the quality parameter Q. By referring to the scheme of
Given the parameter Q and the input I, R and D can be computed in a precise but computationally expensive way by applying the “Spatial Video CODEC” to the input and measuring the size of the encoded stream and the introduced distortion. This procedure is summarized in
In the preferred implementation, an approximate prediction of both R and D is obtained without the need of performing the Quantization, the Entropic Coder, the Scaling and the Inverse Transform steps. The prediction can be computed in a much more computationally efficient way and the introduced approximation does not affect the correct choice of the best “Spatial Video CODEC”.
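For comparison, the exact but costly measurement that the prediction avoids can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the text: the input is already a list of transform coefficients, quantization is uniform with rounding, and the empirical entropy of the quantization indices stands in for the output size of a real entropic coder. The function name is hypothetical.

```python
import math
from collections import Counter


def measure_rate_distortion(coeffs, step):
    """Quantize, reconstruct and measure (rate in bits, MSE) for one block.

    The rate is approximated by the empirical entropy of the quantization
    indices multiplied by their count, as a stand-in for an entropic coder.
    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    if not coeffs:
        raise ValueError("empty block")
    indices = [round(x / step) for x in coeffs]
    reconstructed = [i * step for i in indices]
    mse = sum((x - y) ** 2 for x, y in zip(coeffs, reconstructed)) / len(coeffs)
    total = len(indices)
    counts = Counter(indices)
    bits = -sum(c * math.log2(c / total) for c in counts.values())
    return bits, mse


bits, mse = measure_rate_distortion([1.0, 5.0, -1.0, 3.0], step=4.0)
```

Every candidate codec would have to run its full quantization and coding chain to obtain these two numbers, which is the cost the approximate prediction is meant to avoid.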
In the preferred implementation, the rate is estimated as a linear function of the number of zeros obtained after quantization of the coefficients, while the distortion is approximated from the distribution of the transformed coefficients. In particular, before quantization, the histogram of the transform coefficients is computed. The rate is predicted as a function of the quantization step Δ:
where $N_{x_i}$ is the number of coefficients with an amplitude equal to $x_i$ and the parameter α is derived from experimental results. Note that in a preferred implementation Δ is related to Q by equation (2), thus the rate is a simple function of the quality parameter Q defined by the rate-distortion algorithm.
The distortion is predicted from the distribution of the transformed coefficients:
where $x_i$ is the amplitude of the coefficients and $N_{x_i}$ is the number of coefficients with an amplitude of $x_i$. Note that in a preferred implementation Δ is related to Q by equation (2), thus the distortion is also a simple function of the quality parameter Q defined by the rate-distortion algorithm.
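A histogram-based prediction of this kind can be sketched as below. The exact prediction formulas are not reproduced here, so the sketch rests on assumptions that should be read as such: a coefficient is taken to quantize to zero when its amplitude is below Δ/2, zeroed coefficients contribute their own energy to the distortion, the remaining coefficients contribute Δ²/12 each, and the rate is α times the number of non-zero coefficients (hence linear in the number of zeros). The function name and the default value of α are placeholders.

```python
from collections import Counter


def predict_rate_distortion(coeffs, step, alpha=6.0):
    """Predict (rate, MSE) from the histogram of unquantized coefficients.

    ASSUMED model, for illustration only:
      - |x| < step / 2  -> coefficient becomes zero, error is x ** 2
      - otherwise       -> coefficient survives, error is step ** 2 / 12
      - rate = alpha * (number of surviving coefficients)
    """
    total = len(coeffs)
    if total == 0:
        return 0.0, 0.0
    histogram = Counter(coeffs)  # amplitude x_i -> count N_{x_i}
    threshold = step / 2
    zeros = 0
    squared_error = 0.0
    for amplitude, count in histogram.items():
        if abs(amplitude) < threshold:
            zeros += count
            squared_error += count * amplitude ** 2
        else:
            squared_error += count * step ** 2 / 12
    rate = alpha * (total - zeros)  # linear in the number of zeros
    return rate, squared_error / total


rate, distortion = predict_rate_distortion([0, 0, 1, -1, 10, -20], step=4)
```

No quantization, entropy coding or inverse transform is executed here: one pass over the histogram is enough, and the same histogram can be reused for several values of the step.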
The proposed procedure to estimate the rate and the distortion of the “Spatial Video CODECs” is displayed in
In order to illustrate the principle of the dynamic coder, two examples of performance measurements will be presented. Both examples have been computed using a frame-based dynamic coder. The first example shown in
The next example (
The proposed dynamic coder offers several advantages over standard approaches:
The dynamic codec is particularly suitable for all those applications where high quality video signals have to be encoded in real time at low bit rates (<500 Kbps), such as interactive TV, corporate TV, TV over ADSL, and TV broadcast. It is suited in particular to input signals characterized by the presence of natural, synthetic and mixed scenes.
Number | Date | Country | Kind |
---|---|---|---|
02406109 | Dec 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/05852 | 12/9/2003 | WO | 00 | 3/29/2006 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2004/056124 | 7/1/2004 | WO | A |
Number | Date | Country | |
---|---|---|---|
20060215751 A1 | Sep 2006 | US |