The present invention relates generally to video coding and image coding. More particularly, the present invention relates to scalable video coding and scalable image coding.
Quantization is an important step in video coding. Quantization is a process by which each sample in a video signal is rounded to one of a finite number of values. By changing the quantization parameters, one can control both the bit rate and the quality of the compressed video. The quantization and dequantization processes defined in typical video coding standards can be explained using the following equations:

Q(x) = sign(x)·floor(|x|/q + f)  (1)

R(x) = (Q(x) + sign(Q(x))·g)·q  (2)

The function floor(y) gives the largest integer that is smaller than or equal to y. In equations (1) and (2), x is the original transform coefficient; Q(x) is the quantized transform coefficient; R(x) is the reconstructed transform coefficient; q is the quantization step size; f is the rounding offset; and g is the reconstruction offset.
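As an illustration only, the quantization and dequantization processes described above can be sketched as follows, assuming the standard deadzone scalar quantizer Q(x) = sign(x)·floor(|x|/q + f) with uniform reconstruction R(x) = (Q(x) + sign(Q(x))·g)·q; this sketch is not a normative excerpt from any standard.

```python
import math

def quantize(x, q, f):
    """Quantize transform coefficient x with step size q and rounding
    offset f (f is typically in [0, 1/2])."""
    sign = 1 if x >= 0 else -1
    return sign * math.floor(abs(x) / q + f)

def dequantize(qx, q, g):
    """Reconstruct from quantization index qx with step size q and
    reconstruction offset g (g is always 0 in H.264)."""
    sign = 1 if qx > 0 else (-1 if qx < 0 else 0)
    return (qx + sign * g) * q

# With q = 10 and f = 1/3, a coefficient of 17 falls in the second
# interval and reconstructs at 20; a coefficient of 3 falls in the
# deadzone and reconstructs at 0.
```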
The quantization/dequantization processes in different standards may use different values for f and g. For example, in the H.264 video codec, the encoder can vary f, normally within the range between 0 and 1/2 , in order to obtain an optimal coding performance, and g is always equal to 0. Such a quantizer is illustrated in
A signal-to-noise ratio (SNR) scalable video stream has the property that the video of a lower quality level can be reconstructed from a partial bitstream. With this feature, a device can properly reconstruct a video, but at a lower quality, if it only decodes part of the bitstream due to some limitations such as channel bandwidth or processing power.
One method of generating an SNR scalable video bitstream involves generating a base layer using a normal non-scalable video coder, such as an H.264 encoder, and then generating the enhancement layers with additional coding tools. Such an approach is particularly important because of backward compatibility considerations. This approach is also taken by the International Telecommunication Union's Joint Video Team (JVT) in developing the new scalable video coding standard. The latest reference software, Joint Scalable Video Model version 2.0 (JSVM2), has just been released. JSVM2 is able to generate a scalable video stream including an Advanced Video Coding (AVC)-compliant base layer and additional enhancement layers, such as a spatial enhancement layer, a coarse granularity SNR enhancement layer, and a fine granularity SNR enhancement layer.
Conventionally, the quantizer used in SNR enhancement layer coding is similar to that used in base layer coding. For example, JSVM2 uses the same quantizer in both base layer and SNR enhancement layer coding. JSVM2 simply quantizes the error signal resulting from the base layer coding with a smaller quantization parameter (qp). This approach is referred to as re-quantization and is illustrated at the left side of
One method of generating more uniform quantization intervals is to perform what is referred to as “embedded quantization.” In embedded quantization, the decision levels of a coarse-scale quantizer are always aligned with the decision levels of a fine-scale quantizer. In one design of such a quantizer, a base layer refinement interval is split into two halves of equal size q/2, and the deadzone is split into three intervals, including two new refinement intervals and the new deadzone. The two new refinement intervals have a size of q/2, the same as that of the other refinement intervals. Such an embedded quantizer is illustrated in
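The interval splitting described above can be sketched as follows. The index mapping used here is illustrative only: it assumes that each base layer interval at step size q maps onto a pair of indices at step size q/2, and that the deadzone maps onto the three fine-scale indices −1, 0 and +1; it is not taken from any standard.

```python
def embedded_refine(i_base, r):
    """One embedded-refinement step, halving the step size from q to q/2.

    For a nonzero base index, the refinement symbol r in {0, 1} selects
    which q/2 half of the base interval the coefficient lies in.
    For a zero base index (the deadzone), r in {-1, 0, 1} selects one of
    the three sub-intervals: two new refinement intervals of size q/2,
    or the new, smaller deadzone (r == 0).
    Returns the quantization index at step size q/2.
    """
    if i_base == 0:
        return r  # -1, 0 or +1 at the finer scale
    sign = 1 if i_base > 0 else -1
    return sign * (2 * abs(i_base) + r)  # r in {0, 1}
```

Because the fine-scale decision levels are a superset of the coarse-scale ones, every refined index falls entirely inside its base interval, which is the defining property of embedded quantization.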
The advantage of the quantization methodology depicted in
The present invention provides for a flexible dequantizer design for use in SNR enhancement layer coding. In the present invention, the decoder performs the normal uniform dequantization of a coefficient based upon the quantization index and the nominal quantization step size in order to obtain a nominal reconstruction level. The nominal quantization step size may not be the same as the actual quantization step size used in the quantization process. The decoder then adjusts the result by adding the reconstruction offset in order to obtain the optimal reconstruction level for a coefficient. The best reconstruction levels are calculated at the encoder side and the reconstruction offsets, which are calculated as the differences between the optimal reconstruction levels and the nominal reconstruction levels, are transmitted to the decoder. The reconstruction offset is dependent on the quantization index. It is also dependent on the classification of the coefficients. For example, luminance and chrominance signals can have their own set of reconstruction offsets so that luma and chroma coefficients can be quantized differently. With the present invention, an efficient methodology is used to code the reconstruction offsets so that the coding overhead on these numbers is minimal.
The present invention provides for a number of important advantages over conventional approaches. One major problem with the conventional requantization systems is that the refinement intervals are improperly handled. Although using embedded quantization solves this problem, the simple embedded quantization methodology possesses inflexibility in splitting the deadzone. In contrast, with this invention, the design of the dequantizer allows the quantizer to treat the refinement intervals as they are treated in embedded quantization. In addition, the quantizer can perform optimal splitting of the deadzone to obtain the best coding performance.
These and other objects, advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
The present invention provides for a design of a dequantizer, as well as an exemplary quantizer used in the coding of SNR enhancement layers, particularly Fine Granularity Scalability (FGS) SNR enhancement layers. In the present invention, the decoder performs a process similar to uniform dequantization based upon the quantization index and nominal quantization step size in order to obtain a nominal reconstruction level. The decoder adjusts the result by adding the reconstruction offset. The best reconstruction levels are calculated at the encoder side and the reconstruction offsets, which are calculated as the differences between the optimal reconstruction levels and the nominal reconstruction levels, are transmitted to the decoder. The reconstruction offset is dependent upon the quantization index. With the present invention, an efficient methodology is used to code the reconstruction offsets so that the coding overhead on these numbers is minimal. Luminance and chrominance signals can have their own sets of reconstruction offsets so that luma and chroma coefficients can be quantized differently. Luma is a component which represents lightness, while chroma comprises two components that represent color, disregarding lightness.
The dequantizer can be extended. Coefficients can be classified into more categories instead of just being separated into luma and chroma coefficients to allow for more flexibility in the base layer quantizer design.
In the present invention, the coefficients can be first classified into coefficient sets based upon the color component, transform type and frequency. The sets of the coefficients are categorized into groups based upon the statistics of each coefficient set. The grouping information as well as the optimal reconstruction offsets for each group is signaled.
With the present invention, the enhancement layer quantizer can perform optimal splitting of the deadzone resulting from base layer quantization in order to achieve the optimal coding performance. This flexibility in splitting the deadzone also allows for much-needed control on the bit rate. The deadzone of luma and chroma coefficients can be differently split to have the optimal balance between luma and chroma quality.
A process of implementation of one embodiment of the present invention is depicted generally in
After quantization and at step 710, the adaptive quantizer calculates the optimal reconstruction level for each quantization interval. The reconstruction offset is calculated at step 720 as the difference between the optimal reconstruction level and the nominal reconstruction level that is calculated from the quantization index and the nominal quantization step size. The reconstruction offsets are coded in the bitstream at step 730. The decoder then decodes the quantized content at step 740 by performing some simple dequantization, such as uniform reconstruction, to obtain the nominal reconstruction level and adding the reconstruction offset.
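Steps 710 through 740 can be sketched as follows. Taking the centroid (mean) of the coefficients falling into an interval as the "optimal" reconstruction level is an assumption made for this sketch; the invention only requires that the encoder compute the best levels by some means.

```python
from collections import defaultdict

def compute_offsets(coeffs, indices, q_nominal):
    """Encoder side (steps 710-720): for each quantization index, take the
    mean of the coefficients mapped to it as the optimal reconstruction
    level; the offset is optimal minus nominal (index * q_nominal)."""
    buckets = defaultdict(list)
    for x, i in zip(coeffs, indices):
        buckets[i].append(x)
    return {i: sum(v) / len(v) - i * q_nominal for i, v in buckets.items()}

def reconstruct(index, q_nominal, offsets):
    """Decoder side (step 740): uniform dequantization of the index at the
    nominal step size, adjusted by the signaled reconstruction offset."""
    return index * q_nominal + offsets.get(index, 0.0)
```

For example, if the coefficients 12 and 18 both quantize to index 1 with a nominal step size of 10, the optimal level is 15, the transmitted offset is 5, and the decoder reconstructs 15 instead of the nominal 10.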
In an exemplary implementation of the present invention, the reconstruction offsets can be transmitted at the frame level or the slice level. Specifically, for the implementation in SVC, which is based upon H.264, the reconstruction offsets can be transmitted in the slice header of an FGS slice. In the syntax description below, a slice of slice type “PR” (progressive refinement) is an FGS slice. The syntax for the coding of reconstruction offsets is as follows:
The semantics for coding of reconstruction offsets is as follows.
recon_offset_bit_depth is used to indicate how many bits are needed to represent the absolute value of one reconstruction offset.
recon_offset_shift_bits plus 3 is used to indicate the precision of the reconstruction offset. The normalized reconstruction offset is recon_offset_fixed/(1<<(recon_offset_bit_depth+recon_offset_shift_bits)).
num_recon_offsets indicates how many reconstruction offsets are stored for this group of coefficients.
recon_offset is the reconstruction offset. The number of bits to be read is recon_offset_bit_depth if the recon_offset_all_non_positive_flag is 1. The number of bits to be read is recon_offset_bit_depth+1 if the recon_offset_all_non_positive_flag is 0. A recon_offset of value 0 indicates that the corresponding quantization index is not valid and should not be encountered during the decoding process. The actual reconstruction offset is converted from recon_offset using the equation: recon_offsets_arr[num_recon_offsets] = recon_offset − (1<<recon_offset_bit_depth) + 1
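The conversion described in these semantics can be sketched as follows. The normalization step assumes that the shift combines recon_offset_bit_depth and recon_offset_shift_bits additively, as the syntax above suggests; this detail is an assumption of the sketch.

```python
def decode_recon_offset(recon_offset, bit_depth, shift_bits):
    """Convert a parsed recon_offset code word into the actual normalized
    reconstruction offset. A code of 0 marks the corresponding
    quantization index as non-valid and is returned as None."""
    if recon_offset == 0:
        return None
    # recon_offsets_arr conversion from the semantics above
    fixed = recon_offset - (1 << bit_depth) + 1
    # normalization, assuming an additive combination of the two bit counts
    return fixed / (1 << (bit_depth + shift_bits))
```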
If the quantization index decoded from the bitstream is too large and the corresponding reconstruction offset is not found in the lookup table, then the last valid reconstruction offset should be used.
Handling closed-loop coding in FGS. Normally, in the coding of a coefficient in an FGS layer, only the collocated information in the base layer reconstructed frame is used as the prediction. If a different prediction is used, e.g., by performing motion compensation in the FGS layer, the original signal, which is a coefficient calculated from the prediction residuals, may not be confined within the previous quantization interval, and the value of the refinement coefficient is no longer limited to 0 or 1. In the present invention, the refinement information is not limited to 0 or 1 if closed-loop coding is supported. A flag can be transmitted to the decoder so that the decoder can perform entropy decoding and dequantization accordingly. The coefficient can transition to a different interval if the refinement information is not either 0 or 1. The reconstruction offset for the coefficient also depends upon the value of the refinement coefficient.
Non-valid reconstruction offset. A non-valid reconstruction offset is explicitly signaled. Some quantization indices may not be decoded from the bitstream at all. This could occur when the encoder chooses not to split some intervals. In one embodiment of the present invention, non-valid reconstruction offsets can be used for detecting an error in the decoding process caused by either an error in the bitstream or a problem in the decoder implementation. If the decoder decodes a quantization index that has a non-valid reconstruction offset, an error has occurred. When a reconstruction offset for a coefficient is not valid, it means that the corresponding quantization interval in the base layer is not split in the enhancement layer. In one embodiment of the invention, no refinement information corresponding to this interval needs to be coded. The same is true for the deadzone. The deadzone is normally split into a definite number of new intervals (normally three intervals). The non-valid reconstruction offset can also be used in signaling how the deadzone is actually split. The quantization index for the interval in the enhancement layer is inferred from the base layer quantization index if the quantization index has a non-valid reconstruction offset.
Adjustment of chroma quality with respect to luma. In the H.264 video codec, the quantization parameter used for coding the chrominance signals is different from that used for coding the luminance signals. In this discussion, chroma_qp_index_offset, an additional parameter that controls the mapping process, is not considered and is assumed to always be 0. The following lookup table is used for deriving the chroma qp QPc from the luma qp QPy. It should be noted that, instead of using a normal quantization step size, the H.264 codec uses a quantization parameter from which the quantization step size q can be derived.
In this situation, if the base layer is coded at a QPy of 43, then the chroma qp QPc of value 37 should be used. The luma qp used in FGS enhancement layer coding is 37 (=43−6) because the quantization step size in the enhancement layer is usually half of that in the base layer. If chroma is treated the same as luma in FGS layer quantization, then the chroma qp used in the enhancement layer will be 31 (=37−6). This is much smaller than 34, which is the value mapped from a luma qp of 37.
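The luma-to-chroma qp mapping used in the numerical example above can be expressed as follows; the table values are those of the H.264/AVC specification (Table 8-15), with chroma_qp_index_offset assumed to be 0 as in this discussion.

```python
# H.264 mapping from luma QP to chroma QP for QPy in 30..51
# (below 30, QPc equals QPy); per Table 8-15 of the H.264/AVC spec,
# with chroma_qp_index_offset = 0.
_QPC_FOR_QPY_30_TO_51 = [29, 30, 31, 32, 32, 33, 34, 34, 35, 35, 36, 36,
                         37, 37, 37, 38, 38, 38, 39, 39, 39, 39]

def chroma_qp(qp_y):
    """Derive the chroma qp QPc from the luma qp QPy."""
    if qp_y < 30:
        return qp_y
    return _QPC_FOR_QPY_30_TO_51[qp_y - 30]
```

This reproduces the numbers in the example: chroma_qp(43) yields 37, and chroma_qp(37) yields 34, whereas treating chroma the same as luma in the enhancement layer would give 37−6 = 31.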
In the present invention, chroma reconstruction offsets can be different from luma reconstruction offsets. This makes it possible to quantize luma and chroma differently in the enhancement layer. For example, the rounding offset used in quantizing the chroma can be set smaller than that used in quantizing the luma. With proper adjustment on the rounding offsets, a quality balance between luma and chroma that is similar to that in the H.264 base layer can also be achieved in the enhancement layer.
The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Software and web implementations of the present invention could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps. The present invention can be implemented directly in software using any common programming language, e.g. C/C++ or assembly language. This invention can also be implemented in hardware and used in consumer devices. It should also be noted that the words “component” and “module” as used herein and in the claims are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.
This application claims priority to United States Provisional Patent Application No. 60/701,172, filed Jul. 21, 2005 and incorporated herein by reference in its entirety.