The present invention relates generally to the field of video encoding and decoding. More specifically, the present invention relates to scalable video coding and decoding systems.
This section is intended to provide a background or context. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.
In general, conventional video coding standards (e.g., MPEG-1, H.261/263/264) incorporate intra-frame or inter-frame predictions which can be used to remove redundancies within a frame or among the video frames in multimedia applications and services.
In a typical single-layer video codec, like H.264, a video frame is processed in macroblocks. If a macroblock (MB) is an inter-MB, the pixels in the MB can be predicted from the pixels in one or more reference frames. If a macroblock is an intra-MB, the pixels in the MB in the current frame can be predicted entirely from the pixels in the same video frame.
For both inter-MB and intra-MB, the MB can be decoded in the following steps:
At the encoder side, the prediction residuals can be the difference between the original pixels and their predictors. The residuals can be transformed and the transform coefficients can be quantized. The quantized coefficients can then be encoded using certain entropy-coding schemes.
If an MB is an inter-MB, the following information related to mode decision can be coded. Using H.264 as an example, the following information can include.
If the MB is an intra-MB, it can be necessary in some cases to code the following information. Again using H.264 as an example, the following information can include.
In either case, a significant number of bits can be spent on coding the modes, the associated parameters, and the texture information, that is, the prediction residual.
Scalable video coding is a desirable feature for many multimedia applications and services used in systems with a wide range of capabilities. The systems could have different transmission bandwidths, employ decoders with a wide range of processing power, or have displays of different resolutions. Several types of video scalability schemes have been proposed, such as temporal, spatial and SNR scalability in order to achieve the optimal representation on different systems.
In some scenarios, it is desirable to transmit an encoded digital video sequence at some minimum or “base” quality, and in concert transmit an “enhancement” signal that may be combined with the minimum quality signal in order to yield a higher-quality decoded video sequence. Such an arrangement simultaneously allows some decoding of the video sequence by devices supporting some set of minimum capabilities (at the “base” quality), while enabling other devices with expanded capability to decode higher-quality versions of the same sequence, without incurring the increased cost associated with transmitting two independently coded versions of the same sequence.
In some situations, more than two levels of quality may be desired. To achieve that, multiple “enhancement” signals can be transmitted, each building on the “base” quality signal plus all lower-quality “enhancement” signals. Such “base” and “enhancement” signals are referred to as “layers” in the field of scalable video coding. One type of enhancement layer can itself be separated into small units, each of which provides an incremental quality improvement of fine granularity. This is usually referred to as a fine granularity scalability (FGS) layer. A scalable video codec, such as JSVM1.0, which is the reference software for the scalable video coding standardization by the Joint Video Team between MPEG and ITU/VCEG (“Joint Scalable Video Model 1.0 (JSVM1.0)”, JVT-N024, January 2005, Hong Kong, China), may generate multiple FGS quality levels on top of certain base layers in multiple coding passes. In some implementations, all these FGS quality levels are considered as belonging to one FGS layer. For example, under a certain configuration, JSVM1.0 could generate one QCIF base layer, two QCIF FGS quality levels, and one CIF enhancement layer for a video frame. In this case, the two QCIF FGS quality levels belong to the same FGS layer.
In order to achieve good coding efficiency, inter-layer prediction modes can be used for reducing the redundancy among the layers. In each inter-layer prediction mode, the information that has already been coded in the base layer can be used in improving the coding efficiency of the enhancement layer. Inter-layer prediction modes can be used in predicting the mode and motion information in the enhancement layer from that in the base layer or in predicting the texture in the enhancement layer from that in the base layer. Residual prediction is one inter-layer texture prediction mode in which the reconstructed prediction residual of the base layer can be used in reducing the amount of prediction residual to be coded in the enhancement layer. So generally, using a scalable video codec, each video frame can be coded in one or more layers. Two types of scalable layers can be of interest: discrete layers and layers that can be partially decoded. A discrete layer usually is not partially decoded; otherwise, the reconstructed video will have major artifacts and the decodability of enhancement layers above this layer can be affected. A partially decodable layer is a layer for which, even if it is only partially decoded, the reconstructed video can still have reasonable quality and the enhancement layers above this layer can still be decoded with certain graceful degradation. In JSVM1.0, the first layer, the spatial enhancement layer and the coarse granularity SNR enhancement layer are examples of the discrete layer. Also in that scalable codec, an FGS (Fine Granularity Scalability) layer can be a partially decodable layer based on the definition given above. In the following discussion, the FGS layer will be used interchangeably with partially decodable layer. However, it should be noted that the partially decodable layer could also have scalability of relatively large granularity.
For residual prediction mode, a residual prediction flag can be coded for a macroblock to indicate whether residual prediction has been used for this macroblock. In some cases, conditional coding of the residual prediction flag can be used to reduce the number of bits spent on coding the residual prediction flags. If the base layer reconstructed prediction residual is zero, residual prediction normally does not help. In this case, the value of the flag can be set to 0 and not coded at all. However, if the base layer residual information available to the decoder is not the same as that available to the encoder, the conditional coding of the residual prediction flag may not work properly. As such, there is a need for an improved scheme for coding a residual prediction flag in a scalable video coding system.
One embodiment of the invention relates to an improved scheme for coding the residual prediction flag. In one embodiment, conditional coding of the residual prediction flag can be used only if all the base layers are discrete layers. If some base layers are discrete layers and some base layers are FGS layers, the residual prediction flag is coded. The residual prediction flag can be coded under contexts which depend upon whether the reconstructed prediction residual of the discrete base layers is zero or not, as well as possibly other information such as the value of residual prediction flags of neighboring macroblocks and/or differences between motion vectors in the current MB and the base layer MB. In an alternative embodiment, the residual prediction flag is always coded; however, it is coded in certain contexts as described herein.
Other features and advantages of the present invention will become apparent to those skilled in the art from the following detailed description. It should be understood, however, that the detailed description and specific examples, while indicating preferred embodiments of the present invention, are given by way of illustration and not limitation. Many changes and modifications within the scope of the present invention may be made without departing from the spirit thereof, and the invention includes all such modifications.
Exemplary embodiments present methods, computer code products, and devices for efficient enhancement layer encoding and decoding. Embodiments can be used to solve some of the problems inherent to existing solutions. For example, these embodiments can be used to improve the overall coding efficiency of a scalable coding scheme.
As used herein, the term “enhancement layer” refers to a layer that is coded differentially compared to some lower quality reconstruction. The purpose of the enhancement layer is that, when added to the lower quality reconstruction, signal quality should improve, or be “enhanced.” Further, the term “base layer” applies to both a non-scalable base layer encoded using an existing video coding algorithm, and to a reconstructed enhancement layer relative to which a subsequent enhancement layer is coded.
As noted above, embodiments include program products comprising computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, such computer-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above are also to be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Any common programming language, such as C or C++, or assembly language, can be used to implement the invention.
The device 12 of
The exemplary embodiments are described in the general context of method steps or operations, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps. Software and web implementations could be accomplished with standard programming techniques, with rule based logic, and/or other logic to accomplish the various database searching steps, correlation steps, comparison steps and decision steps. It should also be noted that the word “module” as used herein and in the claims is intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
Turning now to residual prediction flag coding, the reconstructed prediction residual of a base layer can be used to reduce the amount of residual to be coded in an enhancement layer.
(C1−E0)−(B1−B0) = C1−(E0+(B1−B0))
Such a coding mode does not always help, i.e., encoding “(C1−E0)−(B1−B0)” is not always more efficient than encoding “(C1−E0)”. A flag, commonly called the residual prediction flag, can be used to indicate whether such a mode is used in encoding the prediction residual of the current MB in the enhancement layer. For example, a flag of value 0 can indicate that residual prediction mode is not used in coding the current MB, and a flag of value 1 can indicate that residual prediction mode is used. If the base layer reconstructed prediction residual is zero, residual prediction may not help. In this case, the value of the flag can be set to 0 and does not need to be coded at all. This is called conditional coding of the residual prediction flag.
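By way of illustration only, the identity above and the role of the flag can be sketched in Python. The sketch assumes that C1 denotes the original enhancement-layer block, E0 its enhancement-layer predictor, and (B1−B0) the reconstructed base-layer prediction residual; these readings, the function name `enhancement_residual`, and all sample values are assumptions made for the example and are not taken from the specification.

```python
def enhancement_residual(c1, e0, base_residual, use_residual_prediction):
    """Return the residual that the enhancement layer would transform and code.

    c1, e0 and base_residual are flat lists of sample values for one block
    (hypothetical layout chosen for this sketch).
    """
    # Ordinary prediction residual, (C1 - E0).
    residual = [c - e for c, e in zip(c1, e0)]
    if use_residual_prediction:
        # Residual prediction mode: additionally subtract (B1 - B0).
        residual = [r - b for r, b in zip(residual, base_residual)]
    return residual


# Made-up sample values for a four-sample block.
c1 = [120, 124, 118, 121]
e0 = [115, 120, 117, 119]
b1 = [60, 63, 58, 61]
b0 = [56, 60, 58, 60]
base_residual = [x - y for x, y in zip(b1, b0)]

with_prediction = enhancement_residual(c1, e0, base_residual, True)
without_prediction = enhancement_residual(c1, e0, base_residual, False)

# Both sides of the identity (C1-E0)-(B1-B0) = C1-(E0+(B1-B0)).
lhs = [(c - e) - b for c, e, b in zip(c1, e0, base_residual)]
rhs = [c - (e + b) for c, e, b in zip(c1, e0, base_residual)]
```

With these particular values the residual shrinks when the mode is used, so a flag of value 1 would be a natural choice; with other values the opposite can hold, which is why the mode is signaled per macroblock.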
According to one embodiment of the present invention, the residual prediction flag is conditionally coded only if all the base layers are discrete layers. In this case, if the base-layer reconstructed prediction residual that can be used for residual prediction of the current enhancement layer is zero, the value of the residual prediction flag can be inferred to be 0 and the flag does not need to be coded. If some of the base layers are FGS layers, the residual prediction flag is coded with certain contexts. With context-based coding, the residual prediction flags with one context can be coded separately from the residual prediction flags with another context. A set of symbols being coded can be classified according to the contexts, which can be calculated from the information that is already coded, into sub-sets with different probability distributions to improve the overall coding efficiency. The coding contexts for coding the residual prediction flag can depend on the value of the discrete base-layer reconstructed prediction residual calculated from a function of the reconstructed prediction residuals of the discrete base layers. As one particular example, the coding contexts for coding the residual prediction flag can depend on whether the discrete base-layer reconstructed prediction residual calculated from a function of the reconstructed prediction residuals of the discrete base layers is zero or not. Alternatively, other information such as the value of the residual prediction flags of neighboring MBs, and the differences between motion vectors of the current MB and motion vectors of the base layer MB can be used in conjunction with the value of the reconstructed prediction residual to determine the residual prediction flag coding context. The discrete base layer normally should be fully reconstructed so the decoder can properly decode the residual prediction flag.
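The context-based coding described above can be sketched as follows, purely as an illustration. The context numbering in `flag_context`, the use of neighboring flags as the only additional input, and the simple frequency-count model in `ContextModel` are all assumptions of this sketch; the count-based model merely stands in for whatever adaptive entropy coder an implementation actually uses.

```python
import math
from collections import defaultdict


def flag_context(discrete_base_residual_nonzero, neighbor_flags=()):
    """Pick a context index for the residual prediction flag.

    Hypothetical numbering: contexts 0-1 when the discrete base-layer
    reconstructed prediction residual is zero, 2-3 when it is nonzero;
    the odd index is used when any neighboring MB has its flag set.
    """
    ctx = 2 if discrete_base_residual_nonzero else 0
    if any(neighbor_flags):
        ctx += 1
    return ctx


class ContextModel:
    """Per-context adaptive estimate of the flag statistics (sketch only)."""

    def __init__(self):
        # [count of 0s, count of 1s] per context, started at 1 each.
        self.counts = defaultdict(lambda: [1, 1])

    def cost_bits(self, ctx, flag):
        """Ideal code length, in bits, of coding `flag` in context `ctx`."""
        zeros, ones = self.counts[ctx]
        p = (ones if flag else zeros) / (zeros + ones)
        return -math.log2(p)

    def update(self, ctx, flag):
        """Record that `flag` was coded in context `ctx`."""
        self.counts[ctx][1 if flag else 0] += 1


model = ContextModel()
initial_cost = model.cost_bits(0, 0)
for _ in range(3):
    model.update(0, 0)  # flags seen in context 0 were all zero
adapted_cost = model.cost_bits(0, 0)
untouched_cost = model.cost_bits(2, 0)  # context 2 keeps its own statistics
```

Because each context keeps separate counts, flags that are almost always 0 when the discrete base-layer residual is zero become cheap to code in that context without disturbing the statistics of the other contexts.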
There are different ways of calculating, from the reconstructed residuals of multiple base layers, the base-layer reconstructed prediction residual to be used for residual prediction of the current enhancement layer. In one example of such a function, the base-layer reconstructed prediction residual to be used for residual prediction of the current layer, say layer n, is set equal to the reconstructed prediction residual of the immediate base layer, layer (n−1), if the residual prediction mode is not used in the coding of the corresponding MB in layer (n−1). Otherwise, if the residual prediction mode is used in the coding of the corresponding MB in layer (n−1), the base-layer reconstructed prediction residual from the lower layers is added to the reconstructed residual of the MB in layer (n−1). Another example of such a function is to always set the base-layer reconstructed prediction residual to the reconstructed prediction residual of the immediate base layer, layer (n−1), regardless of whether the residual prediction mode is used in coding the corresponding MB in layer (n−1).
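The two example functions can be sketched as a single routine with a switch. This is a minimal sketch under stated assumptions: the name `base_residual_for_layer`, the `layers` list of (reconstructed residual, residual-prediction-used) pairs, and the flat-list block layout are hypothetical, and any resampling between layers of different spatial resolution is ignored.

```python
def base_residual_for_layer(layers, n, accumulate=True):
    """Base-layer reconstructed prediction residual used to predict layer n.

    layers[k] is a pair (reconstructed_residual, used_residual_prediction)
    for the corresponding MB in layer k, with layer 0 the lowest layer.
    accumulate=True follows the first example function, accumulate=False
    the second (always use the immediate base layer only).
    """
    if n < 1:
        raise ValueError("layer 0 has no base layer")
    residual, used_prediction = layers[n - 1]
    if accumulate and used_prediction and n - 1 > 0:
        # Layer (n-1) was itself coded with residual prediction, so the
        # residual it was predicted from is added to its own residual.
        lower = base_residual_for_layer(layers, n - 1, accumulate=True)
        return [r + l for r, l in zip(residual, lower)]
    return list(residual)


# Made-up two-sample residuals for the corresponding MB in three layers.
layers = [
    ([4, 0], False),   # layer 0
    ([1, -1], True),   # layer 1, coded with residual prediction
    ([0, 2], False),   # layer 2, coded without residual prediction
]

accumulated_for_layer_2 = base_residual_for_layer(layers, 2)
immediate_for_layer_2 = base_residual_for_layer(layers, 2, accumulate=False)
for_layer_3 = base_residual_for_layer(layers, 3)
```

In this example the accumulated residual for layer 2 combines layers 0 and 1, whereas layer 3 sees only the residual of layer 2 because layer 2 did not use residual prediction.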
In another embodiment of the invention, the residual prediction flag is always coded, regardless of whether or not all of the base layers are discrete layers. In this case, the residual prediction flag can be coded using certain contexts, such as the ones discussed above.
If all of the base layers are discrete layers, the device determines whether the discrete base-layer reconstructed prediction residual, which can be calculated from a function of the reconstructed prediction residuals of all the discrete base layers, is zero 104. If it is, the residual prediction flag is not coded 106. If the discrete base-layer reconstructed prediction residual is nonzero, the residual prediction flag can be encoded 110. Optionally, the device can determine the context for coding the residual prediction flag 108 as discussed above.
In an alternative embodiment, the residual prediction flag is always coded; however, it is coded in certain contexts as described herein.
If all of the base layers are discrete layers, the device determines whether the discrete base-layer reconstructed prediction residual, which can be calculated from a function of the reconstructed prediction residuals of all discrete base layers, is zero 204. If it is, the residual prediction flag is not decoded 206. If the discrete base-layer reconstructed prediction residual is not zero, the residual prediction flag can be decoded 210. Optionally, the device can determine the context for decoding the residual prediction flag 208 as discussed above.
In an alternative embodiment, the residual prediction flag is always decoded; however, it is decoded in certain contexts as described herein.
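The encoder-side and decoder-side decisions described above mirror each other, which the following sketch illustrates. It is a simplified model under stated assumptions: a plain Python list stands in for the entropy-coded bitstream, context selection is omitted, and the names `flag_is_coded`, `encode_flag`, `decode_flag` and the `always_code` switch (modeling the alternative embodiment) are hypothetical.

```python
def flag_is_coded(all_base_layers_discrete, discrete_base_residual_is_zero,
                  always_code=False):
    """Decide whether the residual prediction flag is present in the stream."""
    if always_code:
        return True  # alternative embodiment: the flag is always coded
    # Conditional coding applies only when every base layer is discrete.
    return not (all_base_layers_discrete and discrete_base_residual_is_zero)


def encode_flag(stream, flag, all_base_layers_discrete,
                discrete_base_residual_is_zero, always_code=False):
    """Append the flag to the stream, or skip it when it can be inferred."""
    if flag_is_coded(all_base_layers_discrete,
                     discrete_base_residual_is_zero, always_code):
        stream.append(flag)
    elif flag != 0:
        raise ValueError("an inferred flag must have the value 0")


def decode_flag(stream, all_base_layers_discrete,
                discrete_base_residual_is_zero, always_code=False):
    """Read the flag from the stream, or infer 0 when it was not coded."""
    if flag_is_coded(all_base_layers_discrete,
                     discrete_base_residual_is_zero, always_code):
        return stream.pop(0)
    return 0


# (flag, all base layers discrete?, discrete base-layer residual zero?)
macroblocks = [
    (1, True, False),   # coded: residual is nonzero
    (0, True, True),    # inferred: all discrete and residual is zero
    (1, False, True),   # coded: an FGS base layer is present
    (0, False, False),  # coded
]

stream = []
for flag, all_discrete, residual_zero in macroblocks:
    encode_flag(stream, flag, all_discrete, residual_zero)

remaining = list(stream)
decoded = [decode_flag(remaining, all_discrete, residual_zero)
           for _, all_discrete, residual_zero in macroblocks]
```

Both sides base the decision only on the discrete base layers, which the decoder is expected to reconstruct fully, so the decision does not change when an FGS layer is truncated.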
While several embodiments of the invention have been described, it is to be understood that modifications and changes will occur to those skilled in the art to which the invention pertains. Accordingly, the claims appended to this specification are intended to define the invention precisely.
Number | Date | Country
---|---|---
60687058 | Jun 2005 | US