The present invention was first filed as U.S. Provisional Patent Application No. 61/119,696, filed on Dec. 3, 2008, which is incorporated herein by reference in its entirety.
The present invention relates to the coding and decoding of digital video and image material. More particularly, the present invention relates to the efficient coding and decoding of transform coefficients in video and image coding.
This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
A video encoder transforms input video into a compressed representation suited for storage and/or transmission. A video decoder uncompresses the compressed video representation back into a viewable form. Typically, the encoder discards some information in the original video sequence in order to represent the video in a more compact form, i.e., at a lower bitrate.
Conventional hybrid video codecs, for example ITU-T H.263 and H.264, encode video information in two phases. In a first phase, pixel values in a certain picture area or “block” of pixels are predicted. These pixel values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded.
Alternatively, pixel values can be predicted via spatial mechanisms, which involve using the pixel values around the block to estimate the pixel values inside the block. A second phase involves coding a prediction error or prediction residual, i.e., the difference between the predicted block of pixels and the original block of pixels. This is typically accomplished by transforming the difference in pixel values using a specified transform (e.g., a Discrete Cosine Transform (DCT) or a variant thereof), quantizing the transform coefficients, and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (i.e., the picture quality) and the size of the resulting coded video representation (i.e., the file size or transmission bitrate). It should be noted that with regard to video and/or image compression, it is possible to transform blocks of an actual image and/or video frame without applying prediction.
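By way of illustration only, this second phase can be sketched numerically as follows (Python with NumPy). The orthonormal DCT construction is standard, while the step size of 16 is an arbitrary illustrative value rather than a parameter of any particular codec.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal 1-D DCT-II basis as an n x n matrix (rows = frequencies)."""
    k = np.arange(n)
    d = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    d[0, :] /= np.sqrt(2)
    return d * np.sqrt(2.0 / n)

def code_residual(residual, step=16):
    """Transform an n x n prediction residual and quantize it uniformly.

    'step' stands in for the step size derived from the quantization
    parameter (QP): a larger step yields coarser coefficients and a
    lower bitrate at the cost of picture quality.
    """
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T                  # 2-D DCT of the residual
    return np.round(coeffs / step).astype(int)   # uniform quantization
```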
Entropy coding mechanisms, such as Huffman coding and arithmetic coding, exploit the statistical probabilities of the symbol values representing quantized transform coefficients to assign shorter codewords to more probable symbols. Furthermore, to exploit correlation between transform coefficients, pairs of transform coefficients may be entropy coded jointly. Additionally, adaptive entropy coding mechanisms typically achieve efficient compression over broad ranges of image and video content. Efficient coding of transform coefficients is therefore a significant part of achieving high compression performance in video and image codecs.
In accordance with one embodiment, the position and the value of the last non-zero coefficient of the block are coded, after which the next coefficient grouping, e.g., a (run, level) pair, is coded. If the cumulative sum of the amplitudes greater than 1 (excluding the last coefficient) is less than a predetermined constant value, and the position of the latest non-zero coefficient within the block is smaller than a certain location threshold, the next pair is coded. These processes are repeated until the cumulative sum of the amplitudes greater than 1 (excluding the last coefficient) is no longer less than the predetermined constant value, and/or the position of the latest non-zero coefficient within the block is no longer smaller than the location threshold. When this occurs, the rest of the coefficients are coded in level mode.
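By way of illustration only, the following Python sketch shows one plausible reading of this switching rule. The routines code_last, code_pair, and code_level are hypothetical placeholders for the actual entropy coders, the default cumulative threshold of 3 echoes the empirical value mentioned later in the text, and the location threshold of 16 and the exact placement of the threshold tests are illustrative assumptions.

```python
def encode_block(coeffs, code_last, code_pair, code_level,
                 cum_threshold=3, loc_threshold=16):
    """Adaptive run/level mode switch based on cumulative amplitude and
    position. coeffs holds the quantized transform coefficients of one
    block in scan order; at least one non-zero coefficient is assumed.
    """
    pos = max(i for i, c in enumerate(coeffs) if c != 0)
    code_last(pos, coeffs[pos])   # position and value of the last non-zero coefficient
    cum = 0                       # sum of amplitudes > 1, excluding the last coefficient
    pos -= 1

    # Run mode: continue while both parts of the switching rule hold.
    while pos >= 0 and cum < cum_threshold and pos < loc_threshold:
        run = 0
        while pos >= 0 and coeffs[pos] == 0:
            run += 1
            pos -= 1
        if pos < 0:               # only zeros remain: block is finished
            return
        code_pair(run, coeffs[pos])
        if abs(coeffs[pos]) > 1:  # only amplitudes above 1 accumulate
            cum += abs(coeffs[pos])
        pos -= 1

    # Level mode: the rest of the coefficients are coded one by one.
    if pos >= 0:
        for c in coeffs[pos::-1]:
            code_level(c)
```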
In accordance with another embodiment, the position and the value of the last non-zero coefficient of the block are coded, after which the next coefficient grouping, e.g., a (run, level) pair, is coded. If the amplitude of the current level is greater than 1, it is indicated in the bitstream whether the coder should continue coding in run mode or switch to level mode. If run mode is indicated, the process continues and the next pair is coded. Otherwise, the rest of the coefficients are coded in level mode.
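A corresponding sketch for this embodiment follows. Here keep_run_mode stands in for whatever decision rule the encoder applies (e.g., an estimate of which mode produces fewer bits), and code_mode_flag is a hypothetical routine that writes the indication into the bitstream so the decoder can follow the same path without re-deriving the decision.

```python
def encode_block_signaled(coeffs, code_last, code_pair, code_level,
                          code_mode_flag, keep_run_mode):
    """Explicitly signaled run/level mode switch (hypothetical API)."""
    pos = max(i for i, c in enumerate(coeffs) if c != 0)
    code_last(pos, coeffs[pos])
    pos -= 1
    while pos >= 0:
        run = 0
        while pos >= 0 and coeffs[pos] == 0:
            run += 1
            pos -= 1
        if pos < 0:
            return                       # end of block reached in run mode
        code_pair(run, coeffs[pos])
        level, pos = coeffs[pos], pos - 1
        if abs(level) > 1:
            stay = keep_run_mode(coeffs, pos)
            code_mode_flag(stay)         # explicit indication in the bitstream
            if not stay:
                break                    # switch to level mode
    if pos >= 0:
        for c in coeffs[pos::-1]:        # level mode: one value at a time
            code_level(c)
```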
Various embodiments described herein improve earlier solutions to coding transform coefficients by more accurately defining the position where switching from one coding mode to another should occur. This in turn improves coding efficiency. Signaling the switching position explicitly further enhances coding efficiency by directly notifying the coder where to switch coding modes.
These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
Various embodiments are described by referring to the attached drawings, in which:
Various embodiments are directed to a method for improving efficiency when entropy coding a block of quantized transform coefficients (e.g., DCT coefficients) in video and/or image coding. Quantized coefficients are coded in two separate coding modes, run mode coding and level mode coding. “Rules” for switching between these two modes are also provided, and various embodiments are realized by allowing an entropy coder to adaptively decide when to switch between the two coding modes based on context information and the rules and/or by explicitly signaling the position of switching (e.g., explicitly informing the entropy coder whether or not it should switch coding modes).
The quantized transform coefficients from 124 are entropy coded at 126. That is, the data describing prediction error and predicted representation of the image block 112 (e.g., motion vectors, mode information, and quantized transform coefficients) are passed to entropy coding 126. The encoder typically comprises an inverse transform 130 and an inverse quantization 128 to obtain a reconstructed version of the coded image locally. Firstly, the quantized coefficients are inverse quantized at 128 and then an inverse transform operation 130 is applied to obtain a coded and then decoded version of the prediction error. The result is then added to the prediction 112 to obtain the coded and decoded version of the image block. The reconstructed image block may then undergo a filtering operation 116 to create a final reconstructed image 140 which is sent to a reference frame memory 114. The filtering may be applied once all of the image blocks are processed.
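As a non-limiting sketch, the local reconstruction loop around elements 128 and 130 can be expressed as follows, reusing the dct_matrix helper from the earlier sketch; the step size is an illustrative stand-in for the value derived from the QP and must match the one used in the quantization at 124.

```python
def reconstruct_block(qcoeffs, prediction, step=16):
    """Local decoding loop of the encoder: inverse quantize (128),
    inverse transform (130), then add the prediction."""
    d = dct_matrix(qcoeffs.shape[0])
    coeffs = qcoeffs.astype(float) * step    # inverse quantization (128)
    residual = d.T @ coeffs @ d              # inverse 2-D DCT (130)
    return prediction + residual             # coded-and-decoded image block
```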
The decoder reconstructs output video by applying prediction mechanisms that are similar to those used by the encoder in order to form a predicted representation of the pixel blocks (using motion or spatial information created by the encoder and stored in the compressed representation). Additionally, the decoder utilizes prediction error decoding (the inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain). After applying the prediction and prediction error decoding processes, the decoder sums up the prediction and prediction error signals (i.e., the pixel values) to form the output video frame. The decoder (and encoder) can also apply additional filtering processes in order to improve the quality of the output video before passing it on for display and/or storing it as a prediction reference for the forthcoming frames in the video sequence.
In conventional video codecs, motion information is indicated by motion vectors associated with each motion-compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) relative to the prediction source block in one of the previously coded or decoded pictures. In order to represent motion vectors efficiently, motion vectors are typically coded differentially with respect to block-specific predicted motion vectors. In a conventional video codec, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of adjacent blocks.
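For instance, a median predictor over three neighboring blocks, one common form of the predefined derivation mentioned above, can be sketched as follows; the choice of left, above, and above-right neighbors is illustrative.

```python
def predict_mv(left, above, above_right):
    """Component-wise median of the motion vectors of adjacent blocks."""
    xs = sorted(v[0] for v in (left, above, above_right))
    ys = sorted(v[1] for v in (left, above, above_right))
    return (xs[1], ys[1])

def code_mv(mv, predictor):
    """A motion vector is coded differentially with respect to its predictor."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])
```

For example, with neighboring vectors (2, 0), (3, 1), and (8, 1), the predictor is (3, 1), so an actual vector of (4, 1) is coded as the small difference (1, 0).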
In accordance with various embodiments, it is assumed that there is at least one non-zero coefficient in the block to be coded. Coefficients are generally coded in a last to first coefficient order, where higher frequency coefficients are coded first. However, coding in any other order may be possible. If at any point during the coding process there are no more coefficients to be coded in the block, an end of block notification is signaled, if needed, and coding is stopped for the current block.
One method of entropy coding involves adaptively coding transform coefficients using two different modes. In a first mode, referred to as “run” mode, coefficients are coded as (run, level) pairs. That is, a “run-level” refers to a run-length of zeros followed by a non-zero level, where quantization of transform coefficients generally results in higher-order coefficients being quantized to 0. If the next non-zero coefficient has an amplitude greater than 1, the codec switches to a “level” mode. In the level mode, the remaining coefficients are coded one-by-one as single values, i.e., the run values are not indicated in this mode.
For example, the quantized DCT coefficients of an 8×8 block, ordered into a 1-D table in scan order, may have the following values:

2 0 -2 0 1 0 1 0 0 1 0 1 0 0 0 0 0 -1 0 . . . 0
The ordered coefficients are coded in reverse order starting from the last non-zero coefficient. First, the position and the value (-1) of the last non-zero coefficient are coded. Then, the next coefficients are coded in the run mode, resulting in the following sequence of coded (run, level) pairs: (5, 1), (1, 1), (2, 1), (1, 1), (1, -2).
Since the latest coded coefficient had an amplitude greater than 1, the coder switches to the level mode. In the level mode, the remaining coefficients (0 and 2) are coded one at a time after which the coding of the block is finished.
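This example can be reproduced with the following self-contained sketch of the conventional rule (switch as soon as a coded amplitude exceeds 1); the padding of the 1-D table to 64 entries reflects the 8×8 block above.

```python
coeffs = [2, 0, -2, 0, 1, 0, 1, 0, 0, 1, 0, 1] + [0] * 5 + [-1] + [0] * 46

pos = max(i for i, c in enumerate(coeffs) if c != 0)
print("last non-zero:", pos, coeffs[pos])    # position 17, value -1
pos -= 1

pairs = []
while pos >= 0:
    run = 0
    while pos >= 0 and coeffs[pos] == 0:     # count the run of zeros
        run += 1
        pos -= 1
    if pos < 0:
        break
    pairs.append((run, coeffs[pos]))
    level, pos = coeffs[pos], pos - 1
    if abs(level) > 1:                       # conventional rule: switch now
        break

print("run mode pairs:", pairs)              # [(5, 1), (1, 1), (2, 1), (1, 1), (1, -2)]
print("level mode:", coeffs[pos::-1])        # [0, 2]
```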
Such a coding scheme often results in switching to level mode even when it would be beneficial to continue in run mode (e.g., when the number of bits produced by the codec would be fewer by continuing in run mode). This is because run coding is based upon coding information about runs of identical numbers instead of coding the numbers themselves. Switching between the modes may happen at a fixed position or at any point that is not implicitly determined.
In one embodiment, the position and the value of the last non-zero coefficient of the block are coded. If the amplitude of the last coefficient is greater than 1, the process proceeds to level coding. Otherwise, the next (run, level) pair is coded. If the amplitude of the current level is equal to 1, the coding process returns to the previous operation and the next pair is coded. Lastly, the rest of the coefficients are coded in level mode.
Various embodiments utilize multiple coefficients to decide whether or not to switch between run and level coding modes. Furthermore, various embodiments consider the position of the coefficients as part of the switching criterion. It should be noted that a cumulative threshold value of 3 is chosen according to empirical tests. However, other values could be used, where, e.g., the cumulative threshold L is made to depend on a quantization parameter (QP) value to reflect the changing statistics of different quality levels. Similarly, the value for the location threshold K can vary (e.g., based on the QP used in coding the block, coding mode of the block or the picture). Moreover, although the two modes described herein are the run mode and level mode, any two coding modes can be used.
As described above, various embodiments allow for adaptively deciding when to switch from, e.g., run mode to level mode, based upon an explicit signal indicating whether or not modes should be switched.
There are different methods of coding the switching indication in the bitstream in accordance with various embodiments. For example, the indication can be implemented as a single bit stored in the bitstream. Alternatively, the indication can be combined with one or more other coding elements.
Various embodiments described herein improve earlier solutions to coding transform coefficients by more accurately defining the position where switching from one coding mode to another should occur. This in turn improves coding efficiency. Signaling the switching position explicitly further enhances coding efficiency by directly notifying the coder where to switch coding modes.
The coded media bitstream is transferred to a storage 620. The storage 620 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 620 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate “live,” i.e., omit the storage 620 and transfer the coded media bitstream from the encoder 610 directly to the sender 630. The coded media bitstream is then transferred to the sender 630, also referred to as the server, on an as-needed basis. The format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file. The encoder 610, the storage 620, and the server 630 may reside in the same physical device or they may be included in separate devices. The encoder 610 and server 630 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for short periods of time in the content encoder 610 and/or in the server 630 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
The server 630 sends the coded media bitstream using a communication protocol stack. The stack may include but is not limited to Real-Time Transport Protocol (RTP), User Datagram Protocol (UDP), and Internet Protocol (IP). When the communication protocol stack is packet-oriented, the server 630 encapsulates the coded media bitstream into packets. For example, when RTP is used, the server 630 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should be again noted that a system may contain more than one server 630, but for the sake of simplicity, the following description only considers one server 630.
The server 630 may or may not be connected to a gateway 640 through a communication network. The gateway 640 may perform different types of functions, such as translation of a packet stream from one communication protocol stack to another, merging and forking of data streams, and manipulation of data streams according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 640 include multipoint control units (MCUs), gateways between circuit-switched and packet-switched video telephony, Push-to-talk over Cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 640 is called an RTP mixer or an RTP translator and typically acts as an endpoint of an RTP connection.
The system includes one or more receivers 650, typically capable of receiving, de-modulating, and de-capsulating the transmitted signal into a coded media bitstream. The coded media bitstream is transferred to a recording storage 655. The recording storage 655 may comprise any type of mass memory to store the coded media bitstream. The recording storage 655 may alternatively or additionally comprise computation memory, such as random access memory. The format of the coded media bitstream in the recording storage 655 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. If there are multiple coded media bitstreams, such as an audio stream and a video stream, associated with each other, a container file is typically used and the receiver 650 comprises or is attached to a container file generator producing a container file from input streams. Some systems operate “live,” i.e., omit the recording storage 655 and transfer the coded media bitstream from the receiver 650 directly to the decoder 660. In some systems, only the most recent part of the recorded stream, e.g., the most recent 10-minute excerpt of the recorded stream, is maintained in the recording storage 655, while any earlier recorded data is discarded from the recording storage 655.
The coded media bitstream is transferred from the recording storage 655 to the decoder 660. If there are multiple coded media bitstreams, such as an audio stream and a video stream, associated with each other and encapsulated into a container file, a file parser (not shown in the figure) is used to decapsulate each coded media bitstream from the container file. The recording storage 655 or the decoder 660 may comprise the file parser, or the file parser may be attached to either the recording storage 655 or the decoder 660.
The coded media bitstream is typically processed further by a decoder 660, whose output is one or more uncompressed media streams. Finally, a renderer 670 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 650, recording storage 655, decoder 660, and renderer 670 may reside in the same physical device or they may be included in separate devices.
A sender 630 according to various embodiments may be configured to select the transmitted layers for multiple reasons, such as to respond to requests of the receiver 650 or prevailing conditions of the network over which the bitstream is conveyed. A request from the receiver can be, e.g., a request for a change of layers for display or a change of a rendering device having different capabilities compared to the previous one.
Various embodiments described herein are described in the general context of method steps or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVD), etc. Generally, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps or processes.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside, for example, on a chipset, a mobile device, a desktop, a laptop or a server. Software and web implementations of various embodiments can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish various database searching steps or processes, correlation steps or processes, comparison steps or processes and decision steps or processes. Various embodiments may also be fully or partially implemented within network elements or modules. It should be noted that the words “component” and “module,” as used herein and in the following claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
Individual and specific structures described in the foregoing examples should be understood as constituting representative structure of means for performing specific functions described in the following claims, although limitations in the claims should not be interpreted as constituting “means plus function” limitations in the event that the term “means” is not used therein. Additionally, the use of the term “step” in the foregoing description should not be used to construe any specific limitation in the claims as constituting a “step plus function” limitation. To the extent that individual references, including issued patents, patent applications, and non-patent publications, are described or otherwise mentioned herein, such references are not intended and should not be interpreted as limiting the scope of the following claims.
The foregoing description of embodiments has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit embodiments of the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of various embodiments. The embodiments discussed herein were chosen and described in order to explain the principles and the nature of various embodiments and their practical application, so as to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated. The features of the embodiments described herein may be combined in all possible combinations of methods, apparatus, modules, systems, and computer program products.
Number | Date | Country
---|---|---
61/119,696 | Dec. 3, 2008 | US