The amount of video data needed to depict even a relatively short film can be substantial, which may result in difficulties when the data is to be streamed or otherwise communicated across a communications network with limited bandwidth capacity. Thus, video data is generally compressed prior to being communicated across modern day telecommunications networks. Video compression devices often use software and/or hardware at the source to code the video data prior to transmission, thereby decreasing the quantity of data needed to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. Due to limited network resources, improved compression and decompression techniques that increase compression ratios without substantially reducing image quality are desirable.
In one embodiment, the disclosure includes an apparatus comprising a processor configured to receive a current block of a video frame, and determine a coding mode for the current block based on only a bit rate cost function, wherein the coding mode is selected from a plurality of available coding modes, and wherein calculation of the bit rate cost function does not consider distortion of the current block.
In another embodiment, the disclosure includes a method comprising receiving a current block of a video frame, and determining a coding mode for the current block based on only a bit rate cost function, wherein the coding mode is selected from a plurality of available coding modes, and wherein calculation of the bit rate cost function does not consider distortion of the current block.
In yet another embodiment, the disclosure includes an apparatus used in video coding comprising a processor configured to, for each of a plurality of pixels in a block, determine a difference with one of a plurality of corresponding pixels in a reference block, wherein each difference is based on two color values of a pair of compared pixels, and, if each of the differences is within a pre-set boundary, generate information to signal the block as a skipped block, wherein the information identifies the block and the reference block, and include the information into a bitstream without further encoding of the block.
In yet another embodiment, the disclosure includes a method used in video coding comprising, for each of a plurality of pixels in a block, determining a difference with one of a plurality of corresponding pixels in a reference block, wherein each difference is based on two color values of a pair of compared pixels, and, if each of the differences is within a pre-set boundary, generating information to signal the block as a skipped block, wherein the information identifies the block and the reference block, and including the information into a bitstream without further encoding of the block.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that, although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Typically, video media involves displaying a sequence of still images or frames in relatively quick succession, thereby causing a viewer to perceive motion. Each frame may comprise a plurality of picture elements or pixels, each of which may represent a single reference point in the frame. During digital processing, each pixel may be assigned an integer value (e.g., 0, 1, . . . or 255) that represents an image quality or characteristic, such as luminance or chrominance, at the corresponding reference point. In use, an image or video frame may comprise a large number of pixels (e.g., 2,073,600 pixels in a 1920×1080 frame), thus it may be cumbersome and inefficient to encode and decode (referred to hereinafter simply as code) each pixel independently. To improve coding efficiency, a video frame is usually broken into a plurality of rectangular blocks or macroblocks, which may serve as basic units of processing such as prediction, transform, and quantization. For example, a typical N×N block may comprise N² pixels, where N is an integer greater than one and is often a multiple of four.
In a working draft of the International Telecommunications Union (ITU) Telecommunications Standardization Sector (ITU-T) and the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC), High Efficiency Video Coding (HEVC), which is poised to be the next video standard, new block concepts have been introduced. For example, coding unit (CU) may refer to a sub-partitioning of a video frame into rectangular blocks of equal or variable size. In HEVC, a CU may replace the macroblock structure of previous standards. Depending on a mode of inter or intra prediction, a CU may comprise one or more prediction units (PUs), each of which may serve as a basic unit of prediction. For example, for intra prediction, a 64×64 CU may be symmetrically split into four 32×32 PUs. As another example, for inter prediction, a 64×64 CU may be asymmetrically split into a 16×64 PU and a 48×64 PU. Similarly, a PU may comprise one or more transform units (TUs), each of which may serve as a basic unit for transform and/or quantization. For example, a 32×32 PU may be symmetrically split into four 16×16 TUs. Multiple TUs of one PU may share a same prediction mode, but may be transformed separately. Herein, the term block may generally refer to any of a macroblock, CU, PU, or TU.
Depending on the application, a block may be coded in either a lossless mode (i.e., no distortion or information loss) or a lossy mode (i.e., with distortion). In use, high quality videos (e.g., with YUV subsampling of 4:4:4) may be coded using a lossless mode, while low quality videos (e.g., with YUV subsampling of 4:2:0) may be coded using a lossy mode. Sometimes, a single video frame or slice (e.g., with YUV subsampling of either 4:4:4 or 4:2:0) may employ both lossless and lossy modes to code a plurality of regions, which may be rectangular or irregular in shape. Each region may comprise a plurality of blocks. For example, a compound video may comprise a combination of different types of contents, such as texts, computer graphics, and natural-view content (e.g., camera-captured video). In a compound frame, regions of texts and graphics may be coded in a lossless mode, while regions of natural-view content may be coded in a lossy mode. Lossless coding of texts and graphics may be desired, e.g. in computer screen sharing applications, since lossy coding may lead to poor quality or fidelity of texts and graphics, which may cause eye fatigue. Current HEVC test models (HMs), such as HM 3.0, may code natural-view content fairly efficiently. However, the current HMs may lack a lossless coding mode for certain videos, thus their coding efficiency and speed may be limited.
In lossy coding schemes of current HMs, a bit rate and distortion of a coded video may need to be balanced. To achieve low distortion, often more information (e.g., pixel values or transform coefficients) needs to be encoded, leading to more encoded bits and thus a higher bit rate. On the other hand, to achieve a smaller bit rate, certain information may need to be removed. For example, through a two-dimensional transform operation, pixel values in a spatial domain are converted to transform coefficients in a frequency domain. In a transform coefficient matrix, high-index transform coefficients (e.g., in bottom-right corner) corresponding to small spatial features may have relatively small values. Thus, in a subsequent quantization operation, larger quantization coefficients may be applied on the high-index transform coefficients. After integer rounding, a number of zero-valued transform coefficients may be created in the high-index positions, which may then be skipped in following encoding steps. Although quantization may lower the bit rate, information for small spatial features may be lost in the coding process. The lost information may be irretrievable, thus distortion may be increased and coding fidelity lowered in the decoded video.
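The zeroing effect described above can be sketched in a few lines. The 4×4 coefficient matrix and the quantization rule below (a step that grows with the index position) are hypothetical, chosen only to illustrate how integer rounding turns small high-index coefficients into zeros; actual HEVC quantization differs.

```python
def quantize(coeffs, base_step):
    """Quantize a matrix of transform coefficients.

    Positions with higher indices (higher spatial frequencies) use a
    larger quantization step, so their small values round to zero.
    Note: Python's round() uses round-half-to-even.
    """
    rows, cols = len(coeffs), len(coeffs[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            step = base_step * (1 + i + j)  # coarser step at high-index positions
            row.append(round(coeffs[i][j] / step))
        out.append(row)
    return out

# Illustrative 4x4 coefficient matrix: large low-frequency values in the
# top-left corner, small high-frequency values toward the bottom-right.
coeffs = [
    [120, 40, 10, 4],
    [ 36, 18,  6, 2],
    [ 12,  6,  3, 1],
    [  4,  2,  1, 1],
]
quantized = quantize(coeffs, 8)
zeros = sum(v == 0 for row in quantized for v in row)  # most high-index entries are zeroed
```

After quantization, only a handful of low-index coefficients survive; the zero-valued positions can then be skipped in subsequent encoding steps.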
In use, there may be a plurality of coding modes to code a video frame. For example, a particular slice of the frame may use various block partitions (number, size, and shape). For each partition, if inter prediction is to be used, there may be various motion vectors associated with one or more reference frames. Otherwise, if intra prediction is to be used, there may be various reference pixels corresponding to various intra prediction modes. Each coding mode may lead to a different bit rate and/or distortion. Thus, a rate-distortion optimization (RDO) module in a video encoder may be configured to select a best coding mode from the plurality of coding modes to determine an optimal balance or trade-off between the bit rate and distortion.
Current HMs may jointly evaluate an overall cost of bit rate and distortion by using a joint rate-distortion (RD) cost. For example, a bit rate (denoted as R) and a distortion cost (denoted as D) may be combined into a single joint rate-distortion (RD) cost (denoted as J), which may be mathematically presented as:
J = D + λR
where λ is a Lagrangian coefficient representing the relationship between a bit rate and a particular quality level.
Various mathematical metrics may be used to calculate distortion, such as a sum of squared differences (SSD), sum of absolute error (SAE), sum of absolute differences (SAD), mean of absolute differences (MAD), or mean of squared errors (MSE). Using any of these distortion metrics, the RDO process may attempt to find a coding mode that minimizes J.
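As a sketch, two of these distortion metrics and the joint RD cost can be computed as follows; the 2×2 block values, the rate of 40 bits, and λ = 0.5 are hypothetical, chosen only for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2 for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def rd_cost(distortion, rate, lam):
    """Joint rate-distortion cost: J = D + lambda * R."""
    return distortion + lam * rate

original      = [[10, 12], [14, 16]]
reconstructed = [[11, 12], [13, 18]]

d = ssd(original, reconstructed)    # 1 + 0 + 1 + 4 = 6
j = rd_cost(d, rate=40, lam=0.5)    # 6 + 0.5 * 40 = 26.0
```

The RDO module would evaluate such a cost for each candidate mode and keep the mode with the smallest J.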
In current HMs, the selection of an optimal coding mode in an encoder may be a complex process. For example, for every available coding mode (denoted as m) of every block, the encoder may code the block using mode m and calculate R, which is the number of bits required to code the block. Then, the encoder may reconstruct the block and calculate D, which is the difference between the original and reconstructed blocks. Then, the encoder may calculate the mode cost Jm using the equation above. This process may be repeated for every available coding mode. Then, the encoder may choose a mode that gives the minimum Jm. The RDO process in the encoder may be a computationally intensive process, since there may be potentially hundreds of possible coding modes, e.g., based on various combinations of block sizes, inter prediction frames, and intra prediction directions. Both R and D of the block may need to be calculated hundreds of times before the best coding mode may be determined.
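The exhaustive search described above can be sketched as a loop over candidate modes. In the sketch below, `encode`, `reconstruct`, and `distortion` are hypothetical stand-ins for the encoder's actual coding, reconstruction, and distortion routines, and the toy bit/error table is invented for illustration.

```python
def select_mode(block, modes, encode, reconstruct, distortion, lam):
    """Exhaustive RDO: code the block in every mode, compute J = D + lam * R,
    and return the mode with the minimum joint cost."""
    best_mode, best_cost = None, float("inf")
    for m in modes:
        bits = encode(block, m)        # R: bits needed to code the block in mode m
        rec = reconstruct(block, m)    # decoded version of the block for mode m
        d = distortion(block, rec)     # D: original vs. reconstructed block
        cost = d + lam * bits          # J = D + lambda * R
        if cost < best_cost:
            best_mode, best_cost = m, cost
    return best_mode, best_cost

# Toy stand-ins: each mode has a fixed bit cost and reconstruction error.
table = {"intra": (100, 2), "inter": (60, 5), "skip": (5, 40)}
enc = lambda b, m: table[m][0]
rec = lambda b, m: table[m][1]   # the "reconstruction" here is just the error value
dist = lambda b, r: r
mode, cost = select_mode(None, list(table), enc, rec, dist, lam=1.0)
```

With these toy numbers, the "skip" mode wins because its low rate outweighs its higher distortion at λ = 1.0; in a real encoder each of R and D would itself be a costly computation, repeated per mode.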
In addition, when a sequence of video frames is being coded, sometimes certain regions may remain stable for a relatively long period of time. For example, in video conferencing applications, a background region of each user may remain unchanged for tens of minutes. In current encoders, the RDO module may still evaluate bit rate and/or distortion for blocks in these regions, which may consume valuable computation resource and time.
Disclosed herein are systems and methods for improved video coding. The disclosure provides a lossless coding mode and a forced skip mode, which may complement a lossy coding mode in coding of a video such as a compound video. The lossless mode may include a transform bypass coding scheme and a transform without quantization coding scheme. In lossless coding of a block, since no distortion (or only slight distortion) may be induced, the RDO mode selection process may be simplified. In an embodiment, only a bit rate portion of a joint RD cost is preserved. Thus, from a plurality of available coding modes, the RDO process may only need to determine an optimal coding mode that leads to a least number of bits. A reconstructed block may not need to be compared with an original source block, which may save both computation resource and time. Furthermore, if a video frame or slice comprises one or more regions which remain stable for a relatively long period (e.g., tens of seconds or minutes), the RDO process may implement a forced skip mode in the one or more regions. In an embodiment of the forced skip mode, if a CU is found to be an exact match (or an approximate match with differences within a pre-set boundary) with a corresponding reference CU in a reference frame, the CU may be skipped in the rest of the encoding steps. Due to implementation of the simplified RDO mode selection scheme and the forced skip mode, videos may be coded both faster and more efficiently.
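The forced-skip test can be sketched as a per-pixel comparison against the reference block. The sketch assumes absolute per-pixel difference as the comparison metric and treats a boundary of zero as an exact-match requirement; the actual metric and boundary value are left open by the embodiments above.

```python
def can_force_skip(block, reference, bound):
    """Return True if every pixel of `block` differs from the corresponding
    pixel of `reference` by at most `bound` (bound=0 demands an exact match).
    A block passing this test can be signaled as skipped without further
    encoding."""
    return all(
        abs(p - q) <= bound
        for row_b, row_r in zip(block, reference)
        for p, q in zip(row_b, row_r)
    )

# Toy 2x2 blocks for illustration: the current block is within 1 of the
# reference everywhere, but not an exact match.
current   = [[100, 101], [ 99, 100]]
reference = [[100, 100], [100, 100]]
```

Here `can_force_skip(current, reference, 0)` fails but `can_force_skip(current, reference, 1)` succeeds, so with a pre-set boundary of 1 this block could be signaled as skipped.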
In use, there may be a module before an encoder to analyze contents of a video frame, and identify certain regions (e.g., texts and/or graphics regions) where lossless encoding is desired. Information or instructions regarding which regions to encode in a lossless mode may be passed to the encoder. Based on the information, the encoder may encode the identified regions using the lossless mode. Alternatively, a user may manually define certain regions to be encoded using a lossless mode, and provide the encoder with information identifying these regions. Thus, a video (e.g., a compound video) may be encoded in a lossless mode and/or a lossy mode, depending on information received by the encoder. Herein, the lossless encoding mode may include transform bypass encoding and transform without quantization encoding. These two lossless encoding schemes as well as a lossy encoding scheme are described herein.
Likewise, based on information contained in a received bitstream, a video decoder may decode a video frame using a lossless mode and/or a lossy mode. The lossless decoding mode may include transform bypass decoding and transform without quantization decoding. The two lossless decoding schemes as well as a lossy decoding scheme are described herein.
The RDO module 110 may be configured to make logic decisions for one or more of other modules. In an embodiment, based on one or more previously encoded frames, the RDO module 110 may determine how a current frame (or slice) being encoded is partitioned into a plurality of CUs, and how a CU is partitioned into one or more PUs and TUs. For example, homogeneous regions of the current frame (i.e., no or slight difference from previously encoded frames) may be partitioned into relatively larger blocks, and detailed regions of the current frame (i.e., significant difference from previously encoded frames) may be partitioned into relatively smaller blocks.
In addition, the RDO module 110 may control the prediction module 120 by determining how the current frame is predicted. The current frame may be predicted via inter and/or intra prediction. Inter prediction (i.e., inter frame prediction) may exploit temporal redundancies in a sequence of frames, e.g. similarities between corresponding blocks of successive frames, to reduce the amount of compressed data. In inter prediction, the RDO module 110 may determine a motion vector of a block in the current frame based on a corresponding block in one or more reference frames. On the other hand, intra prediction may exploit spatial redundancies within a single frame, e.g., similarities between adjacent blocks, to reduce the amount of compressed data. In intra prediction, reference pixels adjacent to a current block may be used to generate a prediction block. Intra prediction (i.e., intra frame prediction) may be implemented using any of a plurality of available prediction modes or directions (e.g., 34 modes in HEVC), which may be determined by the RDO module 110. For example, the RDO module 110 may calculate a sum of absolute error (SAE) for each prediction mode, and select a prediction mode that results in the smallest SAE.
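The SAE-based intra mode selection can be sketched as follows. The three predictors below are greatly simplified stand-ins (not the actual 34 HEVC directional modes), and the pixel values are hypothetical.

```python
def sae(block, pred):
    """Sum of absolute error between a block and its prediction."""
    return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))

def predict(mode, top, left, n):
    """Very simplified directional predictors: 'vertical' copies the row of
    reference pixels above the block, 'horizontal' copies the column to its
    left, and 'dc' uses their rounded mean."""
    if mode == "vertical":
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        return [[left[i]] * n for i in range(n)]
    dc = round(sum(top + left) / (2 * n))
    return [[dc] * n for _ in range(n)]

# Hypothetical 2x2 block whose columns match the pixels above it exactly.
top, left, n = [50, 52], [50, 51], 2
block = [[50, 52], [50, 52]]
best = min(("vertical", "horizontal", "dc"),
           key=lambda m: sae(block, predict(m, top, left, n)))
```

Since the block repeats the row of pixels above it, the vertical predictor yields an SAE of zero and is selected.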
Based on logic decisions made by the RDO module 110, the prediction module 120 may utilize either one or more reference frames (inter prediction) or a plurality of reference pixels (intra prediction) to generate a prediction block, which may be an estimate of a current block. Then, the current block may be subtracted by the prediction block, thereby generating a residual block. The residual block may comprise a plurality of residual values, each of which may indicate a difference between a pixel in the current block and a corresponding pixel in the prediction block. Then, all values of the residual block may be scanned and encoded by the entropy encoder 130 into an encoded bitstream. The entropy encoder 130 may employ any entropy encoding scheme, such as context-adaptive binary arithmetic coding (CABAC) encoding, exponential Golomb encoding, or fixed length encoding, or any combination thereof. In the transform bypass encoding scheme 100, since the residual block is encoded without a transform step or a quantization step, no information loss may be induced in the encoding process.
To facilitate continuous encoding of video frames, the residual block may also be fed into the reconstruction module 140, which may generate either reference pixels for intra prediction of future blocks or reference frames for inter prediction of future frames. If desired, filtering may be performed on the reference frames/pixels before they are used for inter/intra prediction. A person skilled in the art is familiar with the functioning of the prediction module 120 and the reconstruction module 140, so these modules will not be further described.
For a current block being decoded, a residual block may be generated after the execution of the entropy decoder 210. In addition, information containing a prediction mode of the current block may also be decoded by the entropy decoder 210. Then, based on the prediction mode, the prediction module 220 may generate a prediction block for the current block based on previously decoded blocks or frames. If the prediction mode is an inter mode, one or more previously decoded reference frames may be used to generate the prediction block. Otherwise, if the prediction mode is an intra mode, a plurality of previously decoded reference pixels in reference blocks may be used to generate the prediction block. Then, the reconstruction module 230 may combine the residual block with the prediction block to generate a reconstructed block. Additionally, to facilitate continuous decoding of video frames, the reconstructed block may be used in a reference frame to inter predict future frames. Some pixels of the reconstructed block may also serve as reference pixels for intra prediction of future blocks in the same frame.
In use, if an original block is encoded and decoded using lossless schemes, such as the transform bypass encoding scheme 100 and the transform bypass decoding scheme 200, no information loss may be induced in the entire coding process. Thus, barring distortion caused during transmission, a reconstructed block may be exactly the same as the original block. This high fidelity of coding may improve a user's experience in viewing video contents such as texts and graphics.
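The lossless round trip can be illustrated directly: subtracting the prediction at the encoder and adding it back at the decoder recovers the original block bit-exactly. The block values below are toy numbers for illustration.

```python
def subtract(current, prediction):
    """Encoder side: residual = current block minus prediction block."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(current, prediction)]

def add(residual, prediction):
    """Decoder side: reconstruction = residual plus prediction block."""
    return [[r + p for r, p in zip(rr, rp)] for rr, rp in zip(residual, prediction)]

current    = [[100, 102], [ 98, 101]]
prediction = [[100, 100], [100, 100]]

residual = subtract(current, prediction)   # entropy coded without transform/quantization
reconstructed = add(residual, prediction)  # bit-exact copy of the original block
```

Because the residual is entropy coded without a transform or quantization step, the decoder's addition exactly inverts the encoder's subtraction.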
During lossless coding of certain regions in a video frame, sometimes it may be desirable to include a transform step into the coding process. For example, for some blocks of a text region, an added transform step may generate a shorter bitstream compared to a transform bypass coding scheme. In an embodiment, a RDO module may be configured to determine whether to include the transform step. For example, a test transform may be performed to convert a residual block to a matrix of transform coefficients. If the number of bits needed to encode the transform coefficients is smaller than the number of bits needed to encode the residual values of the residual block without a transform, the transform step may be included. Otherwise, the transform step may be bypassed.
The transform without quantization encoding scheme 300 may be implemented in a video encoder, which may receive an input video comprising a sequence of video frames. The RDO module 310 may be configured to control one or more of other modules, and may be the same or similar to the RDO module 110 in
Instead of being entropy encoded directly, the residual block in the transform without quantization encoding scheme 300 may be first transformed from a spatial domain to a frequency domain by the transform module 330. The transform module 330 may convert the values of the residual block (i.e., residual values) to a transform matrix comprising a plurality of transform coefficients. The transform module 330 may be implemented using any appropriate algorithm, such as a discrete cosine transform (DCT), a fractal transform (FT), or a discrete wavelet transform (DWT). In use, some algorithms, such as a 4×4 integer transform defined in H.264/advanced video coding (AVC), may not induce any information loss, while other algorithms, such as an 8×8 integer DCT transform defined in the HEVC working draft, may induce slight information loss. For example, since the 8×8 integer DCT transform in HEVC may not be fully reversible, recovered values of the residual block after the inverse transform module 350 may be slightly different (e.g., up to ±2 values) from the original values of the residual block before the transform module 330. When slight information loss is induced, the encoding may be near lossless instead of lossless. However, compared with a quantization step, the information loss caused by the transform step may be insignificant or unnoticeable, thus the transform without quantization encoding scheme 300 may also be included herein as part of a lossless coding scheme.
Transform coefficients generated by the transform module 330 may be scanned and encoded by the entropy encoder 340 into an encoded bitstream. The entropy encoder 340 may be the same or similar to the entropy encoder 130. To facilitate continuous encoding of video frames, the transform coefficients may also be fed into the inverse transform module 350, which may perform the inverse of the transform module 330 and generate an exact version (i.e., lossless) or an approximation (i.e., near lossless) of the residual block. Then, the residual block may be fed into the reconstruction module 360, which may generate either reference pixels for intra prediction of future blocks or reference frames for inter prediction of future frames. The reconstruction module 360 may be the same or similar to the reconstruction module 140 in
After execution of the entropy decoder 410, a matrix of transform coefficients may be generated, which may then be fed into the inverse transform module 420. The inverse transform module 420 may convert the transform coefficients in a frequency domain to residual pixel values in a spatial domain. In use, depending on whether an algorithm used by the inverse transform module 420 is fully reversible, an exact version (i.e., lossless) or an approximation (i.e., near lossless) of the residual block may be generated. The inverse transform module 420 may be the same or similar to the inverse transform module 350 in
In addition, information containing a prediction mode of the current block may also be decoded by the entropy decoder 410. Based on the prediction mode, the prediction module 430 may generate a prediction block for the current block. The prediction module 430 may be the same or similar to the prediction module 220 in
In use, if an original block is encoded and decoded using near lossless schemes, such as the transform without quantization encoding scheme 300 and the transform without quantization decoding scheme 400, only slight distortion may be induced in the coding process. Thus, barring significant distortion caused during transmission, a reconstructed block may be almost the same as the original block. Transform without quantization coding schemes may be desired sometimes, as they may achieve a higher compression ratio than the transform bypass schemes, without noticeable sacrifice of coding fidelity.
As mentioned previously, in current encoders a RDO module may select an optimal coding mode based on a joint RD cost. In contrast, in either a transform bypass lossless coding scheme or a transform without quantization coding scheme disclosed herein, a quantization step may be bypassed. Without information loss induced by quantization, the distortion of an original current block due to encoding may be negligible, if any. Thus, a RDO module (e.g., the RDO module 110 in
J = λR
Based on the disclosed bit rate cost function, the RDO module may test a subset or all of a plurality of available coding modes for a current block. Tested coding modes may vary in block size, motion vector, inter prediction reference frame, intra prediction mode, or reference pixels, or any combination thereof. For each tested coding mode, a number of bits may be calculated for a coded residual block of the current block or a coded matrix of transform coefficients for the current block. After comparing the resulting bit counts, the RDO module may select a coding mode that results in the least number of bits.
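With J = λR and λ constant across modes, minimizing J reduces to picking the mode with the fewest bits, so the selection loop collapses to a single comparison per mode. In the sketch below, `encode` is a hypothetical stand-in for the encoder's bit-counting routine and the mode names and bit counts are invented for illustration.

```python
def select_mode_rate_only(block, modes, encode):
    """Simplified RDO for lossless coding: with no quantization there is no
    distortion to weigh, so J = lambda * R and the best mode is simply the
    one that codes the block in the fewest bits. No block reconstruction or
    comparison against the original is needed."""
    return min(modes, key=lambda m: encode(block, m))

# Toy stand-in: a fixed bit count per candidate coding mode.
bit_table = {"intra_dc": 120, "intra_vertical": 95, "inter_16x16": 80}
best = select_mode_rate_only(None, list(bit_table), lambda b, m: bit_table[m])
```

Compared with the joint-cost loop, the reconstruction and distortion steps disappear entirely, which is the source of the computation and time savings discussed below.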
In comparison with current encoders, which calculate both D and R in determining the optimal coding mode, the disclosed coding mode selection scheme may be relatively simple. For example, with removal of the D portion, a reconstructed block no longer needs to be compared with its original block. Thus, several calculation steps may be removed from the evaluation process of each coding mode, which may save coding time and computation resources. Considering there may be potentially hundreds of coding modes for the current block in the evaluation, the savings may be significant and encoding may be made faster, which may greatly facilitate real-time encoding.
Sometimes it may be unnecessary to code an entire video frame using a lossless mode. For example, regions containing natural-view contents (e.g., captured by a low resolution camera) in a compound video may not require lossless coding, because the original video quality may already be limited, or because distortion due to lossy coding may not be significant.
The lossy encoding scheme 500 may be implemented in a video encoder, which may receive a sequence of video frames. The RDO module 510 may be configured to control one or more of other modules. Based on logic decisions made by the RDO module 510, the prediction module 520 may utilize either reference frames or reference pixels to generate a prediction block. Then, a current block from the input video may be subtracted by the prediction block to generate a residual block. The residual block may be fed into the transform module 530, which may convert residual pixel values into a matrix of transform coefficients.
In contrast to the transform without quantization encoding scheme 300, in the lossy encoding scheme 500 the transform coefficients may be quantized by the quantization module 540 before being fed into the entropy encoder 550. The quantization module 540 may scale the transform coefficients and round them to integers, which may reduce the number of non-zero coefficients. Consequently, a compression ratio may be increased at a cost of information loss.
Quantized transform coefficients generated by the quantization module 540 may be scanned. Non-zero-valued coefficients may be encoded by the entropy encoder 550 into an encoded bitstream. The quantized transform coefficients may also be fed into the de-quantization module 560 to recover the original scale of the transform coefficients. Then, the inverse transform module 570 may perform the inverse of the transform module 530 and generate a noisy version of the original residual block. Then, the lossy residual block may be fed into the reconstruction module 580, which may generate either reference pixels for intra prediction of future blocks or reference frames for inter prediction of future frames.
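The quantize/de-quantize round trip, and the information it discards, can be sketched with a one-dimensional list of coefficients. The coefficient values and the uniform step size below are hypothetical; real codecs use per-position step sizes derived from the quantization parameter.

```python
def quantize(coeffs, step):
    """Scale coefficients down by `step` and round to integer levels
    (Python's round() uses round-half-to-even)."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Recover the approximate original scale of the coefficients."""
    return [l * step for l in levels]

coeffs = [103, -47, 9, 3, -2, 1]
step = 10
levels = quantize(coeffs, step)       # [10, -5, 1, 0, 0, 0]
recovered = dequantize(levels, step)  # [100, -50, 10, 0, 0, 0]
```

The recovered coefficients only approximate the originals, and the three smallest coefficients have been zeroed outright: the rounding error is the information loss, and the extra zeros are the compression gain.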
In an embodiment, if desired, all of the aforementioned encoding schemes, including the transform bypass encoding scheme 100, the transform without quantization encoding scheme 300, and the lossy encoding scheme 500, may be implemented in a single encoder. For example, when encoding a compound video, the encoder may receive information regarding which regions should be encoded in a lossless mode and/or which regions should be encoded in a lossy mode. Based on the information, the encoder may encode certain regions using a lossy mode and other regions using a lossless mode. In the lossless mode, a RDO module (e.g., the RDO module 110 in
For a decoder to properly reconstruct an encoded video frame, it should recognize one or more encoding schemes that have been used to encode the video frame. Since lossless encoding may be applied only to some regions of the video frame (referred to hereinafter as lossless encoding regions), lossy encoding may be applied to the other regions (referred to hereinafter as lossy or regular encoding regions). Information signaling lossless encoding regions and/or lossy encoding regions may be conveyed in a bitstream that carries the encoded video frame. In use, such information may be packed in a high level syntax structure, such as a sequence parameter set (SPS) or a picture parameter set (PPS) of the bitstream. An SPS or PPS may be a key normative part of the bitstream, and may be defined by a video coding standard. After receiving the bitstream, the decoder may extract region indication information from the SPS or PPS, and then reconstruct each region according to its encoding mode. In an embodiment, the SPS or PPS may include a number of rectangular lossless encoding regions as well as information identifying their positions in the video frame (e.g., top-left and bottom-right coordinates, or top-right and bottom-left coordinates). In another embodiment, the SPS or PPS may include a number of rectangular lossy encoding regions as well as information identifying their positions in the video frame (e.g., top-left and bottom-right coordinates, or top-right and bottom-left coordinates).
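A decoder-side lookup over signaled rectangular regions can be sketched as follows. The `LosslessRegion` container and the example coordinates are hypothetical; the actual SPS/PPS syntax would be defined by the video coding standard.

```python
from dataclasses import dataclass

@dataclass
class LosslessRegion:
    """Hypothetical container for one signaled lossless encoding region,
    identified by its top-left and bottom-right corner coordinates."""
    top_left: tuple       # (x, y) of the top-left corner
    bottom_right: tuple   # (x, y) of the bottom-right corner

def position_is_lossless(x, y, regions):
    """Return True if position (x, y) falls inside any signaled lossless
    encoding region; otherwise the position belongs to a lossy region."""
    return any(r.top_left[0] <= x <= r.bottom_right[0] and
               r.top_left[1] <= y <= r.bottom_right[1] for r in regions)

# Hypothetical regions extracted from an SPS/PPS: a text banner across the
# top-left and a graphics panel in the top-right of a 1920x1080 frame.
regions = [LosslessRegion((0, 0), (639, 95)),
           LosslessRegion((1280, 0), (1919, 255))]
```

Given such a list, the decoder can route each block to the lossless or lossy reconstruction path based purely on its position.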
In some applications, such as sharing a screen during a video conference, certain regions of a video may remain stable across a plurality of video frames. In this case, region indication information may only change at a relatively low frequency (e.g., once in tens of seconds), thus bitrate overhead caused by this signaling method may be negligible.
Within a lossless encoding region, a transform bypass scheme and/or a transform without quantization scheme may be used. To allow proper decoding, a bitstream may also contain information regarding which blocks have been encoded via the transform bypass scheme and which blocks via the transform without quantization scheme. In an embodiment, two transform bypass flags may be introduced for each PU in the lossless encoding region. A luminance (luma) transform bypass flag may indicate whether a transform step is bypassed (or skipped) in the coding of luma pixels of a PU, and a chrominance (chroma) transform bypass flag may indicate whether a transform step is bypassed in the coding of chroma pixels of the PU. For example, if a transform module (e.g., the transform module 330 in
Both the luma and chroma transform bypass flags may be encoded by an entropy encoder (e.g., the entropy encoder 130 in
In an embodiment, the luma and chroma components of a PU may share the same lossless coding scheme, and both components may bypass or include a transform step in their coding process. In this case, a single transform bypass flag may be used for both components. Compared with separate transform bypass flags for the luma and chroma components, the single transform bypass flag may lead to less signaling overhead in the bitstream. Moreover, it should be noted that, although transform bypass flags (luma and/or chroma) are set on the PU level in the descriptions above, if desired, the transform bypass flags may also be similarly set on a TU level, which may result in finer granularity but more signaling overhead.
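The two flag alternatives discussed above can be contrasted with a small sketch. The helper below is hypothetical; real syntax elements and their entropy coding are defined by the applicable standard, and the bit-count comparison here only illustrates the signaling-overhead trade-off.

```python
# Hypothetical sketch of the per-PU flag choices described above: separate
# luma and chroma transform-bypass flags, or a single flag shared by both
# components when they always use the same lossless scheme. The flag layout
# is illustrative, not a normative syntax definition.

def bypass_flags(luma_bypass, chroma_bypass, shared=False):
    """Return the list of flag bits to signal for one PU."""
    if shared:
        # Both components bypass (or both keep) the transform: one flag.
        assert luma_bypass == chroma_bypass
        return [int(luma_bypass)]
    # Separate signaling costs one extra bit per PU but allows the two
    # components to use different schemes.
    return [int(luma_bypass), int(chroma_bypass)]
```

For a frame with many PUs, the shared-flag variant halves this particular overhead, at the cost of forcing both components down the same path.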
Next, in step 704, based on received information, the method 700 may determine if a region (e.g., rectangular) currently being encoded is a lossless encoding region. If the condition in the block 704 is met, the method 700 may proceed to step 706 to encode the current region in a lossless mode (e.g., using the transform bypass encoding scheme 100 and/or the transform without quantization encoding scheme 300). Otherwise, the method 700 may proceed to step 730 to encode the current region in a lossy mode (e.g., using the lossy encoding scheme 500).
Next, in step 706, a residual block may be generated for each block of the current region. To generate the residual block, an RDO module (e.g., the RDO module 110 in
Next, in step 708, the method 700 may determine if a transform step should be bypassed for luma and/or chroma components of the current block, which may be implemented through the RDO module. If the condition in the block 708 is met, the method 700 may proceed to step 710, where one or more transform bypass flags for the current block may be set to ‘1’. Otherwise, the method 700 may proceed to step 720, where the one or more transform bypass flags may be set to ‘0’. The binary values may be set arbitrarily. For example, if desired, the one or more transform bypass flags may be set to ‘0’ in step 710 and ‘1’ in step 720. In use, luma and chroma components may use separate transform bypass flags. If the two components always use the same encoding scheme, they may also share a transform bypass flag.
Step 710 may be followed by step 712, where the residual block may be encoded using an entropy encoder (e.g., the entropy encoder 130 in
Step 720 may be followed by step 722, where the residual block may be converted in a transform module (e.g., the transform module 330 in
If a lossy encoding mode is chosen for the current region in step 704, the method 700 may proceed to step 730, where a residual block may be generated for each block of the current region. To generate the residual block, an RDO module (e.g., the RDO module 510 in
Each block of the current region may be encoded using some of steps 702-736. In an embodiment, after encoding all blocks in the current region, in step 740, the bitstream may be transmitted, for example, over a network to a decoder. It should be understood that the method 700 may only include a portion of all necessary encoding steps, thus other steps, such as de-quantization and inverse transform, may also be incorporated into the encoding process wherever necessary.
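The branching among steps 704-736 can be summarized in a sketch. The transform, quantization, and entropy-coding helpers below are trivial stand-ins for the actual modules referenced above, included only so the control flow runs end to end; they are assumptions, not the real algorithms.

```python
# Hypothetical sketch of the encoding branches of method 700. The helpers
# stand in for the entropy encoder, transform module, and quantization
# module referenced above and are stubbed for illustration only.

def encode_block(residual, region_is_lossless, bypass_transform):
    if region_is_lossless:
        if bypass_transform:
            # Steps 710-712: flag = 1, entropy-encode the residual directly.
            return {"bypass_flag": 1, "payload": entropy_encode(residual)}
        # Steps 720-722: flag = 0, transform without quantization.
        coeffs = transform(residual)
        return {"bypass_flag": 0, "payload": entropy_encode(coeffs)}
    # Steps 730-736: lossy path with transform and quantization.
    coeffs = quantize(transform(residual))
    return {"payload": entropy_encode(coeffs)}

# Minimal stubs so the sketch runs end to end (placeholders, not real codecs).
def transform(block):
    return [2 * v for v in block]            # placeholder for a real transform

def quantize(coeffs):
    return [c // 4 for c in coeffs]          # placeholder for quantization

def entropy_encode(symbols):
    return bytes(s & 0xFF for s in symbols)  # placeholder entropy coder
```

Note that only the lossless branch carries a transform bypass flag; the lossy branch always transforms and quantizes, matching the flow described above.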
For each block of the current region, in step 808, one or more encoded transform bypass flags may be decoded in an entropy decoder (e.g., the entropy decoder 210 in
If the current region needs to be decoded in a lossy decoding mode (determined by block 806), the method 800 may proceed to step 830, where a matrix of quantized transform coefficients may be decoded in an entropy decoder (e.g., the entropy decoder 610 in
After obtaining the residual block using either a lossless or lossy decoding mode, in step 840, a prediction block may be generated. The prediction block may be based on information (decoded from the bitstream using the entropy decoder) comprising a prediction mode, as well as one or more previously coded frames or blocks. Next, in step 842, the residual block may be added to the prediction block, thus generating a reconstructed block. Depending on the encoding and decoding schemes used, the reconstructed block may be an exact, approximate, or noisy version of the original block (before encoding). Barring distortion introduced during transmission, all information from the original block may be preserved in transform bypass coding. Depending on properties of transform and inverse transform, all (or nearly all) information may be preserved in transform without quantization coding. Certain information may be lost in lossy coding, and the degree of loss may mostly depend on the quantization and de-quantization steps. To facilitate continuous decoding of blocks, some pixels of the reconstructed block may also serve as reference pixels for decoding of future blocks. Likewise, the current frame may also serve as a reference frame for decoding of future frames.
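The reconstruction in steps 840-842 amounts to a clipped addition of residual and prediction, which may be sketched as follows; the flat pixel lists and 8-bit clipping range are illustrative assumptions.

```python
# Hypothetical sketch of steps 840-842: the decoded residual block is added
# to the prediction block, with each pixel clipped to the valid sample range
# for the assumed bit depth (8 bits here, for illustration).

def reconstruct(residual, prediction, bit_depth=8):
    max_val = (1 << bit_depth) - 1
    return [min(max(r + p, 0), max_val)
            for r, p in zip(residual, prediction)]
```

In transform bypass coding the residual survives encoding exactly, so this addition recovers the original block bit for bit; in lossy coding the residual is only approximate, and the clipping bounds the reconstruction to legal sample values.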
As mentioned previously, when a sequence of video frames is being coded, sometimes certain regions may remain stable for a relatively long period of time. For example, in video conferencing applications, a background region of each user may remain unchanged for tens of minutes. For another example, in computer screen sharing applications (e.g., used in online video gaming), one or more regions containing text and/or graphics may remain unchanged for tens of seconds or minutes. Since continuous coding of these stable regions may consume unnecessary computation resources and time, it may be desirable to skip these regions in the coding process.
In use, an RDO module (e.g., the RDO module 110 in
In the forced skip mode, the RDO module may skip the rest of the RDO and coding steps for the current CU, which may improve encoding speed. For example, the RDO module may skip a RDO process where RD or bit rate costs are calculated in various coding modes (e.g., various inter/intra prediction modes and/or PU/TU partitions). Instead, the current CU may be flagged or signaled as a skipped CU. Information identifying the skipped CU and its matching reference CU may be included in a bitstream. In an embodiment, for each of the skipped CU and its matching reference CU, the signaling information may comprise a size and/or a plurality of coordinates (e.g. top-left and bottom-right coordinates, or top-right and bottom-left coordinates). No residual value or transform coefficient of the skipped CU may be needed in the bitstream.
Upon receiving the bitstream, a video decoder may check to see if a current CU has been encoded in a forced skip mode based on signaling information contained in the bitstream. If yes, then pixel values of the matching reference CU may be used to reconstruct the current CU. Since there may be potentially a large number of CUs that may be coded in the forced skip mode, the bit rate of coding these CUs may be significantly reduced. Further, the coding process may be made faster, and computation resources may be saved accordingly.
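The decoder-side behavior just described may be sketched as follows. The field names and the reference-lookup callback are hypothetical conveniences; the actual signaling syntax would be defined by the bitstream format in use.

```python
# Hypothetical decoder-side sketch: if the signaling information marks the
# current CU as force-skipped, its pixels are reconstructed by copying the
# matching reference CU. The "forced_skip" and "reference_id" field names
# are illustrative assumptions, not normative syntax.

def reconstruct_skipped_cu(signaling, get_reference_cu):
    if signaling.get("forced_skip"):
        # No residual or transform coefficients are present in the bitstream;
        # the reference CU identified in the signaling supplies every pixel.
        return list(get_reference_cu(signaling["reference_id"]))
    return None  # fall through to the normal decoding path
```

Because the copy requires no inverse transform, de-quantization, or entropy decoding of coefficients, both bit rate and decoding effort are reduced for such CUs.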
Next, in step 920, the method 900 may determine if all differences are within a pre-set boundary or tolerance (e.g., ±1). If the condition in the block 920 is met, the method 900 may proceed to step 930. Otherwise, the method 900 may proceed to step 940. In step 930, information may be included in the bitstream to signal that the current block is encoded in a forced skip mode. The information may identify the skipped CU and its matching reference CU. In an embodiment, for each of the skipped CU and its matching reference CU, the signaling information may comprise a size and/or a plurality of coordinates (e.g., top-left and bottom-right coordinates, or top-right and bottom-left coordinates). The rest of the encoding steps (e.g., RDO mode selection, encoding of residual block) may be skipped for the current block.
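The check in step 920 may be sketched as follows; the flat pixel lists and the default ±1 tolerance are illustrative assumptions matching the example above.

```python
# Hypothetical sketch of step 920: every pixel of the current block is
# compared with the corresponding pixel of the candidate reference block,
# and the block qualifies for the forced skip mode only if all differences
# fall within the pre-set tolerance (here +/-1, as in the example above).

def can_force_skip(current, reference, tolerance=1):
    return all(abs(c - r) <= tolerance
               for c, r in zip(current, reference))
```

When this test passes, the encoder can emit only the skip signaling of step 930 and omit residuals, transform coefficients, and the remaining RDO steps for the block.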
In step 940, the method 900 may determine if the current block is located within a lossless encoding region. If the condition in the block 940 is met, the method 900 may proceed to step 950. Otherwise, the method 900 may proceed to step 960. In step 950, an encoding mode leading to the least number of bits may be selected as an optimal mode. The optimal mode may be determined by an RDO module (e.g., the RDO module 110 in
In step 970, an encoding mode leading to a smallest RD cost may be selected as an optimal mode. The RD cost of different encoding modes may take into account both the bit rate portion and the distortion portion in determining the optimal coding mode. Next, in step 980, the current block may be encoded in a lossy mode using a lossy encoding scheme. It should be understood that the method 900 may only include a portion of all necessary encoding steps, thus other steps, such as transform, quantization, de-quantization, inverse transform, and transmission, may also be incorporated into the encoding process wherever appropriate.
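The contrast between steps 950 and 970 may be sketched with a small mode-selection helper: in a lossless region the optimal mode minimizes the bit count alone, while a lossy region minimizes a combined rate-distortion cost. The candidate tuples and the Lagrange multiplier value below are illustrative assumptions.

```python
# Hypothetical sketch contrasting steps 950 and 970. In a lossless region
# distortion is zero by construction, so only the bit rate matters; in a
# lossy region the cost J = D + lambda * R trades distortion against rate.

def best_mode(candidates, lossless, lmbda=10.0):
    """candidates: list of (mode_name, bits, distortion) tuples."""
    if lossless:
        return min(candidates, key=lambda c: c[1])[0]             # rate only
    return min(candidates, key=lambda c: c[2] + lmbda * c[1])[0]  # RD cost

# Illustrative candidates: an intra mode and an inter mode for one block.
modes = [("intra_dc", 120, 0.0), ("inter_2Nx2N", 90, 500.0)]
```

A cheaper mode in bits can thus win in a lossless region yet lose in a lossy region once its distortion is weighed in, which is exactly why the two regions use different cost functions.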
The schemes described above may be implemented on any general-purpose network component, such as a computer or network component with sufficient processing power, memory resources, and network throughput capability to handle the necessary workload placed upon it.
The secondary storage 1104 typically comprises one or more disk drives or tape drives and is used for non-volatile storage of data and as an over-flow data storage device if the RAM 1108 is not large enough to hold all working data. The secondary storage 1104 may be used to store programs that are loaded into the RAM 1108 when such programs are selected for execution. The ROM 1106 is used to store instructions and perhaps data that are read during program execution. The ROM 1106 is a non-volatile memory device that typically has a small memory capacity relative to the larger memory capacity of the secondary storage 1104. The RAM 1108 is used to store volatile data and perhaps to store instructions. Access to both the ROM 1106 and the RAM 1108 is typically faster than to the secondary storage 1104.
At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, R1, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=R1+k*(Ru-R1), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . , 50 percent, 51 percent, 52 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. The use of the term about means ±10% of the subsequent number, unless otherwise stated. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of.
Accordingly, the scope of protection is not limited by the description set out above but is defined by the claims that follow, that scope including all equivalents of the subject matter of the claims. Each and every claim is incorporated as further disclosure into the specification and the claims are embodiment(s) of the present disclosure. The discussion of a reference in the disclosure is not an admission that it is prior art, especially any reference that has a publication date after the priority date of this application. The disclosure of all patents, patent applications, and publications cited in the disclosure are hereby incorporated by reference, to the extent that they provide exemplary, procedural, or other details supplementary to the disclosure.
While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.
The present application claims priority to U.S. Provisional Patent Application No. 61/503,534 filed Jun. 30, 2011 by Wen Gao et al. and entitled “Lossless Coding Tools for Compound Video”, and U.S. Provisional Patent Application No. 61/506,958 filed Jul. 12, 2011 by Wen Gao et al. and entitled “Additional Lossless Coding Tools for Compound Video”, each of which is incorporated herein by reference as if reproduced in its entirety.
Number | Date | Country
---|---|---
61503534 | Jun 2011 | US
61506958 | Jul 2011 | US