Examples described herein relate to video encoding. Examples include methods and apparatuses for adjusting macroblock quantization parameters for different regions in a video frame or video image which may improve visual quality for lossy video encoding.
Digitization of a video image, video signal, or video frame may include sampling on a discrete grid of pixels. Each pixel may be assigned a number of bits. Once the video image is converted into bits, processing may be performed, including video image enhancement, video image restoration, and video image compression.
A macroblock typically includes 16×16 samples, and may be further divided into transform blocks. A video image may be transformed to produce a set of blocks of transform coefficients to achieve lossy compression. For example, the video image may be divided into discrete macroblocks (e.g. 16 by 16 pixels in the case of MPEG). These macroblocks may be subjected to discrete cosine transform (DCT) to calculate frequency components, both vertically and horizontally, in a matrix. The transform coefficients in the DCT matrix are then quantized.
Quantization is a lossy compression technique achieved by compressing a range of values to a single quantum value. Quantization may include color quantization, which reduces the number of colors used in an image. Quantization may also include frequency quantization, which reduces the information associated with high frequency components, as a human eye is not sensitive to rapid brightness variation. As a result of frequency quantization, high frequency components may be rounded to zero.
During quantization, the transform coefficients in the DCT matrix are divided by a standard quantization matrix and rounded to integer values. As a result of quantization, the transform coefficients are more coarsely represented at lower bit rates, and more of the transform coefficients become zero. This loss of information through the quantization process is what makes the compression lossy. Statistically, images tend to have more low frequency components or content than high frequency components or content. Accordingly, primarily low frequency components may remain after quantization, which may result in blurry or low-resolution blocks.
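The matrix quantization step described above may be sketched as follows. This is a minimal, hypothetical illustration; the coefficient values and the quantization matrix are made up for the example and are not tables from any particular standard.

```python
def quantize_block(coeffs, qmatrix):
    """Divide each DCT coefficient by the matching quantization matrix entry
    and round to the nearest integer."""
    return [
        [round(c / q) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, qmatrix)
    ]

# Hypothetical 4x4 example. Larger matrix entries toward the bottom-right
# quantize the high frequency components more coarsely, so many of them
# round to zero, as described above.
coeffs = [[512, -30, 10, 2], [-24, 12, 4, 1], [8, 3, 1, 0], [2, 1, 0, 0]]
qmatrix = [[16, 11, 10, 16], [12, 12, 14, 19], [14, 13, 16, 24], [14, 17, 22, 29]]
quantized = quantize_block(coeffs, qmatrix)
```

After quantization only a handful of low frequency coefficients survive; the rounding of the rest to zero is the lossy step.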
Certain details are set forth below to provide a sufficient understanding of embodiments of the disclosure. However, it will be clear to one having skill in the art that embodiments of the disclosure may be practiced without these particular details, or with additional or different details. Moreover, the particular embodiments described herein are provided by way of example and should not be used to limit the scope of the disclosure to these particular embodiments. In other instances, well-known video components, encoder or decoder components, circuits, control signals, timing protocols, and software operations have not been shown in detail in order to avoid unnecessarily obscuring the disclosure.
Examples of methods and apparatuses are described herein that adjust macroblock quantization parameters in a manner which may improve subjective visual quality of a resulting encoded image. To simulate how a viewer may see a video image, a visual quality (VQ) importance index is calculated based upon the content of the video image. The VQ importance index is assigned to portions of a video image, e.g. frames, macroblocks, pixels, or other coding units based on the importance of the portion of the video image to subjective perception by a human viewer. Subjective video quality generally reflects how human viewers evaluate the video quality. Thus, the VQ importance index may help guide the quantization process to determine how best to distribute the encoding bits. For example, more bits may be used to encode portions of a video image which will have a greater importance to subjective video quality and fewer bits may be used to encode portions of a video image which will have lesser importance to subjective video quality.
Different regions of a video image or video frame, or different macroblocks of video data, may have different visual importance to human viewers. Accordingly, at least one quantization parameter, such as a quantization step, may be adjusted for different portions of a video image based on characteristics of the video content and the VQ importance index. The adjusted quantization parameters (e.g. quantization step) may control the output of the bitstream. For example, a smaller quantization step value may lead to more bits generated from an entropy encoder, while a larger quantization step value may lead to fewer bits generated from the entropy encoder.
By identifying and classifying the regions with different visual quality importance indexes, the encoding bits spent for the video image or video frame may be adjusted to enhance the subjective visual quality for regions with a higher VQ importance index by using more bits for encoding. The bit spending may be adjusted by changing the quantization parameters of macroblocks. Generally, macroblocks with a high VQ importance index may be assigned smaller quantization parameters, such that more bits would be used to encode those macroblocks. Macroblocks with a low VQ importance index, by contrast, may be encoded with larger quantization parameters, and thus fewer bits would be used to encode those macroblocks.
The VQ importance index may be calculated per macroblock, or other coding unit, based on statistics information associated with each macroblock including, but not limited to, spatial activity of the macroblock which indicates content complexity of the macroblock, luminance contrast index of the macroblock, edge information, and skintone color information. The spatial activity generally refers to the summation of horizontal pixel absolute difference and vertical pixel absolute difference in a macroblock.
As an example, the encoder 104 may receive and encode a video signal that may include video data (which may be arranged, e.g., in frames). The video signal may be encoded in accordance with one or more encoding standards, such as but not limited to MPEG-2, MPEG-4, H.263, H.264, and/or HEVC, to provide the encoded bitstream, which may be provided to a data bus and/or to a device, such as a decoder or transcoder (not shown).
The encoder 104 may produce constant bitrate (CBR) or variable bitrate (VBR) bitstreams. As the content complexity of the video provided by the source 102 varies, the bitrates of the encoded bitstreams may vary over time. A quantification of content complexity is often specific to a video coding methodology and the encoder used to encode the content.
The video encoder 104 may also include a feedback loop that includes inverse transform unit 208 and inverse quantization unit 210. The inverse transform unit 208 and inverse quantization unit 210 may provide a reconstructed video image which may approximate the video image as decoded in a decoder. The reconstructed video image may be provided to the motion estimation/motion compensation unit 212 or another unit for comparison with the source image. In this manner, a residual may be obtained which may be used to improve the encoding process.
The transform unit 202 may be configured to perform a transform, such as a discrete cosine transform (DCT), on the signal received from the source 102 to produce a set of blocks of transform coefficients (typically in blocks of 8×8 pixels or 4×4 pixels) that may, for instance, correspond to spectral components of the video signal. Generally, the transform unit 202 may transform the video signal to a frequency domain representation. Although DCT is described here, other transform techniques may be used. When the DCT is used, the coefficients of the DCT matrix are typically scanned using a zig-zag scan order. The output of the transform unit 202, the block of transform coefficients, is then quantized by the quantization unit 204.
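The zig-zag scan mentioned above can be sketched generically. The function below generates the JPEG/MPEG-style scan order for an n×n block by walking the anti-diagonals and alternating direction; actual standards define the order in fixed tables, so this is only an illustrative construction.

```python
def zigzag_order(n):
    """Return (row, col) coordinates of an n x n block in zig-zag scan order,
    which visits coefficients roughly from low to high frequency."""
    order = []
    for d in range(2 * n - 1):  # walk the anti-diagonals
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Alternate direction on each diagonal to produce the zig-zag.
        order.extend(diag if d % 2 else reversed(diag))
    return order
```

Scanning in this order front-loads the low frequency coefficients, so the many zero-valued high frequency coefficients cluster at the end of the scan, which benefits the entropy coding stage.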
The quantization unit 204 may be configured to receive the transform coefficients and quantize the transform coefficients to produce quantized transform coefficients. The quantization parameters used to perform the quantization and generate the quantized transform coefficients may be adjusted based on the calculated VQ importance index and initial frame level quantization parameters for a macroblock in the quantization unit 204. Furthermore, trellis quantization may in some examples further fine-tune the quantization process by tracing through the quantization of all pixels within the macroblock. Then, entropy encoding is applied to the quantized transform coefficients by the entropy encoder 206. Entropy coding typically combines a number of consecutive zero-valued quantized coefficients with the next non-zero quantized coefficient into a single symbol, and also indicates when all of the remaining quantized coefficient values are equal to zero.
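The run/level symbol grouping described above can be sketched generically. This illustrates the idea only; it is not the symbol alphabet of any particular standard.

```python
def run_level_symbols(coeffs):
    """Group each run of zero-valued coefficients with the next non-zero
    coefficient into a (run, level) symbol; emit 'EOB' once only zeros remain."""
    symbols, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    symbols.append("EOB")  # end-of-block: all remaining coefficients are zero
    return symbols
```

For a scanned block such as [32, -3, 0, 0, 1, 0, 0, 0], this yields the symbols (0, 32), (0, -3), (2, 1), EOB, so the long trailing run of zeros costs only a single end-of-block symbol to signal.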
The entropy encoder 206 may encode the quantized transform coefficients with an encoding technique, such as context-adaptive variable length coding (CAVLC). The entropy encoder 206 may receive syntax elements (e.g., quantized transform coefficients, differential motion vectors, macroblock modes, etc.) from other components of the encoder, such as directly from the quantization unit 204 and indirectly from the motion estimation/compensation block 212. The entropy encoder 206 may be a variable length coding encoder (e.g., Huffman encoder, CAVLC encoder, or context-adaptive binary arithmetic coding (CABAC) encoder), which may be configured to encode data, for instance, at a macroblock level.
The entropy encoder 206 controls the output of an encoded bitstream. The bit spending for different portions of a video image may be controlled by adjusting the quantization parameters (e.g. quantization step value) based on the VQ importance index. Entropy encoding is a data compression scheme that is independent of the specific characteristics of the medium. The entropy encoding typically uses variable-length coding tables. The entropy coding may create and assign a unique prefix-free code to each unique symbol that occurs in the input. The entropy encoding then may compress data by replacing each fixed-length input symbol with a variable-length prefix-free output code word, such that the macroblocks, or other coding units, with higher VQ importance index may use a larger number of bits.
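The prefix-free variable-length coding idea can be illustrated with a toy code table. The codes below are hypothetical, chosen only to show the principle; they are not CAVLC or CABAC codes.

```python
# Toy prefix-free code: no code word is a prefix of another, so a bitstream
# can be decoded unambiguously. Shorter codes go to more frequent symbols.
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(symbols):
    """Replace each fixed-length input symbol with its variable-length code."""
    return "".join(CODES[s] for s in symbols)

def decode(bits):
    """Recover the symbols by matching prefixes against the code table."""
    inverse = {code: sym for sym, code in CODES.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # prefix-free: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return out
```

Because the code is prefix-free, the decoder never needs lookahead or separators between code words, which is what makes variable-length output practical in a continuous bitstream.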
In some embodiments, the encoder 104 may operate in accordance with the MPEG-2 video coding standard, the H.264 video coding standard, or other standards. Thus, because the MPEG-2 and the H.264 video coding standards employ motion prediction and/or compensation, the encoder 104 may further include a feedback loop that includes an inverse transform unit 208 and an inverse quantization unit 210. These elements may mirror elements included in a decoder (not shown) that is configured to reverse, at least in part, the encoding process performed by the encoder 104. Additionally, the feedback loop of the encoder 104 may include a motion estimation/compensation block 212.
The inverse quantization unit 210 may inversely quantize the quantized transform coefficients for a macroblock to provide recovered transform coefficients. The inverse transform unit 208 may then inversely transform the recovered transform coefficients to produce reconstructed video data.
The motion estimation/compensation block 212 receives the recovered transform coefficients for use in macroblock intra-mode prediction and/or inter-mode prediction mode decision methodologies. Modern block-based video coding standards such as MPEG-2, H.261, H.262, H.263, and H.264 may take advantage of temporal and spatial redundancy to achieve efficient video compression. An intra-coded block or macroblock is coded based on predictions from neighboring macroblocks, whereas inter-coded macroblocks are coded based on temporal predictions. Video frames are typically organized using intra-frames (I-frames), containing all intra-coded macroblocks, with a series of inter-coded frames (P-frames) in between. P-frames cannot be properly decoded without first decoding one or more previous frames. I-frames are generally larger than P-frames, but are required for random access (e.g., to allow a receiver to enter a video stream at any point) and to limit the propagation of transmission errors.
Some spatial and temporal downsampling may also be used to reduce raw data rate from source 102 before encoding which starts from transform unit 202. As shown in
The method 300 may also include receiving initial quantization parameters for the macroblocks from a rate control unit at 310 and dynamically adjusting quantization parameters based on the calculated VQ importance index in the quantization unit 204 at 314. The adjustment to the quantization parameters may vary with the video data over time, for example, dynamically.
Method 300 may also include quantizing the transform coefficients using the dynamically adjusted quantization parameters in the quantization unit 204 at 318. In a particular embodiment, a quantization scheme in video encoding may be represented by the following equation:

F = round(f/Δ)
where F is a quantized transform coefficient, f is a transform coefficient, and Δ is a quantization step value, which is a quantization parameter. The quantization step value may be adjusted for each macroblock, or other coding unit. By adjusting the quantization step value, the number of bits used by the entropy encoder 206 to encode particular portions of the video signal may be varied.
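This scalar quantization can be sketched directly from the definitions above. Simple rounding is assumed; real encoders typically add dead-zone offsets and integer scaling, so this is only an illustrative sketch.

```python
def quantize(f, step):
    """F = round(f / step): the quantized coefficient for step value Δ."""
    return round(f / step)

def dequantize(F, step):
    """Approximate reconstruction; the rounding error is the lost information."""
    return F * step

# A larger step value represents the same coefficient more coarsely:
# step 4 reconstructs 37 as 36, while step 16 reconstructs it as only 32.
```

The smaller quantized magnitudes produced by a larger step cost fewer bits in the entropy encoder, which is how the step value controls bit spending per macroblock.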
Method 300 may further include assigning bit spending based on the VQ importance index by the entropy encoder 206 at 322. When the quantization step value is large, more bits may be eliminated during the quantization process, and fewer bits may be generated for the transform coefficient f in the encoded bitstream output from the entropy encoder. On the other hand, a smaller quantization step value may lead to more bits being preserved for the transform coefficient f, and thus more bits may be used in the output bitstream from the entropy encoder 206 to represent the particular transform coefficient.
Content complexity may be calculated, at least in part, based on activity of the macroblock at block 402, where the activity may be given as the summation of horizontal pixel absolute difference and vertical pixel absolute difference in a macroblock. For example, the difference between intensity of horizontally adjacent pixels may be summed for each of the horizontally adjacent pixel pairs in a macroblock. Similarly, the difference between intensity of vertically adjacent pixels may be summed for each of the vertically adjacent pixel pairs in a macroblock. These two sums may also be summed and used as a measure of activity of a macroblock. In other examples, activity may be calculated in other manners. Activity is generally a measure of how much intensity variation is present across the coding unit.
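The activity measure described above may be sketched as follows. This is a hypothetical implementation operating on a 2-D list of luma values; the source does not specify a concrete data layout.

```python
def spatial_activity(block):
    """Sum of horizontal plus vertical absolute pixel differences in a block."""
    horizontal = sum(
        abs(row[i + 1] - row[i]) for row in block for i in range(len(row) - 1)
    )
    vertical = sum(
        abs(block[r + 1][c] - block[r][c])
        for r in range(len(block) - 1)
        for c in range(len(block[0]))
    )
    return horizontal + vertical

flat = [[100] * 4 for _ in range(4)]  # uniform block: no intensity variation
busy = [[0, 255, 0, 255],             # checkerboard: maximal variation
        [255, 0, 255, 0],
        [0, 255, 0, 255],
        [255, 0, 255, 0]]
```

A flat block yields an activity of zero, while the checkerboard yields a large value, matching the intuition that activity measures intensity variation across the coding unit.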
Luminance contrast index may be calculated, at least in part, based on variance of the activity of the macroblock at block 404. First, the variance of a macroblock may be calculated based on the difference between the pixel values and the macroblock pixel average value. For example, the difference between the intensity of each pixel and an average intensity for the coding unit, e.g. macroblock, may be calculated and the differences summed for all pixels in the macroblock. This sum may represent a variance of the macroblock. Then, the ratio between the activity and the variance may be calculated as the luminance contrast index.
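A sketch of the luminance contrast index computed as described, assuming the "variance" is the summed absolute deviation from the block's mean pixel value; the source's formulation is informal, so the exact definition here is an assumption.

```python
def luminance_contrast_index(block):
    """Ratio of spatial activity to a variance-like measure of the block."""
    # Activity: summed horizontal + vertical absolute pixel differences.
    act = sum(abs(row[i + 1] - row[i]) for row in block for i in range(len(row) - 1))
    act += sum(abs(block[r + 1][c] - block[r][c])
               for r in range(len(block) - 1) for c in range(len(block[0])))
    # "Variance" here (an assumption): summed absolute deviation from the mean.
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum(abs(p - mean) for p in pixels)
    return act / var if var else 0.0
```

Intuitively, a block whose pixels swing rapidly between extremes scores a high ratio, while a smooth gradient with the same overall spread scores lower.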
Edge information may be identified at block 406, at least in part, based on the luminance contrast index obtained from block 404 and the content complexity obtained at block 402. Edge information may be important to human viewers, as the human eye is sensitive to the appearance of edges. Visual quality of the image may be improved by identifying edge boundaries and providing more bits to improve the sharpness of the edge. For example, if the content complexity of a macroblock exceeds a threshold while the luminance contrast index is less than a threshold value, the portion of the video image may be flagged as having edge information. In one example, the threshold value for content complexity may be 1000 and the threshold value for luminance contrast index may be 10. Thus, if both conditions (content complexity>1000) and (luminance contrast index<10) are met in this example, the associated portion of the video image, e.g. a macroblock, is flagged as containing edge information.
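Using the example thresholds above, the edge flag may be sketched as a simple predicate (the default thresholds are the example values from the text):

```python
def has_edge_info(content_complexity, contrast_index,
                  complexity_threshold=1000, contrast_threshold=10):
    """Flag a macroblock as containing edge information: complex content
    combined with a low luminance contrast index (example thresholds)."""
    return (content_complexity > complexity_threshold
            and contrast_index < contrast_threshold)
```

The combination captures blocks with lots of local detail (high complexity) but without the uniform high-frequency texture that a high contrast index would indicate, which is characteristic of a sharp boundary.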
Skintone color information may be calculated, at least in part, using the chroma information of the video content at block 408. Skintone color information may include human skin color information, which may vary with, for example but not limited to, race and sun tanning, including dark skin or light skin. Skintone color information may also include object color information, which may span various colors. Skintone color information is important to human viewers because it helps clearly distinguish different people or objects. Accordingly, chroma information may be compared with stored values or ranges of values that may correspond with skintone coloration. Stored chroma values indicative of skintone coloration, which may include ranges of values, may be stored in any suitable electronic storage medium accessible to the VQ importance index calculator. In block 408, the chroma information of a video signal may be compared with the stored chroma values indicative of skintone coloration, and coding units, e.g. macroblocks, having chroma values indicative of skintone information may be flagged as containing skintone information.
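A hypothetical sketch of the chroma comparison described above. The Cb/Cr ranges below are illustrative values sometimes used for skintone detection in YCbCr color space; they are assumptions for the example, not ranges specified by the source.

```python
# Assumed stored chroma ranges indicative of skintone coloration (hypothetical).
SKIN_CB = (77, 127)
SKIN_CR = (133, 173)

def has_skintone(cb_avg, cr_avg):
    """Flag a macroblock whose average chroma falls within the stored ranges."""
    return (SKIN_CB[0] <= cb_avg <= SKIN_CB[1]
            and SKIN_CR[0] <= cr_avg <= SKIN_CR[1])
```

In practice the stored ranges would be tuned for the intended content, and per-pixel rather than per-block averages could be tested; the range comparison itself is the key mechanism.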
In general, macroblocks with more content complexity, edge information, or skintone color information tend to have a higher VQ importance index, while macroblocks with a higher luminance contrast index tend to have a lower VQ importance index. In a particular embodiment, the value of the VQ importance index may vary from 1 to 5 with a step size of 1. A larger VQ importance index value indicates that the macroblock is more important to human viewers in terms of subjective visual quality. While content complexity, luminance contrast index, edge information, and skintone color have been described herein as factors used to calculate a VQ importance index, it is to be understood that any combination or sub-combination of these factors may be used in embodiments of the present invention. Other factors may be used in combination with or instead of these factors in other embodiments.
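One way the four factors could combine into a 1 to 5 index is sketched below. The weighting is a made-up heuristic that is merely consistent with the stated tendencies (complexity, edges, and skintone raise the index; high luminance contrast lowers it); it is not the actual calculation, which the source does not specify.

```python
def vq_importance_index(complexity, contrast_index, has_edge, has_skin):
    """Map the four factors to a 1-5 VQ importance index (hypothetical
    heuristic; thresholds 8000 and 15 are illustrative assumptions)."""
    score = 3  # start from the middle of the 1-5 range
    if has_edge:
        score += 1          # edges are important to human viewers
    if has_skin:
        score += 1          # skintone regions draw viewer attention
    if complexity > 8000:
        score += 1          # very complex content
    if contrast_index > 15:
        score -= 1          # high contrast masks quantization artifacts
    return max(1, min(5, score))
```

For instance, a skintone macroblock containing an edge reaches the maximum index of 5, while a plain high-contrast block drops to 2.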
Once the VQ importance index is calculated, the quantization parameter (QP) of each macroblock may be adjusted from an initial quantization parameter (e.g. initial_QP) from a rate control unit using the calculated VQ importance index. First, a frame level QP may be determined by a rate control process or received from the rate control unit 214. Then, the quantization parameter may be adjusted.
The macroblock quantization parameter may also be adaptively adjusted based on the following rules. In one embodiment, if a macroblock has high activity and a low VQ importance index (such as 1 or 2), the macroblock quantization parameter is increased to initial_QP+1. In another embodiment, if a macroblock has middle activity and a high VQ importance index (such as 4 or 5), the macroblock quantization parameter is decreased to initial_QP−1. In a further embodiment, if a macroblock has low activity and a high VQ importance index, the macroblock quantization parameter is decreased to initial_QP−2.
More specifically, for regions with a relatively high VQ importance index (such as 4 or 5 in a particular embodiment), QP may be reduced from an initial QP during the quantization and/or trellis quantization. For example, for regions with skintone color information or edge information, the quantization parameter may be decreased, including decreasing the QP by 2 to frame level_QP−2. Additionally or instead, for portions of the image meeting the conditions that the luminance contrast index is greater than a threshold (e.g. 15) and the content complexity lies within a predetermined range (e.g. between 4000 and 6000), the quantization parameter may be decreased, e.g. decreased by 1 to frame level_QP−1 in some examples. Furthermore, for portions of a video image meeting the conditions that the luminance contrast index is greater than a threshold (e.g. 10) and the content complexity lies within a predetermined range (e.g. between 2000 and 4000), the quantization parameter may be decreased, e.g. decreased by 1 to frame level_QP−1.
On the other hand, for portions of a video image with a lower VQ importance index (such as 1 or 2), the macroblock quantization parameter may be increased during the quantization and/or trellis quantization. For example, for regions meeting the conditions that the luminance contrast index is smaller than a threshold (e.g. 20) and the content complexity is greater than a threshold (e.g. 10000), the quantization parameter may be increased, e.g. increased by 1 to frame level_QP+1. Additionally or instead, for regions meeting the conditions that the luminance contrast index is smaller than a threshold (e.g. 15) and the content complexity lies within a predetermined range (e.g. between 8000 and 10000), the quantization parameter may be increased, e.g. increased by 1 to frame level_QP+1.
For portions of a video image with a middle range of VQ importance index (such as 3 which may be a threshold value), the quantization parameter may remain unchanged as initial frame level_QP.
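The activity-based adjustment rules above may be sketched as follows. The three-way activity classification ("low"/"middle"/"high") is a hypothetical labeling assumed for the example; the source does not define the activity thresholds.

```python
def adjust_qp(initial_qp, vq_index, activity_level):
    """Apply the macroblock QP adjustment rules described above.
    activity_level is one of 'low', 'middle', 'high' (assumed labels)."""
    if vq_index <= 2 and activity_level == "high":
        return initial_qp + 1   # unimportant, busy region: spend fewer bits
    if vq_index >= 4 and activity_level == "middle":
        return initial_qp - 1   # important region: spend more bits
    if vq_index >= 4 and activity_level == "low":
        return initial_qp - 2   # important, smooth region: spend the most
    return initial_qp           # middle VQ index: QP unchanged
```

Larger adjustments go to smooth, important regions because quantization artifacts are most visible there, while busy regions mask artifacts and can tolerate a coarser step.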
In a particular embodiment, there may be 5 grayscale levels in the VQ importance index map shown in
The media source data 902 may be any source of media content, including but not limited to, video, audio, data, or combinations thereof. The media source data 902 may be, for example, audio and/or video data that may be captured using a camera, microphone, and/or other capturing devices, or may be generated or provided by a processing device. Media source data 902 may be analog or digital. When the media source data 902 is analog data, the media source data 902 may be converted to digital data using, for example, an analog-to-digital converter (ADC). Typically, to transmit the media source data 902, some type of compression and/or encryption may be desirable. Accordingly, an encoder 910 which may employ the VQ importance index calculation and bit assignment techniques described herein may be provided that may encode the media source data 902 using any encoding method in the art, known now or in the future, including encoding methods in accordance with video standards such as, but not limited to, MPEG-2, MPEG-4, H.264, HEVC, or combinations of these or other encoding standards. The encoder 910 may be implemented using any encoder according to an embodiment of the invention, including the encoder 104, and further may be used to implement the method 300 of
The encoded data 912 may be provided to a communications link, such as a satellite 914, an antenna 916, and/or a network 918. The network 918 may be wired or wireless, and further may communicate using electrical and/or optical transmission. The antenna 916 may be a terrestrial antenna, and may, for example, receive and transmit conventional AM and FM signals, satellite signals, or other signals known in the art. The communications link may broadcast the encoded data 912, and in some examples may alter the encoded data 912 and broadcast the altered encoded data 912 (e.g., by re-encoding, adding to, or subtracting from the encoded data 912). The encoded data 920 provided from the communications link may be received by a receiver 922 that may include or be coupled to a decoder. The decoder may decode the encoded data 920 to provide one or more media outputs, with the media output 904 shown in
The receiver 922 may be included in or in communication with any number of devices, including but not limited to a modem, router, server, set-top box, laptop, desktop, computer, tablet, mobile phone, etc.
Accordingly, a VQ importance index may be calculated which is indicative of the relative subjective importance of a portion (e.g. a macroblock) of a video image (e.g. frame). A higher VQ importance index value may be associated with more important regions. A lower VQ importance index may be associated with less important regions. The encoding quality may be improved utilizing the VQ importance index, because the quantization parameters may be adjusted based on the VQ importance index. As a result, the encoder may generate more bits encoding regions (e.g. macroblocks) with higher VQ importance index than regions (e.g. macroblocks) with lower VQ importance index.
Having described several embodiments, it will be recognized by those skilled in the art that various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. Additionally, a number of well-known processes and elements have not been described in order to avoid unnecessarily obscuring the embodiments disclosed herein. Accordingly, the above description should not be taken as limiting the scope of the document.
Those skilled in the art will appreciate that the presently disclosed embodiments teach by way of example and not by limitation. Therefore, the matter contained in the above description or shown in the accompanying drawings should be interpreted as illustrative and not in a limiting sense. The following claims are intended to cover all generic and specific features described herein, as well as all statements of the scope of the present method and system, which, as a matter of language, might be said to fall therebetween.