Digital video consumes large amounts of storage and transmission capacity. Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
Compression can be lossless, in which the quality of the video does not suffer, but decreases in bit rate are limited by the inherent amount of variability (sometimes called entropy) of the video data. Or, compression can be lossy, in which the quality of the video suffers, but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression—the lossy compression establishes an approximation of information, and the lossless compression is applied to represent the approximation.
A basic goal of lossy compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality/fidelity to the original video, an encoder attempts to provide the lowest bit rate encoded video. In practice, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness in quality/bit rate changes also affect decisions made in codec design as well as decisions made during actual encoding.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress an individual picture, and inter-picture compression techniques compress a picture with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.
The encoder quantizes (120) the transform coefficients (115), resulting in an 8×8 block of quantized transform coefficients (125). With quantization, the encoder essentially trades off quality and bit rate. More specifically, quantization can affect the fidelity with which the transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients as the coefficients are more coarsely approximated. Bit rate also decreases, however, when decreased complexity can be exploited with lossless compression. Conversely, finer quantization tends to preserve fidelity and quality but result in higher bit rates. Different encoders use different parameters for quantization. In most encoders, a level or step size of quantization is set for a block, picture, or other unit of video. Some encoders quantize coefficients differently within a given block, so as to apply relatively coarser quantization to perceptually less important coefficients, and a quantization matrix can be used to indicate the relative quantization weights. Or, apart from the rules used to reconstruct quantized values, some encoders vary the thresholds according to which values are quantized so as to quantize certain values more aggressively than others.
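The quality/bit-rate tradeoff described above can be illustrated with a minimal sketch of uniform scalar quantization. The function names and the sample coefficient values are assumptions chosen for illustration; they are not taken from any particular encoder.

```python
def quantize(coeff, step):
    """Map an integer coefficient to a quantization level (round to nearest)."""
    magnitude = (abs(coeff) + step // 2) // step
    return -magnitude if coeff < 0 else magnitude


def dequantize(level, step):
    """Map a quantization level back to its reconstruction point."""
    return level * step


# Hypothetical coefficients, ordered from low to high frequency.
coeffs = [1890, -37, 12, 5, -3, 0, 1, 0]

for step in (4, 16, 64):
    levels = [quantize(c, step) for c in coeffs]
    recon = [dequantize(q, step) for q in levels]
    # Total absolute error grows as the step size grows (lower fidelity) ...
    error = sum(abs(c - r) for c, r in zip(coeffs, recon))
    # ... while more levels become zero, which lossless coding can exploit.
    zeros = levels.count(0)
    print(f"step={step:2d} levels={levels} error={error} zeros={zeros}")
```

With a coarser step, more coefficients collapse to zero and the reconstruction error rises, which is the tradeoff the paragraph describes.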
Returning to
In corresponding decoding, a decoder produces a reconstructed version of the original 8×8 block. The decoder entropy decodes the quantized transform coefficients, scanning the quantized coefficients into a two-dimensional block, and performing AC prediction and/or DC prediction as needed. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform (such as an inverse DCT (“IDCT”)) to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block. When a picture is used as a reference picture in subsequent motion compensation (see below), an encoder also reconstructs the picture.
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data, producing motion-compensated predictions.
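As an illustration of motion estimation, the following sketch uses full-search block matching with a sum-of-absolute-differences (SAD) cost. This is one common approach and is an assumption here; the text does not prescribe a particular search method, and the function names and test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search):
    """Return (cost, dx, dy) of the best match within +/- search samples."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic reference picture, and a current picture shifted right by 2 samples.
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(16)] for y in range(16)]

print(full_search(cur, ref, bx=6, by=6, n=4, search=3))
```

The returned displacement plays the role of a motion vector: it tells the motion compensator where in the reference picture to fetch the prediction for the block.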
For a current unit (e.g., 8×8 block) being encoded, the encoder computes the sample-by-sample difference between the current unit and its motion-compensated prediction to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded. For example, for a current 8×8 block of a predicted picture, an encoder computes an 8×8 prediction error block as the difference between a motion-predicted block and the current 8×8 block. The encoder applies a frequency transform to the residual, producing a block of transform coefficients. Some encoders switch between different sizes of transforms, e.g., an 8×8 transform, two 4×8 transforms, two 8×4 transforms, or four 4×4 transforms for an 8×8 prediction residual block. The encoder quantizes the transform coefficients and scans the quantized coefficients into a one-dimensional array such that coefficients are generally ordered from lowest frequency to highest frequency. The encoder entropy codes the data in the array.
If a predicted picture is used as a reference picture for subsequent motion compensation, the encoder reconstructs the predicted picture. When reconstructing residuals, the encoder reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The encoder performs motion compensation to compute the motion-compensated predictors, and combines the predictors with the residuals. During decoding, a decoder typically entropy decodes information and performs analogous operations to reconstruct residuals, perform motion compensation, and combine the predictors with the reconstructed residuals.
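The residual computation and the later combination of predictors with residuals can be sketched as follows. The transform, quantization, and entropy coding stages are deliberately left out, so the round trip here is exact; the helper names, the small 2×2 blocks, and the 0..255 sample range are assumptions for illustration.

```python
def residual_block(current, prediction):
    """Sample-by-sample difference between a block and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


def reconstruct_block(prediction, residual, lo=0, hi=255):
    """Add a (reconstructed) residual to the prediction and clamp to range."""
    return [
        [min(hi, max(lo, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


current = [[52, 55], [61, 59]]
prediction = [[50, 56], [60, 60]]

residual = residual_block(current, prediction)
print(residual)
print(reconstruct_block(prediction, residual))
```

In an actual encoder the residual passed to the reconstruction step would itself be a lossy reconstruction, so the result would only approximate the current block.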
In some cases, when a block of input values is frequency transformed, only the DC coefficient for the block has a significant value. This might be the case, for example, if sample values for the block are uniform or nearly uniform, with the DC coefficient indicating the average of the sample values and the AC coefficients being zero or having small values that become zero after quantization. Using DC-only blocks facilitates compression in many cases, but can result in perceptible quantization artifacts in the form of step-wise boundaries between blocks.
Blocks with nearly even proportions or gradually changing proportions of closely related values appear naturally in some video sequences. Such blocks can also result from certain common preprocessing operations like dithering on source video sequences. For example, when a source video sequence that includes pictures with 10-bit (or 12-bit) samples is converted to a sequence with 8-bit samples, the number of bits used to represent each sample is reduced from 10 bits (or 12 bits) to 8 bits. As a result, regions of gradually varying brightness or color in the original source video might appear unrealistically uniform in the sequence with 8-bit samples, or they might appear to have bands or steps instead of the gradations in brightness or color. Prior to distribution, the producer of the source video might therefore use dithering to introduce texture in the image or smooth noticeable bands or steps. The dithering makes minor up/down adjustments to sample values to break up monotonous regions or bands/steps, making the source video look more realistic since the human eye “averages” the fine detail.
For example, if 10-bit sample values gradually change from 16.25 to 16.75 in a region, steps may appear when the 10-bit sample values are converted to 8-bit values. To smooth the steps, dithering adds an increasing proportion of 17 values to the 16-value step and adds a decreasing proportion of 16 values to the 17-value step. This helps improve perceptual quality of the source video, but subsequent compression may introduce unintended blocking artifacts.
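The effect can be sketched on a one-dimensional ramp from 16.25 to 16.75. Plain rounding yields a single 16-to-17 step, while a simple ordered-threshold dither mixes 16 and 17 values in a proportion that follows the ramp. The threshold pattern used here is an assumption for illustration; production dithering is typically two-dimensional and may be randomized.

```python
def ramp(n=64, start=16.25, end=16.75):
    """Higher-precision sample values expressed on the 8-bit scale."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def round_plain(values):
    """Convert to integers by rounding to nearest (values are positive)."""
    return [int(v + 0.5) for v in values]


def round_dithered(values):
    """Round up when the fractional part exceeds a position-dependent threshold."""
    out = []
    for i, v in enumerate(values):
        base = int(v)
        # Eight evenly spaced thresholds visited in a scrambled order.
        threshold = ((i * 5) % 8 + 0.5) / 8
        out.append(base + (1 if v - base > threshold else 0))
    return out


values = ramp()
plain = round_plain(values)
dithered = round_dithered(values)

print(plain)     # one visible step in the middle of the ramp
print(dithered)  # 17s become more frequent toward the end of the ramp
```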
During compression, if the dithered regions are represented with DC-only blocks, blocking artifacts may be especially noticeable. If dithering can be disabled, that may help. In many cases, however, the dithering is performed long before the video is available for compression, and before the encoding decisions that might classify blocks as DC-only blocks in a particular encoding scenario.
In summary, the detailed description presents techniques and tools for improving quantization. For example, a video encoder quantizes DC coefficients of DC-only blocks in ways that tend to reduce blocking artifacts for those blocks, which improves perceptual quality.
In some embodiments, a tool such as a video encoder receives input values. The input values can be sample values for an image, residual values for an image, or some other type of information. The tool produces transform coefficient values by performing a frequency transform on the input values. The tool then quantizes the transform coefficient values. For example, the tool sets a quantization level for a DC coefficient value of a DC-only block.
In setting the quantization level for a coefficient value, the tool uses quantization bias that accounts for relations between quantization bins and transform bins. Generally, a quantization bin for coefficient values includes those coefficient values that, following quantization and inverse quantization by a particular quantization step size, have the same reconstructed coefficient value. A transform bin in general includes those coefficient values that, following inverse frequency transformation, yield a particular input-domain value (or at least influence the inverse frequency transform to yield that value). The boundaries of quantization bins often are not aligned with the boundaries of transform bins. This mismatch can result in blocking artifacts such as described above with reference to
In some implementations, the tool uses one or more offset tables when performing mismatch compensation. For example, the offset tables store offsets for possible DC coefficient values at different quantization step sizes. When quantizing a particular DC coefficient value at a particular quantization step size, the tool looks up an offset and, if appropriate, adjusts the quantization level for the DC coefficient value using the offset. When the offsets have a periodic pattern, offset table size can be reduced to save storage and memory.
In other implementations, the tool exposes an adjustable parameter that controls the extent of quantization bias. For example, the parameter is adjustable by a user or adjustable by the tool. The parameter can be adjusted before encoding or during encoding in reaction to results of previous encoding. Although the parameter can be set such that the tool performs mismatch compensation, it can more generally be set or adjusted to bias quantization as deemed appropriate. For example, the parameter can be set or adjusted to reduce blocking artifacts that mismatch compensation would not reduce.
The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
The present application relates to techniques and tools for improving quantization by using quantization bias that accounts for relations between quantization bins and transform bins. The techniques and tools can be used to compensate for mismatch between transform bin boundaries and quantization bin boundaries during quantization. For example, in some embodiments, when a video encoder quantizes the DC coefficients of DC-only blocks, the encoder uses mismatch compensation to reduce or even eliminate quantization artifacts caused by such mismatches. The quantization artifacts caused by mismatches may occur in video that includes naturally uniform patches, or they may occur when video is converted to a lower sample depth and dithered. How the encoder compensates for mismatches can be predefined and specified in offset tables.
In other embodiments, an adjustable threshold controls the extent of quantization bias. For example, the amount of bias can be adjusted by software depending on whether blocking artifacts are detected by the software. Or, someone who controls encoding during video production can adjust the amount of bias to reduce perceptible blocking artifacts in a scene, image, or part of an image. When a dithered region is encoded, for example, presenting the region with a single color might be preferable to presenting the region with blocking artifacts.
Various alternatives to the implementations described herein are possible. For example, certain techniques described with reference to flowchart diagrams can be altered by changing the ordering of stages shown in the flowcharts, by repeating or omitting certain stages, etc. The various techniques and tools described herein can be used in combination or independently. Different embodiments implement one or more of the described techniques and tools. Aside from uses in video compression, the quantization bias techniques and tools can be used in image compression, audio compression, other compression, or other areas. Moreover, while many examples described herein involve quantization of DC coefficients for DC-only blocks, alternatively the techniques and tools described herein are applied to quantization of DC coefficients for other blocks, or to quantization of AC coefficients.
Some of the techniques and tools described herein address one or more of the problems noted in the Background. Typically, a given technique/tool does not solve all such problems. Rather, in view of constraints and tradeoffs in encoding time, resources, and/or quality, the given technique/tool improves encoding performance for a particular implementation or scenario.
With reference to
A computing environment may have additional features. For example, the computing environment (300) includes storage (340), one or more input devices (350), one or more output devices (360), and one or more communication connections (370). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (300). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (300), and coordinates activities of the components of the computing environment (300).
The storage (340) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (300). The storage (340) stores instructions for the software (380) implementing the video encoder.
The input device(s) (350) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (300). For audio or video encoding, the input device(s) (350) may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment (300). The output device(s) (360) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment (300).
The communication connection(s) (370) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The techniques and tools can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment (300), computer-readable media include memory (320), storage (340), communication media, and combinations of any of the above.
The techniques and tools can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “find” and “select” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
The encoder (400) processes video pictures. The term picture generally refers to source, coded or reconstructed image data. For progressive video, a picture is a progressive video frame. For interlaced video, a picture may refer to an interlaced video frame, the top field of the frame, or the bottom field of the frame, depending on the context. The encoder (400) is block-based and uses a 4:2:0 macroblock format for frames, with each macroblock including four 8×8 luminance blocks (at times treated as one 16×16 macroblock) and two 8×8 chrominance blocks. For fields, the same or a different macroblock organization and format may be used. The 8×8 blocks may be further sub-divided at different stages, e.g., at the frequency transform and entropy encoding stages. The encoder (400) can perform operations on sets of samples of different size or configuration than 8×8 blocks and 16×16 macroblocks. Alternatively, the encoder (400) is object-based or uses a different macroblock or block format.
Returning to
A predicted picture (e.g., progressive P-frame or B-frame, interlaced P-field or B-field, or interlaced P-frame or B-frame) is represented in terms of prediction from one or more other pictures (which are typically referred to as reference pictures or anchors). A prediction residual is the difference between predicted information and corresponding original information. In contrast, a key picture (e.g., progressive I-frame, interlaced I-field, or interlaced I-frame) is compressed without reference to other pictures.
If the current picture (405) is a predicted picture, a motion estimator (410) estimates motion of macroblocks or other sets of samples of the current picture (405) with respect to one or more reference pictures. The picture store (420) buffers a reconstructed previous picture (425) for use as a reference picture. When multiple reference pictures are used, the multiple reference pictures can be from different temporal directions or the same temporal direction. The motion estimator (410) outputs as side information motion information (415) such as differential motion vector information.
The motion compensator (430) applies reconstructed motion vectors to the reconstructed (reference) picture(s) (425) when forming a motion-compensated current picture (435). The difference (if any) between a block of the motion-compensated current picture (435) and corresponding block of the original current picture (405) is the prediction residual (445) for the block. During later reconstruction of the current picture, reconstructed prediction residuals are added to the motion compensated current picture (435) to obtain a reconstructed picture that is closer to the original current picture (405). In lossy compression, however, some information is still lost from the original current picture (405). Alternatively, a motion estimator and motion compensator apply another type of motion estimation/compensation.
A frequency transformer (460) converts spatial domain video information into frequency domain (i.e., spectral, transform) data. For block-based video pictures, the frequency transformer (460) applies a DCT, variant of DCT, or other forward block transform to blocks of the samples or prediction residual data, producing blocks of frequency transform coefficients. Alternatively, the frequency transformer (460) applies another conventional frequency transform such as a Fourier transform or uses wavelet or sub-band analysis. The frequency transformer (460) may apply an 8×8, 8×4, 4×8, 4×4 or other size frequency transform.
A quantizer (470) then quantizes the blocks of transform coefficients. The quantizer (470) applies uniform, scalar quantization to the spectral data with a step size that varies on a picture-by-picture basis or other basis. The quantizer (470) can also apply another type of quantization to the spectral data coefficients, for example, a non-uniform or non-adaptive quantization. In described embodiments, the quantizer (470) biases quantization in ways that account for relations between transform bins and quantization bins, for example, compensating for mismatch between transform bin boundaries and quantization bin boundaries.
When a reconstructed current picture is needed for subsequent motion estimation/compensation, an inverse quantizer (476) performs inverse quantization on the quantized spectral data coefficients. An inverse frequency transformer (466) performs an inverse frequency transform, producing blocks of reconstructed prediction residuals (for a predicted picture) or samples (for a key picture). If the current picture (405) was a key picture, the reconstructed key picture is taken as the reconstructed current picture (not shown). If the current picture (405) was a predicted picture, the reconstructed prediction residuals are added to the motion-compensated predictors (435) to form the reconstructed current picture. One or both of the picture stores (420, 422) buffers the reconstructed current picture for use in subsequent motion-compensated prediction.
The entropy coder (480) compresses the output of the quantizer (470) as well as certain side information (e.g., motion information (415), quantization step size). Typical entropy coding techniques include arithmetic coding, differential coding, Huffman coding, run length coding, LZ coding, dictionary coding, and combinations of the above. The entropy coder (480) typically uses different coding techniques for different kinds of information, and can choose from among multiple code tables within a particular coding technique.
The entropy coder (480) provides compressed video information (495) to the multiplexer (“MUX”) (490). The MUX (490) may include a buffer, and a buffer level indicator may be fed back to a controller. Before or after the MUX (490), the compressed video information (495) can be channel coded for transmission over the network.
A controller (not shown) receives inputs from various modules such as the motion estimator (410), frequency transformer (460), quantizer (470), inverse quantizer (476), entropy coder (480), and buffer (490). The controller evaluates intermediate results during encoding, for example, setting quantization step sizes and performing rate-distortion analysis. The controller works with modules such as the motion estimator (410), frequency transformer (460), quantizer (470), and entropy coder (480) to set and change coding parameters during encoding. When an encoder evaluates different coding parameter choices during encoding, the encoder may iteratively perform certain stages (e.g., quantization and inverse quantization) to evaluate different parameter settings. The encoder may set parameters at one stage before proceeding to the next stage. For example, the encoder may decide whether a block should be treated as a DC-only block, and then quantize the DC coefficient value for the block. Or, the encoder may jointly evaluate different coding parameters. The tree of coding parameter decisions to be evaluated, and the timing of corresponding encoding, depends on implementation.
The relationships shown between modules within the encoder (400) indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity. In particular,
Particular embodiments of video encoders typically use a variation or supplemented version of the generalized encoder (400). Depending on implementation and the type of compression desired, modules of the encoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. For example, the controller can be split into multiple controller modules associated with different modules of the encoder. In alternative embodiments, encoders with different modules and/or other configurations of modules perform one or more of the described techniques.
III. Using Quantization Bias that Accounts for Relations Between Quantization Bins and Transform Bins.
The present application describes techniques and tools for biasing quantization in ways that account for the relations between quantization bins and transform bins. For example, an encoder biases quantization using a pre-defined threshold to compensate for mismatch between transform bin boundaries and quantization bin boundaries during quantization. Mismatch compensation (also called misalignment compensation) can help the encoder reduce or avoid certain types of perceptual artifacts that occur during encoding. Or, an encoder adjusts a threshold used to control quantization bias so as to reduce blocking artifacts for certain kinds of content, e.g., dithered content.
A. Theory and Explanation.
During encoding, a frequency transform converts a block of input values to frequency transform coefficients. The transform coefficients include a DC coefficient and AC coefficients. Ultimately, for reconstruction during encoding or decoding, an inverse frequency transform converts the transform coefficients back to input values.
Transform coefficient values are usually quantized after the forward transform so as to control quality and bit rate. When the coefficient values are quantized, they are represented with quantization levels. During reconstruction, the quantized coefficient values are inverse quantized. For example, the quantization level representing a given coefficient value is reconstructed to a corresponding reconstruction point value. Due to the effects of quantization, the inverse frequency transform converts the inverse quantized transform coefficients (reconstruction point values) to approximations of the input values. In theory, the same approximations of the input values could be obtained by shifting the original transform coefficients to the respective reconstruction points then performing the inverse frequency transform, still accounting for the effects of quantization.
In some scenarios, encoders represent blocks of input values as DC-only blocks. For a DC-only block, the DC coefficient has a non-zero value and the AC coefficients are zero or quantized to zero. For DC-only blocks, the possible values of DC coefficients can be separated into transform bins. For example, suppose that for a forward transform, any input block having an average value
In quantization a DC coefficient value is replaced with a quantization level, and in inverse quantization the quantization level is replaced with a reconstruction point value. For some quantization step sizes and DC coefficient values, the original DC coefficient value and reconstruction point value are on different sides of a transform bin boundary, which can result in perceptual artifacts for DC-only blocks. For example, suppose for a particular quantization step size that any DC coefficient value in the range of:
So, a particular DC coefficient value on one side of a transform bin boundary can be quantized to a quantization level that has a reconstruction point value on the other side of the transform bin boundary. This happens when the original DC coefficient value is closer to that reconstruction point value than it is to the reconstruction point value on its other side. After the inverse transform, however, the reconstructed input values may deviate from expected reconstructed values if the DC coefficient value has switched sides of a transform bin boundary.
1. Example Forward and Inverse Frequency Transforms.
The quantization bias and mismatch compensation techniques described herein can be implemented for various types of frequency transforms. For example, in some implementations, the techniques described herein are used in an encoder that performs frequency transforms for 8×8, 4×8, 8×4 or 4×4 blocks using the following matrices and rules.
The encoder performs forward 4×4, 4×8, 8×4, and 8×8 transforms on a data block Di×j (having i rows and j columns) as follows:
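$$\hat{D}_{4 \times 4} = (T_4 \cdot D_{4 \times 4} \cdot T_4') \circ N_{4 \times 4} \quad \text{for a } 4 \times 4 \text{ transform,}$$
$$\hat{D}_{8 \times 4} = (T_8 \cdot D_{8 \times 4} \cdot T_4') \circ N_{8 \times 4} \quad \text{for an } 8 \times 4 \text{ transform,}$$
$$\hat{D}_{4 \times 8} = (T_4 \cdot D_{4 \times 8} \cdot T_8') \circ N_{4 \times 8} \quad \text{for a } 4 \times 8 \text{ transform, and}$$
$$\hat{D}_{8 \times 8} = (T_8 \cdot D_{8 \times 8} \cdot T_8') \circ N_{8 \times 8} \quad \text{for an } 8 \times 8 \text{ transform,}$$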
where $\cdot$ indicates a matrix multiplication, $\circ N_{i \times j}$ indicates a component-wise multiplication by a normalization factor, $T'$ indicates the transpose of the matrix $T$, and $\hat{D}_{i \times j}$ represents the transform coefficient block. The values of the normalization matrix $N_{i \times j}$ are given by:
$$N_{i \times j} = c_i' \cdot c_j,$$
where:
To reconstruct a block RM×N that approximates the block of original input values, the inverse transform in these implementations is performed as follows:
$$E_{M \times N} = (D_{M \times N} \cdot T_M + 4) \gg 3, \text{ and}$$
$$R_{M \times N} = (T_N' \cdot E_{M \times N} + C_N \cdot I_M + 64) \gg 7,$$
where M and N are 4 or 8, >> indicates a right bit shift, C8=(0 0 0 0 1 1 1 1)′, C4 is a zero column vector of length 4, and IM is an M length row vector of ones. The reconstructed values are truncated after right shifting, hence the 4 and 64 for the effect of rounding.
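For a DC-only 8×8 block, the two inverse transform stages reduce to two multiply-and-shift steps per sample. The sketch below assumes a DC basis value of 12, which is the multiplier that appears in the arithmetic of the numerical examples in this description; the transform matrices themselves are not reproduced here. The helper `transform_bin` is hypothetical and simply enumerates which DC coefficient values reconstruct to a given sample value.

```python
def dc_only_sample(dc):
    """Sample value reconstructed from a DC-only 8x8 block (non-negative dc).

    The +1 that C8 adds to rows 4-7 is left out: 12 * e + 64 is always a
    multiple of 4, so adding 1 cannot change the result of the >> 7.
    """
    e = (12 * dc + 4) >> 3       # first stage, rounding term 4
    return (12 * e + 64) >> 7    # second stage, rounding term 64


def transform_bin(sample, dc_range=range(0, 2048)):
    """Smallest and largest DC value in dc_range that reconstruct to sample."""
    members = [dc for dc in dc_range if dc_only_sample(dc) == sample]
    return (members[0], members[-1]) if members else None


for sample in (16, 17):
    print(sample, transform_bin(sample))
```

Because the reconstructed sample value is a non-decreasing function of the DC coefficient, each transform bin is a contiguous range of DC values.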
Alternatively, the encoder uses other forward and inverse frequency transforms, for example, other integer approximations of DCT and IDCT.
2. Numerical Examples.
Suppose an 8×8 block of sample values includes 39 samples having values of 17 and 25 samples having values of 16. During encoding, the input values are scaled by 16 and converted to transform coefficients using an 8×8 frequency transform as shown in the previous section. The original value of the DC coefficient for the block is 1889.77777, which is rounded up to 1890:
The transform coefficients for the block are quantized. Suppose the DC coefficient is quantized using a quantization parameter stepsize=2, and the applied quantization step size is 2×stepsize. Since the sample values were scaled up by a factor of 16, the quantization step size is also scaled up by a factor of 16. Quantization produces a quantization level of 29.53125, which is rounded up to 30: 1890÷(4×16)≈30. The AC coefficients are zero or quantized to zero, as the block is a DC-only block.
During reconstruction of the DC coefficient value, the quantization level for the DC coefficient is inverse quantized, applying the same quantization step size used in encoding, resulting in a reconstruction point value of 120: 30×4=120. (The scaling factor of 16 is not applied.)
To reconstruct the 8×8 block of sample values, an inverse frequency transform is performed on the reconstructed transform coefficients (specifically, the non-zero DC coefficient value and zero-value AC coefficients for the DC-only block). The sample values of the block are computed as 17.375, which is truncated to 17. (12×((12×120+4)>>3)+64)>>7≈17. Each of the reconstructed input values has the integer value expected for the block—17—since the average value for the input block was (39×17+25×16)/64=16.61.
In other cases, however, the reconstructed input values have a value different than expected. For example, suppose an 8×8 block of sample values includes 37 samples having values of 17 and 27 samples having values of 16. The average value for the input block is (37×17+27×16)/64=16.58, and one might expect the reconstructed sample values to have the integer value of 17. For some quantization step sizes, this is not the case.
During encoding, the input values are scaled by 16 and converted to transform coefficient values using the same 8×8 transform. The original value of the DC coefficient for the block is 1886.2222, which is rounded down to 1886.
The DC coefficient for the block is quantized, with stepsize=2 (and an applied quantization step size of 64), resulting in a quantization level of 29.46875, which is rounded down to 29: 1886÷(4×16)≈29. The AC coefficients are zero or quantized to zero, as the block is a DC-only block.
During reconstruction of the DC coefficient value, the quantization level for the DC coefficient is inverse quantized, resulting in a reconstruction point value of 116. From this DC value, the sample values of the block are computed as 16.8125, which is truncated to 16. (12×((12×116+4)>>3)+64)>>7≈16. Thus, each of the reconstructed values for the block (16) is different from the expected value of 17. This happened because, of the two reconstruction point values closest to 1886 (which are 1856 and 1920), 1856 is closer to 1886, and 1856 and 1886 are on different sides of a transform bin boundary. Although an inverse frequency transform of a DC-only block with DC coefficient value 1856 results in sample values of 16, an inverse transform when the DC coefficient value is 1886 results in sample values of 17.
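The arithmetic of the two numerical examples can be checked with a short sketch. It is illustrative only: the function names are hypothetical, round-to-nearest quantization of non-negative values is an assumption that happens to match the quoted levels, and only the DC-only case of the inverse transform is covered.

```python
UNSCALED_STEP = 4   # applied step size 2 x stepsize, with stepsize = 2 (from the text)
SCALE = 16          # input scaling applied before the forward transform


def quantize_dc(dc_scaled: int) -> int:
    """Round a non-negative scaled DC coefficient to the nearest quantization level."""
    step = UNSCALED_STEP * SCALE          # 64 in the scaled domain
    return (dc_scaled + step // 2) // step


def reconstruct_samples(level: int) -> int:
    """Inverse quantize, then apply the DC-only inverse transform arithmetic."""
    dc = level * UNSCALED_STEP            # the scaling factor of 16 is not applied
    return (12 * ((12 * dc + 4) >> 3) + 64) >> 7


for dc_value in (1890, 1886):
    lvl = quantize_dc(dc_value)
    print(dc_value, lvl, lvl * UNSCALED_STEP, reconstruct_samples(lvl))
# prints:
# 1890 30 120 17
# 1886 29 116 16
```

The second line of output is the mismatch case: the block average rounds to 17, but the reconstruction is 16.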
The original DC coefficient value of 1886 is above the transform bin boundary between 1877 and 1878, but falls within the quantization bin at 1824 to 1887. As a result, the DC coefficient value is effectively shifted to the reconstruction point value 1856 (after quantization and inverse quantization), which is on the other side of the transform bin boundary.
B. Solutions.
Techniques and tools are described to improve quantization by biasing the quantization to account for relations between quantization bins and transform bins. For example, a video encoder biases quantization to compensate for mismatch between quantization bin boundaries and transform bin boundaries when quantizing DC coefficients of DC-only blocks. Alternatively, another type of encoder (e.g., audio encoder, image encoder) implements one or more of the techniques when quantizing DC coefficient values or other coefficient values.
Compensating for misalignment between quantization bins and transform bins helps provide better perceptual quality in some encoding scenarios. For DC-only blocks, mismatch compensation allows an encoder to adjust quantization levels such that the reconstructed input value for a block is closest to the average original input value for the block, where mismatch between quantization bin boundaries and transform bin boundaries would otherwise result in a reconstructed input value farther away from the original average.
Or, biasing quantization can help reduce or even avoid blocking artifacts that are not caused by boundary mismatches. For example, suppose a relatively flat region includes blocks that each have a mix of 16-value samples and 17-value samples, where the averages for the blocks vary from 16.45 to 16.55. When encoded as DC-only blocks and quantized with mismatch compensation, some blocks may be reconstructed as 17-value blocks while others are reconstructed as 16-value blocks. If a user is given some control over the threshold for quantization bias, however, the user can set the threshold so that all blocks are 17-value blocks or all blocks are 16-value blocks. Since reconstructing the fine texture for the blocks is not possible given encoding constraints, reconstructing the blocks to have the same sample values can be preferable to reconstructing the blocks to have different sample values.
The encoder then quantizes (630) the transform coefficient values. For example, the encoder uses uniform scalar quantization or some other type of quantization. In doing so, the encoder sets a quantization level for a first transform coefficient value (e.g., DC coefficient value) of the transform coefficients. When setting the quantization level, the encoder biases quantization in a way that accounts for the relations between quantization bins and transform bins. For example, the encoder follows one of the three approaches described below. In the first approach, during quantization, an encoder detects boundary mismatch problems using static criteria and compensates for any detected mismatch problems “on the fly.” In the second approach, an encoder uses a predetermined offset table that indicates offsets for different DC coefficient values to compensate for misalignment between quantization bins and transform bins. In the third approach, an encoder uses adjustable thresholds to control the quantization bias. Alternatively, the encoder uses another mechanism to bias quantization.
1. On-the-Fly Mismatch Compensation Using Static Criteria.
In some embodiments, an encoder detects mismatch problems using static criteria and dynamically compensates for any detected mismatch problems. The encoder can detect the mismatch problems, for example, using sample domain comparisons or transform domain comparisons.
a. Sample-Domain Comparisons.
The encoder finds (730) the two reconstruction point values next to the DC coefficient value. For each of the two reconstruction point values, the encoder performs (740) an inverse frequency transform, producing a reconstructed value x′ for the samples in the block, or the encoder otherwise computes the reconstructed value x′ for the reconstruction point value.
For each of the two reconstruction point values, the encoder compares (750) the reconstructed value x′ for the samples of the block to the original average value of the samples in the block.
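A sample-domain comparison of this kind might look like the following sketch. It is not the encoder's actual routine: the names are hypothetical, DC coefficient values are assumed non-negative, ties are resolved toward the lower level by assumption, and the DC-only inverse transform arithmetic from the numerical examples stands in for a full inverse transform.

```python
UNSCALED_STEP = 4   # 2 x stepsize for stepsize = 2, as in the numerical examples
SCALE = 16          # scaling applied to input values before the forward transform


def inverse_dc_only(dc_unscaled: int) -> int:
    """Reconstructed sample value for a DC-only 8x8 block."""
    return (12 * ((12 * dc_unscaled + 4) >> 3) + 64) >> 7


def choose_level(dc_scaled: int, average: float) -> int:
    """Pick the neighboring level whose reconstructed value is closest to the average."""
    step = UNSCALED_STEP * SCALE
    lower = dc_scaled // step             # reconstruction point at or below the DC value
    candidates = (lower, lower + 1)       # the two reconstruction points next to DC

    def error(level: int) -> float:
        return abs(inverse_dc_only(level * UNSCALED_STEP) - average)

    return min(candidates, key=error)     # min() keeps the lower level on a tie


# 37 samples of 17 and 27 samples of 16: plain rounding would give level 29.
print(choose_level(1886, (37 * 17 + 27 * 16) / 64))   # 30
```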
b. Transform-Domain Comparisons.
In a mismatch compensation approach with transform-domain comparisons, the encoder computes a DC coefficient value. Before the DC coefficient value is quantized, the encoder shifts the DC coefficient value to the midpoint of the transform bin that includes the DC coefficient value. The shifted DC coefficient value (now the transform bin midpoint value) is then quantized. One way to find the transform bin that includes the DC coefficient value is to compare the DC coefficient value with the two transform bin midpoints on opposite sides of the DC coefficient value.
2. Mismatch Compensation with Predetermined Offset Tables.
In some embodiments, an encoder uses an offset table when compensating for mismatch between transform bin boundaries and quantization bin boundaries for quantization. The offset table can be precomputed and reused in different encoding sessions to speed up the quantization process. Compared to the “on-the-fly” mismatch compensation described above, using lookup operations with an offset table is typically faster and has lower complexity, but it also consumes additional storage and memory resources for the offset table. In some implementations, the size of the offset table is reduced by recognizing and exploiting periodic patterns in the offsets.
a. Using Offset Tables.
Next, the encoder looks up (930) an offset for the DC coefficient value and, if appropriate, adjusts (940) the quantization level using the offset table. For example, the offset table is created as described below.
Thus, in the technique (900), a mismatch compensation phase is added to the normal quantization process for the DC coefficient value. In some implementations, the encoder looks up the offset and adds it to the quantization level levelold as follows.
levelnew=levelold+offset8×8[stepsize][DC],
where offset8×8 is a two-dimensional offset table computed for a particular 8×8 frequency transform. The offset table is indexed by quantization step size and DC coefficient value. In these implementations, different offsets are computed for each DC coefficient for each possible quantization step size.
The preceding examples of offset tables store offsets to be applied to quantization levels, where the offsets are indexed by DC coefficient value. Alternatively, an offset table stores a different kind of offset. For example, an offset table stores offsets to be applied to DC coefficient values to reach an appropriate transform bin midpoint, where the offsets are indexed by DC coefficient value. Moreover, although the offset tables described herein are typically used for mismatch compensation, different offsets can be computed for another purpose, for example, to bias quantization of DC coefficients more aggressively towards zero and thereby reduce blocking artifacts that often occur when dithered content is encoded as DC-only blocks.
b. Preparing Offset Tables.
In some embodiments, an encoder or other tool computes offsets off-line and stores the offsets in one or more offset tables for reuse during encoding. Different offset tables are typically computed for different size transforms. For example, the encoder or other tool prepares different offset tables for 8×8, 8×4, 4×8 and 4×4 transforms that the encoder might use. An offset table can be organized or split into multiple tables, one for each possible quantization step size.
The tool then finds (1050) an adjusted quantization level (1055), level′, to be used in the offset determination process. The value of level′ is selected so that level′ and level have reconstruction points on opposite sides of DC (1015). For example, if the reconstructed DC coefficient (1025) is less than DC (1015), then level′ is level+1. Otherwise, level′ is level−1.
The tool inverse quantizes (1060) level′ (1055), producing a reconstruction point (1065) for the adjusted level. The tool inverse transforms (1070) a DC-only block that has the level′ reconstruction point (1065) for its DC coefficient value, producing a reconstructed input value (1075) for the block, denoted x̂′.
Suppose the adjusted level (1055) is above the initial level (1025) (i.e., level′ is level+1). If the absolute difference between the reconstructed input value x̂′ (1075) and the original input average is smaller than the absolute difference between the reconstructed input value for the initial level and the original input average, the offset is +1. Otherwise, the offset is 0.
When the adjusted level (1055) is below the initial level (1025) (i.e., level′ is level−1), the offset is −1 or 0. If the absolute difference between x̂′ (1075) and the original input average is smaller than the corresponding absolute difference for the initial level, the offset is −1. Otherwise, the offset is 0.
The tool organizes the offsets into lookup tables. For example, the tool organizes the offsets in a three-dimensional table with indices for transform size, quantization step size, and DC coefficient value. Or, the tool organizes the offsets into different tables for different transform sizes, with each table having indices for step size and DC coefficient value. Or, the tool organizes the offsets into different tables for different transform sizes and quantization step sizes, with each table having an index for DC coefficient value.
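The offset determination for one DC coefficient value can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the names are hypothetical, DC values are assumed non-negative, the original input average is approximated by inverting the midpoint scaling used in the example pseudocode (116495 with a shift of 10), and an offset is applied only when the adjusted level is strictly closer to that average.

```python
MIDPOINT_MULT = 116495
MIDPOINT_SHIFT = 10
SCALE = 16   # scaling applied to input values before the forward transform


def inverse_dc_only(dc_unscaled: int) -> int:
    """Reconstructed sample value for a DC-only 8x8 block."""
    return (12 * ((12 * dc_unscaled + 4) >> 3) + 64) >> 7


def offset_for(dc: int, unscaled_step: int) -> int:
    """Offset (-1, 0 or +1) to add to the normally rounded quantization level."""
    step = unscaled_step * SCALE
    level = (dc + step // 2) // step              # initial level
    recon = level * step                          # reconstructed DC coefficient
    adjusted = level + 1 if recon < dc else level - 1
    x_bar = dc * (1 << MIDPOINT_SHIFT) / MIDPOINT_MULT   # assumed input average
    x_hat = inverse_dc_only(level * unscaled_step)
    x_hat_adj = inverse_dc_only(adjusted * unscaled_step)
    if abs(x_hat_adj - x_bar) < abs(x_hat - x_bar):
        return adjusted - level
    return 0


def build_offset_table(unscaled_step: int, dc_values) -> dict:
    """One table per step size, indexed by DC coefficient value."""
    return {dc: offset_for(dc, unscaled_step) for dc in dc_values}


table = build_offset_table(4, range(1870, 1892))
print(table[1877], table[1878], table[1886], table[1890])   # 0 1 1 0
```

Under these assumptions the offsets switch from 0 to +1 between 1877 and 1878, which agrees with the transform bin boundary cited in the numerical example.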
c. Reducing Offset Table Size.
For many types of frequency transforms, the offsets for possible DC coefficient values at a given quantization step size exhibit a periodic pattern. The encoder can reduce table size by storing only the offset values for one period of the pattern. For example, for one implementation of the 8×8 transform described in section III.A, the pattern of −1, 0 and +1 offsets repeats every 1024 values for the DC coefficient. During encoding, the encoder looks up the offset and adds it to the quantization level levelold as follows:
levelnew=levelold+offset8×8[stepsize][(DC−DCminimum)&1023],
where offset8×8 has 1024 offsets per quantization step size. The minimum allowed DC coefficient value, DCminimum, and bit mask operation (& 1023) are used to find the correct position in the periodic pattern for DC. The index is given by (DC−DCminimum) & 1023, which provides the least significant 10 bits of the difference DC−DCminimum.
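The masked lookup can be sketched as below. The table contents and the value chosen for the minimum DC coefficient are placeholders invented for illustration; only the indexing scheme follows the description above.

```python
PERIOD_MASK = 1023      # the pattern repeats every 1024 DC coefficient values
DC_MINIMUM = -2048      # hypothetical minimum allowed DC coefficient value

# One list of 1024 offsets per quantization step size. The single non-zero
# entry is a placeholder, chosen so that DC = 1886 is nudged up by one level.
offset_8x8 = {2: [0] * 1024}
offset_8x8[2][(1886 - DC_MINIMUM) & PERIOD_MASK] = 1


def adjusted_level(level_old: int, dc: int, stepsize: int) -> int:
    """Add the periodic offset for dc to the normally computed level."""
    index = (dc - DC_MINIMUM) & PERIOD_MASK   # least significant 10 bits
    return level_old + offset_8x8[stepsize][index]


print(adjusted_level(29, 1886, 2))          # 30
print(adjusted_level(45, 1886 + 1024, 2))   # 46, same position one period later
print(adjusted_level(30, 1890, 2))          # 30, offset is 0
```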
In one example table, offset8×8[2][1024] has offsets of 0 in each position except the following, in which the offset is 1 or −1:
When the offset tables are computed, periodic patterns can be detected by software analysis of the offsets or by visual analysis of the offset patterns by a developer. Alternatively, the encoder or other tool uses a different mechanism to exploit periodicity in offset values to reduce lookup table size. Or, the offset tables are kept at full size.
3. Quantization Bias with Adjustable Boundaries.
There are many different approaches to biasing quantization in ways that account for the relations between quantization bins and transform bins. Some approaches use predetermined offsets.
Thus, in some embodiments, an encoder uses adjustable thresholds to bias quantization. For example, the encoder adjusts a threshold that effectively changes how DC coefficient values are classified in transform bins for purposes of quantization decisions for DC-only blocks. Whereas the static threshold examples described herein account for misalignment between transform bin boundaries and quantization bin boundaries, the adjustable threshold more generally allows control over the bias of quantization for DC coefficients in DC-only blocks.
In some implementations, the user is allowed to vary the threshold during encoding or re-encoding to react to blocking artifacts that the user perceives or expects. In general, an on/off control for mismatch compensation can be exposed to a user as a command line option, encoding session wizard option, or other control no matter the type of quantization bias used. When bias thresholds are adjustable, another level of control can be exposed to the user. For example, the user is allowed to control thresholds for quantization bias for DC-only blocks on a scene-by-scene basis, picture-by-picture basis, or some other basis. In addition to setting a threshold parameter, the user can be allowed to define regions of an image in which the threshold parameter is used for quantization for DC-only blocks. In other implementations, the encoder automatically detects blocking artifacts between DC-only blocks and automatically adjusts the threshold to reduce differences between the blocks.
a. Using Adjustable Thresholds.
Next, the encoder computes (1120) or otherwise gets the DC coefficient value for the block and finds (1130) the distance between one or more transform bin midpoints and the DC coefficient value for the block. In some implementations, the encoder finds just the distance between the DC coefficient value and the transform bin midpoint lower than it. In other implementations, the encoder finds the distances between the DC coefficient value and the transform bin midpoint on each side of the DC coefficient value.
The encoder compares (1140) the distance(s) to the threshold. The encoder selects (1150) one of the transform bin midpoints and quantizes the selected midpoint, producing a quantization level to be used for the DC coefficient value. For example, the encoder determines if the distance between the DC coefficient value and transform bin midpoint lower than it is less than the threshold. If so, the midpoint is used for the DC coefficient value. Otherwise, the transform bin midpoint higher than the DC coefficient value is used for the DC coefficient value.
In this way, the encoder biases quantization of the DC coefficient value in a way that accounts for the relations between quantization bins and transform bins. The encoder shifts the DC coefficient value to the middle of a transform bin, selected depending on the threshold, and performs quantization. The resulting quantization level depends on the quantization bin that includes the transform bin midpoint.
b. Example Pseudocode.
To start, the routine computes an intermediate input-domain value from iDC. The intermediate value is an integer truncated such that it indicates the reconstructed value for the adjacent transform bin midpoint closer to zero than iDC. For example, if iDC=1886, the value of 16.58 is truncated to 16 (the reconstructed input value for the transform bin midpoint 1820).
If iDC is negative, the difference between the transform bin midpoint closer to zero and iDC is computed. If the difference is greater than iDCThresh, the intermediate value is decremented such that it is the reconstructed value for the adjacent transform bin midpoint farther from zero than iDC. The transform bin midpoint for the intermediate value is computed and then quantized according to iDCStepSize. For example, if iDC=−1886, and the adjacent transform bin midpoint closer to zero is −1820 (for an intermediate value of −16), the difference is −1820−(−1886)=66. If 66 is greater than iDCThresh, the intermediate value is changed to −17. Otherwise, the intermediate value stays at −16. When iDCStepSize=64 and iDCThresh=62, then iQuantLevel=−30, after truncation: ((−17×116495>>10)−32)/64=−30.
If iDC is not negative, the difference between iDC and the transform bin midpoint closer to zero is computed. If the difference is greater than iDCThresh, the intermediate value is incremented such that it is the reconstructed value for the adjacent transform bin midpoint farther from zero than iDC. The transform bin midpoint for the intermediate value is computed and then quantized according to iDCStepSize. For example, if iDC=1886, and the adjacent transform bin midpoint closer to zero is 1820 (for an intermediate value of 16), the difference is 1886−1820=66. If 66 is greater than iDCThresh, the intermediate value is changed to 17. Otherwise, the intermediate value stays at 16. If iDCStepSize=64 and iDCThresh=62, then iQuantLevel=30, after truncation: ((17×116495>>10)+32)/64=30.
As another example, if iDC=1876, the adjacent transform bin midpoint closer to zero is 1820 and the intermediate value is initially 16. If iDCThresh=62, the difference of 56 is not greater than iDCThresh, and the intermediate value is unchanged. iQuantLevel=28, after truncation: ((16×116495>>10)+32)/64=28. In this example, despite the fact that 1876 falls within the quantization bin for the quantization level 29, the iDC is assigned quantization level 28. This is because the selected transform bin midpoint, 1820, is within the quantization bin for the quantization level 28.
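The behavior described in the preceding paragraphs can be approximated by the following sketch. It is a reconstruction from the prose, not the routine itself: the parameter names mirror iDC, iDCStepSize and iDCThresh, the way the intermediate value is derived from iDC is an assumption, and negative values are handled by mirroring about zero so that −16 maps to the midpoint −1820 as in the text.

```python
MIDPOINT_MULT = 116495   # fixed-point multiplier quoted in the worked arithmetic
MIDPOINT_SHIFT = 10


def bin_midpoint(value: int) -> int:
    """Transform bin midpoint for a non-negative reconstructed sample value."""
    return (value * MIDPOINT_MULT) >> MIDPOINT_SHIFT


def quant_level_with_threshold(i_dc: int, i_dc_step_size: int, i_dc_thresh: int) -> int:
    """Quantization level for a DC-only block, biased by an adjustable threshold."""
    sign = -1 if i_dc < 0 else 1
    magnitude = abs(i_dc)
    # Assumption: the intermediate value is obtained by inverting the midpoint
    # scaling and truncating, giving the midpoint closer to zero than iDC.
    value = (magnitude << MIDPOINT_SHIFT) // MIDPOINT_MULT
    if magnitude - bin_midpoint(value) > i_dc_thresh:
        value += 1                       # use the midpoint farther from zero
    level = (bin_midpoint(value) + i_dc_step_size // 2) // i_dc_step_size
    return sign * level                  # mirrored, so the result truncates toward zero


print(quant_level_with_threshold(1886, 64, 62))    # 30
print(quant_level_with_threshold(-1886, 64, 62))   # -30
print(quant_level_with_threshold(1876, 64, 62))    # 28

# DC values for block averages of roughly 16.45 to 16.55.
flat_region = range(1872, 1884)
print({quant_level_with_threshold(dc, 64, 57) for dc in flat_region})   # mix of 28 and 30
print({quant_level_with_threshold(dc, 64, 64) for dc in flat_region})   # {28}
```

The last two lines illustrate the flat-region scenario described earlier: with a mid-range threshold, neighboring blocks land on different levels, while a higher threshold reconstructs all of them with the same sample value.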
Although the techniques and tools described herein are in places presented in the context of video encoding, quantization bias (including mismatch compensation) for DC-only blocks can be used in other types of encoders, for example audio encoders and still image encoders. Moreover, aside from DC-only blocks, quantization bias (including mismatch compensation) can be used for DC coefficients of blocks that have one or more non-zero AC coefficients.
The forward transforms and inverse transforms described herein are non-limiting. The described techniques and tools can be applied with other transforms, for example, other integer-based transforms.
Having described and illustrated the principles of our invention with reference to various embodiments, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or specialized computing environments may be used with or perform operations in accordance with the teachings described herein. Elements of embodiments shown in software may be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.