The present invention relates to a method and apparatus for calculating the sum of squared differences between an original block and a reconstructed block of image or video data and/or a method and apparatus for determining the distortion caused by encoding a block of image or video data. The present invention also proposes an apparatus and method for encoding image or video data, in which an appropriate encoding mode is selected from a plurality of possible encoding modes. The method and apparatus may be used for encoding an H.264/AVC video signal, but the present invention is not limited to that format and may be used to encode other formats of video and image data as well.
A video signal typically comprises a series of frames, each showing an image at one instant in time. When the frames are displayed in quick succession they give the impression of a moving image. In order to reduce the data rate required to store or send a video signal, compression algorithms (commonly known as ‘codecs’) are used to encode the data. Such compression algorithms typically divide each frame into a number of smaller blocks, each of which is encoded.
Color video images typically comprise several color planes; for example, an RGB image comprises red, green and blue color planes, which when overlaid or mixed make up the full color image. Video applications commonly use a color scheme in which the planes do not correspond to specific colors. Instead one plane corresponds to luminance (the brightness of each pixel in the color image), and the other planes (usually two of them) contain certain color information (chrominance). When the chrominance information is combined with the luminance information, the color image can be derived and displayed, either directly or by first converting the information into separate RGB levels. The reason that a luminance-chrominance system is commonly used is that human perception is much more sensitive to differences in luminance than in chrominance. Therefore video compression algorithms typically encode the chrominance information at a lower resolution than the luminance information, in order to reduce the amount of data needed without unduly affecting image quality. Such blocks of data with differing resolutions of luminance and chrominance data are called ‘macroblocks’. A typical macroblock may, for example, have two planes of chrominance data at half the vertical and half the horizontal resolution of the luminance data. However, in this patent specification the term ‘macroblock’ is used to mean any block of image data that has chrominance data at a lower resolution than luminance data.
Data in the blocks of the video frame is typically encoded by use of a transform, which transforms the data into frequency space. A Discrete Cosine Transform (DCT) is often used for this purpose, but other types of transform may be used instead. The human eye is less sensitive to information contained in the high frequency components and therefore some information relating to the higher frequencies may be discarded or encoded using fewer bits, in order to reduce the amount of data. Once this is done the transformed block may be quantized, by scaling the transform coefficients to the nearest of a number of predetermined values. For example, if the transform coefficients are between −1 and 1, then scaling the coefficient by 20 and rounding to the nearest integer quantizes the coefficient to the nearest of 41 quantization points (from −20 to +20, including 0).
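By way of illustration only, the scaling-and-rounding example above may be sketched as follows. The function names `quantize` and `dequantize` and the default scale of 20 are assumptions chosen to match the example; they are not taken from any particular codec.

```python
def quantize(coefficient: float, scale: int = 20) -> int:
    """Map a transform coefficient in [-1, 1] to an integer quantization level.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return round(coefficient * scale)


def dequantize(level: int, scale: int = 20) -> float:
    """Map a quantization level back to its representative coefficient value."""
    return level / scale


# Sweep the input range finely and collect every level that can occur.
levels = sorted({quantize(k / 1000) for k in range(-1000, 1001)})
print(len(levels), levels[0], levels[-1])  # number of levels, lowest, highest
```

The sweep collects the distinct levels reachable from inputs between −1 and 1, which correspond to the quantization points from −20 to +20 described above.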
After quantization, the number of bits required to encode the data is reduced further by taking advantage of certain statistical properties of the quantized data. This process is called ‘entropy encoding’. For example, after quantization, many of the coefficients may have a value of zero; a type of entropy encoding called run-length coding takes account of consecutive zero coefficients and encodes the length of each such ‘run’, rather than encoding each zero value separately. Other types of entropy encoding which take advantage of the statistical properties of the data, for example variable length coding (VLC) or arithmetic coding, may also be used. The above describes simple encoding methods in which each block in each frame is encoded independently of the other frames. This method of encoding is still used; however, most modern compression algorithms allow a variety of different block encoding modes, any of which may be used to encode a particular block.
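The run-length idea mentioned above may be sketched as follows. The (zero run, value) pair format used here is a hypothetical one chosen for clarity and is not the syntax of any particular standard.

```python
def run_length_encode(coefficients):
    """Encode a list of quantized coefficients as (zero_run, value) pairs.

    Each pair gives the number of zeros preceding a non-zero value.
    Trailing zeros are emitted as a final (zero_run, 0) pair.
    """
    pairs = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def run_length_decode(pairs):
    """Invert run_length_encode."""
    coefficients = []
    for run, value in pairs:
        coefficients.extend([0] * run)
        if value != 0:
            coefficients.append(value)
    return coefficients


quantized = [5, 0, 0, 3, 0, 0, 0, 0]
encoded = run_length_encode(quantized)
print(encoded)
```

Eight coefficients are represented here by three pairs, and decoding the pairs restores the original list.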
An intra encoding mode is a mode in which each block is encoded on the basis of data held within that block (the source block) and on the basis of data in other blocks (reference blocks) in the same frame. The encoding process may work as follows. The contents of the source block are predicted on the basis of one or more reference blocks in the same frame (this is called intra prediction). The difference between the predicted block and the source block is called a residual block. The residual block is encoded by image transforming, quantizing and entropy encoding, as explained in the paragraphs above. The encoded residual block is stored together with coding data identifying the reference blocks and identifying the encoding mode used for the intra prediction. During decoding the predicted block is computed from the coding data and the source block is reconstructed by adding the (decoded) residual block to the predicted block. There may be several different possible intra encoding modes based on different block sizes or different positions of the reference block(s) relative to the source block.
An inter encoding mode makes use of the fact that in a video signal there are often substantial similarities between successive frames, for example, in areas of the image in which there is no movement, or areas relating to a moving object which translates in position between successive frames. An inter encoding mode ‘predicts’ the content of a particular block (a source block) on the basis of another block (called a reference block) in a different frame (which may be one or more frames before or after the frame containing the block being predicted). This is called inter-prediction, as the prediction is based on blocks in other frames. The residual block is the difference between the predicted block and the source block. The residual block is encoded by using an image transform, quantizing and entropy encoding. The encoded residual block is stored together with coding data identifying the reference block used and the particular inter-prediction mode used. The coding data may for example comprise a motion vector relating the reference block and the predicted block. During decoding the predicted block is computed from the coding data and the source block is reconstructed by adding the predicted block to the (decoded) residual block. There may be several different possible inter-prediction modes, each based on different block sizes, different reference blocks or different frames relative to the source block.
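The relationship between the source block, the predicted block and the residual block may be sketched as follows. The frame contents, the block position and the motion vector are made-up values for illustration, and the residual is left unquantized, so the reconstruction here is exact.

```python
def fetch_block(frame, top, left, size=4):
    """Copy a size x size block whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def subtract(a, b):
    """Element-wise a - b for two equally sized blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 8x8 reference frame (an earlier or later frame).
reference_frame = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]

source_position = (2, 2)   # top-left corner of the source block in the current frame
motion_vector = (-1, 1)    # (rows, columns) offset to the reference block

top = source_position[0] + motion_vector[0]
left = source_position[1] + motion_vector[1]
predicted = fetch_block(reference_frame, top, left)

# Made-up source block that differs slightly from the prediction.
source = [[value + (1 if (r + c) % 3 == 0 else 0) for c, value in enumerate(row)]
          for r, row in enumerate(predicted)]

residual = subtract(source, predicted)    # what the encoder transforms and quantizes
reconstructed = add(predicted, residual)  # what the decoder rebuilds
```

In a real encoder the residual would be transformed and quantized before being added back, so the reconstructed block would generally differ from the source block; that difference is the distortion discussed below.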
A skip mode is a special case of an inter encoding mode. It relates a source block directly to a reference block in another frame (i.e. the two are predicted to be identical). Thus, the source block is predicted to have exactly the same contents as the reference block. The source block may then be encoded as data indicating that it is a skip mode and data indicating the identity of the reference block. Decoding is carried out by finding the identity of the reference block and copying its data to form the reconstructed block.
It can be useful to know the distortion caused by encoding a block of video or image data; for example, if it is desired to encode an image but retain a given image quality. The distortion is typically measured as the sum of squared differences between coefficients of the original source block and the coefficients of the reconstructed block. Knowing the distortion is also useful when deciding which encoding mode to use, as will be explained below.
It is important to select the best mode for encoding each block, as this is an important factor in the performance of the compression algorithm. There are two principal considerations when selecting the block encoding mode: the first is the distortion which results from the encoding (i.e. the difference between the source image and the reconstructed image after decoding) and the second is the number of bits required to encode the block. Sometimes the latter consideration is referred to as ‘bit rate’, which is the number of bits per second required to transmit the image at a given resolution. The bit rate is related to the overall number of bits required to encode the block. It is necessary not only to select between inter, intra and skip modes, but also to select the best type of inter encoding or best type of intra encoding.
One known theoretical method of choosing the best block encoding mode is to compute the rate-distortion cost of all the possible modes. The rate-distortion cost is a parameter, which takes account of both the distortion caused by the encoding and the number of bits required to encode the block.
It is possible to encode and decode each block to find the distortion and bit rate for each mode directly. For example, in the H.264/AVC encoding process, the best macroblock encoding mode may be selected by computing the rate-distortion cost of all possible modes. The best mode is typically the one with minimum rate-distortion cost. The rate distortion cost for a given mode may be defined as:
J_{RD} = SSD(S,C) + \lambda \cdot R    (EQUATION 1)
where JRD represents the rate-distortion cost, λ is the Lagrange multiplier, R is the number of bits required to encode the block according to that mode, and SSD(S,C) is the sum of squared differences (SSD) between the original block S and the reconstructed block C when that encoding mode is used. The sum of squared differences can be expressed as:
SSD(S,C) = \sum_{i=1}^{N} \sum_{j=1}^{N} (s_{ij} - c_{ij})^2 = \| S - C \|_F^2
where sij and cij are the (i,j)th elements of the current original block S and the reconstructed block C, respectively. Moreover, N is the image block size (N=4 in the H.264/AVC standard) and ∥ ∥F is the Frobenius norm. We shall call SSD(S,C) a spatial-domain SSD, since the distortion computation is performed on spatial-domain pixel values. The inventors have found that the computation of a spatial-domain SSD is very time-consuming, since the reconstructed block must be obtained, for each possible mode, through transformation, quantization, inverse quantization, inverse transformation and pixel reconstruction. The above method of finding the best mode by calculating SSD(S,C) and the bit rate directly for each mode is called Rate Distortion Optimization (RDO). It can find the best mode accurately, but takes a lot of time and processing power.
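As a minimal sketch of the quantities just defined, the spatial-domain SSD and the rate-distortion cost of Equation 1 may be computed as follows. The block values, the bit count and the Lagrange multiplier are made-up numbers for illustration.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def rd_cost(distortion, bits, lagrange_multiplier):
    """Rate-distortion cost J = SSD + lambda * R."""
    return distortion + lagrange_multiplier * bits


# Made-up 4x4 original block S and reconstructed block C.
original = [[10, 12, 11, 9],
            [8, 10, 13, 12],
            [9, 9, 10, 11],
            [12, 11, 10, 8]]
reconstructed = [[11, 12, 11, 9],
                 [8, 10, 11, 12],
                 [9, 9, 10, 11],
                 [12, 11, 10, 5]]

distortion = ssd(original, reconstructed)  # 1 + 4 + 9
cost = rd_cost(distortion, bits=20, lagrange_multiplier=0.5)
print(distortion, cost)
```

Only three pixels differ between the two blocks (by 1, 2 and 3), so the SSD is the sum of their squares.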
To accelerate the coding process, the JVT reference software version JM 6.1d estimates the rate-distortion cost by using a fast SAD-based cost function instead:
where SAD(S,P) is the sum of absolute differences between the original block S and the predicted block P, λ1 is an approximate exponential function of the quantization parameter (QP) which is almost the square of λ, and K is equal to 0 for the probable mode and 1 for the other modes. The SAD(S,P) is expressed by:
SAD(S,P) = \sum_{i=1}^{N} \sum_{j=1}^{N} | s_{ij} - p_{ij} |
where sij and pij are the (i,j)th elements of the current original block S and the predicted block P, respectively. This SAD-based cost function saves a lot of computation, as the distortion part is based on the differences between the original block and the predicted block instead of the reconstructed block. However, this reduction in computation usually comes with a quite significant degradation of coding efficiency. To achieve better rate-distortion performance, JM 6.1d also provides an alternative SATD-based cost function:
where SATD(S,P) is the sum of absolute Hadamard-transformed difference between the original block S and the predicted block P, which is given by:
where hij is the (i,j)th element of the Hadamard-transformed image block H, which is obtained from the difference between the original block S and the predicted block P. The Hadamard transformed block H is defined as:
Experimental results show that the JSATD can achieve better rate-distortion performance than the JSAD, but its overall rate-distortion performance is still lower than the optimized JRD (found by computing the rate distortion of each mode directly). Thus, neither SAD nor SATD-based functions can predict the real distortion accurately, and therefore they lead to selection of sub-optimum encoding modes which have a higher bit rate or higher distortion than the optimum.
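The two fast distortion measures discussed above may be sketched as follows. This is an assumption-laden illustration: the 4x4 Hadamard matrix and the form H = T(S−P)Tᵀ are the commonly used ones, and no normalisation factor is applied to the SATD, whereas a reference encoder may scale the result.

```python
HADAMARD_4 = [[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, -1, 1],
              [1, -1, 1, -1]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


def difference(s, p):
    """Element-wise S - P."""
    return [[x - y for x, y in zip(rs, rp)] for rs, rp in zip(s, p)]


def sad(s, p):
    """Sum of absolute differences between S and P."""
    return sum(abs(v) for row in difference(s, p) for v in row)


def satd(s, p):
    """Sum of absolute Hadamard-transformed differences (unnormalised, assumed form)."""
    h = matmul(matmul(HADAMARD_4, difference(s, p)), transpose(HADAMARD_4))
    return sum(abs(v) for row in h for v in row)


zeros = [[0] * 4 for _ in range(4)]
flat = [[1] * 4 for _ in range(4)]            # constant difference of 1
impulse = [[1 if (r, c) == (0, 0) else 0 for c in range(4)] for r in range(4)]

print(sad(flat, zeros), satd(flat, zeros))
print(sad(impulse, zeros), satd(impulse, zeros))
```

A flat difference is concentrated into a single Hadamard coefficient, while a single-pixel difference is spread over all sixteen, which is why the two measures rank prediction errors differently.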
A rate-distortion performance comparison of H.264/AVC using RDO-based, SAD-based and SATD-based cost functions for different QPs (quantization step sizes) and three well-known test sequences in terms of PSNR and bit-rate is shown in the table in
In summary, computing the rate-distortion cost (hereinafter also referred to as rate-distortion) of each mode directly from the source and reconstructed blocks takes a lot of processing power and is not practical to carry out in real time without high-end computing hardware. Meanwhile, the SAD and SATD functions are not good at predicting the real rate-distortion caused by the encoding process and may result in sub-optimum modes being selected.
It would be desirable to have a quick and efficient way of determining the distortion caused by encoding a block and a quick and efficient method of encoding blocks using an encoding scheme that is capable of utilizing a plurality of different possible encoding modes and selecting the best mode for the job.
Accordingly, one aspect of the present invention proposes that the distortion is calculated in transform domain space, for example on the basis of a difference between the source and reconstructed blocks in frequency space. It is further advantageous if the quantization of the transformed block in frequency space is carried out with the aid of a look up table. In preferred embodiments of the invention it may be possible to carry out the quantization and/or inverse quantization step without multiplication or division functions, which take a lot of processing power.
A first aspect of the present invention provides a method for calculating the sum of squared differences between an original block S and a reconstructed block R of image or video data, the method comprising the steps of:
a) computing a predicted block P corresponding to the original block S, using inter or intra frame prediction;
b) calculating a residual block D from the original block S and the predicted block P, said residual block D having a plurality of coefficients;
c) applying an integer image transform to the coefficients of the residual block D so as to obtain a transformed residual block F*, said transformed residual block having a plurality of coefficients f*ij;
d) finding a plurality of coefficients f̂*ij, the coefficients f̂*ij being defined by the equation f̂*ij=Q⁻¹(Q(f*ij)), where Q is an operator which performs quantizing and Q⁻¹ is the inverse of the Q operator; and
As the sum of squared differences is calculated in the residual domain, it is not necessary to reconstruct the block in order to calculate the distortion. This saves some processing time. Furthermore, as the sum of squared differences is calculated in the transform domain, further processing is saved as it is not necessary to carry out the inverse image transform to calculate the distortion. In addition, as the method uses an integer image transform, the coefficients f*ij will be integers. The coefficients f̂*ij may be integers or fractional numbers.
The quantizing performed by the operator Q may be pre-scaled quantizing. Pre-scaled quantizing means that the quantization step for each coefficient is scaled according to the location of the coefficient in the integer transformed residual block F*. This has the effect of making the integer image transform orthogonal (which is required by most video and image encoding processes). In other words, the necessary scaling of the integer transform is integrated into the quantization process.
For example, the integer cosine transform (ICT) requires scaling to make it orthogonal. Carrying out pre-scaled quantizing on an ICT-transformed block F* gives the same quantization matrix Z as would be obtained by (unscaled) quantizing of a DCT transformed residual block. The same is true of some other integer image transforms and their discrete equivalents. However, not every integer image transform requires scaling; for example, the Walsh and Hadamard transforms are integer image transforms which do not require scaling.
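The need for position-dependent scaling may be illustrated as follows. As an assumption, the sketch uses the 4x4 forward core transform matrix of H.264/AVC as an example of an integer cosine transform; the matrix is not given in this specification, and the residual values are made up.

```python
from fractions import Fraction

# Assumed example of an ICT: the H.264/AVC 4x4 forward core transform matrix.
CORE_TRANSFORM = [[1, 1, 1, 1],
                  [2, 1, -1, -2],
                  [1, -1, -1, 1],
                  [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


# The rows are mutually orthogonal but do not all have the same energy.
gram = matmul(CORE_TRANSFORM, transpose(CORE_TRANSFORM))
row_energy = [gram[i][i] for i in range(4)]

# Made-up residual block D and its integer transform F* = T D T^T.
residual = [[1, 2, 3, 4],
            [0, -1, 2, 1],
            [3, 0, 0, -2],
            [1, 1, -1, 0]]
f_star = matmul(matmul(CORE_TRANSFORM, residual), transpose(CORE_TRANSFORM))

# Dividing each squared coefficient by a factor that depends on its position (i, j)
# recovers the energy of the residual block exactly.
spatial_energy = sum(v * v for row in residual for v in row)
scaled_energy = sum(Fraction(f_star[i][j] ** 2, row_energy[i] * row_energy[j])
                    for i in range(4) for j in range(4))
print(row_energy, spatial_energy, scaled_energy)
```

Because the required factor differs from position to position, it can be folded into the quantization step for each coefficient, which is the idea behind pre-scaled quantizing.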
Preferably step d) is carried out with the aid of a look up table. As a look up table is used it may be possible to avoid multiplication, division and/or rounding operations in the quantization and/or inverse quantization processes. For example, it may be possible to carry out the quantization and/or inverse quantization by simple comparison of f*ij values with values held in the look up table. Preferably the look up table is referred to iteratively.
Preferably step (d) comprises the step of iteratively comparing each coefficient fij* of the transformed residual block to boundary points of quantization sub-zones stored in the look up table. A quantization sub-zone contains all the values of fij* between its upper and lower boundary points. The operator Q maps all the values of fij* within the sub-zone to a particular quantization point having a quantized value zij corresponding to that sub-zone. The operator Q⁻¹ maps the quantized value zij to an inverse quantized value f̂*ij corresponding to that sub-zone.
The look up table may comprise a plurality of quantization sub-zone boundaries each corresponding to a quantization point. In a preferred embodiment it may only be necessary for the look up table to relate each quantization point to a respective upper boundary, rather than both upper and lower boundaries. In the look up process, the value of each coefficient fij* may be iteratively compared with the upper boundaries of quantization points and when said upper boundary is greater than said coefficient, the coefficient fij* can be assigned a quantized value zij and an inverse quantized value f̂*ij corresponding to the quantization sub-zone having that upper boundary. The quantized value may be stored in the look up table or based on the number of comparisons made. The inverse quantized value may be stored in the look up table or found by multiplying the quantized value with a quantization parameter Δij which may be stored in the look up table. The quantization parameter Δij may have different values for different i,j positions. In this way the Q and Q⁻¹ operations can be carried out by using the look up table.
Preferably the absolute value of each coefficient fij* is iteratively compared to boundaries of quantization sub-zones having progressively higher quantization point values until said absolute value of said coefficient is lower than a quantization sub-zone boundary. Because an absolute value is used, the sign (positive or negative) of the coefficient fij* can be ignored in the look up process and introduced later to ensure that the f̂*ij and/or zij values are given the right sign. This makes the comparison process simpler and quicker (as only positive values have to be compared in the look up process and the look up table does not have to contain negative values).
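The iterative table look-up described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the sub-zone boundaries are placed midway between quantization points (plain rounding), a single step size is used for all positions, and the names `build_table` and `lut_quantize` are hypothetical. A practical table could hold different boundaries for each coefficient position.

```python
import math


def build_table(step, max_level):
    """Build a list of (upper_boundary, inverse_quantized_value) entries.

    Entry k covers magnitudes below (k + 0.5) * step (assumed mid-point
    boundaries). A final entry with an infinite boundary catches larger values.
    """
    table = [((k + 0.5) * step, k * step) for k in range(max_level)]
    table.append((math.inf, max_level * step))
    return table


def lut_quantize(coefficient, table):
    """Return (quantized value z, inverse quantized value f_hat) by comparison only."""
    magnitude = abs(coefficient)
    for level, (upper_boundary, inverse_value) in enumerate(table):
        if magnitude < upper_boundary:
            break  # the number of comparisons made gives the quantized level
    # Re-introduce the sign that was ignored during the look-up.
    if coefficient < 0:
        return -level, -inverse_value
    return level, inverse_value


table = build_table(step=8, max_level=4)
for f in (13, -3, -21, 100):
    z, f_hat = lut_quantize(f, table)
    print(f, z, f_hat, (f - f_hat) ** 2)  # last value: squared error of this coefficient
```

The table is built once, so the per-coefficient work consists of comparisons and a sign change; the squared errors printed in the last column are the terms that would be summed to give the transform-domain SSD.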
Preferably the iterative table-look up process is carried out in parallel for each coefficient fij* of the transformed residual block. This further enhances the efficiency of the computation process.
Preferably, for each coefficient fij* of said transformed residual block, the quantization sub-zone boundaries referred to in the look up table depend on the position of the coefficient fij* in the transformed residual block.
While it would be possible to find the inverse quantized value f̂*ij directly from the look up table, without first finding the quantized value, it is preferred that the quantized value is found as well, as described above. Once the quantized values have been found, the number of bits required to entropy encode the quantized block Z can easily be found. This information gives an indication of the effectiveness of the compression process and can be used to calculate the rate-distortion.
The method may include the step of entropy encoding the block Z. Any suitable type of entropy encoding may be used or considered by the method, for example VLC encoding, run length coding and arithmetic coding.
The source block may correspond to a single macroblock, contain one or more sub-sections of a macroblock, or contain several macroblocks.
The integer image transform may be any type of integer image transform. In a preferred embodiment an integer cosine transform (ICT) is used. Alternatively a Hadamard transform or a Walsh transform could be used. Other possibilities will be apparent to a person skilled in the art.
A second aspect of the present invention provides a method of calculating the distortion caused by encoding image or video data according to a first encoding mode, the method comprising carrying out the first aspect of the invention, wherein in step (a) the predicted block is computed according to said first encoding mode and wherein the calculated distortion is based on the sum of squared differences computed in step (e). This information can be used to decide whether the encoding mode is suitable or not.
A third aspect of the present invention provides a method of encoding video or image data, the method comprising defining an area in a frame of video or image data, calculating the distortion which would be caused by encoding said area according to a first encoding mode, said distortion being calculated by using the method of the second aspect of the present invention, comparing said distortion with the distortion which would be caused by encoding said area according to a second encoding mode, selecting one of said first and second encoding modes on the basis of said comparison, and encoding said data according to the selected encoding mode.
It is usually best to take account of not just the distortion which would be caused by a particular encoding mode, but also the bit rate required to encode a block according to that mode. Therefore, it is preferred that the second aspect of the present invention calculates the rate-distortion and the third aspect of the invention selects an encoding mode on the basis of a comparison of the rate distortions caused by the various possible encoding modes.
The rate-distortion caused for a particular encoding mode can be calculated based on the sum of squared differences computed in step (e) of the methods above and on the number of bits required to entropy encode the quantized transformed residual block Z.
Preferably the rate-distortion is calculated by the formula JRD=SSD+λ·R, where JRD is a parameter representing the rate distortion, SSD is the sum of squared differences computed in step (e), R is the number of bits required to entropy encode the block and λ is a Lagrange multiplier.
Preferably the method selects the encoding mode which will produce the least distortion of the block or the encoding mode which has the lowest rate-distortion.
Preferably the distortion or rate-distortion produced by the second block mode is estimated or calculated according to the methods described above.
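The selection of an encoding mode by comparing rate-distortion costs may be sketched as follows. The mode names and the (SSD, bits) figures are hypothetical values chosen for illustration only.

```python
def select_mode(candidates, lagrange_multiplier):
    """Return the mode with the lowest cost J = SSD + lambda * R.

    candidates maps a mode name to a (ssd, bits) pair.
    """
    def cost(mode):
        distortion, bits = candidates[mode]
        return distortion + lagrange_multiplier * bits

    return min(candidates, key=cost)


# Hypothetical distortion and bit counts for three candidate modes.
candidates = {
    "intra_4x4": (120, 96),
    "inter_16x16": (150, 40),
    "skip": (400, 1),
}

print(select_mode(candidates, lagrange_multiplier=2))
print(select_mode(candidates, lagrange_multiplier=20))
```

With a small multiplier the choice is driven mainly by distortion, whereas a large multiplier penalises bits more heavily and favours the cheapest mode.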
A fourth aspect of the present invention provides a method for calculating the sum of squared differences between an original block S and a reconstructed block R of image or video data, the method comprising the steps of:
a) computing a predicted block P corresponding to the original block S, using inter or intra frame prediction;
b) calculating a residual block D from the original block S and the predicted block P, said residual block D having a plurality of coefficients;
c) applying an image transform to the coefficients of the residual block D so as to obtain a transformed residual block F, said transformed residual block having a plurality of coefficients;
d) finding the coefficients of a first matrix F̂, said first matrix being an inverse quantized transform of said residual block; and
e) computing the sum of squared differences between said transformed residual block F and said first matrix F̂.
This method is similar to the first aspect of the present invention, but less restrictive in that the image transform does not need to be an integer image transform and therefore pre-scaling may not be necessary. For example, the image transform may be a Discrete Cosine Transform. The method has the advantage that the sum of squared differences is calculated in the transformed residual domain and so the computations involved in reconstruction and the inverse transform are avoided.
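The steps of this aspect may be sketched as follows for the case of an orthonormal DCT. The block values and the uniform quantization step are assumptions made for illustration; the final lines carry out the inverse transform only to check that the transform-domain result agrees with the spatial-domain one.

```python
import math


def dct_matrix(n=4):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


def ssd(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Made-up original block S and predicted block P (steps a and b).
source = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
predicted = [[50, 54, 60, 64], [60, 60, 58, 85], [60, 60, 70, 110], [60, 60, 70, 120]]
residual = [[s - p for s, p in zip(rs, rp)] for rs, rp in zip(source, predicted)]

t = dct_matrix()
f = matmul(matmul(t, residual), transpose(t))             # step c: F = T D T^T
step = 6                                                   # assumed uniform step size
f_hat = [[round(v / step) * step for v in row] for row in f]  # step d: F_hat = Q^-1(Q(F))
ssd_transform = ssd(f, f_hat)                              # step e

# Check only: the spatial-domain SSD obtained after an inverse transform.
residual_hat = matmul(matmul(transpose(t), f_hat), t)
ssd_spatial = ssd(residual, residual_hat)
print(ssd_transform, ssd_spatial)
```

The two values agree up to floating-point rounding because an orthonormal transform preserves the Frobenius norm, so the inverse transform is not needed when only the distortion is required.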
Preferably the method further comprises the step of quantizing the coefficients of said transformed residual block F to obtain a quantized transformed residual block Z.
Preferably step (d) is carried out with the aid of a look up table.
A fifth aspect of the present invention provides a system for calculating the sum of squared differences between an original block S and a reconstructed block R of image or video data; the system comprising:
a) a predicting module for predicting a predicted block P, corresponding to an original block of data S, by using inter or intra prediction;
b) a residual block defining module for calculating a residual block D, which is the difference between said predicted block P and said original block S;
c) a transform module for performing an integer image transform on said residual block D to obtain a transformed residual block F* having a plurality of coefficients f*ij;
d) an inverse quantization module for finding a plurality of coefficients f̂*ij, the coefficients f̂*ij being defined by the equation f̂*ij=Q⁻¹(Q(f*ij)), where Q is an operator which performs quantizing and Q⁻¹ is the inverse of the Q operator; and
e) a difference function computing module for computing a sum of squared differences between f*ij and f̂*ij.
The operator Q may perform pre-scaled quantization.
Preferably the inverse quantization module is arranged to operate by referring to a look up table stored in a memory of the system.
A sixth aspect of the present invention provides a system for encoding video or image data, the system comprising:
(f) a block defining module for defining a first area in the video or image data, said area comprising one or more blocks of data;
(g) a system according to the fifth aspect of the present invention for calculating the sum of squared differences between f*ij and f̂*ij when a block of data in said area is encoded according to an encoding mode;
(h) a quantizing module for carrying out quantizing of the coefficients of the transformed residual block F* to obtain a quantized transformed residual block Z;
(i) an entropy encoding module for entropy encoding the quantized transformed residual block Z;
(j) a rate-distortion calculating module for calculating the rate-distortion of an encoding mode, based on the number of bits required to entropy encode one or more blocks of data according to said encoding mode and the sum of squared differences between f*ij and f̂*ij for one or more blocks of data encoded according to said encoding mode;
(k) an encoding mode selection module for selecting an encoding mode for encoding the video or image data from a plurality of possible encoding modes, said module being arranged to select the encoding mode on the basis of a comparison of the respective rate-distortions of the possible encoding modes.
The quantizing module may be arranged for carrying out pre-scaled quantization.
The quantizing module and inverse quantizing module may be provided as a single module.
In both the above aspects of the invention, the modules carry out similar functions to the method of the first aspect of the present invention. The modules may be hardware, software or firmware elements or combinations thereof. For example, they may be circuits or parts of an integrated circuit, or may be software elements in a programmable logic device or in a program run on a computer.
A seventh aspect of the invention provides a system comprising hardware, software or firmware elements, arranged for carrying out a method according to any one of the first to fourth aspects of the present invention.
An eighth aspect of the present invention provides a program, stored on a computer-readable medium, the program including instructions for causing an apparatus to carry out the method of any one of the first to fourth aspects of the present invention.
The present invention relates to a method and apparatus for determining the distortion caused by encoding a block of video or image data. This may be conveniently calculated based on the sum of squared differences between a source block and a reconstructed block of image or video data. The rate-distortion of a particular inter or intra encoding mode may also be calculated on the basis of the distortion and the number of bits required to entropy encode a block which has been encoded according to the particular inter or intra encoding mode. Preferred embodiments of the present invention relate to a method of encoding image or video data, including selecting an encoding mode based on the rate-distortion of said mode.
The present invention may be utilized with any image or video encoding standard which uses inter or intra prediction. However, it may be particularly suitable for use with the new H.264/AVC standard. Accordingly, this standard will now be briefly discussed, although it is to be understood that the present invention may be used with other encoding standards. H.264/AVC is one of the newest image and video encoding standards. H.264/AVC greatly outperforms the previous MPEG-1/2/4 and H.261/263 standards in terms of both picture quality and compression efficiency. In some implementations it may provide the same picture quality as DVD (or MPEG-2) video while only consuming about 25% of the storage space, and its bit-rate is about half of that of the MPEG-4 advanced simple profile. To achieve this superior coding performance, H.264/AVC adopts many advanced techniques, such as directional spatial prediction for intra frame coding, variable and hierarchical block transform, arithmetic entropy coding, multiple reference frame motion compensation, deblocking, etc. It also uses 7 different block sizes for motion-compensation in the inter mode, and 3 different block sizes with various spatial directional prediction modes in the intra mode. The main critical process employed is the rate-distortion optimized mode decision technique, which gives H.264/AVC much better coding performance in terms of compression efficiency and visual quality. To select the best macroblock coding mode, an H.264/AVC encoder needs to compute the rate-distortion cost of all possible modes, which involves computation of integer transform, quantization, variable length coding and pixel reconstruction processes. All of this processing explains the high computational complexity of the rate-distortion cost calculation. Hence, the cost function computation makes H.264/AVC impossible to realize in real-time applications without high-end computing hardware.
Accordingly, a preferred embodiment of the present invention proposes a new fast sum of squared difference (FSSD) computation algorithm which uses an iterative table-lookup quantization process. This may reduce the complexity of the H.264 rate-distortion cost function calculation with good coding efficiency as compared with conventional methods of rate-distortion optimization. The proposed algorithm can also be combined with a fast bit-rate estimation algorithm to further speed up the computation with minimal performance degradation.
In Section II we give the inventors' analysis of the fundamental causes which determine distortion. In Section III, an FSSD algorithm for calculating distortion is presented. Section IV describes certain preferred methods and apparatus for performing the methods and Section V presents simulation results generated by a preferred embodiment of the invention.
In this section, we analyze the major cause of the SSD (sum of squared differences) between the original block and the reconstructed block in the rate-distortion cost function. One method of calculating the rate-distortion cost ($J_{RD}$) for video and image encoding schemes, such as MPEG-like or H.264/AVC schemes, can be summarized as:
Compute the predicted block using inter or intra frame prediction: $P$
Use the original block $S$ and the predicted block $P$ to compute the residual (difference) block: $D = S - P$
Discrete Cosine Transform (DCT) the residual block:
$$F = \mathrm{DCT}(D) = T_{DCT}\, D\, T_{DCT}^{T}$$ (EQUATION 9)
where $T_{DCT}$ is the DCT matrix, which is a unitary matrix, and $T_{DCT}^{T}$ is the transposed matrix of $T_{DCT}$.
Quantization of the transformed residual block: $Z = Q(F)$
Entropy code $Z$ to find the number of bits needed to encode the block: $R = \mathrm{VLC}(Z)$
Inverse quantization: $\hat{F} = Q^{-1}(Z)$
Inverse transform the inverse quantized block: $\hat{D} = \mathrm{DCT}^{-1}(\hat{F})$
Compute the reconstructed image block: $C = \hat{D} + P$
Calculate the R-D cost: $J_{RD} = \mathrm{SSD}(S,C) + \lambda \cdot R$
Mathematically, the original block S and reconstructed block C can be expressed as:
$$S = D + P$$ (EQUATION 10)
$$C = \hat{D} + P$$ (EQUATION 11)
where $P$ is the predicted block, $D$ is the residual block and $\hat{D}$ is the reconstructed residual block. Based on this relationship, the spatial-domain $\mathrm{SSD}(S,C)$ can be expressed as the differential-domain $\mathrm{SSD}(D,\hat{D})$:
$$\mathrm{SSD}(S,C) = \|S - C\|_F^2 = \|D + P - \hat{D} - P\|_F^2 = \|D - \hat{D}\|_F^2 = \mathrm{SSD}(D,\hat{D})$$ (EQUATION 12)
That means the spatial-domain $\mathrm{SSD}(S,C)$ is equivalent to the differential-domain $\mathrm{SSD}(D,\hat{D})$. Based on this relationship, we can calculate the rate-distortion cost in the differential domain with $J_{RD} = \mathrm{SSD}(D,\hat{D}) + \lambda \cdot R$, which avoids computing the reconstructed image block ($C = \hat{D} + P$).
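The equivalence in Equation 12 can be checked numerically. The following is a minimal sketch in Python, assuming small hypothetical 2×2 blocks and ignoring clipping; the block values and the helper names `ssd`, `add` and `sub` are illustrative assumptions, not part of the described encoder.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def add(a, b):
    """Element-wise sum of two blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical values chosen only for illustration.
S = [[52, 55], [61, 59]]      # original block
P = [[50, 50], [60, 60]]      # predicted block
D = sub(S, P)                 # residual block D = S - P
D_hat = [[0, 6], [0, 0]]      # stands in for a reconstructed residual
C = add(D_hat, P)             # reconstructed block C = D_hat + P

spatial_ssd = ssd(S, C)           # SSD(S, C)
differential_ssd = ssd(D, D_hat)  # SSD(D, D_hat)
print(spatial_ssd, differential_ssd)
```

Because the predicted block P cancels out of the difference, the two values agree without C ever being needed for the distortion term.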
Before we define the transform-domain SSD, we need to emphasize that the DCT matrix $T_{DCT}$ used in MPEG-like or H.264/AVC video coding is a unitary matrix, which has the property [16] of:
$$\|X\|_F = \|T_{DCT}\, X\, T_{DCT}^{T}\|_F$$ (EQUATION 13)
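where $X$ is a square matrix. As the DCT matrix $T_{DCT}$ is a unitary matrix, it is also possible to perform an inverse transform. According to Equations 12 and 13, we can also express the SSD in the transform domain as:
$$\mathrm{SSD}(S,C) = \|D - \hat{D}\|_F^2 = \|T_{DCT}(D - \hat{D})T_{DCT}^{T}\|_F^2 = \|F - \hat{F}\|_F^2 = \mathrm{SSD}(F,\hat{F})$$ (EQUATION 14)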
where $F$ and $\hat{F}$ are the transformed residual block and the inverse quantized transformed residual block. Equation 14 shows that the SSD is caused by the quantization errors in the DCT transformed residual block $F$, because the quantization is applied to the transformed coefficients of $F$. That is why SAD(S,P) and SATD(S,P) cannot predict SSD(S,C) well. Both SAD(S,P) and SATD(S,P) are determined by the original block and the predicted block, without considering the influence of quantization, which the inventors have determined is the real cause of the SSD. For example, if the quantization step is 18 and all the transform coefficients are multiples of 18, then the inverse quantized coefficients would be the same as the original transform coefficients, without error. By way of example only if:
This demonstrates that even if most of the coefficients of $F$ are large, SSD(S,C) can be zero if $\|F - \hat{F}\|_F^2 = 0$. The example indicates that the elements of $D$ or $F$ are not directly related to SSD(S,C), which is determined by the quantization error. That is why the rate-distortion performance of SAD-based and SATD-based cost functions, which do not take account of the quantization error, is sub-optimal.
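A small numeric illustration of this point is sketched below in Python. The coefficient values are hypothetical (they are assumptions chosen for illustration, not the figures of the original example), and the rounding is taken to be round-to-nearest with ties away from zero.

```python
import math

def quantize(f, step):
    """Quantization level for coefficient f (round to nearest, ties away from zero)."""
    sign = -1 if f < 0 else 1
    return sign * math.floor(abs(f) / step + 0.5)

def dequantize(z, step):
    """Inverse quantization: level times step size."""
    return z * step

def quant_ssd(coeffs, step):
    """SSD between coefficients and their quantized-then-dequantized values."""
    return sum((f - dequantize(quantize(f, step), step)) ** 2 for f in coeffs)

STEP = 18
large_multiples = [180, -54, 36, 0]  # large, but all multiples of 18 (hypothetical)
small_values = [10, -8, 7, 5]        # small, but not multiples of 18 (hypothetical)

sad_large = sum(abs(f) for f in large_multiples)
sad_small = sum(abs(f) for f in small_values)
ssd_large = quant_ssd(large_multiples, STEP)
ssd_small = quant_ssd(small_values, STEP)
print(sad_large, ssd_large)
print(sad_small, ssd_small)
```

In this sketch the set with the larger sum of absolute values has zero quantization distortion, while the set with the smaller sum does not, which is the behaviour that a SAD-style measure cannot capture.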
On the other hand, the transform-domain $\mathrm{SSD}(F,\hat{F})$ can be used to reduce the number of computations required for the rate-distortion cost calculation; i.e. the rate-distortion cost can be calculated using the equation $J_{RD} = \mathrm{SSD}(F,\hat{F}) + \lambda \cdot R$, which allows the inverse DCT transform and image block reconstruction to be avoided. Another advantage of using this transform-domain $\mathrm{SSD}(F,\hat{F})$ is that, ignoring the clipping function applied in the practical computation, there should not be any performance degradation in terms of either coding efficiency or reconstructed image distortion, as $\mathrm{SSD}(F,\hat{F})$ and $\mathrm{SSD}(S,C)$ are theoretically equivalent.
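The theoretical equivalence can be checked with a short Python sketch. It assumes an orthonormal 4×4 DCT-II matrix as the unitary transform and hypothetical residual values; the reconstructed residual here is an arbitrary stand-in rather than the output of a real quantizer, and the helper names are assumptions.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T (rows are basis vectors, T * T^T = I)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                     for m in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """Two-dimensional transform F = T * block * T^T."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))

def ssd(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical residual and reconstructed residual blocks.
D = [[5, -3, 0, 2], [1, 4, -2, 0], [0, 0, 3, -1], [2, -1, 0, 1]]
D_hat = [[4, -2, 0, 2], [0, 4, -2, 0], [0, 0, 2, 0], [2, 0, 0, 0]]

ssd_residual = ssd(D, D_hat)                # SSD(D, D_hat)
ssd_transform = ssd(dct2(D), dct2(D_hat))   # SSD(F, F_hat)
print(ssd_residual, ssd_transform)
```

The two values agree up to floating-point rounding, since a unitary transform preserves the Frobenius norm of the difference block.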
In H.264/AVC, however, a DCT is not used as the image transform. Rather an integer image transform, the Integer Cosine Transform (ICT), is used instead to reduce the computational complexity. This is discussed below.
The practical implementation of the DCT and quantization process in H.264/AVC is a little bit different from Equation 9 and its architecture is shown in
In simple terms the ICT is an integer transform which is computationally easier to calculate than a DCT because unlike a DCT, all of its coefficients are integers. Therefore many division and/or floating point operations are avoided and the ICT may be realized by shift and addition operations only. Prior to quantization, scaling factors are applied to make the ICT equivalent to a DCT.
The present invention is not limited to cosine transforms, and other types of transform can be used instead, for example a Hadamard Transform or a Walsh Transform. It is preferred to use an integer image transform. Some integer image transforms, such as an ICT, need scaling factors in order to make them orthogonal transforms, as being orthogonal is a necessary requirement for most video and image encoding standards.
The relationship between the DCT and the ICT can be expressed as:
$$F = \mathrm{DCT}(D) = (C_f\, D\, C_f^{T}) \oplus Q_{forw} = \mathrm{ICT}(D) \oplus Q_{forw} = F^{*} \oplus Q_{forw}$$ (EQUATION 15)
where $C_f$ is called the ICT core matrix, $Q_{forw}$ is the matrix of scaling factors and $F^{*}$ is the ICT transformed block. The symbol ⊕ denotes element-wise multiplication: each element of $C_f D C_f^{T}$ (or $F^{*}$) is multiplied by the scaling factor in the corresponding position. The forward core and scale transform matrices are defined as:
where $a = 1/2$ and $b = \sqrt{2/5}$. The purpose of carrying out an ICT rather than a DCT is to reduce the computational complexity, because the core transform of the ICT can be realized by shift and addition operations only, without multiplication. The quantization process $Z = Q(F)$ for the transformed residual block $F$ can be expressed as a rounding operation on each coefficient of $F$:
$$z_{ij} = \mathrm{round}(f_{ij}/\Delta)$$ (EQUATION 16)
where $z_{ij}$ and $f_{ij}$ are coefficients of the quantized transform block $Z$ and the unquantized transform block $F$, respectively. $\Delta$ is the quantization step size, which is determined by the QP factor (the quantization parameter). On the other hand, the inverse quantization process $\hat{F} = Q^{-1}(Z)$ can be expressed as an operation on each coefficient of $Z$:
$$\hat{f}_{ij} = z_{ij} \cdot \Delta$$ (EQUATION 17)
where $\hat{f}_{ij}$ are coefficients of the inverse quantized transformed block $\hat{F}$. In the inverse transform, the core matrix and scale matrix are not the same as those in the forward transform:
$$\hat{D} = C_b^{T}\,(\hat{F} \oplus Q_{back})\,C_b$$ (EQUATION 18)
where Cb and Qback are defined as:
In H.264/AVC, the scale transform and quantization are combined together to further reduce computational complexity:
$$z_{ij} = \mathrm{round}(f^{*}_{ij} \cdot q_{ij}/\Delta)$$ (EQUATION 19)
where qij are the scale coefficients of the Qforw matrix and f*ij are coefficients of the ICT transformed block F* using the ICT core matrix.
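The relationship between the integer core transform and its scaling factors can be sketched in Python as follows. The matrices `C_F` and `Q_FORW` below are the forward core and scaling matrices as commonly given for the H.264/AVC 4×4 transform; they are an assumption here, since the matrices themselves are not reproduced in this text, and the residual values are hypothetical.

```python
import math

A = 0.5                    # a = 1/2
B = math.sqrt(2.0 / 5.0)   # b = sqrt(2/5)

# Assumed forward core matrix C_f (integer entries only).
C_F = [[1, 1, 1, 1],
       [2, 1, -1, -2],
       [1, -1, -1, 1],
       [1, -2, 2, -1]]

# Assumed scaling matrix Q_forw, one factor q_ij per coefficient position.
Q_FORW = [[A * A, A * B / 2, A * A, A * B / 2],
          [A * B / 2, B * B / 4, A * B / 2, B * B / 4],
          [A * A, A * B / 2, A * A, A * B / 2],
          [A * B / 2, B * B / 4, A * B / 2, B * B / 4]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def ict_core(d):
    """F* = C_f D C_f^T, computed with integer arithmetic only."""
    return matmul(matmul(C_F, d), transpose(C_F))

def scale(f_star):
    """F = F* (element-wise) Q_forw."""
    return [[f_star[i][j] * Q_FORW[i][j] for j in range(4)] for i in range(4)]

def energy(m):
    """Squared Frobenius norm."""
    return sum(x * x for row in m for x in row)

D = [[5, -3, 0, 2], [1, 4, -2, 0], [0, 0, 3, -1], [2, -1, 0, 1]]  # hypothetical
F_star = ict_core(D)   # integer coefficients f*_ij
F = scale(F_star)      # scaled coefficients f_ij = q_ij * f*_ij
print(energy(D), energy(F))
```

Under these assumptions the scaled block F has the same energy as D (up to rounding), which is what makes the scaled transform orthogonal, while F* itself contains only integers.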
As illustrated in the last section, SSD(S,C) can be determined in the transform domain using $\mathrm{SSD}(F,\hat{F}) = \|F - \hat{F}\|_F^2$, so it is unnecessary to calculate the distortion using the reconstructed block C. The computation of the DCT-transformed $F$ is, however, much more complex than that of the ICT-transformed $F^{*}$, as the scale transform contains fractional coefficients. Thus, if we can build a bridge between $\mathrm{SSD}(F,\hat{F})$ and $F^{*}$, then we can skip the calculation of $F$ and $\hat{F}$. In order to achieve this, we rearrange the quantization and inverse quantization processes of $\hat{F}$ in terms of pre-scaled quantization and inverse quantization of $F^{*}$. The coefficients of $\hat{F}$ can be expressed as:
$$\hat{f}_{ij} = \Delta \cdot \mathrm{round}(f_{ij}/\Delta) = \Delta \cdot \mathrm{round}(q_{ij} f^{*}_{ij}/\Delta) = q_{ij}\,\Delta_{ij} \cdot \mathrm{round}(f^{*}_{ij}/\Delta_{ij})$$ (EQUATION 20)
where $\Delta_{ij} = \Delta/q_{ij}$. $\Delta_{ij}$ is a quantization parameter and represents the scaled quantization step for the ICT-transformed coefficients $f^{*}_{ij}$ of $F^{*}$. The quantization steps are not equal for all coefficients of $F^{*}$, and the values of $\Delta_{ij}$ depend on both $\Delta$ and the coefficients $q_{ij}$ of $Q_{forw}$. The values of $\Delta_{ij}$ differ for different positions as shown in
$$\hat{f}^{*}_{ij} = Q^{-1}(Q(f^{*}_{ij})) = \Delta_{ij} \cdot \mathrm{round}(f^{*}_{ij}/\Delta_{ij})$$ (EQUATION 21)
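Based on this new relationship, we can reformulate $\mathrm{SSD}(F,\hat{F})$ in terms of the ICT transformed coefficients $f^{*}_{ij}$ as:
$$\mathrm{SSD}(F,\hat{F}) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (f_{ij} - \hat{f}_{ij})^2 = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} q_{ij}^{2}\,(f^{*}_{ij} - \hat{f}^{*}_{ij})^2$$ (EQUATION 22)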
This equation indicates that $\mathrm{SSD}(F,\hat{F})$ can be directly related to the quantization error of $f^{*}_{ij}$. Therefore, we can calculate $\mathrm{SSD}(F,\hat{F})$ more easily, as we do not need to obtain $f_{ij}$ and $\hat{f}_{ij}$. However, division and multiplication operations are still required to perform the quantization and inverse quantization of $f^{*}_{ij}$. They are normally performed by the following operations:
In order to make use of the advantage given by Equation 22, we propose an iterative table-lookup method which simplifies the quantization of $f^{*}_{ij}$ and the computation of the SSD using $\hat{f}^{*}_{ij}$. Thus, the computationally intensive division and multiplication operations can be avoided.
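The claim that quantizing the ICT coefficients with the pre-scaled step gives the same level and the same weighted squared error as quantizing the scaled coefficients can be checked with a short Python sketch. The scaling factor, step size and coefficient values are hypothetical, and ties are assumed to round away from zero; the function names are illustrative only.

```python
import math

def round_half_away(x):
    """Round to nearest integer, ties away from zero."""
    sign = -1 if x < 0 else 1
    return sign * math.floor(abs(x) + 0.5)

def quantize_scaled(f_star, q, delta):
    """Conventional path: scale first (f = q * f*), then quantize with step delta."""
    f = q * f_star
    z = round_half_away(f / delta)
    f_hat = z * delta
    return z, (f - f_hat) ** 2

def quantize_prescaled(f_star, q, delta):
    """Pre-scaled path: quantize f* directly with step delta_ij = delta / q."""
    delta_ij = delta / q
    z = round_half_away(f_star / delta_ij)
    f_hat_star = z * delta_ij
    return z, q * q * (f_star - f_hat_star) ** 2

Q = 0.25       # hypothetical scaling factor q_ij
DELTA = 2.0    # hypothetical quantization step
coefficients = [37, -12, 4, -3, 100]   # hypothetical f*_ij values

results = [(quantize_scaled(c, Q, DELTA), quantize_prescaled(c, Q, DELTA))
           for c in coefficients]
for scaled, prescaled in results:
    print(scaled, prescaled)
```

Each pair printed by the loop carries the same level and the same error contribution, so the scaled coefficients never have to be formed explicitly.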
Basically, the quantization process is to find the nearest quantization point as shown in
So, for example, quantization sub-zone 212 is bounded by boundaries 213 and 214 (also referred to as boundary points). Values of $f^{*}_{ij}$ between those boundaries are given a quantized value $z_{ij}$ of −2 and an inverse quantized value of $-2\Delta_{ij}$. The value $f^{*}_{ij}$ shown in the diagram is in a sub-zone having boundaries 238 and 242 and a quantization point 240. Therefore it is given a quantized value of 1 and an inverse quantized value of $\Delta_{ij}$.
Example quantization sub zone boundaries and corresponding inverse quantized values are shown in
The relation between Δij, quantization sub-zones and the inverse quantized values for an ICT transform is shown in the table of
A preferred embodiment of the invention proposes an iterative table-lookup quantization process which finds the sub-zone corresponding to $f^{*}_{ij}$ by searching from the zero point in the positive axis direction, comparing the absolute value of $f^{*}_{ij}$ with the boundary points held in the look up table until $|f^{*}_{ij}|$ is smaller than a certain boundary point.
After this iterative look up process, we obtain the quantization value $z_{ij}$, whose magnitude is equal to the number of boundary points passed during the search. In addition, we can obtain $\hat{f}^{*}_{ij}$ by multiplying $z_{ij}$ by a quantization parameter stored in the look up table, or by referring directly to the look up table if the inverse quantized values are stored in the look up table.
To use this approach, the boundary points of the quantization sub-zones and the inverse quantized values or the quantization parameter are generated in advance during an initial part of the encoding process and their values are stored in memory for table lookup. Thus, no extra computation is required in this quantization process for determining the quantization points and quantization sub-zones.
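A minimal Python sketch of generating such a table in advance is given below. The function names, the `max_level` cap on the number of sub-zones and the single position region named "I" with scaling factor q = a² = 0.25 are assumptions made for illustration.

```python
def build_quant_table(delta_ij, max_level):
    """Boundary points and inverse quantized values for one position region.

    boundaries[k] = (k + 0.5) * delta_ij is the upper edge of sub-zone k;
    inverse_values[k] = k * delta_ij is the quantization point of sub-zone k.
    """
    boundaries = [(k + 0.5) * delta_ij for k in range(max_level + 1)]
    inverse_values = [k * delta_ij for k in range(max_level + 1)]
    return boundaries, inverse_values

def build_tables(delta, scale_factors, max_level):
    """One table per position region, using the scaled step delta / q."""
    return {name: build_quant_table(delta / q, max_level)
            for name, q in scale_factors.items()}

# Hypothetical set-up: step size 2.0 and one region with q = a^2 = 0.25.
tables = build_tables(2.0, {"I": 0.25}, max_level=2)
boundaries, inverse_values = tables["I"]
print(boundaries)       # boundary points of sub-zones 0, 1, 2
print(inverse_values)   # inverse quantized values of sub-zones 0, 1, 2
```

Because the table depends only on the step size and the scaling factors, it can be built once and reused for every block encoded with the same quantization setting.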
The overall rate-distortion cost computation process using this proposed iterative table-lookup quantization for FSSD computation can be summarized as:
Step 1: Compute the predicted block using inter or intra frame prediction: P
Step 2: Compute the residual (difference) block: D=S−P
Step 3: ICT transform the residual block: $F^{*} = \mathrm{ICT}(D) = C_f\, D\, C_f^{T}$
Step 4: Set SSD=0 and, for i=0 to N−1 and j=0 to N−1, determine the quantized coefficients $z_{ij}$ and the SSD by the following iterative table-lookup quantization process:
Step I: Set k=0 and, if $f^{*}_{ij} < 0$, set sign=−1; otherwise set sign=1.
Step II: If $|f^{*}_{ij}| \geq (k+0.5)\Delta_{ij}$, then set k=k+1 and go to Step II; otherwise go to Step III.
Step III: Set $z_{ij} = \mathrm{sign} \cdot k$ and $\hat{f}^{*}_{ij} = \mathrm{sign} \cdot k \cdot \Delta_{ij}$.
Step IV: Set $\mathrm{SSD} = \mathrm{SSD} + q_{ij}^{2}\,(f^{*}_{ij} - \hat{f}^{*}_{ij})^2$. If this is not the last $f^{*}_{ij}$ coefficient, go to Step I for the next coefficient.
Step 5: Entropy code $Z$ to find the number of bits needed to encode the block: $R = \mathrm{VLC}(Z)$
Step 6: Calculate the R-D cost: $J_{RD} = \mathrm{SSD} + \lambda \cdot R$
In the above procedures, we assume that the boundaries of quantization sub-zones (having values (k+0.5)Δij for k=0, 1, 2, . . . ), the scaling factor or its square qij2 and the quantization parameter Δij for each position region of (i, j) are loaded in the encoder and stored in a look up table during the initial process. The above method has the advantage that arithmetic operations are avoided and only a simple comparison operation between |f*ij| and boundary points is required. Therefore, it is very suitable for hardware implementation.
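Steps I to IV of the procedure above can be sketched in Python as follows. This is an illustrative software rendering under stated assumptions, not the hardware implementation: the function names, the `max_level` cap on the table length, the uniform scaling factor and the 2×2 block of coefficients are all hypothetical.

```python
def make_boundaries(delta_ij, max_level):
    """Precomputed boundary points (k + 0.5) * delta_ij for k = 0 .. max_level - 1."""
    return [(k + 0.5) * delta_ij for k in range(max_level)]

def fssd_block(f_star, q, delta, max_level=64):
    """Iterative table-lookup quantization and weighted SSD for one block.

    f_star: N x N ICT coefficients; q: N x N scaling factors; delta: step size.
    Returns (Z, SSD). Levels saturate at max_level (an assumption of this sketch).
    """
    n = len(f_star)
    # Tables prepared in advance; in practice once per quantization setting.
    steps = [[delta / q[i][j] for j in range(n)] for i in range(n)]
    bounds = [[make_boundaries(steps[i][j], max_level) for j in range(n)]
              for i in range(n)]
    z = [[0] * n for _ in range(n)]
    ssd = 0.0
    for i in range(n):
        for j in range(n):
            f = f_star[i][j]
            sign = -1 if f < 0 else 1                      # Step I
            k = 0
            while k < max_level and abs(f) >= bounds[i][j][k]:
                k += 1                                     # Step II: comparison only
            z[i][j] = sign * k                             # Step III
            f_hat_star = sign * k * steps[i][j]
            ssd += q[i][j] ** 2 * (f - f_hat_star) ** 2    # Step IV
    return z, ssd

# Hypothetical 2 x 2 example with a uniform scaling factor.
F_STAR = [[37, -12], [4, -3]]
Q = [[0.25, 0.25], [0.25, 0.25]]
Z, SSD = fssd_block(F_STAR, Q, delta=2.0)
print(Z, SSD)
```

The search loop itself uses only comparisons against stored boundary points; the multiplication in Step III could likewise be replaced by reading a stored inverse quantized value, as the text describes.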
An overview of the conventional SSD(S,C) calculation is shown in
In the conventional method in
In the preferred embodiment of the present invention shown in
Thus, in step 400 a predicted block P is predicted by intra or inter prediction from the original source block S. In step 410 the predicted block P is subtracted from the source block S to obtain a residual block D. In step 420 an integer image transform, for example an ICT, is performed on the residual block D to obtain an integer transformed residual block $F^{*}$ having coefficients $f^{*}_{ij}$. In step 440 a look up table is referred to iteratively to find the corresponding quantized coefficients $z_{ij}$ which make up the quantized transformed residual block Z. The values $\hat{f}^{*}_{ij}$ which make up a matrix $\hat{F}^{*}$ are then found either directly from the look up table or by multiplying the corresponding quantized coefficients $z_{ij}$ by quantization point values found in the look up table.
Thus compared to the conventional method, many processor intensive processes are avoided. Furthermore, as mentioned above, use of a look up table enables the quantization to be carried out by a simple comparison operation. This is computationally efficient as addition, subtraction, division or multiplication operations may be avoided or at least minimized. Some of the benefit is obtained by carrying out the SSD in the residual transform domain (which avoids the inverse transform and reconstruction operations), however it is thought the most significant reduction in processing time comes from use of the look up table to carry out the quantization and inverse quantization operations (the quantization operation is a pre-scaled quantization operation if the transform is an ICT).
At the beginning of the encoding process, quantization table generator 550 generates a quantization table. The quantization table contains the boundary points of the quantization sub-zones for each (i,j) position region and the quantization point with which each sub-zone is associated. For example, boundary points $\Delta/(2a^2)$, $3\Delta/(2a^2)$ and $5\Delta/(2a^2)$ and corresponding quantization points 0, 1, 2 respectively; see position region I in
The particular position regions, quantization points and boundaries in the tables of
Operation of the circuit 520 will now be discussed in detail, it being understood that the other circuits 530, 540 etc operate in the same way.
The RAM 522 contains the look up table referred to above and a pointer that points to the current boundary point. The current boundary point is loaded in a register 523 and compared with $|f^{*}_{ij}|$ after the sign of $f^{*}_{ij}$ is extracted by a sign selector device 521. If the comparator 524 judges that $|f^{*}_{ij}|$ is larger than the current boundary point, then a ‘Loop’ signal is issued which tells counter 525 to increment its count by one and causes the pointer in RAM 522 to shift forward and point to the next boundary point in the look up table. This process continues until the comparator 524 finds that $|f^{*}_{ij}|$ is smaller than the current boundary point. The comparator 524 then generates an ‘Out’ signal which resets the pointer to the initial position and causes the counter 525 to output its accumulated count (which is the quantized value $z_{ij}$). The ‘Out’ signal also causes the register to output the current inverse quantized value, $\hat{f}^{*}_{ij}$ (which may be stored in the look up table or calculated at that moment, for example by multiplying the quantized value by a quantization parameter stored in the look up table). From the structure in
A preferred embodiment of the present invention will now be described with reference to
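In step 680 the sum of squared differences (SSD) between $f^{*}_{ij}$ and $\hat{f}^{*}_{ij}$ is computed. Preferably it is a weighted SSD, for example given by the equation
$$\mathrm{SSD} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} q_{ij}^{2}\,(f^{*}_{ij} - \hat{f}^{*}_{ij})^2$$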
Steps 640, 650, 660 and 670 may be carried out in parallel for each of the i,j values. In step 690 the rate distortion is calculated from the SSD, which was computed in step 680, and the number of bits required to entropy encode the block, which was noted in step 660. The rate distortion may for example be calculated according to the equation JRD=SSD+λ·R; where JRD is the rate distortion, SSD is the sum of squared differences calculated in step 680, λ is a Lagrange multiplier and R is the number of bits required to entropy encode the block.
While
As mentioned above the area for which the rate-distortion is calculated may contain one or more blocks. The area may contain a different number of blocks according to the encoding mode used (i.e. the encoding mode determines the number and size of the blocks which the area is split into). However, as the rate-distortion for the entire area is computed (e.g. as the sum of the rate-distortion cost of all the blocks in the area), the rate-distortion of different encoding modes may be compared even if the different encoding modes produce different numbers and sizes of blocks.
Alternatively the area may just contain one block (e.g. one macroblock) in all the encoding modes. In that case the rate-distortion of the area according to a particular encoding mode is simply the rate-distortion of that block.
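The comparison of encoding modes over an area can be sketched in Python as follows. The mode names, the per-block (SSD, bits) figures and the Lagrange multiplier values are hypothetical and serve only to show how per-block costs are summed and compared.

```python
def rd_cost(ssd, bits, lam):
    """Rate-distortion cost J = SSD + lambda * R for one block."""
    return ssd + lam * bits

def area_cost(blocks, lam):
    """Cost of an area as the sum of the costs of its blocks.

    blocks: list of (ssd, bits) pairs, one per block in the area.
    """
    return sum(rd_cost(ssd, bits, lam) for ssd, bits in blocks)

def best_mode(candidates, lam):
    """Return the mode whose area cost is lowest.

    candidates: dict mapping a mode name to its list of (ssd, bits) pairs.
    """
    return min(candidates, key=lambda mode: area_cost(candidates[mode], lam))

# Hypothetical figures: one mode codes the area as a single block,
# the other splits it into four smaller blocks.
candidates = {
    "16x16": [(400.0, 30)],
    "8x8": [(60.0, 25), (50.0, 22), (70.0, 28), (40.0, 20)],
}
choice_high_lambda = best_mode(candidates, lam=4.0)
choice_low_lambda = best_mode(candidates, lam=1.0)
print(choice_high_lambda, choice_low_lambda)
```

Summing over the whole area is what makes modes with different numbers and sizes of blocks directly comparable; the multiplier then sets how strongly the bit count is weighed against the distortion.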
Source block defining module 900 receives or defines a source block S in the image or video data. Prediction module 910 predicts a predicted block P based on the source block S using inter or intra prediction according to one of a plurality of possible block encoding modes. Residual block defining module 920 computes a residual block D by subtracting the predicted block P from the source block S. The residual block computed by the module 920 is then output to an integer image transform module 930 which performs an integer image transform (e.g. an ICT) on the residual block D in order to obtain a transformed residual block $F^{*}$. A quantizing and inverse quantizing module 940 iteratively refers to a look up table in order to perform pre-scaled quantizing of the coefficients of the transformed residual block $F^{*}$, which produces a plurality of quantized coefficients $z_{ij}$. The module 940 also inverse quantizes the coefficients $z_{ij}$ to obtain inverse quantized coefficients $\hat{f}^{*}_{ij}$. The coefficients $z_{ij}$ are output to an entropy encoding module 950 which entropy encodes the coefficients. Meanwhile a difference computing module 960 computes a weighted sum of squared differences (SSD) between $f^{*}_{ij}$ and $\hat{f}^{*}_{ij}$, using the coefficients $f^{*}_{ij}$ of the block $F^{*}$ computed by module 930 and the coefficients $\hat{f}^{*}_{ij}$ computed by module 940. A rate-distortion calculating module then calculates the rate distortion on the basis of the sum of squared differences output by module 960 and the number of bits which module 950 required to entropy encode the coefficients $z_{ij}$.
An embodiment of the invention using the proposed iterative table-lookup quantization processing and FSSD computation was tested using the first 100 frames from four video sequences (Akiyo, Foreman, Stefan and Container), all in QCIF format (176×144). They represent different kinds of video sequence: Akiyo (slow motion), Foreman and Container (medium motion) and Stefan (high motion). The experiments were carried out in the JVT JM 8.3 encoder. Test parameters are listed below:
Test condition:
The percentage of reduced time of calculating distortion is defined as:
where TorgTOT is the computation time of the original H.264/AVC encoder using conventional spatial-domain SSD(S,C) algorithm; while TproposedTOT is the computation time of a H.264/AVC encoder using the proposed FSSD computation method according to an embodiment of the present invention.
The tables in
The reduction in encoding time for the low motion sequence (Akiyo) was more than that for the high motion sequence (Stefan). A possible reason for this is that since the prediction accuracy of low motion sequence is better than that of high motion sequence, its residue is smaller and so less time is spent on the iterative table-lookup operation.
Since the proposed FSSD computing method improves the computational efficiency of the distortion measure with very little loss of rate-distortion performance, it can be combined with different types of H.264/AVC fast algorithms, such as fast inter/intra mode selection algorithms and rate estimation algorithms. Here, the FSSD algorithm is combined with a conventional rate estimation method to further reduce the computational complexity. The number of bits is estimated as:
Total_bits=α·total_zeros+β·total_coeff+SAD (EQUATION 26)
where total_zeros and total_coeff represent the number of zeros and the number of non-zero coefficients respectively after quantization. SAD is the sum of the absolute values of the quantized transform coefficients. The empirical values of α and β are 1 and 3. Experimental results are listed in the tables in