This application relates generally to a video encoder/decoder (codec) and, more specifically, to a method and an apparatus for applying secondary transforms on enhancement-layer residuals.
Most existing image- and video-coding standards employ block-based transform coding as a tool to efficiently compress input image and video signals. This includes standards such as JPEG, H.264/AVC, VC-1, and the next-generation video codec standard HEVC (High Efficiency Video Coding). Pixel-domain data is transformed to frequency-domain data using a transform process on a block-by-block basis. For typical images, most of the energy is concentrated in the low-frequency transform coefficients. Accordingly, following the transform, a larger quantization step size can be used for the higher-frequency transform coefficients, attaining better compression. Ideally, the transform applied to each image block fully de-correlates the transform coefficients.
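As a rough illustration of this energy compaction (not taken from any of the above standards), the following NumPy sketch applies a two-dimensional DCT to a smooth 8×8 block and measures how much of the energy lands in the low-frequency corner; the block contents and sizes are arbitrary examples.

```python
# Illustrative NumPy sketch (not taken from any standard): energy compaction of the 2D DCT
# on a smooth 8x8 pixel block. Block contents and sizes are arbitrary examples.
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT Type 2 matrix; row k is the k-th basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

N = 8
D = dct2_matrix(N)
x, y = np.meshgrid(np.arange(N), np.arange(N))
block = 100.0 + 5.0 * x + 3.0 * y            # smooth block, typical of natural image content
coeffs = D @ block @ D.T                     # separable 2D DCT
low_energy = np.sum(coeffs[:4, :4] ** 2)     # energy in the 4x4 low-frequency (upper-left) corner
print(low_energy / np.sum(coeffs ** 2))      # close to 1.0 for smooth content
```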
This disclosure provides a method and an apparatus for applying secondary transforms on enhancement-layer residuals.
A method includes receiving a video bitstream and a flag and interpreting the flag to determine a transform that was used at an encoder. The method also includes, upon a determination that the transform that was used at the encoder includes a secondary transform, applying an inverse secondary transform to the received video bitstream, where the inverse secondary transform corresponds to the secondary transform used at the encoder. The method further includes applying an inverse discrete cosine transform (DCT) to the video bitstream after applying the inverse secondary transform.
A decoder includes processing circuitry configured to receive a video bitstream and a flag and to interpret the flag to determine a transform that was used at an encoder. The processing circuitry is also configured to, upon a determination that the transform that was used at the encoder includes a secondary transform, apply an inverse secondary transform to the received video bitstream, where the inverse secondary transform corresponds to the secondary transform used at the encoder. The processing circuitry is further configured to apply an inverse DCT to the video bitstream after applying the inverse secondary transform.
A non-transitory computer readable medium embodying a computer program is provided. The computer program includes computer readable program code for receiving a video bitstream and a flag and interpreting the flag to determine a transform that was used at an encoder. The computer program also includes computer readable program code for, upon a determination that the transform that was used at the encoder includes a secondary transform, applying an inverse secondary transform to the received video bitstream, where the inverse secondary transform corresponds to the secondary transform used at the encoder. The computer program further includes computer readable program code for applying an inverse DCT to the video bitstream after applying the inverse secondary transform.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The term “controller” means any device, system or part thereof that controls at least one operation. Such a controller may be implemented in hardware or a combination of hardware and software and/or firmware. The functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. The phrase “at least one of,” when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, “at least one of: A, B, and C” includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
Definitions for other certain words and phrases are provided throughout this patent document. Those of ordinary skill in the art should understand that in many if not most instances, such definitions apply to prior as well as future uses of such defined words and phrases.
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
As shown in
The quantized transform coefficients can be restored to residual values by passing through an inverse quantizer 130 and an inverse transform unit 132. The restored residual values can be post-processed by passing through a de-blocking unit 135 and a sample adaptive offset unit 140 and output as the reference frame 145. The quantized transform coefficients can be output as a bitstream 127 by passing through an entropy encoder 125.
As shown in
Each functional aspect of the encoder 100 and decoder 150 will now be described.
Intra-Prediction (units 111 and 172): Intra-prediction utilizes spatial correlation in each frame to reduce the amount of transmission data necessary to represent a picture. An intra frame is essentially the first frame to be encoded, but with a reduced amount of compression. Additionally, there can be some intra blocks in an inter frame. Intra-prediction is associated with making predictions within a frame, whereas inter-prediction relates to making predictions between frames.
Motion Estimation (unit 112): A fundamental concept in video compression is to store only incremental changes between frames when inter-prediction is performed. The differences between blocks in two frames can be extracted by a motion estimation tool. Here, a predicted block is reduced to a set of motion vectors and inter-prediction residues.
Motion Compensation (units 115 and 175): Motion compensation can be used to decode an image that is encoded by motion estimation. This reconstruction of an image is performed from received motion vectors and a block in a reference frame.
Transform/Inverse Transform (units 120, 132, and 170): A transform unit can be used to compress an image in inter-frames or intra-frames. One commonly used transform is the Discrete Cosine Transform (DCT). Another transform is the Discrete Sine Transform (DST). Optimally selecting between DST and DCT based on intra-prediction modes can yield substantial compression gains.
Quantization/Inverse Quantization (units 122, 130, and 165): A quantization stage can reduce the amount of information by dividing each transform coefficient by a particular number to reduce the quantity of possible values that each transform coefficient could have. Because this makes the values fall into a narrower range, entropy coding can express the values more compactly.
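A minimal numerical sketch of this stage follows; the step sizes are hypothetical values chosen purely for illustration, not values from any standard.

```python
# Minimal sketch with hypothetical step sizes (not values from any standard): coarser
# quantization of higher-frequency coefficients narrows the range of values to entropy-code.
import numpy as np

coeffs = np.array([220.0, -35.0, 12.0, 4.0, -3.0, 2.0, -1.0, 1.0])   # example 1D coefficients
steps  = np.array([8.0, 8.0, 12.0, 12.0, 16.0, 16.0, 24.0, 24.0])    # larger steps at high frequency

levels = np.round(coeffs / steps)    # quantized levels sent to the entropy coder
recon  = levels * steps              # inverse quantization at the decoder
print(levels)                        # small integers, cheap to entropy-code
print(recon)                         # approximation of the original coefficients
```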
De-blocking and Sample adaptive offset units (units 135, 140, and 182): De-blocking can remove encoding artifacts due to block-by-block coding of an image. A de-blocking filter acts on boundaries of image blocks and removes blocking artifacts. A sample adaptive offset unit can minimize ringing artifacts.
In
As shown in
Following the prediction, the transform unit 120 can apply a transform in both the horizontal and vertical directions. The transform (along horizontal and vertical directions) can be either DCT or DST depending on the intra-prediction mode. The transform is followed by the quantizer 122, which reduces the amount of information by dividing each transform coefficient by a particular number to reduce the quantity of possible values that a transform coefficient could have. Because quantization makes the values fall into a narrower range, this allows entropy coding to express the values more compactly and aids in compression.
Scalable video coding is an important component of video processing because it provides scalability of video in various fashions, such as spatial, temporal, and SNR scalability.
As shown in
The BL bitstream can be decoded at devices with relatively low processing power (such as mobile phones or tablets) or when network conditions are poor and only BL information is available. When the network quality is good or at devices with relatively greater processing power (such as laptops or televisions), the EL bitstream is also decoded and combined with the decoded BL to produce a higher fidelity reconstruction.
Currently, the Joint Collaborative Team on Video Coding (JCTVC) is standardizing scalable extensions for HEVC (High Efficiency Video Coding) (S-HEVC). For spatial scalability in S-HEVC, a prediction mode known as an Intra_BL mode is used for inter-layer prediction of the enhancement layer from the base layer. Specifically, in the Intra_BL mode, the base layer is up-sampled and used as the prediction for the current block at the enhancement layer. The Intra_BL mode can be useful when traditional temporal coding (inter) or spatial coding (intra) does not provide a low-energy residue. Such a scenario can occur when there is a scene or lighting change or when a new object enters a video sequence. Here, some information about the new object can be obtained from the co-located base layer block but is not present in the temporal (inter) or spatial (intra) domains.
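The following sketch illustrates the Intra_BL idea in simplified form; the nearest-neighbor up-sampling and the block contents are placeholders, since S-HEVC specifies its own up-sampling filters.

```python
# Simplified Intra_BL sketch (illustrative only): the enhancement-layer block is predicted from
# an up-sampled base-layer block. Nearest-neighbor up-sampling is a stand-in for the S-HEVC
# up-sampling filter, and the block contents are made-up example data.
import numpy as np

def upsample_2x_nearest(block):
    """2x spatial up-sampling by pixel repetition (placeholder for the real filter)."""
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)

bl_block = np.arange(16, dtype=float).reshape(4, 4)      # reconstructed 4x4 base-layer block
el_block = np.arange(64, dtype=float).reshape(8, 8)      # co-located 8x8 enhancement-layer block

prediction = upsample_2x_nearest(bl_block)               # Intra_BL prediction
residual = el_block - prediction                         # Intra_BL residue to be transformed/quantized
print(residual)
```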
In the S-HEVC Test Model, for the Luma component of the Intra_BL prediction residue, the DCT Type 2 transform is applied at block sizes 8, 16, and 32. At size 4, the DST Type 7 transform may be used: the coding efficiencies of DST Type 7 and DCT are almost the same in the Scalable Test Model (SHM) 1.0, and DST is already used as the transform for Intra 4×4 Luma Transform Units in the base layer. For the Chroma component of the Intra_BL residue, the DCT is used across all block sizes. It is noted that, unless otherwise specified, the use of DCT herein refers to DCT Type 2.
Research has shown that transforms other than DCT Type 2 can provide substantial gains when applied to the Intra_BL block residue. For example, in one test, at sizes 4 to 32, the DCT Type 3 transform and DST Type 3 transform were used in addition to the DCT Type 2 transform. At the encoder, a Rate-Distortion (R-D) search was performed, and one of the following transforms was chosen: DCT Type 2, DCT Type 3, or DST Type 3. The transform choice can be signaled to the decoder by a flag (such as a flag that can take one of three values, one for each of the three transforms). At the decoder, the flag can be parsed, and the corresponding inverse transform can be used.
However, the scheme described above requires two additional transform cores at each of sizes 4, 8, 16 and 32. This means eight additional new transform cores are required (two transforms for each of four sizes). Furthermore, additional transform cores (especially larger ones, such as at size 32×32) are extremely expensive to implement in hardware. Thus, to avoid large alternate transforms for inter-prediction residues, a low-complexity transform method that can be applied efficiently on the Intra_BL residues is needed.
To overcome the shortcomings described above and to improve the coding efficiency of SHM (which is the test model for scalable extensions of HEVC), embodiments of this disclosure provide secondary transforms for use with enhancement-layer residuals. The disclosed embodiments also provide fast factorizations for the secondary transforms. In accordance with the disclosed embodiments, a secondary transform can be applied after DCT for Intra_BL and Inter residues. This overcomes the limitations described above by improving inter-layer coding efficiency without significant implementation costs. The secondary transforms disclosed here can be used in the SHM for standardization of the S-HEVC video codec in order to improve compression efficiency.
Low Complexity Secondary Transform
To improve the compression efficiency of an inter-residue block, primary alternate transforms other than a conventional DCT can be applied at block sizes 8×8, 16×16, and 32×32. However, these primary transforms have the same size as the block itself. In general, such alternate transforms at larger block sizes, such as 32×32, may provide only marginal gains that may not justify the enormous cost of supporting an additional 32×32 transform in hardware.
In general, most of the energy of the DCT coefficients of the DCT transformed block 300 is concentrated among the low-frequency coefficients in an upper-left block 301. Accordingly, it may be sufficient to perform operations only on a small fraction of the DCT output, such as only on the upper-left block 301 (which could represent a 4×4 block or an 8×8 block). These operations can be performed using a secondary transform of size 4×4 or 8×8 on the upper-left block 301. Moreover, the same secondary transform derived for a block size such as 8×8 can be applied at higher block sizes (such as 16×16 or 32×32). This re-utilization at higher block sizes is one advantage of embodiments of this disclosure.
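A short sketch of this structure follows; the K×K secondary matrix used here is a placeholder orthogonal rotation rather than one of the matrices derived later in this disclosure, and the block contents are random example data, purely to show how the same small core is reused across block sizes.

```python
# Sketch of a K x K secondary transform applied only to the upper-left (low-frequency) corner of
# an N x N DCT output. The secondary matrix here is a placeholder orthogonal rotation, not one of
# the matrices derived in this disclosure; the point is that the same small core is reused at
# every block size.
import numpy as np

def dct2_matrix(n):
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def apply_secondary(dct_block, sec):
    """Transform only the upper-left K x K coefficients; leave the rest unchanged."""
    k = sec.shape[0]
    out = dct_block.copy()
    out[:k, :k] = sec @ dct_block[:k, :k] @ sec.T    # separable vertical/horizontal application
    return out

K = 4
theta = np.pi / 16
sec = np.eye(K)
sec[:2, :2] = [[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]]      # placeholder orthogonal secondary matrix

rng = np.random.default_rng(0)
for N in (8, 16, 32):                                # the same 4x4 core is reused at all sizes
    D = dct2_matrix(N)
    coeffs = D @ rng.standard_normal((N, N)) @ D.T   # DCT of an example residual block
    coeffs2 = apply_secondary(coeffs, sec)
```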
Furthermore, the secondary transforms according to this disclosure can be reused across various block sizes, whereas a primary alternate transform cannot be reused in this way. For example, the same 8×8 matrix can be reused as a secondary matrix for the 8×8 lowest-frequency band following the 16×16 and 32×32 DCT. Advantageously, no additional storage is required at larger block sizes (such as 16×16 and higher) for storing any of the new alternate or secondary transforms.
Boundary-Dependent Secondary Transforms for Inter and Intra BL Residue in Enhancement Layer
In some embodiments, an existing secondary transform is extended to be applied on Intra_BL residue. For example, consider
Applying Secondary Transform Via Multiple “Flips”
In some embodiments, instead of using a “flipped” DST, the data itself can be flipped. Based on this reasoning, a secondary transform can be applied as follows at larger block sizes for TU0 400 (such as 32×32), instead of applying a 32×32 DCT.
At the encoder, the input data is first flipped. For example, for an N-point input vector x with entries xi (i = 1 . . . N), define a vector y with elements yi = xN+1−i. The DCT of y is determined, and the output is denoted as vector z. A secondary transform is applied on the first K elements of z. Let the output be denoted as w, where the remaining N−K high-frequency elements of z, on which the secondary transform was not applied, are copied unchanged into w.
Similarly, at the decoder, the input to the transform module is defined as vector v, which is a quantized version of w. The following operations can be performed to take the inverse transform. The inverse secondary transform is applied on the first K elements of v. Let the output be denoted as b, where the N−K high-frequency coefficients are identical to those of v. The inverse DCT of b is determined, and the output is denoted as d. The data in d is flipped, such as by defining f with elements fi = dN+1−i. As a result, f represents the reconstructed values for the pixels in x.
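The following one-dimensional sketch walks through the encoder and decoder steps above with quantization omitted; the orthogonal secondary matrix M_sec is a random placeholder (the actual secondary matrices are derived later in this disclosure), so the sketch only demonstrates the flow and the perfect-reconstruction property.

```python
# One-dimensional sketch of the "flip" pipeline above with quantization omitted. The orthogonal
# secondary matrix M_sec is a random placeholder (the actual matrices are derived later in this
# disclosure), so this only demonstrates the flow and the perfect-reconstruction property.
import numpy as np

def dct2_matrix(n):
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

N, K = 32, 8
D = dct2_matrix(N)                                   # forward DCT: z = D @ y; inverse: y = D.T @ z
rng = np.random.default_rng(1)
M_sec, _ = np.linalg.qr(rng.standard_normal((K, K))) # placeholder orthogonal secondary transform

x = rng.standard_normal(N)                           # TU0 residue samples

# Encoder: flip the data, take the DCT, apply the secondary transform to the first K elements.
y = x[::-1]
z = D @ y
w = z.copy()
w[:K] = M_sec @ z[:K]

# Decoder: inverse secondary transform, inverse DCT, then flip back.
v = w                                                # quantization/dequantization omitted
b = v.copy()
b[:K] = M_sec.T @ v[:K]                              # inverse of an orthogonal matrix is its transpose
d = D.T @ b
f = d[::-1]
assert np.allclose(f, x)                             # f reconstructs the original pixels x
```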
For TU1 401, the flipping operations may not be required, and a simple DCT followed by a secondary transform can be taken at the encoder. At the decoder, the process takes the inverse secondary transform followed by the inverse DCT.
It is noted that the flipping operation at the encoder and decoder for TU0 400 can be expensive in hardware. Thus, the secondary transform can be adapted for these “flip” operations in order to avoid the flipping of data. In one example, assume the N-point input vector x with entries x1 to xN in TU0 400 needs to be transformed appropriately. Let the two-dimensional N×N DCT matrix be denoted as C with elements as follows:
C(i,j), where 1 ≤ i, j ≤ N.
As an example, a normalized (by 128√2) 8×8 DCT is as follows:
with basis vectors along the columns. Note that in DCT, C(i,j) = (−1)^(j−1)*C(N+1−i,j). In other words, the odd (first, third, . . . ) basis vectors of DCT are symmetric about the half-way mark, while the even (second, fourth, . . . ) basis vectors are anti-symmetric (symmetric with opposite signs). This is one property of DCT that can be utilized to appropriately “modulate” the secondary transform.
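This symmetry is easy to verify numerically; the following sketch restates it with 0-based indices, C[i, j] = (−1)^j·C[N−1−i, j], and checks it for an 8×8 DCT Type 2 matrix built with the usual orthonormal scaling (an assumption made for convenience; the symmetry does not depend on the scaling).

```python
# Numerical check of the symmetry above, restated with 0-based indices:
# C[i, j] = (-1)**j * C[N-1-i, j]. The orthonormal scaling used here is an assumption made for
# convenience (the symmetry does not depend on the scaling).
import numpy as np

N = 8
i = np.arange(N).reshape(-1, 1)     # sample index (rows)
j = np.arange(N).reshape(1, -1)     # basis-vector index (columns)
C = np.sqrt(2.0 / N) * np.cos(np.pi * j * (2 * i + 1) / (2 * N))
C[:, 0] /= np.sqrt(2.0)             # DCT Type 2 with basis vectors along the columns

# Odd-numbered basis vectors are symmetric about the half-way mark; even-numbered ones flip sign.
assert np.allclose(C, ((-1.0) ** j) * C[::-1, :])
```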
Extensions for Vertical Secondary Transform
For TU0 400 in
Rate-Distortion Based Secondary Transforms for Intra BL Residue
Research has shown that the primary alternate transforms DCT Type 3 and DST Type 3 can be used instead of DCT Type 2. One of the three possible transforms (DCT Type 2, DCT Type 3, and DST Type 3) can be selected via a Rate-Distortion search at the encoder, and the selection can be signaled to the decoder via a flag. At the decoder, the flag can be parsed, and the corresponding inverse transform can be used. However, as explained above, to avoid the significant computational cost, a low-complexity secondary transform for Intra_BL residue can be derived from DCT Type 3 and DST Type 3. This secondary transform achieves similar gains, but at lower complexity.
A description of how a low-complexity secondary transform can be used for Intra_BL residues is now provided. While the derivation and usage of secondary transforms of size K×K (K = 4 or 8) is shown, this disclosure is not limited thereto, and the derivation and usage can be extended to other block sizes.
Consider a secondary transform of size 4×4. At size 4×4, it is assumed that DCT Type 2 is used as the primary transform. Corresponding to DCT Type 3, a secondary transform is derived as follows. Let C denote the DCT Type 2 transform. DCT Type 3, which is simply the inverse (or transpose) of DCT Type 2, is given by C^T. Note that the normalization factors (such as √2) in the definition of the DCTs are ignored, which is a common practice in the art. Also let S denote the DST Type 3 transform.
For an alternate primary transform A and an equivalent secondary transform M, C*M = A. That is, the DCT Type 2 transform followed by M should be mathematically equivalent to A. Therefore, C^T*C*M = C^T*A, or M = C^T*A, since C^T*C = I for the orthogonal DCT matrix.
If the alternate transform is DCT Type 3 (that is, C^T), then M = C^T*A = C^T*C^T. For DST Type 3, M would be C^T*S.
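The following sketch checks this relationship numerically for the DCT Type 3 case, using the orthonormal analysis convention (the transform of x is C @ x); because of that convention choice, the printed integer matrix may appear as the transpose of the matrices listed below, so treat it as an illustration of the derivation rather than a reproduction of the exact tables.

```python
# Sketch of the derivation for the DCT Type 3 case. Here C is the orthonormal DCT Type 2 in the
# analysis convention (coefficients = C @ x) and A = C.T is DCT Type 3, so M = A @ C.T = C.T @ C.T
# maps DCT Type 2 coefficients to DCT Type 3 coefficients. Because of this convention choice, the
# printed integer matrix may be the transpose of the matrix listed in this document.
import numpy as np

def dct2_matrix(n):
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

N = 4
C = dct2_matrix(N)            # DCT Type 2
A = C.T                       # DCT Type 3 (inverse/transpose of DCT Type 2)
M = A @ C.T                   # secondary transform: M @ (C @ x) == A @ x for all x

x = np.random.default_rng(2).standard_normal(N)
assert np.allclose(M @ (C @ x), A @ x)

print(np.round(128 * M).astype(int))   # integer secondary matrix after rounding and a 7-bit shift
```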
Derivation for Secondary Transform Corresponding to DCT Type 3
As an example, at size 4×4, DCT Type 2 is given by (basis vectors along columns):
The secondary transform corresponding to DCT Type 3 (M) is given by:
After rounding and shifting by seven bits, the following is determined:
The above matrix MC,4 has basis vectors along columns. To get the basis vectors along rows, MC,4 is transposed to obtain:
For a secondary transform of size 8×8, start with a DCT Type 2 transform given by (basis vectors along columns):
For a secondary matrix equivalent to DCT Type 3, the following is obtained:
Rounding and shifting by seven bits yields:
Note that MC,4 and MC,8 are low-complexity secondary transforms that provide similar gains when applied to Intra_BL residue, but at considerably lower complexity, as compared to applying DCT Type 3 as an alternate primary transform.
Derivation of Secondary Transform Corresponding to DST Type 3
The DCT Type 2 matrix at size four is:
The DST Type 3 matrix (with basis vectors along the columns) at size 4×4 is given by:
When the DST Type 3 matrix is made into a secondary transform MS,4, the following is obtained:
Rounding and shifting by seven bits yields:
where the basis vectors are along the columns. Transposing the matrix to have basis vectors along the rows gives the following:
For a secondary transform of size 8×8, a DCT Type 2 transform is given by:
A DST Type 3 transform at size 8×8 is given by:
The secondary transform M is given by:
Rounding and shifting the secondary transform by seven bits yields:
To have the basis vectors along rows, the matrix MS,8 is given by:
Note that MS,4 and MS,8 are low-complexity secondary transforms that provide similar gains when applied to Intra_BL residue, but at considerably lower complexity, as compared to applying DST Type 3 as an alternate primary transform.
In the secondary transforms derived using DCT Type 3 and DST Type 3, the corresponding coefficients have the same magnitudes, and only a few coefficients differ in sign. This can reduce secondary transform hardware implementation costs. For example, a hardware core for the secondary transform corresponding to DCT Type 3 can be designed. For the secondary transform corresponding to DST Type 3, the same transform core can be used with sign changes for just a few of the transform coefficients.
Research has shown that an 8×8 DCT Type 2 transform can be implemented using 11 multiplications and 29 additions. Therefore, the DCT Type 3 transform, which is a transpose of the DCT Type 2 transform, can also be implemented using 11 multiplications and 29 additions.
The secondary transform MC,8 = C8^T*C8^T can be considered as a cascade of two DCTs and therefore can be implemented using 22 multiplications and 58 additions, which is fewer calculations than a full matrix multiplication at size 8×8 (which requires 64 multiplications and 56 additions). Similarly, the secondary transform corresponding to DST Type 3 (which can be obtained by changing the signs of some transform coefficients of the previous secondary transform matrix) can also be implemented via 22 multiplications and 58 additions.
It is noted that the derivations of secondary transforms have been shown only for sizes 4 and 8 assuming primary transforms of DCT Type 3 and DST Type 3. However, it will be understood that these derivations can be extended to other transform sizes and other primary transforms.
Rotational Transforms
Some rotational transforms have been derived for Intra residue in the context of HEVC. In fact, the rotational transforms are special cases of secondary transforms and can also be used as secondary transforms for Intra_BL residues. Specifically, the following four rotational transform matrices (with eight-bit precision) and their transposes (which are also rotational matrices) can be used as secondary transforms.
Due to the structure of rotational transform matrices, there are only twenty non-zero elements at size 8×8. Accordingly, each rotational transform matrix can be implemented using only 20 multiplications and 12 additions, which is much smaller than the 64 multiplications and 56 additions required for a full 8×8 matrix. Of the rotational matrices provided above, experimental testing has shown that the Rotational Transform 4 Transform Core and the Rotational Transform 4 Transpose Transform Core can provide maximum gains when used as secondary transforms.
In addition to or instead of an 8×8 rotational transform, a 4×4 rotational transform can be used. This further reduces the number of required operations. Likewise, the number of operations can be reduced by using a lifting implementation of rotational transforms.
Methods are now described illustrating how a secondary transform can be implemented at block sizes 8, 16, and 32 in a video codec at the encoder and the decoder.
At operation 501, the encoder selects the transform to be used for encoding. This could include, for example, the encoder selecting from among the following choices of transforms for the transform units in a coding unit (CU) via a Rate-distortion search:
In operation 503, based on the transform selected, the encoder sets a flag identifying the selected transform (such as DCT, DCT+M1, or DCT+M2). In operation 505, the encoder encodes the coefficients of a video bitstream using the selected transform and encodes the flag with an appropriate value. In some embodiments, it may not be necessary to encode the flag under certain conditions.
At operation 601, the decoder receives a flag and a video bitstream and interprets the received flag to determine the transform used at the encoder (such as DCT, DCT+M1, or DCT+M2). At operation 603, the decoder determines if the transform used at the encoder is DCT only. If so, in operation 605, the decoder applies an inverse DCT to the received video bitstream. In some embodiments, the order of the transform is {Inverse Vertical DCT, Inverse Horizontal DCT}.
If it is determined in operation 603 that the used transform is not DCT only, in operation 607, the decoder determines if the used transform is DCT+M1. If so, in operation 609, the decoder applies an inverse secondary transform M1 to the received video bitstream. The order of the transform may be either {Inverse horizontal secondary transform, inverse vertical secondary transform} or {Inverse vertical secondary transform, inverse horizontal secondary transform}. That is, the order of the transform may be the inverse of what was applied at the encoder in the forward transform path. In operation 611, the decoder applies an inverse DCT to the received video bitstream with an order of the transform of {Inverse Vertical DCT, Inverse Horizontal DCT}.
If it is determined in operation 607 that the used transform is not DCT+M1, the used transform is DCT+M2. Accordingly, in operation 613, the decoder applies an inverse secondary transform M2 to the received video bitstream. The order of the transform may be either {Inverse horizontal secondary transform, inverse vertical secondary transform} or {Inverse vertical secondary transform, inverse horizontal secondary transform}. That is, the order of the transform may be the inverse of what was applied at the encoder in the forward transform path. In operation 615, the decoder applies an inverse DCT to the received video bitstream with an order of the transform of {Inverse Vertical DCT, Inverse Horizontal DCT}.
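A compact decoder-side sketch of this flow is shown below; the flag values, matrices, and block pipeline are illustrative placeholders rather than normative syntax, and the two-dimensional transforms are written as separable matrix products.

```python
# Simplified decoder-side sketch of the method above. The flag values, matrices, and the block
# pipeline are illustrative placeholders, not normative syntax; 2D transforms are written as
# separable matrix products.
import numpy as np

def inverse_dct2d(coeffs, D):
    """Inverse vertical DCT followed by inverse horizontal DCT (D is the forward DCT matrix)."""
    return D.T @ coeffs @ D

def inverse_secondary2d(coeffs, M, k):
    """Inverse separable secondary transform applied to the upper-left k x k coefficients."""
    out = coeffs.copy()
    out[:k, :k] = M.T @ coeffs[:k, :k] @ M
    return out

def decode_block(coeffs, flag, D, M1, M2, k=4):
    """flag: 0 = DCT only, 1 = DCT + M1, 2 = DCT + M2 (mirroring the encoder's choice)."""
    if flag == 1:
        coeffs = inverse_secondary2d(coeffs, M1, k)
    elif flag == 2:
        coeffs = inverse_secondary2d(coeffs, M2, k)
    return inverse_dct2d(coeffs, D)       # the inverse DCT is always applied last

# Example usage (placeholder matrices):
#   D = <forward NxN DCT matrix>; M1, M2 = <k x k secondary matrices>
#   block = decode_block(received_coeffs, flag, D, M1, M2)
```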
While the methods 500, 600 are described with only two secondary transform choices (M1 and M2), it will be understood that the methods 500, 600 can be extended to additional transform choices, including different transform sizes and block sizes. For example, the secondary transform can be applied at block sizes 16, 32, and so on, and the size of the secondary transform can be K×K (where K=4, 8, etc.). In some embodiments, a rotational transform core can also be used as a secondary transform.
Fast Factorization for Secondary Transforms
Consider the 4×4 secondary transform described above, which is derived from DCT Type 3 (C^T), where C denotes DCT Type 2 (M = C^T*C^T). In general, the 4×4 matrix M may require 16 multiplications and 12 additions for implementation. In the following embodiment, it will be shown that the actual implementation of M (and hence of its transpose M^T = C*C) can be performed in only 6 multiplications and 14 additions. This represents a 62.5% reduction in the number of multiplications and only a slight increase (16.67%) in the number of additions. Because implementation complexity, especially from multiplications, can be a significant challenge to transform deployment in image/video coding, this embodiment advantageously adds value by reducing overall complexity.
The derivation of a fast factorization algorithm will now be described. Specifically, consider the matrix Ct = C^T, which can be represented as follows:

Ct(k,n) = c(n)*cos(2πn(2k+1)/(4N)), k, n = 0 . . . N−1   (20)

where c(n) is the usual DCT normalization factor (c(0) = √(1/N) and c(n) = √(2/N) for n > 0). A common scaling factor can be factored out of all terms in the matrix Ct. Also, the following is defined for N = 4:

γ(k) = cos(2πk/16)

Accordingly, the matrix Ct can be written as follows:
Using the properties of the cosine function, the following holds:

γ(−k) = γ(k)
γ(16+k) = cos(2π(16+k)/16) = cos(2π + 2πk/16) = cos(2πk/16) = γ(k)
γ(8+k) = cos(2π(8+k)/16) = cos(π + 2πk/16) = −cos(2πk/16) = −γ(k)
γ(8−k) = cos(2π(8−k)/16) = cos(π − 2πk/16) = −cos(2πk/16) = −γ(k)   (22)
Thus, after some substitutions and using the above properties for γ(k), the matrix Ct can be rewritten as follows:
Before calculating the various terms in matrix M=Ct*Ct, the following standard trigonometric identities are noted:
2γ(m)γ(n) = γ(m+n) + γ(m−n)
2φ(m)φ(n) = γ(m−n) − γ(m+n)   (24)

where φ(k) = sin(2πk/16).
For the matrix M, element M(1,1) is the inner product of the first row of Ct and its first column. The kth row of Ct is denoted as Ct(k,1:4), and the lth column of Ct is denoted as Ct(1:4,l). Thus, element M(1,1) is computed as follows:
Element M(1, 2)=Ct(1,1:4)*Ct(1:4,2) is computed as follows:
Element M(1, 3) is computed as:
Element M(1, 4) is computed as:
Therefore, the first row of the matrix M, denoted as M(1,:), can be written as:

M(1,:) = 2γ(2)γ(2)*[1+γ(1)  −γ(3)  γ(3)  1−γ(1)]   (29)

It is defined that γ(1) = a and γ(3) = b. Therefore, M(1,:) = [1+a  −b  b  1−a].
For the other rows of matrix M, the following can be shown. Element M(2, 1) is:
where γ(3)γ(3) + γ(1)γ(1) = φ(1)φ(1) + γ(1)γ(1) = 1, since γ(3) = cos(6π/16) = sin(2π/16) = φ(1) and sin²(x) + cos²(x) = 1.
Therefore, the matrix M can be written as:
The operations for a fast factorization method are now described when a four-point input X = [x0, x1, x2, x3]^T is transformed to an output Y = [y0, y1, y2, y3]^T via M. Specifically, after rearranging a few terms, the following can be shown:

y0 = (x0+x3) + b(x2−x1) + a(x0−x3)
y1 = b(x0+x3) + (x1−x2) + a(x1+x2)
y2 = b(x3−x0) + (x1+x2) + a(x2−x1)
y3 = (x3−x0) + a(x3+x0) − b(x1+x2)   (43)
Let the following be defined:

c0 = x0 + x3
c1 = x2 − x1
c2 = x0 − x3
c3 = x2 + x1   (44)
Combining (43) and (44) provides the following:

y0 = c0 + bc1 + ac2
y1 = bc0 − c1 + ac3
y2 = −bc2 + c3 + ac1
y3 = −c2 + ac0 − bc3   (45)
The computation of the equations in (45) requires only 8 multiplications and 12 additions. Also, it is noted that a rotation is performed in the computation of y0 and y2 and similarly in the computation of y1 and y3. Therefore, the number of multiplications can be further reduced by two by defining c4 and c5 as follows:
c4 = a*(c1 + c2)
c5 = a*(c0 + c3)   (46)

and

y0 = c0 + (b−a)c1 + c4
y1 = −c1 + (b−a)c0 + c5
y2 = −(b+a)c2 + c4 + c3
y3 = −c2 − (b+a)c3 + c5   (47)
Using the equations in (46) and (47), the transform M can be applied using only 6 multiplications and 14 additions. It is noted that (b−a) and (b+a) are constants and are each counted as a single entity. As an example, an equivalent 4×4 matrix Mequiv can be computed after rounding and shifting by seven bits as follows:
Mequiv = round(128*C^T*C^T)   (48)
The terms in (48) that correspond to (1+a) and (1−a) in (42) are 123 and 5, respectively. Due to bit shifts, (1+a) and (1−a) can be written as 64+59 and 64−59, respectively. Thus, defining a=59 and b=24 gives the following:
c0 = x0 + x3
c1 = x2 − x1
c2 = x0 − x3
c3 = x2 + x1   (49)

c4 = 59*(c1 + c2)
c5 = 59*(c0 + c3)   (50)
and

y0 = (c0<<6) + (b−a)c1 + c4
y1 = −(c1<<6) + (b−a)c0 + c5
y2 = −(b+a)c2 + c4 + (c3<<6)
y3 = −(c2<<6) − (b+a)c3 + c5   (51)

or

y0 = (c0<<6) − 35*c1 + c4
y1 = −(c1<<6) − 35*c0 + c5
y2 = −83*c2 + c4 + (c3<<6)
y3 = −(c2<<6) − 83*c3 + c5   (52)
It is noted that there are 4 additional shifts due to rounding operations in the computation of the transform, but shifts are generally easy to implement in hardware as compared to multiplications and additions.
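As a check on this factorization, the following sketch builds Mequiv = round(128*C^T*C^T) for the 4-point case, implements equations (49)-(52) with a = 59 and b = 24, and verifies that the two agree on integer inputs; the orthonormal DCT scaling and the row/column convention (output y = Mequiv @ x) are assumptions made for the sketch.

```python
# Sketch verifying the integer fast factorization (49)-(52) against direct multiplication by
# Mequiv = round(128 * C^T * C^T). The orthonormal DCT scaling and the row/column convention
# (output y = Mequiv @ x) are assumptions; a = 59 and b = 24 as in the text.
import numpy as np

def dct2_matrix(n):
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

D = dct2_matrix(4)
# With basis vectors along the columns of C (as in the text), C = D.T, so C^T*C^T = D @ D.
Mequiv = np.round(128 * D @ D).astype(np.int64)

def fast_secondary(x, a=59, b=24):
    """6 multiplications and 14 additions, following (49)-(52); <<6 replaces scaling by 64."""
    x0, x1, x2, x3 = (int(v) for v in x)
    c0, c1, c2, c3 = x0 + x3, x2 - x1, x0 - x3, x2 + x1        # (49)
    c4, c5 = a * (c1 + c2), a * (c0 + c3)                      # (50): the only two extra products
    y0 = (c0 << 6) + (b - a) * c1 + c4                         # (52), with (b - a) = -35
    y1 = -(c1 << 6) + (b - a) * c0 + c5
    y2 = -(b + a) * c2 + c4 + (c3 << 6)                        # (b + a) = 83
    y3 = -(c2 << 6) - (b + a) * c3 + c5
    return np.array([y0, y1, y2, y3], dtype=np.int64)

x = np.array([10, -3, 7, 2])
assert np.array_equal(fast_secondary(x), Mequiv @ x)
```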
The 4×4 secondary matrix MS,4 obtained from DST Type 3 can similarly be evaluated using only 6 multiplications and 14 additions, since some of its elements have sign changes as compared to MC,4. The inverses of the matrices MC,4 and MS,4 can also be computed using 6 multiplications and 14 additions, since they are simply the transposes of MC,4 and MS,4, respectively, and the operations (for example, in a signal-flow graph) for computing the transposed matrix can be obtained by simply reversing those for the original matrix. The normalizations (or rounding after bit-shifts) applied to matrix MC,4, etc., to obtain an integer matrix do not affect the operation count, and the transform can still be calculated using 6 multiplications and 14 additions.
The fast factorization algorithm described above can also be used to compute a fast factorization for 8×8 and higher order (e.g., 16×16) secondary transform matrices.
In some literature, there exists a class of scaled DCTs where an 8×8 DCT Type 2 matrix can be computed using 13 multiplications and 29 additions. Out of these 13 multiplications, 8 are at the end and can be combined with quantization. It is possible to derive a DCT Type 3 matrix similarly with 5 multiplications at the beginning and 8 at the end. This implies that the inverse of DCT Type 3 (i.e., DCT Type 2) can have 8 multiplications at the beginning. So, for the computation of MC,8 = C8^T*C8^T, the 8 multiplications at the end of the factor appearing first in MC,8 and the 8 multiplications at the beginning of the factor appearing later in MC,8 can be combined. This can result in a total of only 5+8+5 = 18 multiplications and 29+29 = 58 additions, which is lower than the 22 multiplications and 58 additions that would be required if two standard DCT computations using Loeffler's algorithm were implemented.
Although the present disclosure has been described with example embodiments, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications that fall within the scope of the appended claims.
This application claims priority under 35 U.S.C. §119(e) to: U.S. Provisional Patent Application Ser. No. 61/775,208 filed on Mar. 8, 2013; and U.S. Provisional Patent Application Ser. No. 61/805,404 filed on Mar. 26, 2013. The above-identified provisional patent applications are hereby incorporated by reference in their entirety.