1. Field
The following description generally relates to encoders and decoders and, in particular, to an efficient MDCT/IMDCT implementation for voice and audio codecs.
2. Background
One goal of audio coding is to compress an audio signal into a desired limited information quantity while keeping as much of the original sound quality as possible. In an encoding process, an audio signal in the time domain is transformed into the frequency domain, and a corresponding decoding process reverses this operation.
As part of such an encoding process, a signal may be processed by a modified discrete cosine transform (MDCT). The MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property that blocks are overlapped so that the ending of one block coincides with the beginning of the next block. This overlapping helps to avoid aliasing artifacts and, together with the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications.
The MDCT has also found applications in speech compression. The ITU-T G.722.1 and G.722.1C vocoders apply the MDCT to the input speech signal, while the more recent ITU-T G.729.1 and G.718 algorithms use it to process the residual signal remaining after a Code Excited Linear Prediction (CELP) encoder. The above-mentioned vocoders operate with input sampling rates of either 8 kHz or 16 kHz, and 10 or 20-millisecond frames. Hence, their MDCT filterbanks are either 160 or 320-point transforms.
However, if future speech coders support block-switching functionality, support for decimated sizes (e.g., 160, 80, or 40 points) may also be needed. Consequently, efficient implementations of small transform sizes are desirable, so that a larger transform can be implemented using a small core transform.
The following presents a simplified summary of one or more embodiments in order to provide a basic understanding of some embodiments. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor delineate the scope of any or all embodiments. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later.
An encoding method and/or device are provided for computing transform values. Time-domain input values representing an audio signal are received. The input values are transformed into spectral coefficients using a Modified Discrete Cosine Transform (MDCT) that is recursively decimated into a plurality of 5-point transforms. Various factorizations may be implemented to efficiently process the 5-point transform.
In one example (
w0=x0−x4;
w4=x0+x4;
w1=x1−x3;
w3=x1+x3;
u2=x2+w3+w4;
u3=−d*w3+c*w4;
u4=d*w4−c*w3;
such that
X0=u2;
X1=b*w1+a*w0;
X2=u3−x2;
X3=b*w0−a*w1;
X4=u4+x2;
In another example (
w0=x0−x4;
w1=x1−x3;
z2=x1+x3;
z4=x0+x4;
u2=z2+z4;
such that
X0=u2+x2;
X1=b*w1+a*w0;
X2=c*u2+0.5*z2−x2;
X3=b*w0−a*w1;
X4=−c*u2−0.5*z4+x2;
In another example (
w0=x0−x4;
w1=x1−x3;
z2=x1+x3;
z4=x0+x4;
t2=z2+z4;
t4=z2−z4;
c′=c+0.25;
such that
X0=t2+x2;
X1=b*w1+a*w0;
X2=c′*t2+0.25*t4−x2=0.25*t4+(c′*t2−x2);
X3=b*w0−a*w1;
X4=−c′*t2+0.25*t4+x2=0.25*t4−(c′*t2−x2);
In another example (
w1=x0+x4;
w2=x4−x0;
w3=x3−x1;
w4=x3+x1;
w5=w1+w4;
w6=w4−w1;
u1=x2−αw5;
u2=x2+w5;
u3=βw2+γw3;
u4=βw3−γw2;
u5=δw6;
such that
X0=u2;
X1=u4;
X2=u5−u1;
X3=u3;
X4=u1+u5;
Alternatively, at least one of the plurality of 5-point transforms may include at least one transform (802) factorized by twelve (12) addition operations, five (5) multiplication operations, one (1) shift operation, and a longest path length of four (4) operations.
In another example (
k1=g*x1+h*x3;
k2=h*x1+g*x3;
k3=f*x0+i*x4;
k4=i*x0+f*x4;
k5=i*x1−f*x3;
k6=−f*x1+i*x3;
k7=g*x0−h*x4;
k8=h*x0−g*x4;
j1=x0+x4;
j2=x3−x1;
such that
X0=k3+k1+x2;
X1=k7+k5−x2;
X2=j1+j2−x2;
X3=k8+k6+x2;
X4=k4−k2+x2.
In another example (
q1=x0+x4;
q2=x3−x1;
p1=(x1−x3)*g−x1*(g+h)=−q2*g−x1*(g+h);
p2=(x1−x3)*g+x3*(h+g)=−q2*g+x3*(g+h);
p3=(x0+x4)*f+x0*(i−f)=q1*f+x0*(i−f);
p4=(x0+x4)*f+x4*(i−f)=q1*f+x4*(i−f);
p5=x1*(i−f)−(x3−x1)*f=x1*(i−f)−q2*f;
p6=(x3−x1)*f+x3*(i−f)=q2*f+x3*(i−f);
p7=(x0+x4)*g−x4*(h+g)=q1*g−x4*(h+g);
p8=x0*(h+g)−(x0+x4)*g=x0*(h+g)−q1*g;
such that:
X0=p2+p4+x2;
X1=p5+p7−x2;
X2=q1+q2−x2;
X3=p6+p8+x2;
X4=p1+p3+x2;
In another example (
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
y2=2c*v2+z2−2*x2;
y4=−2c*v2−z4+2*x2;
such that:
X0=v2+x2;
X1=v1−2*X0;
X2=y2−X1;
X3=v3−X2;
X4=y4−X3;
In another example (
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
y2=(2c+2)*v2+z2;
y4=2c*v2+z4;
such that
X0=v2+x2;
X1=v1−2*X0;
X2=y2−v1;
X3=v3−X2;
X4=−y4+2*x2−X3;
In another example (
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
d2=(2c+2)*z2+(2c+2)*z4;
d4=(2c+2)*z4+2c*z2;
such that:
X0=z2+z4+x2;
X1=v1−2*X0;
X2=d2−v1;
X3=v3−X2;
X4=−d4+2*x2−X3;
In another example (
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
z1=2a*w0+2b*w1;
z3=(2b+2a)*w0−(2a−2b)*w1;
d2=(2c+2)*z2+(2c+2)*z4;
d4=(2c+2)*z4+2c*z2;
such that:
X0=z2+z4+x2;
X1=z1−2*X0;
X2=d2−z1;
X3=z3−d2;
X4=−d4+2*x2−X3;
In another example (
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
z1=2a*w0+2b*w1;
z3=(2b+2a)*w0−(2a−2b)*w1;
r2=(2c+2)*z2+(2c+2)*z4;
r4=4(c+1)*z2+4(c+1)*z4;
such that
X0=z2+z4+x2;
X1=z1−2*X0;
X2=r2−z1;
X3=z3−r2;
X4=−r4+2*x2−z3;
Additionally, the transform method and/or device may perform a windowing operation on the input values prior to performing the transformation, wherein the windowing operation implements an asymmetric window function.
In some implementations, the MDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Discrete Cosine Transform type II.
In other implementations, the MDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Discrete Cosine Transform type IV.
In yet other implementations, the MDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Discrete Cosine Transform type II and a 5-point Discrete Cosine Transform type IV.
In yet other implementations, the MDCT implements at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Discrete Sine Transform type IV.
A decoding method and/or device are provided for computing inverse transform values. Spectral coefficient input values representing an audio signal are received. The spectral coefficient input values are then transformed into time-domain output values using an Inverse Modified Discrete Cosine Transform (IMDCT) that is recursively decimated into a plurality of 5-point inverse transforms.
In one example (
u1=X4−X2;
u5=X4+X2;
w0=X0+u1;
w5=X0−αu1;
w2=βX3−γX1;
w3=γX3−βX1;
w6=δu5;
w1=w5−w6;
w4=w5+w6;
such that
x0=w1−w2;
x1=w4+w3;
x2=w0.
x3=w4−w3;
x4=w1+w2;
Additionally, the decoding method and/or device may perform a windowing operation on the input values after performing the inverse transformation, wherein the windowing operation implements an asymmetric window function.
In one implementation, the IMDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Inverse Discrete Cosine Transform type II.
In another implementation, the IMDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Inverse Discrete Cosine Transform type IV.
In yet another implementation, the IMDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Inverse Discrete Cosine Transform type II and a 5-point Inverse Discrete Cosine Transform type IV.
In one implementation, the IMDCT may implement at least one of a 640, 320, 160, 80, 40-point transform using a 5-point Inverse Discrete Sine Transform type IV.
Various features, nature, and advantages may become apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout.
Various embodiments are now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
One feature provides for implementing an N-point MDCT transform (where N=5*2^K, for some integer K>=1) by mapping it into smaller N/2-point DCT-IV, DST-IV, and/or DCT-II transforms. In one example, the MDCT may be systematically decimated by a factor of 2, utilizing a scaled 5-point core function at the last stage. Another feature provides several fast algorithms for computing DCT-II, DCT-IV, and DST-IV core transforms of size five (5). The overall transform architecture that is claimed here is a generic decimation process that recursively splits a transform of size N into two transforms of size N/2, where N=5*2^K, and where the final (smallest) 5-point transforms are implemented using the fast techniques described herein. Transforms of such sizes arise in the design of MDCT filterbanks for speech and audio coding applications, such as the recent and emerging standards G.729.1, G.718, and EVRC-WB.
Another feature provides for using a modified windowing stage of an MDCT that combines the above architecture for computing the MDCT with an asymmetric window to reduce the delay associated with the transform stage while keeping the same number of frequency coefficients.
Codec Structure
Note that the inputs to the MDCT 102 and IMDCT 302 transforms may be processed as frames or blocks having a plurality of data points. Consequently, in order for an MDCT-based vocoder (such as, e.g., G.722.1 or G.722.1C) to support data blocks having frame lengths smaller than 320, transforms of decimated sizes are needed. For blocks having a frame length of 160, 80, 40, etc., it is observed that these sizes are all multiples of 5. Therefore, the last non-reducible (by decimation techniques) block size could use a transform of size 5. It is observed that, in terms of computational complexity, it is much more efficient to design a 5-point DCT-II transform than either DCT-IV or FFT transforms.
Defining MDCT Transforms
Using matrix notation, an MDCT transform can be represented by a matrix M:
Consequently, $X = Mx$ and $\hat{x} = M^{T}X$, where x represents a matrix of input samples $[x(0), \ldots, x(N-1)]^{T}$, X represents a matrix of resulting MDCT coefficients, and $\hat{x}$ represents a matrix of reconstructed outputs $[\hat{x}(0), \ldots, \hat{x}(N-1)]^{T}$.
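The matrix relation above can be exercised numerically. The following is a minimal sketch, assuming a textbook MDCT kernel cos(2π/N · (n + 1/2 + N/4)(k + 1/2)) with no normalization factor. The exact entries and scaling of the matrix M are not reproduced in the text, so the kernel, the helper names (`mdct_matrix`, `matvec`, `transpose`), and the frame length N=20 are illustrative assumptions.

```python
import math

def mdct_matrix(n):
    """Build an (n/2) x n MDCT matrix M using an assumed textbook kernel."""
    return [[math.cos(2.0 * math.pi / n * (j + 0.5 + n / 4.0) * (k + 0.5))
             for j in range(n)]
            for k in range(n // 2)]

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Return the transpose of matrix m."""
    return [list(col) for col in zip(*m)]

N = 20                                  # illustrative frame length, 5 * 2**2
M = mdct_matrix(N)
x = [math.sin(0.3 * j) for j in range(N)]

X = matvec(M, x)                        # X = M x: N samples -> N/2 coefficients
x_hat = matvec(transpose(M), X)         # x_hat = M^T X: N/2 coefficients -> N samples

# With this kernel the rows of M are mutually orthogonal,
# each with squared norm N/2, so M M^T = (N/2) I.
gram = [matvec(M, row) for row in M]
```

For a single frame, `x_hat` differs from `x` because it still contains time-domain aliasing; in an MDCT filterbank that aliasing is cancelled when overlapping windowed frames are added together.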
In order to implement the MDCT transform, it may be mapped into an N/2-point core transform function. For example, the transform 116 of
A DCT-IV transform can be defined as:
Meanwhile, an IDCT-IV transform can be defined as:
The MDCT transform can be mapped to an N/2-point DCT-IV transform as
$M^{T} = P\,S\,C^{IV}_{N/2}$
and the IMDCT transform can be mapped to an N/2-point IDCT-IV transform as
$M = C^{IV}_{N/2}\,S\,P^{T}$
where
where $I_{N/4}$ is an N/4×N/4 identity matrix and $J_{N/4}$ is an N/4×N/4 order reversal matrix, and matrix S is defined as
and $C^{IV}_{N/2}$ is an N/2×N/2 DCT-IV matrix that can be defined as
By using the symmetry and involutory properties of the DCT-IV matrix, it can be mapped into a DCT-II transform. The DCT-II transform may be defined as:
Likewise, an IDCT-II transform may be defined as:
where $\lambda(k) = 1/\sqrt{2}$ if k=0, and 1 otherwise.
Defining DCT-IV, DST-IV, and DCT-II Transforms
According to a feature, the transform 116 (
A DCT-IV and IDCT-IV may be defined, correspondingly, as:
A DST-IV and IDST-IV may be defined, correspondingly, as:
Similarly, the DCT-II and its inverse transforms may be defined, correspondingly, as:
where $\lambda(k) = 1/\sqrt{2}$ if k=0, and 1 otherwise.
In Equations 1-6, {x(n)}, for n=0, 1, . . . , N−1, represents the input sequence of samples, N denotes the frame length, and X(k) denotes the resulting coefficients.
In the case where N=5, the matrices C_IV for DCT-IV, S_IV for DST-IV, and C_II for DCT-II transforms can be represented, correspondingly, as:
To simplify the representation of a DCT-II, the factors $\lambda(k) = 1/\sqrt{2}$ can be ignored and all coefficients can be multiplied by $\sqrt{5/2}$, while using the following notation:
thereby producing:
Here, note that $a^2 + b^2 = 1.25$ and that $c^2 + d^2 = 0.75$. Moreover, note that $c - d = 0.5$. This follows from algebraic expressions for the involved cosine values:
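These identities can be checked numerically. The sketch below assumes that a, b, c, and d are the cosine entries of an unnormalized 5-point DCT-II, namely cos(π/10), cos(3π/10), cos(π/5), and cos(2π/5). The notation equation itself is not reproduced in the text, so this assignment is an assumption chosen to be consistent with the three stated identities.

```python
import math

# Assumed values: entries cos((2n+1)*k*pi/10) of a 5-point DCT-II
# with the normalization factors dropped.
a = math.cos(math.pi / 10)       # cos(18 degrees)
b = math.cos(3 * math.pi / 10)   # cos(54 degrees)
c = math.cos(math.pi / 5)        # cos(36 degrees)
d = math.cos(2 * math.pi / 5)    # cos(72 degrees)

sum_ab = a * a + b * b           # stated to equal 1.25
sum_cd = c * c + d * d           # stated to equal 0.75
diff_cd = c - d                  # stated to equal 0.5

print(sum_ab, sum_cd, diff_cd)
```

The relation c − d = 0.5 is what allows one of the two multiplications in the even-indexed outputs to be replaced by a binary shift.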
Similarly, in the case of DCT-IV, all coefficients are multiplied by $\sqrt{5}$, using the notation:
producing:
Note that $f^2 + i^2 = 2$, and similarly $g^2 + h^2 = 2$. Moreover, notice that $f + i = 2c$ and that $h + g = 2a$. This follows from algebraic expressions for the involved cosine values:
Finally, in the case of DST-IV, all coefficients may be multiplied by $\sqrt{5}$, using the notation:
to produce:
Similar to the DCT-IV case, note that here $f^2 + i^2 = 2$, and similarly $g^2 + h^2 = 2$.
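A short numerical check of these relations follows. It assumes that f, g, h, and i are the values √2·cos((2n+1)π/20) for n = 0, 1, 3, 4, which is what scaling a 5-point DCT-IV by √5 would give. The notation equations are not reproduced in the text, so these assignments are assumptions. Under them, the first row of the scaled DST-IV uses the same four magnitudes in reverse order, since sin(θ) = cos(π/2 − θ).

```python
import math

SQRT2 = math.sqrt(2.0)

# Assumed values: sqrt(2) * cos((2n+1)*pi/20) for n = 0, 1, 3, 4.
f = SQRT2 * math.cos(math.pi / 20)
g = SQRT2 * math.cos(3 * math.pi / 20)
h = SQRT2 * math.cos(7 * math.pi / 20)
i = SQRT2 * math.cos(9 * math.pi / 20)

# First row of the scaled DCT-IV and DST-IV (k = 0).
dct4_row0 = [SQRT2 * math.cos((2 * n + 1) * math.pi / 20) for n in range(5)]
dst4_row0 = [SQRT2 * math.sin((2 * n + 1) * math.pi / 20) for n in range(5)]

print(f * f + i * i, g * g + h * h)   # both close to 2
print(dct4_row0)                      # close to [f, g, 1, h, i]
print(dst4_row0)                      # close to [i, h, 1, g, f]
```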
Derivation of Fast Algorithms for Computing 5-Point DCT-II
In order to achieve processing efficiency, the smallest size transforms used by a larger transform should be fast and efficient. This is accomplished by minimizing the operations (e.g., multiplications, additions, and shifts) performed by these small size transforms. Consequently, various factorizations for smallest size transforms may be implemented to achieve this. The choice of which transform factorization is implemented may depend on various factors, including the capabilities of the processor being used.
An efficient DCT-II transform may be implemented in various ways. For instance, assuming that the input to transform is provided by an input vector x, such that
the product of vector x with the scaled DCT-II matrix (scaled by $\sqrt{5}$ as in Matrix D) produces a DCT-II matrix X:
Consider the computation of the odd coefficients X1 and X3 in this matrix X:
X1=a*x0+b*x1−b*x3−a*x4=a*(x0−x4)+b*(x1−x3);
X3=b*x0−a*x1+a*x3−b*x4=b*(x0−x4)−a*(x1−x3);
This suggests that both coefficients X1 and X3 can be computed as a simple butterfly over x0−x4 and x1−x3. Now, consider the computation of the even coefficients X2 and X4 in this matrix X:
X2=c*x0−d*x1−d*x3+c*x4−x2=c*(x0+x4)−d*(x1+x3)−x2;
X4=d*x0−c*x1−c*x3+d*x4+x2=d*(x0+x4)−c*(x1+x3)+x2;
Here, again, it appears that computations can be organized as a simple butterfly over x0+x4 and x1+x3.
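The butterfly structure described above can be sketched as follows. The code evaluates the five outputs from the sums and differences x0±x4 and x1±x3 and compares them with a direct evaluation of the unnormalized DCT-II kernel cos((2n+1)kπ/10). The constant values and the function names are assumptions for illustration, since the notation equation is not reproduced in the text.

```python
import math

# Assumed constants (unnormalized 5-point DCT-II entries).
A = math.cos(math.pi / 10)
B = math.cos(3 * math.pi / 10)
C = math.cos(math.pi / 5)
D = math.cos(2 * math.pi / 5)

def dct2_5_direct(x):
    """Direct 5-point DCT-II with all normalization factors dropped."""
    return [sum(math.cos((2 * n + 1) * k * math.pi / 10) * x[n] for n in range(5))
            for k in range(5)]

def dct2_5_butterfly(x):
    """Same transform, organized around x0-x4, x1-x3, x0+x4 and x1+x3."""
    x0, x1, x2, x3, x4 = x
    w0 = x0 - x4          # feeds the odd outputs X1, X3
    w1 = x1 - x3
    z4 = x0 + x4          # feeds the even outputs X0, X2, X4
    z2 = x1 + x3
    X0 = z4 + z2 + x2
    X1 = A * w0 + B * w1
    X2 = C * z4 - D * z2 - x2
    X3 = B * w0 - A * w1
    X4 = D * z4 - C * z2 + x2
    return [X0, X1, X2, X3, X4]

sample = [1.0, -2.0, 0.5, 3.0, 4.0]
direct = dct2_5_direct(sample)
fast = dct2_5_butterfly(sample)
```

This form uses eight multiplications; the factorizations that follow reduce that count by exploiting relations among the constants.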
The actual transform operations may be efficiently implemented by rearranging the internal transforms operations to reduce the overall number of additions, multiplications, and/or shifts. Consequently, different intermediate results may be achieved by different factorizations of a transform and such intermediate results characterize each corresponding transform.
w0=x0−x4;
w4=x0+x4;
w1=x1−x3;
w3=x1+x3;
and
u2=x2+w3+w4;
u3=−d*w3+c*w4;
u4=d*w4−c*w3;
to obtain the outputs:
X0=u2;
X1=b*w1+a*w0;
X2=u3−x2;
X3=b*w0−a*w1;
X4=u4+x2;
where the input coefficients 504 (x0, x1, x2, x3, x4) are transformed into the output coefficients 506 (X0, X1, X2, X3, and X4). The complexity of this scheme of
w0=x0−x4;
w1=x1−x3;
z2=x1+x3;
z4=x0+x4;
u2=z2+z4.
Additionally, the fact that c−d=0.5 is used to represent outputs X0, X2, and X4 as:
X0=z4+z2+x2;
X2=c*z4−d*z2−x2=c*(z4+z2)+(c−d)*z2−x2=c*(z4+z2)+0.5*z2−x2;
X4=d*z4−c*z2+x2=−c*(z4+z2)−(c−d)*z4+x2=−c*(z4+z2)−0.5*z4+x2.
Consequently, the 5-point DCT II transform 602 is characterized by:
X0=u2+x2;
X1=b*w1+a*w0;
X2=c*u2+0.5*z2−x2;
X3=b*w0−a*w1;
X4=−c*u2−0.5*z4+x2.
The complexity of this transform 602 is twelve (12) additions, five (5) multiplications, and two (2) shifts. Note that the factor ½ in this transform is a dyadic rational, so each such “multiplication” by ½ is just a binary shift operation (i.e., a shift). The longest path length here is four (4) operations.
X2=c*(z4+z2)+0.5*z2−x2=c′*(z4+z2)+d′*(z4−z2)−x2;
X4=−c*(z4+z2)−0.5*z4+x2=−c′*(z4+z2)+d′*(z4−z2)+x2;
where values c′ and d′ are selected such that:
c*(z4+z2)+0.5*z2=c′*(z4+z2)+d′(z4−z2)
−c*(z4+z2)−0.5*z4=−c′*(z4+z2)+d′(z4−z2). (Equation 7)
Equation 7 can be rearranged such that:
z4*c+z2*(c+0.5)=z4*(c′+d′)+z2*(c′−d′).
Consequently, it can be shown that:
c=c′+d′; and
c+0.5=c′−d′.
By subtracting both of these equations, it can be shown that:
0.5=−2d′; or
d′=−0.25, and
c′=c−d′=c+0.25.
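The solution d′=−0.25, c′=c+0.25 can be sanity-checked by substituting it back into both sides of Equation 7. The sketch below does this for a few arbitrary values of z2 and z4; the value chosen for c is an assumption, and the check holds for any c because the identity is purely algebraic.

```python
import math

c = math.cos(math.pi / 5)   # assumed value; any real c satisfies the identity
d_prime = -0.25
c_prime = c - d_prime       # equals c + 0.25

residuals = []
for z2, z4 in [(1.0, 2.0), (-0.5, 3.25), (7.0, -4.0)]:
    # First line of Equation 7.
    lhs_a = c * (z4 + z2) + 0.5 * z2
    rhs_a = c_prime * (z4 + z2) + d_prime * (z4 - z2)
    # Second line of Equation 7.
    lhs_b = -c * (z4 + z2) - 0.5 * z4
    rhs_b = -c_prime * (z4 + z2) + d_prime * (z4 - z2)
    residuals.append((lhs_a - rhs_a, lhs_b - rhs_b))

print(residuals)   # every entry is zero up to rounding
```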
Consequently, output coefficients X2 and X4 can be represented as:
X2=c′*(z4+z2)−0.25*(z4−z2)−x2=0.25*(z2−z4)+(c′*(z4+z2)−x2);
X4=−c′*(z4+z2)−0.25*(z4−z2)+x2=0.25*(z2−z4)−(c′*(z4+z2)−x2);
which leads to the transform 702 of
Consequently, intermediate results are computed as:
w0=x0−x4;
w1=x1−x3;
z2=x1+x3;
z4=x0+x4;
t2=z2+z4;
t4=z2−z4;
c′=c+0.25.
Consequently, the 5-point DCT II transform 702 is characterized by:
X0=t2+x2;
X1=b*w1+a*w0;
X2=c′*t2+0.25*t4−x2=0.25*t4+(c′*t2−x2);
X3=b*w0−a*w1;
X4=−c′*t2+0.25*t4+x2=0.25*t4−(c′*t2−x2).
This transform 702 can be implemented in twelve (12) additions, five (5) multiplications, and one (1) shift. Note that the factor ¼ in this transform 702 is a dyadic rational, so such “multiplication” by ¼ is just a binary shift operation (i.e., a shift). The longest path length here is also four (4) operations.
In this example, the multipliers are:
The DCT-II transform 802 may include intermediate results such that:
w1=x0+x4;
w2=x4−x0;
w3=x3−x1;
w4=x3+x1;
w5=w1+w4;
w6=w4−w1;
u1=x2−αw5;
u2=x2+w5;
u3=βw2+γw3;
u4=βw3−γw2;
u5=δw6.
Consequently, the outputs X0, X1, X2, X3, and X4 for the DCT-II transform 802 may be represented as:
X0=u2;
X1=u4;
X2=u5−u1;
X3=u3;
X4=u1+u5.
Note that the intermediate results for the transforms illustrated in
Derivation of Inverse Transform
The transforms illustrated in
u1=X4−X2;
u5=X4+X2;
w0=X0+u1;
w5=X0−αu1;
w2=βX3−γX1;
w3=γX3−βX1; (note that a software implementation may use negated factors compared to the flow graph)
w6=δu5;
w1=w5−w6;
w4=w5+w6;
where
Consequently, the outputs x0, x1, x2, x3, and x4 3206 for the IDCT-II transform 3202 may be computed as:
x0=w1−w2;
x1=w4+w3;
x2=w0.
x3=w4−w3;
x4=w1+w2.
Derivation of Fast Algorithms for Computing 5-Point DCT-IV and DST-IV
An efficient DCT-IV transform and/or DST-IV may be implemented in various ways. For instance, assuming that the input to transform is provided by a vector x, such that
the product of vector x with the scaled DCT-IV matrix (scaled by $\sqrt{5}$ as in Matrix E) produces a DCT-IV matrix X:
X0=f*x0+i*x4+g*x1+h*x3+x2;
X1=g*x0−h*x4+i*x1−f*x3−x2;
X2=−x1+x3−x2+x0+x4;
X3=h*x0−g*x4−f*x1+i*x3+x2;
X4=i*x0+f*x4−h*x1−g*x3+x2;
Note that the transform 902 may be computed using intermediate results where:
k1=g*x1+h*x3;
k2=h*x1+g*x3;
k3=f*x0+i*x4;
k4=i*x0+f*x4;
k5=i*x1−f*x3;
k6=−f*x1+i*x3;
k7=g*x0−h*x4;
k8=h*x0−g*x4;
j1=x0+x4;
j2=x3−x1.
Consequently, the transform 902 may be represented as:
X0=k3+k1+x2;
X1=k7+k5−x2;
X2=j1+j2−x2;
X3=k8+k6+x2;
X4=k4−k2+x2.
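The intermediate results k1 through k8, j1, and j2 can be sketched in code and compared against a direct evaluation of the scaled DCT-IV. The constants are assumed to be √2·cos((2n+1)π/20), which is consistent with the relations f²+i²=2 and g²+h²=2 noted earlier; the function names are hypothetical.

```python
import math

S = math.sqrt(2.0)
# Assumed constants for the scaled 5-point DCT-IV.
F = S * math.cos(math.pi / 20)
G = S * math.cos(3 * math.pi / 20)
H = S * math.cos(7 * math.pi / 20)
I = S * math.cos(9 * math.pi / 20)

def dct4_5_direct(x):
    """Direct scaled DCT-IV: kernel sqrt(2)*cos((2n+1)(2k+1)*pi/20)."""
    return [sum(S * math.cos((2 * n + 1) * (2 * k + 1) * math.pi / 20) * x[n]
                for n in range(5))
            for k in range(5)]

def dct4_5_butterflies(x):
    """Scaled DCT-IV computed from the intermediate results k1..k8, j1, j2."""
    x0, x1, x2, x3, x4 = x
    k1 = G * x1 + H * x3
    k2 = H * x1 + G * x3
    k3 = F * x0 + I * x4
    k4 = I * x0 + F * x4
    k5 = I * x1 - F * x3
    k6 = -F * x1 + I * x3
    k7 = G * x0 - H * x4
    k8 = H * x0 - G * x4
    j1 = x0 + x4
    j2 = x3 - x1
    return [k3 + k1 + x2,
            k7 + k5 - x2,
            j1 + j2 - x2,
            k8 + k6 + x2,
            k4 - k2 + x2]

sample = [0.25, -1.0, 2.0, 0.75, -3.0]
direct = dct4_5_direct(sample)
fast = dct4_5_butterflies(sample)
```

Written this way, the transform uses sixteen multiplications; each pair (k1, k2), (k3, k4), (k5, k6), and (k7, k8) shares the same two inputs and forms one butterfly.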
Therefore, the output coefficients X0, X1, X2, X3, and X4 can be computed by using four (4) butterflies 908a, 908b, 908c, and 908d as illustrated in the transform 902 of
f*x0+i*x4=(x0+x4)*f+x4*(i−f);
i*x0+f*x4=(x0+x4)*f+x0*(i−f);
g*x1+h*x3=(x1−x3)*g+x3*(h+g);
−h*x1−g*x3=(x1−x3)*g−x1*(g+h).
Similarly, the component operations for output coefficients X1 and X3 can be written as:
g*x0−h*x4=(x0+x4)*g−x4*(h+g);
i*x1−f*x3=x1*(i−f)−(x3−x1)*f;
h*x0−g*x4=x0*(h+g)−(x0+x4)*g;
−f*x1+i*x3=(x3−x1)*f+x3*(i−f).
By using such decompositions, the output coefficients for transform 1002 can be characterized by:
X0=(x0+x4)*f+x4*(i−f)+(x1−x3)*g+x3*(h+g)+x2;
X1=(x0+x4)*g−x4*(h+g)+x1*(i−f)−(x3−x1)*f−x2;
X2=−x1+x3−x2+x0+x4;
X3=x0*(h+g)−(x0+x4)*g+(x3−x1)*f+x3*(i−f)+x2;
X4=(x0+x4)*f+x0*(i−f)+(x1−x3)*g−x1*(g+h)+x2.
Note that the transform 1002 may be computed using intermediate results where:
q1=x0+x4;
q2=x3−x1;
p1=(x1−x3)*g−x1*(g+h)=−q2*g−x1*(g+h);
p2=(x1−x3)*g+x3*(h+g)=−q2*g+x3*(g+h);
p3=(x0+x4)*f+x0*(i−f)=q1*f+x0*(i−f);
p4=(x0+x4)*f+x4*(i−f)=q1*f+x4*(i−f);
p5=x1*(i−f)−(x3−x1)*f=x1*(i−f)−q2*f;
p6=(x3−x1)*f+x3*(i−f)=q2*f+x3*(i−f);
p7=(x0+x4)*g−x4*(h+g)=q1*g−x4*(h+g);
p8=x0*(h+g)−(x0+x4)*g=x0*(h+g)−q1*g.
Consequently, the transform 1002 may be represented as:
X0=p2+p4+x2;
X1=p5+p7−x2;
X2=q1+q2−x2;
X3=p6+p8+x2;
X4=p1+p3+x2.
The complexity of this transform 1002 is now twenty (20) additions and twelve (12) multiplications. The length of the longest path here is four (4) operations.
In an alternative approach, a DCT-IV transform may be derived by mapping it to a DCT-II transform.
For example,
X0=(f*x0+i*x4)+(h*x3+g*x1)+x2;
X1=[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)]−[2*X0];
X2=[2c*(f*x0+i*x4+g*x1+h*x3)+(g*x1+h*x3)−2*x2]−[X1];
X3=[2b*(f*x0−i*x4)−2a*(g*x1−h*x3)]−[X2];
X4=[−2c*(f*x0+i*x4+g*x1+h*x3)−(f*x0+i*x4)+2*x2]−[X3];
Note that intermediate results may be computed as:
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
y2=2c*v2+z2−2*x2;
y4=−2c*v2−z4+2*x2.
Consequently the outputs may be represented as:
X0=v2+x2;
X1=v1−2*X0;
X2=y2−X1;
X3=v3−X2;
X4=y4−X3.
This DCT-IV transform 1402 uses only sixteen (16) additions, nine (9) multiplications, and two (2) shifts. Note that the 2 factors in this transform are a dyadic rational, and so such “multiplication” by 2 is just a binary shift operation (i.e., a shift).
X0=(f*x0+i*x4)+(h*x3+g*x1)+x2;
X1=[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)]−[2*X0];
X2=[(2c+2)*(f*x0+i*x4+g*x1+h*x3)]+(g*x1+h*x3)−[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)];
X3=[2b*(f*x0−i*x4)−2a*(g*x1−h*x3)]−[X2];
X4=[−2c*(f*x0+i*x4+g*x1+h*x3)−(f*x0+i*x4)+2*x2]−[X3].
Note that intermediate results may be computed as:
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
y2=(2c+2)*v2+z2;
y4=2c*v2+z4.
Consequently the outputs may be represented as:
X0=v2+x2;
X1=v1−2*X0;
X2=y2−v1;
X3=v3−X2;
X4=−y4+2*x2−X3.
Consequently, this DCT-IV transform 1502 uses only fifteen (15) additions, ten (10) multiplications, and two (2) shifts. Note that the “2” factors in this transform are a dyadic rational, and so such “multiplication” by 2 is just a binary shift operation (i.e., a shift). The longest path length in this implementation is only five (5) operations.
X0=(f*x0+i*x4)+(h*x3+g*x1)+x2;
X1=[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)]−[2*X0];
X2=[(2c+2)*(g*x1+h*x3)+(2c+2)*(f*x0+i*x4)]−[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)];
X3=[2b*(f*x0−i*x4)−2a*(g*x1−h*x3)]−[X2];
X4=[−(2c+2)*(f*x0+i*x4)−2c*(g*x1+h*x3)+2*x2]−[X3].
Note that intermediate results may be computed as:
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
v1=2b*w1+2a*w0;
v2=z2+z4;
v3=2b*w0−2a*w1;
d2=(2c+2)*z2+(2c+2)*z4;
d4=(2c+2)*z4+2c*z2.
Consequently the outputs may be represented as:
X0=z2+z4+x2;
X1=v1−2*X0;
X2=d2−v1;
X3=v3−X2;
X4=−d4+2*x2−X3.
Consequently, this DCT-IV transform 1602 uses only fifteen (15) additions, eleven (11) multiplications, and two (2) shifts. Note that the 2 factors in this transform are a dyadic rational, and so such “multiplication” by 2 is just a binary shift operation (i.e., a shift). The longest path length in this implementation is only five (5) operations.
X0=(f*x0+i*x4)+(h*x3+g*x1)+x2;
X1=[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)]−[2*X0];
X2=[(2c+2)*(g*x1+h*x3)+(2c+2)*(f*x0+i*x4)]−[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)];
X3=[(2b+2a)*(f*x0−i*x4)−(2a−2b)*(g*x1−h*x3)]−[(2c+2)*(g*x1+h*x3)+(2c+2)*(f*x0+i*x4)];
X4=[−(2c+2)*(f*x0+i*x4)−2c*(g*x1+h*x3)+2*x2]−[X3].
Note that intermediate results may be computed as:
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
z1=2a*w0+2b*w1;
z3=(2b+2a)*w0−(2a−2b)*w1;
d2=(2c+2)*z2+(2c+2)*z4;
d4=(2c+2)*z4+2c*z2.
Consequently the outputs may be represented as:
X0=z2+z4+x2;
X1=z1−2*X0;
X2=d2−z1;
X3=z3−d2;
X4=−d4+2*x2−X3.
Consequently, this DCT-IV transform 1702 uses only fifteen (15) additions, eleven (11) multiplications, and two (2) shifts. Note that the “2” factors in this transform are a dyadic rational, and so such “multiplication” by 2 is just a binary shift operation (i.e., a shift). The longest path length in this implementation is only five (5) operations.
X0=(f*x0+i*x4)+(h*x3+g*x1)+x2;
X1=[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)]−[2*X0];
X2=[(2c+2)*(g*x1+h*x3)+(2c+2)*(f*x0+i*x4)]−[2a*(f*x0−i*x4)+2b*(g*x1−h*x3)];
X3=[(2b+2a)*(f*x0−i*x4)−(2a−2b)*(g*x1−h*x3)]−[(2c+2)*(g*x1+h*x3)+(2c+2)*(f*x0+i*x4)];
X4=[−4(c+1)*(f*x0+i*x4)−4(c+1)*(g*x1+h*x3)+2*x2]−[(2b+2a)*(f*x0−i*x4)−(2a−2b)*(g*x1−h*x3)].
Note that intermediate results may be computed as:
w0=f*x0−i*x4;
w1=g*x1−h*x3;
z2=g*x1+h*x3;
z4=f*x0+i*x4;
z1=2a*w0+2b*w1;
z3=(2b+2a)*w0−(2a−2b)*w1;
r2=(2c+2)*z2+(2c+2)*z4;
r4=4(c+1)*z2+4(c+1)*z4.
Consequently the outputs may be represented as:
X0=z2+z4+x2;
X1=z1−2*X0;
X2=r2−z1;
X3=z3−r2;
X4=−r4+2*x2−z3.
This transform 1802 uses only fifteen (15) additions, twelve (12) multiplications, and two (2) shifts. Note that the multiplications by “2” are considered shifts. The longest path length in this implementation is only five (5) operations.
Note that the DCT and DST transforms illustrated in
Computing Transforms of Sizes N=5*2^K
According to one implementation, an N-sized transform, where N=5*2^K, may be recursively split into a chain of smaller N/2-sized transforms, which can be based on DCT-II, DCT-IV, DST-IV, or similar kernels, and where the last 5-point cascade is implemented by using one of the described fast algorithms for computing 5-point transforms.
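The recursive halving can be illustrated with a small helper that lists the transform sizes visited on the way down to the 5-point core. This is only a sketch of the size bookkeeping, not of the transforms themselves; the function name `decimation_chain` is hypothetical.

```python
def decimation_chain(n):
    """Return the sizes N, N/2, ..., 5 for N = 5 * 2**K.

    Raises ValueError if n is not of the form 5 * 2**K with K >= 0.
    """
    sizes = [n]
    while n > 5:
        if n % 2 != 0:
            raise ValueError("size is not of the form 5 * 2**K")
        n //= 2
        sizes.append(n)
    if n != 5:
        raise ValueError("size is not of the form 5 * 2**K")
    return sizes

chain_320 = decimation_chain(320)
print(chain_320)   # [320, 160, 80, 40, 20, 10, 5]
```

Each halving step corresponds to one stage of the decimation, so a 320-point transform reaches the 5-point core after six stages.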
MDCT Filterbank with Asymmetric Windowing Stage
According to another feature, an asymmetric windowing stage may be implemented as part of an MDCT Filterbank. In some applications, the MDCT Filterbank may be implemented in a scalable speech codec having multiple layers, where some such layers may use the MDCT to transform an error signal from a previous layer. The MDCT of a weighted error signal werr_sp(k) with a 40 millisecond windowing stage is given by:
As opposed to traditional MDCT windows, this window 2302 is not symmetric; as such the second half of the window is different from the time reversed version of the first half. The analysis asymmetric window shape is given by the following equation:
where
and D(n) is defined by
where M=320 denotes the number of MDCT frequency components, and Mz=M/4 is the amount of trailing zeros.
The computation of the MDCT coefficients werr_sp(k) is done by applying window and normalization factors
on input signal werr(n) first, and then computing product by an M×2M matrix T:
by using its decomposition in
$T = C^{IV}_{M}\,S\,P^{T}$
where
is the M×M matrix of DCT-IV transform,
and where $I_{N/2}$ and $J_{N/2}$ denote the N/2×N/2 identity and order reversal matrices, respectively.
Computation of DCT-IV
The computation of DCT-IV of sizes M=5*2^k (k=1, . . . , 6) is done by splitting it into DCT-II transforms of half the size:
where:
PM is a permutation matrix producing reordering
DM is the diagonal sign-alteration matrix
RM is the Givens rotation matrix:
and $C^{II}_{M}$ denotes matrices of the remaining DCT-II transforms:
An example implementation of such a process of splitting a DCT-IV transform of size M=10 into DCT-II transforms of twice-smaller (M=5) is illustrated in
The computation of DCT-II transforms of sizes M=5*2^k (k=1, . . . , 5) may also be done by splitting them into smaller transforms:
An example implementation of such a process of splitting a DCT-II transform of size M=10 into smaller transforms (M=5) is illustrated in
The above process may be repeated recursively until only 5-point transforms remain. The remaining 5-point transforms may be efficiently implemented by
The computation of a 5-point DCT-IV is done via DCT-II as follows:
Finally, the computation of the 5-point DCT-II for an input vector $x = [x_0, x_1, x_2, x_3, x_4]^{T}$
$y = C^{II}_{5}\,x$
is done as follows:
The DCT-II transform 802 may include intermediate results such that:
a0=x0+x4;
a4=x4−x0;
a3=x3−x1;
a1=x3+x1;
b0=a0+a1;
b1=δ(a0−a1);
b2=x2−αb0;
y0=b0+x2;
y1=γa3−βa4;
y2=b1−b2;
y3=βa3+γa4;
y4=b1+b2;
where
An example of the flow-graph for this transform is illustrated in
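The steps listed above can be sketched in code as follows. The multiplier values are not reproduced in the text, so the values used here (α=1/4, β=cos(π/10), γ=−cos(3π/10), δ=√5/4) are assumptions, chosen so that the outputs match an unnormalized 5-point DCT-II with kernel cos((2n+1)kπ/10). The function names are hypothetical.

```python
import math

# Assumed multipliers, not taken from the text.
ALPHA = 0.25                        # dyadic rational, i.e. a shift
BETA = math.cos(math.pi / 10)
GAMMA = -math.cos(3 * math.pi / 10)
DELTA = math.sqrt(5.0) / 4.0

def dct2_5_direct(x):
    """Direct 5-point DCT-II with all normalization factors dropped."""
    return [sum(math.cos((2 * n + 1) * k * math.pi / 10) * x[n] for n in range(5))
            for k in range(5)]

def dct2_5_fast(x):
    """5-point DCT-II using 12 additions, 5 multiplications and 1 shift."""
    x0, x1, x2, x3, x4 = x
    a0 = x0 + x4
    a4 = x4 - x0
    a3 = x3 - x1
    a1 = x3 + x1
    b0 = a0 + a1
    b1 = DELTA * (a0 - a1)
    b2 = x2 - ALPHA * b0
    y0 = b0 + x2
    y1 = GAMMA * a3 - BETA * a4
    y2 = b1 - b2
    y3 = BETA * a3 + GAMMA * a4
    y4 = b1 + b2      # pairs with y2; checked against the direct form below
    return [y0, y1, y2, y3, y4]

sample = [3.0, -1.5, 0.25, 2.0, -4.0]
direct = dct2_5_direct(sample)
fast = dct2_5_fast(sample)
```

With these assumed constants, δ+α equals cos(π/5) and δ−α equals cos(2π/5), which is why a single multiplication by δ and a single shift by α suffice for both even-indexed outputs y2 and y4.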
Example of Encoding Using MDCT Transform
The transform module 2414 may transform the windowed input values 2412 into spectral coefficients 2416 using, for example, a Modified Discrete Cosine Transform (MDCT). The MDCT may be recursively split into at least one of a Discrete Cosine Transform type IV (DCT-IV), a Discrete Cosine Transform type II (DCT-II), or both the DCT-IV and DCT-II, where each such transform is of smaller dimension than the MDCT. In one example, the DCT-II may be a 5-point transform that implements MDCTs of different sizes. The MDCT may implement at least two of 320, 160, 80, 40-point transforms using the same core DCT-II. The components of the device 2402 may be implemented as hardware, software, and/or a combination thereof. For example, the device 2402 may be a processor and/or circuit that implements the functions of its components or modules.
Example of Decoding Using IMDCT Transform
The window module 2612 may produce a modified windowing function that implements an asymmetric window function on the output values 2610 to produce the windowed output values 2614. The components of the device 2602 may be implemented as hardware, software, and/or a combination thereof. For example, the device 2602 may be a processor and/or circuit that implements the functions of its components or modules.
In addition to the examples provided herein, the algorithms described herein that implement decimated transforms may be used to implement any other transform whose size is a multiple of two. Additionally, it should be noted that the techniques described herein may be applied to various types of signals, including audio, voice, video, data, etc.
It should be understood that the intermediate results for the transforms illustrated herein may change if a different point in the flow diagram of the transform is selected. Consequently, greater or fewer intermediate results and/or different intermediate results (e.g., at different points in the flow diagram) are contemplated and within the scope of the transform flow diagrams described and claimed herein.
Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals and the like that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles or any combination thereof.
The various illustrative logical blocks, modules and circuits and algorithm steps described herein may be implemented or performed as electronic hardware, software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. It is noted that the configurations may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
When implemented in hardware, various examples may employ a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core or any other such configuration.
When implemented in software, various examples may employ firmware, middleware or microcode. The program code or code segments to perform the necessary tasks may be stored in a computer-readable medium such as a storage medium or other storage(s). A processor may perform the necessary tasks. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
As used in this application, the terms “component,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In one or more examples herein, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Software may comprise a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.
An exemplary storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the embodiment that is being described, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
One or more of the components, steps, and/or functions illustrated in the Figures may be rearranged and/or combined into a single component, step, or function or embodied in several components, steps, or functions. Additional elements, components, steps, and/or functions may also be added. The apparatus, devices, and/or components illustrated in some Figures may be configured or adapted to perform one or more of the methods, features, or steps described in other Figures. The algorithms described herein may be efficiently implemented in software and/or embedded hardware for example.
It should be noted that the foregoing configurations are merely examples and are not to be construed as limiting the claims. The description of the configurations is intended to be illustrative, and not to limit the scope of the claims. As such, the present teachings can be readily applied to other types of apparatuses and many alternatives, modifications, and variations will be apparent to those skilled in the art.
The Present Application for Patent claims priority to U.S. Provisional Application No. 61/013,579, entitled “Fast Algorithms for Computation of 5-Point DCT-II, DCT-IV, and DST-V, and Architecture for Design of Transforms of Size N=5*2K”, filed Dec. 13, 2007, U.S. Provisional Application No. 61/016,106, entitled “Fast Algorithms for Computation of 5-Point DCT-III, DCT-IV, and DST-V, and Architecture for Design of Transforms of Size N=5*2K”, filed Dec. 21, 2007, and U.S. Provisional Application No. 61/039,194, entitled “G.EV-VBR MDCT Module”, filed Mar. 25, 2008, all assigned to the assignee hereof and hereby expressly incorporated by reference herein.
Number | Date | Country |
---|---|---|
20090157785 A1 | Jun 2009 | US |

Number | Date | Country |
---|---|---|
61013579 | Dec 2007 | US |
61016106 | Dec 2007 | US |
61039194 | Mar 2008 | US |