An audio signal 109, which is provided, for instance, by a microphone 110 and digitized by an analog-to-digital converter 111, is provided to the audio encoder 100. The audio signal 109 comprises a plurality of data symbols. The audio signal 109 is divided into a plurality of blocks, wherein each block comprises a plurality of data symbols of the digital signal, and each block is transformed by a modified discrete cosine transform (MDCT) device 101. The MDCT coefficients are quantized by a quantizer 103 with the help of a perceptual model 102. The perceptual model 102 controls the quantizer 103 in such a way that the audible distortions resulting from the quantization error are low. The quantized MDCT coefficients are subsequently encoded by a bitstream encoder 104, which produces the lossy perceptually coded output bitstream 112.
The bitstream encoder 104 losslessly compresses its input to produce an output which has a lower average bit-rate than its input, using standard methods such as Huffman coding or run-length coding. The input audio signal 109 is also fed into an IntMDCT device 105, which produces IntMDCT coefficients. The quantized MDCT coefficients, which are the output of the quantizer 103, are used to predict the IntMDCT coefficients. The quantized MDCT coefficients are fed into an inverse quantizer 106, and the output (restored or non-quantized MDCT coefficients) is fed into a rounding unit 107.
The rounding unit 107 rounds the supplied MDCT coefficients to integer values, and the residual IntMDCT coefficients, which are the difference between the integer-valued MDCT coefficients and the IntMDCT coefficients, are entropy coded by an entropy encoder 108. The entropy encoder 108, analogous to the bitstream encoder 104, losslessly reduces the average bit-rate of its input and produces a lossless enhancement bitstream 113. The lossless enhancement bitstream 113, together with the perceptually coded bitstream 112, carries the necessary information to reconstruct the input audio signal 109 with minimal error.
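The prediction path through the inverse quantizer 106 and the rounding unit 107 can be sketched as follows. This is only an illustration under stated assumptions: the uniform quantizer step `step`, the sign convention of the residual (IntMDCT coefficient minus prediction), the round-half-up rule and all function names are hypothetical and are not taken from the description.

```python
import math


def round_half_up(value):
    # Round to the nearest integer; the tie-breaking rule is an assumption.
    return math.floor(value + 0.5)


def lossless_residual(int_mdct, quant_indices, step):
    """Residual IntMDCT coefficients that would be passed to the entropy encoder."""
    residual = []
    for coeff, index in zip(int_mdct, quant_indices):
        restored = index * step              # inverse quantizer (uniform step assumed)
        predicted = round_half_up(restored)  # rounding unit
        residual.append(coeff - predicted)   # assumed sign convention
    return residual


def restore_int_mdct(residual, quant_indices, step):
    """Decoder side: add the same prediction back to the residual."""
    return [r + round_half_up(q * step) for r, q in zip(residual, quant_indices)]
```

Because the decoder can form the same integer prediction from the perceptually coded bitstream, adding the residual back recovers the IntMDCT coefficients exactly in this sketch.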
The lossless enhancement bitstream 208 is supplied to an entropy decoder 204, which performs the inverse operations to the operations of the entropy encoder 108 of
As explained above, the DCT-IV algorithm plays an important role in lossless audio coding.
The transformation function of the DCT-IV comprises the transformation matrix CNIV. According to this embodiment of the invention, the transforming element corresponds to a block-diagonal matrix comprising two blocks, wherein each block comprises the transformation matrix CNIV.
So, in this embodiment, the transformation matrix corresponding to the transforming element according to the invention is:
CNIV shall in the context of this embodiment be referred to as the transformation matrix henceforth.
In this embodiment of the invention, wherein DCT-IV is the transformation function, the number of lifting matrices, and hence the number of lifting stages in the transforming element, is three.
The DCT-IV of an N-point real input sequence x(n) is defined as follows:
Let CNIV be the transformation matrix of DCT-IV, that is
The following relation holds for the inverse DCT-IV matrix:
(CNIV)−1=CNIV (3)
In particular, the matrix CNIV is involutory.
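The involutory property in (3) can be checked numerically. The sketch below assumes the standard DCT-IV matrix with entries sqrt(2/N)·cos((m+½)(n+½)π/N); the helper names `dct4_matrix` and `matmul` are illustrative only.

```python
import math


def dct4_matrix(n):
    # Standard orthonormal DCT-IV matrix (assumed definition).
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos((m + 0.5) * (k + 0.5) * math.pi / n)
             for k in range(n)]
            for m in range(n)]


def matmul(a, b):
    # Plain matrix product of two square lists of lists.
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def max_deviation_from_identity(n):
    # Largest entry of |C*C - I|; close to zero if C is its own inverse.
    c = dct4_matrix(n)
    product = matmul(c, c)
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

For small sizes such as N = 8 the deviation stays at the level of floating-point round-off, which is what (3) predicts.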
With x=[x(n)]n=0,1, . . . ,N−1 and y=[y(m)]m=0,1, . . . ,N−1, equation (1) can be expressed as
y=CNIVx (4)
Now, let x1, x2 be two integer N×1 column vectors. The column vectors x1, x2 correspond to two blocks of the digital signal which, according to the invention, are transformed by one transforming element. The DCT-IV transforms of x1, x2 are y1, y2, respectively.
y1=CNIVx1 (5)
y2=CNIVx2 (6)
Combining (5) and (6):
The above diagonal matrix is the block-diagonal matrix that the transforming element, according to the invention, corresponds to.
It is within the scope of the invention if the above equation is changed by simple algebraic modifications like the one leading to
Let T2N be the counter diagonal matrix in (8), that is
The matrix T2N can be factorised as follows:
where IN is the N×N identity matrix.
Equation (10) can be easily verified using the DCT-IV property in (3). Using (10), Equation (8) can be expressed as
The three lifting matrices in equation (11) correspond to the three lifting stages shown in
From (11), the following integer DCT-IV algorithm that computes two integer DCT-IVs with one transforming element is derived.
The three lifting stages illustrated in
As illustrated by
In the first stage 401, x2 is transformed by a DCT-IV transformation 402 and the DCT-IV coefficients are rounded 403. The rounded DCT-IV coefficients are then added to x1 404. Thus, the intermediate signal z is generated. So, the intermediate signal z fulfils the equation:
$$z = \lfloor C_N^{IV} x_2 \rfloor + x_1 \qquad (12a)$$
In the second stage 405, z is transformed by a DCT-IV transformation 406 and the DCT-IV coefficients are rounded 407. From the rounded DCT-IV coefficients x2 is then subtracted. Thus, the output signal y1 is generated. So, the output signal y1 fulfils the equation:
$$y_1 = \lfloor C_N^{IV} z \rfloor - x_2 \qquad (12b)$$
In the third stage 409, y1 is transformed by a DCT-IV transformation 410 and the DCT-IV coefficients are rounded 411. The rounded DCT-IV coefficients are then subtracted from z. Thus, the output signal y2 is generated. So, the output signal y2 fulfils the equation:
$$y_2 = -\lfloor C_N^{IV} y_1 \rfloor + z \qquad (12c)$$
where $\lfloor \cdot \rfloor$ denotes the rounding operation.
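The three forward stages (12a) to (12c) can be sketched in a few lines. The sketch assumes the standard DCT-IV definition, a direct O(N²) evaluation instead of a fast algorithm, and round-to-nearest (half up) as the rounding operation; the function names are illustrative.

```python
import math


def dct4(x):
    # Direct DCT-IV of a sequence (standard definition, O(N^2) for clarity).
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(x[k] * math.cos((m + 0.5) * (k + 0.5) * math.pi / n)
                        for k in range(n))
            for m in range(n)]


def round_vec(values):
    # Element-wise rounding to the nearest integer (tie rule is an assumption).
    return [math.floor(v + 0.5) for v in values]


def int_dct4_pair(x1, x2):
    """Integer DCT-IV of two integer blocks using three lifting stages."""
    z = [r + a for r, a in zip(round_vec(dct4(x2)), x1)]    # (12a)
    y1 = [r - b for r, b in zip(round_vec(dct4(z)), x2)]    # (12b)
    y2 = [c - r for r, c in zip(round_vec(dct4(y1)), z)]    # (12c)
    return y1, y2
```

In this sketch y1 and y2 are integer vectors that approximate the DCT-IV of x1 and x2 respectively; the only deviation from the floating-point transform comes from the three rounding steps.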
As illustrated by
In the first stage 501, y1 is transformed by a DCT-IV transformation 502 and the DCT-IV coefficients are rounded 503. The rounded DCT-IV coefficients are then added to y2 504. Thus, the intermediate signal z is generated. So, the intermediate signal z fulfils the equation:
$$z = \lfloor C_N^{IV} y_1 \rfloor + y_2 \qquad (13a)$$
In the second stage 505, z is transformed by a DCT-IV transformation 506 and the DCT-IV coefficients are rounded 507. From the rounded DCT-IV coefficients y1 is then subtracted. Thus, the signal x2 is generated. So, the signal x2 fulfils the equation:
$$x_2 = \lfloor C_N^{IV} z \rfloor - y_1 \qquad (13b)$$
In the third stage 509, x2 is transformed by a DCT-IV transformation 510 and the DCT-IV coefficients are rounded 511. The rounded DCT-IV coefficients are then subtracted from z. Thus, the signal x1 is generated. So, the signal x1 fulfils the equation:
$$x_1 = -\lfloor C_N^{IV} x_2 \rfloor + z \qquad (13c)$$
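The inverse stages (13a) to (13c) undo the forward stages one by one, which the following round-trip sketch illustrates. It repeats a compact forward transform so that it is self-contained, and makes the same assumptions as before: standard DCT-IV definition, direct evaluation, round half up, and illustrative function names.

```python
import math


def dct4(x):
    # Direct DCT-IV (standard definition, O(N^2)).
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(x[k] * math.cos((m + 0.5) * (k + 0.5) * math.pi / n)
                        for k in range(n))
            for m in range(n)]


def rounded_dct4(x):
    # DCT-IV followed by element-wise rounding (tie rule is an assumption).
    return [math.floor(v + 0.5) for v in dct4(x)]


def int_dct4_forward(x1, x2):
    z = [r + a for r, a in zip(rounded_dct4(x2), x1)]     # (12a)
    y1 = [r - b for r, b in zip(rounded_dct4(z), x2)]     # (12b)
    y2 = [c - r for r, c in zip(rounded_dct4(y1), z)]     # (12c)
    return y1, y2


def int_dct4_inverse(y1, y2):
    z = [r + b for r, b in zip(rounded_dct4(y1), y2)]     # (13a)
    x2 = [r - a for r, a in zip(rounded_dct4(z), y1)]     # (13b)
    x1 = [c - r for r, c in zip(rounded_dct4(x2), z)]     # (13c)
    return x1, x2
```

Each inverse stage recomputes exactly the rounded term that the corresponding forward stage added or subtracted, so the reconstruction is bit-exact as long as both sides evaluate the DCT-IV and the rounding identically.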
It can be seen that the algorithm according to the equations (13a) to (13c) is inverse to the algorithm according to the equations (12a) to (12c). Thus, if used in the encoder and decoder illustrated in
In an embodiment of the invention explained below, the method described above is used for an image archiving system.
The equations (12a) to (12c) and (13a) to (13c) further show that to compute two N×N integer DCT-IVs, three N×N DCT-IVs, three N×1 roundings, and three N×1 additions are needed. Therefore, for one N×N integer DCT-IV, the average is:
RC(N)=1.5N (14)
AC(N)=1.5AC(CNIV)+1.5N (15)
where RC(.) is the total number of rounding operations, and AC(.) is the total number of arithmetic operations. Compared to the directly converted integer DCT-IV algorithms, the proposed integer DCT-IV algorithm reduces RC from the order of N log2 N to the order of N.
As indicated by (15), the arithmetic complexity of the proposed integer DCT-IV algorithm is about 50 percent more than that of a DCT-IV algorithm. However, if RC is also considered, the combined complexity (AC+RC) of the proposed algorithm does not much exceed that of the directly converted integer algorithms. Exact analysis of the algorithm complexity depends on the DCT-IV algorithm used.
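To give a feel for the rounding counts, the sketch below evaluates (14) next to the N·log2 N order quoted for directly converted algorithms. The second figure is only an order of magnitude, since the description gives no constant factor for it; the function name is illustrative.

```python
import math


def rounding_counts(n):
    """Return (proposed RC per block, N*log2(N) order for direct conversion)."""
    proposed = 1.5 * n                 # equation (14)
    direct_order = n * math.log2(n)    # order of magnitude only
    return proposed, direct_order
```

For N = 1024 this gives 1536 rounding operations per block for the proposed algorithm, against a quantity of the order of 10240 for a directly converted algorithm.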
As shown in
In
Such lossless archiving of image signals is important, for example, in the case that the images are error maps of semiconductor wafers and have to be stored for later analysis.
In this embodiment of the invention, the embodiment of the method illustrated in
The method according to the invention is not limited to audio or image signals. Other digital signals, for example video signals, can as well be transformed by the method according to the invention.
In the following, a further embodiment of the method for the transformation of a digital signal from the time domain to the frequency domain and vice versa according to the invention is explained.
In this embodiment of the present invention, the domain transformation is a DCT transform, whereby the block size N is some integer. In one embodiment, N is a power of two.
Let CNII be the N×N transform matrix of DCT (also called Type-II DCT):
and N is the transform size. m and n are matrix indices.
Let CNIV be the N×N transform matrix of type-IV DCT, as already defined above:
$$C_N^{IV} = \sqrt{\frac{2}{N}}\left[\cos\left(\frac{(m+\tfrac{1}{2})(n+\tfrac{1}{2})\pi}{N}\right)\right]_{m,n=0,1,\ldots,N-1} \qquad (18)$$
As above, a plurality of lifting matrices will be used, which lifting matrices are in this embodiment 2N×2N matrices of the following form:
where IN is the N×N identity matrix, ON is the N×N zero matrix, AN is an arbitrary N×N matrix.
For each lifting matrix L2N, the lifting stage, a reversible integer-to-integer mapping, is realized in the same way as the 2×2 lifting step described in the incorporated reference “Factoring Wavelet Transforms into Lifting Steps,” Tech. Report, I. Daubechies and W. Sweldens, Bell Laboratories, Lucent Technologies, 1996. The only difference is that rounding is applied to a vector instead of a single variable.
In the above description of the other embodiments, it was already detailed how a lifting stage is realized for a lifting matrix, so the explanation of the lifting stages corresponding to the lifting matrices will be omitted in the following.
One sees that the transpose of L2N, L2NT, is also a lifting matrix.
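A single lifting stage with vector rounding can be sketched as below. Since the general form of L2N is not reproduced here, the sketch assumes a block-triangular matrix with identity blocks on the diagonal and an arbitrary real N×N block `a` off the diagonal; the function names and the round-half-up rule are likewise assumptions.

```python
import math


def matvec(a, x):
    # Product of a matrix (list of rows) with a vector.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def round_vec(values):
    # Rounding applied to a whole vector instead of a single variable.
    return [math.floor(v + 0.5) for v in values]


def lift_forward(a, x1, x2):
    """Integer version of [[I, A], [O, I]] applied to the stacked vector [x1; x2]."""
    update = round_vec(matvec(a, x2))
    return [p + q for p, q in zip(x1, update)], list(x2)


def lift_inverse(a, y1, y2):
    """Undo lift_forward by subtracting the identically rounded update."""
    update = round_vec(matvec(a, y2))
    return [p - q for p, q in zip(y1, update)], list(y2)
```

The second half of the vector passes through unchanged, so the inverse can recompute exactly the same rounded update and subtract it. This holds for any real matrix `a`, which is why the mapping is reversible on integers.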
In this embodiment, the transforming element corresponds to a matrix, T2N which is defined as a 2N×2N matrix in the following way:
The decomposition of the matrix T2N into lifting matrices has the following form:
$$T_{2N} = P3 \cdot L8 \cdot L7 \cdot L6 \cdot P2 \cdot L5 \cdot L4 \cdot L3 \cdot L2 \cdot L1 \cdot P1 \qquad (21)$$
The matrices constituting the right hand side of the above equation will be explained in the following.
P1 is a first permutation matrix given by the equation
where JN is the N×N counter index matrix given by
and DN is an N×N diagonal matrix whose diagonal elements are alternately 1 and −1:
P2 is a second permutation matrix, an example of which is generated by the following MATLAB script:
P3 is a third permutation matrix, an example of which is generated by the following MATLAB script:
As an example, when N is 4, P3 is an 8×8 matrix given as
L1 is a first lifting matrix
where Z1N is an N×N counter diagonal matrix given as:
L2 is a second lifting matrix:
where Z2N is an N×N counter diagonal matrix given as:
L3 is a third lifting matrix:
where:
$$Z3_N = \sqrt{2}\,C_N^{IV} + I_N + Z1_N \qquad (32)$$
L4 is a fourth lifting matrix:
where:
$$Z4_N = C_N^{IV} / \sqrt{2} \qquad (34)$$
L5 is a fifth lifting matrix:
where:
$$Z5_N = -\left(\sqrt{2}\,C_N^{IV} + I_N\right) \qquad (36)$$
L6 is a sixth lifting matrix:
where Z6N is an N×N counter diagonal matrix given as:
L7 is a seventh lifting matrix:
where Z7N is an N×N counter diagonal matrix given as:
L8 is an eighth lifting matrix:
L8=L6 (41)
thus, resulting in the factorization as shown in (42):
$$T_{2N} = P3 \cdot L8 \cdot L7 \cdot L6 \cdot P2 \cdot L5 \cdot L4 \cdot L3 \cdot L2 \cdot L1 \cdot P1 \qquad (42)$$
where P1, P2, and P3 are three permutation matrices. Lj, j from 1 to 8, are eight lifting matrices.
The lifting matrices L3, L4 and L5 comprise an auxiliary transformation matrix, which is, in this case, the transformation matrix CNIV itself.
From Eq. (42), it is possible to compute the integer DCT for two input signals of dimension N×1.
As Eq. (42) provides a lifting matrix factorization which describes the DCT-IV transformation domain, its lifting matrices can be used in the manner shown herein to compute the domain transformation of an applied input signal.
The equation (42) can be derived in the following way.
The following decomposition can be derived using the disclosure from Wang, Zhongde, “On Computing the Discrete Fourier and Cosine Transforms”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-33, No. 4, October 1985:
is known, wherein SN/2II denotes the transformation matrix of the discrete sine transform of type 2,
Equation (85) can be combined with the equation
wherein PEO is an even-odd permutation matrix,
After transposition equation (45) converts to:
The combination of (43) and (46) yields:
where:
In this embodiment, the computation of the domain transformation requires only 4N rounding operations, as will now be explained:
Let α(*) be the number of real additions, μ(*) be the number of real multiplications, and γ(*) be the number of real roundings, respectively. For the proposed IntDCT algorithm, one gets:
α(IntDCT)=11N+3α(DCT−IV)
μ(IntDCT)=9N+3μ(DCT−IV)
γ(IntDCT)=8N
The above results are for two blocks of data samples, because the proposed IntDCT algorithm processes them together. Thus, for one block of data samples, the numbers of calculations are halved:
α1(IntDCT)=5.5N+1.5α(DCT−IV)
μ1(IntDCT)=4.5N+1.5μ(DCT−IV)
γ1(IntDCT)=4N
where α1, μ1, and γ1 are the number of real additions, number of real multiplications, and number of real roundings, for one block of samples, respectively.
For DCT-IV calculation, the FFT-based algorithm described in the incorporated reference “Signal Processing with Lapped Transforms,” H. S. Malvar, Norwood, Mass.: Artech House, 1992, pp. 199-201 can be used, for which
α(DCT−IV)=1.5Nlog2N
μ(DCT−IV)=0.5Nlog2N+N
Consequently:
α1(IntDCT)=2.25Nlog2N+5.5N
μ1(IntDCT)=0.75Nlog2N+6N
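As a worked check of the per-block counts, halving the two-block totals and substituting the DCT-IV costs quoted above gives the following. This only spells out the arithmetic and introduces no new quantities.

```latex
\begin{aligned}
\alpha_1(\mathrm{IntDCT}) &= \tfrac{1}{2}\bigl(11N + 3\,\alpha(\mathrm{DCT\text{-}IV})\bigr)
  = 5.5N + 1.5\,(1.5N\log_2 N) = 2.25N\log_2 N + 5.5N \\
\mu_1(\mathrm{IntDCT}) &= \tfrac{1}{2}\bigl(9N + 3\,\mu(\mathrm{DCT\text{-}IV})\bigr)
  = 4.5N + 1.5\,(0.5N\log_2 N + N) = 0.75N\log_2 N + 6N \\
\gamma_1(\mathrm{IntDCT}) &= \tfrac{1}{2}\,(8N) = 4N
\end{aligned}
```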
In the following, a further embodiment of the method for the transformation of a digital signal from the time domain to the frequency domain and vice versa according to the invention is explained.
In this embodiment, a fast Fourier transform (FFT) is used as the domain transformation.
Let F be the N×N transform matrix of the normalized FFT
where N is the transform size. m and n are matrix indices.
Under this embodiment, a permutation matrix P of dimension N×N is a matrix whose entries are 0 or 1, with exactly one 1 in each row and each column. After multiplying it with an N×1 vector (the matrix representation of the input signal), the order of the elements in the vector is changed.
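The effect of a permutation matrix on a vector can be illustrated with a small example. The 3×3 matrix below is an arbitrary illustration and is not one of the permutation matrices of the factorization; the function names are hypothetical.

```python
def apply_permutation(p, x):
    # Multiply the 0/1 matrix p (list of rows) by the column vector x.
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in p]


def transpose(p):
    # Transpose of a matrix given as a list of rows.
    return [list(col) for col in zip(*p)]


P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
x = [10, 20, 30]

y = apply_permutation(P, x)                  # same elements, new order
x_back = apply_permutation(transpose(P), y)  # the transpose restores the order
```

Because a permutation only reorders integer samples, it involves no rounding and is trivially reversible through its transpose.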
In this embodiment, lifting matrices are defined as 2N×2N matrices of the following form:
where P1 and P2 are two permutation matrices, O is the N×N zero matrix, and A is an arbitrary N×N matrix. For a lifting matrix L, a reversible integer-to-integer mapping is realized in the same way as the 2×2 lifting step in the aforementioned incorporated reference of I. Daubechies. As above, however, rounding is applied to a vector instead of a single variable. It is apparent that the transpose of L, LT, is also a lifting matrix.
Further, let T be a 2N×2N transform matrix:
Accordingly, the modified transform matrix T (and accordingly the domain transformation itself) can be expressed as the lifting matrix factorization:
where I is the N×N identity matrix, and Q is an N×N permutation matrix given as:
and O1×N−1 and ON−1×1 are row and column vectors of N−1 zeros respectively.
J is the (N−1)×(N−1) counter index matrix given by
In Eq. (53), blank space in the square bracket represents all zeros matrix elements.
As can be seen from Eq. (51), the lifting matrix factorization can be used to compute the integer FFT for two N×1 complex vectors using the methods described herein.
Under this embodiment, the computation of the domain transformation requires only 3N rounding operations, as will now be explained:
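Let α(*) be the number of real additions, μ(*) be the number of real multiplications, and γ(*) be the number of real rounding operations, respectively.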
For the proposed IntFFT algorithm, we have
α(IntFFT)=6N+3α(FFT)
μ(IntFFT)=3μ(FFT)
γ(IntFFT)=6N
The above results are for two blocks of data samples, because the proposed IntFFT algorithm processes them together. Thus, for one block of data samples, the numbers of calculations are halved:
α1(IntFFT)=3N+1.5α(FFT)
μ1(IntFFT)=1.5μ(FFT)
γ1(IntFFT)=3N
where α1, μ1, and γ1 are the number of real additions, number of real multiplications, and number of real rounding operations for one block of samples, respectively.
For FFT calculation, the split-radix FFT (SRFFT) algorithm can be used, for which:
α(SRFFT)=3Nlog2N−3N+4
μ(SRFFT)=Nlog2N−3N+4
Consequently, we have:
α1(IntFFT)=4.5Nlog2N−1.5N+6
μ1(IntFFT)=1.5Nlog2N−4.5N+6
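The per-block IntFFT counts can be evaluated for a concrete block size with a short helper. The sketch simply codes the formulas quoted above for the split-radix FFT and the halved two-block totals; the function names are illustrative.

```python
import math


def srfft_ops(n):
    """Real additions and multiplications of the split-radix FFT, as quoted."""
    log_n = math.log2(n)
    additions = 3 * n * log_n - 3 * n + 4
    multiplications = n * log_n - 3 * n + 4
    return additions, multiplications


def int_fft_ops_per_block(n):
    """Per-block (additions, multiplications, roundings) of the IntFFT."""
    additions, multiplications = srfft_ops(n)
    return 3 * n + 1.5 * additions, 1.5 * multiplications, 3 * n
```

For N = 1024 this yields 44550 real additions, 10758 real multiplications and 3072 rounding operations per block, in agreement with the closed-form expressions for α1 and μ1.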
Specifically, the MSEs for IntDCT and integer inverse DCT (IntIDCT) are given as:
where the error signal e is ef for IntDCT, and et for IntIDCT as in
The MSEs for IntFFT and integer inverse FFT (IntIFFT) are given as
where the error signal e is ef for IntFFT, and et for IntIFFT as in
represents norm of a complex value. K is the total number of sample blocks used in the evaluation.
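A block-averaged MSE of this kind can be computed as sketched below. The exact MSE expressions are not reproduced here, so the sketch assumes the common form in which the squared magnitude of the error is averaged over the samples of each block and then over the K blocks; the function name and this normalization are assumptions.

```python
def block_mse(error_blocks):
    """Mean squared error over K blocks of real or complex error samples."""
    total = 0.0
    for block in error_blocks:
        # abs() is the magnitude for complex values and the absolute value for reals.
        total += sum(abs(e) ** 2 for e in block) / len(block)
    return total / len(error_blocks)
```

The same function covers the real-valued DCT case and the complex-valued FFT case, since the magnitude of a complex error sample is used in place of its absolute value.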
For both domain transformations, a total of 450 seconds with 15 different types of music files are used in the 48 kHz/16-bit test set. Table I shows the test results.
As can be seen from Table I, the MSE generated using the systems and methods of the present invention is very small and, unlike conventional systems, is substantially independent of the processing block size. Referring to the DCT-IV domain transformation, the MSE only slightly increases with increasing block size N up to 4096 bits. The MSEs of the FFT are even better, exhibiting a constant MSE of 0.4 for block sizes up to 4096 bits. When the demonstrated performance of the present invention is viewed in light of the present capabilities and increasing need for longer block sizes, the advantages of the present invention become clear.
The following documents are herein incorporated by reference:
“Coding of Moving Pictures and Audio: Work plan for Evaluation of Integer MDCT for FGS to Lossless Experimentation Framework”, ISO/IEC JTC 1/SC 29/WG 11 N5578, Pattaya, Thailand, March 2003.
This application claims the benefit of priority of U.S. Provisional Application No. 60/507,210, filed 29 Sep. 2003, and U.S. Provisional Application No. 60/507,440, filed 29 Sep. 2003, the contents of each being hereby incorporated by reference in its entirety for all purposes. Further, the following commonly-owned applications are concurrently filed herewith and are herein incorporated in their entirety: “Method for Performing a Domain Transformation of a Digital Signal from the Time Domain into the Frequency Domain and Vice Versa,” Atty. Docket No. P100442, and “Process and Device for Determining a Transforming Element for a Given Transformation Function, Method and Device for Transforming a Digital Signal from the Time Domain into the Frequency Domain and vice versa and Computer Readable Medium,” Atty. Docket No. P100452.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SG04/00121 | 5/6/2004 | WO | 00 | 4/16/2007 |
Number | Date | Country |
---|---|---|
60507210 | Sep 2003 | US |
60507440 | Sep 2003 | US |