1. Field
The present disclosure relates generally to processing, and more specifically to techniques for performing transforms on data.
2. Background
Transforms are commonly used to convert data from one domain to another domain. For example, the discrete cosine transform (DCT) is commonly used to transform data from the spatial domain to the frequency domain, and the inverse discrete cosine transform (IDCT) is commonly used to transform data from the frequency domain to the spatial domain. DCT is widely used for image/video compression to spatially decorrelate blocks of picture elements (pixels) in images or video frames. The resulting transform coefficients are typically much less dependent on each other, which makes these coefficients more suitable for quantization and encoding. DCT also exhibits an energy compaction property, which is the ability to map most of the energy of a block of pixels to only a few (typically low-order) transform coefficients. This energy compaction property can simplify the design of encoding algorithms.
Transforms such as DCT and IDCT may be performed on large quantities of data. Hence, it is desirable to perform transforms as efficiently as possible. Furthermore, it is desirable to perform computation for transforms using simple hardware in order to reduce cost and complexity.
There is therefore a need in the art for techniques to efficiently perform transforms on data.
Techniques for efficiently performing transforms on data are described herein. According to an aspect, an apparatus performs multiplication of a first group of at least one data value with a first group of at least one rational dyadic constant that approximates a first group of at least one irrational constant scaled by a first common factor. The apparatus further performs multiplication of a second group of at least one data value with a second group of at least one rational dyadic constant that approximates a second group of at least one irrational constant scaled by a second common factor. Each rational dyadic constant is a rational number with a dyadic denominator. The first and second groups of at least one data value have different sizes. For example, the first group may include two data values, and the second group may include four data values.
According to another aspect, an apparatus performs multiplication of at least one data value with at least one rational dyadic constant that approximates at least one irrational constant scaled by a common factor. The common factor is selected based on the number of logical and arithmetic operations required for the multiplication of the at least one data value with the at least one rational dyadic constant. The logical and arithmetic operations may comprise shift, subtract, and add operations. The common factor may further be selected based on the precision of the results.
Various aspects and features of the disclosure are described in further detail below.
The techniques described herein may be used for various types of transforms such as DCT, IDCT, discrete Fourier transform (DFT), inverse DFT (IDFT), modulated lapped transform (MLT), inverse MLT, modulated complex lapped transform (MCLT), inverse MCLT, etc. The techniques may also be used for various applications such as image, video, and audio processing, communication, computing, data networking, data storage, graphics, etc. In general, the techniques may be used for any application that uses a transform. For clarity, the techniques are described below for DCT and IDCT, which are commonly used in image and video processing.
A one-dimensional (1D) N-point DCT and a 1D N-point IDCT of type II may be defined as follows:
x[n] is a 1D spatial domain function, and
X[k] is a 1D frequency domain function.
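For reference, one standard normalization of the N-point type-II DCT and IDCT pair, consistent with the description of equations (1) and (2), is sketched below; the exact scaling used in the original equations may differ.

\[
X[k] \;=\; \sqrt{\tfrac{2}{N}}\, c(k) \sum_{n=0}^{N-1} x[n]\,
\cos\!\left(\frac{(2n+1)\,k\,\pi}{2N}\right), \qquad k = 0,\ldots,N-1,
\]
\[
x[n] \;=\; \sqrt{\tfrac{2}{N}} \sum_{k=0}^{N-1} c(k)\, X[k]\,
\cos\!\left(\frac{(2n+1)\,k\,\pi}{2N}\right), \qquad n = 0,\ldots,N-1,
\]
\[
\text{where } c(k) = \tfrac{1}{\sqrt{2}} \text{ for } k = 0
\text{ and } c(k) = 1 \text{ otherwise.}
\]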
The 1D DCT in equation (1) operates on N spatial domain values x[0] through x[N−1] and generates N transform coefficients X[0] through X[N−1]. The 1D IDCT in equation (2) operates on N transform coefficients and generates N spatial domain values. Type II DCT is one type of transform and is commonly believed to be one of the most efficient transforms among several energy compacting transforms proposed for image/video compression.
The 1D DCT may be used for a two-dimensional (2D) DCT, as described below. Similarly, the 1D IDCT may be used for a 2D IDCT. By decomposing the 2D DCT/IDCT into a cascade of 1D DCTs/IDCTs, the efficiency of the 2D DCT/IDCT is dependent on the efficiency of the 1D DCT/IDCT. In general, 1D DCT and 1D IDCT may be performed on any vector size, and 2D DCT and 2D IDCT may be performed on any block size. However, 8×8 DCT and 8×8 IDCT are commonly used for image and video processing, where N is equal to 8. For example, 8×8 DCT and 8×8 IDCT are used as standard building blocks in various image and video coding standards such as JPEG, MPEG-1, MPEG-2, MPEG-4 (Part 2), H.261, H.263, etc.
The 1D DCT and 1D IDCT may be implemented in their original forms shown in equations (1) and (2), respectively. However, substantial reduction in computational complexity may be realized by finding factorizations that result in as few multiplications and additions as possible. A factorization for a transform may be represented by a flow graph that indicates specific operations to be performed for that transform.
Cπ/4=cos(π/4)≈0.707106781,
C3π/8=cos(3π/8)≈0.382683432, and
S3π/8=sin(3π/8)≈0.923879533.
Flow graph 100 receives eight scaled transform coefficients A0·X[0] through A7·X[7], performs an 8-point IDCT on these coefficients, and generates eight output samples x[0] through x[7]. A0 through A7 are scale factors and are given below:
Flow graph 100 includes a number of butterfly operations. A butterfly operation receives two input values and generates two output values, where one output value is the sum of the two input values and the other output value is the difference of the two input values. For example, the butterfly operation on input values A0·X[0] and A4·X[4] generates an output value A0·X[0]+A4·X[4] for the top branch and an output value A0·X[0]−A4·X[4] for the bottom branch.
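The butterfly maps directly to one addition and one subtraction. The following minimal C sketch (illustrative only, not taken from the flow graph itself) shows the operation:

    /* Butterfly: produces the sum on the top branch and the difference on the
       bottom branch of the flow graph. */
    void butterfly(int a, int b, int *top, int *bottom)
    {
        *top    = a + b;   /* e.g., A0*X[0] + A4*X[4] */
        *bottom = a - b;   /* e.g., A0*X[0] - A4*X[4] */
    }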
The flow graphs for the IDCT and DCT in
The factorization shown in
The multiplications in
In an aspect, common factors are used to reduce the total number of operations for a transform and/or to improve the precision of the transform results. A common factor is a constant that is applied to one or more intermediate variables in a transform. An intermediate variable may also be referred to as a data value, etc. A common factor may be absorbed with one or more transform constants and may also be accounted for by altering one or more scale factors. A common factor may improve the approximation of one or more (irrational) transform constants by one or more rational dyadic constants, which may then result in a fewer total number of operations and/or improved precision.
In general, any number of common factors may be used for a transform, and each common factor may be applied to any number of intermediate variables in the transform. In one design, multiple common factors are used for a transform and are applied to multiple groups of intermediate variables of different sizes. In another design, multiple common factors are applied to multiple groups of intermediate variables of the same size.
A first common factor F1 is applied to a first group of two intermediate variables X1 and X2, which is generated based on transform coefficients X[2] and X[6]. The first common factor F1 is multiplied with X1, is absorbed with transform constant Cπ/4, and is accounted for by altering scale factors A2 and A6. A second common factor F2 is applied to a second group of four intermediate variables X3 through X6, which is generated based on transform coefficients X[1], X[3], X[5] and X[7]. The second common factor F2 is multiplied with X4, is absorbed with transform constants Cπ/4, C3π/8 and S3π/8, and is accounted for by altering scale factors A1, A3, A5 and A7.
The first common factor F1 may be approximated with a rational dyadic constant α1, which may be multiplied with X1 to obtain an approximation of the product X1·F1. A scaled transform factor F1·Cπ/4 may be approximated with a rational dyadic constant β1, which may be multiplied with X2 to obtain an approximation of the product X2·F1·Cπ/4. An altered scale factor A2/F1 may be applied to transform coefficient X[2]. An altered scale factor A6/F1 may be applied to transform coefficient X[6].
The second common factor F2 may be approximated with a rational dyadic constant α2, which may be multiplied with X4 to obtain an approximation of the product X4·F2. A scaled transform factor F2·Cπ/4 may be approximated with a rational dyadic constant β2, which may be multiplied with X3 to obtain an approximation of the product X3·F2·Cπ/4. A scaled transform factor F2·C3π/8 may be approximated with a rational dyadic constant γ2, and a scaled transform factor F2·S3π/8 may be approximated with a rational dyadic constant δ2. Rational dyadic constant γ2 may be multiplied with X5 to obtain an approximation of the product X5·F2·C3π/8 and also with X6 to obtain an approximation of the product X6·F2·C3π/8. Rational dyadic constant δ2 may be multiplied with X5 to obtain an approximation of the product X5·F2·S3π/8 and also with X6 to obtain an approximation of the product X6·F2·S3π/8. Altered scale factors A1/F2, A3/F2, A5/F2 and A7/F2 may be applied to transform coefficients X[1], X[3], X[5] and X[7], respectively.
Six rational dyadic constants α1, β1, α2, β2, γ2 and δ2 may be defined for six constants, as follows:
α1≈F1, β1≈F1·cos(π/4),
α2≈F2, β2≈F2·cos(π/4), γ2≈F2·cos(3π/8), and δ2≈F2·sin(3π/8). Eq (3)
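To illustrate how these constants might be applied, the following hypothetical C sketch performs the multiplications for the first group of equation (3); the helper mul_dyadic() and all parameter names are assumptions for illustration, and in an actual design each such product would be unrolled into shift and add operations as described later.

    /* Multiply x by the rational dyadic constant c/2^b (illustrative helper;
       assumes arithmetic right shift of signed values). */
    static int mul_dyadic(int x, int c, int b)
    {
        return (x * c) >> b;
    }

    /* First group of Eq (3): alpha1 = a1_c/2^a1_b approximates F1, and
       beta1 = b1_c/2^b1_b approximates F1*cos(pi/4). */
    static void scale_first_group(int *X1, int *X2,
                                  int a1_c, int a1_b, int b1_c, int b1_b)
    {
        *X1 = mul_dyadic(*X1, a1_c, a1_b);   /* ~ X1 * F1             */
        *X2 = mul_dyadic(*X2, b1_c, b1_b);   /* ~ X2 * F1 * cos(pi/4) */
    }

The second group of equation (3) would be handled in the same way with α2, β2, γ2 and δ2.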
A first common factor Fa is applied to a first group of two intermediate variables Xa and Xb, which is used to generate transform coefficients X[2] and X[6]. The first common factor Fa is multiplied with Xa, is absorbed with transform constant 1/Cπ/4, and is accounted for by altering scale factors A2 and A6. A second common factor Fb is applied to a second group of four intermediate variables Xc through Xf, which is used to generate transform coefficients X[1], X[3], X[5] and X[7]. The second common factor Fb is multiplied with Xd, is absorbed with transform constants 1/Cπ/4, 2C3π/8 and 2S3π/8, and is accounted for by altering scale factors A1, A3, A5 and A7.
The first common factor Fa may be approximated with a rational dyadic constant αa, which may be multiplied with Xa to obtain an approximation of the product Xa·Fa. A scaled transform factor Fa/Cπ/4 may be approximated with a rational dyadic constant βa, which may be multiplied with Xb to obtain an approximation of the product Xb·Fa/Cπ/4. Altered scale factors A2/Fa and A6/Fa may be applied to transform coefficients X[2] and X[6], respectively.
The second common factor Fb may be approximated with a rational dyadic constant αb, which may be multiplied with Xd to obtain an approximation of the product Xd·Fb. A scaled transform factor Fb/Cπ/4 may be approximated with a rational dyadic constant βb, which may be multiplied with Xc to obtain an approximation of the product Xc·Fb/Cπ/4. A scaled transform factor 2Fb·C3π/8 may be approximated with a rational dyadic constant γb, and a scaled transform factor 2Fb·S3π/8 may be approximated with a rational dyadic constant δb. Rational dyadic constant γb may be multiplied with Xe to obtain an approximation of the product 2Xe·Fb·C3π/8 and also with Xf to obtain an approximation of the product 2Xf·Fb·C3π/8. Rational dyadic constant δb may be multiplied with Xe to obtain an approximation of the product 2Xe·Fb·S3π/8 and also with Xf to obtain an approximation of the product 2Xf·Fb·S3π/8. Altered scale factors A1/Fb, A3/Fb, A5/Fb and A7/Fb may be applied to transform coefficients X[1], X[3], X[5] and X[7], respectively.
Six rational dyadic constants αa, βa, αb, βb, γb and δb may be defined for six constants, as follows:
αa≈Fa, βa≈Fa/cos(π/4),
αb≈Fb, βb≈Fb/cos(π/4), γb≈2Fb·cos(3π/8), and δb≈2Fb·sin(3π/8). Eq (4)
Multiple common factors may be applied to multiple groups of intermediate variables, and each group may include any number of intermediate variables. The selection of the groups may be dependent on various factors such as the factorization of the transform, where the transform constants are located within the transform, etc. Multiple common factors may be applied to multiple groups of intermediate variables of the same size (not shown in
Multiplication of an intermediate variable x with a rational dyadic constant u may be performed in various manners in fixed-point integer arithmetic. The multiplication may be performed using logical operations (e.g., left shift, right shift, bit-inversion, etc.), arithmetic operations (e.g., add, subtract, sign-inversion, etc.) and/or other operations. The number of logical and arithmetic operations needed for the multiplication of x with u is dependent on the manner in which the computation is performed and the value of the rational dyadic constant u. Different computation techniques may require different numbers of logical and arithmetic operations for the same multiplication of x with u. A given computation technique may require different numbers of logical and arithmetic operations for the multiplication of x with different values of u.
A common factor may be selected for a group of intermediate variables based on criteria such as:
In general, it is desirable to minimize the number of logical and arithmetic operations for multiplication of an intermediate variable with a rational dyadic constant. On some hardware platforms, arithmetic operations (e.g., additions) may be more complex than logical operations, so reducing the number of arithmetic operations may be more important. In the extreme, computational complexity may be quantified based solely on the number of arithmetic operations, without taking into account logical operations. On some other hardware platforms, logical operations (e.g., shifts) may be more expensive, and reducing the number of logical operations (e.g., reducing the number of shift operations and/or the total number of bits shifted) may be more important. In general, a weighted average number of logical and arithmetic operations may be used, where the weights may represent the relative complexities of the logical and arithmetic operations.
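As a small illustrative sketch (the weights here are platform-specific assumptions, not values from this disclosure), such a weighted complexity metric might be computed as:

    /* Weighted operation count: w_arith and w_logic express the relative cost
       of arithmetic (add/subtract) and logical (shift) operations. */
    double weighted_cost(unsigned num_arith, unsigned num_logic,
                         double w_arith, double w_logic)
    {
        return w_arith * (double)num_arith + w_logic * (double)num_logic;
    }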
The precision of the results may be quantified based on various metrics such as those given in Table 6 below. In general, it is desirable to reduce the number of logical and arithmetic operations (or computational complexity) for a given precision. It may also be desirable to trade off complexity for precision, e.g., to achieve higher precision at the expense of some additional operations.
As shown in
Multiplication of an integer variable x with an irrational constant μ in fixed-point integer arithmetic may be achieved by approximating the irrational constant with a rational dyadic constant, as follows:
μ ≈ c/2^b, Eq (5)
where μ is the irrational constant to be approximated, c/2^b is the rational dyadic constant, b and c are integers, and b > 0.
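A b-bit rational dyadic approximation may be obtained, for example, by simple rounding; the following C sketch is one possible way to compute the numerator c (an assumption for illustration, not a method mandated by the disclosure):

    #include <math.h>

    /* Return the integer c such that c / 2^b is the rounded b-bit dyadic
       approximation of the irrational constant mu. */
    long dyadic_numerator(double mu, int b)
    {
        return lround(mu * (double)(1L << b));
    }

    /* Example: dyadic_numerator(0.707106781, 8) returns 181,
       i.e., cos(pi/4) is approximated by 181/256 = 0.70703125. */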
Given integer variable x and rational dyadic constant u = c/2^b, an integer-valued product
y = (x·c)/2^b Eq (6)
may be approximated using a series of intermediate values
y0,y1,y2, . . . ,yt, Eq (7)
where y0 = 0, y1 = x, and for all values 2 ≤ i ≤ t, yi is obtained as follows:
yi = ±yj ± yk·2^si, with j, k < i, Eq (8)
where yk·2^si implies a shift of intermediate value yk by si bits (to the left for positive si and to the right for negative si).
In equation (8), yi may be equal to yj + yk·2^si, yj − yk·2^si, or −yj + yk·2^si. Each intermediate value yi in the series may be derived from two prior intermediate values yj and yk, where either yj or yk may be zero. The series is defined such that the final value approximates the desired product, or
yt ≈ y. Eq (9)
As shown in equations (5) through (9), the multiplication of integer variable x with irrational constant μ may be approximated with a series of intermediate values generated by shift and add operations and using intermediate results (or prior generated intermediate values) to reduce the total number of operations.
Multiplication of an integer variable x with two irrational constants μ and η in fixed-point integer arithmetic may be achieved by approximating the irrational constants with rational dyadic constants, as follows:
μ ≈ c/2^b and η ≈ e/2^d, Eq (10)
where c/2^b and e/2^d are two rational dyadic constants, b, c, d and e are integers, b > 0 and d > 0.
Given integer variable x and rational dyadic constants u = c/2^b and v = e/2^d, two integer-valued products
y = (x·c)/2^b and z = (x·e)/2^d Eq (11)
may be approximated using a series of intermediate values
w0,w1,w2, . . . ,wt, Eq (12)
where w0 = 0, w1 = x, and for all values 2 ≤ i ≤ t, wi is obtained as follows:
wi = ±wj ± wk·2^si, with j, k < i, Eq (13)
where wk·2^si implies a shift of intermediate value wk by si bits. The series is defined such that the desired products are obtained at two of the intermediate values, or
wm ≈ y and wn ≈ z, Eq (14)
where m, n ≤ t and either m or n is equal to t.
As shown in equations (10) through (14), the multiplication of integer variable x with irrational constants μ and η may be approximated with a common series of intermediate values generated by shift and add operations and using intermediate results to reduce the total number of operations.
In the computation described above, trivial operations such as additions and subtractions of zeros and shifts by zero bits may be omitted. The following simplifications may be made:
yi = ±y0 ± yk·2^si → yi = ±yk·2^si, Eq (15)
yi = ±yj ± yk·2^0 → yi = ±yj ± yk. Eq (16)
In equation (15), the expression to the left of the arrow involves an addition or subtraction of zero (denoted by y0) and may be performed with one shift, as shown by the expression to the right of the arrow. In equation (16), the expression to the left of the arrow involves a shift by zero bits (denoted by 2^0) and may be performed with one addition, as shown by the expression to the right of the arrow. Equations (15) and (16) may be applied to equation (8) in the computation of yi as well as to equation (13) in the computation of wi.
The multiplications in
Cπ/4 ≈ Cπ/4_8 = 181/256, Eq (17)
where Cπ/4_8 is a rational dyadic constant that is an 8-bit approximation of Cπ/4.
Multiplication of integer variable x by constant Cπ/4_8 may be expressed as:
y=(x·181)/256. Eq (18)
The multiplication in equation (18) may be achieved with the following series of operations:
y1=x, //1
y2=y1+(y1>>2), //101
y3=y1−(y2>>2), //01011
y4=y3+(y2>>6), //010110101. Eq (19)
The binary value to the right of “//” is an intermediate constant that is multiplied with variable x.
The desired product is equal to y4, or y4=y. The multiplication in equation (18) may be performed with three additions and three shifts to generate three intermediate values y2, y3 and y4.
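Rendered as C, the series in equation (19) may look like the following sketch (the function name is illustrative, and arithmetic right shifts are assumed for negative values of x):

    /* Multiply x by the 8-bit dyadic approximation 181/256 of cos(pi/4)
       using the shift-and-add series of Eq (19). */
    int mul_by_181_over_256(int x)
    {
        int y1 = x;                   /* 1                        */
        int y2 = y1 + (y1 >> 2);      /* 1.01       = 1.25x       */
        int y3 = y1 - (y2 >> 2);      /* 0.1011     = 0.6875x     */
        int y4 = y3 + (y2 >> 6);      /* 0.10110101 ~ 0.70703x    */
        return y4;                    /* approximates (x*181)/256 */
    }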
In
C3π/8 ≈ C3π/8_7 = 49/128, Eq (20)
S3π/8 ≈ S3π/8_9 = 473/512, Eq (21)
where C3π/8_7 is a rational dyadic constant that is a 7-bit approximation of C3π/8, and
S3π/8_9 is a rational dyadic constant that is a 9-bit approximation of S3π/8.
Multiplication of integer variable x by constants C3π/8_7 and S3π/8_9 may be expressed as:
y=(x·49)/128 and z=(x·473)/512. Eq (22)
The multiplications in equation (22) may be achieved with the following series of operations:
w1=x, //1
w2=w1−(w1>>2), //011
w3=w1>>6, //0000001
w4=w2+w3, //0110001
w5=w1−w3, //0111111
w6=w4>>1, //00110001
w7=w5−(w1>>4), //0111011
w8=w7+(w1>>9), //0111011001. Eq (23)
The desired products are equal to w6 and w8, or w6=y and w8=z. The two multiplications in equation (22) may be jointly performed with five additions and five shifts to generate seven intermediate values w2 through w8. Additions of zeros are omitted in the generation of w3 and w6. Shifts by zero are omitted in the generation of w4 and w5.
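As with equation (19), the joint series in equation (23) may be rendered in C as the following sketch (the function name is illustrative, and arithmetic right shifts are assumed):

    /* Jointly multiply x by 49/128 (approximating cos(3pi/8)) and by 473/512
       (approximating sin(3pi/8)) using the common series of Eq (23). */
    void mul_by_49_over_128_and_473_over_512(int x, int *y, int *z)
    {
        int w1 = x;                   /* 1                        */
        int w2 = w1 - (w1 >> 2);      /* 0.11     = 0.75x         */
        int w3 = w1 >> 6;             /* 0.000001 = x/64          */
        int w4 = w2 + w3;             /* 49x/64                   */
        int w5 = w1 - w3;             /* 63x/64                   */
        int w6 = w4 >> 1;             /* 49x/128                  */
        int w7 = w5 - (w1 >> 4);      /* 59x/64                   */
        int w8 = w7 + (w1 >> 9);      /* 473x/512                 */
        *y = w6;                      /* approximates (x*49)/128  */
        *z = w8;                      /* approximates (x*473)/512 */
    }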
For the 8-point IDCT shown in
For the 8-point DCT shown in
For the IDCT shown in
For a given value of F1, rational dyadic constants α1 and β1 may be obtained for F1 and F1·Cπ/4, respectively. The numbers of logical and arithmetic operations may then be determined for multiplication of X1 with α1 and multiplication of X2 with β1. For a given value of F2, rational dyadic constants α2, β2, γ2 and δ2 may be obtained for F2, F2·Cπ/4, F2·C3π/8 and F2·S3π/8, respectively. The numbers of logical and arithmetic operations may then be determined for multiplication of X4 with α2, multiplication of X3 with β2, and multiplications of X5 with both γ2 and δ2. The number of operations for multiplications of X6 with γ2 and δ2 is equal to the number of operations for multiplications of X5 with γ2 and δ2.
To facilitate the evaluation and selection of the common factors, the number of logical and arithmetic operations may be pre-computed for multiplication with different possible values of rational dyadic constants. The pre-computed numbers of logical and arithmetic operations may be stored in a look-up table or some other data structure.
The entry in the i-th column and j-th row of look-up table 500 contains the number of logical and arithmetic operations for joint multiplication of intermediate variable x with both ci for the first rational dyadic constant C1 and cj for the second rational dyadic constant C2. The value for each entry in look-up table 500 may be determined by evaluating different possible series of intermediate values for the joint multiplication with ci and cj for that entry and selecting the best series, e.g., the series with the fewest operations. The entries in the first row of look-up table 500 (with c0=0 for the second rational dyadic constant C2) contain the numbers of operations for multiplication of intermediate variable x with just ci for the first rational dyadic constant C1. Since the look-up table is symmetrical, entries in only half of the table (e.g., either above or below the main diagonal) may be filled. Furthermore, the number of entries to fill may be reduced by considering the irrational constants being approximated with the rational dyadic constants C1 and C2.
For a given value of F1, rational dyadic constants α1 and β1 may be determined. The numbers of logical and arithmetic operations for multiplication of X1 with α1 and multiplication of X2 with β1 may be readily determined from the entries in the first row of look-up table 500, where α1 and β1 correspond to C1. Similarly, for a given value of F2, rational dyadic constants α2, β2, γ2 and δ2 may be determined. The numbers of logical and arithmetic operations for multiplication of X4 with α2 and multiplication of X3 with β2 may be determined from the entries in the first row of look-up table 500, where α2 and β2 correspond to C1. The number of logical and arithmetic operations for joint multiplication of X5 with γ2 and δ2 may be determined from an appropriate entry in look-up table 500, where γ2 may correspond to C1 and δ2 may correspond to C2, or vice versa.
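A hypothetical C sketch of such a pre-computed cost table is shown below; the table size, entry width and indexing scheme are assumptions for illustration only.

    /* ops[i][j] holds the pre-computed number of operations for jointly multiplying
       a variable by the i-th and j-th candidate dyadic constants; row j = 0 holds
       the cost of multiplying by the i-th constant alone.  Only one triangle needs
       to be filled, since the table is symmetric. */
    #define NUM_CANDIDATES 512   /* e.g., all numerators representable in 9 bits */
    static unsigned char ops[NUM_CANDIDATES][NUM_CANDIDATES];

    unsigned cost_single(unsigned i)
    {
        return ops[i][0];
    }

    unsigned cost_joint(unsigned i, unsigned j)
    {
        return (i >= j) ? ops[i][j] : ops[j][i];
    }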
For each possible combination of values for F1 and F2, the precision metrics in Table 6 may be determined for a sufficient number of iterations with different random input data. The values of F1 and F2 that result in poor precision (e.g., failure of the metrics) may be discarded, and the values of F1 and F2 that result in good precision (e.g., pass of the metrics) may be retained.
Tables 1 through 5 show five fixed-point approximations for the IDCT in
Table 1 gives the details of algorithm A, which uses a common factor of 1/1.0000442471 for each of the two groups.
Table 2 gives the details of algorithm B, which uses a common factor of 1/1.0000442471 for the first group and a common factor of 1/1.02053722659 for the second group.
Table 3 gives the details of algorithm C, which uses a common factor of 1/0.87734890555 for the first group and a common factor of 1/1.02053722659 for the second group.
Table 4 gives the details of algorithm D, which uses a common factor of 1/0.87734890555 for the first group and a common factor of 1/0.89062054308 for the second group.
Table 5 gives the details of algorithm E, which uses a common factor of 1/0.87734890555 for the first group and a common factor of 1/1.22387468002 for the second group.
The precision of the output samples from an approximate IDCT may be quantified based on metrics defined in IEEE Standard 1180-1990 and its pending replacement. This standard specifies testing a reference 64-bit floating-point DCT followed by the approximate IDCT using data from a random number generator. The reference DCT receives random data for a block of input pixels and generates transform coefficients. The approximate IDCT receives the transform coefficients (appropriately rounded) and generates a block of reconstructed pixels. The reconstructed pixels are compared against the input pixels using five metrics, which are given in Table 6. Additionally, the approximate IDCT is required to produce all zeros when supplied with zero transform coefficients and to demonstrate near-DC inversion behavior. All five algorithms A through E given above pass all of the metrics in Table 6.
The 1D IDCT shown in
First 1D IDCT stage 614 performs an 8-point IDCT on each column of the block of scaled transform coefficients. Second 1D IDCT stage 616 performs an 8-point IDCT on each row of an intermediate block generated by first 1D IDCT stage 614. The 1D IDCTs for the first and second stages may operate directly on their input data without doing any internal pre- or post-scaling. After both the rows and columns are processed, output scaling stage 618 may shift the resulting quantities from second 1D IDCT stage 616 by P bits to the right to generate the output samples for the 2D IDCT. The scale factors and the precision constant P may be chosen such that the entire 2D IDCT may be implemented using registers of the desired width.
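The row-column structure described above may be sketched in C as follows; the function idct_1d_8(), the scale-factor table and the value of the precision constant P are assumptions for illustration, not values taken from this disclosure.

    /* Assumed 8-point 1D IDCT operating in place on 8 values spaced 'stride' apart. */
    void idct_1d_8(int *data, int stride);

    #define P 11   /* assumed output precision constant */

    void idct_2d_8x8(const int coeff[8][8], const int scale[8][8], short out[8][8])
    {
        int block[8][8];
        int i, j;

        /* Input scaling stage: multiply each transform coefficient by its scale factor. */
        for (i = 0; i < 8; i++)
            for (j = 0; j < 8; j++)
                block[i][j] = coeff[i][j] * scale[i][j];

        /* First 1D IDCT stage: one 8-point IDCT per column. */
        for (j = 0; j < 8; j++)
            idct_1d_8(&block[0][j], 8);

        /* Second 1D IDCT stage: one 8-point IDCT per row. */
        for (i = 0; i < 8; i++)
            idct_1d_8(&block[i][0], 1);

        /* Output scaling stage: shift the results right by P bits. */
        for (i = 0; i < 8; i++)
            for (j = 0; j < 8; j++)
                out[i][j] = (short)(block[i][j] >> P);
    }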
The 2D DCT may be performed in a similar manner as the 2D IDCT. The 2D DCT may be performed by (a) pre-multiplying a block of spatial domain samples, (b) performing 1D DCT on each column (or row) of the block of scaled samples to generate an intermediate block, (c) performing 1D DCT on each row (or column) of the intermediate block, and (d) scaling the output of the second 1D DCT stage to generate a block of transform coefficients for the 2D DCT.
For clarity, much of the description above is for an 8-point scaled IDCT and an 8-point scaled DCT. The techniques described herein may be used for any type of transform such as DCT, IDCT, DFT, IDFT, MLT, inverse MLT, MCLT, inverse MCLT, etc. The techniques may also be used for any factorization of a transform, with several example factorizations being given in
The number of operations for a transform may be dependent on the manner in which multiplications are performed. The computation techniques described above unroll multiplications into series of shift and add operations, use intermediate results to reduce the number of operations, and perform joint multiplication with multiple constants using a common series. The multiplications may also be performed with other computation techniques, which may influence the selection of the common factors.
The transforms with common factors described herein may provide certain advantages such as:
Transforms with common factors may be used for various applications such as image and video processing, communication, computing, data networking, data storage, graphics, etc. Example use of transforms for video processing is described below.
At a decoding system 760, a decoder 760 receives the compressed data from storage unit or communication channel 740 and reconstructs the transform coefficients. An IDCT unit 770 receives the reconstructed transform coefficients and generates an output data block. The output data block may be an N×N block of reconstructed pixels, an N×N block of reconstructed pixel difference values, etc. The output data block may be an estimate of the input data block provided to DCT unit 720 and may be used to reconstruct the source signal.
A storage unit 840 may store the compressed data from processor 820. A transmitter 842 may transmit the compressed data. A controller/processor 850 controls the operation of various units in encoding system 800. A memory 852 stores data and program codes for encoding system 800. One or more buses 860 interconnect various units in encoding system 800.
A display unit 940 displays reconstructed images and video from processor 920. A controller/processor 950 controls the operation of various units in decoding system 900. A memory 952 stores data and program codes for decoding system 900. One or more buses 960 interconnect various units in decoding system 900.
Processors 820 and 920 may each be implemented with one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), and/or some other type of processors. Alternatively, processors 820 and 920 may each be replaced with one or more random access memories (RAMs), read only memory (ROMs), electrical programmable ROMs (EPROMs), electrically erasable programmable ROMs (EEPROMs), magnetic disks, optical disks, and/or other types of volatile and nonvolatile memories known in the art.
The techniques described herein may be implemented in hardware, firmware, software, or a combination thereof. For example, the logical (e.g., shift) and arithmetic (e.g., add) operations for multiplication of a data value with a constant value may be implemented with one or more logics, which may also be referred to as units, modules, etc. A logic may be hardware logic comprising logic gates, transistors, and/or other circuits known in the art. A logic may also be firmware and/or software logic comprising machine-readable codes.
In one design, an apparatus comprises a first logic to perform multiplication of a first group of at least one data value with a first group of at least one rational dyadic constant that approximates a first group of at least one irrational constant scaled by a first common factor. The apparatus further comprises a second logic to perform multiplication of a second group of at least one data value with a second group of at least one rational dyadic constant that approximates a second group of at least one irrational constant scaled by a second common factor. The first and second groups of at least one data value have different sizes. The first and second logic may be separate logics, the same common logic, or shared logic.
For a firmware and/or software implementation, multiplication of a data value with a constant value may be achieved with machine-readable codes that perform the desired logical and arithmetic operations. The codes may be hardwired or stored in a memory (e.g., memory 852 in
The techniques described herein may be implemented in various types of apparatus. For example, the techniques may be implemented in different types of processors, different types of integrated circuits, different types of electronics devices, different types of electronics circuits, etc.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure may be implemented or performed with a general-purpose processor, a DSP, an ASIC, a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other designs without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present application claims priority to provisional U.S. Application Ser. No. 60/758,464, filed Jan. 11, 2006, entitled “Efficient Multiplication-Free Implementations of Scaled Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT),” assigned to the assignee hereof and incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
4864529 | Shah et al. | Sep 1989 | A |
5233551 | White | Aug 1993 | A |
5285402 | Keith | Feb 1994 | A |
5642438 | Babkin | Jun 1997 | A |
5701263 | Pineda | Dec 1997 | A |
5930160 | Mahant-Shetti | Jul 1999 | A |
6084913 | Kajiki et al. | Jul 2000 | A |
6189021 | Shyu | Feb 2001 | B1 |
6223195 | Tonomura | Apr 2001 | B1 |
6308193 | Jang et al. | Oct 2001 | B1 |
6473534 | Merhav et al. | Oct 2002 | B1 |
6757326 | Prieto et al. | Jun 2004 | B1 |
6766341 | Trelewicz et al. | Jul 2004 | B1 |
6917955 | Botchev | Jul 2005 | B1 |
7007054 | Brady et al. | Feb 2006 | B1 |
7421139 | Hinds et al. | Sep 2008 | B2 |
20010031096 | Schwartz et al. | Oct 2001 | A1 |
20020009235 | Schwartz et al. | Jan 2002 | A1 |
20020038326 | Pelton et al. | Mar 2002 | A1 |
20030020732 | Jasa et al. | Jan 2003 | A1 |
20030074383 | Murphy | Apr 2003 | A1 |
20040117418 | Vainsencher et al. | Jun 2004 | A1 |
20040236808 | Chen et al. | Nov 2004 | A1 |
20050256916 | Srinivasan et al. | Nov 2005 | A1 |
20060008168 | Lee | Jan 2006 | A1 |
20060080373 | Hinds et al. | Apr 2006 | A1 |
20070168410 | Reznik | Jul 2007 | A1 |
20070200738 | Reznik et al. | Aug 2007 | A1 |
20070233764 | Reznik et al. | Oct 2007 | A1 |
20070271321 | Reznik et al. | Nov 2007 | A1 |
Number | Date | Country |
---|---|---|
1421102 | May 2003 | CN |
1719435 | Jan 2006 | CN |
1311975 | May 2003 | EP |
2304946 | Mar 1997 | GB |
04141158 | May 1992 | JP |
05108820 | Apr 1993 | JP |
H09204417 | Aug 1997 | JP |
10322219 | Dec 1998 | JP |
1175186 | Mar 1999 | JP |
2000099495 | Apr 2000 | JP |
2002197075 | Jul 2002 | JP |
2003528668 | Sep 2003 | JP |
2005501462 | Jan 2005 | JP |
2005327298 | Nov 2005 | JP |
2189120 | Sep 2002 | RU |
2003114715 | Nov 2004 | RU |
2273112 | Mar 2006 | RU |
WO9613780 | Sep 1996 | WO |
WO0135673 | May 2001 | WO |
WO02101650 | Dec 2002 | WO |
WO02104039 | Dec 2002 | WO |
03019787 | Mar 2003 | WO |
Entry |
---|
Qi H. et al., “High accurate and multiplierless fixed-point DCT,” ISO/IEC JTC1/SC29/WG11 M12322, Jul. 2005, Poznan, Poland, Jul. 10, 2005, pp. 1-17. |
Puschel M. et al., “Custom-optimized multiplierless implementations of DSP algorithms,” 2004 IEEE/ACM International Conference on Computer Aided Design, Nov. 7-11, 2004, pp. 175-182. |
Hinds A.T. et al., “A Fast and Accurate Inverse Discrete Cosine Transform,” IEEE Workshop on Signal Processing Systems, Nov. 2005, pp. 87-92. |
Dempster A.G. et al., “Use of minimum-adder multiplier blocks in FIR digital filters,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, No. 9, Sep. 1995, pp. 569-577. |
Hinds A.T. et al., “A fast and accurate Inverse Discrete Cosine Transform,” IEEE Workshop on Signal Processing Systems, 2-4 Nov. 2005, Athens, Greece, pp. 87-92. |
IEEE Std 1180-1990, “IEEE Standard Specifications for the Implementations of 8×8 Inverse Discrete Cosine Transform”, CAS Standards Committee of the IEEE Circuits and Systems Society, Dec. 6, 1990. |
Puschel M. et al., “Custom-optimized multiplierless implementations of DSP algorithms,” 2004 IEEE/ACM International Conference on Computer Aided Design, Nov. 7-11, 2004, San Jose, CA, Nov. 7, 2004, pp. 175-182. |
Qi H. et al., “An example of fixed-point IDCT for CFP on fixed-point 8×8 IDCT and DCT standard,” ISO/IEC JTC1/SC29/WG11 M12324, Jul. 2005, Poznan, Poland, Jul. 20, 2005, pp. 1-16. |
Qi H. et al., “High accurate and multiplierless fixed-point DCT,” ISO/IEC JTC1/SC29/WG11 M12322, Jul. 2005, Poznan, Poland, Jul. 20, 2005, pp. 1-17. |
Reznik Y. et al., “Fixed point multiplication-free 8×8 DCT/IDCT approximation,” ISO/IEC JTC1/SC29/WG11 M12607, Oct. 2005, Nice, France, pp. 1-37. |
Reznik Y.A. et al., “Efficient fixed-point approximations of the 8×8 inverse discrete cosine transform,” Applications of Digital Image Processing XXX, Proceedings of SPIE, vol. 6696, Sep. 24, 2007, pp. 669617-1-669671-17. |
Reznik Y.A. et al., “Improved proposal for MPEG fixed point 8×8 IDCT standard,” ISO/IEC JTC1/SC29/WG11 M13001, Jan. 2006, Bangkok, Thailand, pp. 1-22. |
Reznik Y.A. et al., “Low Complexity fixed-point approximation of inverse discrete cosine transform,” Proceedings of the 2007 IEEE International Conference on Acoustics. Speech, and Signal Processing, Apr. 15-20, 2007, Honolulu, Hawaii, vol. 1, pp. 1109-1112. |
Rijavec N. et al., “Multicriterial optimization approach to eliminating multiplications,” 2006 IEEE International Workshop on Multimedia Signal Processing, Oct. 3-6, 2006. Victoria, BC. Canada, pp. 368-371. |
Sullivan G.L., “Standardization of IDCT approximation behavior for video compression: the history and the new MPEG-C parts 1 and 2 standards,” Applications of Digital Image Processing. Proceedings of SPIE, vol. 6696, Sep. 24, 2007, pp. 669611-1-669611-22. |
Trelewicz J.G. et al., “Efficient integer implementations for faster linear transforms,” Conference Record of the 35th Asilomar Conference on Signals, Systems, & Computers, Nov. 4-7, 2001, Pacific Grove, CA, vol. 2, pp. 1161-1165. |
International Search Report—PCT/US07/060405—International Search Authority, European Patent Office—Aug. 4, 2008. |
Written Opinion—PCT/US07/060405—International Search Report, European Patent Office—Aug. 4, 2008. |
International Preliminary Report on Patentability—PCT/US07/060405—The International Bureau of WIPO, Geneva, Switzerland—Aug. 26, 2008. |
“Working Draft 1.0 of ISO/IEC WD 23002-2,” Information Technology—MPEG video technologies—Part2: Fixed-point 8×8 IDCT and DCT transforms, ISO/IEC JTC1/SC29/WG11 N7817, Feb. 17, 2006, pp. 1-21. |
A. Karatsuba, and Y. Ofman, “Multiplication of Multidigit Numbers on Automata”, Soviet Phys. Doklady, vol. 7, No. 7, 595-596, Jan. 1963. |
Arai, Y., et al., “A Fast DCT-SQ Scheme for Images”, Transactions of the IEICE E 71(11):1095, Nov. 1988, pp. 1095-1097. |
Boullis N. et al., “Some optimizations of hardware multiplication by constant matrices,” IEEE Transactions on Computers, vol. 54, No. 10, Oct. 2005, pp. 1271-1282. |
Bracamonte J et al., “A multiplierless implementation scheme for the JPEG image coding algorithm,” Proceedings of the 2000 IEEE Nordic Signal Processing Symposium, Jun. 13-15, 2000, Kolmarden, Sweden, pp. 1-4. |
E. Feig and S. Winograd, “Fast Algorithms for the Discrete Cosine Transform”, IEEE Transactions on Signal Processing, vol. 40, pp. 2174-2193, Sep. 1992. |
E. Feig and S. Winograd, “On the Multiplicative Complexity of Discrete Cosine Transforms (Corresp.)” IEEE Transactions on Information Theory, vol. 38, No. 4, pp. 1387-1391, Jul. 1992. |
Feig, E., “A Fast Scaled DCT algorithm”, SPIE vol. 1244 Image Processing Algorithms and Techniques (1990), pp. 2-13. |
G. Plonka and M. Tasche, “Integer DCT-II by Lifting Steps”, International Series of Numerical Mathematics (W. Haussmann, K. Jelter, M. Reimer, J. Stockler (eds.)), vol. 145, Birkhauser, Basel, 2003, pp. 1-18. |
G. Plonka and M. Tasche, “Invertible Integer DCT Algorithms”, Applied Computational Harmonic Analysis, No. 15 (2003), pp. 70-88. |
Hartley R.I., “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, No. 10, Oct. 1996, pp. 677-688. |
ISO/IEC JTC1/SC29/WG11 N7292 [Study on FCD] Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s—Part 6: Specification of Accuracy Requirements for Implementation of Inverse Discrete Cosine Transform, pp. 1-8. |
ISO/IEC JTC1/SC29/WG11 N7335, “Call for Proposals on Fixed-Point 8×8 IDCT and DCT Standard”, Poznan, Poland, Jul. 2005, pp. 1-18. |
J. Liang and T.D. Tran, “Fast Multiplierless Approximations of the DCT with Lifting Scheme”, IEEE Transactions on Signal Processing, vol. 49, No. 12, Dec. 2001, pp. 3032-3044. |
J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable-rate Coding”, IEEE Transactions on Information Theory, vol. IT-24, No. 5, pp. 530-536, Sep. 1978. |
Mitchell J.L. et al., “Enhanced parallel processing in wide registers,” Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, Apr. 3-8, 2005, Denver, Colorado, pp. 1-10. |
N. Ahmed, T. Natarajan and K.R. Rao, “Discrete Cosine Transform”, IEEE Transactions on Computers, vol. C-23, pp. 90-93, Jan. 1974. |
R. Bernstein, “Multiplication by integer Constants”, Software-Practice and Experience, vol. 16, No. 7, pp. 641-652, Jul. 1986. |
T.M. Cover and J.A. Thomas, “Elements of Information Theory”, Wiley, New York, 1991, pp. 1-36. |
Voronenko Y. et al., “Multiplierless multiple constant multiplication,” ACM Transactions on Algorithms, vol. 3, No. 2, May 2007, pp. 1-38. |
W. Chen, C.H. Smith and S.C. Fralick, “A Fast Computational Algorithm for the Discrete Cosine Transform”, IEEE Transactions on Communications, vol. com-25, No. 9, pp. 1004-1009, Sep. 1977. |
Zelinski A.C. et al., “Automatic cost minimization for multiplierless implementations of discrete signal transforms,” Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, May 17-21, 2004, Montreal, Quebec, Canada, pp. 221-224. |
International Preliminary Report on Patentability—PCT/US07/060405—European Patent Office—Berlin—Apr. 16, 2009. |
C. Loeffler, A. Ligtenberg, and G.S. Moschytz. “Algorithm-Architecture Mapping for Custom DSP Chips,” Proc. Int. Symp. Circuits Syst. (Helsinki, Finland), Jun. 1988, pp. 1953-1956. |
Dempster, A.G. et al., “Constant integer multiplication using minimum adders,” IEEE Proceedings—Circuits, Devices and Systems, vol. 141, No. 5, pp. 407-413, Oct. 1994. |
Hung A C et al., “A Comparison of fast inverse discrete cosine transform algorithms” Multimedia Systems, vol. 2. No. 5, Dec. 1994 pp. 204-217. |
K.R. Rao, and P.Yip, “Discrete Cosine Transform: Algorithms, Advantages, Applications,” Academic Press, San Diego, 1990, pp. 88-106. |
Linzer, E., et al.: “New scaled DCT algorithms for fused multiply/add architectures” Proceedings of the 1991 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP91), May 14-17, 1991, vol. 3, May 1991, pp. 2201-2204. |
Pai et al., Low-power data-dependent 8×8 DCT/IDCT for video compression, 2003, IEEE proceedings online No. 20030564, pp. 245-255. |
Rao, et al.: “Discrete cosine transform: algorithms, advantages, applications,” Academic Press Professional, Inc., San Diego, CA, pp. 490, ISBN: 0-12-580203-X, 1990, CH. 3-4. |
Reznik, Y A et al., “Response to CE on Convergence of scaled and non-scaled IDCT architectures” ISO/IEC JTC1/SC29/WG11 M13650, Jul. 2006 Klagenfurt, Austria. |
Reznik Y A et al.: “Summary of responses on CE on convergence of IDCT architectures” ISO/IEC JTC1/SC29/WG11 M13467, Jul. 2006, Klagenfurt, Austria. |
Taiwan Search Report—TW096101164—TIPO—Dec. 28, 2010. |
V. Lefevre, “Moyens Arithmetiques Pour un Calcul Fiable”, PhD Thesis, Ecole Normale Superieure de Lyon, Lyon, France, Jan. 2000. |
Yuriy A, et al., “Proposed Core Experiment (on Exploration) on Convergence of Scaled and Non-Scaled IDCT Architectures”, Apr. 1, 2006, Montreux, Switzerland. |
Sho Kikkawa, “Prospects of the Theory of Time-Frequency Analysis [IV] : Wavelets and Their Classification”, The Journal of the Institute of Electronics,Information and Communication Engineers, Japan, The Institute of Electronics, Information and Communication Engineers, Aug. 1996, vol. 79, p. 820-830. |
Number | Date | Country | |
---|---|---|---|
20070168410 A1 | Jul 2007 | US |
Number | Date | Country | |
---|---|---|---|
60758464 | Jan 2006 | US |