Method and system for performing two-dimensional transform on data value array with reduced power consumption

Information

  • Patent Application
  • 20080050036
  • Publication Number
    20080050036
  • Date Filed
    August 25, 2006
  • Date Published
    February 28, 2008
Abstract
In some embodiments, a method for performing and a system configured to perform a 2D transform (for example, an inverse discrete cosine transform) on each block of a sequence of data blocks, where the 2D transform includes a row transform and a column transform. To perform the row or column transform on a row or column of data, these embodiments determine whether each of different subsets of the data values comprising a partition of the row (column) includes at least one zero value, whether each of different subsets of a first subset of the partition includes at least one zero value, and whether each of different subsets of at least one other subset of the partition includes at least one zero value. When performing the row (column) transform on each row or column that includes at least one zero value and at least one non-zero value, at least one transformation operation on at least one zero value is bypassed or performed in a reduced-power manner, where such transformation operation would otherwise be performed in a manner consuming full power if the zero value were a non-zero value. In some embodiments, the system is a pipelined video decoding system or other video processing system (or a video processing subsystem of a portable media player or other system) including a transform engine implemented in accordance with the invention. Other aspects are transform engines and transform engine circuitry for use in any embodiment of the system.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of a pipelined video decoding system including a transform engine that can be implemented in accordance with embodiments of the present invention.



FIG. 2 is a block diagram of a processing system (having video decoding capability) that can be implemented in accordance with the invention.



FIG. 3 is a diagram of an input data block, a version of the block that has undergone a row transform, and a version of the block that has undergone a row transform and a column transform.



FIG. 4 is a flow chart of steps of an embodiment of the inventive method (which can be performed by a class of embodiments of the inventive system).



FIG. 5 is a block diagram of transform circuitry employed in a typical implementation of circuitry 7 (or circuitry 11) of FIG. 2 that embodies the invention.



FIG. 6 is a block diagram of elements of an embodiment of the inventive pipelined video decoding system.





DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Embodiments of the inventive system will be described with reference to FIGS. 1, 2, 3, 4, 5, and 6.



FIG. 1 is a system level block diagram of a pipelined video decoding system including bitstream engine 1, transform engine 3 (which can be implemented in accordance with the present invention), and pixel reconstruction engine 5, connected along a system bus as shown. Bitstream engine 1 receives encoded video data that comprise processed blocks of frequency domain coefficients, said frequency domain coefficients having been generated by performing a 2D transform (e.g., a two-dimensional DCT) on blocks of video pixels. Bitstream engine 1 processes the encoded video data, typically by performing unpacking and inverse quantization thereon, and asserts the resulting sequence of blocks of processed frequency domain coefficients to transform engine 3. Transform engine 3 performs a 2D inverse transformation on each block of the sequence to generate blocks of partially decoded video pixels, and asserts the partially decoded video pixels to pixel reconstruction engine 5 for further video decoding. Pixel reconstruction engine 5 generates fully decoded pixels in response to the partially decoded video pixels, and asserts the fully decoded pixels to the system bus.


The system of FIG. 1 can be a subsystem of a portable media player or another portable data processing system (or a data processing system not intended to be portable) having video decoding capability, and such a subsystem can be implemented in accordance with the invention by implementing transform engine 3 in accordance with the invention. Such a processing system can and typically would include additional elements that are not shown in FIG. 1.


It is contemplated that some embodiments of the invention are implemented by systems that do not have the structure shown in FIG. 1, or by processing systems that do not have the structure shown in FIG. 2.



FIG. 2 is a block diagram of a typical implementation of transform engine 3 of FIG. 1, or another apparatus for performing a two dimensional transform (e.g., a 2D inverse transform) on blocks of data values. FIG. 2 includes row/column transform circuitry 7, storage buffer 9, and column/row transform circuitry 11, connected as shown. In some implementations, circuitry 7 of FIG. 2 is configured to perform a row transform on all rows of each block of a sequence of blocks of input data received thereby and to assert the resulting data to storage buffer 9, and circuitry 11 is configured to read columns of such data from buffer 9 and to perform a column transform on all columns of each block of data resulting from the row transform. In other implementations, circuitry 7 of FIG. 2 is configured to perform a column transform on all columns of each block of a sequence of blocks of input data received thereby and to assert the resulting data to storage buffer 9, and circuitry 11 is configured to read rows of such data from buffer 9 and to perform a row transform on all rows of each block of data resulting from the column transform.
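The two-stage arrangement of FIG. 2 can be illustrated in software by the following minimal sketch. It is not a description of the circuitry itself: the names `idct_1d` and `transform_2d` are hypothetical, and an orthonormal floating-point inverse DCT is assumed as the one-dimensional transform purely for illustration.

```python
import math

def idct_1d(coeffs):
    """Orthonormal 1D inverse DCT; an assumed stand-in for the row or column transform."""
    n = len(coeffs)
    out = []
    for k in range(n):
        total = coeffs[0] / math.sqrt(n)
        for u in range(1, n):
            angle = math.pi * (2 * k + 1) * u / (2 * n)
            total += math.sqrt(2.0 / n) * coeffs[u] * math.cos(angle)
        out.append(total)
    return out

def transform_2d(block, transform=idct_1d):
    """Row transform into a buffer, then column transform on the buffered data."""
    # Stage 1 (role of circuitry 7): transform every row and store the results.
    buffer = [transform(row) for row in block]
    n_rows, n_cols = len(buffer), len(buffer[0])
    # Stage 2 (role of circuitry 11): read each column from the buffer and transform it.
    columns = [transform([buffer[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    # Transpose so that result[r][c] is indexed like the input block.
    return [[columns[c][r] for c in range(n_cols)] for r in range(n_rows)]
```

In this sketch the list `buffer` plays the role of storage buffer 9; swapping the two stages gives the column-first variant described above.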



FIG. 3 is a diagram of an input data block (labeled “I”) of a type that can be input to circuitry 7 of an implementation of FIG. 2, a version (labeled block “II”) of block I that has undergone a row transform (e.g., in circuitry 7 of such implementation of FIG. 2), and a version (labeled block “III”) of block I that has undergone a row transform and a column transform (e.g., in circuitry 7 and circuitry 11 of such implementation of FIG. 2). Block I of FIG. 3 is a 4×4 block of frequency domain coefficients, comprising four rows and four columns of coefficients, with each coefficient having zero value labeled as “0” and each coefficient having non-zero value labeled as “X.” The rows are identified by reference symbols “R1” through “R4” and the columns are identified by reference symbols “a” through “d.”


A typical conventional transform engine is configured to identify full rows or columns (of each block to be transformed) that consist entirely of zero values (i.e., “zero-rows” or “zero-columns”) and to bypass normal transform operations that would otherwise be performed on each identified zero-row or zero-column. Such a conventional transform engine would identify rows R1 and R3 of block I as zero-rows and bypass transform computations that it would otherwise perform on rows R1 and R3.


In contrast, a typical embodiment of the inventive transform engine (e.g., an implementation of the FIG. 2 engine that embodies the invention) is configured not only to identify zero-rows and zero-columns of each block to be transformed (e.g., rows R1 and R3 of block I), to bypass row transform computations that it would otherwise perform on a zero-row (if it did not identify the row as a zero-row), and to bypass column transform computations that it would otherwise perform on a zero-column (if it did not identify the column as a zero-column), but also to save additional power during transform operations on non-zero-rows and non-zero-columns (e.g., row transform computations on rows R2 and R4 of block I). The additional power saving is achieved by identifying subsets of rows (and subsets of columns) that consist entirely of zero data values (e.g., the subset of row R2 of block I that consists of the coefficients in columns c and d, or the subset of row R4 of block I that consists of the coefficients in columns b and d), and avoiding (or performing in a reduced-power manner) transform computations that would otherwise be performed on each such subset if the subset did not consist entirely of zero data values.
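The difference between zero-row detection and detection of all-zero subsets of a row can be illustrated by the following sketch. The block contents are an assumption reconstructed from the description of block I (a 1 stands for a non-zero coefficient "X"; in particular, both low frequency coefficients of row R2 are assumed to be non-zero), and the names `BLOCK_I`, `is_zero` and `classify_row` are hypothetical.

```python
# Assumed contents of block I: 1 marks a non-zero coefficient "X", 0 a zero coefficient.
BLOCK_I = [
    [0, 0, 0, 0],  # R1: zero-row
    [1, 1, 0, 0],  # R2: coefficients in columns c and d are zero
    [0, 0, 0, 0],  # R3: zero-row
    [1, 0, 0, 0],  # R4: coefficients in columns b, c and d are zero
]

def is_zero(values):
    """True when every value in the sequence is zero."""
    return all(v == 0 for v in values)

def classify_row(row):
    """Report which parts of a row consist entirely of zero values."""
    half = len(row) // 2
    return {
        "zero_row": is_zero(row),             # what a conventional engine detects
        "low_half_zero": is_zero(row[:half]),  # additional subset checks
        "high_half_zero": is_zero(row[half:]),
        "zero_positions": [i for i, v in enumerate(row) if v == 0],
    }
```

Under these assumptions a conventional engine would bypass only rows R1 and R3, whereas the subset checks also expose the all-zero high frequency half of rows R2 and R4.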


Consider the case that such an embodiment of the inventive transform engine has the structure shown in FIG. 2 and is configured to perform an IDCT (inverse DCT) transform on each block of a sequence of 4×4 blocks of DCT transform coefficients. Each row of each such block consists of two low frequency coefficients (e.g., those in columns “a” and “b” of each block in FIG. 3) and two high frequency coefficients (e.g., those in columns “c” and “d” of each block in FIG. 3). Each column of such a block consists of two low frequency coefficients (e.g., those in rows “R1” and “R2” of each block in FIG. 3) and two high frequency coefficients (e.g., those in rows “R3” and “R4” of each block in FIG. 3). In this case, higher frequency coefficients of each row and each column have a higher probability of consisting only of zero values than do the lower frequency coefficients.


Circuitry 7 of one such embodiment of the inventive transform engine is configured to determine whether the two lowest frequency coefficients of each row of each block of DCT transform coefficients consist entirely of zero values, to determine whether the two highest frequency coefficients of each such row consist entirely of zero values, to bypass row transform computations on the two lowest frequency coefficients of each row if they consist entirely of zero values (where such computations would otherwise be performed in a manner consuming full power on these coefficients if they did not consist entirely of zero values) or to perform such computations in a reduced-power manner, and to bypass row transform computations on the two highest frequency coefficients of each row if they consist entirely of zero values (where such computations would otherwise be performed in a manner consuming full power on these coefficients if they did not consist entirely of zero values) or to perform such computations in a reduced-power manner. For example, circuitry 7 of such embodiment can be configured to bypass row transform computations on the two highest frequency coefficients of each of rows R1, R2, R3, and R4 of block I.


Circuitry 7 of such embodiment of the inventive transform engine is also configured to determine whether the even coefficient of the two lowest frequency components of each row of each block of DCT transform coefficients asserted thereto (e.g., each coefficient in column “b” of block I of FIG. 3) is a zero value, to determine whether the odd coefficient of the two lowest frequency components of each row of each block of DCT transform coefficients asserted thereto (e.g., each coefficient in column “a” of block I of FIG. 3) is a zero value, to determine whether the even coefficient of the two highest frequency components of each row of each block of DCT transform coefficients asserted thereto (e.g., each coefficient in column “d” of block I of FIG. 3) is a zero value, and to determine whether the odd coefficient of the two highest frequency components of each row of each block of DCT transform coefficients asserted thereto (e.g., each coefficient in column “c” of block I of FIG. 3) is a zero value. Circuitry 7 of such embodiments is also configured to perform in a reduced-power manner a multiplication computation that would otherwise be performed (in a manner consuming more power) on each such coefficient that is determined to be a zero value to transform the row. Typically, each such reduced-power multiplication is performed by operating a multiplication circuit having a first input and a second input to multiply the zero value (asserted to the first input) by a second value (asserted to the second input) without updating the value asserted to the second input, thereby avoiding the consumption of power that would otherwise be consumed to toggle the second input in the case that the value asserted to the first input were a non-zero value (and an updated second value at the second input needed to be multiplied by such non-zero value).
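The effect of not updating the second multiplier input when the first input carries a zero value can be illustrated by a simple toggle-counting model. The following sketch is only a software analogy: bit toggles on a bus are used as a rough proxy for switching power, the 16-bit bus width is an assumption, and the names `count_toggles` and `constant_input_toggles` are hypothetical.

```python
def count_toggles(old, new, width=16):
    """Number of bit positions that differ between two values on a bus of the given width."""
    mask = (1 << width) - 1
    return bin((old ^ new) & mask).count("1")

def constant_input_toggles(data, constants, gate_on_zero):
    """Count bit toggles on the constant input of a multiplier fed with (data, constant) pairs."""
    held = 0      # value currently held on the constant input
    toggles = 0
    for value, const in zip(data, constants):
        if gate_on_zero and value == 0:
            # The product is zero whatever the constant is, so the input is left unchanged.
            continue
        toggles += count_toggles(held, const)
        held = const
    return toggles
```

For a row such as (5, 0, 0, 0), the gated model updates the constant input once, while the ungated model updates it for every coefficient.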


Circuitry 11 of such embodiment of the inventive transform engine is preferably also configured to determine whether the first half (i.e., the first two data values) of each column of each 4×4 block of partially transformed coefficients read from buffer 9 consists entirely of zero values, to determine whether the second half (the last two data values) of each such column consists entirely of zero values, to bypass column transform computations on the first half of each column if it consists entirely of zero values (where such computations would otherwise be performed in a manner consuming full power on these values if they did not consist entirely of zero values) or to perform such computations in a reduced-power manner, and to bypass column transform computations on the second half of each column if it consists entirely of zero values (where such computations would otherwise be performed in a full power manner on these values if they did not consist entirely of zero values) or to perform such computations in a reduced-power manner.


Circuitry 11 of such embodiment of the inventive transform engine is also configured to determine whether the even data value in the first half of each column of each 4×4 block of partially transformed coefficients read from buffer 9 (e.g., each value in row R2 of block II of FIG. 3) is a zero value, to determine whether the odd data value in the first half of each such column (e.g., each coefficient in row R1 of block II of FIG. 3) is a zero value, to determine whether the even data value in the second half of each such column (e.g., each coefficient in row R4 of block II of FIG. 3) is a zero value, and to determine whether the odd data value in the second half of each such column (e.g., each coefficient in row R3 of block II of FIG. 3) is a zero value. Circuitry 11 of such embodiments is also configured to perform in a reduced-power manner a multiplication computation that would otherwise be performed (in a manner consuming more power) on each such data value that is determined to be a zero value to transform the column. Typically, each such reduced-power multiplication is performed by operating a multiplication circuit having a first input and a second input to multiply the zero value (asserted to the first input) by a second value (asserted to the second input) without updating the value asserted to the second input, thereby avoiding the consumption of power that would otherwise be consumed to toggle the second input in the case that the value asserted to the first input were a non-zero value (and an updated second value at the second input needed to be multiplied by such non-zero value).


In variations on the above example in which each row and column comprises 8 data values, each step of determining whether the odd (or even) data value in the first (or second) half of a row (or column) is a zero value, is replaced by a step of determining whether the data values in the first (or second) half of said half of the row (or column) consist entirely of zero values, and the engine is configured to bypass row (or column) transform computations on each half of each half row (or half column) consisting entirely of zero values (where such computations would otherwise be performed in a full power manner on these values if they did not consist entirely of zero values) or to perform such computations in a reduced-power manner. Preferably, the engine is also configured to determine whether the odd data value in each such half row (or half column) is a zero value, to determine whether the even data value of each such half row (or half column) is a zero value, and to perform in a reduced-power manner a multiplication computation that would otherwise be performed (in a manner consuming more power) on each such data value that is determined to be a zero value to transform the relevant row or column.


In variations on either above-described example, data values comprising each row or column (or a subset of a row or column) of a block to be transformed by circuitry 7 or 11 are reordered prior to or during the determination as to whether each distinct subset comprising a partition of the row or column (or subset thereof) consists entirely of zero values (or is a zero value). For example, suppose that a row (or column) consists of values x_i, where i is an integer in the range 0≦i≦N−1 and N is an even integer; that the partition consists of distinct first and second subsets of the row (column); that the first subset consists of distinct third and fourth subsets of the row (column); and that the second subset consists of distinct fifth and sixth subsets of the row (column). If the first subset consists of the values x_i, where i is an integer in the range 0≦i≦(N/2)−1, and the second subset consists of the values x_i, where i is an integer in the range N/2≦i≦N−1, then:


the third subset can consist of the values x_i, where i is in the range 0≦i≦(N/4)−1, or it can consist of the even values of the first subset, or it can consist of the even values of a reordered version of the first subset,


the fourth subset can consist of the values x_i, where i is in the range N/4≦i≦(N/2)−1, or it can consist of the odd values of the first subset, or it can consist of the odd values of a reordered version of the first subset,


the fifth subset can consist of the values x_i, where i is in the range N/2≦i≦(3N/4)−1, or it can consist of the even values of the second subset, or it can consist of the even values of a reordered version of the second subset, and


the sixth subset can consist of the values x_i, where i is in the range 3N/4≦i≦N−1, or it can consist of the odd values of the second subset, or it can consist of the odd values of a reordered version of the second subset.
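The alternative definitions of the six subsets can be illustrated by the following sketch. The function name `partition` and the `mode` parameter are hypothetical, and "even" and "odd" are taken here to mean even and odd zero-based positions within a subset, which is an assumption (a one-based labelling would swap the two).

```python
def partition(row, mode="contiguous"):
    """Return the first to sixth subsets of a row under one of two assumed splitting rules."""
    n = len(row)
    first, second = row[: n // 2], row[n // 2:]

    def split(subset):
        if mode == "contiguous":
            # Lower and upper quarter of the row.
            mid = len(subset) // 2
            return subset[:mid], subset[mid:]
        if mode == "even_odd":
            # Even-position and odd-position values (zero-based, by assumption).
            return subset[0::2], subset[1::2]
        raise ValueError("unknown mode: " + mode)

    third, fourth = split(first)
    fifth, sixth = split(second)
    return first, second, third, fourth, fifth, sixth
```

For a row of eight values, the contiguous rule yields quarters of two values each, and the even/odd rule yields interleaved pairs drawn from each half.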


With reference again to row R4 of block I of FIG. 3, the even coefficients in such row R4 are zero values and the odd coefficients in such row do not consist entirely of zero values. To transform such row R4, some embodiments of the inventive transform engine would reduce power consumption by preventing the inputs to the multipliers that perform row transform operations on the even coefficients of the row from toggling during the performance of such row transform operations. This helps to reduce the switching power that is consumed by the multipliers during the transform (i.e., to reduce dynamic power consumption).


With reference again to FIG. 3, in row R4 of block I, only the first coefficient in the entire row is a non-zero (coded) coefficient. Preferred embodiments of the inventive transform engine are configured to recognize a row or column of this type (i.e., one in which only the first data value is non-zero) and to transform such a row or column in a manner consuming less power than would be required to transform a row or column consisting entirely of non-zero values. For example, preferred implementations of circuitry 7 (of an embodiment of the inventive transform engine having the structure shown in FIG. 2) are configured to perform a row transform on an input data row of which only the first value is non-zero in a reduced-power manner, in response only to the non-zero coefficient of the row and without performing a normal row transform on all data values of the row (which would consume full power), and to store in buffer 9 the resulting partially transformed data values.
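The shortcut for a row in which only the first value is non-zero can be illustrated as follows. This sketch assumes an orthonormal floating-point inverse DCT, for which such a row transforms to a constant row; the names `idct_1d` and `idct_row` are hypothetical and the sketch does not describe the actual circuitry.

```python
import math

def idct_1d(coeffs):
    """Full orthonormal 1D inverse DCT (the assumed full-power path)."""
    n = len(coeffs)
    out = []
    for k in range(n):
        total = coeffs[0] / math.sqrt(n)
        for u in range(1, n):
            angle = math.pi * (2 * k + 1) * u / (2 * n)
            total += math.sqrt(2.0 / n) * coeffs[u] * math.cos(angle)
        out.append(total)
    return out

def idct_row(coeffs):
    """Use a single scaling when only the first coefficient is non-zero."""
    if coeffs[0] != 0 and all(c == 0 for c in coeffs[1:]):
        # Every output sample equals the scaled first coefficient; no cosine products needed.
        return [coeffs[0] / math.sqrt(len(coeffs))] * len(coeffs)
    return idct_1d(coeffs)
```

The shortcut produces the same values as the full transform for such rows while using one scaling operation instead of a full set of multiply-accumulate operations.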


Preferred embodiments of the inventive transform engine reduce power consumption by detecting whether individual data values of a block to be transformed are zero values (e.g., by performing zero-detection on individual coefficients of a block of DCT coefficients to undergo an inverse discrete cosine transform, on a coefficient-by-coefficient basis) and, for all data values that are determined to be zero values, ensuring that at least one (and preferably each) input to at least one multiplier to be employed to transform at least one said zero value does not switch from its earlier value. This reduces dynamic power consumption in the engine.


In preferred implementations of an embodiment of the inventive transform engine having the structure shown in FIG. 2, circuitry 7 stores in internal buffer 9 the partially transformed data values generated thereby. Circuitry 11 reads this data from buffer 9 and performs a column (or row) transform on each column (or row) of the data. Typically, circuitry 11 can perform a column (row) transform computation as soon as there is at least one column (row) of data in buffer 9 and available for transform. Thus, in preferred embodiments, circuitry 7 is configured to reorder the row (column) transform operations that it performs to generate (and store in buffer 9) as rapidly as possible just enough partially transformed data for circuitry 11 to perform a column (row) transform on a complete column (row) of the data. By configuring circuitry 7 to perform the row (column) transform in such a reordered (e.g., interleaved) manner, power consumption by the system can be reduced (e.g., by minimizing power consumed for internal buffer storage).
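One possible reordering of the row transform operations can be illustrated by the following sketch, in which output samples are produced column by column so that a complete column of partially transformed data is available after one sample has been computed for each row. This ordering is an assumption made for illustration, as are the names `idct_sample` and `interleaved_row_transform` and the use of an orthonormal floating-point inverse DCT.

```python
import math

def idct_sample(coeffs, k):
    """Compute only output sample k of an orthonormal 1D inverse DCT of one row."""
    n = len(coeffs)
    total = coeffs[0] / math.sqrt(n)
    for u in range(1, n):
        angle = math.pi * (2 * k + 1) * u / (2 * n)
        total += math.sqrt(2.0 / n) * coeffs[u] * math.cos(angle)
    return total

def interleaved_row_transform(block):
    """Yield (column index, column) as soon as each column of row-transformed data is complete."""
    n_cols = len(block[0])
    for c in range(n_cols):
        # One output sample per row is enough to complete column c.
        yield c, [idct_sample(row, c) for row in block]
```

A consumer standing in for circuitry 11 could begin a column transform on each yielded column immediately, so that only one column would need to be held at a time in this model.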


Alternatively, when a symmetric transform is being performed in accordance with an embodiment of the inventive transform engine having the structure shown in FIG. 2, circuitry 7 is configured, in a manner that exploits the symmetric nature of the transform, to perform row transform computations sufficient to generate (and store in buffer 9) two columns of data before performing additional row transform computations, or to perform enough column transform computations to generate (and store in buffer 9) two rows of data before performing additional column transform computations. The row (or column) transform performed by such embodiment of circuitry 7 proceeds to generate (and store in buffer 9) as rapidly as possible just enough partially transformed data for circuitry 11 to perform a column (row) transform on additional complete columns (rows) of the buffered data. The decision as to whether to implement circuitry 7 to perform row transform computations so as to output one column of data, or two columns of data, before performing additional row transform computations, depends on a tradeoff between performance and buffer memory requirements. It should be appreciated that if circuitry 7 is implemented to perform a row transform in such a manner that buffer 9 need only have capacity to store one column of partially transformed data, the advantage of minimizing the required capacity of buffer 9 would typically come at the cost of reduced performance (e.g., increased number of cycles for the inventive transform engine to perform an inverse transform on a complete block of data values).


In order to perform a conventional H264 based transform (an integer transform that can be performed without real multiplication) in accordance with the invention, the inventive transform engine can be implemented without multipliers. The H264 based transform can be implemented using shifters rather than multipliers. Bypassing transform circuitry in accordance with the invention can speed up the transform computations (and reduce the energy consumed thereby) even in these implementations.
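A multiplier-free one-dimensional transform of this kind can be illustrated by the following sketch. The butterfly follows the commonly published form of the 4-point inverse core transform of H.264 and is given as an assumption, not as the circuit of any embodiment; the function name `h264_inverse_1d` and the particular bypass condition are hypothetical.

```python
def h264_inverse_1d(w):
    """4-point inverse integer transform using only additions and shifts."""
    w0, w1, w2, w3 = w
    # Even part: built from the first and third coefficients.
    e0 = w0 + w2
    e1 = w0 - w2
    if w1 == 0 and w3 == 0:
        # Bypass the odd part when both of its inputs are zero.
        e2 = e3 = 0
    else:
        # Halving is done with an arithmetic right shift, not a multiplier.
        e2 = (w1 >> 1) - w3
        e3 = w1 + (w3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]
```

When the bypass branch is taken, the two shifts and two additions of the odd part are skipped, which corresponds to the kind of saving described above for implementations without multipliers.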


We next describe a class of embodiments of the invention in more detail with reference to the flow chart of FIG. 4. The flow chart of FIG. 4 pertains to a 2D transform (e.g., an inverse discrete cosine transform) performed on each block of a sequence of input data blocks, where each block comprises rows and columns of input data values, and the 2D transform includes a row transform and a column transform. In accordance with the invention, the 2D transform is performed either by performing the row transform on all rows of each block to generate a block of partially transformed data and then the column transform on each column of the block of partially transformed data, or by performing the column transform on all columns of each block to generate a block of partially transformed data and then the row transform on each row of the block of partially transformed data. The flow chart of FIG. 4 describes such a row transform on rows of data values or such a column transform on columns of data values (e.g., it describes transformation of rows or columns of frequency coefficients by an implementation of circuitry 7 of FIG. 2 that embodies the invention, or transformation of rows or columns of partially transformed data values read from buffer 9 by an implementation of circuitry 11 of FIG. 2 that embodies the invention). For simplicity, the following description of FIG. 4 will assume that rows of data values are being transformed. It should be understood that all references to “row” in the description can be replaced by references to “column” to describe other embodiments of the invention.


A typical block to be inverse transformed in accordance with the invention has four coefficients per row (e.g., if the inverse transform is an H264 based transform) or eight coefficients per row (e.g., if the inverse transform is an IDCT on 8×8 blocks of frequency coefficients). FIG. 4 is applicable in either case.


Before performing the steps shown in FIG. 4, the transform engine of typical embodiments of the invention determines whether a block of data to be transformed consists entirely of zero values. Upon determining that the block consists entirely of zero values, the transform engine causes transformation operations (both row and column transform operations) on the block to be bypassed (e.g., it outputs zeroes in response to the block without performing actual transformation operations on the values of the block). If the transform engine determines that a block of data to be transformed includes at least one non-zero value, it initiates a row transform (i.e., it performs step 20 of FIG. 4).
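The block-level check performed before the steps of FIG. 4 can be sketched as follows. The name `transform_block` is hypothetical, and the two-dimensional transform is passed in as a parameter so that the sketch makes no assumption about which transform is used.

```python
def transform_block(block, transform_2d):
    """Bypass the 2D transform entirely when the block consists only of zero values."""
    if all(v == 0 for row in block for v in row):
        # Output zeroes without performing any row or column transform operations.
        return [[0] * len(row) for row in block]
    # At least one non-zero value: proceed with the row and column transforms.
    return transform_2d(block)
```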


Upon initiation (step 20) of a row transform to be performed in accordance with the FIG. 4 flow chart, a first row of data values to be transformed is read (in step 22) from a buffer memory. The buffer memory can either be one coupled to an input of circuitry 7 of FIG. 2 (if circuitry 7 implements the FIG. 4 flow chart) or it can be buffer memory 9 of FIG. 2 (if circuitry 11 implements the FIG. 4 flow chart).


In step 24, the transform engine determines whether the current row (the row read during the most recent performance of step 22) includes at least one non-zero value. If not, the transform engine outputs zeroes (step 25) indicative of a row transformed version of the row (without performing actual row transform computations on the data values of the row) and reads the next row from the buffer memory (the next performance of step 22). If the current row includes at least one non-zero value, the engine determines (in step 26) whether the first half of the current row (e.g., the first two data values of a row consisting of four data values) includes at least one non-zero value.


If the first half of the current row does not include at least one non-zero value, the engine outputs zeroes (step 28) indicative of a row transformed version of the first half of the row (without performing actual row transform computations on the data values of the first half of the row) and determines (in step 40) whether the second half of the current row (e.g., the second two data values of a row consisting of four data values) includes at least one non-zero value.


If the first half of the current row includes at least one non-zero value, the engine determines (in step 30) whether the first half of the first half of the current row includes at least one non-zero value and determines (in step 32) whether the second half of the first half of the current row includes at least one non-zero value.
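The hierarchy of tests in steps 24, 26, 30, 32 and 40 can be sketched as a function that decides, for each quarter of a row, whether computation is needed. The sketch assumes a row length divisible by four and assumes that the second half of the row is handled in the same manner as the first half after step 40; the names `any_nonzero` and `plan_row` and the action labels are hypothetical.

```python
def any_nonzero(values):
    """True when the sequence includes at least one non-zero value."""
    return any(v != 0 for v in values)

def plan_row(row):
    """Return one action per quarter of the row: compute, bypass, or bypass_or_reduced."""
    n = len(row)
    if not any_nonzero(row):                      # step 24
        return ["bypass"] * 4                     # step 25: output zeroes for the row
    plan = []
    for start in (0, n // 2):                     # step 26 (first half), step 40 (second half)
        half = row[start:start + n // 2]
        if not any_nonzero(half):
            plan += ["bypass", "bypass"]          # e.g., step 28 for the first half
            continue
        q = len(half) // 2
        for quarter in (half[:q], half[q:]):      # e.g., steps 30 and 32 for the first half
            plan.append("compute" if any_nonzero(quarter) else "bypass_or_reduced")
    return plan
```

For a four-value row in which only the first value is non-zero, the plan contains a single quarter that requires full computation.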


If the first half of the current row's first half includes at least one non-zero value (e.g., if it consists of a single non-zero value, or consists of two values including at least one non-zero value), the engine performs (in step 36) row transform computations on the data values of the first half of the first half of the row and stores (in step 39) the resulting transformed data in a buffer memory. The buffer memory employed to perform step 39 can either be a buffer memory coupled to an output of circuitry 11 of FIG. 2 (if circuitry 11 implements the FIG. 4 flow chart) or buffer memory 9 of FIG. 2 (if circuitry 7 implements the FIG. 4 flow chart).


If the first half of the current row's first half does not include a non-zero value, the engine performs step 37 in which it:


(a) outputs one or more zeroes indicative of a row transformed version of the first half of the current row's first half (without performing actual row transform computations on the data value or values of the first half of the current row's first half). Each such zero value is then stored (in step 39) in a buffer memory; or


(b) outputs at least one data value generated by performing row transform computations in a reduced-power manner on the data value(s) of the first half of the current row's first half. Each such output value is then stored (in step 39) in a buffer memory. For example, the engine can do this by operating multiplication circuitry having a first input set (comprising at least one input) to which at least one constant is asserted and a second input set (comprising at least one input) to which each data value of the first half of the current row's first half is asserted, to perform at least one multiplication operation on each data value of the first half of the current row's first half without updating the value asserted to at least one of the first input set and the second input set, to avoid consuming power that would otherwise be consumed to toggle the relevant input(s) of the multiplication circuitry. In the case that the engine is performing an IDCT, each constant asserted to the first input set is a cosine constant, and in one implementation, whenever a data value asserted to the second input set is a zero value the engine prevents the cosine constant to be multiplied with said zero value from being updated (changed from its previous value) since the result of the multiplication would be a zero regardless of the cosine constant's value. In another implementation, the engine prevents the updating of a cosine constant asserted to the first input set (to be multiplied with a zero data value) and prevents the updating of the data value being asserted to the corresponding input of the second input set (to prevent toggling of either multiplier input), and asserts a zero value (e.g., multiplexes a zero value into the output of the processing pipeline) indicative of the result of multiplying the zero data value with the cosine constant (without actually multiplying together these two operands); or


(c) outputs one or more zeroes indicative of a row transformed version of at least one value in the first half of the current row's first half (without performing actual row transform computations on such data value or values), and outputs at least one data value generated by performing row transform computations in a reduced-power manner on at least one other data value of the first half of the current row's first half. Each such zero value and output value is then stored (in step 39) in a buffer memory.
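The second implementation mentioned in option (b), in which neither multiplier input is updated and a zero is multiplexed into the output, can be modelled in software as follows. The class `GatedMultiplier` and its attribute names are hypothetical, and counting input updates is only a rough stand-in for the switching activity of real multiplier inputs.

```python
class GatedMultiplier:
    """Software model of a multiplier whose inputs are held when the data value is zero."""

    def __init__(self):
        self.const_in = 0       # value held on the first (constant) input
        self.data_in = 0        # value held on the second (data) input
        self.input_updates = 0  # number of times the inputs were actually changed

    def multiply(self, data, const):
        if data == 0:
            # Hold both inputs and select a zero for the output instead of multiplying.
            return 0
        self.const_in, self.data_in = const, data
        self.input_updates += 1
        return self.const_in * self.data_in
```

In this model a run of zero data values leaves `const_in` and `data_in` untouched, so the inputs change only when a non-zero value arrives.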


If the second half of the current row's first half includes at least one non-zero value (e.g., if it consists of a single non-zero value or consists of two values including at least one non-zero value), the engine performs (in step 34) row transform computations on the data values of the second half of the first half of the row and stores (in step 39) the resulting transformed data in a buffer memory.


If the second half of the current row's first half does not include a non-zero value, the engine performs step 35 in which it:


(a) outputs one or more zeroes indicative of a row transformed version of the second half of the current row's first half (without performing actual row transform computations on the data value or values of the second half of the current row's first half). Each such zero value is then stored (in step 39) in a buffer memory; or


(b) outputs at least one data value generated by performing row transform computations in a reduced-power manner on the data value(s) of the second half of the current row's first half. Each such output value is then stored (in step 39) in a buffer memory. For example, the engine can do this by operating multiplication circuitry having a first input set (comprising at least one input) to which at least one constant is asserted and a second input set (comprising at least one input) to which each data value of the second half of the current row's first half is asserted, to perform at least one multiplication operation on each data value of the second half of the current row's first half without updating the value asserted to at least one of the first input set and the second input set, to avoid consuming power that would otherwise be consumed to toggle the relevant input(s) of the multiplication circuitry. In the case that the engine is performing an IDCT, each constant asserted to the first input set is a cosine constant, and in one implementation, whenever a data value asserted to the second input set is a zero value the engine prevents the cosine constant to be multiplied with said zero value from being updated (changed from its previous value) since the result of the multiplication would be a zero regardless of the cosine constant's value. In another implementation, the engine prevents the updating of a cosine constant asserted to the first input set (to be multiplied with a zero data value) and prevents the updating of the data value being asserted to the corresponding input of the second input set (to prevent toggling of either multiplier input), and asserts a zero value (e.g., multiplexes a zero value into the output of the processing pipeline) indicative of the result of multiplying the zero data value with the cosine constant (without actually multiplying together these two operands); or


(c) outputs one or more zeroes indicative of a row transformed version of at least one value in the second half of the current row's first half (without performing actual row transform computations on such data value or values), and outputs at least one data value generated by performing row transform computations in a reduced-power manner on at least one other data value of the second half of the current row's first half. Each such zero value and output value is then stored (in step 39) in a buffer memory.


After all data values generated or asserted in steps 34 and 36, steps 34 and 37, steps 35 and 36, or steps 35 and 37 have been stored (step 39) in the buffer memory, the engine determines (in step 40) whether the second half of the current row includes at least one non-zero value. If the second half of the current row does not include at least one non-zero value, the engine outputs zeroes (step 41) indicative of a row transformed version of the second half of the row (without performing actual row transform computations on the data values of said second half of the row) and reads the next row to be transformed from buffer memory (another performance of step 22).


If the second half of the current row includes at least one non-zero value, the engine determines (in step 42) whether the first half of the second half of the current row includes at least one non-zero value and determines (in step 44) whether the second half of the second half of the current row includes at least one non-zero value.


If the first half of the second half of the current row includes at least one non-zero value (e.g., if it consists of a single non-zero value, or consists of two values including at least one non-zero value), the engine performs (in step 48) row transform computations on the data values of the first half of the second half of the row and stores (in step 50) the resulting transformed data in a buffer memory. The buffer memory employed to perform step 50 can either be a buffer memory coupled to an output of circuitry 11 of FIG. 2 (if circuitry 11 implements the FIG. 4 flow chart) or buffer memory 9 of FIG. 2 (if circuitry 7 implements the FIG. 4 flow chart).


If the first half of the current row's second half does not include a non-zero value, the engine performs step 49 in which it:


(a) outputs one or more zeroes indicative of a row transformed version of the first half of the current row's second half (without performing actual row transform computations on the data value or values of the first half of the current row's second half). Each such zero value is then stored (in step 50) in a buffer memory; or


(b) outputs at least one data value generated by performing row transform computations in a reduced-power manner on the data value(s) of the first half of the current row's second half. Each such output value is then stored (in step 50) in a buffer memory. For example, the engine can do this by operating multiplication circuitry having a first input set (comprising at least one input) to which at least one constant is asserted and a second input set (comprising at least one input) to which each data value of the first half of the current row's second half is asserted, to perform at least one multiplication operation on each data value of the first half of the current row's second half without updating the value asserted to at least one of the first input set and the second input set, to avoid consuming power that would otherwise be consumed to toggle the relevant input(s) of the multiplication circuitry. In the case that the engine is performing an IDCT, each constant asserted to the first input set is a cosine constant, and in one implementation, whenever a data value asserted to the second input set is a zero value the engine prevents the cosine constant to be multiplied with said zero value from being updated (changed from its previous value) since the result of the multiplication would be a zero regardless of the cosine constant's value. In another implementation, the engine prevents the updating of a cosine constant asserted to the first input set (to be multiplied with a zero data value) and prevents the updating of the data value being asserted to the corresponding input of the second input set (to prevent toggling of either multiplier input), and asserts a zero value (e.g., multiplexes a zero value into the output of the processing pipeline) indicative of the result of multiplying the zero data value with the cosine constant (without actually multiplying together these two operands); or


(c) outputs one or more zeroes indicative of a row transformed version of at least one value in the first half of the current row's second half (without performing actual row transform computations on such data value or values), and outputs at least one data value generated by performing row transform computations in a reduced-power manner on at least one other data value of the first half of the current row's second half. Each such zero value and output value is then stored (in step 50) in a buffer memory.


If the second half of the second half of the current row includes at least one non-zero value (e.g., if it consists of a single non-zero value, or consists of two values including at least one non-zero value), the engine performs (in step 46) row transform computations on the data values of the second half of the second half of the row and stores (in step 50) the resulting transformed data in a buffer memory.


If the second half of the current row's second half does not include a non-zero value, the engine performs step 47 in which it:


(a) outputs one or more zeroes indicative of a row transformed version of the second half of the current row's second half (without performing actual row transform computations on the data value or values of the second half of the current row's second half). Each such zero value is then stored (in step 50) in a buffer memory; or


(b) outputs at least one data value generated by performing row transform computations in a reduced-power manner on the data value(s) of the second half of the current row's second half. Each such output value is then stored (in step 50) in a buffer memory. For example, the engine can do this by operating multiplication circuitry having a first input set (comprising at least one input) to which at least one constant is asserted and a second input set (comprising at least one input) to which each data value of the second half of the current row's second half is asserted, to perform at least one multiplication operation on each data value of the second half of the current row's second half without updating the value asserted to at least one of the first input set and the second input set, to avoid consuming power that would otherwise be consumed to toggle the relevant input(s) of the multiplication circuitry. In the case that the engine is performing an IDCT, each constant asserted to the first input set is a cosine constant, and in one implementation, whenever a data value asserted to the second input set is a zero value the engine prevents the cosine constant to be multiplied with said zero value from being updated (changed from its previous value) since the result of the multiplication would be a zero regardless of the cosine constant's value. In another implementation, the engine prevents the updating of a cosine constant asserted to the first input set (to be multiplied with a zero data value) and prevents the updating of the data value being asserted to the corresponding input of the second input set (to prevent toggling of either multiplier input), and asserts a zero value (e.g., multiplexes a zero value into the output of the processing pipeline) indicative of the result of multiplying the zero data value with the cosine constant (without actually multiplying together these two operands); or


(c) outputs one or more zeroes indicative of a row transformed version of at least one value in the second half of the current row's second half (without performing actual row transform computations on such data value or values), and outputs at least one data value generated by performing row transform computations in a reduced-power manner on at least one other data value of the second half of the current row's second half. Each such zero value and output value is then stored (in step 50) in a buffer memory.


After all data values generated or asserted in steps 46 and 48, steps 46 and 49, steps 47 and 48, or steps 47 and 49 have been stored (step 50) in the buffer memory, the engine determines (step 52) whether the row transform has been performed on all rows of the current block. If the row transform has been performed on all rows of the current block, the engine enters a state (step 54) in which it stops row transform operations. It can then perform step 20 again to begin processing of the next block of data values to be transformed. If the row transform has not been performed on all rows of the current block, the engine reads the next row (of the current block to be transformed) from buffer memory (another performance of step 22).
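The hierarchy of zero tests in the FIG. 4 flow (whole row, then each half, then each half of a half) can be summarized in a short software sketch. The following Python is an illustrative model only, not the circuitry of the invention; the function name plan_row and the action labels are hypothetical, and the sketch assumes a row whose length is a multiple of four.

```python
def plan_row(row):
    """Return (start, stop, action) segments for one row, mirroring the
    nested zero tests of the FIG. 4 flow. Names and labels are hypothetical."""
    n = len(row)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a row length that is a multiple of 4")

    def any_nonzero(lo, hi):
        return any(v != 0 for v in row[lo:hi])

    # Whole-row test: an all-zero row is answered with zeroes.
    if not any_nonzero(0, n):
        return [(0, n, "zeros")]

    plan = []
    half, quarter = n // 2, n // 4
    for h in range(0, n, half):
        # Half-row test: an all-zero half is answered with zeroes.
        if not any_nonzero(h, h + half):
            plan.append((h, h + half, "zeros"))
            continue
        # Half-of-half tests: compute only where a non-zero value exists.
        for q in range(h, h + half, quarter):
            if any_nonzero(q, q + quarter):
                plan.append((q, q + quarter, "compute"))
            else:
                plan.append((q, q + quarter, "zeros_or_reduced_power"))
    return plan
```

For an 8-value row whose only non-zero value is the first, the sketch marks values 0-1 for full computation, values 2-3 for bypass or reduced-power handling, and the entire second half for zero output.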



FIG. 5 is a block diagram of transform circuitry employed in a typical implementation of circuitry 7 (or circuitry 11) of FIG. 2 that embodies the invention. Implementations of circuitry 7 (or 11) designed to transform 4×4 blocks of data would typically include one unit of the FIG. 5 circuitry, for transforming the four data values I0, I1, I2, and I3 of each row or column to be transformed. Implementations of circuitry 7 (or 11) designed to transform 8×8 blocks of data would typically include two units of the FIG. 5 circuitry: one for transforming the four data values I0, I1, I2, and I3 of the first half of each row or column to be transformed; and an identical one for transforming the four data values (I4, I5, I6, and I7) of the second half of each row or column to be transformed (using a second set of cosine constants C4, C5, C6, and C7 in place of the constants C0, C1, C2, and C3 shown in FIG. 5).


Each of multiplication units M1, M2, M3, and M4 of FIG. 5 includes a first input (to which one of cosine constants C0, C1, C2, and C3 is asserted), a second input (to which one of data values I0, I1, I2, and I3 is asserted), and an output at which it asserts a data value indicative of the product of the two operands. The FIG. 5 circuit also includes preprocessing and control logic 100 (to be referred to as preprocessing logic 100), addition circuits A1, A2, A3, A4, and A5, subtraction circuit S1, and delay circuits A and B (implemented as flip-flops), connected as shown, and is configured to perform a pipelined row transform or column transform on each set of four input values asserted thereto, to generate an output value Y0 (or Y1) in response to each input value. During operation, delay circuit A typically delays, by one cycle of the pipelined transform, assertion of the output of addition circuit A3 to each selected one (as determined by control signals asserted by logic 100) of addition circuit A5, subtraction circuit S1, and addition circuit A3. During operation, delay circuit B typically delays, by one cycle of the pipelined transform, assertion of the output of addition circuit A4 to each selected one (as determined by control signals asserted by logic 100) of addition circuit A5, subtraction circuit S1, and addition circuit A4.


When the FIG. 5 circuitry is included in an implementation of circuitry 7 (or 11) that is designed to transform 4×4 blocks of data, preprocessing logic 100 is configured to perform the zero (or non-zero) value determination operations of FIG. 4. When two units of the FIG. 5 circuitry are included in an implementation of circuitry 7 (or 11) designed to transform 8×8 blocks of data, each of the units 100 is configured to perform the zero (or non-zero) value determination operations of FIG. 4 on a different subset (e.g., half) of each row or column of data values to be transformed. In any implementation of circuitry 7 (or 11), preprocessing logic 100 is configured to read input data values to be transformed from the relevant buffer memory (e.g., buffer 9 of FIG. 2 or a buffer memory coupled to the input of circuitry 7 of FIG. 2) and to generate (and assert to the appropriate ones of the other elements of FIG. 5) control signals needed to implement the other operations required by FIG. 4. For example, if logic 100 determines that coefficient I0 is a zero value (e.g., during step 30 of FIG. 4), it may assert a control signal to cause the coefficient C0 that was asserted to multiplier M1 during the previous operating cycle to continue to be asserted to multiplier M1 during each subsequent cycle in which multiplier M1 is to multiply the zero-valued coefficient I0 by a constant.
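The power benefit of holding a constant at a multiplier input while a zero-valued coefficient is multiplied can be illustrated with a simple behavioral model that counts input bit toggles. The following Python is a sketch under stated assumptions: the class Multiplier, the function run, the 16-bit bus width, and the constant values in PAIRS are all hypothetical placeholders and are not taken from FIG. 5.

```python
MASK = 0xFFFF  # assumed 16-bit operand buses (an assumption of this sketch)

class Multiplier:
    """Behavioral model of one multiplier that counts input bit toggles."""

    def __init__(self, hold_constant_on_zero):
        self.hold = hold_constant_on_zero
        self.const_in = 0   # value currently latched at the constant input
        self.data_in = 0    # value currently latched at the data input
        self.toggles = 0    # total number of input bits that changed state

    def _drive(self, old, new):
        self.toggles += bin((old ^ new) & MASK).count("1")
        return new

    def multiply(self, constant, data):
        # When holding is enabled and the data value is zero, the previous
        # constant stays asserted; the product is zero either way.
        if not (self.hold and data == 0):
            self.const_in = self._drive(self.const_in, constant)
        self.data_in = self._drive(self.data_in, data)
        return self.const_in * self.data_in

def run(pairs, hold):
    m = Multiplier(hold)
    products = [m.multiply(c, d) for c, d in pairs]
    return products, m.toggles

# Placeholder (constant, data) pairs; the constants are arbitrary integers.
PAIRS = [(0x5A82, 7), (0x7642, 0), (0x30FC, 0), (0x5A82, 3)]
```

Calling run(PAIRS, True) and run(PAIRS, False) yields identical products, while the held-constant variant records fewer input toggles, which is the effect the control signal from logic 100 is intended to achieve.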


In a typical implementation, preprocessing logic 100 resets the output values Y0 and Y1 to zero upon receiving (e.g., from buffer 9) each set of four new input data values to be transformed. In this case, when preprocessing logic 100 determines that a zero value should be output (i.e., a zero value of Y0 or Y1) in response to one of the input values (e.g., in step 25, 28, 41, 35, 37, 47, or 49 of FIG. 4), it may prevent all toggling of inputs of circuit elements M1, M2, M3, M4, A1, A2, A3, A4, A, B, A5, and S1 during the cycle(s) in which these elements would otherwise operate to transform this input value, and instead cause the reset zero value of Y0 (or Y1) to be asserted to the relevant buffer (e.g., to buffer memory 9 or a buffer memory coupled to the output of circuitry 11).


In embodiments which require reordering of data values to be transformed, preprocessing unit 100 is preferably configured to perform such reordering. For example, to perform transforms similar to that described with reference to FIG. 4 but in which each step of determining whether data values in a first (or second) half of a half row or half column (or half of a half row or half column) include at least one non-zero value, is replaced by a step of determining whether the even (or odd) data values in a half row or half column (or half of a half row or half column) include at least one non-zero value, preprocessing unit 100 could reorder the data values to be assessed in order to perform the required determination. For example, unit 100 could determine whether data values in a first (or second) half of a reordered half row or reordered half column (or half of a reordered half row or reordered half column) include at least one non-zero value, thereby determining whether the even (or odd) data values in a non-reordered version of the half row or half column (or half of a non-reordered version of a half row or half column) include at least one non-zero value. It is contemplated that some implementations of preprocessing unit 100 may perform reordering of subsets of rows or columns (or partial rows or partial columns) that are not half rows (or half partial rows) or half columns (or half partial columns).
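The reordering described above can be illustrated as follows. This Python sketch is an assumption-laden illustration (the function names even_odd_reorder and halves_have_nonzero are hypothetical): it moves the even-indexed values ahead of the odd-indexed values so that a first-half or second-half test on the reordered data answers the even or odd question about the original data.

```python
def even_odd_reorder(values):
    """Place even-indexed values first, then odd-indexed values."""
    return values[0::2] + values[1::2]

def halves_have_nonzero(values):
    """Return (first_half_has_nonzero, second_half_has_nonzero)."""
    mid = len(values) // 2
    first = any(v != 0 for v in values[:mid])
    second = any(v != 0 for v in values[mid:])
    return first, second

# For the original row [0, 9, 0, 4], the even-indexed values are all zero
# and the odd-indexed values are not; the half tests on the reordered row
# report exactly that.
even_has_nonzero, odd_has_nonzero = halves_have_nonzero(even_odd_reorder([0, 9, 0, 4]))
```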



FIG. 6 is a block diagram of elements of an embodiment of the inventive pipelined video decoding system. Numerous variations on the design shown in FIG. 6 are contemplated for implementation in other embodiments of the invention (e.g., zero detection circuitry 132, 134, and 135-138 can be omitted and its functions performed by zero detection circuitry 130 or other zero detection circuitry). The system of FIG. 6 includes pre-processing unit 120, transform engine 122, and post-processing unit 124, connected as shown.


Unit 120 is coupled to receive a sequence of blocks of input data values (e.g., input frequency domain coefficients that have been generated by performing a two-dimensional DCT or inverse DCT or other 2D transform on blocks of video pixels). Unit 120 is also coupled to receive data values (identified in FIG. 6 as “output from 1st stage” and “first stage output”) output from post-processing unit 124 after a first pass through the FIG. 6 system so that these data values (which are optionally buffered in a buffer memory after output from unit 124 and before assertion to unit 120) can undergo a second stage of processing during a second pass through the FIG. 6 circuitry (e.g., to perform a row transform on rows of the data values when columns of the data values have undergone a column transform during a first pass through the FIG. 6 circuitry).
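The two-pass use of a single one-dimensional transform path (a column transform in the first pass, with the result fed back for a row transform in the second pass) can be sketched in software. The following Python is illustrative only: it assumes a 4×4 block, uses a 4-point Hadamard matrix as a stand-in for whatever transform is actually implemented, and the names H4, transform_1d, one_pass, and two_pass are hypothetical.

```python
# 4-point Hadamard matrix, used here only as a simple integer stand-in.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]

def transform_1d(vec):
    """Transform one row or column; an all-zero input is answered directly."""
    if all(v == 0 for v in vec):
        return [0] * len(vec)  # bypass: no multiplications performed
    return [sum(c * v for c, v in zip(coeffs, vec)) for coeffs in H4]

def one_pass(block, by_columns):
    """Apply the 1D transform to every column (or every row) of a block."""
    if by_columns:
        columns = [list(col) for col in zip(*block)]
        out_columns = [transform_1d(col) for col in columns]
        return [list(row) for row in zip(*out_columns)]
    return [transform_1d(list(row)) for row in block]

def two_pass(block):
    """First pass on columns, then feed the result back for a row pass."""
    first_stage_output = one_pass(block, by_columns=True)
    return one_pass(first_stage_output, by_columns=False)
```

In this sketch a block whose only non-zero value is a 1 in its top-left position yields a block of all ones, and three of the four column transforms in the first pass are bypassed because those columns are entirely zero.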


Multiplexer 129 of unit 120 selects either input data values (that have not undergone processing in FIG. 6) or data values output from post-processing unit 124 (that have undergone processing in a previous pass through the FIG. 6 system) and asserts the selected values to zero detection logic 130. The selected values are passed through zero detection logic 130 (except, optionally, when logic 130 determines that they consist entirely of zero values). Optionally then, other circuitry within unit 120 performs preliminary processing operations on the values passed through logic 130 to generate preprocessed data values (these operations are performed when required to implement a particular transform to be implemented using the FIG. 6 system). Unit 120 asserts the preprocessed data values (which are identical to the output of logic 130 if no preliminary processing is performed on the output of logic 130) to transform engine 122.


Preprocessing unit 120 is typically configured to separate each set of data values asserted thereto into subsets appropriate for engine 122 to perform the transform to be implemented. The manner in which unit 120 accomplishes the separation into subsets will depend on the transform to be implemented (e.g., 8×8 DCT, 8×8 IDCT, or 8×8 Hadamard), and can be determined by control signals asserted to unit 120 from an external unit so that the subset selection can be customized on the basis of the transform to be implemented.


Transform engine 122 performs a 2D transform on each block of data values that it receives from unit 120 to generate blocks of transformed data values (e.g., blocks of partially decoded video pixels), and asserts the transformed data values to post-processing unit 124 for optional further processing. More specifically, transform engine 122 is configured to perform a pipelined row transform or column transform on each set of four data values asserted thereto from unit 120 to generate an output value in response to each of these four data values. Each such set of four data values is typically a row (or column) of a block of data values, or a subset of a row (or column) of a block of data values.


Post-processing unit 124 optionally (i.e., when appropriate for the particular transform being performed by the FIG. 6 system) performs post-processing operations on the transformed values received from engine 122 to generate output data values. The output data values may be asserted from unit 124 to a system bus and/or fed back to at least one input of unit 120 (or to a buffer memory from which they are asserted to at least one input of unit 120). Typically, unit 124 performs rounding, saturation, and final packing (if necessary). The specific post-processing operations performed by unit 124 depend on whether the FIG. 6 circuit is implementing first stage or second stage processing on the input to unit 122.
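Rounding and saturation of the kind mentioned above can be sketched as follows. This Python is an illustration under assumptions not stated in the description: fixed-point values carrying `shift` fractional bits, round-half-up rounding, and a signed 9-bit output range; the function names round_shift, saturate, and post_process are hypothetical.

```python
def round_shift(value, shift):
    """Drop `shift` fractional bits with round-half-up rounding."""
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift

def saturate(value, lo, hi):
    """Clamp a value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))

def post_process(values, shift=3, lo=-256, hi=255):
    """Round then saturate each transformed value (assumed parameters)."""
    return [saturate(round_shift(v, shift), lo, hi) for v in values]
```

With the default parameters of this sketch, post_process([8, 12, -12, 5000, -5000]) returns [1, 2, -1, 255, -256].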


Transform engine 122 of FIG. 6 is designed to transform 4×4 blocks of data, including by transforming four data values (I0, I1, I2, and I3) of each row or column to be transformed using a set of constants C0, C1, C2, and C3. In some other embodiments of the invention, circuitry for transforming 8×8 blocks of data would include two transform engines 122: one engine for transforming four data values I0, I1, I2, and I3 of a first subset (e.g., the first half) of each row or column to be transformed; and an identical engine for transforming four data values (I4, I5, I6, and I7) of a second subset (e.g., the second half) of each row or column to be transformed (using a second set of constants C4, C5, C6, and C7 in place of the constants C0, C1, C2, and C3 shown in block 122 of FIG. 6). The constants (e.g., C0, C1, C2, and C3) to be employed can be determined by control signals asserted to engine 122 from an external unit so that the choice of constants can be customized on the basis of the transform to be performed.


Typically, zero detection logic 130 of preprocessing unit 120 is configured to determine whether an entire block of data to be transformed consists (i.e., all four rows or columns of a 4×4 block of data to be transformed consist) entirely of zero values. For example, it may include a shift register providing sufficient latency to perform such a determination on all rows (columns) of a block before passing the first row (column) of the block to downstream circuitry. Upon determining that the block consists entirely of zero values, logic 130 asserts appropriate control bits to other elements of the FIG. 6 system that cause bypassing (e.g., as described above) of transformation operations that would otherwise be performed subsequently on the block by engine 122, and typically also cause predetermined values (e.g., zeroes) to be output from multiplexer 139 of unit 124 in response to the block without performance of actual transformation operations by one or both of engine 122 and unit 124 on the values of the block.
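Block-level bypassing of this kind can be sketched in software as a check performed on the buffered block before any row or column is passed downstream. The following Python is a minimal sketch; the function name transform_block_with_bypass and the `transform` callable passed to it are hypothetical.

```python
def transform_block_with_bypass(block, transform):
    """Return (output_block, bypassed). The whole block is examined first,
    which corresponds to buffering every row before releasing the first one."""
    if all(v == 0 for row in block for v in row):
        # Predetermined output (zeroes); the transform is never invoked.
        return [[0] * len(row) for row in block], True
    return transform(block), False
```

The `transform` argument stands in for the downstream engine; for an all-zero block it is not called at all.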


Whether or not zero detection logic 130 of preprocessing unit 120 is configured to determine whether an entire block of data consists entirely of zero values as described in the previous paragraph, logic 130 is configured to perform zero detection on the data values of each individual row (or column) asserted to logic 130 from multiplexer 129 to determine whether all data values of such a row (or column) are zero values (i.e., it performs step 24 of FIG. 4). When each such row (or column) consists of four data values, logic 130 determines whether the row (column) consists of four zero bits. If logic 130 determines that a row (or column) consists entirely of zero bits, it asserts appropriate control bits that cause transformation operations that would otherwise be performed subsequently by engine 122 on the row (column) to be bypassed, and typically also causes predetermined values (e.g., zeroes) to be output from multiplexer 139 of unit 124 in response to the row (column) without performance of actual transformation operations by one or both of engine 122 and unit 124 on the values of the row (column).


In some implementations, logic 130 determines whether the first data value of each individual row (or column) of data values asserted to it from multiplexer 129 is a non-zero value and all other values of the row (or column) are zero values. In response to identifying a row (column) consisting of zero values except for an initial non-zero value, such an implementation of logic 130 asserts appropriate control bits that cause transformation operations that would otherwise be performed subsequently by engine 122 on the row (column) to be bypassed, and typically also causes predetermined values (e.g., zeroes) to be output from multiplexer 139 of unit 124 in response to the row (column) without performance of actual transformation operations by one or both of engine 122 and unit 124 on the values of the row (column).
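The row-level determinations made by logic 130 (an all-zero row, or a row that is zero except for its first value) can be sketched as a simple classifier. This Python is illustrative only; the function name classify_row and the returned labels are hypothetical, and the sketch deliberately says nothing about which values are output in each case.

```python
def classify_row(row):
    """Classify a row (or column) for the purpose of bypass decisions."""
    if all(v == 0 for v in row):
        return "all_zero"
    if row[0] != 0 and all(v == 0 for v in row[1:]):
        return "first_only"  # zero except for an initial non-zero value
    return "general"
```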


Zero detection logic 132 of engine 122 is configured to perform zero detection on a first subset (e.g., the first half, which are the first two data values) of each row (or column) asserted to engine 122 from unit 120 to determine whether all the data values of such first subset are zero values (e.g., it performs step 26 of FIG. 4). If logic 132 determines that the first subset (e.g., half row or half column) consists entirely of zero bits, it asserts appropriate control bits that cause transformation operations that would otherwise be performed subsequently by engine 122 on the first subset to be bypassed, and typically causes predetermined values (e.g., zeroes) to be output from multiplexer 140 of engine 122 in response to the first subset without performance of actual transformation operations by engine 122 on the values of the first subset. If logic 132 determines that the first subset (e.g., the first half) of the current row (column) includes at least one non-zero value, zero detection logic units 135 and 136 perform zero detection on the first and second values of the first subset.


If logic 135 determines that a first data value of the first subset of the current row (column) is a zero value, it asserts a control bit to multiplication circuit 141 (to whose first input the transform constant C0 is asserted, and to whose second input the zero data value of the current row or column is asserted) to cause circuit 141 to perform a multiplication operation on the value asserted to its second input without updating the previous value asserted to its first input, to avoid consuming power that would otherwise be consumed to toggle the first input to an updated value of the constant C0.


Similarly, if logic 136 determines that a second data value of the first subset of the current row (column) is a zero value, it asserts a control bit to multiplication circuit 142 (to whose first input the transform constant C1 is asserted, and to whose second input such zero data value of the current row or column is asserted) to cause circuit 142 to perform a multiplication operation on the value asserted to its second input without updating the previous value asserted to its first input, to avoid consuming power that would otherwise be consumed to toggle the first input to an updated value of the constant C1.


Zero detection logic 134 of engine 122 is configured to perform zero detection on a second subset (e.g., the second half, which are the second two data values) of each row (or column) asserted to engine 122 from unit 120 to determine whether all the data values of such second subset are zero values (i.e., it performs step 40 of FIG. 4). If logic 134 determines that the second subset consists entirely of zero bits, it asserts appropriate control bits that cause transformation operations that would otherwise be performed subsequently by engine 122 on the second subset (e.g., half row or half column) to be bypassed, and typically causes predetermined values (e.g., zeroes) to be output from multiplexer 141 of engine 122 in response to the second subset without performance of actual transformation operations by engine 122 on the values of the second subset. If logic 134 determines that the second subset of the current row (column) includes at least one non-zero value, zero detection logic units 137 and 138 perform zero detection on the first and second values of the second subset.


If logic 137 determines that a first data value of the second subset of the current row (column) is a zero value, it asserts a control bit to multiplication circuit 143 (to whose first input the transform constant C2 is asserted, and to whose second input the zero data value of the current row or column is asserted) to cause circuit 143 to perform a multiplication operation on the value asserted to its second input without updating the previous value asserted to its first input, to avoid consuming power that would otherwise be consumed to toggle the first input to an updated value of the constant C2.


Similarly, if logic 138 determines that a second data value of the second subset of the current row (column) is a zero value, it asserts a control bit to multiplication circuit 144 (to whose first input the transform constant C3 is asserted, and to whose second input such zero data value of the current row or column is asserted) to cause circuit 144 to perform a multiplication operation on the value asserted to its second input without updating the previous value asserted to its first input, to avoid consuming power that would otherwise be consumed to toggle the first input to an updated value of the constant C3.
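The control decisions made by logic 132, 134, and 135-138 for one four-value row (or column) can be collected into a single illustrative function. The following Python is a sketch, not a description of the actual control bits of FIG. 6; the function name control_bits and the dictionary keys are hypothetical.

```python
def control_bits(row):
    """Derive bypass and constant-hold decisions for a four-value row."""
    i0, i1, i2, i3 = row
    bypass_first = (i0 == 0 and i1 == 0)    # role of logic 132
    bypass_second = (i2 == 0 and i3 == 0)   # role of logic 134

    # hold[k] is True when the constant input of the k-th multiplier
    # (circuits 141-144) should keep its previous value.
    hold = [False, False, False, False]
    if not bypass_first:
        hold[0] = (i0 == 0)   # role of logic 135
        hold[1] = (i1 == 0)   # role of logic 136
    if not bypass_second:
        hold[2] = (i2 == 0)   # role of logic 137
        hold[3] = (i3 == 0)   # role of logic 138
    return {
        "bypass_first_half": bypass_first,
        "bypass_second_half": bypass_second,
        "hold_constant": hold,
    }
```

For the row [0, 5, 0, 0], the sketch bypasses the second half entirely and holds only the constant of the first multiplier, since the first half contains one zero value and one non-zero value.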


Each of multiplexers 129, 140, 141, 150, 151, 152, and 153 of FIG. 6 is controlled during operation to pass through to its output one of the values being asserted to its inputs during the current cycle of pipelined operation of the FIG. 6 system. Each of circuits 156 and 157 of FIG. 6 is an addition circuit that is controlled during operation to assert at its output the sum of the values being asserted to its inputs during the current operating cycle of the FIG. 6 system. Circuits 154 and 155 of FIG. 6 are delay circuits (implemented as flip-flops) having latency appropriate for the specific transform being performed by the FIG. 6 system.


Some alternative embodiments of the invention consist of or include circuitry identical to FIG. 6, except in that logic circuitry 132, 134, 135, 136, 137, and 138 of FIG. 6 is omitted, and the functions of the omitted circuitry are performed by a modified version of logic circuitry 130 of FIG. 6. Specifically, such a modified version of logic 130 would typically be configured to determine whether an entire block of data to be transformed consists (i.e., all four rows or columns of a 4×4 block of data to be transformed consist) entirely of zero values, whether each row (or column) of data of a block to be transformed consists entirely of zero values, whether each half of each row (or column) of data of a block to be transformed consists entirely of zero values, and whether each data value of each half row (or column) of data of a block to be transformed is a zero value, and asserts (to other elements of the system) in response to such determinations control signals of the types described above that are asserted by logic circuitry 130, 132, 134, 135, 136, 137, and 138 of FIG. 6.


The system of FIG. 6 can be a subsystem of a portable media player or another portable data processing system (or a data processing system not intended to be portable), e.g., one having video decoding capability. Such a media player or other processing system can and typically would include additional elements that are not shown in FIG. 6.


It should be understood that in order to implement various embodiments of the invention to perform any of many different 2D transforms on blocks of data having any of many different formats, variations on the specific steps shown and described with reference to FIG. 4 and on the specific structure shown and described with reference to FIGS. 5 and 6 would be implemented.


It should also be understood that while some embodiments of the present invention are illustrated and described herein, the invention is defined by the claims and is not to be limited to the specific embodiments described and shown.

Claims
  • 1. A system configured to perform a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes a row transform and a column transform, said system including: row transform circuitry configured to generate a block of partially transformed data in response to each said input data block, by performing the row transform on each row of the input data block; and column transform circuitry configured to perform the column transform on each column of the block of partially transformed data, wherein the row transform circuitry is configured to determine, when performing the row transform on each said row, whether each of different subsets of the data values comprising a partition of the row includes at least one zero value, whether each of different subsets of a first one of the subsets of the partition of the row includes at least one zero value, and whether each of different subsets of a second one of the subsets of the partition of the row includes at least one zero value, and the row transform circuitry is configured to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 2. The system of claim 1, wherein the row transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 3. The system of claim 1, wherein the row transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 4. The system of claim 3, wherein the row transform circuitry includes at least one circuit having an input set, and the row transform circuitry is configured to perform the at least one transformation operation in the reduced-power manner by causing the circuit to operate without updating at least one input in said input set during the transformation operation on said one of the data values that is determined to be a zero value.
  • 5. The system of claim 1, wherein the 2D transform is an inverse discrete cosine transform.
  • 6. The system of claim 1, also including: a buffer coupled to the row transform circuitry and to the column transform circuitry, wherein the row transform circuitry is configured to store the block of partially transformed data in the buffer and the column transform circuitry is configured to retrieve columns of the block of partially transformed data from the buffer.
  • 7. The system of claim 6, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a first partition of the column includes at least one zero value, whether each of different subsets of a first one of the subsets of the first partition of the column includes at least one zero value, and whether each of different subsets of a second one of the subsets of the first partition of the column includes at least one zero value, and the column transform circuitry is configured to bypass at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 8. The system of claim 6, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a first partition of the column includes at least one zero value, whether each of different subsets of a first one of the subsets of the first partition of the column includes at least one zero value, and whether each of different subsets of a second one of the subsets of the first partition of the column includes at least one zero value, and the column transform circuitry is configured to perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 9. The system of claim 8, wherein the column transform circuitry includes at least one circuit having an input set, and the column transform circuitry is configured to perform the at least one transformation operation in the reduced-power manner by causing the circuit to operate without updating at least one input in said input set during the transformation operation on said one of the data values that is determined to be a zero value.
  • 10. The system of claim 1, wherein each said row comprises values x_i, where i is an integer in the range 0≦i≦N−1, and N is an even integer, the partition of the row includes a first subset of data values and a second subset of data values distinct from the first subset, a partition of the first subset includes a third subset of data values and a fourth subset distinct from the third subset, and a partition of the second subset includes a fifth subset of data values and a sixth subset distinct from the fifth subset, and wherein the row transform circuitry is configured to determine when performing the row transform on each said row whether the first subset consists entirely of zero values and whether the second subset consists entirely of zero values, and upon determining that the first subset consists entirely of zero values to bypass at least one said transformation operation on the data values in the first subset, and upon determining that the second subset consists entirely of zero values, to bypass at least one said transformation operation on the data values in the second subset.
  • 11. The system of claim 10, wherein the row transform circuitry is configured to determine, upon determining that each of the first subset and the second subset includes at least one non-zero value when performing the row transform on each said row, whether each of the third subset, the fourth subset, the fifth subset, and the sixth subset consists entirely of zero values.
  • 12. The system of claim 11, wherein the third subset consists of even values of the first subset, the fourth subset consists of odd values of the first subset, the fifth subset consists of even values of the second subset, and the sixth subset consists of odd values of the second subset.
  • 13. The system of claim 1, wherein the row transform circuitry is configured to perform a reordering operation when performing the row transform on each said row, wherein the reordering operation generates reordered data values by reordering data values of the row, and the row transform circuitry is configured to determine whether a first subset of the reordered data includes at least one zero value and whether a second subset of the reordered data includes at least one zero value, wherein the second subset of the reordered data is distinct from the first subset of said reordered data.
  • 14. A system configured to perform a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes a row transform and a column transform, said system including: column transform circuitry configured to generate a block of partially transformed data in response to each said input data block, by performing the column transform on each column of the input data block; and row transform circuitry configured to perform the row transform on each row of the block of partially transformed data, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a partition of the column includes at least one zero value, whether each of different subsets of a first one of the subsets of the partition of the column includes at least one zero value, and whether each of different subsets of a second one of the subsets of the partition of the column includes at least one zero value, and the column transform circuitry is configured to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 15. The system of claim 14, wherein the column transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 16. The system of claim 14, wherein the column transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 17. The system of claim 16, wherein the column transform circuitry includes at least one circuit having an input set, and the column transform circuitry is configured to perform the at least one transformation operation in the reduced-power manner by causing the circuit to operate without updating at least one input in said input set during the transformation operation on said one of the data values that is determined to be a zero value.
  • 18. The system of claim 14, wherein the 2D transform is an inverse discrete cosine transform.
  • 19. The system of claim 14, also including: a buffer coupled to the row transform circuitry and to the column transform circuitry, wherein the column transform circuitry is configured to store the block of partially transformed data in the buffer and the row transform circuitry is configured to retrieve columns of the block of partially transformed data from the buffer.
  • 20. The system of claim 14, wherein each said column comprises values x_i, where i is an integer in the range 0≦i≦N−1, and N is an even integer, the partition of the column includes a first subset of data values and a second subset of data values distinct from the first subset, a partition of the first subset includes a third subset of data values and a fourth subset distinct from the third subset, and a partition of the second subset includes a fifth subset of data values and a sixth subset distinct from the fifth subset, and wherein the column transform circuitry is configured to determine when performing the column transform on each said column whether the first subset consists entirely of zero values and whether the second subset consists entirely of zero values, and upon determining that the first subset consists entirely of zero values to bypass at least one said transformation operation on the data values in the first subset, and upon determining that the second subset consists entirely of zero values, to bypass at least one said transformation operation on the data values in the second subset.
  • 21. The system of claim 20, wherein the column transform circuitry is configured to determine, upon determining that each of the first subset and the second subset includes at least one non-zero value when performing the column transform on each said column, whether each of the third subset, the fourth subset, the fifth subset, and the sixth subset consists entirely of zero values.
  • 22. The system of claim 21, wherein the third subset consists of even values of the first subset, the fourth subset consists of odd values of the first subset, the fifth subset consists of even values of the second subset, and the sixth subset consists of odd values of the second subset.
  • 23. The system of claim 14, wherein the column transform circuitry is configured to perform a reordering operation when performing the column transform on each said column, wherein the reordering operation generates reordered data values by reordering data values of the column, and the column transform circuitry is configured to determine whether a first subset of the reordered data includes at least one zero value and whether a second subset of the reordered data includes at least one zero value, wherein the second subset of the reordered data is distinct from the first subset of said reordered data.
  • 24. A method for performing a 2D transform on a sequence of input data blocks, wherein each of the blocks comprises rows and columns of input data values, and the 2D transform includes a row transform and a column transform, said method including the steps of: (a) performing the row transform on each row of one of the input data blocks, including by performing transformation operations on input data values of each said row, to generate a partially transformed data block; and (b) performing the column transform on each column of the partially transformed data block, including by performing additional transformation operations on data values of each said column, wherein step (a) includes the steps of: (c) determining whether each of different subsets of the data values comprising a partition of each said row includes at least one zero value, determining whether each of different subsets of a first subset of the partition includes at least one zero value, and determining whether each of different subsets of at least one other subset of the partition includes at least one zero value; and (d) when performing the row transform on each said row that is determined to include at least one zero value and at least one non-zero value, bypassing or performing in a reduced-power manner at least one of the transformation operations on at least one of the input data values that is determined to be a zero value, where said one of the transformation operations would otherwise be performed in a manner consuming full power if the zero value were a non-zero value.
  • 25. The method of claim 24, wherein step (a) includes the step of operating a multiplication circuit having a first input and a second input to perform a multiplication operation in which a first value asserted to the first input is multiplied by a second value asserted to the second input, and wherein step (d) includes the step of performing the multiplication operation for said at least one of the input data values that is determined to be a zero value in a reduced-power manner, by preventing toggling of at least one of the first input and the second input.
  • 26. The method of claim 24, wherein step (d) includes the step of, when performing the row transform on each said row that is determined to include at least one zero value and at least one non-zero value, bypassing said at least one of the transformation operations on said at least one of the input data values that is determined to be a zero value.
  • 27. The method of claim 24, wherein step (b) includes the steps of: determining whether each of different subsets of the data values comprising a first partition of each said column includes at least one zero value, determining whether each of different subsets of a first subset of the first partition includes at least one zero value, and determining whether each of different subsets of at least one other subset of the first partition includes at least one zero value; and when performing the column transform on each said column that is determined to include at least one zero value and at least one non-zero value, bypassing at least one of the additional transformation operations on at least one of the data values that is determined to be a zero value, where said one of the additional transformation operations would otherwise be performed in a manner consuming full power if the zero value were a non-zero value.
  • 28. The method of claim 24, wherein step (b) includes the steps of: determining whether each of different subsets of the data values comprising a first partition of each said column includes at least one zero value, determining whether each of different subsets of a first subset of the first partition includes at least one zero value, and determining whether each of different subsets of at least one other subset of the first partition includes at least one zero value; and when performing the column transform on each said column that is determined to include at least one zero value and at least one non-zero value, performing in a reduced-power manner at least one of the additional transformation operations on at least one of the data values that is determined to be a zero value, where said one of the additional transformation operations would otherwise be performed in a manner consuming full power if the zero value were a non-zero value.
  • 29. The method of claim 24, wherein each said block is a block of DCT coefficients that have been generated by performing a DCT on video data and the 2D transform is an inverse discrete cosine transform.
  • 30. The method of claim 24, wherein at least substantially all of the blocks in the sequence are blocks of sparse data.
  • 31. A method for performing a 2D transform on a sequence of input data blocks, wherein each of the blocks comprises rows and columns of input data values, and the 2D transform includes a row transform and a column transform, said method including the steps of: (a) performing the row transform on each row of one of the input data blocks, including by performing transformation operations on input data values of each said row, to generate a partially transformed data block; and (b) performing the column transform on each column of the partially transformed data block, including by performing additional transformation operations on data values of each said column, wherein step (b) includes the steps of: (c) determining whether each of different subsets of the data values comprising a partition of each said column includes at least one zero value, determining whether each of different subsets of a first subset of the partition includes at least one zero value, and determining whether each of different subsets of at least one other subset of the partition includes at least one zero value; and (d) when performing the column transform on each said column that is determined to include at least one zero value and at least one non-zero value, bypassing or performing in a reduced-power manner at least one of the additional transformation operations on at least one of the data values that is determined to be a zero value, where said one of the additional transformation operations would otherwise be performed in a manner consuming full power if the zero value were a non-zero value.
  • 32. The method of claim 31, wherein step (b) includes the step of operating a multiplication circuit having a first input and a second input to perform a multiplication operation in which a first value asserted to the first input is multiplied by a second value asserted to the second input, and wherein step (d) includes the step of performing the multiplication operation for said at least one of the input data values that is determined to be a zero value in a reduced-power manner, by preventing toggling of at least one of the first input and the second input.
  • 33. The method of claim 31, wherein step (d) includes the step of, when performing the column transform on each said column that is determined to include at least one zero value and at least one non-zero value, bypassing said at least one of the additional transformation operations on said at least one of the data values that is determined to be a zero value.
  • 34. The method of claim 31, wherein each said block is a block of DCT coefficients that have been generated by performing a DCT on video data and the 2D transform is an inverse discrete cosine transform.
  • 35. The method of claim 31, wherein at least substantially all of the blocks in the sequence are blocks of sparse data.
  • 36. A video processing system, including a transform engine configured to perform a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes a row transform and a column transform, said transform engine including: row transform circuitry configured to generate a block of partially transformed data in response to each said input data block, by performing the row transform on each row of the input data block; and column transform circuitry configured to perform the column transform on each column of the block of partially transformed data, wherein the row transform circuitry is configured to determine, when performing the row transform on each said row, whether each of different subsets of the data values comprising a partition of the row includes at least one zero value, whether each of different subsets of a first subset of the partition of the row includes at least one zero value, and whether each of different subsets of at least one other subset of the partition of the row includes at least one zero value, and the row transform circuitry is configured to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 37. The system of claim 36, wherein the row transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 38. The system of claim 36, wherein the row transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 39. The system of claim 38, wherein the row transform circuitry includes at least one circuit having an input set, and the row transform circuitry is configured to perform the at least one transformation operation in the reduced-power manner by causing the circuit to operate without updating at least one input in said input set during the transformation operation on said one of the data values that is determined to be a zero value.
  • 40. The system of claim 36, wherein the 2D transform is an inverse discrete cosine transform.
  • 41. The system of claim 36, wherein the transform engine also includes: a buffer coupled to the row transform circuitry and to the column transform circuitry, wherein the row transform circuitry is configured to store the block of partially transformed data in the buffer and the column transform circuitry is configured to retrieve columns of the block of partially transformed data from the buffer.
  • 42. The system of claim 41, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a first partition of the column includes at least one zero value, whether each of different subsets of a first subset of the first partition of the column includes at least one zero value, and whether each of different subsets of at least one other subset of the first partition of the column includes at least one zero value, and the column transform circuitry is configured to bypass at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 43. The system of claim 42, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a first partition of the column includes at least one zero value, whether each of different subsets of a first subset of the first partition of the column includes at least one zero value, and whether each of different subsets of at least one other subset of the first partition of the column includes at least one zero value, and the column transform circuitry is configured to perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 44. A video processing system, including a transform engine configured to perform a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes a row transform and a column transform, said transform engine including: column transform circuitry configured to generate a block of partially transformed data in response to each said input data block, by performing the column transform on each column of the input data block; and row transform circuitry configured to perform the row transform on each row of the block of partially transformed data, wherein the column transform circuitry is configured to determine, when performing the column transform on each said column, whether each of different subsets of the data values comprising a partition of the column includes at least one zero value, whether each of different subsets of a first subset of the partition of the column includes at least one zero value, and whether each of different subsets of at least one other subset of the partition of the column includes at least one zero value, and the column transform circuitry is configured to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined to be a zero value.
  • 45. The system of claim 44, wherein the column transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 46. The system of claim 44, wherein the column transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined to be a zero value.
  • 47. The system of claim 46, wherein the column transform circuitry includes at least one circuit having an input set, and the column transform circuitry is configured to perform the at least one transformation operation in the reduced-power manner by causing the circuit to operate without updating at least one input in said input set during the transformation operation on said one of the data values that is determined to be a zero value.
  • 48. The system of claim 44, wherein the 2D transform is an inverse discrete cosine transform.
  • 49. The system of claim 44, also including: a buffer coupled to the row transform circuitry and to the column transform circuitry, wherein the column transform circuitry is configured to store the block of partially transformed data in the buffer and the row transform circuitry is configured to retrieve columns of the block of partially transformed data from the buffer.
  • 50. A transform engine configured to perform a row transform of a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes the row transform and a column transform, said transform engine including: first circuitry coupled to receive each row of each said input data block and configured to determine whether each of different subsets of the data values comprising a partition of each said row includes at least one zero value, whether each of different subsets of a first one of the subsets of the partition of the row includes at least one zero value, and whether each of different subsets of at least one other one of the subsets of the partition of the row includes at least one zero value; and transform circuitry coupled to the first circuitry and configured to perform the row transform on the data values of each said row, wherein the transform circuitry is configured to operate under control of the first circuitry to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined by the first circuitry to be a zero value.
  • 51. The transform engine of claim 50, wherein the transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined by the first circuitry to be a zero value.
  • 52. The transform engine of claim 50, wherein the transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined by the first circuitry to be a zero value.
  • 53. The transform engine of claim 50, wherein the 2D transform is an inverse discrete cosine transform.
  • 54. A transform engine configured to perform a column transform of a 2D transform on each input data block of a sequence of input data blocks, wherein each said input data block comprises rows and columns of input data values and the 2D transform includes the column transform and a row transform, said transform engine including: first circuitry coupled to receive each column of each said input data block and configured to determine whether each of different subsets of the data values comprising a partition of each said column includes at least one zero value, whether each of different subsets of a first one of the subsets of the partition of the column includes at least one zero value, and whether each of different subsets of at least one other one of the subsets of the partition of the column includes at least one zero value; and transform circuitry coupled to the first circuitry and configured to perform the column transform on the data values of each said column, wherein the transform circuitry is configured to operate under control of the first circuitry to bypass or perform in a reduced-power manner at least one transformation operation on at least one of the data values that is determined by the first circuitry to be a zero value.
  • 55. The transform engine of claim 54, wherein the transform circuitry is configured to bypass the at least one transformation operation on said at least one of the data values that is determined by the first circuitry to be a zero value.
  • 56. The transform engine of claim 54, wherein the transform circuitry is configured to perform in a reduced-power manner the at least one transformation operation on said at least one of the data values that is determined by the first circuitry to be a zero value.
  • 57. The transform engine of claim 54, wherein the 2D transform is an inverse discrete cosine transform.