Video applications can be computationally expensive. Designers may attempt to compress video data to reduce the associated workload. For example, designers may use compression algorithms that take advantage of the high degree of correlation between successive video frames. One such technique is motion estimation. With motion estimation, a reference image (e.g., a previously encoded frame) is sub-divided into macroblocks of 16×16 pixels. The encoding algorithm attempts to match each macroblock to another macroblock within a search window in another image (e.g., a current frame). When the best match is found, the motion vector that captures the movement of the macroblock from the reference frame to the current frame is encoded and transmitted in place of the actual block.
A common method for determining whether two blocks match is the sum of absolute differences (SAD). For every search step within the search window of a macroblock in the reference frame, the SAD over the 256 pixels of the block, $\sum_{i=1}^{256} |a_i - b_i|$, is computed, where $a_i$ and $b_i$ are corresponding pixels of the two blocks. The search may continue until the best match (i.e., the lowest SAD) is obtained. This operation may repeat for every macroblock in the reference frame. For high-resolution video (e.g., 1920×1080 pixels at 30 frames/s), each frame contains 1920 × 1080 / 256 = 8,100 macroblocks, so the method requires computing a motion vector for 243,000 macroblocks per second. Consequently, motion estimation is computationally expensive.
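The block-matching loop described above can be sketched in Python. This is a minimal behavioral sketch, not the circuit described in this document: frames are assumed to be lists of rows of integer pixel values, and the helper names (`sad`, `get_block`, `full_search`, `macroblocks_per_second`) and the ±7 search radius are hypothetical choices for illustration. Following the text, a block from the reference frame is compared against displaced candidates in the current frame.

```python
MB_SIZE = 16  # macroblock edge length in pixels, per the text


def sad(block_a, block_b):
    """Sum of absolute differences |a_i - b_i| over two equal-size 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size=MB_SIZE):
    """Return the size x size block whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(ref, cur, top, left, size=MB_SIZE, radius=7):
    """Exhaustive search (hypothetical radius) for the best-matching block.

    Compares the reference-frame block at (top, left) with every block in the
    current frame displaced by (dy, dx), |dy|, |dx| <= radius, and returns
    ((dy, dx), sad) for the lowest SAD found. Ties keep the first candidate.
    """
    block = get_block(ref, top, left, size)
    height, width = len(cur), len(cur[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate would fall outside the current frame
            cost = sad(block, get_block(cur, y, x, size))
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


def macroblocks_per_second(width, height, fps, size=MB_SIZE):
    """Macroblocks to process per second for a given resolution and frame rate."""
    return (width * height) // (size * size) * fps


print(macroblocks_per_second(1920, 1080, 30))  # 243000
```

With the assumed radius of 7, each macroblock needs up to 15 × 15 = 225 SAD evaluations of 256 absolute differences each, which is what turns the macroblock rate into a heavy workload.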
The accompanying drawings, incorporated in and constituting a part of this specification, illustrate one or more implementations consistent with the principles of the invention and, together with the description of the invention, explain such implementations. The drawings are not necessarily to scale, the emphasis instead being placed upon illustrating the principles of the invention. In the drawings:
The following description refers to the accompanying drawings. Among the various drawings the same reference numbers may be used to identify the same or similar elements. While the following description provides a thorough understanding of the various aspects of the claimed invention by setting forth specific details such as particular structures, architectures, interfaces, techniques, etc., such details are provided for purposes of explanation and should not be viewed as limiting. Moreover, those of skill in the art will, in light of the present disclosure, appreciate that various aspects of the claimed invention may be practiced in other examples or implementations that depart from these specific details. At certain junctures in the following disclosure, descriptions of well-known devices, circuits, and methods have been omitted to avoid clouding the description of the present invention with unnecessary detail.
One method for computing the SAD of, for example, 8 pairs of operands (e.g., pairs of pixels from different video frames) uses a subtractor circuit to compute the difference and its 1's complement for each pair of operands. The sign of the difference, which is determined by the carry out of the subtractor circuit, selects between the calculated difference and its complement so that a non-negative number appears at the output of a 2:1 multiplexor. A half adder (HA) stage may be used to sum the extra 1's needed to complete the 2's complement when both differences from a pair of subtractors are negative. The sums after the HA stage are the SADs for 2 input pairs in carry-save format. An adder tree then sums all such SADs to generate the SAD for the 8 input pairs.
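The subtract-and-select scheme above can be modeled behaviorally in Python. This is a sketch under stated assumptions: operands are 8-bit unsigned integers, the function names are hypothetical, the deferred +1 of each negative difference is returned as a separate correction bit, and the bit-level carry-save vectors and the adder tree are not modeled (their outputs are simply added as integers).

```python
WIDTH = 8                  # operand width in bits (8-bit pixels, per the text)
MASK = (1 << WIDTH) - 1


def subtract_and_select(a, b):
    """One subtractor plus 2:1 multiplexor for an unsigned pair (a, b).

    Returns (mux_out, correction) with |a - b| == mux_out + correction.
    """
    # a - b is formed as a + ~b + 1 on WIDTH + 1 bits.
    total = a + (~b & MASK) + 1
    carry_out = total >> WIDTH        # 1 when a >= b, i.e. the difference is non-negative
    diff = total & MASK
    if carry_out:
        return diff, 0                # mux selects the difference itself
    # Negative difference: the mux selects its 1's complement, and the +1 that
    # completes the 2's complement is deferred to the half-adder stage.
    return ~diff & MASK, 1


def sad_two_pairs(p0, p1):
    """SAD of two operand pairs, with the two deferred 1's added by a half adder."""
    m0, c0 = subtract_and_select(*p0)
    m1, c1 = subtract_and_select(*p1)
    ha_sum, ha_carry = c0 ^ c1, c0 & c1   # half adder: c0 + c1 == ha_sum + 2 * ha_carry
    return m0 + m1 + ha_sum + 2 * ha_carry


def sad_eight_pairs(pairs):
    """SAD of 8 pairs; integer addition stands in for the adder tree."""
    assert len(pairs) == 8
    return sum(sad_two_pairs(pairs[i], pairs[i + 1]) for i in range(0, 8, 2))


print(sad_eight_pairs([(10, 3), (3, 10), (255, 0), (0, 255),
                       (7, 7), (100, 200), (200, 100), (1, 2)]))  # 725
```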
One embodiment of the invention incorporates redundant binary (RB) arithmetic to improve signal processing methods such as, for example, video processing. RB uses a pair of bits {a+, a−} to represent a digit from the set {−1, 0, 1} via the operation (a+) − (a−). In contrast, a pair of bits {a, b} in the conventional binary system represents the set {0, 1, 2} via the operation a + b. In
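The RB encoding can be illustrated with a short Python sketch. Here an RB number is stored as two lists of bits, least significant digit first, one for the a+ rails and one for the a− rails; the function names are hypothetical. The sketch also shows two standard RB properties: the difference a − b of two unsigned binary numbers is already a valid RB number (a on the + rails, b on the − rails, with no borrow propagation), and an RB number is negated by swapping its two rails.

```python
def to_bits(value, width):
    """Unsigned integer -> list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def rb_value(pos, neg):
    """Integer value of an RB number: digit i is (pos[i] - neg[i]) in {-1, 0, 1}."""
    return sum((p - n) * (1 << i) for i, (p, n) in enumerate(zip(pos, neg)))


def rb_negate(pos, neg):
    """Negate an RB number by swapping its + and - rails (no carries needed)."""
    return neg, pos


# a - b as an RB number: a supplies the + rails, b the - rails.
a, b = 200, 57
diff = (to_bits(a, 8), to_bits(b, 8))
print(rb_value(*diff))              # 143
print(rb_value(*rb_negate(*diff)))  # -143
```

The representation is redundant: for example, the value 1 can be written as digits (+1, 0) or as (−1, +1), since −1 + 2 = 1.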
Conventionally, the absolute difference of each pair of numbers may first be computed, and the absolute differences of all pairs may then be summed using an adder tree. However, as seen in one embodiment of the invention shown in
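The conventional flow (absolute differences first, then an adder tree) can be sketched as follows. The sketch uses a plain binary tree of two-input adders purely to show the structure and its depth; hardware adder trees commonly use carry-save compressors instead, and the function names are hypothetical.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, like a binary tree of 2-input adders.

    Returns (total, levels), where levels is the number of adder stages on the
    longest path, i.e. ceil(log2(n)) for n >= 1 inputs.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd input passes straight to the next level
        level = nxt
        levels += 1
    return (level[0] if level else 0), levels


def sad_conventional(pairs):
    """Absolute difference of every pair first, then an adder tree over the results."""
    return adder_tree([abs(a - b) for a, b in pairs])


print(sad_conventional([(10, 3), (3, 10), (255, 0), (0, 255),
                        (7, 7), (100, 200), (200, 100), (1, 2)]))  # (725, 3)
```

For 8 input pairs the tree has 3 levels, and every level waits on the absolute-difference stage in front of it.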
Negating an RB number by reversing the order of its signals in 4:2 compressor 140 may allow a higher degree of speculation (i.e., 4 possible outputs) with only a 2× increase in the number of compressors and with much lower circuit delay. Though the hardware cost for speculation may increase by 2×, the impact on total power may be much smaller since the RB 4:2 compressor circuits 140, 141, 142, 143, 144 used in one embodiment of the invention (
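One way to read the speculation described above is sketched below in Python; it is an interpretation offered for illustration, not the circuit of the figures. For two pairs, the RB differences d0 = a0 − b0 and d1 = a1 − b1 need only two additions, d0 + d1 and d0 − d1; the other two sign combinations, −(d0 + d1) and −(d0 − d1), follow by swapping rails, which is consistent with four possible outputs for a 2× increase in compressors. A 4:1 selection then picks the candidate in which both terms are non-negative. Assumptions: RB numbers are modeled as (pos, neg) integer pairs whose value is pos − neg, `rb_add` stands in for an RB 4:2 compressor at the value level only (it does not reproduce the carry-free digit logic), the signs are obtained by plain comparison, and all names are hypothetical.

```python
def rb_add(x, y):
    """Value-level stand-in for an RB 4:2 compressor: (x+ + y+) - (x- + y-)."""
    return x[0] + y[0], x[1] + y[1]


def rb_negate(x):
    """Negate an RB number by swapping its + and - rails."""
    return x[1], x[0]


def sad2_speculative(pair0, pair1):
    """Hypothetical speculative SAD of two (a, b) pairs in RB form.

    Returns an RB result (pos, neg) whose value pos - neg is
    |a0 - b0| + |a1 - b1|.
    """
    d0 = pair0                           # a0 on the + rail, b0 on the - rail: value a0 - b0
    d1 = pair1
    sum_dd = rb_add(d0, d1)              # compressor 1: d0 + d1
    diff_dd = rb_add(d0, rb_negate(d1))  # compressor 2: d0 - d1
    # Four speculative outputs keyed by (d0 negative, d1 negative); the last
    # two come from rail swaps rather than extra compressors.
    candidates = {
        (False, False): sum_dd,
        (False, True): diff_dd,
        (True, False): rb_negate(diff_dd),  # -(d0 - d1) = -d0 + d1
        (True, True): rb_negate(sum_dd),    # -(d0 + d1)
    }
    signs = (pair0[0] < pair0[1], pair1[0] < pair1[1])
    return candidates[signs]                # models the 4:1 multiplexor


pos, neg = sad2_speculative((10, 3), (100, 200))
print(pos - neg)  # 107
```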
The final conventional binary number is computed by performing the operation (d+) − (d−) on the final RB number. The subtractor 170 in the last stage of the RB-based SAD circuit of
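The final (d+) − (d−) conversion can be sketched as an ordinary two's-complement subtraction, d+ + ~d− + 1, where d+ and d− are read as the unsigned integers formed by the + and − rail bits. This is a minimal sketch; the function name and the 11-bit width are assumptions, chosen here because the SAD of 8 pairs of 8-bit pixels is at most 8 × 255 = 2040, which fits in 11 bits.

```python
SAD_WIDTH = 11  # assumed result width: 8 * 255 = 2040 < 2**11


def rb_to_binary(d_pos, d_neg, width=SAD_WIDTH):
    """Compute (d+) - (d-) the way a subtractor does: d+ + ~d- + 1, modulo 2**width.

    For a SAD the true result is non-negative and below 2**width, so the
    returned bit pattern equals the SAD itself.
    """
    mask = (1 << width) - 1
    return (d_pos + (~d_neg & mask) + 1) & mask


print(rb_to_binary(1500, 300))  # 1200
```

This single carry-propagating subtraction at the end replaces the carry propagation that a conventional design would pay inside every absolute-difference and adder stage.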
At a nominal power supply (e.g., 1.2 V), one embodiment of the RB SAD invention may be implemented with 8 SADs. This may achieve performance higher than the minimum required for real-time encoding at the highest HDTV resolution, assuming an exhaustive search in a search space of, for example, 15×15 pixels. The performance (or headroom) may improve substantially when common motion estimation algorithms such as a three-step search are applied, because such algorithms evaluate far fewer candidate positions. In one embodiment of the invention, the SAD circuitry may use 8-bit data (for pixels) and sum 8 pairs. However, in other embodiments of the invention, the SAD circuitry is easily scalable to other operand widths and numbers of summed pairs. Furthermore, power supplies of different voltages may be used in other embodiments of the invention.
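To see why a three-step search frees up headroom, compare candidate counts. If the 15×15 search space is read as 225 candidate displacements (±7 pixels in each direction, an assumption), an exhaustive search at 243,000 macroblocks per second needs 243,000 × 225 × 256 ≈ 1.4 × 10^10 absolute differences per second, while a three-step search with step sizes 4, 2 and 1 evaluates 9 + 8 + 8 = 25 candidates per macroblock, about 1.6 × 10^9 absolute differences per second. A minimal Python sketch of the three-step search follows; `cost(dy, dx)` is a hypothetical callback that would return the SAD of the candidate at displacement (dy, dx).

```python
def three_step_search(cost, step=4):
    """Three-step search over displacements within +/-(2 * step - 1).

    `cost(dy, dx)` returns the matching cost (for example, a SAD) of one
    candidate. Returns (best_displacement, best_cost, number_of_evaluations).
    """
    cache = {}

    def evaluate(d):
        # Each displacement is costed once; the center of each round is reused.
        if d not in cache:
            cache[d] = cost(*d)
        return cache[d]

    center = (0, 0)
    while step >= 1:
        candidates = [(center[0] + sy * step, center[1] + sx * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        center = min(candidates, key=evaluate)  # ties keep the first candidate
        step //= 2
    return center, cache[center], len(cache)


# Toy cost with its minimum at displacement (3, -5).
print(three_step_search(lambda dy, dx: (dy - 3) ** 2 + (dx + 5) ** 2))  # ((3, -5), 0, 25)
```

Unlike the exhaustive search, the three-step search is a heuristic: on cost surfaces with several local minima it can settle on a displacement that is not the global best match.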
Various embodiments of the invention have applications in media processing. Media applications are highly parallelizable workloads, and the energy efficiency (GOPS/Watt) of these workloads can be increased considerably by operating the circuits at ultra-low supply voltages and using parallelism to maintain the same throughput as operation at the nominal supply. To improve GOPS/Watt and to minimize power consumption under lower-throughput video constraints, embodiments of the inventive SAD circuit may allow robust operation at ultra-low and sub-threshold supply voltages.
In one embodiment of the invention, the 4:1 multiplexor 150 in the SAD2 stage 110 of
The overall SAD circuit delay penalty for using the flip-flop (
Certain embodiments of the invention may be directed to areas other than video processing or video compression. For example, various embodiments of the invention may be related to signal processing (e.g., sum of errors) or the general calculation of SADs using RB.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations that fall within the true spirit and scope of this present invention.
Number | Date | Country
---|---|---
20080181295 A1 | Jul 2008 | US