The present invention relates generally to digital video signal processing, and more specifically to devices for video coding.
Video coding standards that make use of several advanced video coding tools and techniques to provide high compression performance are well known in the art. In the past, standards such as MPEG-2, MPEG-4, and H.263 have been widely adopted. More recently, H.264 has been widely adopted as it offers better compression performance than these earlier video compression standards. At the core of all these video compression standards are the techniques of motion compensation and transform coding.
Motion compensation schemes assume that, for most sequences of video frames, the amount of change from one frame to the next is small. Thus, compression can be achieved by transmitting or storing information in a frame as a difference, or delta, from a previous frame, rather than as an independent image. In this way, only the changes between a new frame and a previous frame need to be captured. The frame used for comparison is called a reference frame. The frame that is being encoded is called a current frame.
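By way of illustration only, the delta idea described above can be sketched as follows. The function names `frame_delta` and `reconstruct` and the tiny sample frames are assumptions introduced for this sketch; frames are modeled as plain lists of rows of integer sample values.

```python
def frame_delta(current, reference):
    """Per-sample difference between the current frame and the reference frame."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]


def reconstruct(reference, delta):
    """Rebuild the current frame from the reference frame plus the stored delta."""
    return [[r + d for r, d in zip(ref_row, delta_row)]
            for ref_row, delta_row in zip(reference, delta)]


# Hypothetical 2x3 frames: only two samples change between frames.
reference = [[10, 10, 10], [20, 20, 20]]
current = [[10, 11, 10], [20, 20, 18]]
delta = frame_delta(current, reference)
```

Because most entries of `delta` are zero when little changes between frames, the delta is typically cheaper to code than the frame itself.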
The specific type of motion compensation scheme used by many video encoding standards, such as H.264 AVC, is called block motion compensation. Block motion compensation schemes typically decompose a frame into macroblocks where each macroblock contains 16×16 luminance values (Y) and two 8×8 blocks of chrominance values (Cb and Cr), although other block sizes are also used. These macroblocks are typically processed one at a time. The compression mechanism in a video encoder would attempt to find a macroblock in the reference frame that closely matches the current macroblock of the current frame (motion estimation), and the differences between these two blocks would be transformed and quantized. The transform of a macroblock converts the pixel values of the block from the spatial domain into a frequency domain for quantization. This transformation step may use a two-dimensional discrete cosine transform (DCT) or other transformation methods. The transformed residual macroblock data is then quantized and coded using variable-length coding.
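As a non-limiting sketch of the transform and quantization steps mentioned above, the following computes a direct floating-point two-dimensional DCT-II of a small residual block and applies uniform quantization. The function names, the 4×4 block size, and the step size of 8 are assumptions for illustration; practical encoders generally use fast or integer transforms rather than this direct form.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block (direct, unoptimized)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
                              * math.cos(math.pi * (2 * x + 1) * v / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide by a step size and round to an integer level."""
    return [[round(c / step) for c in row] for row in coeffs]


# Hypothetical flat residual block: all energy lands in the DC coefficient.
residual = [[10] * 4 for _ in range(4)]
levels = quantize(dct_2d(residual), step=8)
```

For a flat block such as this one, only the top-left (DC) level is non-zero after quantization, which is what makes the subsequent variable-length coding effective.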
During motion estimation, the video encoder would attempt to find a macroblock in the reference frame that best matches the current macroblock of the current frame by comparing the current macroblock against macroblocks in the reference frame. The best match is determined, for instance, by choosing the macroblock with the lowest SAD (Sum of Absolute Differences). In a typical implementation, in order to find the best match, a search area of the reference frame may be stepped through one pixel at a time. Thus, even for a small search area, many comparisons are necessary. A “search” herein refers to the process of determining a best match (e.g., a macroblock in a predetermined search area of a reference frame) for a current macroblock.
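The conventional one-pixel-at-a-time search described above may be sketched as follows. This is a minimal illustration, not an encoder implementation: the helper names `sad`, `get_block`, and `full_search` are hypothetical, the search area is taken to be the whole (small) reference array, and a 4×4 block is used in place of a 16×16 macroblock to keep the example short.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current_block, reference, size):
    """Step through the reference one pixel at a time; return (sad, top, left)."""
    best = None
    for top in range(len(reference) - size + 1):
        for left in range(len(reference[0]) - size + 1):
            cost = sad(current_block, get_block(reference, top, left, size))
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


# Synthetic 8x8 reference in which every sample value is distinct.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]
current_block = get_block(reference, 3, 2, 4)
match = full_search(current_block, reference, 4)
```

Even this 8×8 area requires 25 block comparisons for a single 4×4 block, which illustrates why the number of comparisons grows quickly with the search area.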
High-resolution frames have a large number of macroblocks and thus require a large number of searches per frame. Since each search requires comparison of the current macroblock against many reference macroblocks, which are stored in the frame memory, a large amount of macroblock data would have to be moved from the frame memory to the encoder per frame. If the encoder is designed for real-time encoding of video data, which may require numerous frames per second, an even higher memory throughput would be needed. Such a high memory throughput is difficult to achieve, particularly in a compact encoder design suitable for mobile devices where there may not be sufficient room for routing data paths and where power consumption is an important design constraint.
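To give a rough sense of scale, the arithmetic below estimates the reference data traffic for a naive search in which every candidate block is fetched from frame memory separately. All figures are assumptions chosen for illustration (a 1920×1088 luma frame, a search range of ±16 pixels in one-pixel steps, 8-bit samples, 30 frames per second), and the function name is hypothetical.

```python
def naive_reference_traffic(width, height, search_range, fps, block=16):
    """Bytes of luma reference data fetched per second if every candidate block
    is read from frame memory separately (no reuse between candidates)."""
    macroblocks = (width // block) * (height // block)
    candidates = (2 * search_range + 1) ** 2   # one-pixel steps in both directions
    bytes_per_candidate = block * block        # one byte per 8-bit luma sample
    return macroblocks * candidates * bytes_per_candidate * fps


# Assumed operating point, not taken from the specification.
bytes_per_second = naive_reference_traffic(1920, 1088, search_range=16, fps=30)
```

Under these assumed figures the estimate comes to roughly 68 gigabytes per second, which is the kind of throughput that is impractical for a compact, low-power design.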
Accordingly, what is desired is a method and system for motion estimation that is highly efficient in terms of memory throughput.
A method and system for motion estimation that is highly efficient in terms of memory throughput is provided. Sub-macroblocks are used as the basic unit for motion search. The comparison results are saved, and the saved results can be used to compute best matches for larger partition sizes.
According to an embodiment of the invention, a 16×16 macroblock is sub-divided into a group of four quadrants of 8×8 sub-macroblocks. Motion searches are accomplished by stepping the current macroblock through the macroblocks in a search area of the reference frame row-by-row with no overlap. Each quadrant of the reference macroblock is compared against all four quadrants of the current macroblock to produce four sub-macroblock Sum of Absolute Differences (SAD) values. The sub-macroblock SAD values are temporarily stored and selectively summed to produce one or more macroblock-level SAD values. When all the macroblocks of the search area have been scanned, the resulting SAD values are compared to determine the best match macroblock.
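The quadrant comparison summarized above can be sketched as follows. The sketch assumes, purely for illustration, that the current quadrants are labeled A, B, C, D and the reference quadrants W, X, Y, Z in the order top-left, top-right, bottom-left, bottom-right; the function names and the synthetic data are likewise hypothetical.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def quadrants(macroblock, labels):
    """Split a 16x16 macroblock into four 8x8 quadrants keyed by label.
    Assumed order: top-left, top-right, bottom-left, bottom-right."""
    half = len(macroblock) // 2
    top_left = [row[:half] for row in macroblock[:half]]
    top_right = [row[half:] for row in macroblock[:half]]
    bottom_left = [row[:half] for row in macroblock[half:]]
    bottom_right = [row[half:] for row in macroblock[half:]]
    return dict(zip(labels, (top_left, top_right, bottom_left, bottom_right)))


def cross_sads(current_mb, reference_mb):
    """Compare every reference quadrant against all four current quadrants."""
    cur = quadrants(current_mb, "ABCD")
    ref = quadrants(reference_mb, "WXYZ")
    return {(c, r): sad(cur[c], ref[r]) for c in cur for r in ref}


# Synthetic data: current quadrants hold the constants 1, 2, 3, 4; reference is zero.
current_mb = [[(1 if x < 8 else 2) if y < 8 else (3 if x < 8 else 4)
               for x in range(16)] for y in range(16)]
reference_mb = [[0] * 16 for _ in range(16)]
table = cross_sads(current_mb, reference_mb)
```

Each reference quadrant contributes four entries, so one fetched reference macroblock yields sixteen stored sub-macroblock SAD values in this sketch.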
The invention will now be described with reference to the accompanying drawings, which are provided to illustrate various example embodiments of the invention. Throughout the description, similar reference names may be used to identify similar elements.
The present invention provides a method and system for motion search that is highly efficient in terms of memory throughput.
The macroblocks 206-1 to 206-6 may be retrieved one macroblock at a time, row-by-row, from the frame memory (not illustrated) where the macroblock data is stored. Values for determining the best match (e.g., SAD values) are determined for each sub-macroblock and stored by the video encoder integrated circuit. More specifically, sub-block A of macroblock 202 is compared against sub-block A of macroblock 206-1 to produce a first sub-macroblock SAD value SAW1, and sub-block B of macroblock 202 is compared against sub-block B of macroblock 206-1 to produce a second sub-macroblock SAD value SBX1, etc. This process is repeated for each of the macroblocks 206-2 to 206-6 to generate a set of sub-macroblock SAD values 300. In an implementation of the present invention, the set of sub-macroblock SAD values 300 is stored in the video encoder integrated circuit until at least after the next row of macroblocks has been processed. For instance, sub-macroblock SAD values for macroblocks 206-1 to 206-3 are stored locally within the video encoder integrated circuit until at least after sub-macroblock SAD values for macroblocks 206-4 to 206-6 have been calculated.
Once the set of sub-macroblock SAD values 300 is calculated, macroblock-level SAD values can be easily generated by summing appropriate sub-macroblock SAD values.
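The summation just described relies on the fact that the SAD of a whole macroblock equals the sum of the SADs of its four co-located sub-macroblocks. A minimal sketch of that property follows; the helper names and the synthetic sample data are assumptions for illustration.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def split(macroblock):
    """Return the four 8x8 sub-macroblocks of a 16x16 macroblock
    (top-left, top-right, bottom-left, bottom-right)."""
    return [[row[x:x + 8] for row in macroblock[y:y + 8]]
            for y in (0, 8) for x in (0, 8)]


def sub_macroblock_sads(current_mb, reference_mb):
    """One SAD per pair of co-located sub-macroblocks."""
    return [sad(c, r) for c, r in zip(split(current_mb), split(reference_mb))]


# Synthetic 16x16 macroblocks with arbitrary 8-bit sample values.
current_mb = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(16)]
reference_mb = [[(7 * x + y * y) % 256 for x in range(16)] for y in range(16)]
sub_sads = sub_macroblock_sads(current_mb, reference_mb)
macroblock_sad = sum(sub_sads)   # no further access to the sample data is needed
```

Once the four sub-macroblock values are stored, the macroblock-level value is obtained with three additions rather than a fresh comparison of 256 samples.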
Attention is directed now to a preferred embodiment of the present invention. According to this embodiment of the invention, motion search is accomplished by stepping a current macroblock through the macroblocks in a search area of the reference frame row-by-row with no overlap. Each sub-macroblock (e.g., A, B, C, D) of a current macroblock is compared against each sub-macroblock (e.g., W1, X1, Y1, Z1) of the reference frame. For example, macroblocks 406-1, 406-2 and 406-3 of row 402 are retrieved from the frame memory one macroblock at a time, starting with macroblock 406-1. When all the macroblocks of row 402 have been processed, macroblocks 406-4, 406-5, and 406-6 of row 403 are retrieved and processed.
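The retrieval order described above (left to right within a row, then row by row, with no overlap) may be sketched as a simple generator. The name `scan_macroblocks`, its parameters, and the synthetic 32×48 search area of two rows of three macroblocks are assumptions made for this illustration.

```python
def scan_macroblocks(frame, top, left, rows, cols, size=16):
    """Yield (row, col, macroblock) for a search area of rows x cols macroblocks,
    left to right within a row and row by row, with no overlap between blocks."""
    for r in range(rows):
        for c in range(cols):
            y, x = top + r * size, left + c * size
            yield r, c, [line[x:x + size] for line in frame[y:y + size]]


# Synthetic search area: 2 rows of 3 macroblocks, each sample value distinct.
frame = [[y * 48 + x for x in range(48)] for y in range(32)]
visited = list(scan_macroblocks(frame, top=0, left=0, rows=2, cols=3))
order = [(r, c) for r, c, _ in visited]
```

Each sample of the search area is read exactly once in this scan, in contrast to a pixel-stepped search in which overlapping candidates re-read the same samples many times.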
Macroblock-level SAD values can be easily generated by summing appropriate sub-macroblock SAD values.
As mentioned earlier, according to a preferred embodiment of the invention, macroblocks are retrieved and processed one macroblock at a time from left to right and row-by-row. In this embodiment, sub-macroblock SAD values corresponding to two rows of macroblocks (e.g., rows 402 and 403) are stored in a local memory of the video encoder. There is no need to store sub-macroblock SAD values for the entire frame. This is because, in the present embodiment, macroblock-level SAD values are generated as the sub-macroblock SAD values are generated. Once an entire row of macroblocks has been processed, the sub-macroblock SAD values for the immediately previous row would no longer be needed, and the memory can be reused to store sub-macroblock SAD values for the following row.
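The two-row storage scheme described above can be modeled, under stated assumptions, with a bounded queue that discards the oldest row when a new one arrives. The names `run_two_row_buffer` and `combine` are hypothetical, and the rows of integers stand in for rows of sub-macroblock SAD values; the sketch shows only the memory reuse pattern, not how a hardware buffer would be organized.

```python
from collections import deque


def run_two_row_buffer(sad_rows, combine):
    """Feed rows of sub-macroblock SAD values through a two-row buffer.

    sad_rows: iterable of rows, each a list of per-macroblock SAD records.
    combine:  hypothetical callback that turns the buffered rows into
              macroblock-level SAD values as each new row arrives.
    """
    buffer = deque(maxlen=2)        # appending a third row drops the oldest one
    results = []
    peak_rows = 0
    for row in sad_rows:
        buffer.append(row)
        peak_rows = max(peak_rows, len(buffer))
        results.extend(combine(list(buffer)))
    return results, peak_rows


# Stand-in data: four rows of three values; combine just sums what is buffered.
rows = ([r * 10 + c for c in range(3)] for r in range(4))
results, peak_rows = run_two_row_buffer(rows, lambda held: [sum(map(sum, held))])
```

The peak occupancy stays at two rows regardless of how many rows the search area contains, which reflects the statement that values for the entire frame need not be stored.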
As an example, consider macroblock 406-6 in
It should also be noted that SAD values for macroblocks of various sizes (e.g., 8×16) can be obtained by selectively summing the appropriate sub-macroblock SAD values. For example, the SAD value for a 16×8 macroblock consisting of X4 and W5 would be obtained by summing the sub-macroblock SAD values SBX4 and SAW5. In the preferred embodiment, where sub-macroblock SAD values are kept for two consecutive rows, macroblock-level SAD values for some macroblocks may not be obtainable, however.
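The selective summation for different block sizes may be sketched as follows for the simple case of co-located sub-macroblocks. The labels A, B, C, D are assumed to denote the top-left, top-right, bottom-left, and bottom-right quadrants, "16x8" is taken to mean 16 samples wide by 8 high, and the function name and sample values are hypothetical.

```python
def partition_sads(sub):
    """Combine four stored sub-macroblock SAD values into SADs for larger sizes.

    sub: dict with keys "A", "B", "C", "D" (assumed top-left, top-right,
         bottom-left, bottom-right quadrants).
    """
    return {
        "8x8": (sub["A"], sub["B"], sub["C"], sub["D"]),
        "16x8": (sub["A"] + sub["B"], sub["C"] + sub["D"]),    # top half, bottom half
        "8x16": (sub["A"] + sub["C"], sub["B"] + sub["D"]),    # left half, right half
        "16x16": sub["A"] + sub["B"] + sub["C"] + sub["D"],
    }


# Hypothetical stored values for one reference position.
sads = partition_sads({"A": 5, "B": 7, "C": 11, "D": 13})
```

All of the larger sizes are derived from the same four stored values, so evaluating additional block sizes adds a few additions but no further reads of sample data.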
According to an embodiment of the invention, sub-macroblock SAD generator 604 compares each sub-macroblock of the current macroblock against each sub-macroblock of a reference macroblock to generate a set of sub-macroblock SAD values 606, such as those depicted in
Macroblock SAD generator 608 then uses the sub-macroblock SAD values 606 to generate macroblock-level SAD values 610, which are stored in the local buffer memory. According to an embodiment of the invention, macroblock-level SAD values may be generated by selectively summing the appropriate SAD values, such as the sub-macroblock SAD values shown in
Various embodiments of the invention provide an advantageous method and system to access data for performing motion search. The system includes a sub-macroblock SAD generator for calculating SAD values at a sub-macroblock level, a temporary memory such as a buffer memory or cache for storing the sub-macroblock SAD values, and a macroblock SAD generator for generating the macroblock-level SAD values by summing SAD values of immediately neighboring sub-macroblocks. The resulting macroblock-level SAD values may then be used for motion search. As discussed above, non-overlapping reference macroblocks are compared against the current macroblocks to produce the sub-macroblock level SAD values, which are temporarily stored and selectively combined to produce macroblock-level SAD values. In this way, access to the frame memory that stores the reference macroblocks and current macroblocks may be reduced without sacrificing the performance of the video encoder.
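One possible reading of the overall flow summarized above is sketched below; it is an interpretation offered for illustration, not the disclosed circuit. Each non-overlapping reference macroblock is fetched once, SADs of its 8×8 sub-macroblocks against the current quadrants are cached, and candidate macroblocks at 8-sample offsets are then scored by summing the cached values of four neighboring sub-macroblocks. The function names, the A/B/C/D quadrant order, the caching of the whole search area (rather than two rows), and the synthetic data are all assumptions.

```python
def sad(a, b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def block(frame, top, left, size):
    """Extract a size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sub_sad_grid(current_mb, reference, rows, cols):
    """Cache SADs of each current quadrant (A, B, C, D) against every 8x8 block
    of a search area made of rows x cols non-overlapping macroblocks."""
    cur = {"A": block(current_mb, 0, 0, 8), "B": block(current_mb, 0, 8, 8),
           "C": block(current_mb, 8, 0, 8), "D": block(current_mb, 8, 8, 8)}
    grid = {}
    for r in range(rows):                      # macroblock rows, top to bottom
        for c in range(cols):                  # macroblocks, left to right
            ref_mb = block(reference, 16 * r, 16 * c, 16)   # fetched only once
            for qy in range(2):
                for qx in range(2):
                    ref_q = block(ref_mb, 8 * qy, 8 * qx, 8)
                    for label, cur_q in cur.items():
                        grid[(label, 2 * r + qy, 2 * c + qx)] = sad(cur_q, ref_q)
    return grid


def best_match(grid, rows, cols):
    """Score candidates at 8-sample offsets from cached values only;
    return (sad, top, left) of the lowest-SAD candidate."""
    best = None
    for gy in range(2 * rows - 1):
        for gx in range(2 * cols - 1):
            cost = (grid[("A", gy, gx)] + grid[("B", gy, gx + 1)]
                    + grid[("C", gy + 1, gx)] + grid[("D", gy + 1, gx + 1)])
            if best is None or cost < best[0]:
                best = (cost, 8 * gy, 8 * gx)
    return best


# Synthetic search area (2 x 3 macroblocks) with distinct sample values; the
# current macroblock is copied from offset (8, 24), which straddles two macroblocks.
reference = [[y * 48 + x for x in range(48)] for y in range(32)]
current_mb = block(reference, 8, 24, 16)
grid = sub_sad_grid(current_mb, reference, rows=2, cols=3)
match = best_match(grid, rows=2, cols=3)
```

In this sketch the match at offset (8, 24) is found even though no fetched macroblock starts there, because its score is assembled from sub-macroblock values cached while fetching the neighboring aligned macroblocks.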
While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art without departing from the spirit and scope of the invention as described in the claims. For example, while there are several references to implementation as an FPGA, alternate implementations in Application Specific Integrated Circuits (ASICs) and other types of integrated circuit devices are possible. As another example, while Sum of Absolute Difference (SAD) values are used in the described embodiments of the invention, it will be apparent to those skilled in the art having the benefit of this disclosure that other criteria may be used for searching for the “best match” macroblock.
Furthermore, throughout this specification (including the claims if present), unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element or group of elements but not the exclusion of any other element or group of elements. The word “include,” or variations such as “includes” or “including,” will be understood to imply the inclusion of a stated element or group of elements but not the exclusion of any other element or group of elements. Claims that do not contain the terms “means for” and “step for” are not intended to be construed under 35 U.S.C. §112, paragraph 6.