Aspects of embodiments of the invention relate to the field of video graphics; and more specifically, to motion estimation in video graphics.
Video coding based on Motion Estimation and Motion Compensation may be used in various Video Coding standards such as variants of the Moving Picture Experts Group (MPEG) standards. Two mechanisms may be used to perform the motion estimation. Full search algorithms may perform a complete search in a region of interest to obtain the best motion estimation match amongst all of the blocks of pixels in the region of interest. Fast search algorithms may apply techniques or search patterns that reduce the number of blocks of pixel data analyzed and try to obtain an approximate best motion estimation match amongst the selected blocks of pixels analyzed. A disadvantage of the fast search algorithm approach may be that the block of pixel data it selects may not be close to the best motion estimation match.
The drawings refer to embodiments of the invention in which:
FIGS. 5a-5c illustrate diagrams of embodiments of various search patterns that emanate from the central region of the search window and a raster order search.
While the invention is subject to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. The embodiments of the invention should be understood to not be limited to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
In the following description, numerous specific details are set forth, such as examples of specific data signals, named components, connections, number of blocks of pixel data, etc., in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one of ordinary skill in the art that the embodiments of the invention may be practiced without these specific details. However, the specific numeric reference should not be interpreted as a literal sequential order but rather interpreted that the first video frame is different than a second video frame. Thus, the specific details set forth are merely exemplary. The specific details may be varied from and still be contemplated to be within the spirit and scope of the present invention.
In general, various methods, systems, and apparatuses are described that determine motion vectors for motion estimation when video encoding. An N×N block of pixel data may be selected from a video frame, such as the current video frame, to be a current block of pixel data. The pixel light value data of the current block from the current video frame may be about to be encoded into equivalent bit representations. A search window composed of a plurality of blocks of pixel data from a reference video frame, such as a previous frame, may be created with the motion vector coordinates of the current block of pixel data set as a central region, i.e. center, of that search window. The pixel data of the current block may be matched with a plurality of blocks of pixel data in the search window from the reference video frame. A matching algorithm operation, such as a Sum of Absolute Difference operation, may be performed on each block of pixel data in the search window using a full search algorithm in a raster order to determine a difference, such as an absolute value, between that block of pixel data in the search window and the current block of pixel data. A first motion vector (X, Y and possibly Z) corresponding to a first block of pixel data that is a closest match to the current block of pixel data amongst the blocks of pixel data in the search window may be stored. Further matching algorithm operations may be performed on the remaining blocks of pixel data by reusing two or more sets of pixel data from previous motion vector calculations, without requesting a fetch operation to obtain those sets of pixel data, because they are already loaded in the engine performing the matching algorithm operation.
A block of pixel data that is the closest match in pixel data and with motion vectors least in value from the current block of pixel data may be detected with a search pattern emanating from the central region of the search window, such as a spiral search pattern. If two or more blocks of pixel data from the reference blocks of pixel data tie for being the closest match in pixel data, then the reference block of pixel data with motion vectors with the least radial distance to the central region of the search window may be stored.
Reference blocks of pixel data for a current block of pixel data 104 in a current video frame 102 may come from a reference video frame 106. The reference video frame 106 may be a previous video frame. The current block of pixel data 104 may be the block of pixel data from the current frame 102 that is about to be encoded. In Moving Picture Experts Group (MPEG) technology, this current block of pixel data 104 may be called a macro-block, and the size of this macro-block is usually 16×16. A search window 108 may be a region of pixel data from the previous frame 106 that should be searched to determine the best match of pixel data for motion estimation.
In operation, an imaging processing engine that includes a full search motion estimation engine may attempt to match the current block of pixel data 104 in the current frame 102 with a block of pixel data from the reference frame 106. The imaging processing engine selects the current N×N block of pixel data 104 about to be encoded from the current video frame 102. The light values contained in the pixel data of the current block as well as its motion vectors are about to be encoded in a video coding technique into equivalent bit representations. For example, if the motion vector represents a coordinate of row 2 column 2, then the bit representation would have to be four digits long (binary 10, 10). If the motion vector indicates row 100 column 100, then the binary equivalent bit motion vector representation would need to be 14 bits long. A similar binary encoding process may occur for the light values, etc. contained in the pixel data. Thus, sending a few bits of code that correlate the current location of this block of pixel data to a block of pixel data already transmitted and decoded can result in a lower overall bit rate.
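The bit counts in the example above can be checked with a short sketch. It assumes each coordinate is stored as a plain unsigned binary field with no sign bit and no entropy coding; the function names are illustrative only and do not come from the described engine.

```python
def coordinate_bits(value: int) -> int:
    """Binary digits needed for one non-negative coordinate (at least 1)."""
    return max(1, value.bit_length())


def motion_vector_bits(row: int, column: int) -> int:
    """Total bits for an unsigned (row, column) pair, one field per coordinate."""
    return coordinate_bits(row) + coordinate_bits(column)


print(format(2, "b"), motion_vector_bits(2, 2))        # 10 4
print(format(100, "b"), motion_vector_bits(100, 100))  # 1100100 14
```

Under these assumptions, a motion vector that stays near the origin needs fewer bits than one far from it, which is why shorter motion vectors are preferred when candidates are otherwise equal.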
The imaging processing engine tries to match the current block of pixel data 104 with a block of pixel data, for example, a first reference block 110, in the reference video frame 106 that has the closest pixel data and motion vectors. The image processing engine creates the search window 108 composed of a plurality of blocks of pixel data from the reference video frame 106. The reference video frame has motion vector coordinates associated with that search window 108. The motion vector coordinates of the current block of pixel data 104 from the current frame 102 can be set as the central region 112 (i.e. center 0, 0) of the search window 108 in the reference video frame 106. In this example, the search window 108 is composed of a 2×2 reference block of pixel data surrounding the central region 112 of the search window 108. Accordingly, the motion vector coordinates (2, 2) of the current block of pixel data 104 correspond to the motion vector coordinates of the N×N block of pixel data in the central region 112 (0, 0) of the search window.
The raster order search starts at the top left corner and examines a block of N×N, i.e. 8×8 in this example case, of pixel data. The initial block of pixel data examined, such as a second N×N reference block 214 indicated by a hashed marking, spans columns −8 through −1 and extends down from row 9 through row 2. The full search engine determines a difference, such as a Sum of the Absolute Difference, between that N×N reference block of pixel data in the search window 208 and the current block of pixel data. After that determination, the full search engine shifts one column to the right on each clock cycle and analyzes the next new reference block of pixel data. Ten clock cycles and ten difference determinations later, the full search engine determines a difference between a third N×N reference block of pixel data 216 and the current block of pixel data. During these ten clock cycles, the full search engine has merely had to fetch the pixel data for each new column of pixel data under analysis while being able to preserve and reuse the pixel data from previous columns in the calculation.
An example image processing engine that includes a full search engine motion estimation is shown in
The image processing engine 318 utilizes the full search motion estimation engine calculation pipeline 332 to perform a full search in a raster order on the search window as well as to perform a matching algorithm operation, such as a Sum of Absolute Difference operation, on each block of pixel data in the search window. The full search motion estimation engine calculation pipeline 332 calculates a difference, such as an absolute value, between a particular reference block of pixel data in the search window and the current block of pixel data. For example, in a simplified calculation, the light value of each pixel in the current block of pixel data may be subtracted from the value of the corresponding pixel in the reference block of pixel data. The absolute value of that difference may be determined. The sum of all the absolute differences for all of the points of pixel data in the N×N block of pixel data may compute to a numeric value of, for example, 65. A smaller sum of absolute difference value between the reference block and the current block represents a better match. Thus, the minimum sum of absolute difference value between those two blocks might be zero. The fifth N×N register 328 stores the sum of absolute difference value between the current block and the reference block of pixel data under analysis in the reference video frame.
The use of the N×N shift register 320 in the full search motion estimation engine calculation pipeline 332 allows two or more sets of reference pixel data to be reused in subsequent pixel data calculations on the remaining blocks of pixel data. The search logic 330 loads each individual column of pixel data into the N×N shift register 320. The N×N shift register 320 may be an 8×8 shift register. If so, the search logic 330 takes eight clock cycles to fill the 8×8 shift register with the initial eight columns of pixel data from the reference video frame. However, subsequent operations to determine the minimum sum of absolute difference value between the current block of data and the pixel blocks under analysis in the reference video frame merely use one clock cycle, because merely one new column of reference pixel data is being shifted into the full search engine and the other seven columns of pixel data are already loaded into the engine.
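The simplified calculation described above can be sketched in a few lines. This is only an illustration of the Sum of Absolute Difference arithmetic on small made-up blocks; the list-of-rows block layout and the `sad` name are assumptions, not part of the described pipeline.

```python
def sad(current, reference):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    )


current = [[10, 20], [30, 40]]
reference = [[12, 20], [25, 41]]

print(sad(current, reference))  # 2 + 0 + 5 + 1 = 8
print(sad(current, current))    # identical blocks give the minimum, 0
```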
Referring to
For example, the matching operation occurs on the third reference block of pixel data 216 on that tenth clock cycle. On the next clock cycle, due to the raster ordering, the next block of pixel data in the reference video frame to be analyzed again starts in column −8, but now starts one row lower, at row 8, and extends to row 1. Thus, the block of pixel data to be analyzed in the next cycle of the search engine will extend from column −8 to column −1 and from row 8 through row 1. The search logic loads the full search engine with eight new columns of pixel data and the matching operation begins again.
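The column reuse described above can be modeled in software. The following is a minimal sketch, assuming the search window is a list of pixel rows indexed from the top left (array indices rather than the signed row and column coordinates of the figures) and using a `collections.deque` as a stand-in for the N×N shift register; `raster_sads` is a hypothetical name.

```python
from collections import deque


def raster_sads(window, current, n):
    """Return ({(top, left): sad}, column_fetches) for every n x n block.

    Blocks are visited in raster order: left to right, then one row lower.
    """
    height, width = len(window), len(window[0])
    results, fetches = {}, 0
    for top in range(height - n + 1):
        shift_register = deque(maxlen=n)  # n columns, each n pixels tall
        for col in range(width):
            # Fetch exactly one new column; the oldest column drops out.
            shift_register.append([window[top + r][col] for r in range(n)])
            fetches += 1
            if len(shift_register) == n:  # register full: one SAD per step
                left = col - n + 1
                results[(top, left)] = sum(
                    abs(shift_register[c][r] - current[r][c])
                    for c in range(n)
                    for r in range(n)
                )
    return results, fetches


window = [[(7 * r + 3 * c) % 11 for c in range(5)] for r in range(4)]
current = [row[2:4] for row in window[1:3]]  # the 2 x 2 block at top=1, left=2
results, fetches = raster_sads(window, current, 2)
print(results[(1, 2)], len(results), fetches)  # 0 12 15
```

In this toy case the 12 block positions need 15 column fetches, whereas fetching both columns afresh for every position would need 24. Only the first block of each row band pays the full cost of filling the register.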
There are many matching criteria algorithms for matching a block of pixel data in a video frame, usually the current frame to be encoded, with a block of pixel data in the search window in reference frame, usually a previous frame. The full search motion estimation engine may use a Sum-of-Absolute-Difference (SAD) matching criteria also known as Mean Absolute Difference (MAD) matching criteria because of its low computational requirement without requiring any multiplication or division. The full search engine may also use matching criteria such as Mean Square Error (MSE), Normalized Cross-Correlation Function, Minimized Maximum Error (MiniMax), and other similar methods.
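For comparison, several of the criteria named above can be written side by side. This sketch uses the usual textbook definitions, which is an assumption: in particular it treats MAD as SAD divided by the pixel count, so the two rank candidate blocks identically, and it omits the Normalized Cross-Correlation Function. The function names are illustrative.

```python
def _pairs(a, b):
    """Yield matching pixel pairs from two equally sized blocks."""
    for row_a, row_b in zip(a, b):
        yield from zip(row_a, row_b)


def sad(a, b):
    # Additions and subtractions only.
    return sum(abs(p - q) for p, q in _pairs(a, b))


def mad(a, b):
    # Same ranking as SAD, but needs a division.
    return sad(a, b) / (len(a) * len(a[0]))


def mse(a, b):
    # Needs a multiplication per pixel and a division.
    return sum((p - q) ** 2 for p, q in _pairs(a, b)) / (len(a) * len(a[0]))


def minimax(a, b):
    # Largest single-pixel error.
    return max(abs(p - q) for p, q in _pairs(a, b))


a = [[1, 2], [3, 4]]
b = [[1, 4], [0, 4]]
print(sad(a, b), mad(a, b), mse(a, b), minimax(a, b))  # 5 1.25 3.25 3
```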
Referring to
The overall sum of absolute difference value for that particular reference block of 8×8 pixel data is then sent to a post processing stage 352 of the image processing engine 318. The tracking logic 350 compares the sum of absolute difference value from that particular reference block of pixel data with the currently stored sum of absolute difference value from previously analyzed blocks of pixel data in the tracking register 342. If the sum of absolute difference value of the new reference block of pixel data is smaller, i.e. closer to the value of the current video block, then that sum of absolute difference value is stored in the tracking register 342 along with the coordinates of the motion vectors from the new reference block of pixel data that created that sum of absolute difference value. In an embodiment, the image processing engine 318 looks for a minimum SAD value and captures the motion vector corresponding to the minimum SAD into the tracking register 342. The image processing engine 318 detects the minimum value difference amongst all of the reference blocks of pixel data in the search window and determines which reference block of pixel data is the best match to the current block of pixel data. The full search motion engine calculation pipeline 332 sends the motion vectors, i.e. the X, Y coordinates, corresponding to that best reference block of pixel data with the minimum sum of absolute difference value into the tracking register 342. The post processing stage 352 may also use more than one register to store the current best matched reference block of pixel data and associated motion vector coordinates of that reference block.
The post processing stage also includes a logic block 334 including a comparator to determine a minimum radial distance of a reference block of pixel data to the motion vector coordinates of the current block of pixel data. The logic block 334 may also implement a search pattern algorithm on the reference blocks of pixel data in a search window. The search pattern algorithm emanates from the central region of the search window. The combination of the image processing engine 318 using a raster order full search motion estimation engine and a search pattern algorithm that emanates from the central region of the search window may yield a very high throughput performance (clocks per motion vector) and better coding quality by providing smaller motion vectors, resulting in a lower bit rate and a better quality (signal to noise ratios). The image processing engine 318 may be implemented for video encoding streams of real time video.
In an embodiment, the image processing engine 318 may support one SAD operation per clock throughput on an 8×8 current block of pixel data with an overall throughput of this block in the range of 30+ GOPS (Giga-operations per second) running at 266 MHz.
The image processing engine 318 detects and captures the smallest possible motion vectors, i.e., coordinates, with the search pattern emanating from the central region of the search window, such as a spiral search pattern. The central region of the search window is the motion vector coordinate associated with the current block of pixel data from the current video frame. Therefore, the post processing stage 352 uses, for example, a spiral search pattern to detect the motion vectors, i.e., coordinates relative to the central region of the search window, by determining what the coordinates of that reference block of data are in terms of radial distance from the central region of the search window.
If there are two or more reference blocks of pixel data with substantially identical sum of absolute difference values that have been determined to be the best matched block of pixel data, then the logic 334 determines which block of pixel data has the least radial distance to the central region of the search window. In an embodiment, additional registers may track the radial distance of each reference block of pixel data with the lowest SAD value.
FIGS. 5a-5c illustrate diagrams of embodiments of various search patterns that emanate from the central region of the search window and a raster order search.
FIG. 5b illustrates an embodiment of a spiral pattern emanating from the central region of the search window 508b. The spiral pattern starts at the central region 512b of the search window 508b. The spiral search proceeds outward in either a clockwise or counterclockwise rotation. The spiral search may form small rectangles as the search proceeds outwardly.
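One way to enumerate such a spiral is sketched below. It assumes integer (x, y) offsets with (0, 0) at the central region and square rings, so that the path forms progressively larger rectangles; the starting direction, the sense of rotation, and the `spiral_order` name are arbitrary choices for illustration.

```python
def spiral_order(radius):
    """Yield (x, y) offsets in a square spiral from (0, 0) out to +/-radius."""
    x = y = 0
    dx, dy = 1, 0          # first step is along +x
    step_length = 1
    yield (x, y)
    while True:
        for _ in range(2):             # two legs share each step length
            for _ in range(step_length):
                x, y = x + dx, y + dy
                if max(abs(x), abs(y)) > radius:
                    return             # left the search window: done
                yield (x, y)
            dx, dy = -dy, dx           # turn 90 degrees
        step_length += 1


print(list(spiral_order(1)))
# [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
```

Because every ring is completed before the next one starts, the first candidate encountered with a given difference value is never farther from the center, in ring terms, than any later one.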
FIG. 5c illustrates an embodiment of a diamond pattern emanating from the central region of the search window 508c. The more elaborate diamond shaped search pattern may also be used when determining which reference block of pixel data has motion vector coordinates closest to the central region 512c of the search window 508c. The diamond pattern starts at the central region 512c of the search window 508c.
The spiral and diamond search patterns identify reference blocks of pixel data that result in smaller motion vectors. These smaller motion vectors in turn result in a lower bit rate and a higher Signal to Noise Ratio (SNR).
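A diamond ordering can be sketched in the same spirit. The assumption here is that ring d contains every offset with abs(x) + abs(y) equal to d, visited ring by ring from the center; the walk direction within a ring and the `diamond_order` name are illustrative choices. Note that this covers only the diamond-shaped region, not the corners of a square window.

```python
def diamond_order(radius):
    """Yield (x, y) offsets ring by ring, ring d having abs(x) + abs(y) == d."""
    yield (0, 0)
    for d in range(1, radius + 1):
        x, y = d, 0
        # Walk the four edges of the diamond, d points per edge.
        for dx, dy in ((-1, 1), (-1, -1), (1, -1), (1, 1)):
            for _ in range(d):
                yield (x, y)
                x, y = x + dx, y + dy


print(list(diamond_order(1)))
# [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
```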
In an embodiment, the post-processing stage receives a SAD value and corresponding motion vector. The post-processing stage uses the SAD value to update the motion vector and SAD minimum value stored in the tracking register as shown in the pseudo code below.
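A minimal sketch of such an update is given here, assuming the tracking register holds a single SAD minimum and its motion vector and that a candidate replaces the stored entry only when its SAD is strictly smaller. The class and attribute names are hypothetical and the candidate values are made up.

```python
class TrackingRegister:
    """Holds the smallest SAD seen so far and its motion vector."""

    def __init__(self):
        self.sad_min = None
        self.motion_vector = None

    def update(self, sad, motion_vector):
        # A strictly smaller SAD replaces the stored entry; on a tie the
        # earlier candidate (in raster order) is kept.
        if self.sad_min is None or sad < self.sad_min:
            self.sad_min = sad
            self.motion_vector = motion_vector


register = TrackingRegister()
for sad, mv in [(65, (-8, 9)), (40, (-7, 9)), (40, (0, 0)), (52, (3, -1))]:
    register.update(sad, mv)

print(register.sad_min, register.motion_vector)  # 40 (-7, 9)
```

With this plain rule the tie at SAD 40 is resolved in favor of the candidate seen first, even though (0, 0) is the shorter vector.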
An embodiment of a Spiral search pattern may be implemented with the following changes in the pseudo code.
An embodiment of a Diamond search pattern may be implemented with the following changes in the pseudo code.
The algorithm may be altered to account for there being two or more Motion Vectors with identical SAD values by having the algorithm choose the Motion Vector closer to the central region (0 MV) through defining Radial_Distance (r) of a motion vector (x, y) as:
Radial_Distance<=abs(x)+abs(y);
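The tie-breaking rule can be sketched as a small update function that uses this Radial_Distance definition. The tuple representation of the stored best candidate and the function names are assumptions made for illustration, and the candidate values are made up.

```python
def radial_distance(motion_vector):
    """Radial_Distance = abs(x) + abs(y), as defined above."""
    x, y = motion_vector
    return abs(x) + abs(y)


def update(best, sad, motion_vector):
    """Return the better of `best` and the new candidate.

    `best` is None or a (sad, motion_vector) tuple.
    """
    if best is None:
        return (sad, motion_vector)
    best_sad, best_mv = best
    if sad < best_sad:
        return (sad, motion_vector)
    # Identical SAD: prefer the vector closer to the central region (0 MV).
    if sad == best_sad and radial_distance(motion_vector) < radial_distance(best_mv):
        return (sad, motion_vector)
    return best


best = None
for sad, mv in [(65, (-8, 9)), (40, (-7, 9)), (40, (0, 0)), (52, (3, -1))]:
    best = update(best, sad, mv)

print(best)  # (40, (0, 0))
```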
The motion vector determined by the raster order full search algorithm by itself may not always be the shortest vector. Two or more reference blocks of pixel data may calculate to approximately the same minimum difference value to the current block of pixel data. The diamond search pattern, spiral search pattern, or a similar pattern may be applied by the post processing stage to select the shortest motion vector. The combination of using a raster order full search motion estimation engine and a search pattern algorithm that emanates from the central region of the search window reduces the cost associated with coding of the motion vector and coding of the error block (quantized Discrete Cosine Transform of the error block), to improve the bit rate or SNR obtained from this combined approach.
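The difference between the two behaviors can be seen on a deliberately degenerate case. The sketch below assumes a uniformly flat reference frame, so that every candidate block ties at a difference of zero, and compares a raster order full search with and without the shortest-vector preference. The frame contents, sizes, and function names are all made up for illustration.

```python
def block_sad(current, frame, top, left, n):
    """SAD between `current` and the n x n block of `frame` at (top, left)."""
    return sum(
        abs(current[r][c] - frame[top + r][left + c])
        for r in range(n)
        for c in range(n)
    )


def full_search(current, frame, center, n, search_range, prefer_short):
    """Raster order full search; returns the chosen (dx, dy) motion vector."""
    center_top, center_left = center
    best_key, best_mv = None, None
    for dy in range(-search_range, search_range + 1):      # rows, top to bottom
        for dx in range(-search_range, search_range + 1):  # columns, left to right
            value = block_sad(current, frame, center_top + dy, center_left + dx, n)
            # With prefer_short, ties on SAD fall back to abs(dx) + abs(dy).
            key = (value, abs(dx) + abs(dy)) if prefer_short else (value,)
            if best_key is None or key < best_key:
                best_key, best_mv = key, (dx, dy)
    return best_mv


frame = [[50] * 8 for _ in range(8)]   # flat reference frame: all blocks tie
current = [[50, 50], [50, 50]]

print(full_search(current, frame, (3, 3), 2, 2, prefer_short=False))  # (-2, -2)
print(full_search(current, frame, (3, 3), 2, 2, prefer_short=True))   # (0, 0)
```

Raster order alone keeps the first tied candidate at the top left corner of the search window, while the radial preference returns the zero motion vector.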
Computer system 600 further comprises a volatile memory such as random access memory (RAM, EDO-RAM, SD RAM, etc), or other dynamic storage device 604 (referred to as main memory) coupled to bus 611 for storing information and instructions to be executed by processor 612. Main memory 604 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 612.
The image processing engine 618 may contain a post processing stage 652 that has a register 642 and logic 650. A volatile memory 640, such as a cache, may be coupled to the image processing engine 618 to store the blocks of pixel data.
Computer system 600 also comprises a read only memory (ROM) and/or other static storage device 606 coupled to bus 611 for storing static information and instructions for processor 612. The mass storage memory 606 may be a magnetic disk or optical disk and its corresponding disk drive. Mass storage memory 606 is coupled to bus 611 for storing information and instructions.
Computer system 600 may further be coupled to a display device 621, a keyboard 622, a cursor control device 623, and a hard copy device 624 such as a printer. Another device that may be coupled to bus 611 is a wired/wireless communication capability 625 to communicate with a phone.
In an embodiment, the full search logic and search pattern logic may be part of a motion estimation unit in a programmable/multiprocessor architecture such as a Gila-processor™ by Intel Corporation of Santa Clara, California. The motion estimation unit may be part of a larger and/or more complex image signal processor or processing element. For instance,
Moreover, signal processor 200 may be coupled to one or more similar signal processors, where each signal processor may also be coupled to one or more memory and/or other signal processors (e.g., such as in a “cluster”). Also, each cluster may be coupled to one or more other clusters. For instance, signal processor 200 may be connected together in a cluster of eight or nine digital signal processors in a mesh configuration using Quad-ports. The quad-ports can be configured (statically) to connect various ISP's to other ISP's or to double data rate (DDR) random access memory (RAM), such as a “main memory” using direct memory access (DMA) channels. For example, signal processor 200 may be, or may be part of, a programmable multiple-instruction multiple-data stream (MIMD) digital image processing device. More particularly, signal processor 200, whether coupled or not coupled to another signal processor, can be used for image processing related to a copier, a scanner, a printer, or other image processing device, including to process a raster image, a Moving Picture Experts Group (MPEG) image, or other digital image data.
In addition, signal processor 200 can use several PE's connected together through CCRs 210 (e.g., such as where CCRs 210 is a register file switch) to provide a fast and efficient interconnection mechanism and to maximize performance for data-driven applications by mapping individual threads to PE's in such a way as to minimize communication overhead. Moreover, a programming model of the ISP's can be implemented such that each PE implements a part of a data processing algorithm and data flows from one PE to another and from one ISP to another until the data is completely processed.
Moreover, in embodiments, a PE may be one of various types of processing elements, digital signal processors, comparison units, video and/or image signal processors for processing digital data. Similarly, a PE may be an input from one or more other ISP's, an output to one or more other ISP's, a hardware accelerator (HWA), a MEU (e.g., such as MEU 300), memory controller, and/or a memory command handler (MCH). For example, one of the PE's (e.g., PE0 220) may be an input from another ISP, one of the PE's (e.g., PE1 221) may be an output to another ISP, from one to three of the PEs (e.g., PE4, PE5 and PE6) may be configured as HWAs, at least one of the PEs (e.g., PE4) may be configured as a MEU (e.g., such as a HWA MEU, such as MEU 300), and one of the PEs (e.g., PE7 227) may be configured as a MCH functioning as a special HWA to manage the data flow for the other PE's in and out of a local memory. Thus, for example, an embodiment may include a cluster of PEs interconnected through CCRs 210, where CCRs 210 is a shared memory core of up to sixteen CCRs and each CCR is coupled to and mapped to the local address space of each PE.
According to embodiments, memory 270 may be a static RAM (SRAM) type memory, or memory 270 may be a type of memory other than SRAM. Memory 270 may be a local signal processor memory used for storing portions of images and/or for storing data temporarily, such as sum of absolute differences (SAD) values between pixels of a current data image and a prior data image. Specifically, memory 270 may provide the function of search memory 322, SAD memory 352, and/or block 870 as described above. Thus, memory 270 may serve as SAD memory 352 by being an SRAM MCH memory, similar to a cache memory, used to temporarily store portions of images or complete image data that may originate from a DDR and may be staged in MCH 227.
Within signal processor 200, or a cluster of such signal processors (e.g., ISPs), Input PE and Output PE may be the gateways to the rest of the ISPs and can also be programmed to some level of processing. Other PEs within an ISP may also provide special processing capabilities. For instance, PE's acting as MEU's (e.g., such as MEU 300) of signal processor 200 (e.g. such as PE 4 and/or other PE's as shown in
In an embodiment, a full search in raster order occurs during the process of the matching operation. A search, such as a spiral search, is then employed in the post processing stage on the results of the full search in raster order.
In one embodiment, the software used to facilitate the search algorithms and engines can be embodied onto a machine-readable medium. A machine-readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.). For example, a machine-readable medium includes recordable/non-recordable media (e.g., read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; etc.), as well as electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
While some specific embodiments of the invention have been shown, the invention is not to be limited to these embodiments. For example, most functions performed by electronic hardware components may be duplicated by software emulation. The functions may be combinations of software and hardware. The electronic components may be replaced with similar components that perform a similar function. N×N can mean virtually any number such as 16×16. The N×N components may be replaced by N×M components such as a 16×32 component. The invention is to be understood as not limited by the specific embodiments described herein, but only by the scope of the appended claims.