The invention will now be described with reference to the accompanying drawings which are provided to illustrate various example embodiments of the invention. Throughout the description, similar reference names may be used to identify similar elements.
Reference frame 202b includes a search area 206 that is centered on the same position as that of block 204a in current frame 202a. Search area 206 includes a plurality of selected regions 208 including, for example, selected regions 208a and 208b. In an embodiment of the invention, reference frame 202b is a previously encoded frame and may occur before or after current frame 202a in display order. According to an embodiment of the invention, the match errors between a block (e.g., block 204a) and various selected regions 208 (e.g., 208a, 208b) are computed. In various embodiments of the invention, the match error is based on the Sum of Absolute Errors (SAE), defined as
$$\mathrm{SAE} = \sum_{i} \sum_{j} \left| C_{ij} - R_{ij} \right|$$
where $C_{ij}$ is a sample of the current frame and $R_{ij}$ is the corresponding sample of the reference frame, with the sum taken over the positions (i, j) of the block being matched. The selected region with the minimum match error, such as, for example, selected region 208b, is selected as the best match for performing motion estimation. In one embodiment of the invention, the regions 208a, 208b may be chosen according to a Directional-line Motion Estimation (DME) pattern and/or a localized full search.
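By way of non-limiting illustration, the SAE computation and the selection of the minimum-error region may be sketched in Python as follows. The frame representation (a list of rows of integer samples), the square block size, the use of top-left corners to identify blocks and regions, and the names `sae` and `best_region` are assumptions made for this sketch and are not taken from the embodiments described above.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]   # assumed representation: rows of integer samples
Point = Tuple[int, int]   # (x, y) of the top-left corner of a block or region


def sae(current: Frame, reference: Frame,
        block_xy: Point, region_xy: Point, size: int) -> int:
    """Sum of Absolute Errors between a size x size block of `current`
    and a size x size region of `reference`."""
    bx, by = block_xy
    rx, ry = region_xy
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(current[by + j][bx + i] - reference[ry + j][rx + i])
    return total


def best_region(current: Frame, reference: Frame, block_xy: Point,
                candidates: Sequence[Point], size: int) -> Tuple[Point, int]:
    """Return the candidate region with the minimum SAE and that SAE."""
    errors = [(sae(current, reference, block_xy, c, size), c)
              for c in candidates]
    error, region = min(errors)  # ties resolve to the smaller coordinate
    return region, error
```

In this sketch the candidate list plays the role of selected regions 208; how the candidates are chosen is left to the caller.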
Each of the plurality of directional lines 306 is separated from its neighboring directional lines by approximately the same angle degree. The angle degree is empirically determined depending on the size of block 104, which is subject to motion estimation, and the size of search area 206. In an embodiment of the invention, the angle degree between any two consecutive directional lines 306 is approximately 22.5°. Accordingly, the 22.5° angle degree results in sixteen directional lines (360°/22.5° = 16). In another embodiment of the invention, directional lines 306 may originate from more than one pixel located at or near the center of search area 206.
The plurality of pixels 304a (uncolored), which lie between directional lines 306, are referred to as non-DME locations. The group of pixels 304b (which are marked with an asterisk) lying along directional lines 306 constitutes a DME pattern in accordance with an embodiment of the invention, and may be used for computing a first set of match errors. The group of pixels 304b is hereinafter referred to as “search locations 304b.”
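By way of non-limiting illustration, one possible way of deriving search locations that lie on or near the directional lines is sketched below. The sampling rule used here (one sample per integer radius along each line, rounded to the nearest pixel) and the name `dme_pattern` are assumptions made for this sketch; the pattern shown in the drawings may select pixels differently.

```python
import math
from typing import List, Set, Tuple


def dme_pattern(search_range: int, angle_deg: float = 22.5,
                step: int = 1) -> List[Tuple[int, int]]:
    """Return (dx, dy) offsets from the search center that lie on or near
    directional lines separated by `angle_deg` degrees."""
    num_lines = round(360.0 / angle_deg)   # 16 lines for 22.5 degrees
    offsets: Set[Tuple[int, int]] = set()
    for k in range(num_lines):
        theta = math.radians(k * angle_deg)
        # Assumed sampling rule: walk outward along the line and snap each
        # sample to the nearest full-pixel position.
        for r in range(step, search_range + 1, step):
            dx = round(r * math.cos(theta))
            dy = round(r * math.sin(theta))
            offsets.add((dx, dy))
    return sorted(offsets)
```

Offsets that are not returned correspond to the non-DME locations; because neighboring lines share pixels close to the center, the pattern is denser near the center and sparser toward the edge of the search area.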
In an embodiment of the invention, a first set of match errors is calculated at the search locations 304b (and first search center 302). Match errors may be calculated using various methods. In one embodiment of the invention, a match error may be calculated by determining an SAE between a current block, such as block 204a (shown in
In various embodiments of the invention, LFS window 502 is limited to a portion of search area 206. LFS positions 304c are depicted as circles with a dot. In various embodiments of the invention, LFS window 502 may be diamond-shaped, round-shaped, cross-diamond shaped and the like.
In an embodiment of the invention, a second set of match errors is calculated for the plurality of LFS positions 304c in LFS window 502. In one embodiment of the invention, a match error may be calculated by determining an SAE between the current block and a block of pixels in reference frame 202b defined by (e.g., encompassing) an LFS position 304c. The match errors of the second set are compared against each other and against the match error at the second search center 402. The pixel location within LFS window 502 having the minimum match error overall is selected as a best match.
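By way of non-limiting illustration, a localized full search over a diamond-shaped window may be sketched as follows. The match error is passed in as a callable so that the sketch does not depend on any particular frame representation; the names `diamond_window` and `localized_full_search` are assumptions made for this sketch.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def diamond_window(center: Point, lfs_range: int) -> List[Point]:
    """All full-pixel positions within `lfs_range` city-block steps of
    `center`, which forms a diamond with a (2 * lfs_range + 1)-pixel diagonal."""
    cx, cy = center
    return [(cx + dx, cy + dy)
            for dy in range(-lfs_range, lfs_range + 1)
            for dx in range(-lfs_range, lfs_range + 1)
            if abs(dx) + abs(dy) <= lfs_range]


def localized_full_search(center: Point, lfs_range: int,
                          match_error: Callable[[Point], int]
                          ) -> Tuple[Point, int]:
    """Evaluate every position in the window and keep the minimum,
    starting from the match error at the window center."""
    best, best_error = center, match_error(center)
    for position in diamond_window(center, lfs_range):
        if position == center:
            continue  # already evaluated
        error = match_error(position)
        if error < best_error:
            best, best_error = position, error
    return best, best_error
```

A range of one pixel yields a window of five positions (three-pixel diagonal), and a range of two pixels yields thirteen positions (five-pixel diagonal).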
In an embodiment of the invention, the search range for LFS may be calculated adaptively according to the location of second search center 402. For example, if second search center 402 is located less than or equal to four pixels from first search center 302, the LFS range is one pixel, resulting in a diamond-shaped LFS window with a three-pixel diagonal. If second search center 402 is located more than four pixels but less than or equal to eight pixels from first search center 302, the LFS range is two pixels, resulting in a diamond-shaped LFS window with a five-pixel diagonal. If second search center 402 is located more than eight pixels but less than or equal to twelve pixels from first search center 302, the LFS range is three pixels. If the position of second search center 402 is located more than twelve pixels but less than or equal to sixteen pixels, as depicted in
In various embodiments of the invention, a fast sub-pixel search is performed to further refine the estimation of a motion vector. The fast sub-pixel motion search refines the block match using interpolated blocks. In an embodiment of the invention, the fast sub-pixel search is performed after the Localized Full Search and refines the position of the third search center by considering the information at the half-pixel and quarter-pixel positions. The interpolated blocks at half-pixel positions 604a and 604b and at quarter-pixel positions 606a and 606c are centered on full-pixel positions 602a and 602b and lie at the shortest distances from them. The fast sub-pixel algorithm reduces memory access and yields high-accuracy motion search results.
It should be noted that embodiments of the present invention may be practiced without the fast sub-pixel search algorithm. Various methods of sub-pixel search, which may be apparent to those of ordinary skill in the art having benefit of the present disclosure, may be used to refine the position of the second search center.
At 706, a Directional-line Motion Estimation (DME) pattern for a search region in the reference frame is defined. In an embodiment of the invention, the DME pattern includes selected search locations, such as search locations 304b, along or near a plurality of directional lines, such as directional lines 306, originating from or near the search center.
At 708, for the current block, a first set of match errors is computed at some or all of the selected search locations. As discussed above, a match error may be computed by calculating the SAE between a block of pixels, such as, for example, block 204a (shown in
In an embodiment of the invention, the comparison criterion is based on the Sum of Absolute Errors (SAE). In various embodiments of the invention, other comparison criteria may be used. Furthermore, in various embodiments of the invention, a match error need not be computed for each and every one of the search locations within the DME pattern. For example, the comparison may stop if the match error becomes smaller than a predetermined threshold.
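By way of non-limiting illustration, the early termination described above may be sketched as follows. The order in which search locations are visited, the return of an evaluation count, and the name `dme_search` are assumptions made for this sketch.

```python
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[int, int]


def dme_search(center: Point, offsets: Iterable[Point],
               match_error: Callable[[Point], int],
               threshold: Optional[int] = None) -> Tuple[Point, int, int]:
    """Search the DME pattern around `center`.

    Returns the best location, its match error, and the number of
    locations actually evaluated."""
    best, best_error = center, match_error(center)
    evaluated = 1
    for dx, dy in offsets:
        # Stop as soon as the best error so far is below the threshold.
        if threshold is not None and best_error < threshold:
            break
        candidate = (center[0] + dx, center[1] + dy)
        error = match_error(candidate)
        evaluated += 1
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error, evaluated
```

With a threshold, a sufficiently good match ends the stage before the remaining search locations are visited, which trades a possibly lower match error for fewer computations.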
At 710, a Localized Full Search (LFS) window, such as LFS window 502, is defined. In an embodiment of the invention, the LFS window is defined as a portion of the search area encompassing the location of the second search center. In one embodiment of the invention, the size of the LFS window may be fixed. In various embodiments of the invention, the search range for the LFS window is calculated adaptively according to the location of the second search center relative to the first search center. For example, if the estimated second search center is located less than or equal to four pixels from the first search center, the localized full search range may be one pixel. If the estimated second search center is located more than four pixels but less than or equal to eight pixels from the first search center, the localized full search range may be two pixels, and so on.
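By way of non-limiting illustration, the adaptive search range may be sketched as follows. The description does not state how the distance between the two search centers is measured; this sketch assumes the larger of the horizontal and vertical displacements, and it continues the stated four-pixel progression with a ceiling division. Both choices, and the name `lfs_range`, are assumptions.

```python
from typing import Tuple

Point = Tuple[int, int]


def lfs_range(first_center: Point, second_center: Point,
              step: int = 4) -> int:
    """Return the localized full search range in pixels.

    Distances of 0..4 give 1, 5..8 give 2, 9..12 give 3, and so on."""
    dx = abs(second_center[0] - first_center[0])
    dy = abs(second_center[1] - first_center[1])
    distance = max(dx, dy)               # assumed distance measure
    return max(1, -(-distance // step))  # ceiling division, at least 1
```

A larger displacement from the first search center suggests faster motion and a coarser DME sampling near the second search center, so a wider window is searched there.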
At 712, a second set of match errors is computed at some or all of the search locations in the LFS window, such as the plurality of LFS positions 304c. The search location with the minimum match error among all the search locations is selected as the best match. In an embodiment of the invention, a fast sub-pixel search may be carried out using the best match to further refine its location. A motion vector for the current block may be produced from the best match search location.
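By way of non-limiting illustration, the two full-pixel stages and the production of a motion vector may be combined as sketched below. The frame representation, the boundary handling, the pixel sampling along the directional lines, the distance measure, the four-pixel range progression, and the name `estimate_motion` are all assumptions made for this sketch; the sub-pixel refinement is omitted.

```python
import math
from typing import List, Tuple

Frame = List[List[int]]   # assumed representation: rows of integer samples
Point = Tuple[int, int]   # (x, y) of a block's top-left corner


def sae(cur: Frame, ref: Frame, block: Point, cand: Point, n: int) -> int:
    """Sum of Absolute Errors between two n x n blocks."""
    (bx, by), (rx, ry) = block, cand
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def estimate_motion(cur: Frame, ref: Frame, block: Point, n: int,
                    search_range: int, angle_deg: float = 22.5) -> Point:
    """Return a full-pixel motion vector (dx, dy) for the block."""
    height, width = len(ref), len(ref[0])

    def inside(p: Point) -> bool:
        return 0 <= p[0] <= width - n and 0 <= p[1] <= height - n

    def error(p: Point) -> int:
        return sae(cur, ref, block, p, n)

    # Stage 1 (DME): evaluate locations on or near the directional lines.
    first_center = block
    candidates = {first_center}
    for k in range(round(360 / angle_deg)):
        theta = math.radians(k * angle_deg)
        for r in range(1, search_range + 1):
            p = (block[0] + round(r * math.cos(theta)),
                 block[1] + round(r * math.sin(theta)))
            if inside(p):
                candidates.add(p)
    second_center = min(candidates, key=lambda p: (error(p), p))

    # Stage 2 (LFS): exhaustive search in a diamond around the DME result.
    distance = max(abs(second_center[0] - first_center[0]),
                   abs(second_center[1] - first_center[1]))
    lfs = max(1, -(-distance // 4))
    window = [(second_center[0] + dx, second_center[1] + dy)
              for dy in range(-lfs, lfs + 1)
              for dx in range(-lfs, lfs + 1)
              if abs(dx) + abs(dy) <= lfs]
    best = min((p for p in window if inside(p)),
               key=lambda p: (error(p), p))

    # The motion vector is the displacement of the best match.
    return (best[0] - block[0], best[1] - block[1])
```

The sketch shows the intended division of labor: the first stage only needs to land within the LFS range of the true displacement, and the second stage recovers displacements that fall between the directional lines.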
It may be apparent to a person skilled in the art that if the angle degree is smaller, the number of directional lines is higher. Conversely, if the angle degree is larger, the number of directional lines is lower. In various embodiments of the invention, the angle degree may be changed based on the degree of compression required. In an embodiment of the invention, if a high degree of compression is required, then the angle degree may be smaller. Similarly, if a low degree of compression is required, then the angle degree may be larger.
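By way of non-limiting illustration, the inverse relationship between the angle degree and the number of directional lines may be expressed as follows. The requirement that the angle divide 360° evenly, and the name `directional_line_count`, are assumptions made for this sketch.

```python
def directional_line_count(angle_deg: float) -> int:
    """Number of equally spaced directional lines around a full circle."""
    if angle_deg <= 0:
        raise ValueError("angle_deg must be positive")
    count = 360.0 / angle_deg
    # Assumed constraint: the lines are evenly spaced around the center.
    if abs(count - round(count)) > 1e-9:
        raise ValueError("angle_deg must divide 360 evenly")
    return round(count)
```

Halving the angle degree doubles the number of directional lines and therefore increases the number of search locations in the DME pattern.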
At 806, pixels located on or near the directional lines are selected as part of the DME pattern. Pixels not located on or near the directional lines, such as pixels 304a, are not considered part of the DME pattern. Furthermore, in an embodiment of the invention, not all pixels located on or near the directional lines are part of the DME pattern. For example, in the DME pattern illustrated in
In an embodiment of the invention, first search center estimator 908 generates first search center 302 for search area 206. DME pattern generator 910 generates a DME pattern, and first match error calculator 912 calculates the match errors at some or all of the search locations 304b within the DME pattern.
DME pattern generator 910 may include an angle degree calculator that calculates the angle between adjacent directional lines, and a sub-module that generates the directional lines and identifies the pixels that lie on or near the directional lines. A second search center, such as second search center 402, having the least match error among search locations 304b is selected by DME module 904 and is provided to LFS module 906.
LFS window generator 914 generates an LFS window such as LFS window 502 using second search center 402. Subsequently, second match error calculator 916 calculates the match errors at some or all of the search locations within LFS window 502, such as the plurality of LFS positions 304c. The search location with the minimum match error overall is selected as a best match. In various embodiments of the invention, the best match may be provided to a fast sub-pixel search module to further refine the best match location. Other components of the system include a module for providing a motion vector for the current block based on the best match location.
The invention provides a method, system and computer program product for motion estimation. The method, system and computer program product combine a low-intensity search, such as DME, which identifies the general vicinity of a best match, with a subsequent high-intensity search, such as LFS, which refines the position of the best match. A sub-pixel search may be used to further refine the position of the best match. The method and system therefore combine high computational efficiency with motion estimation accuracy.
The method of the invention may be embodied by electronic device(s) that perform video encoding, such as mobile telephones, surveillance cameras, handheld video recorders or personal digital assistant (PDA) devices. The computer program product of the invention is executable on a computer system for causing the computer system to perform a method of video encoding including a motion estimation method of the present invention. The computer system includes a microprocessor, an input device, a display unit and an interface to the Internet. The microprocessor is connected to a communication bus. The computer also includes a memory. The memory may include Random Access Memory (RAM) and Read Only Memory (ROM). The computer system further comprises a storage device. The storage device can be a hard disk drive or a removable storage drive such as a floppy disk drive, optical disk drive, etc. The storage device can also be other similar means for loading computer programs or other instructions into the computer system. The computer system also includes a communication unit. The communication unit allows the computer to connect to other databases and the Internet through an I/O interface, and allows data to be transferred to, and received from, other databases. The communication unit may include a modem, an Ethernet card, or any similar device which enables the computer system to connect to databases and networks such as LAN, MAN, WAN and the Internet. The computer system accepts inputs from a user through the input device, which is accessible to the system through the I/O interface.
The computer system executes a set of instructions that are stored in one or more storage elements, in order to process input data. The set of instructions may be a program instruction means. The storage elements may also hold data or other information as desired. The storage element may be in the form of an information source or a physical memory element present in the processing machine.
The set of instructions may include various commands that instruct the processing machine to perform specific tasks such as the steps that constitute the method of the present invention. The set of instructions may be in the form of a software program. Further, the software may be in the form of a collection of separate programs, a program module within a larger program or a portion of a program module, as in the present invention. The software may also include modular programming in the form of object-oriented programming. The processing of input data by the processing machine may be in response to user commands, results of previous processing or a request made by another processing machine. Computer program mechanisms may include instructions executable by digital signal processors embedded within various video encoding systems.
While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art without departing from the spirit and scope of the invention as described in the claims.
Furthermore, throughout this specification (including the claims if present), unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element or group of elements but not the exclusion of any other element or group of elements. The word “include,” or variations such as “includes” or “including,” will be understood to imply the inclusion of a stated element or group of elements but not the exclusion of any other element or group of elements. Claims that do not contain the terms “means for” and “step for” are not intended to be construed under 35 U.S.C. §112, paragraph 6.