The invention is related to the field of video compression.
A temporal prediction filter is used in a video compression process to predict a target image from a set of previously decoded reference images. The temporal prediction process is effective at removing a significant amount of temporal redundancy, which generally results in a higher coding efficiency. The prediction process uses a set of motion vectors and a filter that operates on the motion vectors to predict the target image.
For example, the prediction method divides a reference image 110 into multiple fixed-size blocks 120, as shown in
Conventional temporal filters, which use a single motion vector to predict the location of an associated block, or rely on a filter defined for a regular motion vector pattern, need a regular distribution of motion vectors to perform temporal prediction. Therefore, they are unable to adapt the prediction process to an irregular pattern of motion vectors. There is a need for a filter that can locally adapt its tap and filter coefficients to the variations of an irregular pattern of motion vectors, and also has the flexibility to adapt to object boundaries and spatial textures. There is also a need for an efficient and effective motion estimation procedure that can use the temporal filter to estimate each motion vector value by taking into account the effects of neighboring motion vectors.
A method includes receiving an irregular pattern of motion vectors for a target image, estimating an initial value for each of the motion vectors, using the motion vectors to generate a tap structure for an adaptive temporal prediction filter, and using the tap structure to re-estimate the value of each motion vector.
The present invention is illustrated by way of example and may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. For example, skilled artisans will understand that the terms field or frame or image that are used to describe the various embodiments are generally interchangeable as used with reference to video data.
An adaptive temporal prediction filter is applied to an irregular pattern of motion vectors to produce a prediction of a target image. In one embodiment, each estimate of a motion vector is determined relative to a tap structure of the adaptive temporal filter for the irregular pattern. The estimate is made in two stages. In the first stage, an initial estimation of the motion vector, independent of the filter's tap structure, is determined. In the second stage, the tap structure is used during a re-estimation of the value of each motion vector. The tap structure that is applied to a particular motion vector is generated from a local pattern of neighboring motion vectors, to include their effects in the estimate of the particular motion vector. In some embodiments, an adaptive pixel influence area and an adaptive search window are used to reduce the complexity of performing the re-estimation process. Re-estimating the values of the motion vectors can be performed using a non-adaptive or an adaptive re-estimation procedure. A non-adaptive slow re-estimation involves a full search for a value of a particular motion vector within a search window having a fixed range. The value for the motion vector that results in the largest reduction in the prediction error is selected as the new value.
An example of an adaptive temporal filtering procedure is shown in
Another example of an irregular pattern of motion vectors is shown in
Returning to
As shown in
In
where $\{f_i\}$ is a set of filter coefficients, and $x + v_i$ is the motion compensated pixel when motion vector $v_i$ is applied to pixel $x$. The support or tap of the filter is defined by the set $S(x)$. The tap support $S(x)$ and the filter coefficients $\{f_i\}$ are, in general, functions of the pixel position $x$ and its neighboring motion vectors. That is, the filter coefficients can change for each pixel, because the distribution of motion vectors changes throughout the image. Hence, the filter locally adapts to the changing motion vector pattern.
At 240, the adaptive temporal prediction filter is applied to the target image to perform temporal prediction for the target image. The filter is applied in the time domain to generate a prediction result for the target image given the set of motion vector values and sampling pattern. The filter uses a filter tap and filter coefficients that are defined by an area of overlapping regions to capture the relevance of motion vectors neighboring a pixel to be predicted. An example of this class of prediction filters is an area of influence filter disclosed in co-pending U.S. patent application Ser. No. 11/229,284 entitled ADAPTIVE AREA OF INFLUENCE FILTER by Marco Paniconi et al., concurrently filed with the present application and incorporated herein by reference; another example is a triangulation filter for motion compensated signals.
An example of applying the filter to generate the temporal prediction is shown in
In one embodiment, the prediction filter uses the tap structure and the filter weights to generate a prediction according to the following equation:
$$\text{Prediction} = I_1 f_1 + I_2 f_2 + I_3 f_3 + I_4 f_4 + I_5 f_5$$
where the filter tap, which is defined by the local motion vectors, and the filter coefficients $\{f_i\}$, are determined when the filter is generated.
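By way of illustration only, the weighted sum of motion compensated reference pixels can be sketched in Python as follows. The function name `predict_pixel`, the clamping at the image border, and the sample intensities and weights are assumptions made for this sketch and are not taken from the described embodiments.

```python
def predict_pixel(ref, x, taps):
    """Predict one target pixel as a weighted sum of motion compensated pixels.

    ref  : 2-D list of reference image intensities, indexed ref[row][col]
    x    : (row, col) position of the target pixel
    taps : list of (weight, (d_row, d_col)) pairs, one per motion vector in the tap
    """
    rows, cols = len(ref), len(ref[0])
    total = 0.0
    for weight, (d_row, d_col) in taps:
        # Assumption: positions outside the image are clamped to the border.
        r = min(max(x[0] + d_row, 0), rows - 1)
        c = min(max(x[1] + d_col, 0), cols - 1)
        total += weight * ref[r][c]
    return total


# Hypothetical 3x3 reference image and a three-tap filter.
REF = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
TAPS = [(0.5, (0, 0)), (0.25, (0, 1)), (0.25, (-1, 0))]
PRED = predict_pixel(REF, (1, 1), TAPS)
```

In this sketch the weights sum to one, so the prediction stays within the range of the contributing reference intensities.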
Returning to
However, the influence of motion vectors of neighboring AOI cells extends into portions of the particular AOI cell. For example, as shown in
The initial value of motion vector 532 was determined without considering the effects of the neighboring motion vectors. Therefore, to predict a value of a pixel in the associated AOI cell, the value of the motion vector 532 is re-estimated, to account for the effects of neighboring motion vectors within the associated AOI cell. The re-estimated value of the motion vector is used by the adaptive temporal prediction filter to predict the values of pixels in the target image. As a result of the re-estimation, the prediction error for all pixels in motion vector 532's area of influence cell is reduced.
For example, block 540 of
In order to re-estimate a particular motion vector, the overlapping influence of the motion vector with pixels in neighboring AOI cells, as shown for example in block 530 of
$$A_i^T = \{x_j \mid f_i^j > T\} \qquad (2)$$
Thus, for a particular motion vector $i$, the area of influence is each pixel $\{x_j\}$ with a filter coefficient greater than a threshold $T$. The total pixel area of influence for motion vector $i$ is defined as $A_i^{T=0}$. The dominant area of influence for the motion vector is defined using a larger threshold, such as $A_i^{T \sim 0.5}$.
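As a minimal sketch of the thresholding just described, the area of influence of one motion vector can be computed from its per-pixel filter coefficients. The dictionary layout, the function name `area_of_influence`, and the coefficient values are hypothetical and serve only to illustrate the total (T = 0) and dominant (T of about 0.5) areas.

```python
def area_of_influence(coeffs, threshold):
    """Return the pixels whose filter coefficient exceeds the threshold.

    coeffs : dict mapping a pixel (row, col) to the filter coefficient that
             the motion vector under consideration contributes at that pixel
    """
    return {pixel for pixel, f in coeffs.items() if f > threshold}


# Hypothetical coefficients of one motion vector at four target pixels.
COEFFS = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.0}
TOTAL_AREA = area_of_influence(COEFFS, 0.0)     # every pixel the vector touches
DOMINANT_AREA = area_of_influence(COEFFS, 0.5)  # pixels where the vector dominates
```

Raising the threshold can only shrink the set, so the dominant area is always a subset of the total area.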
Another example of an overlapping area of influence uses the filter coefficients and a prediction error. In this example, the overlapping area of influence is:
$$A_i^T = \{x_j \mid f_i^j \, e(x_j) > T\} \qquad (3)$$
where the magnitude of the prediction error signal for target pixel $x$ is:
$$e(x) = |I_{\text{target}}(x) - I_{\text{pred}}(x)|$$
Thus, for a particular motion vector $i$, its area of influence is each pixel $\{x_j\}$ having a product of its filter coefficient and prediction error magnitude greater than a threshold $T$. The total pixel area of influence for motion vector $i$ is defined as $A_i^{T=0}$. The dominant area of influence of the motion vector is defined using a larger threshold, such as $A_i^{T \sim 0.5}$.
The average prediction error due to presence of motion vector i is determined by summing the errors of all pixels in the motion vector's area of influence, which is:
where T controls the size of the motion vector's influence area as discussed above.
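The error-weighted area of influence and the accumulated prediction error can be sketched together as follows. This is an illustrative sketch only: the names `error_weighted_area` and `influence_error`, the data layout, and the numeric values are assumptions, and the error is accumulated as a plain sum over the area without any normalization.

```python
def error_weighted_area(coeffs, errors, threshold):
    """Pixels whose (filter coefficient * error magnitude) exceeds the threshold."""
    return {p for p, f in coeffs.items() if f * errors[p] > threshold}


def influence_error(area, errors):
    """Sum of the prediction error magnitudes over an area of influence."""
    return sum(errors[p] for p in area)


# Hypothetical per-pixel coefficients and error magnitudes |I_target - I_pred|.
COEFFS = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.0}
ERRORS = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 4.0, (1, 1): 7.0}

AREA_T0 = error_weighted_area(COEFFS, ERRORS, 0.0)
AREA_T05 = error_weighted_area(COEFFS, ERRORS, 0.5)
ERR_T0 = influence_error(AREA_T0, ERRORS)
ERR_T05 = influence_error(AREA_T05, ERRORS)
```

Note that pixel (1, 1) has the largest error but a zero coefficient, so it never enters the area of influence: the motion vector has no effect on it.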
An example of a re-estimation procedure is shown in
At 630, a range for the search window is determined. The range is denoted as $\{SW_1, SW_2\}$, where $SW_1$ is the search window for the x-component of the motion vector and $SW_2$ is the search window for the y-component. In one embodiment, the search window has a fixed range which is set as an input parameter by the system. For example, the range of the search window can be equal to the area of influence.
At 640, the x-component ($v_i^x$) and y-component ($v_i^y$) of the motion vector are varied within the fixed range of the search window:
$$v_i^x - SW_1,\; v_i^x - SW_1 + 1,\; \ldots,\; v_i^x + SW_1 - 1,\; v_i^x + SW_1$$
$$v_i^y - SW_2,\; v_i^y - SW_2 + 1,\; \ldots,\; v_i^y + SW_2 - 1,\; v_i^y + SW_2$$
At 650, the value for the selected motion vector that minimizes the prediction error $e_i^{T=0}$ for the pixels in the set $A_i^{T=0}$ is determined. At 660, the value of the motion vector is updated with the value determined at 650. At 670, if another motion vector needs to be re-estimated, the process returns to 610. Otherwise, the process ends at 680.
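The full search at 640 through 660 can be sketched as an exhaustive loop over the search window. The sketch below is illustrative only: `reestimate_full_search` is a hypothetical name, and `error_fn` is an assumed callback standing in for applying the prediction filter with a candidate value and accumulating the error over the motion vector's area of influence.

```python
def reestimate_full_search(v_init, sw, error_fn):
    """Try every candidate in the search window and keep the lowest-error one.

    v_init   : (vx, vy) initial motion vector value
    sw       : (SW1, SW2) search range for the x- and y-components
    error_fn : callable mapping a candidate (vx, vy) to a prediction error
    """
    best_v, best_err = v_init, error_fn(v_init)
    for dx in range(-sw[0], sw[0] + 1):
        for dy in range(-sw[1], sw[1] + 1):
            cand = (v_init[0] + dx, v_init[1] + dy)
            err = error_fn(cand)
            if err < best_err:  # strict: ties keep the earlier candidate
                best_v, best_err = cand, err
    return best_v, best_err


# Toy error surface with its minimum at (3, -1); a stand-in for the real filter.
def toy_error(v):
    return abs(v[0] - 3) + abs(v[1] + 1)


NEW_V, NEW_ERR = reestimate_full_search((1, 0), (2, 2), toy_error)
```

The cost of this loop is (2·SW1 + 1)(2·SW2 + 1) error evaluations per motion vector, which is what the adaptive variants aim to reduce.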
In some embodiments, the method of
For regions of the image having a high density of motion vectors, the area of influence for a particular motion vector is generally small, and the motion is usually more incoherent. For these regions, the value of each motion vector can be re-estimated using a search window that covers most, if not all, of the associated area of influence without a significant increase in the computational complexity of performing the re-estimation process.
The adaptive aspects can be included in the re-estimation process of
In another embodiment, at 615 the process calculates an adaptive area of influence using spatial activity. The spatial activity is defined around a compensated pixel that is identified in a reference image using the initial value of the motion vector. Pixels that map to low spatial activity regions within the reference image generally have a smaller impact on the value of the motion vector. High spatial activity regions dominate the selection of the value for the motion vector. In this embodiment, a local spatial activity value is computed for each pixel in the area of influence $A_i^T$. The area of influence, and hence the complexity of performing the re-estimation, of each motion vector can be reduced by using pixels having a local spatial activity value above a threshold, $C_{sa}$.
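One possible way to prune the area of influence by spatial activity is sketched below. The text does not define the activity measure, so the sum of absolute differences to the in-bounds 4-neighbors used here is an assumption, as are the function names `local_activity` and `prune_by_activity` and the sample image.

```python
def local_activity(ref, r, c):
    """Assumed measure: sum of absolute differences to the in-bounds 4-neighbors."""
    rows, cols = len(ref), len(ref[0])
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            total += abs(ref[r][c] - ref[rr][cc])
    return total


def prune_by_activity(area, ref, v, c_sa):
    """Keep target pixels whose compensated reference pixel has activity above c_sa.

    area : set of target pixels (row, col) in the area of influence
    v    : (d_row, d_col) initial motion vector value
    """
    rows, cols = len(ref), len(ref[0])
    kept = set()
    for r, c in area:
        rr, cc = r + v[0], c + v[1]  # compensated pixel in the reference image
        if 0 <= rr < rows and 0 <= cc < cols and local_activity(ref, rr, cc) > c_sa:
            kept.add((r, c))
    return kept


# Hypothetical reference image: flat on the left, an edge toward the lower right.
REF = [[10, 10, 10, 10],
       [10, 10, 10, 90],
       [10, 10, 90, 90]]
AREA = {(0, 0), (0, 1), (1, 1)}
PRUNED = prune_by_activity(AREA, REF, (1, 1), 50)
```

Here pixel (0, 0) maps to a flat region of the reference image and is dropped, while the two pixels that map onto the edge are retained.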
The adaptive process, at 630, can adapt the range of the search window according to an initial prediction error, instead of using the entire area of influence as the search window. For example, a motion vector with a small prediction error is usually in a region of an image having coherent motion and sparse motion vector density. For this type of motion vector, an extensive search window is unnecessary, because the initial value for the motion vector is already close to the optimal value. Thus, the size of the search window can be reduced by a factor of α.
Alternatively, at 630 the process can adapt the search window according to a magnitude of the initial motion vector. For motion vectors whose initial value is very small (e.g., $|\vec{v}| < M$, where $M$ is a threshold magnitude), the re-estimated value of the motion vector is expected to also be small (i.e., less than $M$). Therefore, the search window can be reduced by a factor of β with little decrease in performance of the prediction.
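The two search window adaptations can be sketched as follows. The text does not state how the factors α and β are applied, so this sketch assumes they are multiplicative scale factors between 0 and 1; the error threshold `err_thresh`, the minimum window of 1, and all function names are likewise hypothetical.

```python
import math


def scale_window(sw, factor):
    """Scale both search ranges, keeping each at least 1 (an assumed floor)."""
    return tuple(max(1, int(s * factor)) for s in sw)


def window_from_error(sw, init_error, err_thresh, alpha):
    """Shrink the window when the initial prediction error is already small."""
    return scale_window(sw, alpha) if init_error < err_thresh else sw


def window_from_magnitude(sw, v_init, mag_thresh, beta):
    """Shrink the window when the initial motion vector magnitude is below M."""
    return scale_window(sw, beta) if math.hypot(*v_init) < mag_thresh else sw


SMALL_ERR_WINDOW = window_from_error((8, 8), 1.5, 2.0, 0.5)
LARGE_ERR_WINDOW = window_from_error((8, 8), 5.0, 2.0, 0.5)
SMALL_MV_WINDOW = window_from_magnitude((8, 6), (1, 1), 2.0, 0.25)
```

The two rules are kept as separate functions because the text presents them as alternatives rather than as a combined test.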
In some embodiments, the adaptive process can perform a coarse-fine search for a re-estimated value of a motion vector with a large area of influence. For example, a motion vector with a large area of influence usually corresponds to a region of an image having a large amount of coherent motion. For this type of motion vector, a two-step search procedure can be performed at 640. First, the value for the motion vector is coarsely varied within the search window using a larger value of T. Then, the value of the motion vector is finely varied using a smaller T. This coarse-fine search procedure can be applied when the size of the area of influence is larger than a threshold value C2.
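A two-step coarse-fine search along these lines might look like the sketch below. The stride of the coarse pass, the radius of the fine pass, and the `error_fn(candidate, T)` callback are assumptions introduced for illustration; the text specifies only that the coarse pass uses a larger T than the fine pass.

```python
def coarse_fine_search(v_init, sw, error_fn, t_coarse, t_fine, step=2):
    """Coarse pass on a strided grid with threshold t_coarse, then a fine
    pass with stride 1 and threshold t_fine around the coarse result."""

    def search(center, radius, stride, t):
        best_v, best_err = center, error_fn(center, t)
        for dx in range(-radius[0], radius[0] + 1, stride):
            for dy in range(-radius[1], radius[1] + 1, stride):
                cand = (center[0] + dx, center[1] + dy)
                err = error_fn(cand, t)
                if err < best_err:
                    best_v, best_err = cand, err
        return best_v

    coarse = search(v_init, sw, step, t_coarse)
    # Assumption: the fine pass covers the gaps left by the coarse stride.
    return search(coarse, (step - 1, step - 1), 1, t_fine)


# Toy error surface with its minimum at (3, -2). It ignores T; in a real
# system a larger T would restrict the error to fewer, dominant pixels.
def toy_error(v, t):
    return abs(v[0] - 3) + abs(v[1] + 2)


RESULT = coarse_fine_search((0, 0), (4, 4), toy_error, 0.5, 0.0)
```

With a window of (4, 4) and a stride of 2, the coarse pass evaluates 25 grid candidates plus the center and the fine pass 9 plus its center, compared with 81 candidates for a full search of the same window.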
In some embodiments, the re-estimation procedure can be used with multiple reference images. For multiple reference images, the motion vector has, in addition to a motion vector value, a mode map, as shown in
In this example, referring to 240 of
Similarly, there are two sets of prediction errors for each motion vector, $\{e_{i,0}, e_{i,1}\}$, which are obtained from eqs. (3) and (4) for $p_i = \{0, 1\}$, respectively. The filter for predicting the target pixel shown in
$$I_{\text{pred}} = f_1 (0.5\, I_{1,0} + 0.5\, I_{1,1}) + f_2 I_{2,0} + f_3 I_{3,1} + f_4 I_{4,0} + f_5 I_{5,0}$$
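The two-reference prediction above can be sketched with a per-tap mode that selects reference image 0, reference image 1, or the average of both. Encoding the mode as 0, 1, or 0.5 is an assumption of this sketch, as are the function name `predict_two_refs` and the sample weights and intensities.

```python
def predict_two_refs(taps):
    """Weighted sum over taps, each drawing on one or both reference images.

    taps : list of (weight, mode, i0, i1), where i0 and i1 are the motion
           compensated intensities from reference images 0 and 1, and mode
           is 0 (use i0), 1 (use i1), or 0.5 (average of both; assumed encoding)
    """
    total = 0.0
    for weight, mode, i0, i1 in taps:
        if mode == 0:
            value = i0
        elif mode == 1:
            value = i1
        else:
            value = 0.5 * i0 + 0.5 * i1
        total += weight * value
    return total


# Five taps mirroring the structure of the equation above: tap 1 averages both
# references, tap 3 uses reference 1, and the remaining taps use reference 0.
TAPS = [(0.4, 0.5, 100, 120),
        (0.2, 0, 90, 0),
        (0.2, 1, 0, 80),
        (0.1, 0, 70, 0),
        (0.1, 0, 60, 0)]
PRED = predict_two_refs(TAPS)
```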
When multiple reference images are used, re-estimating the motion vector values includes estimating the mode map relative to the temporal prediction filter, and varying the values of two motion vectors. The value of one motion vector is kept frozen, while the optimal value of the other motion vector and the mode map are determined. This process is repeated to vary the previously frozen value, and freeze the previously varied value. The selection of which value to vary first is determined using the prediction errors of the motion vectors. The motion vector with larger prediction error is varied first. In another embodiment, the two sets of motion vectors, and the mode map, are optimized simultaneously. Alternatively, the re-estimation process can proceed in multiple stages (instead of only two), where the process alternates between the two sets of motion vectors, with a correspondingly reduced search window to maintain the same complexity.
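The alternating procedure, in which one motion vector is frozen while the other motion vector and the mode map are searched, can be sketched as below. The callback `error_fn(v0, v1, mode)`, the mode encoding (0, 1, 0.5), and the function name are hypothetical; the sketch performs a single alternation, varying first the motion vector with the larger initial prediction error.

```python
def alternate_reestimate(v, errs, sw, error_fn, modes=(0, 1, 0.5)):
    """Re-estimate two motion vectors and a mode, freezing one vector at a time.

    v    : (v0, v1) initial values, one motion vector per reference image
    errs : (e0, e1) initial prediction errors of the two motion vectors
    sw   : (SW1, SW2) search range
    """
    v = list(v)
    best_mode, best_err = min(
        ((m, error_fn(v[0], v[1], m)) for m in modes), key=lambda t: t[1])
    # The motion vector with the larger prediction error is varied first.
    order = (0, 1) if errs[0] >= errs[1] else (1, 0)
    for k in order:
        base = v[k]
        for dx in range(-sw[0], sw[0] + 1):
            for dy in range(-sw[1], sw[1] + 1):
                cand = (base[0] + dx, base[1] + dy)
                trial = (cand, v[1]) if k == 0 else (v[0], cand)
                for m in modes:
                    err = error_fn(trial[0], trial[1], m)
                    if err < best_err:
                        best_err, best_mode = err, m
                        v[k] = cand
    return tuple(v), best_mode, best_err


# Toy error with its minimum at v0 = (1, 0), v1 = (0, 2), mode = 0.5.
def toy_error(v0, v1, mode):
    return (abs(v0[0] - 1) + abs(v0[1])
            + abs(v1[0]) + abs(v1[1] - 2)
            + abs(mode - 0.5))


RESULT = alternate_reestimate(((0, 0), (0, 0)), (1.0, 2.0), (2, 2), toy_error)
```

Each pass searches one vector's window jointly with the mode, so the cost is roughly twice that of a single-reference search multiplied by the number of modes, rather than the product of the two windows required by simultaneous optimization.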
An example of a method for re-estimating motion vector values using two reference images is shown in
In one embodiment, the adaptive temporal prediction filter is used by a video coding system for encoding an image (or frame, or field) of video data, as shown in
At 940, a temporal prediction filtering process is applied to the irregular motion sampling pattern. This adaptive filtering process uses the motion vectors, irregular sampling pattern, and reference images to generate a prediction of the target image. At 950, the motion vector values are coded and sent to the decoder. At 960, a residual is generated, which is the actual target data of the target image minus the prediction from the adaptive filtering process. At 970, the residual is coded and, at 980, is sent to the decoder.
In another embodiment, the adaptive temporal prediction filter is used in decoding an image (or frame, or field) of video data, as shown in
While the invention is described in terms of embodiments in a specific system environment, those of ordinary skill in the art will recognize that the invention can be practiced, with modification, in other and different hardware and software environments within the spirit and scope of the appended claims.
Number | Date | Country |
---|---|---|
20070064804 A1 | Mar 2007 | US |