This invention relates to motion estimation in image processing systems and especially to using normalised cross correlation.
In particular, the invention addresses a problem of searching for a best match between a small portion of a source picture with a correspondingly shaped and sized portion of another picture which is taken as a reference picture. If a shape of the source portion is constrained to be a square macroblock of 16×16 pels taken from the source picture, and the reference picture is drawn from another place nearby in the same video sequence, this application becomes a familiar problem of block-based motion estimation, which is commonly used within a process of video compression.
In principle, a standard mathematical technique of cross-correlation may be applied to the problem of block-based motion estimation instead of a more usual sum of absolute differences (SAD) method.
Furthermore, it is well known in the literature that an un-normalized 2-D cross correlation surface may be efficiently calculated by means of a 2-dimensional Fourier transform. However, un-normalised cross correlation does not work very well for motion estimation.
It is an object of the present invention at least to ameliorate the aforesaid deficiency in the prior art.
According to a first aspect of the invention, there is provided a motion estimator for image processing arranged to find a motion vector from a portion of a search area in a reference picture to a portion of a source picture by finding a maximum of a 2-dimensional normalised cross-correlation coefficients surface between the portion of the source picture and the portions of the reference search area using a transform domain.
Preferably, the motion estimator further comprises: an arithmetic mean stage connected to a first input of the motion estimator arranged to obtain an arithmetic mean of a first sequence representing elements of the portion of the source picture and to subtract the arithmetic mean from each element to form a zero-mean sequence; a padding stage connected to the arithmetic mean stage arranged to pad the zero-mean sequence with zeros to a length of a second sequence representing the reference search area to form a padded sequence; a first two-dimensional fast Fourier transform stage connected to the padding stage arranged to obtain a Fourier transform of the padded sequence; a complex conjugate stage connected to the first two-dimensional fast Fourier transform stage and arranged to form a complex conjugate of the transformed padded sequence.
Preferably, the motion estimator further comprises: a second two-dimensional fast Fourier transform stage connected to a second input of the motion estimator and arranged to perform a fast Fourier transformation of the second sequence; a multiplication stage connected to the complex conjugate stage and to the second two-dimensional fast Fourier transform stage and arranged to multiply the elements of the complex conjugate of the transformed padded sequence by the elements of the transformed second sequence to form a transformed un-normalised cross-correlation; a first two-dimensional inverse fast Fourier transform stage connected to the multiplication stage and arranged to form a un-normalised cross-correlation; and a magnitude squaring stage connected to the first two-dimensional inverse fast Fourier transform stage and arranged to square the magnitude of the elements of the un-normalized cross-correlation to form a squared un-normalised cross-correlation.
Preferably, the motion estimator further comprises: a second squaring stage connected to the second input of the motion estimator and arranged to square the elements of the second sequence to form a squared search area; a third two-dimensional fast Fourier transform stage connected to the second squaring stage and arranged to form a transform of the squared search area; a third multiplication stage connected to the third two-dimensional fast Fourier transform stage and arranged to multiply by a complex conjugate of a Fourier transform of a normalisation matrix to form transformed local sums of the squared search area; and a third two-dimensional inverse fast Fourier transform stage connected to the third multiplication stage and arranged to form local sums of the squared search area.
Advantageously, the motion estimator further comprises: a second multiplication stage connected to the second two-dimensional fast Fourier transform stage arranged to multiply the elements of the transformed search area by a complex conjugate of a Fourier transformed normalisation matrix to form transformed local sums; a second two-dimensional inverse fast Fourier transform stage connected to the second multiplication stage and arranged to form a local sum for each element; a first squaring stage connected to the second two-dimensional inverse fast Fourier transform stage and arranged to square the local sum to form a squared local sum; a divider stage connected to the first squaring stage and arranged to divide the squared local sum by a number of elements in the first sequence to form a scaled squared local sum; a subtraction stage connected to the divider stage and to the third two-dimensional inverse fast Fourier transform stage and arranged to subtract the scaled squared local sum from the local sums of the squared search area to form a squared normalisation factor of the normalised cross-correlation coefficient.
Conveniently, the motion estimator further comprises: a second divisor stage connected to the subtraction stage and to the magnitude squaring stage and arranged to divide the squared un-normalised correlation factor by the squared normalisation factor to form a square of the normalised cross-correlation coefficient and output to a maximising stage arranged to maximise the normalised cross-correlation coefficient to find an optimum motion vector for the source macroblock.
Advantageously, the motion estimator is adapted for a video signal having interlaced fields in which the fields are processed in parallel as real and imaginary parts respectively.
Conveniently, the motion estimator further comprises a first split field stage connected to the first input arranged to split the first sequence into two sequences representing first and second interlaced fields respectively and parallel streams as described above for calculating the numerators of the normalised cross-correlation coefficient for each field respectively.
According to a second aspect of the invention, there is provided a method of estimating motion for image processing comprising the steps of finding a motion vector from a portion of a search area in a reference picture to a portion of a source picture by finding a maximum of a 2-dimensional normalised cross-correlation coefficient between the portion of the source picture and the portion of the reference search area using a transform domain.
Preferably, the method further comprises steps of: forming a first sequence representing elements of the portion of the source picture; obtaining an arithmetic mean of the elements and subtracting the arithmetic mean from each element to form a zero-mean sequence; padding the zero-mean sequence with zeros to a length of a second sequence representing the reference search area to give a padded sequence; performing a two-dimensional fast Fourier transform of the padded sequence to form a transformed padded sequence; forming a complex conjugate of the transformed padded sequence.
Advantageously, the method comprises the further steps of: performing a two-dimensional fast Fourier transform of the second sequence; multiplying the elements of the complex conjugate of the transformed padded sequence by the elements of the transformed second sequence to form a transformed un-normalised cross-correlation coefficient; forming a un-normalised cross-correlation by inverse transformation of the transformed un-normalised cross-correlation coefficient; squaring the transformed un-normalised cross-correlation to form a squared transformed un-normalised cross-correlation coefficient.
Conveniently, the method further comprises further steps of: squaring the elements of the second sequence to form a squared search area; forming a two-dimensional fast Fourier transform of the squared search area to form a transformed squared search area; multiplying by a complex conjugate of a Fourier transform of a normalisation matrix to form transformed local sums of the squared search area; performing a two-dimensional inverse Fourier transform of the transformed local sums of the squared search area to form local sums of the squared search area.
Advantageously, the method comprises further steps of: normalising the Fourier transformed search area by multiplying the elements by a complex conjugate of a Fourier transformed normalisation matrix to form a transformed local sum; performing a two-dimensional inverse Fourier transform of the transformed local mean to form a local sum for each element; squaring the elements of the local sum to form a squared local sum; dividing the squared local sum by the number of elements in the first sequence to form a scaled squared local sum; subtracting the scaled squared local sum from the local sums of the squared search area to form a squared normalisation factor of the normalised cross-correlation coefficient.
Conveniently, the method further comprises dividing the squared un-normalised correlation factor by the squared normalisation factor to form a square of the normalised cross-correlation coefficient and maximising the normalised cross-correlation coefficient to find an optimum motion vector for the source macroblock.
Advantageously for a video signal having interlaced fields, the fields are processed in parallel as real and imaginary parts respectively.
According to a third aspect of the invention, there is provided a computer program comprising code means for performing all the steps of the method described above when the program is run on one or more computers.
According to a fourth aspect of the invention, there is provided computer executable software code stored on a computer readable medium, the code being for
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
In the Figures, like reference numerals denote like parts.
Throughout the description, identical reference numerals are used to identify like parts.
In this invention there is described a matching criterion based on a local normalised cross correlation coefficient. A transform domain method for calculating this criterion and apparatus for performing this calculation within motion estimation by means of the normalised 2-dimensional cross correlation coefficient are disclosed, together with some optimisations in the transform sequence which reduce the computational load.
For the approach presented herein, a shape of a chosen portion of the source picture is arbitrary. That is, the invention may be applied to any shape of portion, but will be described in terms of a square macroblock without loss of generality.
A normalised cross correlation coefficient is commonly used within the literature to describe a degree to which two series of numbers are related.
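The normalised cross-correlation coefficient between two 1-dimensional L-element discrete-time sequences of real numbers, P={p0, p1, . . . , p15} and Q={q0, q1, . . . , q15} (so that L=16 here), is defined as:
$$C = \frac{\sum_{n=0}^{L-1}(p_n-\bar{p})(q_n-\bar{q})}{\sqrt{\sum_{n=0}^{L-1}(p_n-\bar{p})^2\;\sum_{n=0}^{L-1}(q_n-\bar{q})^2}}$$
where
$$\bar{p} = \frac{1}{L}\sum_{n=0}^{L-1} p_n$$
and similarly
$$\bar{q} = \frac{1}{L}\sum_{n=0}^{L-1} q_n$$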
An effect of subtracting the mean of each series from every element is to make the resultant series have zero mean. The cross-correlation coefficient between two zero mean series is then described as the sum of the elemental products of the series divided by the square root of the product of the sums of the elemental squares of each series.
The normalised cross-correlation coefficient has an interesting property that it is 1 if one zero-mean series is a positive scaled version of the other, −1 if a negative scaled version, and lies between −1 and 1 otherwise, with 0 indicating no correlation at all.
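The properties stated above can be checked with a few lines of Python. This is a minimal sketch for illustration only; the function name `ncc` and the sample values are assumptions, not part of the described apparatus.

```python
import math

def ncc(p, q):
    """Normalised cross-correlation coefficient of two equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("sequences must have equal length")
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    p0 = [x - p_mean for x in p]          # zero-mean version of P
    q0 = [x - q_mean for x in q]          # zero-mean version of Q
    num = sum(a * b for a, b in zip(p0, q0))
    den = math.sqrt(sum(a * a for a in p0) * sum(b * b for b in q0))
    return num / den                      # undefined if either sequence is constant

p = [3.0, 7.0, 1.0, 9.0, 4.0]
print(ncc(p, [2 * x + 5 for x in p]))     # positively scaled (and offset) copy
print(ncc(p, [-0.5 * x for x in p]))      # negatively scaled copy
```

The first call returns a value of 1 and the second a value of −1, to within floating-point rounding, because removing the mean discards any offset and the normalisation discards the scale.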
The above example illustrates the case where the two sequences are identical in length. In a block matching situation for image compression this is not usually the case and so a problem is now considered of matching if P has M elements and is longer than Q. The sequence Q might be padded with zeros to a same length as P, and the cross-correlation computed as before. By moving the location of the padding, alternative matching positions can be considered.
For matching positions $i \in \{0, 1, 2, \ldots, M-L\}$, this is mathematically described as:
where
and similarly
As can be seen from a comparison of
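Rather than calculating the mean of the zero-padded sequence, the mean of the original, unpadded sequence Q is calculated and subtracted from each element before padding with zeros. The mean of P is calculated only over those elements which are correlated with the unpadded elements of Q at each matching position. This means that
$$\bar{p}_i = \frac{1}{L}\sum_{n=0}^{L-1} p_{n+i}, \qquad \bar{q} = \frac{1}{L}\sum_{n=0}^{L-1} q_n$$
This allows definition of a one-dimensional matching criterion as the maximum value over the range $i \in \{0, 1, 2, \ldots, M-L\}$ of the local normalised cross-correlation coefficient given by:
$$c_i = \frac{\sum_{n=0}^{L-1}(p_{n+i}-\bar{p}_i)(q_n-\bar{q})}{\sqrt{\sum_{n=0}^{L-1}(p_{n+i}-\bar{p}_i)^2\;\sum_{n=0}^{L-1}(q_n-\bar{q})^2}} \qquad \text{Equation 8}$$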
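A direct evaluation of this one-dimensional criterion might look like the following Python sketch. The function name `local_ncc`, the test data, and the convention of scoring a completely flat window as 0 are assumptions made for illustration.

```python
import math

def local_ncc(p, q):
    """Local normalised cross-correlation of q (length L) at each position i = 0..M-L of p."""
    L, M = len(q), len(p)
    q_mean = sum(q) / L
    q0 = [x - q_mean for x in q]
    q_energy = sum(x * x for x in q0)
    scores = []
    for i in range(M - L + 1):
        window = p[i:i + L]
        w_mean = sum(window) / L           # mean over the overlapped elements only
        w0 = [x - w_mean for x in window]
        num = sum(a * b for a, b in zip(w0, q0))
        den = math.sqrt(sum(a * a for a in w0) * q_energy)
        # Assumed convention: a flat window (zero denominator) scores 0.
        scores.append(num / den if den > 0 else 0.0)
    return scores

p = [5, 5, 6, 9, 2, 7, 4, 4, 8, 1]
q = [2 * x + 10 for x in p[3:7]]           # scaled, offset copy of p[3:7]
scores = local_ncc(p, q)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])
```

Because the criterion is invariant to gain and offset, the maximum is found at the position from which `q` was taken even though no element of `q` equals the corresponding element of `p`.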
Extension to More than One Dimension
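The extension to two dimensions is straightforward. Let R={rx,y} be the reference image and S={sx,y} be a source macroblock of size 16 by 16 pixels. The 2-dimensional normalised cross-correlation coefficient may be calculated as:
$$c_{i,j} = \frac{\sum_{x=0}^{15}\sum_{y=0}^{15}(r_{x+i,y+j}-\bar{r}_{i,j})(s_{x,y}-\bar{s})}{\sqrt{\sum_{x=0}^{15}\sum_{y=0}^{15}(r_{x+i,y+j}-\bar{r}_{i,j})^2\;\sum_{x=0}^{15}\sum_{y=0}^{15}(s_{x,y}-\bar{s})^2}} \qquad \text{Equation 9}$$
where
$$\bar{s} = \frac{1}{256}\sum_{x=0}^{15}\sum_{y=0}^{15} s_{x,y}$$
i.e. the mean of the source macroblock and
$$\bar{r}_{i,j} = \frac{1}{256}\sum_{x=0}^{15}\sum_{y=0}^{15} r_{x+i,y+j}$$
i.e. the mean of the area of the reference image which is compared to the source macroblock in this position.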
Since S represents the source macroblock which we are seeking to match, and R represents the (larger) reference search area then it is clear that if the normalised cross-correlation surface C can be calculated, then searching (exhaustively) within this surface for the largest ci,j will provide the position of the closest match according to this criterion with a motion vector [i,j]. Unfortunately this is prohibitively expensive to calculate directly.
A two-dimensional zero-mean macroblock can be calculated. This needs to be calculated once only regardless of a size of a search window 15, and does not significantly contribute to an overall load, requiring 255 additions to calculate the mean, and 256 subtractions to form the zero-mean source macroblock. 256 multiplications and 255 additions are then required to form the $\sum_{x}\sum_{y}(s_{x,y}-\bar{s})^2$ term in the denominator.
Considering the case of a 256 by 128 search window 15, and a 16 by 16 macroblock 13, it would be necessary to calculate values of ci,j for 0≦i≦240 and 0≦j≦112. That is 27,233 elements.
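Each element requires 255 additions to calculate the local mean $\bar{r}_{i,j}$, 256 subtractions to remove it from the reference area, 256 multiplications and 255 additions to form the numerator, and a further 256 multiplications and 255 additions to form the $\sum_{x}\sum_{y}(r_{x+i,y+j}-\bar{r}_{i,j})^2$ term. One further multiplication and 1 square root is required to form the denominator of equation 9.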
So in total the computational load is: 255+256+255+27233*(255+256+255+255)=27,805,659 additions, 256+27233*(256+256+1)=13,970,785 multiplies and 27,233 square roots. In practice it might be chosen to match on the squared normalised cross-correlation, which means using an extra 27,233 multiplies on the top line, to square it, in place of the square roots on the bottom line.
In typical applications requiring this process, only 24 microseconds are available to process each macroblock before a next one arrives, which means it would be necessary to perform approximately 1.16E12 additions and 5.8E11 multiplications per second, which is not economically viable and is practically challenging. A practical solution requires a different approach.
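The load figures for the direct calculation can be reproduced with simple arithmetic. The Python sketch below only restates the counts given in the text for a 256 by 128 search window and a 16 by 16 macroblock; the variable names are illustrative.

```python
positions = (256 - 16 + 1) * (128 - 16 + 1)   # 241 * 113 candidate positions
block = 16 * 16                                # 256 pels per macroblock

# One-off work on the source macroblock: mean, zero-mean block, sum of squares.
setup_adds = (block - 1) + block + (block - 1)
setup_muls = block

# Work per candidate position: local mean, mean removal, numerator, denominator term.
adds_per_pos = (block - 1) + block + (block - 1) + (block - 1)
muls_per_pos = block + block + 1

total_adds = setup_adds + positions * adds_per_pos
total_muls = setup_muls + positions * muls_per_pos

budget_s = 24e-6                               # time available per macroblock
adds_per_s = total_adds / budget_s
muls_per_s = total_muls / budget_s
print(positions, total_adds, total_muls)
print(f"{adds_per_s:.3e} additions/s, {muls_per_s:.3e} multiplications/s")
```

Dividing the per-macroblock totals by the 24 microsecond budget gives rates of the order of 1E12 additions and 6E11 multiplications per second.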
The method and apparatus described below implement the calculations required to fulfil the analysis presented above. The benefit over prior art is that it reduces the computational load of the developed matching criterion so that it may be practically and efficiently implemented.
Consider two discrete-time 1-dimensional series of equal length: P={p0, p1, . . . , pL−1} and Q={q0, q1, . . . , qL−1}. These series are cyclic, i.e. p−1≡pL−1, p−2≡pL−2 etc. and likewise the elements of Q.
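The cyclic convolution $A = \{a_0, a_1, \ldots, a_{L-1}\} = P \otimes Q$ is given by:
$$a_i = \sum_{n=0}^{L-1} p_n\, q_{i-n}$$
Because the series are cyclic over L samples, the index i−n is taken modulo L:
$$a_i = \sum_{n=0}^{L-1} p_n\, q_{(i-n) \bmod L}$$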
Now consider the cyclic convolution between P and a time reversed series Q′, where Q′={q0, q−1, q−2, . . . , q1−L}={q0, qL−1, qL−2, . . . , q1}={q′0, q′1, . . . , q′L−1}.
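$B = \{b_0, b_1, \ldots, b_{L-1}\} = P \otimes Q'$ is given by:
$$b_i = \sum_{n=0}^{L-1} p_n\, q'_{i-n} = \sum_{n=0}^{L-1} p_n\, q_{n-i} = \sum_{n=0}^{L-1} p_{n+i}\, q_n \qquad \text{Equation 14}$$
There is a notable similarity between this result and the numerator $\sum_{n}(p_{n+i}-\bar{p}_i)(q_n-\bar{q})$ of equation 8, which is required to calculate the normalised cross-correlation coefficient.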
It is a textbook result that convolution in the time domain may be performed by multiplication in a transform domain such as that provided by the discrete Fourier transform. Applying the convolution theorem:
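$$P \otimes Q' \equiv \mathcal{F}^{-1}\big(\mathcal{F}(P)\,\mathcal{F}(Q')\big) \qquad \text{Equation 15}$$
where $\mathcal{F}(P)$ is the discrete Fourier transform of P, $\mathcal{F}^{-1}(P)$ is the inverse discrete Fourier transform of P, and the product of two transforms is taken element by element.
Since Q′ is a time-reversed version of Q, and Q is real, their Fourier transforms are related by complex conjugation, so:
$$\mathcal{F}(Q') = \mathcal{F}(Q)^* \qquad \text{Equation 16}$$
Thus
$$P \otimes Q' \equiv \mathcal{F}^{-1}\big(\mathcal{F}(P)\,\mathcal{F}(Q)^*\big) \qquad \text{Equation 17}$$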
The result of equation 17 holds equally well for two dimensional series.
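The relationship of equation 17 can be checked numerically in one dimension. In the Python sketch below a naive O(L²) transform stands in for an FFT so that the example needs only the standard library; the function names and sample data are assumptions for illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    L = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / L) for n in range(L))
           for k in range(L)]
    return [v / L for v in out] if inverse else out

def cyclic_correlation_direct(p, q):
    """b_i = sum over n of p[(n + i) mod L] * q[n]."""
    L = len(p)
    return [sum(p[(n + i) % L] * q[n] for n in range(L)) for i in range(L)]

def cyclic_correlation_dft(p, q):
    """Inverse transform of F(p) multiplied element by element by conj(F(q))."""
    prod = [a * b.conjugate() for a, b in zip(dft(p), dft(q))]
    # For real inputs the imaginary part is only rounding noise.
    return [v.real for v in dft(prod, inverse=True)]

p = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
q = [2.0, -1.0, 0.5, 3.0, 0.0, 0.0, 0.0, 0.0]
direct = cyclic_correlation_direct(p, q)
via_dft = cyclic_correlation_dft(p, q)
print(max(abs(a - b) for a, b in zip(direct, via_dft)))
```

The two results agree to within rounding error, which is what allows the sum of products to be moved into the transform domain.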
Since these are discrete-time series, a Fast Fourier Transform (FFT) algorithm may be employed. It is a known result that the radix-2 FFT, for power-of-2 values of L, may be performed using log2 L passes comprising L/2 complex multiplies, each of which requires four real multiplies and additions.
Equation 15 extends to two dimensions, using the 2-dimensional discrete Fourier transform and 2-dimensional cyclic convolution. The 2-dimensional DFT of an image L by K pixels, both being powers of 2, may be performed by performing L FFTs of length K followed by K FFTs of length L.
Thus the 2-D DFT of an area 256*128 requires (256*64*7+128*128*8)*4=983,040 adds and multiplies.
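The operation count above follows from the row-column decomposition. The short Python sketch below reproduces it under the cost model stated in the text (L/2 complex multiplies per pass, four real multiplies and additions per complex multiply); the function name `fft_2d_real_ops` is illustrative.

```python
from math import log2

def fft_2d_real_ops(width, height):
    """Real multiply (and add) count for a row-column radix-2 2-D FFT."""
    row_butterflies = height * (width // 2) * int(log2(width))    # `height` FFTs of length `width`
    col_butterflies = width * (height // 2) * int(log2(height))   # `width` FFTs of length `height`
    return 4 * (row_butterflies + col_butterflies)

ops = fft_2d_real_ops(256, 128)
print(ops)  # 983040
```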
Normalised Local Cross-Correlation from Cyclic-Convolution
So far a matching criterion has been defined based on a maximised cross-correlation coefficient given by equation 8 for one dimension, and by equation 9 for two dimensions, showing that in two dimensions it is impractical to calculate by direct means for the purposes of a real-time motion estimator of useful search area.
A similarity has also been shown between the numerator of equation 8 and the convolution of two series, and it has been asserted that this may be calculated efficiently by means of the fast Fourier transform, the proof of which is well known and may be found in the literature.
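Considering the numerator of equation 8, with the sums taken over the elements of Q:
$$\sum_{n}(p_{n+i}-\bar{p}_i)(q_n-\bar{q}) = \sum_{n} p_{n+i}(q_n-\bar{q}) - \bar{p}_i\sum_{n}(q_n-\bar{q}) = \sum_{n} p_{n+i}(q_n-\bar{q})$$
since $\sum_{n}(q_n-\bar{q}) = 0$. This is an important result. Provided the mean has been removed from Q, the local mean of P need not be removed in order to calculate the numerator.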
The length of P is now increased to L samples, and Q is padded to L samples, having first removed the mean $\bar{q}$ of the first 16 elements. Thus
$P = \{p_0, p_1, \ldots, p_{L-1}\}$ and $Q = \{q_0-\bar{q},\, q_1-\bar{q},\, \ldots,\, q_{15}-\bar{q},\, 0,\, \ldots,\, 0\}$.
Considering the cyclic convolution result of equation 14, we obtain:
$$b_i = \sum_{n=0}^{15} p_{n+i}\,(q_n-\bar{q})$$
since Q is padded with 0. Thus, the fact that the convolution is cyclic does not affect the validity of the convolution result provided 0 ≤ i ≤ L−16. Outside these limits the sum wraps around the end of P, which is a boundary condition for which the results are not valid.
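The effect of the zero padding can be demonstrated with a small Python sketch. A 4-element block is used in place of 16 elements, and a naive transform stands in for an FFT; these choices and all names are assumptions made for illustration. The transform-domain result is compared with the numerator computed directly, including removal of the local mean of P.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    L = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / L) for n in range(L))
           for k in range(L)]
    return [v / L for v in out] if inverse else out

def correlate_dft(p, q):
    """Cyclic cross-correlation of p with real q via the transform domain."""
    prod = [a * b.conjugate() for a, b in zip(dft(p), dft(q))]
    return [v.real for v in dft(prod, inverse=True)]

block = [4.0, 9.0, 1.0, 6.0]                       # unpadded Q (toy length 4)
p = [3.0, 5.0, 4.0, 9.0, 1.0, 6.0, 2.0, 8.0]       # P, length L = 8
q_mean = sum(block) / len(block)
q_padded = [x - q_mean for x in block] + [0.0] * (len(p) - len(block))

via_dft = correlate_dft(p, q_padded)

def numerator_direct(i):
    """Numerator with the local mean of P removed explicitly."""
    window = p[i:i + len(block)]
    w_mean = sum(window) / len(window)
    return sum((a - w_mean) * (b - q_mean) for a, b in zip(window, block))

valid = range(len(p) - len(block) + 1)             # positions without wrap-around
for i in valid:
    print(i, round(via_dft[i], 9), round(numerator_direct(i), 9))
```

Within the valid range the two columns agree, even though the transform-domain path never removes the local mean of P. Positions beyond the valid range wrap around and are discarded.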
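The term
$$\sum_{n=0}^{15}(p_{n+i}-\bar{p}_i)^2$$
found in the denominator of equation 8, is very expensive to calculate directly, since
$$\bar{p}_i = \frac{1}{16}\sum_{n=0}^{15} p_{n+i}$$
and is different at every point.
However,
$$\sum_{n=0}^{15}(p_{n+i}-\bar{p}_i)^2 = \sum_{n=0}^{15} p_{n+i}^2 - \frac{1}{16}\Big(\sum_{n=0}^{15} p_{n+i}\Big)^2$$
which is the familiar result for the statistical variance of a sequence, here scaled by the number of elements.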
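We define a series $N = \{1, 1, \ldots, 1, 0, \ldots, 0\}$ of length L, having ones in its first 16 elements and zeros elsewhere, and observe that:
$$\sum_{n=0}^{15} p_{n+i} = \Big[\mathcal{F}^{-1}\big(\mathcal{F}(P)\,\mathcal{F}(N)^*\big)\Big]_i, \qquad \sum_{n=0}^{15} p_{n+i}^2 = \Big[\mathcal{F}^{-1}\big(\mathcal{F}(P^2)\,\mathcal{F}(N)^*\big)\Big]_i$$
where $P^2$ denotes the series formed by squaring each element of P. This provides a basis for an efficient calculation of all the cross-correlation coefficients in equation 8.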
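The local sums and the resulting squared normalisation factor can be sketched in Python as follows. A window of 4 elements replaces the 16 of the text and a naive transform stands in for an FFT; the names `mask`, `local_sum` and `norm_sq` are assumptions for illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    L = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / L) for n in range(L))
           for k in range(L)]
    return [v / L for v in out] if inverse else out

def correlate_dft(p, q):
    """Cyclic cross-correlation of p with real q via the transform domain."""
    prod = [a * b.conjugate() for a, b in zip(dft(p), dft(q))]
    return [v.real for v in dft(prod, inverse=True)]

n = 4                                              # window length (16 in the text)
p = [3.0, 5.0, 4.0, 9.0, 1.0, 6.0, 2.0, 8.0]
mask = [1.0] * n + [0.0] * (len(p) - n)            # the series N: ones, then zeros

local_sum = correlate_dft(p, mask)                 # sum of p over each window
local_sum_sq = correlate_dft([v * v for v in p], mask)   # sum of p squared over each window
norm_sq = [s2 - s * s / n for s, s2 in zip(local_sum, local_sum_sq)]

def norm_sq_direct(i):
    """Sum of squared deviations from the local mean, computed directly."""
    window = p[i:i + n]
    mean = sum(window) / n
    return sum((v - mean) ** 2 for v in window)

for i in range(len(p) - n + 1):
    print(i, round(norm_sq[i], 9), round(norm_sq_direct(i), 9))
```

Only positions whose window does not wrap around the end of `p` are compared, matching the validity limits of the cyclic result.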
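In two dimensions we obtain:
$$c_{i,j} = \frac{\Big[\mathcal{F}^{-1}\big(\mathcal{F}(R)\,\mathcal{F}(S-\bar{s})^*\big)\Big]_{i,j}}{\sqrt{\Big(\Big[\mathcal{F}^{-1}\big(\mathcal{F}(R^2)\,\mathcal{F}(N)^*\big)\Big]_{i,j} - \frac{1}{256}\Big[\mathcal{F}^{-1}\big(\mathcal{F}(R)\,\mathcal{F}(N)^*\big)\Big]_{i,j}^{2}\Big)\;\sum_{x=0}^{15}\sum_{y=0}^{15}(s_{x,y}-\bar{s})^2}} \qquad \text{Equation 24}$$
where $S-\bar{s}$ is the zero-mean source macroblock padded with zeros to the size of R, $R^2$ is R squared element by element, and N is a matrix of the size of R having ones in its top left 16 by 16 region and zeros elsewhere.
$\mathcal{F}(N)^*$ is a constant matrix, so equation 24 requires three forward transforms and three inverse Fourier transforms to compute. Each of these may be performed using 983,040 adds and multiplies for a search area of 256×128. The total computational load is approximately 6E6 adds and multiplies per macroblock, which is a significant saving on the direct calculation requiring approximately 2.8E7 additions and 1.4E7 multiplications.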
The mathematical considerations above have shown that a normalised cross-correlation coefficient is a useful criterion on which one area of an image might be matched to another. Furthermore we have shown that the cross-correlation coefficient may be more efficiently calculated by means of three 2-dimensional forward and inverse Fourier transforms.
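The complete two-dimensional calculation can be sketched in Python on a toy scale. The example below uses an 8 by 8 search area and a 3 by 3 block instead of 256 by 128 and 16 by 16, uses a naive transform in place of a 2-D FFT, and matches on the squared criterion with the constant source-energy factor omitted. For clarity it recomputes transforms that a real implementation would share. All function names and the random test data are assumptions.

```python
import cmath
import random

def dft2(a, inverse=False):
    """Naive row-column 2-D DFT of a list of rows; stands in for a 2-D FFT."""
    sign = 1 if inverse else -1

    def dft(x):
        n = len(x)
        y = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
             for k in range(n)]
        return [v / n for v in y] if inverse else y

    rows = [dft(row) for row in a]
    cols = [dft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def correlate2(a, b):
    """Cyclic 2-D cross-correlation of a with real-valued b via the transform domain."""
    fa, fb = dft2(a), dft2(b)
    prod = [[x * y.conjugate() for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    return [[v.real for v in row] for row in dft2(prod, inverse=True)]

def pad(block, height, width):
    """Place block in the top-left corner of a height x width array of zeros."""
    out = [[0.0] * width for _ in range(height)]
    for y, row in enumerate(block):
        out[y][:len(row)] = row
    return out

def squared_ncc_surface(ref, src):
    """Squared correlation divided by the squared normalisation factor, per position."""
    H, W, h, w = len(ref), len(ref[0]), len(src), len(src[0])
    count = h * w
    s_mean = sum(map(sum, src)) / count
    src0 = pad([[v - s_mean for v in row] for row in src], H, W)
    ones = pad([[1.0] * w for _ in range(h)], H, W)          # normalisation matrix N
    num = correlate2(ref, src0)
    loc_sum = correlate2(ref, ones)
    loc_sq = correlate2([[v * v for v in row] for row in ref], ones)
    surface = {}
    for j in range(H - h + 1):                               # valid (non-wrapping) positions
        for i in range(W - w + 1):
            norm_sq = loc_sq[j][i] - loc_sum[j][i] ** 2 / count
            surface[(i, j)] = num[j][i] ** 2 / norm_sq if norm_sq > 1e-9 else 0.0
    return surface

rng = random.Random(1)
ref = [[rng.random() for _ in range(8)] for _ in range(8)]
src = [[3.0 * v + 0.5 for v in row[3:6]] for row in ref[2:5]]  # gain/offset copy at (3, 2)
surface = squared_ncc_surface(ref, src)
best = max(surface, key=surface.get)
print(best)
```

The block is recovered at the position it was copied from despite the change of gain and offset. Because the score is squared, a block of inverted contrast would score equally well; whether that is acceptable is a design choice outside this sketch.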
A mean removal stage 112 has an input 111 and an output to a pad-with-zeros stage 113. An output of the pad-with-zeros stage is input to a first 2D Fast Fourier Transform stage 114 which outputs to a complex conjugate stage 115 with an output to a first input of a first element-by-element multiplication stage 123.
A second input 121 of the motion estimator 50 is to a second 2D Fast Fourier Transform stage 122 with an output to a second input of the first element-by-element multiplication stage 123. The first element-by-element multiplication stage 123 has an output to a first 2D Inverse Fast Fourier Transform stage 124 which outputs to a first element-by-element squaring stage 125 with an output to a first input of an element-by-element divide stage 147.
A third input 131 of the motion estimator 50 is to a first input of a second element-by-element multiply stage 132 which has a second input from the output of the second 2D Fast Fourier Transform stage 122. An output of the second element-by-element multiply stage 132 is to a second 2D Inverse Fast Fourier Transform stage 133 which has an output to a second element-by-element squaring stage 134. A divide by 256 stage 135 has an input from the second element-by-element squaring stage 134 and an output to a first input of an element-by-element subtract stage 146.
In parallel to the second 2D Fast Fourier Transform stage 122, the second input 121 is also to a third element-by-element squaring stage 142 which has an output to a third 2D Fast Fourier Transform stage 143. A third element-by-element multiplication stage 144 has an input from the third 2D Fast Fourier Transform stage 143, a fourth input 141 of the motion estimator 50 and an output to a third 2D Inverse Fast Fourier Transform stage 145. The third 2D Inverse Fast Fourier Transform stage 145 has an output to a second input of the element-by-element subtraction stage 146. The element-by-element division stage 147 has a second input from the element-by-element subtraction stage 146 and an output to a maximisation engine 148 which has an output 248 of the motion estimator 50.
In use, a source macroblock $s_{x,y}$ is introduced at the first input 111 of the motion estimator 50, and passes through the mean removal stage 112 to give a zero-mean macroblock $(s_{x,y}-\bar{s})$
The transformed reference search area $\mathcal{F}(R)$ 222 also undergoes an element-by-element multiplication with the conjugate of a transformed normalisation matrix $\mathcal{F}(N)^*$ input at the third input 131 of the motion estimator 50, and the resulting transformed local sum $\mathcal{F}(R)\,\mathcal{F}(N)^*$ 232 is inverse transformed to give a local sum $\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*)$ 233 at every point. This is elementally squared to give a squared local sum $(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*))^2$ 234 and this is divided by $16^2$ to give a scaled squared local sum $\frac{1}{256}(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*))^2$.
The reference search area R input at the second input 121 of the motion estimator 50 is also elementally squared to produce a squared reference search area $R^2$ 242, and this is Fast Fourier transformed to give a transformed squared reference search area $\mathcal{F}(R^2)$ 243. This is then multiplied by the conjugate of the transformed normalisation matrix $\mathcal{F}(N)^*$ input at the fourth input 141 of the motion estimator 50 to give transformed local sums $\mathcal{F}(R^2)\,\mathcal{F}(N)^*$ 244 of the squared reference search area $R^2$. These are then inverse transformed to give the local sums $\mathcal{F}^{-1}(\mathcal{F}(R^2)\,\mathcal{F}(N)^*)$ 245 of the squared reference search area $R^2$. The scaled squared local sum $\frac{1}{256}(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*))^2$ is subtracted from this to give the squared normalisation factor $\mathcal{F}^{-1}(\mathcal{F}(R^2)\,\mathcal{F}(N)^*) - \frac{1}{256}(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*))^2$ 246.
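The transformed un-normalised cross-correlation $\mathcal{F}(R)\,\mathcal{F}(s_{x,y}-\bar{s})^*$ 223 is inverse transformed by the first 2D Inverse Fast Fourier Transform stage 124 to give the un-normalised cross-correlation $\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(s_{x,y}-\bar{s})^*)$. This is elementally squared by the first element-by-element squaring stage 125 and divided, in the element-by-element divide stage 147, by the squared normalisation factor 246, to give the squared normalised cross-correlation
$$\frac{\big(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(s_{x,y}-\bar{s})^*)\big)^2}{\mathcal{F}^{-1}(\mathcal{F}(R^2)\,\mathcal{F}(N)^*) - \frac{1}{256}\big(\mathcal{F}^{-1}(\mathcal{F}(R)\,\mathcal{F}(N)^*)\big)^2}$$
247. It will be noted that this differs from a (magnitude) square of the normalised cross-correlation factor of equation 24 by the factor
$$\sum_{x=0}^{15}\sum_{y=0}^{15}(s_{x,y}-\bar{s})^2$$
which is a constant for a given source macroblock independent of the search area and the expression may therefore be used to find a maximum value to obtain a best fit motion vector. This expression is input to the maximisation engine 148 and a resultant best score or scores and corresponding motion vector or vectors are produced at the output 248.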
The method and apparatus described above may be further improved as described below to provide a further reduction of the computational load.
The padded macroblock has only non-zero terms in the first few rows and columns. Therefore, if the two dimensional FFT is performed by horizontal transforms followed by vertical transforms only a few horizontal transforms actually have to be carried out because the remaining rows are all zero. Furthermore, not all of the FFT passes in the vertical transforms need to be carried out completely, particularly in the earlier stages, because many of the elements are zero.
Scaling this up to a realistic case, if R is 256 pels horizontally and 128 pels vertically, and S is 16 pels by 16 pels, then the total load in butterflies for the R transform is (128 rows)*(8 passes)*(128 butterflies)+(256 columns)*(7 passes)*(64 butterflies)=245,760 butterflies. The S transform requires (16 rows)*(16+32+64+128*5 butterflies)+(256 columns)*(16+32+5*64 butterflies)=106,240 butterflies.
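The butterfly counts can be reproduced with a short Python sketch. The per-pass model used here, in which the number of butterflies doubles from the number of non-zero inputs until it reaches half the transform length, is an assumption inferred from the per-pass figures quoted in the text; the function names are illustrative.

```python
from math import log2

def full_butterflies(length):
    """Butterflies in one complete radix-2 FFT of `length` points."""
    return (length // 2) * int(log2(length))

def pruned_butterflies(length, nonzero):
    """Butterflies in one FFT when only the first `nonzero` inputs are non-zero.

    Assumed model: pass k needs min(nonzero * 2**k, length / 2) butterflies.
    """
    return sum(min(nonzero << k, length // 2) for k in range(int(log2(length))))

W, H, B = 256, 128, 16                      # search width, search height, block size

# Search area R: every row and every column is transformed in full.
r_load = H * full_butterflies(W) + W * full_butterflies(H)

# Padded block S: only B rows are non-zero, and each column has B non-zero inputs.
s_load = B * pruned_butterflies(W, B) + W * pruned_butterflies(H, B)

print(r_load, s_load)  # 245760 106240
```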
Because of the prevalent use of interlaced video, it is often desirable to find the scores for matches between fields of the macroblock and fields in the image and then combine these to provide scores for the overall macroblock. There is an efficient method for calculating these scores, which also reduces the computational load of the overall engine.
The Fourier transform applies to complex series. So, in this embodiment, the even field lines in the search area are coded as real elements, and the odd field lines as imaginary. Coding the first field lines of R as real and the second field as imaginary reduces a cost of an R transform because a number of rows presented to the transform is halved. (64 rows)*(8 passes)*(128 butterflies)+(256 columns)*(6 passes)*(32 butterflies)=114,688 butterflies are now required.
The two fields of the source macroblock are transformed separately. Using the efficient transform above, each transform now requires (8 rows)*(16+32+64+128*5 butterflies)+(256 columns)*(8+16+4*32 butterflies)=44,928 butterflies. The total load for the 2 S transforms is then 89,856 butterflies.
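The interlaced figures follow from the same counting, with the search area treated as a 256 by 64 complex picture and each macroblock field as a 16 by 8 block. The Python sketch below again assumes that the butterflies per pass double from the number of non-zero inputs up to half the transform length; the names are illustrative.

```python
from math import log2

def full_butterflies(length):
    """Butterflies in one complete radix-2 FFT of `length` points."""
    return (length // 2) * int(log2(length))

def pruned_butterflies(length, nonzero):
    """Assumed model: pass k needs min(nonzero * 2**k, length / 2) butterflies."""
    return sum(min(nonzero << k, length // 2) for k in range(int(log2(length))))

W, H_FIELD = 256, 64                        # complex search picture: X by Y/2
FIELD_W, FIELD_H = 16, 8                    # one field of a 16 by 16 macroblock

# Search fields packed as real and imaginary parts: one complex transform.
r_load = H_FIELD * full_butterflies(W) + W * full_butterflies(H_FIELD)

# One macroblock field: 8 non-zero rows, and 8 non-zero inputs per column.
s_field_load = (FIELD_H * pruned_butterflies(W, FIELD_W)
                + W * pruned_butterflies(H_FIELD, FIELD_H))

print(r_load, s_field_load, 2 * s_field_load)  # 114688 44928 89856
```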
A first split field stage 710 has a first input 311 of the motion estimator 70 and two outputs to parallel mean removal stages 312, 512 which have outputs to parallel pad-with-zeros stages 313, 513. Outputs of the pad-with-zeros stages 313, 513 are input to parallel first 2D Fast Fourier Transform stages 314, 514 which output to parallel complex conjugate stages 315, 515, with outputs to first inputs of parallel first element-by-element multiplication stages 323, 523 respectively.
A second input 321 of the motion estimator 70 is to a second split field stage 720 which has two outputs to a first stage 740 for multiplying, as illustrated, a first input by $\sqrt{-1}$ before adding to a second input, whereas, in practice, the top and bottom search fields are input to real and imaginary inputs of a second Fast Fourier Transform stage 322 respectively, with an output to second inputs of the first pair of parallel element-by-element multiplication stages 323, 523. The first parallel pair of element-by-element multiplication stages 323, 523 have outputs to a first parallel pair of 2D Inverse Fast Fourier Transform stages 324, 524 which output to a first parallel pair of element-by-element squaring stages 325, 525 with outputs to first inputs of parallel element-by-element divide stages 347, 547.
A third input 331 of the motion estimator 70 is to first inputs of a second pair of parallel element-by-element multiply stages 332, 532 which have second inputs from outputs of the second pair of parallel 2D Fast Fourier Transform stages 322, 522. Outputs of the second pair of parallel element-by-element multiply stages 332, 532 are to a second parallel pair of 2D Inverse Fast Fourier Transform stages 333, 533 which have outputs to a second element-by-element squaring stage 334 and to a second input of an element-by-element subtract stage 346 respectively. A divide by 128 stage 335 has an input from the second element-by-element squaring stage 334 and an output to a first input of the element-by-element subtract stage 346.
In parallel to the second split field stage 720, the second input 321 of the motion estimator 70 is also to a third element-by-element squaring stage 342 which has an output to a third split field stage 730 which has two outputs to a second stage 750 for multiplying, as illustrated, a first input by $\sqrt{-1}$ before adding to a second input, whereas, in practice, the squared top and bottom search fields are input respectively to real and imaginary inputs of one of the second pair of parallel 2D Fast Fourier Transform stages 522. The parallel element-by-element division stages 347, 547 have second inputs from the element-by-element subtraction stage 346 and outputs to parallel maximisation engines 348, 548 which have parallel outputs 448, 648 of the motion estimator 70.
Output from the element-by-element subtract stage 346 is also input to an add real to imaginary stage 760, the output of which is input to one input of a second element-by-element divide stage 747. Outputs from the parallel element-by-element squaring stages 325, 525 are input to a second add real to imaginary stage 770. An output of the second add real to imaginary stage 770 is input to a second input of the second element-by-element divide stage 747. An output of the second element-by-element divide stage 747 is to a third maximisation engine 748 with an output 848 of the motion estimator 70 as an approximation to the overall match criterion as described below.
In use a 16×16 pixel source macroblock input at the first input 311 of the motion estimator 70 is split into two 16×8 fields, the top field 810 of the macroblock and bottom field 811 of the macroblock. These separately undergo mean removal to give a zero mean top field of the macroblock 412 and zero mean bottom field 612 of the macroblock. These then undergo padding to a size X by Y/2 where X is the horizontal dimension of the search area and Y is the vertical dimension, to give padded top field 413 of the macroblock and padded bottom field 613 of the macroblock. These undergo 2D FFTs to give a transformed top field 414 of the macroblock and transformed bottom field 614 of the macroblock, and then complex conjugate operations to give the conjugate transformed top field 415 of the macroblock and conjugate transformed bottom field 615 of the macroblock.
A reference search area input at the second input 321 of the motion estimator 70, having size X by Y pels, is split into a top search field 820 and bottom search field 821, each of size X by Y/2 pels. The bottom search field 821 is illustrated as multiplied by $\sqrt{-1}$ and then added to the top search field 820 to give a complex search picture 840 of size X by Y/2 pels. In practice, the top and bottom search fields are input, respectively, to real and imaginary inputs of a 2D FFT stage to give a transformed complex search picture 422, which is elementally multiplied with both the conjugate transformed fields 415, 615 of the macroblock to give the transformed un-normalised complex top field correlation 423 and transformed un-normalised complex bottom field correlation 623. These undergo inverse 2-D FFTs to give the un-normalised complex top field correlation 424 and un-normalised complex bottom field correlation 624.
The real part of the un-normalised complex top field correlation 424 is representative of the matches between the top field 810 of the macroblock and the top field 820 of the search area. The imaginary part of the un-normalised complex top field correlation 424 is representative of the matches between the top field 810 of the macroblock and the bottom field 821 of the search area. Similarly, the real part of the un-normalised complex bottom field correlation 624 is representative of the matches between the bottom field 811 of the macroblock and the top field 820 of the search area. The imaginary part of the un-normalised complex bottom field correlation 624 is representative of the matches between the bottom field 811 of the macroblock and the bottom field 821 of the search area. The un-normalised complex field correlations 424, 624 undergo an elemental squaring of the real and imaginary coefficients separately, giving the squared un-normalised complex top field correlation 425 and squared un-normalised complex bottom field correlation 625.
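The separation of the two field matches into real and imaginary parts can be illustrated in one dimension with the Python sketch below. A naive transform stands in for an FFT, and the sample data and names are assumptions. The two search fields are packed into one complex sequence, correlated once with a real, zero-mean, zero-padded macroblock field, and the real and imaginary parts are compared with the two correlations computed directly.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    L = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / L) for n in range(L))
           for k in range(L)]
    return [v / L for v in out] if inverse else out

def correlate_dft(p, q):
    """Cyclic cross-correlation of (possibly complex) p with real q; complex result."""
    prod = [a * b.conjugate() for a, b in zip(dft(p), dft(q))]
    return dft(prod, inverse=True)

def correlate_direct(p, q):
    """Cyclic cross-correlation of two real sequences computed as a sum of products."""
    L = len(p)
    return [sum(p[(n + i) % L] * q[n] for n in range(L)) for i in range(L)]

top = [3.0, 5.0, 4.0, 9.0, 1.0, 6.0, 2.0, 8.0]        # top search field (real part)
bottom = [7.0, 2.0, 6.0, 1.0, 8.0, 3.0, 9.0, 4.0]     # bottom search field (imaginary part)
field = [1.5, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]     # zero-mean, padded macroblock field

packed = [complex(t, b) for t, b in zip(top, bottom)]
both = correlate_dft(packed, field)                    # one transform pair for both fields

match_top = correlate_direct(top, field)
match_bottom = correlate_direct(bottom, field)
err_real = max(abs(c.real - m) for c, m in zip(both, match_top))
err_imag = max(abs(c.imag - m) for c, m in zip(both, match_bottom))
print(err_real, err_imag)
```

The separation works because correlation is linear and the macroblock field is real, so the two search fields never mix.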
The (real) search area input at the second input 321 undergoes an elemental squaring to give a squared search area 442. This is then split into a squared search top field 830 and a squared search bottom field 831. The squared search bottom field 831 is then illustrated as multiplied by the imaginary unit √−1 and added to the squared search top field 830 to give a complex squared search field 850. In practice, the squared top search field and the squared bottom search field are input, respectively, to the real and imaginary inputs of a 2-D FFT stage to give a transformed complex squared search area 622. The transformed complex search area 422 and the transformed complex squared search area 622 are both elementally multiplied with a transformed normalisation matrix input at the third input 331. The transformed normalisation matrix is the 2-D FFT of an image having ones in the non-padded region of the padded top field of the macroblock and zeros elsewhere, i.e. the 2-D FFT of an array of size X by Y/2 pels with ones in the top left 16 by 8 rectangle and zeros elsewhere. The imaginary part of the untransformed normalisation matrix is all zero. Following this multiplication, the transformed local sums of the complex search area 432 and the transformed local sums of the complex squared search area 632 are inverse transformed via the 2-D IFFT to give the local sums of the complex search area 433 and the local sums of the complex squared search area 633. The local sums of the complex search area 433 then undergo a squaring of the real and imaginary coefficients separately, to give the squared local sums of the complex search area 434. This is divided by 128, the number of pels in a 16 by 8 field, to give the scaled squared local sums of the complex search area 435. This is subtracted from the local sums of the complex squared search area 633 to give the squared complex normalisation matrix 446. The real parts of the squared un-normalised complex top field correlation 425 and the squared un-normalised complex bottom field correlation 625 are elementally divided by the real part of the squared complex normalisation matrix 446, and similarly the imaginary parts are divided by its imaginary part, giving the squared complex top field correlation 447 and the squared complex bottom field correlation 647.
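The formation of the normalisation term may be sketched as follows, assuming NumPy and the hypothetical function name packed_normalisation. As a further assumption, the sketch multiplies by the complex conjugate of the transformed normalisation matrix so that each local sum is indexed by the top left corner of its 16 by 8 window, matching the indexing of the correlation; the description itself refers only to the transformed normalisation matrix.

```python
import numpy as np

def packed_normalisation(search_area, field_h=8, field_w=16):
    """Return a complex array whose real (imaginary) part is, for each
    displacement, sum(s^2) - (sum(s))^2 / N over the window of the top
    (bottom) search field, with N = field_h * field_w."""
    sa = np.asarray(search_area, dtype=np.float64)
    top, bottom = sa[0::2, :], sa[1::2, :]   # assumed field line parity
    packed = top + 1j * bottom
    packed_sq = top ** 2 + 1j * bottom ** 2  # square first, then pack
    # Normalisation matrix: ones over the non-padded 16 by 8 region.
    mask = np.zeros(packed.shape)
    mask[:field_h, :field_w] = 1.0
    mask_fft = np.conj(np.fft.fft2(mask))    # conjugate: an assumption
    local_sum = np.fft.ifft2(np.fft.fft2(packed) * mask_fft)
    local_sum_sq = np.fft.ifft2(np.fft.fft2(packed_sq) * mask_fft)
    n = field_h * field_w                    # 128 for a 16 by 8 field
    # Square real and imaginary coefficients separately, then scale.
    scaled = (local_sum.real ** 2 + 1j * local_sum.imag ** 2) / n
    return local_sum_sq - scaled
```

Each part of the result is N times the local variance of the corresponding search field under the window, which is why it can serve as the denominator of a squared normalised correlation.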
The real part of the squared top field correlation 447 is the local normalised cross correlation of the top field 810 of the macroblock with the top field 820 of the search area. The imaginary part of the squared top field correlation 447 is the local normalised cross correlation of the top field 810 of the macroblock with the bottom field 821 of the search area. The real part of the squared bottom field correlation 647 is the local normalised cross correlation of the bottom field 811 of the macroblock with the top field 820 of the search area. The imaginary part of the squared bottom field correlation 647 is the local normalised cross correlation of the bottom field 811 of the macroblock with the bottom field 821 of the search area. These results are maximised to return the best match or matches for the top field of the macroblock 448 and the best match or matches for the bottom field of the macroblock 648.
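The division and maximisation steps for a single macroblock field may be sketched as follows, assuming NumPy. The function name best_field_match, the guard value eps for flat regions, and the exclusion of wrapped-around displacements are assumptions of this sketch rather than features taken from the description.

```python
import numpy as np

def best_field_match(corr, norm, field_h=8, field_w=16, eps=1e-9):
    """Find the best displacement of one macroblock field in each search field.

    corr: un-normalised complex correlation (for example 424 or 624).
    norm: complex normalisation term (for example 446).
    Returns {search field name: (row, column, score)}.
    """
    # Keep only displacements whose window lies wholly inside the field.
    rows = corr.shape[0] - field_h + 1
    cols = corr.shape[1] - field_w + 1
    results = {}
    for name, c, d in (("top_search_field", corr.real, norm.real),
                       ("bottom_search_field", corr.imag, norm.imag)):
        c = c[:rows, :cols]
        d = d[:rows, :cols]
        # Squared correlation over normalisation; flat windows score zero.
        score = np.where(d > eps, c ** 2 / np.maximum(d, eps), 0.0)
        dy, dx = np.unravel_index(np.argmax(score), score.shape)
        results[name] = (int(dy), int(dx), float(score[dy, dx]))
    return results
```

Squaring discards the sign of the correlation, so a strongly negative correlation scores as highly as a positive one; this sketch does not attempt to distinguish the two.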
Furthermore, the un-normalised results may be added together, taking the real part from the squared un-normalised top field correlation 425 added to the corresponding imaginary part of the squared un-normalised complex bottom field correlation 625, returned as a real, and also taking the imaginary part from the squared un-normalised top field correlation 425 added to the corresponding real part of the squared un-normalised complex bottom field correlation 625, returned as imaginary, to give a complex approximation to the un-normalised overall macroblock correlation 870. Note that “corresponding” does not necessarily mean “co-located” since in the latter of these additions an offset of the top field must be allowed for, due to spatial considerations.
The complex squared normalisation matrix 446 may also have its real and imaginary parts added together, returned as real, giving an approximation to the overall normalisation 860. The approximation 870 to the un-normalised overall macroblock correlation may then be divided by the approximation 860 to the overall normalisation to give a complex approximation to the overall local normalised squared cross-correlation 847, which may be maximised to produce a best match or matches for the whole macroblock 848.
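One possible reading of this combination is sketched below, assuming NumPy. The function name combined_macroblock_score, the guard value eps, and in particular the one-row offset applied to the opposite-parity term (row_offset=1, implemented with a circular shift) are assumptions made for illustration; the description states only that an offset must be allowed for.

```python
import numpy as np

def combined_macroblock_score(sq_top, sq_bottom, norm, row_offset=1, eps=1e-9):
    """Approximate the overall squared normalised correlation for a macroblock.

    sq_top, sq_bottom: squared un-normalised complex field correlations
                       (for example 425 and 625).
    norm:              complex squared normalisation term (for example 446).
    """
    # Same parity: top-with-top plus bottom-with-bottom, returned as real.
    same_parity = sq_top.real + sq_bottom.imag
    # Opposite parity: top-with-bottom plus bottom-with-top, returned as
    # imaginary. Assumed: the bottom-with-top term is read one row lower.
    shifted = np.roll(sq_bottom.real, -row_offset, axis=0)
    opposite_parity = sq_top.imag + shifted
    # Overall normalisation: real and imaginary parts added, kept real.
    overall_norm = np.maximum(norm.real + norm.imag, eps)
    return (same_parity + 1j * opposite_parity) / overall_norm
```

The maximum over both the real and imaginary parts of the returned array would then indicate the best match for the whole macroblock, with the part in which the maximum occurs indicating the field parity of the match.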
There are several similar but distinct ways of obtaining approximations to the overall correlation result. For instance we might combine results 424, 624, 433, 633 immediately after the 2-D IFFT operations instead, but this would require more additional units to calculate the combined normalisation.
In this application we disclose:
The key benefits associated with matching using the described normalised correlation coefficient include improved performance over the more usual sum of absolute differences technique. The method is also invariant to linear transformations of the luminance levels of the images and is therefore particularly useful in processing images in which fading to and from black occurs or when the image is affected by transient flashing effects; both these conditions are problematic in current methods of deriving motion vectors.
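The invariance to linear transformations of luminance may be illustrated by the following sketch, which assumes NumPy and uses the hypothetical function names squared_ncc and sad. It evaluates a fully normalised squared correlation coefficient directly in the pel domain, not by the transform domain method described above.

```python
import numpy as np

def squared_ncc(block, candidate):
    """Squared normalised cross-correlation coefficient of two equal-size blocks."""
    b = np.asarray(block, dtype=np.float64)
    c = np.asarray(candidate, dtype=np.float64)
    b = b - b.mean()
    c = c - c.mean()
    denom = np.sum(b ** 2) * np.sum(c ** 2)
    return float(np.sum(b * c) ** 2 / denom) if denom > 0 else 0.0

def sad(block, candidate):
    """Sum of absolute differences of two equal-size blocks."""
    diff = np.asarray(block, dtype=np.float64) - np.asarray(candidate, dtype=np.float64)
    return float(np.sum(np.abs(diff)))

# A block and a faded copy of it (gain 0.5, offset +10 luminance levels).
rng = np.random.default_rng(0)
block = rng.random((16, 16)) * 255.0
faded = 0.5 * block + 10.0
ncc_score = squared_ncc(block, faded)   # unaffected by the gain and offset
sad_score = sad(block, faded)           # penalises the fade as a mismatch
```

Mean removal cancels the offset and the normalisation cancels the gain, so the faded copy still scores as a perfect match under the correlation coefficient, whereas its sum of absolute differences is non-zero.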
Alternative embodiments of the invention can be implemented as a computer program product for use with a computer system, the computer program product being, for example, a series of computer instructions stored on a tangible data recording medium, such as a diskette, CD-ROM, ROM, or fixed disk, or embodied in a computer data signal, the signal being transmitted over a tangible medium or a wireless medium, for example microwave or infrared. The series of computer instructions can constitute all or part of the functionality described above, and can also be stored in any memory device, volatile or non-volatile, such as semiconductor, magnetic, optical or other memory device.
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
0613759.0 | Jul 2006 | GB | national |