The present application claims priority of European Patent Application No. 12 167 657.1, filed in the European Patent Office on May 11, 2012, the entire contents of which are incorporated herein by reference.
1. Field of the Disclosure
The present disclosure relates to an apparatus and a corresponding method for determining the reliability of a shift vector between two images. Further, the present disclosure relates to an image enhancement apparatus, a computer program and a computer readable non-transitory medium.
2. Description of the Related Art
In many image processing applications, vector fields are used which describe correspondences (also called shifts hereinafter) between different images. Examples of such applications are Motion Compensation, Super-Resolution, Temporal Filtering, Synthetic View Generation for Multi-View Displays, Depth Estimation in 3D Sequences, De-Interlacing and Segmentation. Motion and Disparity Estimation is a difficult task, and in many cases it is not possible to achieve correct and reliable vector fields. The mentioned applications often rely on the vectors, and the output quality in many cases strongly depends on the vector quality. A good estimation of the reliability of the input vectors can help to avoid artifacts from erroneous vectors.
In M. Tanaka and M. Okutomi, “Toward Robust Reconstruction-Based Super-Resolution,” in Super-Resolution Imaging, P. Milanfar, Ed. Boca Raton: CRC Press, 2011, pp. 219-244 a method for selecting pixel values from a compensated input for a robust Super-Resolution method is presented. The normalized cross correlation is used as a local similarity estimation in combination with a local displacement estimation, computing local sub-pixel shifts and excluding pixels with a low similarity and a high displacement from being processed.
In US 2010/0119176 A1 a temporally recursive Super-Resolution system is presented that computes the local pixel difference between reference input and compensated input to generate a mask for mixing both inputs and using the result as input for the detail generation step.
Demin Wang, André Vincent, and Philip Blanchfield, “Hybrid De-Interlacing Algorithm Based on Motion Vector Reliability,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 8, August 2005, discloses a hybrid de-interlacing method that includes switching between a spatial and a motion compensated processing depending on the vector reliability. The vector reliability is computed by comparing the current vector with spatially neighboring vectors depending on a probability function.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor(s), to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
It is an object to provide an apparatus and a corresponding method for determining the reliability of a shift vector between two images with higher accuracy and reliability than known apparatus and methods. It is a further object to provide a corresponding computer program for implementing said method and a computer readable non-transitory medium.
According to an aspect there is provided an apparatus for determining the reliability of a shift vector between two images, said apparatus comprising:
an image compensation unit configured to compensate local shifts between a first image and a second image and to obtain a compensated second image,
a similarity estimation device configured to determine a similarity information by determining one or more similarity measures between said first image and said compensated second image,
a vector consistency check device configured to compare shift vectors describing the shift between said first image and said second image from different shift estimation directions to obtain a consistency weight information, and
a combination unit configured to combine said similarity information and said consistency weight information to obtain a reliability information describing the reliability of said shift vectors.
According to a further aspect there is provided an apparatus for determining the reliability of a shift vector between two images, said apparatus comprising:
an image compensation means for compensating local shifts between a first image and a second image and for obtaining a compensated second image,
a similarity estimation means for determining similarity information by determining one or more similarity measures between said first image and said compensated second image,
a vector consistency check means for comparing shift vectors describing the shift between said first image and said second image from different shift estimation directions to obtain a consistency weight information, and
a combination means for combining said similarity information and said consistency weight information to obtain a reliability information describing the reliability of said shift vectors.
According to another aspect an image enhancement apparatus for enhancing an input image of a sequence of input images and obtaining an enhanced output image is provided, said apparatus comprising
a shift apparatus for shifting one or more images by use of shift vectors between two images, and
an apparatus for determining the reliability of said shift vectors as proposed herein, wherein said shift apparatus is configured to take said reliability into account when using said shift vector for shifting one or more images.
According to still further aspects a corresponding method, a computer program comprising program means for causing a computer to carry out the steps of the method disclosed herein, when said computer program is carried out on a computer, as well as a non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method disclosed herein to be performed are provided.
Preferred embodiments are defined in the dependent claims. It shall be understood that the claimed method, the claimed computer program and the claimed computer-readable recording medium have similar and/or identical preferred embodiments as the claimed apparatus and as defined in the dependent claims.
The proposed apparatus and method compute the reliability of shift vectors (also called correspondence vectors), in particular of motion and/or disparity vectors. The reliability is computed by combining two means, a vector consistency check and a vector similarity check. The vector consistency check compares vectors from two vector estimations with inverse estimation directions against each other. The vector similarity check calculates several similarity measures for comparing a reference input and the result from an image compensation based on the input vectors. The results of these means are several weighting factors which are combined into a final reliability measure for each input vector.
The computed vector reliability measure for one or more shift vectors can be used in many applications for avoiding artifacts resulting from erroneous vectors. In contrast to the known methods the proposed apparatus and method use one or more (preferably multiple) similarity measures specified for local image characteristics in combination with a consistency check for shift vectors.
It is to be understood that both the foregoing general description of the invention and the following detailed description are exemplary, but are not restrictive, of the invention.
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views,
The image compensation unit 20 is configured to compensate local shifts between a first image X (e.g. a reference input which may be the current image of a sequence of images of a video stream) and a second image Z (e.g. a warped input which may be the preceding image of said sequence of images) and to obtain a compensated second image Y. The similarity estimation device 10 is configured to determine a similarity information 2 by determining one or more similarity measures between said first image X and said compensated second image Y.
The vector consistency check device 50 is configured to compare shift vectors V1, V2 describing the shift between said first image X and said second image Z from different shift estimation directions to obtain a consistency weight information 3. In this embodiment, a vector consistency weight computation unit 51 is provided for this purpose.
The combination unit 60 is configured to combine said similarity information 2 and said consistency weight information 3 to obtain a reliability information 4 describing the reliability of said shift vectors. Preferably, said combination unit 60 is a multiplication unit for multiplying said similarity information 2 and said consistency weight information 3 to obtain said reliability information 4.
Thus, the proposed apparatus 1 computes a reliability measure (the reliability information 4) for shift vectors (in particular motion and/or disparity vectors, or in general for vectors describing local (pixel) shifts) between two images. The vector reliability is computed by a combination of a vector consistency check, comparing the vectors from two different estimation directions, and a similarity estimation between two images, in particular a reference input and a warped input, which is compensated depending on the input vectors, resulting from a vector estimation method. The vector reliability is thus a combination (in particular a product) of at least two (preferably several) weighting factors computed from one or more (different) similarity measure(s) and the vector consistency.
As already mentioned, the vector consistency is computed by comparing vectors from different estimation directions. If motion vectors are to be checked for consistency, the estimated motion vectors from time instance t to t-1 can, for example, be compared to the inverse estimated motion vectors from t-1 to t. If disparity vectors are to be checked for consistency, the estimated disparity vectors from left view to right view can be compared to the estimated vectors from right view to left view. For this purpose it is preferred that two inverse estimated vector fields are available as input V1, V2. The consistency weight 3 is computed depending on the difference between the inverse vectors V1, V2.
The adaptive similarity estimation unit 30 is configured to adaptively determine a normalized cross correlation weight factor 2a1 from said first image X and said compensated second image Y using said flatness information 5b. The normalized cross correlation obtained by a normalized cross correlation unit 31 is a reliable similarity measure in texture areas, therefore the normalized cross correlation NCC weight is computed in a NCC weight computation unit 32 depending on the image area the observed pixel is located in. In flat areas it is preferably set to 1, so that it does not affect the final reliability value.
Further, in this embodiment said adaptive similarity estimation unit 30 is configured to adaptively determine a summed absolute difference weight factor 2a2 from said first image X and said compensated second image Y using said flatness information 5b and said contrast information 5a. Particularly in flat areas the weighted SAD obtained by a weighted SAD unit 33 is used for similarity estimation. In a SAD weight computation unit 34 the SAD weight is determined, whereby the SAD weight is set to 1 in texture areas. To be able to distinguish whether the current pixel is located in a flat or textured region, the flatness information 5b is used.
The normalized cross correlation weight factor 2a1 and the summed absolute difference weight factor 2a2 are finally multiplied by a multiplication unit 35 to obtain the first similarity information 2a.
The non-adaptive similarity estimation unit 70 is configured to determine a structural similarity (SSIM) weight factor 2b2 from said first image X and said compensated second image Y. Thus, as a further similarity measure the SSIM, which is a combination of luminance, contrast and structure comparison, is determined in a SSIM determination unit 73. Based on this measure the SSIM weight factor 2b2 is computed as a further weighting factor in an SSIM weight computation unit 74. These three measures are preferably all computed in local block areas, therefore they describe an average over a set of pixels.
Because of this averaging, strong single pixel differences might not be sufficiently regarded. Therefore the non-adaptive similarity estimation unit 70 is further configured to determine a luminance difference weight factor 2b1 from said first image X and said compensated second image Y using said contrast information 5a. Thus, the single pixel luminance difference is computed in a luminance difference determination unit 71 and the luminance difference weight factor 2b1 is computed as a further weighting factor based on this value in a luminance difference weight computation unit 72. The SAD weight and the luminance difference weight are computed depending on the local contrast, as SAD and luminance difference strongly depend on this value.
The luminance difference weight factor 2b1 and the SSIM weight factor 2b2 are finally multiplied by a multiplication unit 75 to obtain the second similarity information 2b.
The vector consistency check device 50 is configured to compare the difference between said shift vectors V1, V2 to a shift threshold in a vector consistency check unit 52 and to set said consistency weight information 3 to a first or a second value depending on the result of said comparison in a consistency weight computation unit 53.
The final vector reliability 4 is computed as a product of the five described weighting factors. Preferably, as output a vector reliability map is computed by multiplying a map which is initially set to 1 with the locally computed factors.
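The map-based combination described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `combine_reliability` and the representation of each weight map as a list of rows are hypothetical and not taken from the disclosure.

```python
def combine_reliability(weight_maps):
    """Multiply per-pixel weight maps (lists of rows) into one reliability map."""
    height, width = len(weight_maps[0]), len(weight_maps[0][0])
    # The reliability map is initially set to 1 ...
    reliability = [[1.0] * width for _ in range(height)]
    # ... and then multiplied with each locally computed factor.
    for weights in weight_maps:
        for y in range(height):
            for x in range(width):
                reliability[y][x] *= weights[y][x]
    return reliability
```

Because the factors are multiplied, a single factor of 0 (for example a failed consistency check) marks the vector as unreliable regardless of the other factors.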
Exemplary embodiments of the computation of the different weighting factors and the image analysis methods are described in the following.
The image compensation unit 20 compensates the local shifts between two images X, Z, for example between the temporal instances t and t-1 or between left and right view. These shifts are described by shift vectors V1=(vx, vy) for each pixel. The motion compensation is realized using the following equation:
Y(x,y)=Z(x+vx,y+vy) (1)
The shift vectors can be sub-pixel accurate, in this case for image compensation these sub-pixel positions have to be interpolated. A possible solution is the utilization of a bilinear interpolation. The luminance values of the compensated image are computed as follows:
If the accessed image position of the previous result is out of range, the luminance value of the reference input X is copied.
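The compensation step can be sketched as below. The interpolation equation itself is not reproduced in this text, so the sketch assumes standard bilinear interpolation; the function name `compensate`, the row-major list layout and the separate `vx`/`vy` fields are illustrative assumptions.

```python
import math

def compensate(X, Z, vx, vy):
    """Return Y with Y(x, y) = Z(x + vx, y + vy), using bilinear interpolation."""
    height, width = len(X), len(X[0])
    Y = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            px, py = x + vx[y][x], y + vy[y][x]
            if not (0 <= px <= width - 1 and 0 <= py <= height - 1):
                # Accessed position is out of range: copy the reference input X.
                Y[y][x] = float(X[y][x])
                continue
            x0, y0 = int(math.floor(px)), int(math.floor(py))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            ax, ay = px - x0, py - y0  # fractional (sub-pixel) parts
            top = (1 - ax) * Z[y0][x0] + ax * Z[y0][x1]
            bottom = (1 - ax) * Z[y1][x0] + ax * Z[y1][x1]
            Y[y][x] = (1 - ay) * top + ay * bottom
    return Y
```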
The local contrast 5a is computed inside a local block area, e.g. a 3×3 block area, around the currently processed pixel value. Inside this area the minimum and maximum value are detected. The local contrast 5a is defined as difference between minimum and maximum value inside the local block area.
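A minimal sketch of this min/max computation follows. Clipping the block at the image border is an assumption, as the text does not specify border handling; the function name `local_contrast` is hypothetical.

```python
def local_contrast(image, x, y, radius=1):
    """Max minus min inside the (2*radius+1)^2 block around (x, y)."""
    height, width = len(image), len(image[0])
    values = [image[j][i]
              for j in range(max(0, y - radius), min(height, y + radius + 1))
              for i in range(max(0, x - radius), min(width, x + radius + 1))]
    # Assumed border handling: the block is clipped to the image area.
    return max(values) - min(values)
```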
As already mentioned, the flat area detection is based on the local gradient variance. In a first step the absolute gradient is computed for the whole image. The gradients in x- and y-directions are computed by simple difference operators.
gradX(x,y)=X(x,y)−X(x−1,y)
gradY(x,y)=X(x,y)−X(x,y−1) (3)
Then the absolute gradient is computed by the following operation:
grad=√(gradX²+gradY²) (4)
The gradient variance is computed inside a local block area C, e.g. a 5×5 block area:
Finally the flat area 5b is detected using a binary decision based on a gradient variance threshold.
flat area: gradVar≦Threshold
texture area: gradVar>Threshold
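The flat area detection can be sketched as below. The gradient variance equation is not reproduced in this text, so the sketch assumes the usual population variance of the absolute gradient inside the block; setting the gradient to zero at the first row and column and clipping the block at the border are further assumptions, and the function names are hypothetical.

```python
import math

def gradient_magnitude(image):
    """Absolute gradient from simple backward differences in x and y."""
    height, width = len(image), len(image[0])
    grad = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gx = image[y][x] - image[y][x - 1] if x > 0 else 0.0  # assumed border value
            gy = image[y][x] - image[y - 1][x] if y > 0 else 0.0  # assumed border value
            grad[y][x] = math.sqrt(gx * gx + gy * gy)
    return grad

def is_flat(grad, x, y, threshold, radius=2):
    """Binary flat/texture decision from the local gradient variance."""
    height, width = len(grad), len(grad[0])
    block = [grad[j][i]
             for j in range(max(0, y - radius), min(height, y + radius + 1))
             for i in range(max(0, x - radius), min(width, x + radius + 1))]
    mean = sum(block) / len(block)
    variance = sum((g - mean) ** 2 for g in block) / len(block)
    return variance <= threshold
```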
The normalized cross correlation (NCC) is computed for each pixel inside a local block area C, e.g. in a 5×5 block area, around the currently processed image position (x, y) using the following equation
The NCC weighting factor 2a1 is computed for each pixel based on two thresholds Thr1≧Thr2 using the following equation:
In case Thr1 equals Thr2 a binary weighting factor is realized, for offering a hard reliability decision. In flat areas the NCC weight 2a1 is also set to 1, as in flat areas the normalized cross correlation is an unreliable similarity measure.
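The NCC and weight equations are not reproduced in this text, so the following is only a sketch under assumptions: it uses the common zero-mean form of the normalized cross correlation and a linear transition between Thr2 and Thr1, which degenerates to a binary decision when both thresholds are equal. The function names and the clipped block at the image border are hypothetical.

```python
import math

def block_values(image, x, y, radius):
    """Pixel values of the block around (x, y), clipped to the image."""
    height, width = len(image), len(image[0])
    return [image[j][i]
            for j in range(max(0, y - radius), min(height, y + radius + 1))
            for i in range(max(0, x - radius), min(width, x + radius + 1))]

def ncc(X, Y, x, y, radius=2):
    """Zero-mean normalized cross correlation (assumed form) in a local block."""
    a, b = block_values(X, x, y, radius), block_values(Y, x, y, radius)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0  # no texture: correlation undefined

def ncc_weight(ncc_value, thr1, thr2, flat):
    """Weight in [0, 1]; assumed linear ramp between thr2 and thr1 (thr1 >= thr2)."""
    if flat:
        return 1.0  # NCC is unreliable in flat areas, so it must not affect the result
    if ncc_value >= thr1:
        return 1.0
    if ncc_value <= thr2:
        return 0.0
    return (ncc_value - thr2) / (thr1 - thr2)
```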
The weighted SAD is computed for each pixel inside a local block area C, e.g. a 3×3 block area, around the currently processed image position (x, y) using the following equation
Exemplary (already normalized) weights are
The SAD weighting factor 2a2 is computed for each pixel based on two thresholds Thr1≧Thr2 using the following equation:
In case Thr1 equals Thr2 a binary weighting factor is realized, for offering a hard reliability decision. In texture areas the SAD weight 2a2 is set to 1, as in texture areas the weighted SAD is an unreliable similarity measure. Thr1 and Thr2 are selected depending on the local contrast, for example Thr1=1.2·localContrast and Thr2=0.7·localContrast.
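The weighted SAD and its weighting factor can be sketched as follows. The SAD equation, the exemplary weights and the weight equation are not reproduced in this text, so the 3×3 kernel `KERNEL` below is a hypothetical normalized kernel, the linear transition between Thr2 and Thr1 is an assumption, and clamping coordinates at the border is an assumed border handling. Only the threshold factors 1.2 and 0.7 are taken from the text.

```python
# Hypothetical normalized 3x3 weights (they sum to 1); not the values of the disclosure.
KERNEL = [[1 / 16, 2 / 16, 1 / 16],
          [2 / 16, 4 / 16, 2 / 16],
          [1 / 16, 2 / 16, 1 / 16]]

def weighted_sad(X, Y, x, y, kernel=KERNEL):
    """Weighted sum of absolute differences in the block around (x, y)."""
    height, width = len(X), len(X[0])
    radius = len(kernel) // 2
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            j = min(max(y + dy, 0), height - 1)  # assumed: clamp at the border
            i = min(max(x + dx, 0), width - 1)
            total += kernel[dy + radius][dx + radius] * abs(X[j][i] - Y[j][i])
    return total

def sad_weight(sad, contrast, flat):
    """Weight in [0, 1]; thresholds scale with the local contrast."""
    if not flat:
        return 1.0  # in texture areas the SAD weight is set to 1
    thr1, thr2 = 1.2 * contrast, 0.7 * contrast
    if sad <= thr2:
        return 1.0
    if sad >= thr1:
        return 0.0
    return (thr1 - sad) / (thr1 - thr2)  # assumed linear transition
```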
The SSIM is computed for each pixel inside a local block area C, e.g. a 5×5 block area, around the currently processed image position (x, y) using the following equation
The SSIM weighting factor 2b2 is computed for each pixel based on two thresholds Thr1≧Thr2 using the following equation:
In case Thr1 equals Thr2 a binary weighting factor is realized for offering a hard reliability decision.
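Since the SSIM equation is not reproduced in this text, the sketch below assumes the widely used single-scale SSIM formulation with stabilizing constants C1=(0.01·255)² and C2=(0.03·255)² for 8-bit luminance. These constants, the clipped block at the border, the function names and the linear transition between the thresholds are all assumptions.

```python
def ssim(X, Y, x, y, radius=2, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Structural similarity (assumed standard form) in the block around (x, y)."""
    height, width = len(X), len(X[0])
    coords = [(j, i)
              for j in range(max(0, y - radius), min(height, y + radius + 1))
              for i in range(max(0, x - radius), min(width, x + radius + 1))]
    a = [X[j][i] for j, i in coords]
    b = [Y[j][i] for j, i in coords]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    var_a = sum((p - ma) ** 2 for p in a) / n
    var_b = sum((q - mb) ** 2 for q in b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / (
        (ma * ma + mb * mb + c1) * (var_a + var_b + c2))

def ssim_weight(value, thr1, thr2):
    """Weight in [0, 1]; assumed linear ramp between thr2 and thr1 (thr1 >= thr2)."""
    if value >= thr1:
        return 1.0
    if value <= thr2:
        return 0.0
    return (value - thr2) / (thr1 - thr2)
```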
The similarity measures 2a1, 2a2, 2b2 mentioned up to now are all preferably computed inside a local block area, describing an average over a set of pixels. Strong differences between X and Y that are spatially limited to one pixel (similar to salt and pepper noise) might not be sufficiently regarded. Therefore a further weighting factor 2b1 is computed depending on the pixel-wise luminance difference which is defined by
lumDiff(x,y)=|X(x,y)−Y(x,y)| (21)
The luminance difference dependent weighting factor is computed for each pixel based on two thresholds Thr1≧Thr2 using the following equation:
In case Thr1 equals Thr2 a binary weighting factor is realized, for offering a hard reliability decision. Thr1 and Thr2 are selected depending on the local contrast and should be higher than the SAD thresholds, for example Thr1=1.6·localContrast and Thr2=1.2·localContrast.
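A sketch of the pixel-wise factor, using equation (21) and the exemplary threshold factors 1.6 and 1.2 from the text. The weight equation is not reproduced here, so the linear transition between Thr2 and Thr1 is an assumption, and the function name is hypothetical.

```python
def luminance_difference_weight(X, Y, x, y, contrast):
    """Weight in [0, 1] from the single-pixel luminance difference."""
    lum_diff = abs(X[y][x] - Y[y][x])  # equation (21)
    thr1, thr2 = 1.6 * contrast, 1.2 * contrast
    if lum_diff <= thr2:
        return 1.0
    if lum_diff >= thr1:
        return 0.0
    return (thr1 - lum_diff) / (thr1 - thr2)  # assumed linear transition
```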
For the vector consistency check shift vectors, e.g. motion vectors, from two inverse estimation directions are compared. If the difference between the two compared vectors is above a defined threshold, the vector is assumed to be unreliable. For vector comparison the vector
is compared to the projected inverse vector
by computing the absolute differences in x and y direction. If one of the differences exceeds a defined threshold the vector consistency weighting factor is set to 0, otherwise it is set to 1.
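The consistency check can be sketched as below. The vector equations are not reproduced in this text, so several details are assumptions: the inverse field V2 is read at the position the forward vector points to (rounded to the nearest pixel), consistent inverse vectors are taken to cancel each other (so their sum is compared to the threshold), and vectors pointing outside the image are treated as unreliable. The function name and the tuple-based field layout are hypothetical.

```python
def consistency_weight(v1, v2, x, y, threshold):
    """Return 1.0 if v1 at (x, y) agrees with the projected inverse vector, else 0.0.

    v1 and v2 are vector fields stored as rows of (vx, vy) tuples.
    """
    height, width = len(v2), len(v2[0])
    vx, vy = v1[y][x]
    # Project to the position the vector points at (nearest pixel, assumed).
    px, py = int(round(x + vx)), int(round(y + vy))
    if not (0 <= px < width and 0 <= py < height):
        return 0.0  # assumed: vectors leaving the image count as unreliable
    ix, iy = v2[py][px]
    # Assumed sign convention: the inverse vector should be the negated vector.
    if abs(vx + ix) > threshold or abs(vy + iy) > threshold:
        return 0.0
    return 1.0
```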
The proposed reliability determination apparatus and method can be used in various constellations and applications. An example of an application is illustrated in
Other examples for application of the proposed method and apparatus are Motion Compensation, Super-Resolution, Temporal Filtering, Synthetic View Generation for Multi-View Displays, Depth Estimation in 3D Sequences, De-Interlacing or Segmentation.
The various elements of the different embodiments of the provided apparatus may be implemented as software and/or hardware, e.g. as separate or combined circuits. A circuit is a structural assemblage of electronic components including conventional circuit elements, integrated circuits including application specific integrated circuits, standard integrated circuits, application specific standard products, and field programmable gate arrays. Further a circuit includes central processing units, graphics processing units, and microprocessors which are programmed or configured according to software code. A circuit does not include pure software, although a circuit does include the above-described hardware executing software.
Obviously, numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
In so far as embodiments of the invention have been described as being implemented, at least in part, by software-controlled data processing apparatus, it will be appreciated that a non-transitory machine-readable medium carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present invention. Further, such software may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
Any reference signs in the claims should not be construed as limiting the scope.
Number | Date | Country | Kind |
---|---|---|---|
12167657.1 | May 2012 | EP | regional |