The invention relates to a method and device for motion-compensated processing of image signals.
US-20020093588 A1 discloses a cost-effective film-to-video converter for high definition television. High definition video signals are pre-filtered and down-sampled by a video converter system to standard definition picture sizes. Standard definition motion estimators employed for field rate up-conversion are then utilized to estimate motion vectors for the standard definition pictures. The resulting motion vectors are scaled and post-processed for motion smoothness for use in motion compensated up-conversion of the field rate for the high definition pictures.
It is, inter alia, an object of the invention to provide an improved motion-compensated processing of image signals. The invention is defined by the independent claims. Advantageous embodiments are defined in the dependent claims.
The invention is based on the observation that the finest details, particularly on flat-panel displays, are lost for faster motion. So, in one aspect of the invention, the idea is to use efficient motion-compensated up-conversion operating at a lower spatial resolution, and to add back the uncompensated fine details for slow-moving image parts to the up-scaled result. In this way, motion-compensated processing of HDTV signals is possible at reduced cost in hardware and/or software. A main difference from US-20020093588 is that in that reference, the motion-compensated processing (except for the calculation of the motion vectors) is still carried out on high-definition signals, while it is now proposed to carry out the motion-compensated interpolation on down-converted signals. An embodiment of the invention provides advantageous ways to mix the up-scaled interpolated image with the original in dependence on the speed, so as to keep full resolution/sharpness for stationary images and to use the motion-compensated image for moving images; at high speed the output is dominated by the motion-compensated image.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
The lower resolution image lpf is also up-scaled by an up-scaler sc3 to the original 1080/1920 format, and then subtracted from the input signal to obtain a high-pass filtered signal hpf. So, the combination of down-scaler sc1, upscaler sc3, and the subtracter forms a high-pass filter H. In an alternative embodiment, a genuine high-pass filter is used. The high-pass filtered signal hpf is multiplied by a factor k and then added to the up-scaled motion-compensated signal by means of a multiplier and an adder which together form a combining circuit C. The high-pass filter H and the combining circuit C together form a mixer M that combines (a high-pass filtered version of) the input signal with the upscaled motion-compensated signal from up-scaler sc2.
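The signal flow through the high-pass filter H and the combining circuit C can be illustrated with a minimal one-dimensional sketch. The box-average down-scaler, the sample-repeat up-scaler, the scale factor of 2 and all function names are assumptions chosen for brevity; the description does not specify the scaler implementations.

```python
def downscale(row, f=2):
    """Box-average down-scaler, a stand-in for sc1 (len(row) must be divisible by f)."""
    return [sum(row[i:i + f]) / f for i in range(0, len(row), f)]


def upscale(row, f=2):
    """Sample-repeat up-scaler, a stand-in for sc2 and sc3."""
    return [v for v in row for _ in range(f)]


def mixer(orig, mc_lowres, k, f=2):
    """Mixer M: high-pass filter H followed by combining circuit C."""
    lpf = downscale(orig, f)                               # lower resolution image
    hpf = [o - u for o, u in zip(orig, upscale(lpf, f))]   # input minus up-scaled lpf
    mc_up = upscale(mc_lowres, f)                          # up-scaled motion-compensated signal
    return [m + k * h for m, h in zip(mc_up, hpf)]         # add k times the fine detail


orig = [10.0, 14.0, 20.0, 28.0]
mc_lowres = downscale(orig)  # stationary case: the interpolator returns lpf unchanged
full_detail = mixer(orig, mc_lowres, k=1.0)
low_only = mixer(orig, mc_lowres, k=0.0)
```

In this stationary example, k=1 reproduces the input exactly, while k=0 leaves only the up-scaled low-resolution signal.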
In the embodiments of
Output=k*Orig+(1−k)*NM_result,
where k is a mixing factor, Orig is the original input image, and NM_result is the output of the motion-compensated up-conversion.
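As a per-pixel illustration of this mixing formula, the following sketch blends one original sample with one motion-compensated sample. The function name and the range check on k are assumptions added for the example.

```python
def blend(orig, nm_result, k):
    """Output = k*Orig + (1 - k)*NM_result for a single pixel value."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("mixing factor k must lie in [0, 1]")
    return k * orig + (1.0 - k) * nm_result


stationary = blend(200.0, 100.0, 1.0)   # original passes through unchanged
fast = blend(200.0, 100.0, 0.0)         # motion-compensated result only
slow = blend(200.0, 100.0, 0.25)        # intermediate mix
```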
As regards using the previous and current vector fields, the spatio-temporal consistency is calculated. The largest difference in vector length is determined between the vector of the current block and all other vectors within a spatio-temporal aperture comprising blocks from a current vector field CVF and a previous vector field PVF as shown in
As regards using the current vector length, the length of the motion vector of the current block is calculated. Basic idea: for zero and small motion, mixing is allowed, and for large motion it is not (as it would result in severe artifacts). Result: (10 bits and) full resolution for stationary image parts and (8 bits) lower resolution for moving image parts. In one example, with 8 bits video, k_vectorlength=1 if the vector length is 0, and k_vectorlength=0 if the vector length is 4 or more, with a linear transition between 1 and 0 for vector lengths between 0 and 4.
As regards using the luminance value, the basic idea is not to apply the switching in very dark picture regions, where the switching itself becomes more easily visible, and to rely on the lower-resolution MC image only. In one example, with 8 bits video, k_luma=0 if the pixel value is less than 25, and k_luma=1 if the pixel value is 32 or more, with a linear transition between 0 and 1 for pixel values between 25 and 32.
The final mix factor k is defined by
k=k_inconsistency * k_vectorlength * k_luma.
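The 8-bit example values given above for k_vectorlength and k_luma can be turned into a small sketch. The helper `ramp` and the function names are assumptions; k_inconsistency is taken as an input because the text gives no numerical example for it.

```python
def ramp(x, x0, x1):
    """0 at or below x0, 1 at or above x1, linear in between."""
    return min(1.0, max(0.0, (x - x0) / (x1 - x0)))


def k_vectorlength(length):
    """1 for a zero vector, 0 for length 4 or more, linear in between."""
    return 1.0 - ramp(length, 0.0, 4.0)


def k_luma(pixel):
    """0 below pixel value 25, 1 at 32 or more, linear in between."""
    return ramp(pixel, 25.0, 32.0)


def mix_factor(k_inconsistency, length, pixel):
    """Final mix factor k as the product of the three partial factors."""
    return k_inconsistency * k_vectorlength(length) * k_luma(pixel)
```

For instance, `mix_factor(1.0, 2, 32)` evaluates to 0.5: the vector length sits halfway along its ramp while the other two factors are 1.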
In more detail, the basic concept is defined by:
FOUT(
with 0≦k≦1, spatial coordinate
The low-pass and high-pass pictures obviously have the same spatial resolution, although the temporal interpolation of the low-pass picture can be applied at a much lower resolution followed by a spatial scaler to arrive at the output resolution.
In stationary picture parts, the k factor can be set to 1, and as such the complete frequency spectrum is preserved. There is basically no loss of resolution (unless the interpolator introduces errors). For fast moving image parts, the k value is set to 0, and as such the output has only spectral components in the lower frequency spectrum. The higher spectral components are anyway harder to observe, in particular on an LCD panel. Finally, for slow moving image parts, k is set to an intermediate value, and therefore the output spectrum contains all low and some higher spectral frequency components. If k is set too high, there is a risk of introducing judder, as the high frequency components are not compensated for motion. If k is locally set too low, loss of spatial resolution occurs.
Although there are various means to control this k according to the above description, in a preferred embodiment, the control signal k depends on the consistency of the local motion vectors, the length of the motion vector and the pixel level, i.e.:
k=k_consistency * k_vector * k_pixel
The consistency is determined by the largest difference (‘MVD’) between the motion vector for the current block (blue in the picture below) and the selected neighbors: the spatial neighbors (in green) and the temporal neighbor (in gray). A block of pixels is typically 8 by 8 pixels.
The difference is calculated by the absolute difference of the x components and y components of the motion vectors.
Then k_consistency is determined by:
k_consistency=1−CLIP(β*MVD,0,1)
with CLIP(a,b,c) defined as CLIP(a,b,c)=a if b≦a≦c, CLIP(a,b,c)=b if a<b, and CLIP(a,b,c)=c if a>c. Furthermore, β is a fixed gain/scaling factor.
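The consistency factor can be sketched as follows. Two points are assumptions: the per-neighbor difference is taken as the sum of the absolute x and y differences, and the default β of 0.25 is an arbitrary illustrative value. The function names are likewise hypothetical.

```python
def clip(a, b, c):
    """CLIP(a, b, c): a limited to the range [b, c]."""
    return b if a < b else c if a > c else a


def mvd(current, neighbours):
    """Largest vector difference between the current block and its neighbours.

    Vectors are (x, y) tuples; neighbours must be non-empty.
    """
    cx, cy = current
    return max(abs(cx - nx) + abs(cy - ny) for nx, ny in neighbours)


def k_consistency(current, neighbours, beta=0.25):
    """1 for a perfectly consistent vector field, falling to 0 as MVD grows."""
    return 1.0 - clip(beta * mvd(current, neighbours), 0.0, 1.0)


uniform = k_consistency((2, 0), [(2, 0), (2, 0)])   # identical vectors
mixed = k_consistency((2, 0), [(2, 0), (3, 1)])     # MVD = 2
```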
The dependency on the vector length is defined by:
k_vector=1−CLIP(γ*L,0,1)
with L the vector length (which is the sum of the absolute values of the horizontal and vertical vector components), and γ a programmable gain factor.
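A minimal sketch of the vector-length dependency follows. The default γ of 0.25 is an assumption, chosen so that the factor reaches 0 at a vector length of 4, in line with the 8-bit example given earlier; the function names are hypothetical.

```python
def clip(a, b, c):
    """CLIP(a, b, c): a limited to the range [b, c]."""
    return b if a < b else c if a > c else a


def k_vector(vx, vy, gamma=0.25):
    """1 for a zero vector, decreasing linearly to 0 as the vector length grows."""
    length = abs(vx) + abs(vy)   # L: sum of absolute vector components
    return 1.0 - clip(gamma * length, 0.0, 1.0)
```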
Furthermore, it was found that changes near black are more visible than in other parts of the grey scale. As such, a dependency on the pixel value was added:
kpixel=CLIP(η(F(
with η a gain factor and κ an offset. So for dark pixels this gain factor tends towards zero and for brighter towards one.
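Because the formula above is truncated in this text, the following sketch assumes the form CLIP(η*(F−κ),0,1), with F the pixel value, which matches the stated roles of η as a gain and κ as an offset. The default values of η and κ and the function names are arbitrary illustrative choices.

```python
def clip(a, b, c):
    """CLIP(a, b, c): a limited to the range [b, c]."""
    return b if a < b else c if a > c else a


def k_pixel(pixel, eta=0.125, kappa=24.0):
    """Assumed form CLIP(eta * (F - kappa), 0, 1): 0 for dark pixels, 1 for bright ones."""
    return clip(eta * (pixel - kappa), 0.0, 1.0)


def control_signal(k_consistency, k_vector, k_pixel_value):
    """Control signal k as the product of the three partial factors."""
    return k_consistency * k_vector * k_pixel_value
```

With these assumed defaults, a pixel value of 28 yields 0.5, values of 24 or below yield 0, and values of 32 or above yield 1.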
One embodiment of the invention can be summarized as follows. An apparatus for motion-compensated picture-rate conversion, the apparatus comprising means sc1 to downscale an input image, means ME to estimate motion using the downscaled image, means UPC to interpolate an intermediate downscaled image using the estimated motion and the downscaled image, means sc2 to upscale the interpolated image, and means M to output a combination of the up-scaled intermediate downscaled image and (a (high-pass) filtered version of) the input image. The invention is advantageously used in a display device (e.g. a TV set) comprising a device as shown in
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, the expression “combining signal A with signal B” includes the embodiment that a first signal derived from signal A is combined with a second signal derived from signal B, such as where only a high-frequency part of a signal is used in the combination. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and/or by means of a suitably programmed processor. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB07/53303 | 8/20/2007 | WO | 00 | 6/26/2009 |
Number | Date | Country | |
---|---|---|---|
60822958 | Aug 2006 | US |