Priority Claims/Related Applications
This application is a 371 U.S. national stage filing of (and claims the benefit and priority under 35 USC 119 and 120 to) PCT/IB2004/052142 filed on Oct. 19, 2004, which in turn claims the benefit and priority under 35 USC 119 to European Patent Application Serial No. 03103993.6 filed on Oct. 23, 2003, both of which are incorporated by reference herein in their entirety.
The invention relates to a method of converting a first motion vector field into a second motion vector field by determining a first one of the motion vectors of the second motion vector field, the first motion vector field being computed, on the basis of a first image and a second image of a sequence of images, for a temporal position between the first image and the second image.
The invention further relates to a conversion unit for converting a first motion vector field into a second motion vector field.
The invention further relates to an image processing apparatus comprising:
The invention further relates to a video encoding unit comprising such a conversion unit.
The invention further relates to a computer program product to be loaded by a computer arrangement, comprising instructions to convert a first motion vector field into a second motion vector field.
In the article “True-Motion Estimation with 3-D Recursive Search Block Matching” by G. de Haan et al. in IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, October 1993, pages 368-379, a so-called motion estimation unit is disclosed. This motion estimation unit is designed to estimate motion vectors on the basis of a sequence of input images. These estimated motion vectors can e.g. be used to compute an interpolated output image. A motion vector is related to the translation of a group of pixels of a first image of the sequence to a further group of pixels of a second image of the sequence. Typically the groups of pixels are blocks of pixels of e.g. 8*8 pixels. The set of motion vectors being computed on the basis of a set of input images, or applicable to an output image, is called a motion vector field. The cited motion estimation unit is appropriate for real-time video applications. The recursive approach results in relatively consistent motion vector fields.
Estimation of the motion of relatively small objects that move with a high velocity, relative to the background, appears to be a problem. Particularly, in the case that the objects are smaller than the block size being applied by the motion estimation unit, the motion estimation unit occasionally estimates incorrect motion vectors. This is especially the case when the velocity of the objects is larger than the sample distance of the motion vector grid. As a consequence, relatively small objects sometimes disappear in the motion compensated output image. The motion compensated output image is based on temporal interpolation of a number of input images and the motion vector field.
It is an object of the invention to provide a method of the kind described in the opening paragraph whereby the second motion vector field better represents the motion of relatively small objects, compared to the first motion vector field.
This object of the invention is achieved in that the method comprises:
In general, motion compensation, i.e. temporal interpolation, is performed by systematically running through all the pixels of the output image, fetching pixel values from one or more of the original input images. By doing so, no holes will occur in the output image, since every output pixel is assigned a value. However, by running through every output pixel, there will in general be pixels in the original input images that do not contribute to the motion compensated output image. That means that there are un-referenced pixels in the input images. Typically, un-referenced pixels will occur in occlusion areas, which is a correct and desirable phenomenon. The inventors have observed that un-referenced pixels also occur where the motion estimation unit fails to track relatively small objects with a high velocity. The invention is based on this observation. In two input images un-referenced pixels are searched for and subsequently linked to each other by means of a candidate motion vector. If the candidate motion vector appears to be appropriate, it is assigned in place of the previously estimated motion vector, which is then assumed to be incorrect. The coordinates of the previously estimated motion vector follow from the spatial coordinates of the two groups of un-referenced pixels and the temporal relation between the two input images and the motion vector field.
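By way of illustration, the detection of un-referenced pixels described above may be sketched as follows. This is a minimal sketch, not the claimed implementation; the function name, the block-based grid geometry, the clamping at the image borders and the temporal weighting `alpha` are assumptions introduced for the example only.

```python
import numpy as np

def unreferenced_mask(mvf, img_shape, block=8, alpha=0.5):
    """Return a boolean map of pixels in an input image that are never
    fetched by any motion vector of the motion vector field.

    mvf       -- array of shape (rows, cols, 2): one (dy, dx) vector per block
    img_shape -- (height, width) of the input images
    block     -- block size of the motion vector grid (8*8 in the cited article)
    alpha     -- temporal position of the output image between the two inputs
    """
    referenced = np.zeros(img_shape, dtype=bool)
    h, w = img_shape
    rows, cols = mvf.shape[:2]
    for r in range(rows):
        for c in range(cols):
            dy, dx = mvf[r, c]
            # Each output block fetches its pixels from the input image,
            # displaced backwards along the motion vector.
            y0 = int(round(r * block - alpha * dy))
            x0 = int(round(c * block - alpha * dx))
            # Clamp the fetch window to the image borders.
            y0 = min(max(y0, 0), h - block)
            x0 = min(max(x0, 0), w - block)
            referenced[y0:y0 + block, x0:x0 + block] = True
    # Whatever was never fetched is un-referenced.
    return ~referenced
```

With a zero motion vector field every input pixel is fetched exactly once, so no un-referenced pixels remain; as soon as one vector deviates, the input region it no longer covers becomes un-referenced.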
Establishing the first group of un-referenced pixels and establishing the second group of un-referenced pixels can be performed independently of each other. However, preferably establishing the second group of un-referenced pixels is based on the first group of un-referenced pixels. An advantage of that dependence is increased efficiency. Another advantage is increased robustness.
In an embodiment of the method according to the invention, establishing the second group of un-referenced pixels is based on a spatial environment of the first group of un-referenced pixels and on a particular motion vector which belongs to the first motion vector field and which is located in the spatial environment of the first group of un-referenced pixels. The first group of un-referenced pixels and the second group of un-referenced pixels must be located relatively close to each other. That means that, given the first group of un-referenced pixels, the second group of un-referenced pixels can be found on the basis of the spatial location of the first group of un-referenced pixels and a particular offset. That offset is preferably defined by a motion vector, e.g. taken from the first motion vector field or constructed by taking a particular motion vector from the first motion vector field and adding a predetermined delta to it. The offset might also be zero, i.e. a so-called null motion vector is applied.
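The generation of candidate offsets described above may be sketched as follows; the helper name and the particular delta set are hypothetical illustrations, not part of the claims.

```python
def candidate_offsets(mvf, r, c, deltas=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Candidate spatial offsets linking a group of un-referenced pixels in
    the first image to one in the second: the null vector, the motion vector
    of the first field located in the group's spatial environment, and that
    vector plus a small predetermined delta (the delta set is an assumption).

    mvf  -- first motion vector field as a nested list of (dy, dx) tuples
    r, c -- block coordinates of the motion vector in the spatial environment
    """
    base = tuple(mvf[r][c])
    cands = [(0, 0), base]            # null vector, then the field vector
    for dy, dx in deltas:             # field vector plus predetermined deltas
        cands.append((base[0] + dy, base[1] + dx))
    return cands
```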
In an embodiment of the method according to the invention, establishing the second group of un-referenced pixels is based on computing overlap between the first group of un-referenced pixels and a candidate group of un-referenced pixels in the second image. Overlap is related to the first number of pixels of the first group of un-referenced pixels and the second number of pixels of the second group of un-referenced pixels. Besides that, the overlap can be related to the shape of the first group of un-referenced pixels and the shape of the second group of un-referenced pixels. Computing overlap means counting the number of pixels which are un-referenced in both images, given a candidate motion vector which defines the relation between the first group of un-referenced pixels and the second group of un-referenced pixels. In the case that a relatively high overlap ratio is established, e.g. above 75 percent, the candidate motion vector is assumed to be an appropriate one. Subsequently, the corresponding match error is computed.
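The overlap computation described above may be sketched as follows; a minimal sketch in which the function name and the normalisation by the first group's size are assumptions.

```python
import numpy as np

def overlap_ratio(group_mask1, unref_mask2, offset):
    """Fraction of the first group's pixels that land on un-referenced
    pixels of the second image when displaced by the candidate offset.

    group_mask1 -- boolean map marking the first group of un-referenced pixels
    unref_mask2 -- boolean map of un-referenced pixels in the second image
    offset      -- candidate motion vector (dy, dx)
    """
    dy, dx = offset
    ys, xs = np.nonzero(group_mask1)
    h, w = unref_mask2.shape
    ys2, xs2 = ys + dy, xs + dx
    # Only count displaced pixels that stay inside the second image.
    inside = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    hits = int(unref_mask2[ys2[inside], xs2[inside]].sum())
    return hits / max(len(ys), 1)
```

A ratio above the quoted 75 percent would mark the candidate as appropriate, after which the match error is computed.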
Preferably, the size of the first group of un-referenced pixels is not too small. Hence, the first number of pixels of the first group of un-referenced pixels is preferably above a first predetermined count threshold. Besides that, the size of the first group of un-referenced pixels should not be too big. Hence, the first number of pixels of the first group of un-referenced pixels is preferably below a second predetermined count threshold. The same two conditions are preferably fulfilled for the second group of un-referenced pixels. For a standard-definition video image, typical values are: the first predetermined count threshold equals 4*4 pixels; the second predetermined count threshold equals 10*10 pixels.
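The size conditions above can be sketched as a filter over connected groups of un-referenced pixels. This is an illustrative sketch only: the 4-connectivity, the breadth-first grouping and the default thresholds (the typical values 4*4 and 10*10 quoted above) are assumptions.

```python
from collections import deque
import numpy as np

def groups_within_size(mask, min_count=16, max_count=100):
    """Extract 4-connected groups of un-referenced pixels whose pixel count
    lies between the first and second predetermined count thresholds."""
    seen = np.zeros_like(mask, dtype=bool)
    groups = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first traversal collects one connected group.
            q, comp = deque([(sy, sx)]), []
            seen[sy, sx] = True
            while q:
                y, x = q.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        q.append((ny, nx))
            # Keep only groups that are neither too small nor too big.
            if min_count <= len(comp) <= max_count:
                groups.append(comp)
    return groups
```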
In an embodiment of the method according to the invention, establishing the match error comprises computing differences between respective pixel values of the first and second groups of un-referenced pixels. For example, the match error might be the Sum of Absolute Differences (SAD). This match error is a relatively good measure for establishing a match between image parts and does not require extensive computations.
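The SAD match error mentioned above may be sketched as follows; the function signature and the pixel-coordinate representation of the group are assumptions for the example.

```python
import numpy as np

def sad(img1, img2, group_pixels, offset):
    """Sum of Absolute Differences between the first group of un-referenced
    pixels in the first image and the correspondingly displaced pixels in
    the second image -- the match error used to evaluate a candidate vector.

    group_pixels -- list of (y, x) coordinates of the first group
    offset       -- candidate motion vector (dy, dx)
    """
    dy, dx = offset
    total = 0
    for y, x in group_pixels:
        # Cast to int to avoid wrap-around on unsigned pixel types.
        total += abs(int(img1[y, x]) - int(img2[y + dy, x + dx]))
    return total
```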
It is a further object of the invention to provide a conversion unit of the kind described in the opening paragraph whereby the second motion vector field better represents the motion of relatively small objects, compared to the first motion vector field.
This object of the invention is achieved in that the conversion unit comprises:
It is a further object of the invention to provide an image processing apparatus of the kind described in the opening paragraph whereby the temporally interpolated output images represent small objects having a relatively high velocity relatively well.
This object of the invention is achieved in that the conversion unit comprises:
The image processing apparatus may comprise additional components, e.g. a display device for displaying the output images. The image processing unit might support one or more of the following types of image processing:
The image processing apparatus might e.g. be a TV, a set top box, a VCR (Video Cassette Recorder) player, a satellite tuner, a DVD (Digital Versatile Disk) player or recorder.
It is a further object of the invention to provide a computer program product of the kind described in the opening paragraph whereby the second motion vector field better represents the motion of relatively small objects, compared to the first motion vector field.
This object of the invention is achieved in that the computer arrangement comprises processing means and a memory, the computer program product, after being loaded, providing said processing means with the capability to carry out:
It is a further object of the invention to provide a video encoding unit of the kind described in the opening paragraph having an improved compression rate.
This object of the invention is achieved in that the conversion unit comprises:
Because the eventual motion vector field better represents the actual motion, the video data can be compressed more efficiently. The residue is smaller.
Modifications of the conversion unit and variations thereof may correspond to modifications and variations of the image processing apparatus, the method, the video encoding unit and the computer program product being described.
These and other aspects of the conversion unit, of the image processing apparatus, of the method and of the computer program product, according to the invention will become apparent from and will be elucidated with respect to the implementations and embodiments described hereinafter and with reference to the accompanying drawings, wherein:
Same reference numerals are used to denote similar parts throughout the figures.
A brief inspection of the two images 200, 202 shows that these two images comprise a number of relatively large regions or groups of un-referenced pixels 204-212, 214-222. The shape and size of the different groups of un-referenced pixels 204-212 of the first 200 one of the two images match relatively well with the shape and size of the different groups 214-222 of the second 202 one of the two images. E.g. a first group of un-referenced pixels 204 of the first 200 one of the two images matches relatively well with a second group of un-referenced pixels 214 of the second 202 one of the two images. Also, a third group of un-referenced pixels 210 of the first 200 one of the two images matches relatively well with a fourth group of un-referenced pixels 220 of the second 202 one of the two images. The method and conversion unit 300 according to this invention are based on this observation. The method of conversion comprises finding correlated groups of un-referenced pixels in subsequent images which fulfill a number of conditions, such as having a size that is neither too big nor too small, and being located within a spatial environment of each other. The estimated difference between a first spatial location of a first group of un-referenced pixels 204 and a second spatial location of a second group of un-referenced pixels 214 represents a candidate motion vector linking these two groups of un-referenced pixels 204, 214.
A further observation of the two images of
Conversion means that a number of motion vectors of the first motion vector field MVF1 are updated, i.e. replaced by new motion vectors. Typically, most of the motion vectors of the second motion vector field MVF2 are equal to the respective motion vectors of the first motion vector field MVF1; only a relatively small number of motion vectors of the second motion vector field MVF2 differ from the respective motion vectors of the first motion vector field MVF1. The motion vectors which have been updated correspond to the movement of relatively small objects. Although typically only a small number of motion vectors is updated, the eventual effect in an interpolated output image might be big. It can be the difference between a football being visible in one interpolated output image and invisible in another interpolated output image, or between a soccer player being visible or not.
The conversion unit 300 comprises:
The working of the conversion unit 300 is as follows. The conversion unit 300 is provided with the first motion vector field MVF1 at its first input connector 310. On the basis of the temporal relation between the first motion vector field MVF1 and the first image 100, the first establishing unit 302 is arranged to determine which of the pixels of the first image are un-referenced. Notice that the first motion vector field MVF1 belongs to t=n+a and the first image belongs to t=n. Being un-referenced means that there is no motion vector in the first motion vector field MVF1 which starts or stops at those pixels. A first intermediate result of this determination process is a binary map of pixels being referenced and pixels being un-referenced (see also
Optionally, the searching for un-referenced pixels is controlled by a first investigation of the consistency, i.e. continuity of the first motion vector field MVF1. The probability of finding un-referenced pixels is relatively high in the spatial environment of a discontinuity in the first motion vector field MVF1. Besides that, the match errors of the respective motion vectors of the first motion vector field may be applied to control the searching for un-referenced pixels. The probability of finding un-referenced pixels is relatively high in the spatial environment of a motion vector having a relatively high match error.
In a similar way, the second establishing unit 304 is arranged to determine which of the pixels of the second image are un-referenced. (See also
Having established the first group of un-referenced pixels and a binary map of referenced and un-referenced pixels for the second image 104, the conversion unit 300 starts investigating whether the first group of un-referenced pixels can be matched with a second group of un-referenced pixels in the second image 104. This investigation is based on the spatial location of the first group of un-referenced pixels and a number of spatial offsets. A first one of the spatial offsets equals zero. A second one of the spatial offsets corresponds to a motion vector of the first motion vector field MVF1. A third one of the spatial offsets corresponds to the latter motion vector combined with a delta. For each of the spatial offsets the first group of un-referenced pixels is compared with the binary map of the second image 104. Comparing in this sense means a kind of template matching. In other words, for each spatial offset the overlap between the first group of un-referenced pixels and the “1” values of the binary map of the second image 104 is computed. A spatial offset corresponding to an overlap of more than 75 percent is assumed to be appropriate as a candidate motion vector.
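The template matching over the spatial offsets described above may be sketched as follows. This is an illustrative sketch, not the claimed implementation; the function name and the order in which offsets are tried are assumptions, and the 75 percent threshold is the value quoted above.

```python
import numpy as np

def find_candidate(group_mask1, unref_mask2, offsets, min_ratio=0.75):
    """Try each spatial offset (zero, a vector of the first field, that
    vector combined with a delta) as a template-matching shift of the first
    group onto the binary map of the second image; return the first offset
    whose overlap exceeds the threshold, or None if none qualifies."""
    ys, xs = np.nonzero(group_mask1)
    h, w = unref_mask2.shape
    for dy, dx in offsets:
        ys2, xs2 = ys + dy, xs + dx
        # Count pixels that are un-referenced in both images for this offset.
        inside = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
        hits = int(unref_mask2[ys2[inside], xs2[inside]].sum())
        if hits / max(len(ys), 1) > min_ratio:
            return (dy, dx)
    return None
```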
This candidate motion vector is subsequently evaluated by means of the computing unit 306. The computing unit 306 computes a match error on basis of the spatial offset being provided by the second establishing unit 304, the pixel values of the first input image 100 being provided by means of the third input connector 314 and the pixel values of the second input image 104 being provided by means of the fourth input connector 312. The pixel values can e.g. represent luminance and/or chrominance. Typically, the pixel values of the pixels of the first group and second group of un-referenced pixels are applied.
Then the computed match error and the candidate motion vector are provided to the comparing unit 308. The comparing unit 308 compares the match error with a predetermined match threshold T1, being provided by means of the fifth input connector 322 or derived from external input being provided by the fifth input connector 322. A typical value of the predetermined match threshold T1 equals 48 in the case that the number of luminance levels of the images equals 256 and the size of the first group of un-referenced pixels equals 16 pixels. If the computed match error is below the predetermined match threshold T1, then the candidate motion vector is assigned to the appropriate motion vectors of the second motion vector field MVF2. The coordinates of the appropriate motion vectors are determined on the basis of the temporal position of the second motion vector field MVF2 (n+a), the spatial location of the first group of un-referenced pixels and the candidate motion vector. It will be clear that the size of the first group of un-referenced pixels is related to the number of the appropriate motion vectors being updated.
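The comparison and assignment step described above may be sketched as follows. This is an illustrative sketch only: mapping pixel coordinates to block coordinates by integer division is an assumption, and the default T1=48 is the typical value quoted above.

```python
def update_field(mvf2, group_pixels, candidate, match_error, t1=48, block=8):
    """If the match error is below the predetermined match threshold T1,
    assign the candidate motion vector to the motion vectors of the second
    field whose blocks cover the first group of un-referenced pixels.

    mvf2         -- second motion vector field, nested lists of (dy, dx)
    group_pixels -- (y, x) coordinates of the first group of un-referenced pixels
    """
    if match_error >= t1:
        return mvf2  # candidate rejected: field left unchanged
    for y, x in group_pixels:
        # Block coordinates covering this pixel (assumed 8*8 grid).
        mvf2[y // block][x // block] = candidate
    return mvf2
```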
The first establishing unit 302, the second establishing unit 304, the computing unit 306 and the comparing unit 308 may be implemented using one processor. Normally, these functions are performed under control of a software program product. During execution, the software program product is normally loaded into a memory, like a RAM, and executed from there. The program may be loaded from a background memory, like a ROM, hard disk, or magnetic and/or optical storage, or may be loaded via a network like the Internet. Optionally, an application specific integrated circuit provides the disclosed functionality.
The signal may be a broadcast signal received via an antenna or cable but may also be a signal from a storage device like a VCR (Video Cassette Recorder) or Digital Versatile Disk (DVD). The signal is provided at the input connector 410. The image processing apparatus 400 might e.g. be a TV. Alternatively the image processing apparatus 400 does not comprise the optional display device but provides the output images to an apparatus that does comprise a display device 406. Then the image processing apparatus 400 might be e.g. a set top box, a satellite-tuner, a VCR player, a DVD player or recorder. Optionally the image processing apparatus 400 comprises storage means, like a hard-disk or means for storage on removable media, e.g. optical disks. The image processing apparatus 400 might also be a system being applied by a film-studio or broadcaster.
Alternatively, the conversion unit 300 is applied in a video encoding unit. The conversion unit 300 according to the invention is of particular interest for the computation of B frames in e.g. MPEG encoding.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of elements or steps not listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The usage of the words “first”, “second” and “third”, etcetera, does not indicate any ordering. These words are to be interpreted as names. That means e.g. that the first image may precede or succeed the second image.
Number | Date | Country | Kind |
---|---|---|---|
03103993 | Oct 2003 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2004/052142 | 10/19/2004 | WO | 00 | 4/25/2006 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2005/041586 | 5/6/2005 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5398068 | Liu et al. | Mar 1995 | A |
5477272 | Zhang et al. | Dec 1995 | A |
5619268 | Kobayashi et al. | Apr 1997 | A |
6307888 | Le Clerc | Oct 2001 | B1 |
20010002922 | Hayashi | Jun 2001 | A1 |
20010008545 | Takeda et al. | Jul 2001 | A1 |
Number | Date | Country |
---|---|---|
0395271 | Oct 1990 | EP |
1128678 | Aug 2001 | EP |
2363274 | Dec 2001 | GB |
2000-69487 | Mar 2000 | JP |
1997-2967 | Mar 1997 | KR |
2003-82794 | Oct 2003 | KR |
Number | Date | Country | |
---|---|---|---|
20070081096 A1 | Apr 2007 | US |