The invention relates to analyzing the motion of real objects in digital image sequences.
In 'augmented reality' applications, in which virtual objects are superimposed on a live video feed, it is desirable for real objects visible in the image to influence the image content. A simple example of such an application is described in the article by V. Paelke, Ch. Reimann and D. Stichling, “Foot-based mobile Interaction with Games”, ACE 2004, Singapore, June 2004, in which a virtual football is intended to be struck by the real foot of the player. This requires a method that determines the motion of the foot from the video image.
One known method for this purpose detects edges in the video image and then analyzes the motion of the extracted edges. In order to be able to determine the edge motion, a first step attempts to approximate the edges by polylines. This also holds for the abovementioned article; see p. 2, left-hand column.
Other methods can be found mainly under two keywords: 'tracking' and 'optical flow'. 'Tracking' also includes techniques which determine the motion of the camera itself and are thus not relevant here.
An overview of the prior art in the field of 'tracking' is provided in the technical report TR-VRVis-2001-025, “State of the Art Report on Optical Tracking” by Miguel Ribo, Vienna 2001. For applications of the type mentioned above, all methods that require specially prepared objects, and all methods in which a model of the object to be followed has to be specified, are excluded. The remaining methods either follow edges or determine the motion by means of complex matrix operations that find the displacement with minimal deviation of the image information.
This also includes the methods described in the article by C.-L. Huang, Y.-R. Choo and P.-C. Chung, “Combining Region-based Differential and Matching Algorithms to Obtain Accurate Motion Vectors for Moving Object in a Video Sequence”, ICDCSW'02, 2002. The Horn-Schunck and Lucas-Kanade methods specified there are known optical flow methods; they determine gradients by forming differentials and require substantial computational power. The same holds for the methods examined in the article by B. Galvin, B. McCane, K. Novins, D. Mason and S. Mills, “Recovering Motion Fields: An Evaluation of Eight Optical Flow Algorithms”, BMVC98, 1998. Most of the methods mentioned have the additional disadvantage that they are sensitive to image interference and require further steps to compensate for it.
Motion analysis of sequential video images is also used in MPEG encoding, where the motion of pixel blocks of a fixed size is determined for compression purposes. In this case it is irrelevant whether this motion corresponds to the motion of image objects; for this reason these methods cannot be used within the scope of 'augmented reality'. By contrast, the methods described in detail in the following are substantially simpler, faster and more robust than the previously known methods. They require neither a model of the object partially or wholly visible in the image nor a vectorization of edges; furthermore, they are relatively insensitive to image noise and other interference that disrupts the edge image in conventional edge recognition.
The invention concerns a method for recognizing the motion of image sections in digital image sequences: after a contour accentuation, the average of the displacement vectors from every pixel to adjacent pixels is determined in a selected section; the average of all these per-pixel averages is then formed and used as the displacement vector of an object visible in the section.
Each individual image of the image sequence is pretreated by known filters before the method described in more detail below is applied. These filters serve to reduce the colors of the image pixels, reduce noise, and accentuate contours or edges. The type and scope of the pretreatment should be chosen depending on the application. For application in a handheld unit such as a mobile telephone with a camera, it has proven advantageous to apply all of the following filters.
Colored source images are first converted to grayscale values (for example by averaging all color channels of each pixel). Optionally, very noisy images can be smoothed by applying a Gaussian filter; this can be done, for example, if a sensor determines that the surroundings are not very bright. Subsequently, an edge image is generated from the grayscale image by contour filters. In practice, the Sobel filter is conventionally used for this purpose. Alternatively, the Prewitt filter, the Laplace filter or comparable filters can be used for generating an edge image.
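In outline, this pretreatment chain might look as follows. This is a minimal sketch assuming numpy and scipy are available; the function name pretreat and the sigma value are chosen here purely for illustration.

```python
import numpy as np
from scipy import ndimage

def pretreat(rgb_image, smooth=False):
    # Reduce color: average all color channels of each pixel -> grayscale.
    gray = rgb_image.astype(float).mean(axis=2)
    # Optionally smooth a very noisy image with a Gaussian filter.
    if smooth:
        gray = ndimage.gaussian_filter(gray, sigma=1.0)
    # Generate an edge image with a contour filter (Sobel here; Prewitt,
    # Laplace or comparable filters would work analogously).
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    return np.hypot(gx, gy)  # gradient magnitude as edge strength
```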
In one instance of the invention, a pure black-and-white image with 1 bit per pixel is used; that is, the brightness values are reduced to one bit, so that each pixel is, in a binary fashion, either white (0 or “no edge”) or black (1 or “edge”). The threshold value for this conversion can either be fixedly predetermined, or it can be determined relative to the mean or median of the grayscale values. In the following, pixels with the value 1 are referred to as edge pixels for simplicity, even though the invention does not vectorize edges but rather determines the motion from the pixel motion without reconstructing edges. Instead of explicitly determining edges, the motion of an image section in two successive images (for example, for implicitly recognizing a collision with a virtual object) is calculated according to the invention in two nested steps which refer only to the pixels of the image. These pixels are preferably the abovementioned edge pixels.
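The reduction to one bit could then be sketched like this. The text leaves the exact threshold open, so the mean-relative factor used here is an assumption for illustration.

```python
import numpy as np

def binarize(edge_image, threshold=None):
    # Fixed, predetermined threshold if given; otherwise a threshold
    # relative to the mean (a median-relative choice would work analogously).
    if threshold is None:
        threshold = 2.0 * edge_image.mean()  # factor chosen for illustration
    # 1 = black ("edge"), 0 = white ("no edge").
    return (edge_image > threshold).astype(np.uint8)
```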
In the following example, a square image section of five by five points is used for simplicity.
In a first variant of the invention, the motion is calculated for each edge pixel in the current image (1′, 2′, 3′ and 4′).
In this example the Moore neighborhood is used for this purpose, i.e. all positions which are directly or diagonally adjacent to the current position, together with the current position itself; in other words, pixels up to a predetermined distance are considered. Edge pixel 1′ has two adjacent edge pixels in the preceding image (1 and 2), so its averaged motion M1′ is the mean of the two displacement vectors from these preceding-image pixels to 1′.
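Written generically, with $p_{1'}$, $p_1$ and $p_2$ denoting the pixel positions, and assuming image coordinates in which y increases downward (so that a negative y-component means upward motion, consistent with the result below):

$$M_{1'} = \frac{1}{2}\Big((p_{1'} - p_1) + (p_{1'} - p_2)\Big)$$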
Correspondingly, averaged motions M2′, M3′ and M4′ are obtained for the other edge pixels in the same way.
In order to calculate the overall motion M of the image section, the average of all the individual motions is determined.
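In the notation introduced above this is

$$M = \frac{1}{4}\left(M_{1'} + M_{2'} + M_{3'} + M_{4'}\right)$$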
It can be seen that a strong upward motion (−0.5) and a very small motion to the right (0.083) were detected.
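Putting the two nested steps together, a minimal sketch of this first variant might look as follows. Here prev_edges and curr_edges are binary edge images as produced above, and the sign convention (from the preceding-image pixel to the current pixel) is the assumption consistent with the example result above.

```python
import numpy as np

def motion_variant1(prev_edges, curr_edges, r=1):
    h, w = curr_edges.shape
    per_pixel_means = []
    # Outer step: every edge pixel of the current image.
    for y, x in np.argwhere(curr_edges):
        vectors = []
        # Inner step: Moore neighborhood of range r, incl. the position itself.
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                py, px = y + dy, x + dx
                if 0 <= py < h and 0 <= px < w and prev_edges[py, px]:
                    # Displacement vector from the preceding-image edge
                    # pixel (px, py) to the current pixel (x, y).
                    vectors.append((x - px, y - py))
        if vectors:
            per_pixel_means.append(np.mean(vectors, axis=0))
    if not per_pixel_means:
        return np.zeros(2)
    # Average of the per-pixel averages = motion of the whole section.
    return np.mean(per_pixel_means, axis=0)
```

Keeping the neighborhood range r small makes the inner loop constant-time per pixel, which is what keeps the method feasible on handheld units.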
An alternative method of calculation uses all points whose pixel value has changed.
Here, only those points are taken into account which lie within a predetermined neighborhood of the respective point; in this example this is two pixels in the x-direction and y-direction, that is to say a Moore neighborhood with a range of 2. For this reason the vectors from the new point 2 to the old point 4 and from the new point 3 to the old point 1 are discarded; they are shown in brackets. Compared to the previous variant, only pixels whose value changed are taken into account, but a larger area is considered.
From the average values of the points, a new average is formed in an analogous manner, and this already represents the result. It, too, correctly indicates an upward motion; the actual displacement in the example is (0, −1).
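Under the same assumptions as before, this alternative calculation might be sketched as follows. Treating pixels where an edge appeared as the 'new' points and pixels where an edge disappeared as the 'old' points is an interpretation of the example.

```python
import numpy as np

def motion_variant2(prev_edges, curr_edges, r=2):
    changed = prev_edges != curr_edges
    new_pts = np.argwhere(changed & (curr_edges == 1))  # edge appeared here
    old_pts = np.argwhere(changed & (prev_edges == 1))  # edge disappeared here
    per_point_means = []
    for ny, nx in new_pts:
        # Only old points within the Moore neighborhood of range r are
        # paired with this new point; all other vectors are discarded.
        vectors = [(nx - ox, ny - oy)  # old point -> new point
                   for oy, ox in old_pts
                   if abs(nx - ox) <= r and abs(ny - oy) <= r]
        if vectors:
            per_point_means.append(np.mean(vectors, axis=0))
    if not per_point_means:
        return np.zeros(2)
    return np.mean(per_point_means, axis=0)  # motion of the section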
In both examples, black-and-white images were used in which the black pixels correspond to edges produced by the filters, and only these black pixels were taken into account. However, the invention is not limited to this. If higher accuracy is required in return for higher computational power, the method can also be applied to grayscale or color images.
In this case, for each pixel in the current image, all pixels of the preceding image which are equivalent to it are determined first. In the case of grayscale images, these are pixels whose grayscale value is the same within a predetermined deviation bound; with 8 bits, i.e. 256 grayscale values, this bound could be 8 grayscale values, for example. Alternatively, the grayscale image can first be quantized by using only 16 of the possible 256 grayscale values, with the other values rounded to these 16, after which exact equality of the pixel values is required. The two methods result in slightly different equivalences because they quantize differently. In the examples illustrated above, a 1-bit quantization was carried out after edge filtering and prior to determining equivalent pixels, and the white pixels remained unused: the image was first quantized, and then only pixels in a predetermined interval were used, in this case only the black pixels. Since the color or grayscale value corresponds to only 1 bit in this case, only equality of pixel values is meaningful.

The invention can be used in an 'augmented reality' application to effect interaction between real and virtual objects with little computational complexity. By way of example, a mobile telephone is used which comprises a camera on the rear side and a screen on the front side; the camera image is reproduced on the screen so that it seems as if the background scene can be seen through the screen. The virtual object is a ball, as in the article mentioned in the introduction. The invention provides a substantially improved method for recognizing the movement of a real foot and its strike toward and onto the virtual ball. By contrast, the known methods described in the abovementioned article could only be used in real time by delegation to a more powerful computer connected over a network.
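Returning to the grayscale extension described above, the two equivalence tests can be sketched as follows; the tolerance of 8 and the 16 quantization levels are the example values from the text.

```python
import numpy as np

def equivalent_within_tolerance(a, b, tol=8):
    # Grayscale values count as equivalent if they differ by at most tol.
    return abs(int(a) - int(b)) <= tol

def quantize_gray(gray, levels=16):
    # Round the 256 possible grayscale values to 16; equivalent pixels are
    # then exactly equal after quantization.
    step = 256 // levels
    return (np.asarray(gray, dtype=np.uint8) // step) * step
```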
Number | Date | Country | Kind
---|---|---|---
10 2006 009 774.2 | Mar 2006 | DE | national

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/EP2007/051847 | 2/27/2007 | WO | 00 | 1/9/2009