1. Field of the Invention
Embodiments of the present invention generally relate to a method and apparatus for a camera projector system that enables an interactive surface.
2. Description of the Related Art
Projection surfaces are widely used in classrooms and meeting rooms. For a surface to be interactive, the surface typically must be engineered with touch sensors. Interactivity includes, for example, touching virtual buttons on the screen, selecting items, and using hands and fingers to paint or to write. Touch sensors on a surface have proven to be costly and prone to calibration and accuracy problems.
Therefore, there is a need for an improved method and/or apparatus for enabling an interactive surface.
Embodiments of the present invention relate to a method and apparatus for enabling an interactive surface. The method includes determining pixels of a depth image that relate to an object touching or in close proximity to a related surface, differentiating between smaller and larger clusters of pixels, determining the smaller cluster of pixels to be a level1 blob and the larger cluster of pixels to be a level2 blob, declaring the level1 blob an object touching the surface, computing the coordinates of the level1 blob, and repeating the process to enable the interactive surface.
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
Herein, an interactive surface is any surface used for presentation, such as a white board, a projection screen, or a black board. In one embodiment, a camera projector system is utilized to enable interactivity on such a surface. The camera projector introduces a depth sensing capability; the depth data is processed to infer interactivity events, such as hands or fingers touching the projection surface, in order to facilitate interactions between the user and the system.
The camera projector system performs depth sensing and analyzes the depth data to determine user actions and support interactivity. Depth sensing relies on geometric triangulation: the system establishes pixel-wise correspondence between two or more views. In the stereo vision approach, for example, the views are the “left” and “right” images coming from two cameras observing the scene. In the structured light approach, one of the cameras is replaced with an active illumination source, such as a laser stripe or a projector. In stereo vision, the pixel-wise correspondence must be established between two images and may not be known ahead of time. In structured light, the correspondence of interest is between an a priori known pattern and an image of it captured through a camera.
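By way of illustration, once pixel-wise correspondence yields a disparity map, triangulation reduces to Z = f·B/d for a rectified pair. The following is a minimal Python sketch of that relationship; the focal length f (in pixels) and baseline B (in metres) are illustrative assumptions, not parameters of any particular embodiment:

```python
import numpy as np

def disparity_to_depth(disparity, f=800.0, B=0.1):
    """Triangulate metric depth from a per-pixel disparity map: Z = f * B / d."""
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0          # zero disparity means no correspondence found
    depth[valid] = (f * B) / disparity[valid]
    return depth
```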
In one embodiment, a projector such as a Digital Light Processing (DLP) projection system may be used to facilitate both stereo vision and structured light. Such systems have a high projection frame rate and the capability to project arbitrary images.
Utilizing high projector frame rates, a highly textured pattern is intermittently projected onto the surface, followed by its negative. In one embodiment, the duration of the projections is short enough that the human eye integrates these two images into a “flat field”. Cameras may be configured and synchronized such that they capture one or more of the patterns. Such patterns may be injected into a presentation, movie, or any subject matter being projected.
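As a simple illustration of the flat-field property (the array sizes and pixel range below are hypothetical), a pattern and its photometric negative sum to a uniform value at every pixel, which is why the eye integrates the pair to a featureless field:

```python
import numpy as np

# Random highly textured pattern and its negative (8-bit range, uint16 to avoid overflow).
pattern = np.random.randint(0, 256, size=(768, 1024), dtype=np.uint16)
negative = 255 - pattern
flat_field = pattern + negative   # every pixel sums to exactly 255
assert np.all(flat_field == 255)
```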
In stereo vision, one may deploy a highly textured projection pattern, which may be invisible to the human eye, to ensure optimal performance of the depth estimation algorithm. The projected texture provides unique visual signatures for the stereo algorithm to match left and right image pixels.
For gesture analysis, the depth images are analyzed for objects around the size of a human hand or fingers that are touching or closely hovering over the projection surface.
Due to illumination conditions and visibility constraints, depth estimates may contain a significant number of spurious measurements. Without a filtering operation, a touch detection system that relies on depth information may produce false alarms, i.e., report touch events even though there is no object near the surface. A dual-threshold approach mitigates these problems by defining two overlapping depth zones above the touch surface and by imposing a number of constraints on allowable detections. For example, when a user touches the surface of interest with a finger or palm, the rest of the user's hand or forearm will also be very close to, but slightly farther from, the surface. This observation reflects the physical characteristics of the human body.
At step 806, the method 800 filters out blobs with less than s1 pixel area size; the remaining blobs are referred to herein as level1 blobs. The method 800 treats the level1 blob(s) as candidate touch point(s) on the surface, which tend to be smaller clusters of pixels, whereas a level2 blob tends to be a larger cluster of pixels.
If a level1 blob is indeed caused by the tip of a hand, finger, pointer, or the like touching the surface, it is likely connected to an object, such as the rest of the hand, forearm, or pointer, which extends slightly farther from the surface. Accordingly, the method 800 determines the pixels whose depth lies within the [d3, d4] depth interval, wherein d1&lt;d3&lt;d2 and d2&lt;d4 (i.e., a depth zone that overlaps with the first one but extends farther from the surface). The method 800 then may apply morphological operations to clean up spurious pixels and to fill in holes. The method 800 then filters out blobs that have less than s2 pixel area size, wherein s1&lt;s2. The remaining blobs are determined to be level2 blobs.
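The dual-zone blob extraction may be sketched as follows in Python with NumPy/SciPy. This is a minimal sketch, assuming `depth_image` holds per-pixel distance from the projection surface; the zone boundaries d1–d4 and the area thresholds s1 and s2 are illustrative values, not those of any particular embodiment:

```python
import numpy as np
from scipy import ndimage

def extract_blobs(depth, lo, hi, min_area):
    """Return boolean masks for connected blobs inside the depth zone [lo, hi]."""
    mask = (depth >= lo) & (depth <= hi)
    # Morphological open/close to drop spurious pixels and fill in holes.
    mask = ndimage.binary_opening(mask)
    mask = ndimage.binary_closing(mask)
    labels, n = ndimage.label(mask)
    return [labels == i for i in range(1, n + 1)
            if np.sum(labels == i) >= min_area]

d1, d2, d3, d4 = 0.0, 0.01, 0.005, 0.05   # metres; satisfies d1 < d3 < d2 < d4
s1, s2 = 20, 200                           # pixel-area thresholds; s1 < s2

level1 = extract_blobs(depth_image, d1, d2, s1)  # candidate touch points
level2 = extract_blobs(depth_image, d3, d4, s2)  # hand/forearm zone behind them
```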
At step 808, the method 800 eliminates level1 blobs that are not connected to a level2 blob, as well as level1 blobs that are larger than their related level2 blob. To find overlapping pixels, the method 800 may apply a logical AND operation between the level1 and level2 binary blob bitmasks.
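Continuing the illustrative masks from the sketch above, this elimination step might look as follows (the function name `associate` is hypothetical):

```python
import numpy as np

def associate(level1, level2):
    """Keep each level1 blob only if it overlaps a level2 blob at least as large."""
    kept = []
    for b1 in level1:
        for b2 in level2:
            # Logical AND of the binary bitmasks finds shared pixels.
            if np.any(b1 & b2) and np.sum(b1) <= np.sum(b2):
                kept.append(b1)
                break
    return kept
```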
At step 810, the method 800 computes a representative touch coordinate for each remaining level1 blob. The method 800 computes the centroid (x, y) of the pixels having the lower 10-percentile of depth values found on that blob and declares a “touch event” at the (x, y) coordinate. The method 800 ends at step 812.
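A minimal sketch of this coordinate computation, under the same assumptions as above (the function name is illustrative):

```python
import numpy as np

def touch_coordinate(blob_mask, depth):
    """Centroid of the pixels in the lower 10-percentile of depth within a blob."""
    ys, xs = np.nonzero(blob_mask)
    d = depth[ys, xs]
    cutoff = np.percentile(d, 10)                 # lower 10-percentile of depth values
    near = d <= cutoff                            # the pixels closest to the surface
    return float(xs[near].mean()), float(ys[near].mean())   # (x, y) touch event
```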
In one embodiment, an intensity-based appearance model of the scene is utilized to infer foreground pixels. If there is indeed a user's hand or arm in the camera's field of view, the intensity model is expected to detect a change in the scene as well. Accordingly, the method 800 may use the intensity images to build and maintain an appearance-based model of the scene. For instance, for each pixel, the method 800 may compute the running mean of pixel intensity values over time. If the current pixel intensity deviates from the modeled value beyond a threshold, the method 800 may label the pixel as “foreground” and then apply morphological operations to clean up this foreground binary image and infer foreground blobs.
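One possible sketch of such a running-mean appearance model follows; the class name, learning rate `alpha`, and deviation `threshold` are assumptions for illustration, not parameters disclosed herein:

```python
import numpy as np
from scipy import ndimage

class AppearanceModel:
    def __init__(self, shape, alpha=0.05, threshold=25.0):
        self.mean = np.zeros(shape, dtype=np.float32)  # per-pixel running mean
        self.alpha = alpha                             # running-mean learning rate
        self.threshold = threshold                     # intensity-deviation threshold

    def update(self, frame):
        """Update the running mean and return a cleaned foreground bitmask."""
        frame = frame.astype(np.float32)
        foreground = np.abs(frame - self.mean) > self.threshold
        self.mean += self.alpha * (frame - self.mean)
        # Morphological opening cleans up spurious foreground pixels.
        return ndimage.binary_opening(foreground)
```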
The method 800 may then re-label level1 blobs that overlap with the same foreground blob, which prevents generating multiple level1 blobs from the same hand or finger. In yet another embodiment, the method 800 may analyze the depth range observed within each level1 blob. If the range is larger than a threshold, the level1 blob is suppressed or eliminated.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims benefit of U.S. provisional patent application Ser. No. 61/431,513, filed Jan. 11, 2011, which is herein incorporated by reference.
| Number | Date | Country |
|---|---|---|
| 61431513 | Jan 2011 | US |