The present disclosure relates to a method and system for recognizing an object, and more particularly, to a method and system for recognizing one or more fingers of a user.
A vision-based hand gesture recognition system, as reported in the proceedings of Computer Vision in Human-Computer Interaction, ECCV 2004 Workshop on HCI, May 16, 2004, entitled "Hand Gesture Recognition in Camera-Projector System," authored by Licsár and Szirányi, provides a hand segmentation method for a camera-projector system to achieve an augmented reality tool. Hand localization is based on a background subtraction, which adapts to the changes of the projected background. Hand poses are described by a method based on modified Fourier descriptors, which involves a distance metric for nearest neighbor classification.
U.S. Pat. No. 6,128,003 A discloses a hand gesture recognition system and method that uses skin color to localize a hand in an image. Gesture recognition is performed based on template matching. Models of different hand gestures are built so that in real time, the unknown rotational vector can be compared to all models, the correct hand shape being the one with the smallest distance. To make model search more efficient, these models are arranged in a hierarchical structure according to their similarity to one another.
U.S. Pat. No. 7,599,561 B2 discloses a compact interactive tabletop with a projection-vision system, particularly for front-projected vision-based table systems for virtual reality purposes. The system utilizes an infrared LED illuminant to generate a finger shadow and uses the shadow to detect whether a finger touches the table surface or hovers over it.
"Fast tracking of hands and finger tips in infrared images for augmented desk interface," published in the IEEE International Conference on Automatic Face and Gesture Recognition, March 2000, by Sato, Kobayashi, and Koike, introduced an augmented desk interface system in which a user can use natural hand gestures to simultaneously manipulate both physical objects and electronically projected objects on a desk. An infrared camera is used to detect light emitted from a surface by setting the temperature range to approximate human body temperature (30° C. to 34° C.), so that image regions corresponding to human skin appear particularly bright in the images from the infrared camera.
In accordance with an exemplary embodiment, a method is disclosed for recognizing an object, the method comprising: emitting an array of infrared rays from an infrared emitter towards a projection region, the projection region including a first object; generating a reference infrared image by recording an intensity of ray reflection from the projection region without the first object; generating a target infrared image by recording the intensity of ray reflection from the projection region with the first object; comparing the target infrared image to the reference infrared image to generate a predetermined intensity threshold; and extracting the first object from the target infrared image, if the intensity of ray reflection of the target infrared image of the first object exceeds the predetermined intensity threshold.
In accordance with an exemplary embodiment, a system is disclosed for recognizing an object, the system comprising: an infrared emitter configured to emit an array of infrared rays towards a projection region, the projection region including a first object; an infrared camera for recording an intensity of ray reflection from the projection region without the first object as a reference infrared image and the intensity of ray reflection from the projection region with the first object as a target infrared image; and a processor for: comparing the target infrared image to the reference infrared image to generate a predetermined intensity threshold; and extracting the first object from the target infrared image, if the intensity of ray reflection of the target infrared image of the first object exceeds the predetermined intensity threshold.
In accordance with an exemplary embodiment, a non-transitory computer readable medium containing a computer program having computer readable code embodied to carry out a method for recognizing an object is disclosed, the method comprising: emitting an array of infrared rays from an infrared emitter towards a projection region, the projection region including a first object; generating a reference infrared image by recording an intensity of ray reflection from the projection region without the first object; generating a target infrared image by recording the intensity of ray reflection from the projection region with the first object; comparing the target infrared image to the reference infrared image to generate a predetermined intensity threshold; and extracting the first object from the target infrared image, if the intensity of ray reflection of the target infrared image of the first object exceeds the predetermined intensity threshold.
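By way of non-limiting illustration only, the claimed comparison and extraction steps can be sketched as follows in Python with OpenCV and NumPy (the libraries, and the use of Otsu's method on the difference image to derive the intensity threshold, are assumptions of the sketch, not requirements of the disclosure):

```python
import cv2
import numpy as np

def extract_object(reference_ir: np.ndarray, target_ir: np.ndarray) -> np.ndarray:
    """Extract a bright foreground object (e.g., a hand) from a target IR image.

    Both inputs are single-channel uint8 intensity images of the same size.
    The threshold is generated by comparing the target infrared image to the
    reference infrared image; Otsu's method on the difference image is one
    plausible way (assumed here) to realize the intensity threshold.
    """
    # Ray reflection on the object is stronger than on the background,
    # so the difference image is bright where the object is present.
    diff = cv2.absdiff(target_ir, reference_ir)

    # Derive the threshold from the comparison and keep only pixels whose
    # reflection intensity exceeds it.
    _, mask = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```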
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
In accordance with an exemplary embodiment, a system and method are disclosed having an interactive user interface, which can enable users to actively operate digital contents on the surface onto which the image is projected. For example, the system and method can allow a presenter or speaker to use his/her hands/fingers to directly interact with a projected image, and the system can recognize natural gesture-based commands.
One of the bottlenecks for such a system is how to obtain a clean hand segmentation from a changeable background under variable lighting conditions caused by the various contents projected onto the surface. In addition, previous methods based on color segmentation or background subtraction simply do not perform well in this setting.
In accordance with an exemplary embodiment, a method and system are disclosed for detecting and segmenting the hands from a variable background. With this technique, an interactive system is disclosed with a natural user interface to control and manipulate contents on the projected image on a surface, such as a table, a whiteboard, or a wall.
In accordance with an exemplary embodiment, the RGB camera 110 can be used for acquiring a color image 112 of a projection region 152. The projection region 152 preferably includes at least a portion of the image or content cast by the projector 140 and the presenter or speaker. The IR camera 120 can acquire the invisible infrared rays generated by the IR emitter 130 in the form of an IR image 122. In accordance with an exemplary embodiment, an IR pass filter (not shown) can increase the contrast of the IR image 122. The infrared (IR) emitter 130, for example, can be a laser diffractive optical element (DOE), similar to that used in an array of IR LEDs or in a Kinect device, which is configured to cast a large number of pseudo-randomly arranged rays into an arbitrary projection region, such as the surface of a table, a whiteboard, or a wall 150. In accordance with an exemplary embodiment, for example, the projector 140 can project or cast the contents onto the table, whiteboard, or wall, and the rays can only be observed by the IR camera 120 through the IR pass filter.
In accordance with an exemplary embodiment, the RGB camera 110 and the IR camera 120 can be physically separate units, which can be pre-calibrated to within an acceptable pixel error so that the image coordinates of the IR camera can be precisely mapped to the image coordinates of the RGB camera using a calibration module 160. The calibration module 160 can be configured to determine the relative positions of the RGB camera 110 and the IR camera 120 based on one or more calibration parameters 162.
As shown in
In accordance with an exemplary embodiment, each of the modules 170, 180, 190, 192, 194, 196 preferably includes a computer or processing device 102 having a memory, a processor, an operating system, one or more software applications for executing an algorithm, and a graphical user interface (GUI) or display. It can be appreciated that the modules 170, 180, 190, 192, 194, 196 can be part of a standalone computer, or can be contained within one or more computer or processing devices 102.
In accordance with an exemplary embodiment, the IR emitter (or illuminator) 130 can be configured to emit IR light or an IR dot pattern 300, which can include, for example, a large number of pseudo-randomly arranged rays 310. The pseudo-randomly arranged rays 310 can be emitted in an array of rays forming any shape, such as a circular array and/or a rectangular array.
In accordance with an exemplary embodiment, for example, as shown in
The RGB image 112, the IR image 122, and the reference image 124 are input into a computer processor having one or more software applications and/or algorithms for processing. The one or more software applications and/or algorithms can include, for example, a background subtraction module 410, a connected component analysis module 420, and a quasi-connected component analysis module 430.
In accordance with an exemplary embodiment, the background subtraction module 410 is configured to receive the IR image 122, for example, an IR image with an IR dot pattern, which can be combined with the input from the reference image 124. The reference image 124 can be subjected to an optimal threshold detection 440 to generate an optimal threshold 442. The reference image 124, the optimal threshold 442, and the IR image 122 can then be used to perform the background subtraction 410, the result of which can be directly combined with the IR image to generate a binarized image 446 as disclosed herein. After a connected component analysis process 420, a rough hand segmentation can be extracted from the binarized image 446. With the calibration parameters 450, the rough hand segmentation can be mapped to the RGB image 112 so that an approximate region of the hands in the RGB image 112 can be determined through the module 448. After a quasi-connected component analysis 430 on the approximate hand region in the RGB image 112, an accurate hand blob image 460 can be generated.
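A minimal sketch of the processing chain of modules 410, 420, and 440-450 is given below (Python with OpenCV; realizing the optimal threshold detection 440 with Otsu's method, and modeling the calibration parameters 450 as a 3x3 homography, are assumptions of the sketch):

```python
import cv2
import numpy as np

def segment_hand(ir_image, reference_image, homography):
    """Rough hand segmentation from an IR dot-pattern image (cf. modules 410-450)."""
    # Background subtraction (module 410): suppress the static dot pattern.
    diff = cv2.absdiff(ir_image, reference_image)

    # Optimal threshold detection (440/442), realized here with Otsu's
    # method, to produce the binarized image (446).
    _, binarized = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Connected component analysis (module 420): keep the largest blob
    # as the rough hand segmentation.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binarized)
    if n < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    rough_hand = np.uint8(labels == largest) * 255

    # Map the rough segmentation into RGB image coordinates using the
    # calibration parameters (450), modeled here as a 3x3 homography (448).
    h, w = rough_hand.shape
    return cv2.warpPerspective(rough_hand, homography, (w, h))
```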
In accordance with an exemplary embodiment, the hand segmentation module 170 can be configured to utilize the difference between the intensity of the ray reflection on the hand surface of an individual and the intensity of ray reflection of other parts of the projection region 152 as illustrated in
For example,
In accordance with an exemplary embodiment, to detect and track, for example, a hand using a hand detection module 190, or fingers during human-computer interaction using a finger tip detection module, a hand model 900 is needed. In accordance with an exemplary embodiment, for example, the hand model 900 can include a complex 3D hand model, a model with a histogram of image gradient directions, and/or a skeleton model as shown in
In accordance with an exemplary embodiment, for example, a hand model 900 for detecting finger tips as shown in
In accordance with an exemplary embodiment, the model hand pose 1130 and the hand pose and width 1140 can be used to generate a localized palm by circle fitting in step 1180 as shown in
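A minimal sketch of one possible realization of the circle fitting in step 1180 follows; taking the maximum of the distance transform of the hand blob as the palm center and radius is an assumption of the sketch, not a limitation of the disclosure:

```python
import cv2

def localize_palm(hand_mask):
    """Localize the palm by circle fitting (cf. step 1180).

    The pixel of the hand blob farthest from every contour point is taken
    as the palm center, and its distance value as the radius of the
    inscribed palm circle (an assumed realization).
    """
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
    _, radius, _, center = cv2.minMaxLoc(dist)
    return center, radius  # center is (x, y); radius is in pixels
```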
In accordance with an exemplary embodiment, for example, to find one or more finger tips 1170, hand convexities (candidate tips) 1122 and convexity defects (candidate finger roots) 1124 of the hand are first identified. In step 1126, the convexity points are identified, and in step 1150, the convexity points whose roots have a depth less than a predetermined depth threshold can be removed, so that the tip candidates are identified (step 1190). In step 1168, unlikely finger tips can be removed to extract the finger tips in step 1170.
In accordance with an exemplary embodiment, in step 1128, the convexity points whose corresponding roots have a depth less than a predetermined depth threshold can be removed. In step 1160, the finger with the deepest root can be determined, and other fingers that point in the opposite direction of that finger can be eliminated. In step 1162, the center of gravity of the pixels between the lower boundary and the upper boundary of the palm can be found, which serves as the center of the palm. For example, the radius of the palm can be the distance between the center of gravity and the point of the deepest root. In step 1164, any tip points that lie within a given threshold of the palm region can be eliminated. In step 1166, any finger tips that point in the opposite direction of the deepest root can be eliminated, and in step 1168, the unlikely tips can be removed to extract the finger tips (step 1170).
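The convexity-based tip extraction of steps 1122 through 1170 can be sketched using OpenCV's convex hull and convexity defect routines; the numeric depth threshold is illustrative, and the palm-proximity test reuses the palm circle from the sketch above:

```python
import cv2
import numpy as np

def find_finger_tips(hand_mask, depth_threshold=20.0, palm=None):
    """Find fingertip candidates via hand convexities and convexity defects
    (cf. steps 1122-1170); depth_threshold is an illustrative value."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    contour = max(contours, key=cv2.contourArea)

    hull = cv2.convexHull(contour, returnPoints=False)   # convexities (1122)
    defects = cv2.convexityDefects(contour, hull)        # candidate roots (1124)
    if defects is None:
        return []

    tips = []
    for start, end, _far, depth in defects[:, 0]:
        # Remove candidates whose root depth is below the threshold
        # (cf. steps 1128/1150); OpenCV reports depth in 1/256 pixel units.
        if depth / 256.0 < depth_threshold:
            continue
        for idx in (start, end):        # hull points adjacent to a deep root
            tip = tuple(contour[idx][0])
            if palm is not None:        # eliminate tips inside the palm (1164)
                (cx, cy), radius = palm
                if np.hypot(tip[0] - cx, tip[1] - cy) < radius:
                    continue
            tips.append(tip)
    return tips
```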
In the scenario of a whiteboard application, for example, the camera can capture the user's full hand, and the hand segmentation module as disclosed herein gives a blob image with the full hand. In accordance with an exemplary embodiment, since the hand is convex, the shape of the hand can be approximated by an ellipse or elliptical parameters, which can provide an approximate hand pose.
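A minimal sketch of this elliptical approximation, assuming OpenCV's fitEllipse routine:

```python
import cv2

def approximate_hand_pose(hand_mask):
    """Approximate the hand pose by elliptical parameters."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    # Returns (center (x, y), (axis lengths), rotation angle in degrees).
    return cv2.fitEllipse(contour)
```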
In accordance with an exemplary embodiment, a touch and hover detection module 180 can be used to detect whether a finger tip contacts a touch surface. In accordance with an exemplary embodiment, once a touch incidence of a finger is detected, the finger and its associated hand can be tracked and the movement trajectory will be memorized for touch-based gesture recognition.
For example, in accordance with an exemplary embodiment, the features 1200 around the finger tip 1210, which can provide differentiation between touching a surface (image on left) and hovering over a surface (image on right), are shown in
In accordance with an exemplary embodiment, the touch and hover detection module 180 can use a machine-learning algorithm, for example, Adaboost, for training a classifier to determine touch and hover. In accordance with an exemplary embodiment, pixels can be taken around the finger edges near the tip, and seven Haar-like features of each pixel are extracted. In a classification stage, first, a contact area between a finger and the touch surface can be defined, as shown in
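A hedged sketch of the training and classification stages follows, using scikit-learn's AdaBoostClassifier as a stand-in for the Adaboost training described above; the routine that computes the seven Haar-like feature responses per pixel is assumed to exist elsewhere and is not reproduced here:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# X: one row of seven Haar-like feature responses per edge pixel near a tip;
# y: 1 for pixels from "touch" examples, 0 for pixels from "hover" examples.
def train_touch_classifier(X: np.ndarray, y: np.ndarray) -> AdaBoostClassifier:
    return AdaBoostClassifier(n_estimators=50).fit(X, y)

def is_touch(clf: AdaBoostClassifier, tip_pixel_features: np.ndarray) -> bool:
    """Classify a fingertip as touching when a majority of its edge pixels
    are classified as touch (the majority-vote rule is an assumption)."""
    return clf.predict(tip_pixel_features).mean() > 0.5
```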
As shown in
In accordance with an exemplary embodiment, the hand/finger tracking approach 194 as disclosed herein can fall within the framework of global nearest neighbor (GNN) tracking. The hand track updating process can choose the best observation to associate with each track, the observation being the position of the center of the hand palm. The tracking procedure comprises two major steps, gating and association, as shown in
As shown in
In accordance with an exemplary embodiment, the palm prediction position can be constrained by the predicted positions of all finger tips associated with it, as shown in the hand model depicted in
In accordance with an exemplary embodiment, all measurements that satisfy the gating relationship fall within the gate and are considered for track update. When a single measurement is gated to a single track, an assignment can be made immediately. However, when multiple measurements fall within a single gate, or when a single measurement falls within the gates of more than one track, the Munkres optimal solution can be used to solve the linear assignment problem by minimizing the summed total distance in the following cost matrix:
and dij is the norm of the residual vector relating the prediction and the measurement from Kalman filtering, in which dij has a χ2 distribution for correct observation-to-track pairings with M degrees of freedom and an allowable probability p = 1 − Pd of a valid observation falling outside the gate, where Pd is the probability of correct detection.
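A minimal sketch of the gating and assignment steps, using SciPy's linear-assignment solver and a χ2 gate with M degrees of freedom consistent with the statistics described above (the gate probability and the large out-of-gate cost are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import chi2

def associate(predictions, measurements, S_inv, p_gate=0.99, M=2):
    """Gate measurements to tracks, then solve the linear assignment problem
    over the gated cost matrix (cf. the cost matrix above).

    predictions: (N, M) predicted palm positions; measurements: (K, M)
    observed palm positions; S_inv: inverse innovation covariance from the
    Kalman filter, so that dij**2 is a Mahalanobis distance.
    """
    gate = chi2.ppf(p_gate, df=M)      # chi-square gate with M degrees of freedom
    BIG = 1e9                          # cost assigned to out-of-gate pairings
    cost = np.full((len(predictions), len(measurements)), BIG)
    for i, pred in enumerate(predictions):
        for j, meas in enumerate(measurements):
            r = meas - pred            # residual vector from the filter
            d2 = float(r @ S_inv @ r)  # squared distance dij**2
            if d2 <= gate:             # measurement falls within the gate
                cost[i, j] = d2
    rows, cols = linear_sum_assignment(cost)   # minimize summed total distance
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < BIG]
```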
In accordance with an exemplary embodiment, once the hand segmentation on an IR dot pattern image from the IR camera is obtained, the system 100 needs to know the color information of the corresponding pixels on the RGB image 112 from the RGB camera 110. Thus, a geometrical mapping between the IR camera 120 and the RGB camera 110 is needed. As can be seen from
A and B can be used for projecting the hand segment on the IR image to an approximate region on the RGB image; then, shape matching can be used to align the hand profile on both images.
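By way of illustration only, if the mapping is modeled as a 3x3 homography estimated from pre-calibrated point correspondences (an assumption of this sketch; the disclosure's exact parameterization of A and B is not reproduced here), the projection step can be sketched as:

```python
import cv2
import numpy as np

def calibrate(ir_pts: np.ndarray, rgb_pts: np.ndarray) -> np.ndarray:
    """Estimate the IR-to-RGB mapping from pre-calibrated corresponding
    points of shape (N, 1, 2), here as a 3x3 homography."""
    H, _ = cv2.findHomography(ir_pts, rgb_pts, cv2.RANSAC)
    return H

def map_hand_to_rgb(hand_contour_ir: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Project a hand contour from IR image coordinates to an approximate
    region in RGB image coordinates; shape matching can then refine the
    alignment of the hand profile on both images."""
    pts = hand_contour_ir.astype(np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H)
```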
In accordance with an exemplary embodiment as shown in
In accordance with an exemplary embodiment, touch screen gesture recognition 196 can be used to recognize the gestures of finger movement on the surface. For example, a feature vector including 11 features, i.e., v=[f1 . . . f11], can be used for recognizing touch screen hand gestures.
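The eleven features f1 through f11 are not enumerated at this point, so the sketch below assembles a few illustrative placeholder features (path length, net displacement, mean step length) from a fingertip trajectory merely to show how such a feature vector v could be constructed:

```python
import numpy as np

def gesture_features(trajectory: np.ndarray) -> np.ndarray:
    """Build a feature vector from a fingertip trajectory of shape (T, 2).

    The three features below are hypothetical placeholders; the actual
    eleven features v = [f1 ... f11] are defined by the disclosure.
    """
    steps = np.diff(trajectory, axis=0)
    step_len = np.hypot(steps[:, 0], steps[:, 1])
    f1 = step_len.sum()                               # total path length
    f2 = np.hypot(*(trajectory[-1] - trajectory[0]))  # net displacement
    f3 = step_len.mean() if len(step_len) else 0.0    # mean step length
    return np.array([f1, f2, f3])
```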
In accordance with an exemplary embodiment, a non-transitory computer readable medium containing a computer program having computer readable code embodied to carry out a method for recognizing an object is disclosed, the method comprising: emitting an array of infrared rays from an infrared emitter towards a projection region, the projection region including a first object and one or more second objects; recording an optical image of the first object and the one or more second objects; recording an infrared image of the first object and the one or more second objects, the infrared image including an image intensity of the infrared rays on the first object and the one or more second objects; and determining a location of the first object relative to the one or more second objects based on a difference between the image intensity of the infrared rays and the optical image.
The computer usable medium may be a magnetic recording medium, a magneto-optic recording medium, or any other recording medium to be developed in the future, all of which are considered applicable to the present invention in the same way. Duplicates of such a medium, including primary and secondary duplicate products, are considered equivalent to the above medium. Furthermore, even if an embodiment of the present invention is a combination of software and hardware, it does not deviate from the concept of the invention. The present invention may be implemented such that its software part has been written onto a recording medium in advance and will be read as required in operation.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.