The present invention relates to the field of gesture based control of electronic devices.
Specifically, the invention relates to computer vision based hand gesture recognition.
The need for more convenient, intuitive and portable input devices increases, as computers and other electronic devices become more prevalent in our everyday life.
Recently, human gesturing, such as hand gesturing, has been suggested as a user interface input tool in which a hand gesture is detected by a camera and is translated into a specific command. Gesture recognition enables humans to interface with machines and interact naturally without any mechanical appliances. The development of alternative computer interfaces (forgoing the traditional keyboard and mouse), video games and remote controlling are only some of the fields that may implement human gesturing techniques.
Currently, gesture recognition requires robust computer vision methods and hardware. Recognition of a hand gesture usually requires identification of an object as a hand and tracking the identified hand to detect a posture or gesture that is being performed.
Known gesture recognizing systems detect a user hand by using color or shape detectors. However, such detectors are currently not reliable since they do not accurately identify a hand in all environments. Also, tracking the hand through some environments is a challenge for these detectors.
Other known detectors may include contour detectors. These known detectors, which utilize edge detecting algorithms, are currently not suited for hand gesture recognition, among other reasons, because edges are difficult to detect during movement of an object.
Thus, there are still many unanswered challenges associated with the accuracy and usefulness of gesture recognition software.
The present invention provides a method and system for accurate computer vision based hand identification.
According to one embodiment there is provided a method for computer vision based hand identification, the method comprising: obtaining an image of an object; detecting in the image at least two different types of shape features of an object; obtaining information of each type of shape feature; combining the information of each type of shape feature to obtain combined information; and determining that the object is a hand based on the combined information.
According to one embodiment there is provided a method for computer vision based hand identification, the method comprising: obtaining an image of an object; applying on the image at least two types of detectors, said detectors configured to detect shape features of an object, to obtain information from the two detectors; combining the information from the two detectors to obtain combined information; and determining that the object is a hand based on the combined information.
According to one embodiment shape features include a shape boundary feature (such as the contour of the object). According to one embodiment shape features include appearance features enclosed within the boundaries of the object.
According to one embodiment combining the information from the two types of features or detectors comprises assigning a weight to the information from each type of feature or detector.
According to one embodiment the method comprises applying a machine learning algorithm to obtain the contour information. According to some embodiments the method includes applying a distance function to obtain the contour information. According to some embodiments the weight assigned to the information from each detector is based on the reliability of each detector. The reliability of each detector may be specific to a frame environment.
In some embodiments the method includes combining the information from the two detectors and a value of a parameter which is unrelated to the detectors, to obtain combined information; and determining that the object is a hand based on the combined information.
According to some embodiments the parameter that is unrelated to the detectors comprises pattern of object movement or a number of frames in which an object is detected. The pattern of object movement may include movement of the object in a predefined pattern.
According to one embodiment the method may include presenting to the machine learning algorithm a hand and a non-hand object. The non-hand object may include a manipulated hand, e.g., a partial view of a hand.
According to one embodiment there is provided a method for computer vision based hand identification, which includes: obtaining images of an object; applying a contour detector to find contour features of the object; comparing the contour features of the object to a contour model of a hand to obtain a vector of comparison grades; applying a machine learning algorithm to obtain a vector of numerical weights; calculating a final grade from the vector of comparison grades and the vector of weights; and if the final grade is above a predetermined threshold identifying the object as a hand.
According to one embodiment the method includes subtracting two consecutive images of the object to obtain a motion image of the object; and applying the contour detector to find contour features in the motion image.
According to another embodiment the method includes applying on the image of the object an algorithm for edge detection to obtain an edge image; and applying the contour detector on the edge image.
According to one embodiment the method includes presenting to the machine learning algorithm at least one hand and at least one non-hand object, which may be a manipulated hand, such as a partial view of a hand.
According to one embodiment comparing the contour features of the object to the contour model of a hand comprises applying a distance function. The method may also include applying on the images another detector in addition to the contour detector to obtain information from two detectors; combining the information from the two detectors to obtain combined information; and determining that the object is a hand based on the combined information. According to one embodiment the other detector is a detector to detect appearance features enclosed within contours of the object.
According to one embodiment of the invention there is provided a system for computer vision based hand identification, the system comprising a detector to detect at least two different types of shape features of an object or at least two types of detectors, said detectors configured to detect shape features of an object from an image of the object, to obtain information of the different types of shape features or from the two detectors; and a processor to combine the information to obtain combined information and to determine that the object is a hand based on the combined information.
The system may include an image sensor to obtain an image of the object, said image sensor in communication with the at least two types of detectors. Further, the system may include a processor to generate a user command based on image analysis of the object determined to be a hand. The user command may be accepted by a device such as a TV, DVD player, PC, mobile phone, camera, STB (Set Top Box) and a streamer.
The invention will now be described in relation to certain examples and embodiments with reference to the following illustrative figures so that it may be more fully understood. In the drawings:
Methods according to embodiments of the invention may be implemented in a user-device interaction system which includes a device to be operated by a user and an image sensor which is in communication with a processor. The image sensor obtains image data (typically of the user) and sends it to the processor to perform image analysis and to generate user commands to the device based on the image analysis, thereby controlling the device based on computer vision.
According to embodiments of the invention the user commands are based on identification and tracking of the user's hand. The processor or image analyzing module of the system includes, according to embodiments of the invention, a detector capable of detecting at least two different types of object feature.
According to one embodiment the image analyzing module comprises at least two types of detectors. Information from the at least two types of detectors is combined and the combined information is used to identify the user's hand in the images obtained by the image sensor. Once a user's hand is identified it can be tracked such that hand gestures may be identified and translated into user operating and control commands.
Detecting more than one type of shape feature or the use of more than one type of detector, according to embodiments of the invention, raises the probability of correct identification of a hand, rendering the system more reliable and user friendly. In one example, where detecting two types of shape features or the use of two detectors may assist in correct identification of a moving object, shape boundary features are detected as well as appearance features. For example, a contour detector is used together with an appearance feature detector.
In this example, in a case where there is a large amount of light in the background (e.g. there is an open window in the background of the user) contour features become very clear but appearance features within the contour shape are less visible. On the other hand, in a background having elements that are similar to hand elements, contour features will be less visible than appearance features within the contour shape. Thus, combining a contour detector and an appearance feature detector raises the probability of correct identification of a hand in all situations that may occur while a user is operating a system.
Methods for computer vision based hand identification according to embodiments of the invention include obtaining an image (or images) of an object in a field of view by an image sensor, such as a standard 2D camera. The image sensor may be associated with a processor and a storage device for storing image data. The storage device may be integrated within the image sensor or may be external to the image sensor. According to some embodiments image data may be stored in the processor, for example in a cache memory. In some embodiments image data of the field of view is sent to the processor for analysis. A user command is generated by the processor, based on the image analysis, and is sent to a device, which may be any electronic device that can accept user commands, e.g., TV, DVD player, PC, mobile phone, camera, STB (Set Top Box), streamer, etc. According to one embodiment the device is an electronic device available with an integrated standard 2D camera. According to other embodiments a camera is an external accessory to the device. According to some embodiments more than one 2D camera is provided to enable obtaining 3D information. According to some embodiments the system includes a 3D camera.
The detectors applied according to some embodiments of the invention are detectors to detect object shape features. Object shape features typically include shape boundary features such as contours (e.g., the outer line of the object) and appearance features, such as features enclosed within the contours of the object. Object features may also include other physical properties of an object.
According to embodiments of the invention the different types of object features are detected in the same set of images. In some embodiments both detectors are applied on the same set of images, rather than a first detector being applied on a set of images and the second detector being applied on the output of the first detector.
For example, a texture detector and edge detector may be used. If both specific texture and specific edges are detected in a set of images then an identification of a hand may be made.
One example of an edge detection method includes the Canny™ algorithm available in computer vision libraries such as Intel™ OpenCV. Texture detectors may use known algorithms such as texture detection algorithms provided by Matlab™.
In another example, a detector using an algorithm for calculating Haar features is applied together with a contour detector. Contour detection may be based on edge detection, typically, of edges that meet some criteria, such as minimal length or certain direction (examples of possibly used contour detectors are described with reference to
Other combinations of types of features or detectors may be used.
In some embodiments more than two detectors may be used.
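By way of non-limiting illustration only, the Haar feature calculation mentioned above may be sketched as follows. The sketch assumes a grayscale image held as a list of rows and a single two-rectangle (left minus right) feature; the function names `integral_image`, `rect_sum` and `haar_two_rect_horizontal` are hypothetical and are not taken from any particular library.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a summed-area table padded with a leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Left half minus right half of the window; w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 2x4 image: bright on the left, dark on the right.
image = [[9, 9, 1, 1],
         [9, 9, 1, 1]]
table = integral_image(image)
feature_value = haar_two_rect_horizontal(table, 0, 0, 4, 2)
```

A large absolute feature value indicates a strong intensity contrast between the two halves of the window, which is the kind of appearance information that may be combined with contour information.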
According to some embodiments two detectors to detect shape features of an object may be assisted by one or more additional detectors which do not detect shape features of objects. For example, a motion detector may be applied in addition to a contour detector and an appearance detector. In this example both motion and specific appearance and contour information must be detected in a specific set of images in order to decide that an object is a hand. According to some embodiments motion information showing movement of an object in a predetermined specific pattern together with shape feature information may be used to determine that the object is a hand.
The weighted information is combined (280) to obtain combined information.
According to one embodiment a method for combining information may include the following calculation:
combined information = (first information × first weight) + (second information × second weight).
Based on the combined information a decision is made (290) either that the object is identified as a hand (222) or that it is not a hand and additional images are then processed.
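By way of non-limiting illustration only, the above calculation and decision may be sketched as follows. The decision threshold of 0.5 and the assumption that each information value is a score between 0 and 1 are hypothetical; they are not specified above.

```python
def combine(first_info: float, first_weight: float,
            second_info: float, second_weight: float) -> float:
    """Weighted sum of the information from two features or detectors."""
    return first_info * first_weight + second_info * second_weight


def is_hand(combined_info: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: a hand is identified when the combined score reaches the threshold."""
    return combined_info >= threshold


# Hypothetical scores: a confident contour detector and a less confident appearance detector.
combined = combine(0.9, 0.7, 0.4, 0.3)
decision = is_hand(combined)
```

If the decision is negative, additional images would be processed in the same way.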
According to one embodiment, the reliability of a detector or of a shape feature may be specific to a frame environment.
Reference is now made to
An image of an object is obtained (310), typically in one or more frames. A first type of feature is detected in the image (320), e.g., by applying a first type of detector on the image (which may include a set of frames), to obtain a first information (340). A second type of feature is detected (330), e.g., by applying a second type of detector on the same set of frames, to obtain a second information (350). Parameters of the frame or set of frames (e.g., motion, illumination) are quantified (360) and each type of feature or detector is assigned a reliability grade based on the frame parameters. For example, a texture detector (or texture features) may receive a low reliability grade in a high motion image since an object's texture may be blurred in such an image, whereas a motion based contour detector (or contour features) may receive a high reliability grade for motion images. An appearance detector (or shape or appearance features) may receive a higher reliability grade than a contour detector (or contour feature) for a highly illuminated frame or set of frames, and so on.
The first information is assigned a weight (370) which is a function of the reliability grade of the first feature or detector for the current frame (or set of frames) and the second information is assigned a second weight (380), which is a function of the reliability grade of the second feature or detector for the current frame (or set of frames).
The weighted information is combined (390). Based on the combined information a decision is made (391) either that the object is identified as a hand (392) or that it is not a hand and additional images are then processed.
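By way of non-limiting illustration only, weights that depend on the frame environment may be sketched as follows. The sketch assumes a single quantified frame parameter (a motion level between 0 and 1), a linear mapping from motion to reliability grade, and weights obtained by normalizing the grades; all of these choices and all names are hypothetical.

```python
from typing import Dict


def reliability_grades(motion: float) -> Dict[str, float]:
    """Assumed mapping: texture is less reliable, motion based contour more reliable, as motion grows."""
    m = min(max(motion, 0.0), 1.0)
    return {"texture": 1.0 - m, "motion_contour": m}


def weights_from_reliability(grades: Dict[str, float]) -> Dict[str, float]:
    """Normalize the reliability grades so that the weights sum to 1."""
    total = sum(grades.values())
    if total == 0:
        return {name: 1.0 / len(grades) for name in grades}
    return {name: grade / total for name, grade in grades.items()}


def combined_information(infos: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over all features or detectors."""
    return sum(infos[name] * weights[name] for name in infos)


# Hypothetical high motion frame: the texture score is poor, the contour score is good.
frame_weights = weights_from_reliability(reliability_grades(0.8))
score = combined_information({"texture": 0.1, "motion_contour": 0.9}, frame_weights)
```

In this hypothetical high motion frame the blurred texture information contributes little to the combined information, while the motion based contour information dominates.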
According to some embodiments the process of identifying an object as a hand is assisted by additional parameters that are unrelated to the reliability of the detectors.
In
The weighted information of the detectors and of the additional unrelated parameter is combined (490). Based on the combined information a decision is made (491) either that the object is identified as a hand (492) or that it is not a hand and additional images are then processed.
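By way of non-limiting illustration only, combining detector information with a parameter that is unrelated to the detectors may be sketched as follows. The sketch assumes that the additional parameter is the number of frames in which the object was detected, converted to a score that saturates after a hypothetical 10 frames; the names and the weights are likewise hypothetical.

```python
from typing import Sequence


def frame_count_score(frames_detected: int, saturation: int = 10) -> float:
    """Assumed scoring: grows linearly with the number of detections, capped at 1.0."""
    return min(frames_detected, saturation) / saturation


def combine_with_parameter(detector_infos: Sequence[float],
                           detector_weights: Sequence[float],
                           parameter_value: float,
                           parameter_weight: float) -> float:
    """Weighted detector information plus the weighted unrelated parameter."""
    detectors_part = sum(i * w for i, w in zip(detector_infos, detector_weights))
    return detectors_part + parameter_value * parameter_weight


# Hypothetical values: two detector scores and an object seen in 5 frames.
combined = combine_with_parameter([0.8, 0.6], [0.4, 0.4], frame_count_score(5), 0.2)
```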
Although the embodiments described above relate to the use of two detectors, the methods described are also applicable for at least two different types of object features even if they are detected by a different number of detectors or by other means.
The method according to this embodiment includes the steps of obtaining images of an object (510); applying a contour detector to find contour features of the object (520); comparing the contour features of the object to a contour model of a hand to obtain a vector of comparison grades (530); applying a machine learning algorithm to obtain a vector of numerical weights (540); calculating a final grade from the vector of comparison grades and the vector of weights (550) and if the final grade is above a predetermined threshold (555) the object is identified as a hand (560), thus a hand is detected. If the final grade is below the predetermined threshold additional images are then processed.
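By way of non-limiting illustration only, steps 550 to 560 may be sketched as follows. The sketch assumes that the final grade is the inner product of the vector of comparison grades and the vector of weights, which is one possible calculation and is not mandated above; the function names and the example values are hypothetical.

```python
from typing import Sequence


def final_grade(comparison_grades: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed calculation: inner product of the comparison grades and the weights."""
    if len(comparison_grades) != len(weights):
        raise ValueError("grades and weights must have the same length")
    return sum(g * w for g, w in zip(comparison_grades, weights))


def identify_hand(comparison_grades: Sequence[float],
                  weights: Sequence[float],
                  threshold: float) -> bool:
    """The object is identified as a hand if the final grade is above the threshold."""
    return final_grade(comparison_grades, weights) > threshold


# Hypothetical grades for three contour features and weights from a learning stage.
grade = final_grade([1.0, 0.0, 1.0], [0.5, 0.3, 0.2])
hand_detected = identify_hand([1.0, 0.0, 1.0], [0.5, 0.3, 0.2], threshold=0.6)
```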
As in the other embodiments described, once a user's hand is identified it can be tracked such that hand gestures may be identified and translated into user operating and control commands.
According to one embodiment both an object and a contour model of a hand can be represented as sets of features, each feature being a set of oriented edge pixels. A contour model of a hand may be created by obtaining features of model hands (a collection of multiple hands used to generate a model of a hand); randomly perturbing the features of the model hands; aligning the features; and selecting the most differentiating features out of the features of the model hands using a machine learning technique (e.g., as described below), for example selecting the 100 most differentiating features out of 1000 features, to generate a contour model of a hand. In addition, a weight and threshold may be calculated for each selected feature using the machine learning technique. The comparison of the object to the contour model (step 530) may be done, for example, by matching edge maps of the object and model (e.g., oriented chamfer matching). The matching may include applying a distance function. For example, a point on the contour of the object from within a region of interest may be compared to a centered model to obtain the distance between the two, and an average distance may be calculated by averaging all the measured distances. If the distance is lower than the threshold calculated for that feature, the weight of that feature is added to the total rank of the matching. If the total rank is above a certain threshold, a hand object is detected (the object is identified as a hand).
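By way of non-limiting illustration only, the matching and ranking described above may be sketched as follows. The sketch makes simplifying assumptions: each model feature is a list of edge points with its own threshold and weight, the distance function is the average Euclidean distance from each model point to its nearest object edge point, and edge orientation is ignored. The class `ModelFeature` and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class ModelFeature:
    points: List[Point]  # edge points of one model feature (assumed non-empty)
    threshold: float     # per-feature distance threshold from the learning stage
    weight: float        # per-feature weight from the learning stage


def average_distance(feature_points: List[Point], object_points: List[Point]) -> float:
    """Average distance from each model point to the nearest object edge point."""
    if not object_points:
        return math.inf
    total = 0.0
    for fx, fy in feature_points:
        total += min(math.hypot(fx - ox, fy - oy) for ox, oy in object_points)
    return total / len(feature_points)


def total_rank(model: List[ModelFeature], object_points: List[Point]) -> float:
    """Add the weight of every feature whose average distance is below its threshold."""
    rank = 0.0
    for feature in model:
        if average_distance(feature.points, object_points) < feature.threshold:
            rank += feature.weight
    return rank


# Hypothetical model with two features; only the first lies close to the object edges.
model = [ModelFeature([(0.0, 0.0), (1.0, 0.0)], threshold=0.5, weight=2.0),
         ModelFeature([(10.0, 10.0)], threshold=0.5, weight=3.0)]
rank = total_rank(model, [(0.0, 0.1), (1.0, 0.1)])
```

The total rank would then be compared to a threshold to decide whether the object is identified as a hand.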
According to one embodiment the machine learning process (step 540) includes receiving two inputs; one input is a training set which includes positive features and the other input is a training set which includes negative features. The learning process may result in a cascade of weak classifiers from which a strong classifier may be obtained. In addition a set of strong classifiers can be obtained from the learning process to compose a fast and robust final strong classifier.
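By way of non-limiting illustration only, obtaining a strong classifier from weak classifiers may be sketched with an AdaBoost-style procedure over threshold classifiers. AdaBoost is named here as one well-known technique chosen for the illustration; the description above does not state which learning algorithm is used, and the sketch builds a single boosted classifier rather than a multi-stage cascade. Labels are assumed to be +1 for positive features and -1 for negative features, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Weak classifier: compares one feature value to a threshold."""
    idx, thr, pol = stump
    return pol if x[idx] >= thr else -pol


def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 3):
    """Return a list of (alpha, stump) pairs forming a strong classifier."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for idx in range(len(X[0])):
            for thr in sorted({x[idx] for x in X}):
                for pol in (1, -1):
                    stump = (idx, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if best is None or err < best[0]:
                        best = (err, stump)
        err, stump = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def strong_classify(ensemble, x: List[float]) -> int:
    """Sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-dimensional training set: positive features score high.
X_train = [[0.9], [0.8], [0.2], [0.1]]
y_train = [1, 1, -1, -1]
classifier = train_adaboost(X_train, y_train)
```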
According to one embodiment the positive features presented to the machine learning algorithm (used for example in step (540)) are hand features and the negative features are non-hand objects. Non-hand objects may include any object, shape or pattern that is not a hand, as is typical in machine learning algorithms. However, according to one embodiment a manipulated hand is also presented to the learning algorithm as a non-hand object. A manipulated hand may include views of a hand but not a regular, full open hand. For example, a manipulated hand may include a fist or a hand with only a few fingers folded. A manipulated hand may also include partial views of a hand (e.g., the base of a hand with only some of the fingers, a view of a longitudinal half of a hand, only the base of the hand, only the fingers, only some fingers, and so on). Since there are many objects in a user's environment having a contour that resembles a contour of a hand and a contour of partial views of a hand, teaching the system to treat these “partial hand objects” as “non-hand” objects may greatly contribute to the accuracy and reliability of the system.
Additionally, different hand postures or gestures may be detected by different detectors. Thus, each detector may be presented, as a positive feature, a hand in the specific posture of that detector, whereas hands in other postures are presented as negative features to that detector.
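By way of non-limiting illustration only, the labeling of training samples for a posture-specific detector may be sketched as follows. The sample identifiers, the kind names ("open_hand", "fist", "partial_hand", "background") and the function `labels_for_detector` are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (sample identifier, kind of object shown)


def labels_for_detector(samples: List[Sample], target_posture: str) -> List[Tuple[str, int]]:
    """Label +1 only for a hand in the detector's own posture; everything else is -1."""
    return [(sample_id, 1 if kind == target_posture else -1)
            for sample_id, kind in samples]


# Hypothetical training samples, including manipulated hands and a non-hand pattern.
training_samples = [("img_001", "open_hand"),
                    ("img_002", "fist"),
                    ("img_003", "partial_hand"),
                    ("img_004", "background")]
open_hand_labels = labels_for_detector(training_samples, "open_hand")
fist_labels = labels_for_detector(training_samples, "fist")
```

In this sketch a fist is a negative feature for the open hand detector and a positive feature for the fist detector, while partial views of a hand are negative features for both.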
According to one embodiment, which is schematically illustrated in
According to other embodiments the contour detector may be applied on a single image. For example, an edge detector may be used (typically on a single frame) to obtain an edge image of an object and the contour detector may then be applied on the edge image to obtain contour features of the object.
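By way of non-limiting illustration only, the two kinds of input on which a contour detector may be applied (a motion image obtained by subtracting two consecutive images, and an edge image obtained from a single image) may be sketched as follows. The sketch assumes grayscale images held as lists of rows, hypothetical thresholds, and a simple gradient test standing in for a full edge detection algorithm.

```python
from typing import List

Image = List[List[int]]


def motion_image(previous: Image, current: Image, threshold: int = 20) -> Image:
    """Mark pixels whose intensity changed by more than the threshold between two frames."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(previous, current)]


def edge_image(img: Image, threshold: int = 50) -> Image:
    """Mark pixels with a large intensity gradient (a stand-in for a real edge detector)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


# Hypothetical frames: one pixel brightens between two consecutive frames.
frame_a = [[0, 0, 0], [0, 0, 0]]
frame_b = [[0, 100, 0], [0, 0, 0]]
moved = motion_image(frame_a, frame_b)

# Hypothetical single frame with a vertical dark-to-bright boundary.
single = [[0, 0, 100, 100] for _ in range(3)]
edges = edge_image(single)
```

Either binary image could then be passed to a contour detector to obtain contour features of the object.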
The contour information may be combined with information from another detector (such as with an appearance detector) to identify the object as a hand, as described above.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IL11/00944 | 12/15/2011 | WO | 00 | 6/12/2013 |

| Number | Date | Country |
|---|---|---|
| 61423608 | Dec 2010 | US |