This application claims all benefits accruing under 35 U.S.C. §119 from Taiwan Patent Application No. 106105231, filed on Feb. 17, 2017, in the Taiwan Intellectual Property Office, the contents of which are hereby incorporated by reference.
The present disclosure relates to gesture recognition devices and man-machine interaction systems using the same.
Machine learning evolved from the study of pattern recognition and computational learning theory in artificial intelligence. A branch of machine learning, called deep learning, is based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers. Deep learning is composed of multiple linear and non-linear transformations. With the exponential growth of technological advancements, deep learning is used everywhere, including cloud computing, medicine, media, security, and autonomous vehicles.
Aside from artificial intelligence, virtual reality and augmented reality are other areas currently blooming in the technological field. They allow users to interact with items that do not physically exist and are present only within the machine. A common issue that developers face is choosing how the user interacts with the virtual objects. The simplest and most traditional option is to use actual peripherals, such as the gaming controllers used by the HTC Vive and Oculus Rift. Although accurate and precise, using physical controllers deeply deteriorates the immersive experience that virtual reality hopes to achieve.
Alternatively, voice activation commands can be employed, although not without drawbacks. First, to accommodate all languages in the world, one simple command may need to be implemented in at least ten different pronunciations. It is also difficult to accurately interpret spoken words; varying factors such as pitch, accent, and rhythm can all affect the machine's ability to output the correct result. Lastly, any surrounding noise greatly lowers the chance of accurately interpreting the spoken words. The proposed method, virtual/augmented reality hand input recognition through machine learning, allows users to communicate with the machine in both virtual and augmented reality without the need to interact with any physical devices. A conventional man-machine interaction system usually uses an ordinary camera to capture hand images, a first neural network to position the hand, and a second neural network for 2-dimensional (2D) recognition of the hand's motions. However, such a man-machine interaction system is complicated and has poor efficiency because two different neural networks are used.
What is needed, therefore, is a gesture recognition device and a man-machine interaction system that can overcome the problems discussed above.
Many aspects of the exemplary embodiments can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale, the emphasis instead being placed upon clearly illustrating the principles of the exemplary embodiments. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
It will be appreciated that for simplicity and clarity of illustration, where appropriate, reference numerals have been repeated among the different figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the exemplary embodiments described herein. However, it will be understood by those of ordinary skill in the art that the exemplary embodiments described herein can be practiced without these specific details. In other instances, methods, procedures, and components have not been described in detail so as not to obscure the relevant features being described. The drawings are not necessarily to scale, and the proportions of certain parts may be exaggerated to better illustrate details and features. The description is not to be considered as limiting the scope of the exemplary embodiments described herein.
Several definitions that apply throughout this disclosure will now be presented. The terms “connected” and “coupled” are defined as connected, whether directly or indirectly through intervening components, and are not necessarily limited to physical connections. The connection can be such that the objects are permanently connected or releasably connected. The term “outside” refers to a region that is beyond the outermost confines of a physical object. The term “inside” indicates that at least a portion of a region is partially contained within a boundary formed by the object. The term “substantially” is defined as essentially conforming to the particular dimension, shape, or other feature that it modifies, such that the component need not be exact. For example, substantially cylindrical means that the object resembles a cylinder, but can have one or more deviations from a true cylinder. The term “comprising” means “including, but not necessarily limited to”; it specifically indicates open-ended inclusion or membership in the so-described combination, group, series, and the like. It should be noted that references to “an” or “one” exemplary embodiment in this disclosure are not necessarily to the same exemplary embodiment, and such references mean at least one.
In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, written in a programming language, such as, for example, Java, C, or assembly. One or more software instructions in the modules may be embedded in firmware, such as an EPROM. It will be appreciated that modules may include connected logic units, such as gates and flip-flops, and may include programmable units, such as programmable gate arrays or processors. The modules described herein may be implemented as software modules, hardware modules, or a combination thereof, and may be stored in any type of computer-readable medium or other computer storage device.
Reference will now be made to the drawings to describe, in detail, various exemplary embodiments of the present gesture recognition devices and man-machine interaction systems using the same.
Referring to
The intelligent interaction device 12 can be a game engine such as Unity, a virtual reality device, or an augmented reality device. The intelligent interaction device 12 can include image acquisition sensors and sound acquisition sensors.
Different examples of the gesture recognition device 11 are described below.
Referring to
The controlling module 110 controls the operation of the gesture recognition device 11. The capturing module 111 detects the position of a hand to obtain the hand's positional data. The calculating module 112 calculates a distance between two positions of the hand according to the hand's positional data. The recognizing module 113 recognizes the gesture according to the hand's positional data. The communication module 114 communicates with the intelligent interaction device 12. The gesture recognition device 11 can further include a storage module (not shown) for storing data.
The capturing module 111 includes a 3-dimensional (3D) sensor for hand motion capture. The 3D sensor can be an infrared sensor, a laser sensor, or an ultrasonic sensor. In one exemplary embodiment, the 3D sensor is a LEAP MOTION®. The LEAP MOTION® is a hand motion sensing device that is able to capture and output the position of both hands through USB 3.0. The gesture recognition device 11 does not need a special neural network for recognizing the hand position from the image. The gesture recognition device 11 is simple and highly efficient.
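As an illustration only, the capturing module 111 can be thought of as a source of 3D positional samples rather than images. The following sketch uses a hypothetical read_palm_position function as a stand-in for the actual sensor SDK call, which is not part of this disclosure; it shows that each sample is simply an (x, y, z) position, which is why no positioning neural network is needed.

```python
import time
from typing import Iterator, Tuple

Position = Tuple[float, float, float]  # (x, y, z), with z along the depth direction

def read_palm_position() -> Position:
    """Hypothetical stand-in for the 3D sensor SDK call that returns the
    current palm position as (x, y, z)."""
    raise NotImplementedError("replace with the actual sensor SDK call")

def capture_positions(sample_rate_hz: float = 60.0) -> Iterator[Position]:
    """Poll the sensor and yield a stream of 3D palm positions.

    Because the capturing module outputs positions directly, no neural
    network is needed to locate the hand in an image.
    """
    period = 1.0 / sample_rate_hz
    while True:
        yield read_palm_position()
        time.sleep(period)
```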
In one exemplary embodiment, the gesture recognition device 11 further includes a first determining module 115. The first determining module 115 determines whether the gesture of the user is a 2-dimensional (2D) gesture. The recognizing module 113 includes a 2D recognizing module 1132 and a 3D recognizing module 1133. The 2D recognizing module 1132 only recognizes 2D gestures, and the 3D recognizing module 1133 only recognizes 3D gestures. Thus, the gesture recognition device 11 has a high recognition efficiency.
The 2D recognizing module 1132 includes a 2D recognizing neural network specially used for recognizing 2D gestures. The 3D recognizing module 1133 includes a 3D recognizing neural network specially used for recognizing 3D gestures. The 2D recognizing neural network and the 3D recognizing neural network are essentially the same in terms of converting the user inputs into the input layer of the neural networks. The 3D recognizing neural network is used to recognize more complicated gestures and costs more time than the 2D recognizing neural network. For the 2D recognizing neural network, the number of input pixels (hand positions) is width×height, whereas for the 3D recognizing neural network it is width×height×depth. Both the 2D recognizing neural network and the 3D recognizing neural network can be a deep learning network, such as a convolutional neural network or a recurrent neural network. Through proper training with forward and backward propagation, a satisfactory output will be computed by the trained network.
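As a concrete illustration of the input-layer sizing described above, the following sketch rasterizes hand positions into a width×height grid for the 2D recognizing neural network and a width×height×depth grid for the 3D recognizing neural network. The 28-point grid resolution and the use of NumPy are assumptions made only for illustration; the disclosure does not specify how positions are quantized.

```python
import numpy as np

WIDTH, HEIGHT, DEPTH = 28, 28, 28  # assumed grid resolution, not specified in the disclosure

def to_2d_input(points, x_range, y_range):
    """Rasterize (x, y) hand positions into a width*height input vector."""
    grid = np.zeros((WIDTH, HEIGHT), dtype=np.float32)
    for x, y, _z in points:
        i = int((x - x_range[0]) / (x_range[1] - x_range[0]) * (WIDTH - 1))
        j = int((y - y_range[0]) / (y_range[1] - y_range[0]) * (HEIGHT - 1))
        grid[np.clip(i, 0, WIDTH - 1), np.clip(j, 0, HEIGHT - 1)] = 1.0
    return grid.reshape(-1)  # width*height inputs for the 2D network

def to_3d_input(points, x_range, y_range, z_range):
    """Rasterize (x, y, z) hand positions into a width*height*depth input vector."""
    grid = np.zeros((WIDTH, HEIGHT, DEPTH), dtype=np.float32)
    for x, y, z in points:
        i = int((x - x_range[0]) / (x_range[1] - x_range[0]) * (WIDTH - 1))
        j = int((y - y_range[0]) / (y_range[1] - y_range[0]) * (HEIGHT - 1))
        k = int((z - z_range[0]) / (z_range[1] - z_range[0]) * (DEPTH - 1))
        grid[np.clip(i, 0, WIDTH - 1), np.clip(j, 0, HEIGHT - 1), np.clip(k, 0, DEPTH - 1)] = 1.0
    return grid.reshape(-1)  # width*height*depth inputs for the 3D network
```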
Referring to
step S11, obtaining the hand's positional data of a gesture, proceeding to step S12;
step S12, determining whether the gesture is a 2D gesture, if yes, proceeding to step S13, if no, proceeding to step S14;
step S13, recognizing the gesture using the 2D recognizing module 1132, proceeding to step S15;
step S14, recognizing the gesture using the 3D recognizing module 1133, proceeding to step S15; and
step S15, sending the gesture to the intelligent interaction device 12, returning to step S11 (a minimal sketch of this flow is given below).
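A minimal sketch of this flow, using hypothetical method names (get_positional_data, is_2d_gesture, recognize, send) that are not defined in the disclosure, is:

```python
def run_example_1(capturing, first_determining, recognizer_2d, recognizer_3d, communication):
    """Dispatch loop for steps S11-S15. The five arguments are hypothetical
    objects standing in for modules 111, 115, 1132, 1133, and 114."""
    while True:
        points = capturing.get_positional_data()       # step S11
        if first_determining.is_2d_gesture(points):    # step S12
            gesture = recognizer_2d.recognize(points)  # step S13
        else:
            gesture = recognizer_3d.recognize(points)  # step S14
        communication.send(gesture)                    # step S15, then back to S11
```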
Referring to
step S121, calculating a maximum distance of the gesture along a depth direction; and
step S122, determining whether the maximum distance is less than or equal to a distance threshold, if yes, proceeding to step S13, if no, proceeding to step S14.
In step S121, the direction that is perpendicular to the front surface of the 3D sensor is defined as the depth direction, as shown in
In step S122, the distance threshold can be selected according to need or experience. In one exemplary embodiment, the distance threshold can be in a range of about 2 centimeters to about 5 centimeters. When the maximum distance is less than or equal to the distance threshold, the gesture is determined to be a 2D gesture. When the maximum distance is greater than the distance threshold, the gesture is determined to be a 3D gesture.
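Steps S121 and S122 can be sketched as follows, assuming the positional data is a sequence of (x, y, z) samples with z along the depth direction and expressed in centimeters; the 3 cm default threshold is one value within the stated range, chosen only for illustration.

```python
def is_2d_gesture(points, distance_threshold_cm=3.0):
    """Steps S121-S122: a gesture is treated as 2D when its maximum extent
    along the depth direction does not exceed the distance threshold.

    `points` is a sequence of (x, y, z) positions with z along the depth
    direction, assumed here to be in centimeters.
    """
    depths = [z for _x, _y, z in points]
    max_distance = max(depths) - min(depths)        # step S121
    return max_distance <= distance_threshold_cm    # step S122: True -> step S13 (2D), False -> step S14 (3D)
```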
In testing, a 3-layered neural network with 30 hidden neurons was implemented to test MNIST handwritten digit data, achieving an accuracy of up to 95%. Pinch drawing using the Leap Motion in Unity was also successful.
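For reference, a comparable 3-layer network (784 inputs, one hidden layer of 30 neurons, 10 outputs) can be reproduced with scikit-learn as sketched below. This is an illustrative stand-in, not the implementation used in the test above, and the exact accuracy depends on training settings.

```python
# Comparable 3-layer network trained on MNIST handwritten digits.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=30, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))  # accuracy around 0.95 is typical for this size
```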
Referring to
The gesture recognition device 11A of example 2 is similar to the gesture recognition device 11 of example 1 except that the gesture recognition device 11A further includes a second determining module 116. The second determining module 116 determines whether an initiation command or an end command is received.
The initiation command and the end command can be electromagnetic signals from another device, such as the user's mobile phone, received by the communication module 114. The initiation command and the end command can also be gestures performed by the user and recognized by the recognizing module 113. As shown in
Referring to
step S10, determining whether an initiation command is received by the communication module 114, if yes, proceeding to step S11, if no, repeating step S10;
step S11, obtaining the hand's positional data of a gesture, proceeding to step S12;
step S12, determining whether the gesture is a 2D gesture, if yes, proceeding to step S13, if no, proceeding to step S14;
step S13, recognizing the gesture using the 2D recognizing module 1132, proceeding to step S15;
step S14, recognizing the gesture using the 3D recognizing module 1133, proceeding to step S15;
step S15, sending the gesture to the intelligent interaction device 12, proceeding to step S16; and
step S16, determining whether an end command is received by the communication module 114 within a time threshold, if yes, returning to step S10, if no, returning to step S11.
In step S16, the time threshold can be selected according to need or experience. In one exemplary embodiment, the time threshold can be in a range of about 2 seconds to about 5 seconds.
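Step S16 can be sketched as a timed poll of the communication module 114, where poll_command is a hypothetical non-blocking receive call and the 3-second default is one value within the stated range.

```python
import time

def wait_for_end_command(communication, time_threshold_s=3.0):
    """Step S16: poll the communication module for an end command for up to
    `time_threshold_s` seconds. Returns True to return to step S10, or False
    to return to step S11.

    `communication.poll_command()` is a hypothetical non-blocking call that
    returns the most recently received command, or None.
    """
    deadline = time.monotonic() + time_threshold_s
    while time.monotonic() < deadline:
        if communication.poll_command() == "END":
            return True   # end command received: return to step S10
        time.sleep(0.05)  # avoid busy-waiting
    return False          # no end command within the threshold: return to step S11
```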
Referring to
step S10, obtaining the hand's positional data of a first gesture, recognizing the first gesture and determining whether the first gesture is an initiation command, if yes, proceeding to step S11, if no, repeating step S10;
step S11, obtaining the hand's positional data of a second gesture, proceeding to step S12;
step S12, determining whether the second gesture is a 2D gesture, if yes, proceeding to step S13, if no, proceeding to step S14;
step S13, recognizing the second gesture using the 2D recognizing module 1132, proceeding to step S15;
step S14, recognizing the second gesture using the 3D recognizing module 1133, proceeding to step S15;
step S15, determining whether the second gesture is an end command, if yes, returning to step S10, if no, proceeding to step S16; and
step S16, sending the second gesture to the intelligent interaction device 12, returning to step S11.
In step S10, a first standard gesture is defined as the initiation command. When the first standard gesture is a 2D gesture, the first gesture is recognized directly by the 2D recognizing module 1132 and then compared with the first standard gesture by the second determining module 116. When the first standard gesture is a 3D gesture, the first gesture is recognized directly by the 3D recognizing module 1133 and then compared with the first standard gesture by the second determining module 116. When the first gesture is the same as the first standard gesture, the first gesture is determined to be the initiation command.
In step S15, a second standard gesture is defined as the end command, and the second gesture is compared with the second standard gesture by the second determining module 116. When the second gesture is the same as the second standard gesture, the second gesture is determined to be the end command.
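Assuming the recognizing modules return gesture labels, the comparisons performed by the second determining module 116 can be sketched as below; the specific gesture names are hypothetical and are not defined in the disclosure.

```python
# Assumed gesture labels; the disclosure defines the commands as "standard
# gestures" but does not name specific gestures.
FIRST_STANDARD_GESTURE = "open_palm"     # initiation command
SECOND_STANDARD_GESTURE = "closed_fist"  # end command

def is_initiation_command(points, standard_recognizer):
    """Step S10 of the gesture-driven flow: `standard_recognizer` is the 2D
    recognizing module 1132 when the first standard gesture is 2D, or the 3D
    recognizing module 1133 when it is 3D; the first gesture is routed directly
    to that module and its label compared with the first standard gesture."""
    return standard_recognizer.recognize(points) == FIRST_STANDARD_GESTURE

def is_end_command(recognized_label):
    """Step S15: the second gesture is the end command when its recognized
    label equals the second standard gesture."""
    return recognized_label == SECOND_STANDARD_GESTURE
```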
Referring to
The gesture recognition device 11B of example 3 is similar to the gesture recognition device 11A of example 2 except that the gesture recognition device 11B further includes a third determining module 117. The third determining module 117 determines whether a selecting command is received. The selecting command selects one of the 2D recognizing module 1132 and the 3D recognizing module 1133 as a selected recognizing module.
The selecting command can be an electromagnetic signal from another device, such as the user's mobile phone, received by the communication module 114. The selecting command can also be a gesture performed by the user and recognized by the recognizing module 113. As shown in
Referring to
step S20, determining whether an initiation command is received by the communication module 114, if yes, proceeding to step S21, if no, repeating step S20;
step S21, determining whether a selecting command is received by the communication module 114, if yes, proceeding to step S22, if no, repeating step S21;
step S22, selecting one of the 2D recognizing module 1132 and the 3D recognizing module 1133 according to the selecting command as a selected recognizing module, proceeding to step S23;
step S23, obtaining the hand's positional data of a gesture, proceeding to step S24;
step S24, recognizing the gesture using the selected recognizing module, proceeding to step S25;
step S25, sending the gesture to the intelligent interaction device 12, proceeding to step S26; and
step S26, determining whether an end command is received by the communication module 114 within a time threshold, if yes, returning to step S20, if no, returning to step S23.
Referring to
step S20, obtaining the hand's positional data of a first gesture, recognizing the first gesture and determining whether the first gesture is an initiation command, if yes, proceeding to step S21, if no, repeating step S20;
step S21, obtaining the hand's positional data of a second gesture, recognizing the second gesture and determining whether the second gesture is a selecting command, if yes, proceeding to step S22, if no, repeating step S21;
step S22, selecting one of the 2D recognizing module 1132 and the 3D recognizing module 1133 according to the selecting command as a selected recognizing module, proceeding to step S23;
step S23, obtaining the hand's positional data of a third gesture, proceeding to step S24;
step S24, recognizing the third gesture using the selected recognizing module, proceeding to step S25;
step S25, determining whether the third gesture is an end command, if yes, returning to step S20, if no, proceeding to step S26; and
step S26, sending the third gesture to the intelligent interaction device 12, returning to step S23.
In step S21, a third standard gesture is defined as the selecting command, and the second gesture is compared with the third standard gesture by the third determining module 117. When the second gesture is the same as the third standard gesture, the second gesture is determined to be the selecting command.
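The disclosure does not spell out how the selecting command identifies which recognizing module is to be selected; the following sketch assumes two hypothetical gesture labels, one per module, purely for illustration.

```python
THIRD_STANDARD_GESTURE_2D = "swipe_left"   # assumed label: select the 2D recognizing module 1132
THIRD_STANDARD_GESTURE_3D = "swipe_right"  # assumed label: select the 3D recognizing module 1133

def select_recognizer(selecting_label, recognizer_2d, recognizer_3d):
    """Step S22: pick the selected recognizing module from the selecting
    command. The two labels above are hypothetical; the disclosure only states
    that a third standard gesture is defined as the selecting command."""
    if selecting_label == THIRD_STANDARD_GESTURE_2D:
        return recognizer_2d
    if selecting_label == THIRD_STANDARD_GESTURE_3D:
        return recognizer_3d
    return None  # not a selecting command: repeat step S21
```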
Referring to
The gesture recognition device 11C of example 4 is similar to the gesture recognition device 11B of example 3 except that the gesture recognition device 11C further includes a fourth determining module 118. The fourth determining module 118 determines whether a switching command is received. The switching command switches the selected recognizing module between the 2D recognizing module 1132 and the 3D recognizing module 1133.
The switching command can be an electromagnetic signal from another device, such as the user's mobile phone, received by the communication module 114. The switching command can also be a gesture performed by the user and recognized by the recognizing module 113. As shown in
Referring to
step S20, determining whether an initiation command is received by the communication module 114, if yes, proceeding to step S21, if no, repeating step S20;
step S21, determining whether a selecting command is received by the communication module 114, if yes, proceeding to step S22, if no, repeating step S21;
step S22, selecting one of the 2D recognizing module 1132 and the 3D recognizing module 1133 according to the selecting command as a selected recognizing module, proceeding to step S23;
step S23, obtaining the hand's positional data of a gesture, proceeding to step S24;
step S24, recognizing the gesture using the selected recognizing module, proceeding to step S25;
step S25, sending the gesture to the intelligent interaction device 12, proceeding to step S26;
step S26, determining whether an end command is received by the communication module 114 within a first time threshold, if yes, returning to step S20, if no, proceeding to step S27;
step S27, determining whether a switching command is received by the communication module 114 within a second time threshold, if yes, proceeding to step S28, if no, returning to step S23; and
step S28, switching the selected recognizing module between the 2D recognizing module 1132 and the 3D recognizing module 1133, returning to step S23.
In step S26 and step S27, the first time threshold and the second time threshold can be selected according to need or experience. In one exemplary embodiment, the first time threshold is in a range of about 2 seconds to about 5 seconds, and the second time threshold is in a range of about 2 seconds to about 5 seconds.
Referring to
step S20, obtaining the hand's positional data of a first gesture, recognizing the first gesture and determining whether the first gesture is an initiation command, if yes, proceeding to step S21, if no, repeating step S20;
step S21, obtaining the hand's positional data of a second gesture, recognizing the second gesture and determining whether the second gesture is a selecting command, if yes, proceeding to step S22, if no, repeating step S21;
step S22, selecting one of the 2D recognizing module 1132 and the 3D recognizing module 1133 according to the selecting command as a selected recognizing module, proceeding to step S23;
step S23, obtaining the hand's positional data of a third gesture, proceeding to step S24;
step S24, recognizing the third gesture using the selected recognizing module, proceeding to step S25;
step S25, determining whether the third gesture is an end command, if yes, returning to step S20, if no, proceeding to step S26;
step S26, determining whether the third gesture is a switching command, if yes, proceeding to step S27, if no, proceeding to step S28;
step S27, switching the selected recognizing module between the 2D recognizing module 1132 and the 3D recognizing module 1133, returning to step S23; and
step S28, sending the third gesture to the intelligent interaction device 12, returning to step S23.
In step S26, a fourth standard gesture is defined as the switching command. The third gesture is compared with the fourth standard gesture by the fourth determining module 118. When the third gesture is the same as the fourth standard gesture, the third gesture is determined to be the switching command.
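Steps S26 and S27 amount to a label comparison followed by a toggle of the selected recognizing module, as sketched below with a hypothetical gesture label and hypothetical module objects.

```python
FOURTH_STANDARD_GESTURE = "double_tap"  # assumed label for the switching command

def handle_switching(recognized_label, selected, recognizer_2d, recognizer_3d):
    """Steps S26-S27: when the third gesture matches the fourth standard
    gesture, toggle the selected recognizing module and report that a switch
    occurred; otherwise keep the current module so the caller can send the
    gesture on to the intelligent interaction device (step S28)."""
    if recognized_label == FOURTH_STANDARD_GESTURE:
        switched = recognizer_3d if selected is recognizer_2d else recognizer_2d
        return switched, True   # step S27: return to step S23 with the other module
    return selected, False      # not a switching command: proceed to step S28
```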
It is to be understood that the above-described exemplary embodiments are intended to illustrate rather than limit the disclosure. Any element described in accordance with any exemplary embodiment can be used in addition to, or substituted for, elements of other exemplary embodiments. Exemplary embodiments can also be used together. Variations may be made to the exemplary embodiments without departing from the spirit of the disclosure. The above-described exemplary embodiments illustrate the scope of the disclosure but do not restrict the scope of the disclosure.
Depending on the exemplary embodiment, certain of the steps of methods described may be removed, others may be added, and the sequence of steps may be altered. It is also to be understood that the description and the claims drawn to a method may include some indication in reference to certain steps. However, the indication used is only to be viewed for identification purposes and not as a suggestion as to an order for the steps.