The present application relates to a hand pose construction method, an electronic device, and a non-transitory computer readable storage medium. More particularly, the present application relates to a hand pose construction method, an electronic device, and a non-transitory computer readable storage medium for estimating occluded hand poses.
With the evolution of computerized environments, the use of human-machine interfaces (HMI) has dramatically increased. A growing need is identified for more natural human-machine interaction methods, such as hand pose (or hand gesture) interaction, to replace and/or complement traditional HMIs such as keyboards, pointing devices, and/or touch interfaces, and hand pose interaction may be applied to VR/AR applications. Several solutions for identifying and/or recognizing hand poses exist. Most commonly used hand pose detection and hand pose control methods are realized with image detection and image analysis by computer vision. However, in computer vision, it is difficult to predict hand poses when the hand is occluded by other objects. When two hands are interacting with each other or when the user is moving in a VR scene, due to different body orientations during the activity or obstruction between the lens and the hands, it is inevitable that the camera has difficulty capturing the user's current movements, and the accuracy and stability of hand tracking are degraded.
The disclosure provides a hand pose construction method. The hand pose construction method includes the following operations: capturing an image of a hand of a user from a viewing angle of a camera, wherein a hand image of the hand of the user is occluded within the image; obtaining a wrist position and a wrist direction of a wrist of the user according to movement data of a tracking device worn on the wrist of the user; obtaining several visible feature points of the hand of the user from the image; and constructing a hand pose of the hand of the user according to the several visible feature points, the wrist position, the wrist direction, and a hand pose model.
The disclosure provides an electronic device. The electronic device includes a camera and a processor. The camera is configured to capture an image of a hand of a user from a viewing angle of the camera, wherein a hand image of the hand of the user is occluded within the image. The processor is coupled to the camera. The processor is configured to: obtain a wrist position and a wrist direction of a wrist of the user according to movement data of a tracking device worn on the wrist of the user; obtain several visible feature points of the hand of the user from the image; and construct a hand pose of the hand of the user according to the several visible feature points, the wrist position, the wrist direction, and a hand pose model.
The disclosure provides a non-transitory computer readable storage medium with a computer program to execute aforesaid hand pose construction method.
It is to be understood that both the foregoing general description and the following detailed description are by examples and are intended to provide further explanation of the invention as claimed.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, according to the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
Reference will now be made in detail to the present embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
It will be understood that, in the description herein and throughout the claims that follow, although the terms “first,” “second,” etc. may be used to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the embodiments.
It will be understood that, in the description herein and throughout the claims that follow, the terms “comprise” or “comprising,” “include” or “including,” “have” or “having,” “contain” or “containing” and the like used herein are to be understood to be open-ended, i.e., to mean including but not limited to.
It will be understood that, in the description herein and throughout the claims that follow, the phrase “and/or” includes any and all combinations of one or more of the associated listed items.
Reference is made to
The HMD system 100 includes an HMD device 110, a tracking device 130A, and a tracking device 130B. As shown in
In some embodiments, a camera is set within the HMD device 110. In some other embodiments, a camera is set at any place which can capture the image of the head and the hands of the user U together. However, whether the camera is set within the HMD device 110 or at any other place, it is inevitable that the camera captures images in which the hands are partially occluded, and the performance of the HMD system 100 is influenced.
Reference is made to
In some embodiments, the electronic device 200 may be configured to perform a SLAM system. The SLAM system includes operations such as capturing images, extracting features from the images, and performing localization according to the features. The details of the SLAM system will not be described herein.
Specifically, in some embodiments, the electronic device 200 may be applied in a virtual reality (VR)/mixed reality (MR)/augmented reality (AR) system. For example, the electronic device 200 may be realized by a standalone head mounted display device (HMD) or a VIVE HMD. In detail, the standalone HMD or VIVE HMD may handle operations such as processing position and rotation data, graphics processing, or other data calculations.
As shown in
The processor 230 is electrically connected to the camera 210, the memory 250, and the display circuit 270. In some embodiments, the processor 230 can be realized by, for example, one or more processing circuits, such as central processing circuits and/or micro processing circuits, but is not limited in this regard. In some embodiments, the memory 250 includes one or more memory devices, each of which includes, or a plurality of which collectively include, a computer readable storage medium. The computer readable storage medium may include a read-only memory (ROM), a flash memory, a floppy disk, a hard disk, an optical disc, a flash disk, a flash drive, a tape, a database accessible from a network, and/or any storage medium with the same functionality that can be contemplated by persons of ordinary skill in the art to which this disclosure pertains.
The camera 210 is configured to capture one or more images of the real space in which the electronic device 200 is operated. In some embodiments, the camera 210 may be realized by a camera circuit device or any other camera circuit with image capture functions. In some embodiments, the camera 210 may be realized by an RGB camera or a depth camera.
The display circuit 270 is electrically connected to the processor 230, such that the video and/or audio content displayed by the display circuit 270 is controlled by the processor 230.
Reference is made to
For example, in one embodiment, the camera may be located at the HMD device 110 worn on the head of the user U, as the camera 210A illustrated in
The camera 210A has a viewing angle V1 imitating a viewing angle of the user U. The camera 210B has a viewing angle V2. The images are captured by each camera according to its viewing angle. From the viewing angle V1 and the viewing angle V2 as illustrated in
It should be noted that the electronic device 200 may be a device other than the HMD device; any device which is able to obtain the positions of the head and the hands of the user may be included within the embodiments of the present disclosure.
In some embodiments, the HMD device 110 and the tracking devices 130A, 130B include a SLAM system with a SLAM algorithm. With the SLAM system of the tracking devices 130A, 130B and the HMD device 110, the processor 230 may obtain the position of the HMD device 110 and the tracking devices 130A, 130B within the real space.
In some embodiments, since the user U is wearing the HMD device 110 and the tracking devices 130A, 130B, the position of the HMD device 110 may represent the position of the head of the user U, the position of the tracking device 130A is taken as the position of the left wrist of the user U, and the position of the tracking device 130B is taken as the position of the right wrist of the user U. In some embodiments, the position of the tracking device 130A is taken as the position of the feature point of the left wrist of the user U, and the position of the tracking device 130B is taken as the position of the feature point of the right wrist of the user U.
For example, as illustrated in
It is noted that, the embodiments shown in
Reference is made to
As shown in
In operation S310, several frames of images of the hands of the user are captured. In some embodiments, operation S310 is performed by the camera 210 in
In operation S330, whether a hand image of the hands of the user is about to be occluded within the image is predicted, and a previous hand pose of the hands of the user is stored when it is predicted that the hand image of the hands of the user is about to be occluded within the image. In some embodiments, the operation S330 is performed by the processor 230 as illustrated in
Reference is made to
In operation S332, an arm skeleton of two arms of the user is constructed according to the wrist positions of the user and a head position of the user. In some embodiments, the arm skeleton of the two arms of the user is further constructed according to the wrist positions of the user, a head position of the user, and an arm skeleton model.
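By way of illustration only, the following Python sketch shows one possible construction of the arm skeleton from the head position and a wrist position using a two-link inverse-kinematics step. The shoulder offset, segment lengths, and the downward bending prior are assumptions standing in for the arm skeleton model, not limitations of the disclosure.

    import numpy as np

    def estimate_arm_skeleton(head_pos, wrist_pos, shoulder_offset,
                              upper_len=0.30, forearm_len=0.27):
        """Estimate shoulder and elbow joints from head and wrist positions.

        shoulder_offset, upper_len, and forearm_len stand in for values
        supplied by the arm skeleton model; the defaults are illustrative.
        """
        shoulder = head_pos + shoulder_offset        # e.g. down and to the side
        to_wrist = wrist_pos - shoulder
        d = np.clip(np.linalg.norm(to_wrist), 1e-6, upper_len + forearm_len)
        axis = to_wrist / (np.linalg.norm(to_wrist) + 1e-9)
        # Two-link IK: distance of the elbow projection along the axis.
        a = (upper_len**2 - forearm_len**2 + d**2) / (2.0 * d)
        h = np.sqrt(max(upper_len**2 - a**2, 0.0))   # elbow offset from the axis
        # Bend the elbow downward as a simple, plausible pose prior.
        down = np.array([0.0, -1.0, 0.0])
        bend = down - axis * np.dot(down, axis)
        bend = bend / (np.linalg.norm(bend) + 1e-9)
        elbow = shoulder + axis * a + bend * h
        return shoulder, elbow, wrist_pos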
Reference is made to
As illustrated in
Reference is made to
In operation S334, several arm positions of the arm feature points of the user at several time points are obtained according to the arm skeleton. That is, several positions of the feature points of the arms of the user U at several time points are obtained according to the arm skeleton. Reference is made to
The images 71 and 72 as shown in
In operation S336, whether the hand image is about to be occluded is predicted according to the positions of the feature points of the arms at several time points. In some embodiments, whether the hands of the user U are about to be occluded is predicted according to the moving velocity and the moving direction of the arms of the user U.
For example, in an embodiment, the processor 230 as illustrated in
In some other embodiments, according to the positions of the wrists of the user U, the processor 230 may predict whether the hands of the user U are moving toward an object within the real space; if so, the processor 230 predicts that the hand image of the user U will be occluded in the future.
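By way of illustration only, one way operation S336 could be realized is sketched below in Python: velocities of the arm feature points are estimated by finite differences over the several time points, the positions are extrapolated a short time ahead, and an occlusion is predicted when the extrapolated points of the two arms come close to each other. The horizon and radius thresholds are illustrative assumptions.

    import numpy as np

    def predict_self_occlusion(track_a, track_b, dt, horizon=0.2, radius=0.08):
        """Predict whether the arm feature points of the two arms will cross.

        track_a and track_b are (T, 3) arrays of a feature point's positions
        at several time points; horizon (s) and radius (m) are illustrative.
        """
        vel_a = (track_a[-1] - track_a[-2]) / dt   # finite-difference velocity
        vel_b = (track_b[-1] - track_b[-2]) / dt
        future_a = track_a[-1] + vel_a * horizon   # extrapolated positions
        future_b = track_b[-1] + vel_b * horizon
        # If the extrapolated points come within `radius` of each other,
        # one hand is likely about to occlude the other.
        return np.linalg.norm(future_a - future_b) < radius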
Reference is made to operation S330 again. In some embodiments, when it is predicted that the hand image of the hands of the user is about to be occluded within the image, a previous hand pose of the hands of the user is stored.
For example, as illustrated in
Reference is made to
As illustrated in
The feature points as mentioned above are for illustrative purposes and the embodiments of the present disclosure are not limited thereto.
In some embodiments, the hand pose model 800 is stored in the memory 250 in
Reference is made to
Reference is made back to
Reference is made to
In some embodiments, when constructing the hand pose of the hands according to the hand image, the processor 230 as illustrated in
After the processor 230 finds the positions of the tracking device 130A and the tracking device 130B in the real space, the processor 230 searches the area surrounding the tracking devices 130A and 130B according to the hand image captured by the camera 210 as illustrated in
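By way of illustration only, the following Python sketch shows one possible realization of this search, assuming a pinhole projection with known camera intrinsics K and a hypothetical detect_hand_keypoints function standing in for whatever feature-point detector is used.

    import numpy as np

    def search_around_tracker(image, tracker_pos_cam, K, roi_half=80):
        """Search the image region surrounding the projected tracker position.

        tracker_pos_cam is the 3D tracker position in the camera frame and
        K is the 3x3 intrinsic matrix; detect_hand_keypoints is a
        hypothetical stand-in for the feature-point detector in use.
        """
        u_h = K @ tracker_pos_cam                 # pinhole projection
        u, v = int(u_h[0] / u_h[2]), int(u_h[1] / u_h[2])
        h, w = image.shape[:2]
        x0, x1 = max(u - roi_half, 0), min(u + roi_half, w)
        y0, y1 = max(v - roi_half, 0), min(v + roi_half, h)
        roi = image[y0:y1, x0:x1]                 # crop around the wrist
        keypoints = detect_hand_keypoints(roi)    # hypothetical detector
        # Shift ROI-local keypoints back into full-image coordinates.
        return [(x + x0, y + y0) for (x, y) in keypoints]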
As illustrated in
In some embodiments, when the number of the visible feature points is less than a number threshold value, the processor 230 determines to perform the hand pose reconstruction method. In some other embodiments, when the ratio of the visible feature points is less than a ratio threshold value, the processor 230 determines to perform the hand pose reconstruction method.
In some embodiments, the processor 230 calculates an occlusion percentage of the hand; when the occlusion percentage of the hand is higher than a percentage threshold value, the processor 230 determines to perform the hand pose reconstruction method.
In some embodiments, some feature points of the hands are considered to have high significance. If the feature points with high significance are invisible, the processor 230 determines to perform the hand pose reconstruction method.
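Taken together, the above conditions for triggering the hand pose reconstruction method could be realized as in the following Python sketch; all threshold values and the set of high-significance feature points are illustrative assumptions.

    def should_reconstruct(visible_ids, all_ids, significant_ids,
                           count_thresh=10, ratio_thresh=0.5, occl_thresh=0.4):
        """Decide whether to perform the hand pose reconstruction method.

        All threshold values and the high-significance set (e.g. fingertip
        feature points) are illustrative assumptions.
        """
        visible = set(visible_ids)
        if len(visible) < count_thresh:            # number threshold
            return True
        ratio = len(visible) / len(all_ids)
        if ratio < ratio_thresh:                   # ratio threshold
            return True
        if 1.0 - ratio > occl_thresh:              # occlusion percentage
            return True
        # A missing high-significance feature point also triggers it.
        return any(fid not in visible for fid in significant_ids)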
In operation S370, a hand pose reconstruction method is performed. In some embodiments, operation S370 is performed by the processor 230 as illustrated in
In operation S372, a wrist position and a wrist direction of a wrist of the user are obtained according to movement data of a tracking device worn on the wrist of the user.
Reference is made to
According to the movement data, the processor 230 calculates the wrist position and the wrist direction of the wrists of the user. For example, in an embodiment, the processor 230 obtains an initial orientation (initial direction) and an initial position of the tracking device 130A at an initial time point. The processor 230 then obtains movement data of the tracking device 130A from the initial time point to a current time point. By combining the initial orientation and the initial position with the movement data accumulated between the initial time point and the current time point, the processor 230 obtains a position and a direction of the tracking device 130A at the current time point. The position and the direction at the current time point are then taken as the wrist position and the wrist direction of the left wrist at the current time point, in which the left wrist is wearing the tracking device 130A.
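By way of illustration only, the accumulation of the movement data between the two time points could be realized as below, assuming the tracking device reports the movement data as per-step incremental rotations and translations in the device's body frame.

    import numpy as np
    from scipy.spatial.transform import Rotation

    def integrate_wrist_pose(init_pos, init_rot, deltas):
        """Accumulate movement data from the initial to the current time point.

        init_rot is a scipy Rotation; deltas is a sequence of
        (delta_rot, delta_pos) increments reported by the tracking device.
        This representation of the movement data is an assumption.
        """
        pos = np.asarray(init_pos, dtype=float)
        rot = init_rot
        for delta_rot, delta_pos in deltas:
            pos = pos + rot.apply(delta_pos)   # translate in the current frame
            rot = rot * delta_rot              # compose the body-frame rotation
        return pos, rot                        # wrist position, wrist direction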
In operation S374, several visible feature points of the hand of the user are obtained from the image. Reference is made to FIGS. 11A and 11B together.
Reference is made to
Reference is made to
In some embodiments, the search for the visible feature points is performed by searching an area surrounding the wrist position, and the position of the tracking device is taken as the wrist position. That is, the processor 230 finds the position of the tracking device 130A first, and then searches the area surrounding the position of the tracking device 130A to find the visible feature points according to the hand image.
In operation S376, the hand pose of the hand of the user is constructed according to several visible feature points, the wrist position and the wrist direction of the wrist, and a hand pose model.
In some embodiments, operation S376 is performed with a machine learning model ML stored in the memory 250 as illustrated in
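By way of illustration only, the following Python sketch shows how such a machine learning model ML might be invoked, assuming a regression model that takes the visible feature-point coordinates with a visibility mask plus the wrist pose and outputs all feature-point positions; the input layout, the predict interface, and the 21-point hand convention are assumptions.

    import numpy as np

    def construct_hand_pose(ml_model, visible_points, wrist_pos, wrist_dir,
                            num_points=21):
        """Assemble the model input and regress the full hand pose.

        visible_points maps feature-point id -> 3D position; ml_model is
        the trained model ML. The input layout here is an assumption.
        """
        coords = np.zeros((num_points, 3), dtype=np.float32)
        mask = np.zeros(num_points, dtype=np.float32)
        for fid, pos in visible_points.items():
            coords[fid] = pos              # known, visible feature points
            mask[fid] = 1.0                # 1 = visible, 0 = occluded
        features = np.concatenate([coords.ravel(), mask,
                                   np.asarray(wrist_pos, dtype=np.float32),
                                   np.asarray(wrist_dir, dtype=np.float32)])
        # The model fills in the occluded feature points consistently
        # with the hand pose model it was trained against.
        return ml_model.predict(features[None, :]).reshape(num_points, 3)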
In some embodiments, when all of the feature points of the same finger are invisible feature points, or when the feature point of the fingertip is invisible, the processor 230 constructs the hand pose of the finger with the previous hand pose. In detail, the processor 230 obtains a relationship between the finger feature points and the wrist position according to the previous hand pose, and maintains the relationship between the finger feature points and the wrist position of the previous hand pose so as to construct the occluded part of the hand pose.
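By way of illustration only, maintaining this relationship can be expressed as transforming the previous finger feature points into the wrist coordinate frame of the previous hand pose and re-attaching them at the current wrist position and wrist direction, as in the Python sketch below, where the wrist direction is represented as a 3x3 rotation matrix (an assumption).

    import numpy as np

    def carry_over_finger(prev_points, prev_wrist_pos, prev_wrist_rot,
                          cur_wrist_pos, cur_wrist_rot):
        """Rebuild a fully occluded finger from the previous hand pose.

        The wrist direction is given as a 3x3 rotation matrix (an
        assumption); the finger keeps the same wrist-relative position
        as in the previous hand pose.
        """
        rebuilt = []
        for p in prev_points:
            local = prev_wrist_rot.T @ (np.asarray(p) - prev_wrist_pos)
            rebuilt.append(cur_wrist_rot @ local + cur_wrist_pos)
        return rebuilt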
For example, reference is made to
Reference is made to
Reference is made to
The relationship between the finger feature points F40, F42, F44, F46 and the feature point F10 of the wrist in the hand pose 1200B is the same as the relationship between the finger feature points F40, F42, F44, F46 and the feature point F10 of the wrist in the previous hand pose 1200A. That is, the processor 230 maintains the relationship between the finger feature points F40, F42, F44, F46 and the feature point F10 of the wrist in the previous hand pose 1200A so as to construct the hand pose 1200B.
The construction of the thumb and the forefinger within the hand pose 1200B is the same as that of the middle finger and will not be described in detail here.
In some embodiments, in operation S376, according to the visible feature points, the processor 230 as illustrated in
Reference is made to
According to the visible feature points F40, F42, the wrist position and the wrist direction of the wrist wearing the tracking device 130A, and the hand pose model, the processor 230 estimates the positions of the invisible feature points of the middle finger, so as to construct the hand pose.
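By way of illustration only, a simple version of this estimation extends the finger along the direction of the last visible segment using the segment lengths from the hand pose model; the straight-extension prior in the Python sketch below is an illustrative assumption, not the disclosed hand pose model itself.

    import numpy as np

    def extrapolate_finger(visible_joints, remaining_lengths):
        """Estimate occluded distal joints from visible proximal ones.

        visible_joints is ordered from the wrist side outward (e.g. the
        positions of F40 and F42); remaining_lengths holds the segment
        lengths of the occluded joints from the hand pose model.
        """
        joints = [np.asarray(j, dtype=float) for j in visible_joints]
        direction = joints[-1] - joints[-2]
        direction = direction / (np.linalg.norm(direction) + 1e-9)
        for length in remaining_lengths:           # e.g. toward F44, F46
            joints.append(joints[-1] + direction * length)
        return joints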
Reference is made to
Take the hand pose of the middle finger as illustrated in
The construction of the hand pose of the ring finger in the hand pose 1300 is similar to that of the middle finger and will not be described in detail here.
In some embodiments, the hand pose is constructed with the information of the wrist direction. Reference is made to
In some embodiments, the processor 230 further constructs the hand pose of the user according to a hand pose database stored in the memory 250. The hand pose database includes several normal hand poses, such as hand poses for dancing or sign language. By comparing the captured hand image to the normal hand poses of the hand pose database, the processor 230 may estimate a possible hand pose according to the positions of the visible feature points and the position of the wrist feature point, so as to construct the hand pose.
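By way of illustration only, this comparison could be realized as a nearest-neighbor search over the database that scores each candidate pose on the visible feature points only, as in the following Python sketch; the wrist-relative storage format of the database entries is an assumption.

    import numpy as np

    def match_pose_database(visible_points, wrist_pos, database):
        """Pick the normal hand pose that best explains the visible points.

        database holds candidate poses (e.g. dance or sign language poses)
        as (num_points, 3) arrays of wrist-relative feature-point
        positions; this storage format is an assumption.
        """
        best_pose, best_err = None, float("inf")
        for pose in database:
            err = 0.0
            for fid, pos in visible_points.items():
                # Score each candidate on the visible feature points only.
                err += np.linalg.norm((pos - wrist_pos) - pose[fid]) ** 2
            if err < best_err:
                best_pose, best_err = pose, err
        # Re-anchor the matched pose at the tracked wrist position.
        return None if best_pose is None else best_pose + wrist_pos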
Through the operations of the various embodiments described above, a hand pose construction method, an electronic device, and a non-transitory computer readable storage medium are implemented. For a hand image that is occluded, the hand pose can be predicted and constructed with the position of the tracking device and the visible feature points of the hands. Moreover, the movement of the arms of the user can be predicted by detecting the positions of the head and the wrists of the user, and a hand self-occlusion status can be predicted in advance so as to store the image of the hand pose before the hand image is occluded. For applications such as dance or sign language, in which there are many self-occlusion cases, the embodiments of the present disclosure can help to predict more stable hand poses by predicting the movement of the arms. Furthermore, in applications such as dance or sign language, there are known data sets and databases for classifying and recognizing the normal hand poses or movements, and the embodiments of the present disclosure can predict the occluded hand poses more correctly according to such a database. The movement of the arms is calculated according to the positions of the wrists and the position of the head of the user, and other time-consuming algorithms (for example, an object detection model for detecting the arms) are not needed, which reduces the amount of calculation of the processor.
The embodiments of the present disclosure make the construction of hand poses more stable when the hand image is occluded. Thereby, hand interactions can be presented smoothly in the UI/UX, the hand poses can be displayed on the screen, and the user experience can be improved. Moreover, with automatic detection and prediction of the hand image being occluded by the hand or arm of the user, the hand image can be stored prior to the hand image being occluded, and the hand construction is more stable and accurate.
It should be noted that in the operations of the abovementioned hand pose construction method 300, no particular sequence is required unless otherwise specified. Moreover, the operations may also be performed simultaneously or the execution times thereof may at least partially overlap.
Furthermore, the operations of the hand pose construction method 300 may be added to, replaced, and/or eliminated as appropriate, in accordance with various embodiments of the present disclosure.
Various functional components or blocks have been described herein. As will be appreciated by persons skilled in the art, the functional blocks will preferably be implemented through circuits (either dedicated circuits, or general purpose circuits, which operate under the control of one or more processing circuits and coded instructions), which will typically include transistors or other circuit elements that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein.
Although the present disclosure has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the scope of the appended claims should not be limited to the description of the embodiments contained herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims.
This application claims priority to U.S. Provisional Application Ser. No. 63/386,490, filed Dec. 7, 2022, which is herein incorporated by reference.