The present disclosure generally relates to a method for estimating behavior, in particular, to a behavior understanding system and a behavior understanding method.
Human motion analysis and behavior understanding have been studied for many years and have attracted considerable research because of their wide range of potential applications.
However, the task of understanding human behaviors is still difficult due to the complex nature of the human motion. What further complicates the task is the necessity of being robust to execution speed and geometric transformations, like the size of the subject, its position in the scene and its orientation with respect to the sensor. Additionally, in some contexts, human behaviors imply interactions with objects. While such interactions can help to differentiate similar human motions, they also add challenges, like occlusions of body parts.
Accordingly, the present disclosure is directed to a human behavior understanding system and a human behavior understanding method, in which the behavior of the user is estimated according to one or more base motions.
In one of the exemplary embodiments, a behavior understanding method includes, but is not limited to, the following steps. A sequence of motion sensing data is obtained, and the motion sensing data is generated through sensing a motion of a human body portion for a time period. At least two comparing results respectively corresponding to at least two timepoints are generated. The comparing results are generated through comparing the motion sensing data with base motion data. The base motion data is related to multiple base motions. Behavior information of the human body portion is determined according to the comparing results. The behavior information is related to a behavior formed by at least one base motion.
In one of the exemplary embodiments, a behavior understanding system includes, but is not limited to, a sensor and a processor. The sensor is used for sensing a motion of a human body portion for a time period. The processor is configured to perform the following steps. At least two comparing results respectively corresponding to at least two timepoints are generated. The timepoints are within the time period. The comparing results are generated through comparing the motion sensing data with base motion data. The base motion data is related to multiple base motions. Behavior information of the human body portion is determined according to the comparing results. The behavior information is related to a behavior formed by at least one base motion.
It should be understood, however, that this Summary may not contain all of the aspects and embodiments of the present disclosure, is not meant to be limiting or restrictive in any manner, and that the invention as disclosed herein is and will be understood by those of ordinary skill in the art to encompass obvious improvements and modifications thereto.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Reference will now be made in detail to the present preferred embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
The sensor 110 may be an accelerometer, a gyroscope, a magnetometer, a laser sensor, an inertial measurement unit (IMU), an infrared ray (IR) sensor, an image sensor, a depth camera, or any combination of the aforementioned sensors. In the embodiment of the disclosure, the sensor 110 is used for sensing the motion of one or more human body portions for a time period. The human body portion may be a hand, a head, an ankle, a leg, a waist, or another portion. The sensor 110 senses the motion of the corresponding human body portion, and a sequence of motion sensing data is generated from the sensing results of the sensor 110 (e.g., camera images, sensed strength values, etc.) at multiple timepoints within the time period. For example, the motion sensing data comprises 3-degree-of-freedom (3-DoF) data, and the 3-DoF data is related to the rotation data of the human body portion in three-dimensional (3D) space, such as accelerations in yaw, roll and pitch. For another example, the motion sensing data comprises a relative position and/or displacement of a human body portion in 2D/3D space. It should be noted that the sensor 110 could be embedded in a handheld controller or a wearable apparatus, such as a wearable controller, a smart watch, an ankle sensor, a head-mounted display (HMD), or the like.
The memory 130 may be any type of fixed or movable Random-Access Memory (RAM), Read-Only Memory (ROM), flash memory, a similar device, or a combination of the above devices. The memory 130 can be used to store program codes, device configurations, buffer data or permanent data (such as motion sensing data, comparing results, information related to base motions, etc.), which are described later.
The processor 150 is coupled to the memory 130, and the processor 150 is configured to load the program codes stored in the memory 130, to perform a procedure of the exemplary embodiment of the disclosure. Functions of the processor 150 may be implemented by using a programmable unit such as a central processing unit (CPU), a microprocessor, a microcontroller, a digital signal processing (DSP) chip, a field programmable gate array (FPGA), etc. The functions of the processor 150 may also be implemented by an independent electronic device or an integrated circuit (IC), and operations of the processor 150 may also be implemented by software.
It should be noted that the processor 150 may or may not be disposed in the same apparatus as the sensor 110. When they are disposed in different apparatuses, the apparatuses respectively equipped with the sensor 110 and the processor 150 may further include communication transceivers with a compatible communication technology, such as Bluetooth, Wi-Fi, IR, or a physical transmission line, to transmit data to and receive data from each other.
In some embodiments, the sequence of the motion sensing data may be generated by combining the first part of motion sensing data and the second part of motion sensing data for the same human body. For example, one motion sensing data is determined based on the first part of motion sensing data at one or more timepoints, and another is determined based on the second part of motion sensing data at one or more other timepoints. For another example, the first part of motion sensing data and the second part of motion sensing data at one timepoint are fused with a weight relation of the first part and the second part, to determine one of the sequence of the motion sensing data.
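By way of a non-limiting illustration, the weighted fusion of the first part and the second part of motion sensing data at one timepoint may be sketched as follows. The function name `fuse_samples`, the parameter `weight_first`, and the use of a simple convex combination are assumptions made for this sketch; the disclosure does not specify the weight relation.

```python
from typing import List, Sequence


def fuse_samples(first: Sequence[float],
                 second: Sequence[float],
                 weight_first: float = 0.5) -> List[float]:
    """Blend two readings of the same body portion taken at one timepoint.

    Assumption: the weight relation is a convex combination, so the second
    part receives the weight (1 - weight_first).
    """
    if len(first) != len(second):
        raise ValueError("both parts must have the same number of components")
    if not 0.0 <= weight_first <= 1.0:
        raise ValueError("weight_first must lie in [0, 1]")
    weight_second = 1.0 - weight_first
    return [weight_first * a + weight_second * b for a, b in zip(first, second)]


# Hypothetical readings: the first part is trusted less than the second part.
fused = fuse_samples([1.0, 2.0], [3.0, 4.0], weight_first=0.25)
```

Setting `weight_first` to 1.0 or 0.0 reduces the sketch to using one part solely.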
In some embodiments, the sequence of the motion sensing data may be generated according to the first part of motion sensing data or the second part of motion sensing data solely. For example, one of the first part of motion sensing data and the second part of motion sensing data is selected to determine the sequence of the motion sensing data, and the unselected motion sensing data would be omitted.
In some embodiments, the HMD 120 may further include another IMU (not shown), to obtain rotation information of the human body portion B5 (i.e., the head). The HMD 120, the ankle sensors 140, and the handheld controllers 160 may communicate with each other through a compatible communication technology.
It should be noted that the behavior understanding system 200 is merely an example to illustrate the disposing and communication manners of the sensor 110 and the processor 150. There are still many other implementations of the behavior understanding system 100, and the present disclosure is not limited thereto.
To better understand the operating process provided in one or more embodiments of the disclosure, several embodiments will be exemplified below to elaborate the operating process of the behavior understanding system 100 or 200. The devices and modules in the behavior understanding system 100 or 200 are applied in the following embodiments to explain the control method provided herein. Each step of the control method can be adjusted according to actual implementation situations and should not be limited to what is described herein.
The term “behavior” in the embodiment of the present disclosure covers three types: human gestures, human actions and human activities. Each type of behavior is characterized by a specific degree of motion complexity, a specific degree of human-object interaction and a specific duration of the behavior. For example, gesture behaviors have low complexity and short duration, action behaviors have medium complexity and intermediate duration, and activity behaviors have high complexity and long duration. Gesture behaviors do not involve interaction with another object, whereas action behaviors and activity behaviors may involve interaction with another object. One gesture behavior may be characterized by a motion of only one human body portion (often the arm). One action behavior may be characterized by a slightly more complex movement, which can also be a combination of multiple gestures, or by motion of multiple human body portions. In addition, an activity behavior may be characterized by a high level of motion complexity, where multiple movements or actions are performed successively.
Taking the behavior understanding system 200 as an example, 6-DoF information of the human body portion B1 can be determined.
On the other hand, the stereo camera 115 captures mono images m1, m2 toward the human body portion B1 (step S403). The processor 150 may perform a fisheye dewarp process on the mono images m1, m2, and the dewarped images M1, M2 are generated (step S404). The human body portion B1 in the dewarped images M1, M2 is identified through a machine learning technology (such as deep learning, an artificial neural network (ANN), or a support vector machine (SVM), etc.). The sensing strength and the pixel position corresponding to the human body portion B1 can then be used for estimating depth information of the human body portion B1 (i.e., a distance relative to the HMD 120) (step S405) and estimating a 2D position of the human body portion B1 in a plane parallel to the stereo camera 115 (step S406). The processor 150 can generate a 3D position in the predefined coordinate system according to the distance and the 2D position of the human body portion B1 estimated at steps S405 and S406 (step S407). Then, the rotation and the 3D position of the human body portion B1 in the predefined coordinate system can be fused (step S408), and 6-DoF information, which is considered as the motion sensing data, can be outputted (step S409).
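As a rough sketch only, steps S407 and S408 may be pictured as combining an estimated depth and a 2D pixel position into a 3D position, and then joining that position with the rotation. The pinhole back-projection, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the names `Pose6DoF`, `back_project` and `fuse_pose` are assumptions of this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pose6DoF:
    """Position (x, y, z) plus rotation (yaw, pitch, roll)."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Tuple[float, float, float]:
    """Assumed pinhole model: map a pixel (u, v) at a given depth to 3D."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def fuse_pose(position: Tuple[float, float, float],
              rotation: Tuple[float, float, float]) -> Pose6DoF:
    """Join a 3D position and a 3-DoF rotation into one 6-DoF sample."""
    return Pose6DoF(*position, *rotation)


# Hypothetical values: pixel (420, 240), depth 2.0, focal length 500 pixels.
position = back_project(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
pose = fuse_pose(position, (0.1, 0.2, 0.3))
```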
In another embodiment, the 3D position of the human body portion B1 can be determined according to the 3D position of the human body portion B5 and the rotation information of the human body portion B1. Specifically, a 6-DoF sensor may be equipped on the human body portion B5, so as to obtain the position and the rotation information of the human body portion B5. On the other hand, the rotation information of the human body portion B1 can be obtained as described at step S402. Then, a displacement of the human body portion B1 can be estimated through double integration of the detected acceleration of the human body portion B1 in three axes. However, when a user walks, the error of the estimated displacement of the human body portion B1 of the user may accumulate, and the estimated position of the human body portion B1 would not be accurate. In order to improve the accuracy of the estimated position, the position of the human body portion B5 can be considered as a reference point of the user, and the estimated position of the human body portion B1 can be corrected according to the reference point. While walking or running, the displacement of the human body portion B5 would correspond to the displacement of the human body portion B1 with a specific pose, such as lifting a leg, unbending a leg, or any other pose of walking or running. The position of the human body portion B1 with the specific pose can be considered as a reset position, and the reset position has a certain relative position corresponding to the reference point. When the processor 150 determines that the user is walking or running according to the displacement of the human body portion B1, the estimated position of the human body portion B1 can be corrected at the reset position according to the certain relative position corresponding to the reference point, so as to remove the estimation error generated by the IMU 111.
It should be noted that there are still many other embodiments for obtaining the motion sensing data. For example, a 6-DoF sensor may be equipped on the human body portion B1, so that the 6-DoF information serves as the motion sensing data. For another example, a depth camera may be equipped on the human body portion B1, so that the detected depth information serves as the motion sensing data.
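The double integration and the reset-position correction described above may be sketched, under simplifying assumptions, as follows. The sketch integrates one axis with a plain Euler scheme and a fixed sampling interval `dt`; the function names and the numeric values are hypothetical, and a practical implementation would likely use a more careful integration scheme.

```python
from typing import List, Sequence, Tuple


def integrate_displacement(accelerations: Sequence[float], dt: float) -> List[float]:
    """Double-integrate one axis of acceleration (Euler steps, zero initial state).

    Any bias in the acceleration samples grows in the returned positions,
    which is why a periodic correction is needed.
    """
    velocity = 0.0
    position = 0.0
    positions = []
    for a in accelerations:
        velocity += a * dt          # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
        positions.append(position)
    return positions


def correct_at_reset(reference_point: Tuple[float, float, float],
                     relative_offset: Tuple[float, float, float]
                     ) -> Tuple[float, float, float]:
    """Replace the drifting estimate with reference point + known relative position."""
    return tuple(r + o for r, o in zip(reference_point, relative_offset))


drifting = integrate_displacement([1.0, 1.0], dt=1.0)
corrected = correct_at_reset((1.0, 2.0, 3.0), (0.5, -1.0, 0.0))
```

In this sketch the correction discards the integrated estimate at the reset position, so accumulated error does not carry over to later timepoints.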
Referring to
In some embodiments, the motion sensing data at each timepoint would be compared with multiple predefined base motions in the base motion data, to generate a comparing result. Each predefined base motion is associated with specific motion sensing data, such as a specific position and a specific orientation in 3D space. In addition, because the order of multiple base motions is an essential condition for forming one behavior, the comparing results at different timepoints would be stored in the memory 130 for later use. It should be noted that the order described in the embodiment means that the base motions are sorted by the timepoints at which they occur.
In some embodiments, the specific motion sensing data of multiple base motions could be training samples for training a classifier or a neural network model based on the machine learning technology. The classifier or the neural network model can be used to identify which base motion corresponds to the motion sensing data obtained at step S310 or determine a likelihood that the motion of the detected human body portion is one of the base motions.
In some embodiments, the comparing result may be the most similar one or more base motions or likelihoods respectively corresponding to different base motions.
In some embodiments, to quantify the likelihood, a matching degree between the motion sensing data and the base motion data can be used to represent the likelihood that the motion of the detected human body portion is a specific base motion. The matching degree could be a value from 0 to 100 percent that represents the possibility that the motion of the human body portion is a specific base motion, and the summation of the matching degrees corresponding to all predefined base motions could be, for example, 100 percent. For example, the comparing result at a timepoint includes 10 percent of lifting base motion, 0 percent of pointing base motion, 75 percent of kicking base motion, 3 percent of stepping base motion, and 22 percent of jumping base motion.
In some embodiments, one or more base motions could be selected as a representative of a comparing result according to the matching degrees corresponding to all base motions at each timepoint. For example, the one or more base motions with the highest matching degree could be the representative of the comparing result. For another example, the one or more base motions with a matching degree larger than a threshold (such as 60, 75 or 80 percent) could be the representative of the comparing result.
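One possible way to obtain matching degrees that sum to 100 percent is to normalize non-negative raw similarity scores, as in the following sketch. The raw scores, the function name `matching_degrees` and the linear normalization are assumptions for illustration; the disclosure does not state how the matching degree is computed.

```python
from typing import Dict


def matching_degrees(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative raw scores so that they sum to 100 percent."""
    if any(score < 0 for score in raw_scores.values()):
        raise ValueError("raw scores must be non-negative")
    total = sum(raw_scores.values())
    if total == 0:
        raise ValueError("at least one raw score must be positive")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


# Hypothetical raw scores for two base motions at one timepoint.
degrees = matching_degrees({"lifting": 1.0, "kicking": 3.0})
```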
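The two selection rules described above (highest matching degree, or matching degree above a threshold) can be sketched as follows. The function names and the matching degrees in the usage lines are illustrative assumptions.

```python
from typing import Dict, List


def top_representative(degrees: Dict[str, float]) -> str:
    """Return the base motion with the highest matching degree."""
    return max(degrees, key=degrees.get)


def representatives_above(degrees: Dict[str, float], threshold: float) -> List[str]:
    """Return every base motion whose matching degree exceeds the threshold."""
    return [name for name, degree in degrees.items() if degree > threshold]


# Hypothetical comparing result at one timepoint, in percent.
result = {"lifting": 10.0, "pointing": 0.0, "kicking": 75.0,
          "stepping": 3.0, "jumping": 12.0}
best = top_representative(result)
confident = representatives_above(result, threshold=60.0)
```

With a threshold rule the returned list may be empty, which a caller has to handle explicitly.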
It should be noted that the comparing result includes multiple matching degrees corresponding to all predefined base motions in the aforementioned embodiments. However, there are still many other implementations for determining the comparing result. For example, the comparing result may include the difference between the motion sensing data obtained at step S310 and the specific motion sensing data of the base motions, and the one or more base motions with the smallest difference could be the representative of the comparing result. In addition, the base motions to be compared with the motion sensing data may first be selected according to the limitations of the geometric structure of the human body. For example, most humans cannot stretch their arms horizontally backward beyond a specific angle relative to their chests.
In some embodiments, in addition to the predefined base motions, a non-predefined base motion different from the predefined base motions in the base motion data could be trained by using the sequence of motion sensing data and a machine learning algorithm. For example, if none of the predefined base motions has a matching degree larger than a threshold, the motion sensing data at the current timepoint would be a training sample for training a classifier or a neural network model of a new base motion.
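For the difference-based comparison, one simple possibility is to treat the specific motion sensing data of each base motion as a template vector and to pick the template nearest to the current sample. The Euclidean distance, the template values and the name `nearest_base_motion` are assumptions of this sketch; the disclosure does not name a particular difference measure.

```python
import math
from typing import Dict, Sequence


def nearest_base_motion(sample: Sequence[float],
                        templates: Dict[str, Sequence[float]]) -> str:
    """Return the base motion whose template differs least from the sample."""
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


# Hypothetical 3D templates for two base motions.
templates = {"lifting": (0.0, 1.0, 0.0), "kicking": (1.0, 0.0, 0.0)}
closest = nearest_base_motion((0.9, 0.1, 0.0), templates)
```

A pre-selection based on body geometry could be applied by passing only the plausible subset of `templates`.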
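The collection of training samples for a new base motion may be sketched as follows: a sample is kept only when no predefined base motion exceeds the threshold. The sample pool, the function name and the values are hypothetical, and the subsequent training of a classifier or neural network model is outside the scope of the sketch.

```python
from typing import Dict, List, Sequence


def collect_if_unmatched(sample: Sequence[float],
                         degrees: Dict[str, float],
                         threshold: float,
                         pool: List[Sequence[float]]) -> bool:
    """Append the sample to the pool when no base motion exceeds the threshold."""
    if all(degree <= threshold for degree in degrees.values()):
        pool.append(sample)
        return True
    return False


training_pool: List[Sequence[float]] = []
kept = collect_if_unmatched((0.2, 0.4, 0.1),
                            {"lifting": 30.0, "kicking": 35.0},
                            threshold=60.0, pool=training_pool)
```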
Referring to
Referring to
Accordingly, one behavior may be predicted correctly without obtaining further motion sensing data at subsequent timepoints.
It should be noted that the time window may be variable. In response to the comparing results not being matched with any predefined behavior, the time window may be enlarged to include more comparing results in one combination. For example, referring to FIG. 5, the time window W1 is enlarged to become the time window W2, and a combination within the time window W2 includes three comparing results at three timepoints. The processor 150 would then determine whether the combination within the time window W2 is matched with any predefined behavior.
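The enlargement of the time window may be sketched as follows. The sketch assumes that each comparing result has already been reduced to one representative base motion, that a predefined behavior is an ordered tuple of base motions, and that the window is anchored at the most recent timepoint and grows toward earlier timepoints; these are illustrative choices, as are all names used.

```python
from typing import Dict, Optional, Sequence, Tuple


def match_with_growing_window(history: Sequence[str],
                              behaviors: Dict[str, Tuple[str, ...]],
                              start: int = 2
                              ) -> Tuple[Optional[str], int]:
    """Enlarge the window over the latest results until a behavior matches.

    Returns (behavior name or None, last window size that was tried).
    """
    size = min(start, len(history))
    while size <= len(history):
        window = tuple(history[len(history) - size:])
        for name, pattern in behaviors.items():
            if window == tuple(pattern):
                return name, size
        if size == len(history):
            break
        size += 1  # no match: include one more comparing result
    return None, size


# Hypothetical behavior definition and history of representatives.
behaviors = {"kick": ("lifting", "pointing", "kicking")}
matched, window_size = match_with_growing_window(
    ["lifting", "pointing", "kicking"], behaviors)
```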
On the other hand, in response to the comparing results being matched with one predefined behavior, the time window may be reduced or maintained. For example, referring to
It should be noted that the value of the matching degree may be related to the confidence that the comparing result is correct. In one embodiment, the matching degree of the representative of the comparing result at each timepoint may be compared with a threshold. The threshold may be, for example, 50, 70 or 80 percent. In response to the matching degree of the representative being larger than the threshold, the representative would be used to determine the behavior of the human body portion. For example, if the threshold is 60 percent, a jumping base motion with 75 percent would be a reference to determine a behavior.
On the other hand, in response to the matching degree of the representative being not larger than the threshold, the representative would not be used to determine the behavior of the human body portion. The representative would be abandoned or weighted with lower priority. For example, if the threshold is 80 percent, a kicking base motion with 65 percent would be abandoned, and the kicking base motion would not be a reference to determine a behavior. For another example, the threshold is 60 percent, and a pointing base motion with 65 percent at the first timepoint, a lifting base motion with 55 percent at the second timepoint, and a kicking base motion with 80 percent at the third timepoint are determined. Because the lifting base motion does not exceed the threshold, the processor 150 may not consider that a kicking behavior is performed by the three base motions.
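The threshold check on the representatives can be sketched as a filter over the per-timepoint results. The function name is an assumption; the numbers in the usage lines reproduce the example above with a threshold of 60 percent, and the sketch implements the "abandoned" variant rather than the lower-priority weighting.

```python
from typing import List, Sequence, Tuple


def confident_sequence(representatives: Sequence[Tuple[str, float]],
                       threshold: float) -> List[str]:
    """Keep, in timepoint order, only representatives above the threshold."""
    return [name for name, degree in representatives if degree > threshold]


# Representatives at the first, second and third timepoints (percent).
observed = [("pointing", 65.0), ("lifting", 55.0), ("kicking", 80.0)]
usable = confident_sequence(observed, threshold=60.0)
```

Here the lifting base motion is dropped, so a behavior that requires all three base motions would not be recognized from `usable`.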
In addition, one behavior may be related to base motions of multiple human body portions. For example, referring to
For example, a lifting base motion is determined according to the motion sensing data of the human body portion B1 at the first timepoint t1, and a pointing base motion is determined according to the motion sensing data of the human body portion B1 at the second timepoint t2. In addition, a pointing base motion is determined according to the motion sensing data of the human body portion B2 at the first timepoint t1, and a lifting base motion is determined according to the motion sensing data of the human body portion B2 at the second timepoint t2. Then, the processor 150 may determine that a running behavior is performed according to the combination of the determined base motions of the human body portions B1 and B2.
It should be noted that, based on different design requirements, in other embodiments, one or more predefined behaviors may be associated with multiple base motions of three or more human body portions. The processor 150 may determine whether the comparing results of these human body portions are matched with any predefined behavior.
After the behavior information of the human body portion is determined, a motion of an avatar or an image presented on a display can be modified according to the determined behavior. For example, if the behavior of the legs is running, the avatar may run accordingly. For another example, if the behavior of the head is raising, a sky would be shown in the image on the display.
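The running example above may be sketched as a lookup over per-portion sequences of base motions. Representing a predefined behavior as a mapping from a body-portion label to an ordered tuple of base motions is an assumption of this sketch, as are the function and variable names.

```python
from typing import Dict, Optional, Tuple

Pattern = Dict[str, Tuple[str, ...]]


def match_multi_portion(observed: Pattern,
                        behaviors: Dict[str, Pattern]) -> Optional[str]:
    """Return the first behavior whose per-portion sequences all match."""
    for name, pattern in behaviors.items():
        if all(tuple(observed.get(portion, ())) == tuple(sequence)
               for portion, sequence in pattern.items()):
            return name
    return None


# Base motions of B1 and B2 at timepoints t1 and t2, as in the example above.
behaviors = {"running": {"B1": ("lifting", "pointing"),
                         "B2": ("pointing", "lifting")}}
observed = {"B1": ("lifting", "pointing"), "B2": ("pointing", "lifting")}
behavior = match_multi_portion(observed, behaviors)
```

The same structure extends to three or more human body portions by adding further keys to each pattern.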
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present disclosure without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.
Relation | Number | Date | Country
---|---|---|---
Parent | 16136182 | Sep 2018 | US
Child | 16565512 | | US
Parent | 16136198 | Sep 2018 | US
Child | 16136182 | | US
Parent | 16137477 | Sep 2018 | US
Child | 16136198 | | US