This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2021-55639, filed on Mar. 29, 2021, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to a skeleton recognition method, a computer-readable recording medium storing a skeleton recognition program, and an artistic gymnastics scoring support apparatus.
A skeleton recognition technique is a technique for identifying positions of joints of a human body from information of a point cloud that is a plurality of points on a surface of the human body obtained from three-dimensional sensors. A human body model, which is a geometric model, is fitted to the point cloud, and positions of joints in the human body model are determined. The term “fitting” refers to optimizing an objective function that represents a degree of agreement between the point cloud and the human body model. The optimization is implemented by minimizing a distance between the point cloud and the human body model.
Masui Shoichi et al., “Practical Implementation of Gymnastics Scoring Support System based on 3D Sensing and Skill Recognition Technology (3D Senshingu-Waza Ninshiki Gijutsu ni yoru Taiso Saiten Shien Shisutemu no Jitsuyoka)”, [online], 2020, Information Processing, [Searched on Mar. 18, 2021], Internet (URL: https://www.ipsj.or.jp/dp/contents/publication/44/S1104-S01.html) is disclosed as related art.
According to an aspect of the embodiments, a skeleton recognition method includes: obtaining, by a computer, an integrated three-dimensional point cloud by integrating three-dimensional point clouds obtained by detecting a target person and a target object from a plurality of directions with a plurality of detection devices; and recognizing skeleton information of the target person by optimizing, based on the integrated three-dimensional point cloud and a three-dimensional model that represents the target person and the target object that is in contact with the target person, an objective function that represents matching between coordinates of the integrated three-dimensional point cloud and surface coordinates of the three-dimensional model and by obtaining a joint angle of the target person. The objective function is a first objective function that includes a function based on a distance between a hand end of the target person and the target object in a case where the distance between the hand end of the target person and the target object is less than or equal to a certain length.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
With the current skeleton recognition technique, even when a hand end of a target person and a target object are actually in contact with each other, they may be recognized as not being in contact with each other in some cases.
In one aspect, it is an object of the present disclosure to improve the accuracy of recognition of a contact between a hand end of a target person and a target object.
By using a plurality of detection devices 32, the point cloud generation unit 12 measures distances from the detection devices 32 to a target person and to a target object and generates depth images. The detection devices 32 may be, for example, three-dimensional laser sensors. The three-dimensional laser sensors may be Micro Electro Mechanical Systems (MEMS) mirror type laser sensors that employ Light Detection and Ranging (LiDAR) technology. The target person may be, for example, a gymnast. The target object may be, for example, a gymnastics apparatus. In the present embodiment, the gymnastics apparatus is a horizontal bar.
Based on time periods from when a laser pulse is projected from a light projecting unit of each of the plurality of detection devices 32 to when reflected light reflected by the target person and reflected light reflected by the target object are received by a light-receiving unit, the point cloud generation unit 12 measures distances to the target person and to the target object and generates a depth image. The point cloud generation unit 12 generates three-dimensional point clouds from the respective depth images each generated using a corresponding one of the plurality of detection devices 32, and by integrating the generated three-dimensional point clouds, generates an integrated three-dimensional point cloud.
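As a rough illustration of this step, back-projecting each depth image through a pinhole camera model and transforming each sensor's cloud into a shared world frame with its extrinsic pose yields the integrated three-dimensional point cloud. This is only a sketch under common assumptions; the intrinsic parameters (fx, fy, cx, cy), pose representation, and function names are illustrative, not taken from the actual implementation:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into sensor-frame 3D points
    using a pinhole model; pixels with zero depth are discarded."""
    v, u = np.nonzero(depth)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def integrate_clouds(clouds, poses):
    """Transform each sensor's cloud into the shared world frame with its
    extrinsic pose (R, t) and concatenate into one integrated cloud."""
    world = [pts @ R.T + t for pts, (R, t) in zip(clouds, poses)]
    return np.concatenate(world, axis=0)
```

In practice the extrinsic poses would come from calibrating the plurality of detection devices 32 against a common reference before measurement.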
To obtain multi-viewpoint depth images of the target person and the target object, the plurality of detection devices 32 are used as illustrated in
By combining, for example, skeleton recognition and fitting, the skeleton recognition unit 14 extracts three-dimensional coordinates of each joint that constitutes the human body, from the integrated three-dimensional point cloud generated by the point cloud generation unit 12. In skeleton recognition, the three-dimensional skeleton coordinates are inferred by using, for example, a trained inference model. The inference model may be created on, for example, a convolutional-neural-network-based (CNN-based) deep learning network.
In fitting, by using a result of fitting in the previous frame or the like as an initial value, a three-dimensional model that represents the target person and the target object is applied to the integrated three-dimensional point cloud generated by the point cloud generation unit 12. By defining an objective function that represents a likelihood representing a degree of matching between coordinates of the integrated three-dimensional point cloud and surface coordinates of the three-dimensional model and by determining joint angles with the highest likelihood through optimization, three-dimensional skeleton coordinates are determined. In the example in
As illustrated in
Because there is a state in which the target person and the target object are not in contact with each other, a model in which the three-dimensional model of the human body and the three-dimensional model of the bar member are not coupled to each other is used. The expression “be in contact” refers to a state in which the target person and the target object are coupled to each other, and encompasses, for example, a state in which the target person is gripping the target object.
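The degree-of-agreement term of Equation (1) can be read as a fitting cost to be minimized. A minimal sketch, assuming the model surface is approximated by a sampled point set (the function name and the choice of mean squared nearest-point distance are illustrative, not the actual objective):

```python
import numpy as np

def point_model_cost(cloud, model_surface):
    """Degree-of-agreement term of Equation (1), read here as a fitting
    cost: mean squared distance from each cloud point to its nearest
    sampled model-surface point (smaller means better agreement)."""
    # pairwise squared distances, shape (N_cloud, N_model)
    d2 = ((cloud[:, None, :] - model_surface[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean()
```

A production implementation would use a spatial index (e.g., a k-d tree) rather than the dense pairwise distances shown here.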
The skill recognition unit 16 recognizes a break between basic moves from time-series data of the three-dimensional skeleton coordinates, which is a result of the fitting, and determines a feature quantity and a basic move for each divisional piece of the time-series data. The break between basic moves, the feature quantity, the basic moves, and the like are determined based on rules or through machine learning. The skill recognition unit 16 recognizes basic skills by using, as a parameter, the feature quantity related to the basic moves, and recognizes skill information subjected to scoring by comparing the consecutive basic skills with a skill dictionary 34, which is a database created in advance.
The scoring support unit 18 generates, for example, a multi-angle view illustrated in
In the multi-angle view, the three-dimensional skeleton coordinates may be displayed from viewpoints such as front, side, and plan, for example. In the skill recognition view, for example, the time-series skill recognition result, the group number of the skill, the difficulty of the skill, the difficulty value point, the score indicating the difficulty of all the demonstrated skills, and the like may be displayed. As illustrated in
The objective function adjustment unit 22 initializes the objective function to an objective function equivalent to a second objective function represented, for example, by Equation (1). An equation that represents the degree of agreement between the integrated three-dimensional point cloud and the three-dimensional model (the degree of agreement between the point cloud and the model) may be determined based on an existing technique.
Objective function=(Degree of agreement between point cloud and model) (1)
When a distance d1 between the target object and a hand end of the left hand of the target person is less than or equal to a certain length, the objective function adjustment unit 22 adds a function f(d1) based on the distance between the target object and the hand end of the left hand of the target person to the initialized objective function as represented by Equation (2). In this manner, the objective function adjustment unit 22 adjusts the objective function to an objective function equivalent to a first objective function. This is done for correcting a measurement error because of which it is determined that the target object and the hand end of the left hand are not in contact with each other despite the fact that they are in contact with each other.
Objective function=(Degree of agreement between point cloud and model)+f(d1) (2)
In the case of a horizontal bar event, for example, a measurement error because of which it is determined that a bar member is not gripped by the left hand of the athlete despite the fact that the bar member is gripped by the left hand of the athlete is corrected.
When a distance d2 between the target object and a hand end of the right hand of the target person is less than or equal to the certain length, the objective function adjustment unit 22 adds a function f(d2) based on the distance between the target object and the hand end of the right hand of the target person to the initialized objective function as represented by Equation (3). In this manner, the objective function adjustment unit 22 adjusts the objective function to an objective function equivalent to the first objective function. This is done for correcting a measurement error because of which it is determined that the target object and the hand end of the right hand are not in contact with each other despite the fact that they are in contact with each other.
Objective function=(Degree of agreement between point cloud and model)+f(d2) (3)
In the case of the horizontal bar event, for example, a measurement error because of which it is determined that the bar member is not gripped by the right hand of the athlete despite the fact that the bar member is gripped by the right hand of the athlete is corrected.
When the distance d1 between the target object and the hand end of the left hand of the target person and the distance d2 between the target object and the hand end of the right hand of the target person are less than or equal to the certain length, the objective function adjustment unit 22 adds the function f(d1) and the function f(d2) to the initialized objective function as represented by Equation (4). In this manner, the objective function adjustment unit 22 adjusts the objective function to an objective function equivalent to the first objective function. This is done for correcting a measurement error because of which it is determined that the target object and the hand ends of both hands are not in contact with each other despite the fact that they are in contact with each other. In the case of the horizontal bar event, for example, a measurement error because of which it is determined that the bar member is not gripped by both hands of the athlete despite the fact that the bar member is gripped by both hands of the athlete is corrected.
Objective function=(Degree of agreement between point cloud and model)+f(d1)+f(d2) (4)
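The case analysis behind Equations (1) to (4) can be sketched as follows, treating the degree-of-agreement term as a base cost and adding a distance-based penalty per hand end. The function name and the use of f(d) = d² are illustrative assumptions for this sketch:

```python
def adjusted_objective(base_cost, d1, d2, certain_length):
    """Equations (1)-(4): start from the degree-of-agreement term and,
    for each hand end whose distance to the target object is less than
    or equal to the certain length, add f(d) (here sketched as d**2)."""
    f = lambda d: d ** 2
    cost = base_cost
    if d1 <= certain_length:
        cost += f(d1)   # Equation (2): left hand end close to the object
    if d2 <= certain_length:
        cost += f(d2)   # Equation (3): right hand end close to the object
    return cost         # both branches taken corresponds to Equation (4)
```

With both distances above the certain length, neither branch fires and the result reduces to Equation (1).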
Let d denote a distance between the bar member B and a hand end H of an athlete. Then, a function f(d) based on the distance d between the bar member B and the hand end H may be calculated using Equation (5) as an example.
f(d) = d² = h·h − (h·e)² (5)
As illustrated in
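Assuming Equation (5) denotes the squared perpendicular distance from the hand end H to the axis of the bar member B, with h the vector from a point on the axis to H and e the unit direction vector of the axis, it can be computed as below (names are illustrative):

```python
import numpy as np

def f_squared_distance(hand, bar_point, bar_dir):
    """Equation (5): squared perpendicular distance d**2 from hand end H
    to the axis of bar member B, where h is the vector from a point on
    the axis to H and e is the unit axis direction:
    d**2 = h.h - (h.e)**2."""
    e = np.asarray(bar_dir, dtype=float)
    e = e / np.linalg.norm(e)
    h = np.asarray(hand, dtype=float) - np.asarray(bar_point, dtype=float)
    return float(h @ h - (h @ e) ** 2)
```

Because d² vanishes only when the hand end lies on the bar axis, minimizing the adjusted objective pulls the fitted hand end toward the bar member.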
By performing fitting for applying the three-dimensional model to the three-dimensional point cloud and by determining, with the adjusted objective function, joint angles with the highest likelihood through optimization, the optimization unit 24 determines the three-dimensional skeleton coordinates.
When the distance d1 between the target object and the hand end of the left hand of the target person exceeds the certain length and the distance d2 between the target object and the hand end of the right hand of the target person exceeds the certain length, the objective function adjustment unit 22 does not adjust the initialized objective function represented by Equation (1). By performing fitting for applying the three-dimensional model to the three-dimensional point cloud and by determining, with the not-adjusted objective function, joint angles with the highest likelihood through optimization, the optimization unit 24 determines the three-dimensional skeleton coordinates.
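The optimization over joint angles can be sketched on a toy planar two-link "skeleton". Everything here is an illustrative stand-in: the real human body model has many more joints, and the actual optimizer is unspecified, so SciPy's general-purpose minimizer is used merely as an example:

```python
import numpy as np
from scipy.optimize import minimize

def arm_points(angles, link=1.0):
    """Toy forward kinematics: the two joint positions of a planar
    two-link arm with the given joint angles."""
    a1, a2 = angles
    p1 = np.array([np.cos(a1), np.sin(a1)]) * link
    p2 = p1 + np.array([np.cos(a1 + a2), np.sin(a1 + a2)]) * link
    return np.stack([p1, p2])

def fit_angles(cloud, init=(0.0, 0.0)):
    """Fitting: minimize the point-to-model cost over joint angles,
    starting from the previous frame's result as the initial value."""
    cost = lambda a: ((arm_points(a) - cloud) ** 2).sum()
    return minimize(cost, init, method="Nelder-Mead").x
```

Using the previous frame's angles as `init`, as the embodiment describes, keeps the local search near the correct pose from frame to frame.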
In the present embodiment, the accuracy of recognition of a contact may be improved by adjusting the objective function when the distance between the hand end of the target person and the target object is small, for example, is less than or equal to the certain length. For example, the certain length may be 20 cm to 30 cm.
In the present embodiment, for each of the left hand and the right hand, the objective function is adjusted by adding the function based on the distance between the target object and the hand end of the target person when the distance between the target object and the hand end of the target person is less than or equal to the certain length. Thus, the objective function may be applied in any of the case where the hand end of one of the hands is in contact with the target object, the case where both hand ends are in contact with the target object, and the case where neither hand end is in contact with the target object. For example, the objective function may be applied in any of the case where the bar member is gripped by the athlete with the hand end of one of the hands, the case where the bar member is gripped with both hand ends, and the case where the bar member is gripped with neither hand end.
The detection devices 32 project laser light onto the target person and the target object. When only part of a spot, which is a cross section of the laser, hits the target person or the target object, the remaining part of the spot hits a different object located farther from the detection devices 32 than the target object. As a result, the target person and the target object may be recognized as being located farther from the detection devices 32 than they actually are in some cases.
For example, as illustrated in
The CPU 52 is an example of a processor that is hardware. The CPU 52, the RAM 54, the SSD 56, and the external interface 58 are coupled to each other through a bus 72. The CPU 52 may be a single processor or may be a plurality of processors. In place of the CPU 52, for example, a graphics processing unit (GPU) may be used.
The RAM 54 is a volatile memory and is an example of a primary storage device. The SSD 56 is a nonvolatile memory and is an example of a secondary storage device. The secondary storage device may be a hard disk drive (HDD) or the like in addition to or instead of the SSD 56.
The secondary storage device includes a program storage area, a data storage area, and so on. The program storage area stores a program such as an artistic gymnastics scoring support program as an example. The data storage area may store, for example, three-dimensional point cloud data, a skill dictionary, artistic gymnastics scoring results, and so on.
By loading the program such as the artistic gymnastics scoring support program from the program storage area and executing the program through the RAM 54, the CPU 52 operates as the point cloud generation unit 12, the skeleton recognition unit 14, the skill recognition unit 16, and the scoring support unit 18 illustrated in
The program such as the artistic gymnastics scoring support program may be stored in an external server and may be loaded by the CPU 52 via a network. The program such as the artistic gymnastics scoring support program may be recorded on a non-transitory recording medium such as a Digital Versatile Disc (DVD) and may be loaded by the CPU 52 through a recording medium reading device.
An external device is coupled to the external interface 58. The external interface 58 is responsible for transmission and reception of various kinds of information between the external device and the CPU 52.
By defining an objective function that represents a likelihood representing a degree of matching between coordinates of the integrated three-dimensional point cloud and surface coordinates of the three-dimensional model of the athlete and by determining, through optimization, joint angles with the highest likelihood, the CPU 52 determines three-dimensional skeleton coordinates. In step 108, the CPU 52 recognizes basic skills from time-series data of the three-dimensional skeleton coordinates obtained in step 106, and recognizes skills subjected to scoring by comparing the skills with the skill dictionary 34 in time series. In step 110, the CPU 52 performs scoring by using the skill recognition result or the like obtained in step 108. In step 112, the CPU 52 displays, on the display 64, the multi-angle view, the skill recognition view, and the like for supporting a judge in scoring.
In step 114, the CPU 52 determines whether or not a distance between the bar member and the hand end of the left hand in the integrated three-dimensional point cloud of the previous frame obtained by the three-dimensional laser sensors 62 is less than or equal to a certain length. If the determination in step 114 is positive, the CPU 52 adjusts the objective function by adding a function based on the distance between the bar member and the hand end of the left hand to the objective function as represented, for example, by Equation (2). If the determination in step 114 is negative, the objective function is not adjusted.
In step 118, the CPU 52 determines whether or not a distance between the bar member and the hand end of the right hand in the previous frame is less than or equal to the certain length. If the determination in step 118 is positive, the CPU 52 adds a function based on the distance between the bar member and the hand end of the right hand to the objective function as represented, for example, by Equation (3) or Equation (4). If the determination in step 114 is negative and the determination in step 118 is positive, the objective function is adjusted as represented, for example, by Equation (3). If the determination in step 114 and the determination in step 118 are positive, the objective function is adjusted as represented, for example, by Equation (4).
If the determination in step 114 and the determination in step 118 are negative, the objective function is not adjusted. In step 122, the CPU 52 determines the three-dimensional skeleton coordinates of the athlete by optimizing the objective function that is adjusted or not adjusted in steps 114 to 120. The processing in steps 112 to 122 is applied to each frame obtained by the three-dimensional laser sensors 62.
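The branching in steps 114 to 120 selects which form of the objective function is optimized for the current frame from the previous frame's hand-to-bar distances. A sketch of that selection logic (the function name and the string return values are illustrative):

```python
def choose_equation(d1_prev, d2_prev, certain_length):
    """Steps 114-120: pick which form of the objective function to
    optimize for the current frame, based on the previous frame's
    left-hand (d1_prev) and right-hand (d2_prev) distances to the bar."""
    left = d1_prev <= certain_length
    right = d2_prev <= certain_length
    if left and right:
        return "Equation (4)"   # both hand ends near the bar
    if left:
        return "Equation (2)"   # only the left hand end near the bar
    if right:
        return "Equation (3)"   # only the right hand end near the bar
    return "Equation (1)"       # neither hand end near the bar
```

Because the decision uses the previous frame, the adjustment is available before the current frame's fitting runs.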
The present embodiment is not limited to the scoring support apparatus for the horizontal bar event of gymnastics, and may be applied to scoring support and training support of various sports. The present embodiment may be applied to creation of entertainment materials such as movies, skill analysis in handicrafts or the like, training support, and so on.
The present embodiment is not limited to improvement of the accuracy of recognition of a contact between a hand end of a target person and a target object. For example, the present embodiment may be applied to improvement of the accuracy of recognition of a contact between a foot end of a target person and a target object, improvement of the accuracy of recognition of a contact between hand ends of a target person, improvement of the accuracy of recognition of a contact between hand ends of two or more target persons, and so on.
In the present embodiment, an integrated three-dimensional point cloud is obtained by integrating three-dimensional point clouds obtained by detecting a target person and a target object that is in contact with the target person from a plurality of directions with a plurality of detection devices. Skeleton information of the target person is recognized by optimizing, based on the integrated three-dimensional point cloud and a three-dimensional model that represents the target person and the target object, an objective function that represents matching between coordinates of the integrated three-dimensional point cloud and surface coordinates of the three-dimensional model and by obtaining a joint angle of the target person. The skeleton information of the target person is recognized by performing optimization using, as the objective function, a first objective function that includes a function based on a distance between a hand end of the target person and the target object in a case where the distance between the hand end of the target person and the target object is less than or equal to a certain length.
According to the present embodiment, the accuracy of recognition of a contact between a hand end of a target person and a target object may be improved.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind
---|---|---|---
2021-055639 | Mar 2021 | JP | national