The present application claims priority under 35 U.S.C. § 119 to Japanese Patent Application No. 2023-112465, filed Jul. 7, 2023, the contents of which application are incorporated herein by reference in their entirety.
The present disclosure relates to a technique suitable for use in estimating a joint pose of an object including a human.
JP6433149B discloses a prior art technique for estimating the pose of an articulated object. According to this prior art, joint position candidates of the object are first calculated from a range image including the object. Next, the consistency of the arrangement of the joint position candidates is evaluated based on tolerance information on the arrangement relationship between joints in an articulated object model corresponding to the object. Then, based on the evaluation, the joint positions of the object are determined from the joint position candidates to estimate the pose of the object.
However, occlusion and truncation are not considered in the above-mentioned prior art. Therefore, in a difficult situation including occlusion or truncation, the estimation accuracy of the joint position may be reduced.
JP4709723B and JP5555207B can be exemplified as documents showing the technical level of the technical field related to the present disclosure in addition to JP6433149B.
The present disclosure has been made in view of the above-described problems, and an object thereof is to provide a technique capable of accurately estimating a two-dimensional joint pose of an object even in a difficult situation including occlusion and truncation.
The present disclosure provides a joint pose estimation technique to achieve the above object. The joint pose estimation technique according to the present disclosure includes a joint pose estimation method, a joint pose estimation system, and a joint pose estimation program.
The joint pose estimation method of the present disclosure includes the following first to fourth steps. The first step is a step of estimating a two-dimensional joint pose of a target object belonging to an articulated object represented by a plurality of joints from an image of the target object. A two-dimensional joint pose model is used to estimate the two-dimensional joint pose. The two-dimensional joint pose model is a model configured to output a two-dimensional joint pose of an articulated object and a confidence score of the estimated position of each joint upon input of an image of the articulated object. The second step is a step of generating a query pose by removing a joint whose confidence score is lower than a threshold from the two-dimensional joint pose of the target object. The third step is a step of obtaining a sample two-dimensional joint pose closest to the query pose from a database in which a plurality of sample two-dimensional joint poses is registered for each basic joint pose of a sample articulated object. The fourth step is a step of correcting the two-dimensional joint pose of the target object by replacing the joint whose confidence score is lower than the threshold with a corresponding joint of the sample two-dimensional joint pose closest to the query pose.
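The data flow of the four steps above can be sketched in Python as follows. The model interface, the `closest` database method, and the threshold value of 0.5 are hypothetical placeholders introduced for illustration only; this is a minimal sketch of the method, not the claimed implementation.

```python
import numpy as np

CONF_THRESHOLD = 0.5  # assumed value; the disclosure leaves the threshold unspecified


def estimate_joint_pose(image, model, database, threshold=CONF_THRESHOLD):
    """Steps 1-4 of the method: estimate, filter, search, correct."""
    # Step 1: the 2D joint pose model returns joint positions and confidences.
    joints, scores = model(image)            # joints: (J, 2), scores: (J,)

    # Step 2: build the query pose by keeping only high-confidence joints.
    keep = scores >= threshold               # boolean mask of reliable joints

    # Step 3: fetch the closest sample 2D joint pose, comparing only kept joints.
    sample = database.closest(joints, keep)  # sample: (J, 2)

    # Step 4: replace low-confidence joints with the sample's corresponding joints.
    corrected = np.where(keep[:, None], joints, sample)
    return corrected
```

Note that joints with a high confidence score pass through unchanged; only the removed joints are taken from the matched sample pose.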
The joint pose estimation system of the present disclosure comprises at least one processor and a program memory coupled to the at least one processor and storing a plurality of instructions. The plurality of instructions is configured to cause the at least one processor to execute the following first to fourth processes. The first process is a process of estimating a two-dimensional joint pose of a target object belonging to an articulated object represented by a plurality of joints from an image of the target object by using a two-dimensional joint pose model. The second process is a process of generating a query pose by removing a joint whose confidence score is lower than a threshold from the two-dimensional joint pose of the target object. The third process is a process of obtaining a sample two-dimensional joint pose closest to the query pose from a database in which a plurality of sample two-dimensional joint poses is registered for each basic joint pose of a sample articulated object. The fourth process is a process of correcting the two-dimensional joint pose of the target object by replacing the joint whose confidence score is lower than the threshold with a corresponding joint of the sample two-dimensional joint pose closest to the query pose.
The joint pose estimation program according to the present disclosure comprises a plurality of instructions executable by at least one processor. The plurality of instructions is configured to cause the at least one processor to perform the following first to fourth processes. The first process is a process of estimating a two-dimensional joint pose of a target object belonging to an articulated object represented by a plurality of joints from an image of the target object by using a two-dimensional joint pose model. The second process is a process of generating a query pose by removing a joint whose confidence score is lower than a threshold from the two-dimensional joint pose of the target object. The third process is a process of obtaining a sample two-dimensional joint pose closest to the query pose from a database in which a plurality of sample two-dimensional joint poses is registered for each basic joint pose of a sample articulated object.
The fourth process is a process of correcting the two-dimensional joint pose of the target object by replacing the joint whose confidence score is lower than the threshold with a corresponding joint of the sample two-dimensional joint pose closest to the query pose. The joint pose estimation program according to the present disclosure may be stored in a non-transitory computer-readable storage medium or may be provided via a network.
According to the joint pose estimation technique of the present disclosure, among the joints constituting the two-dimensional joint pose estimated using the two-dimensional joint pose model, the joint having a high confidence score of the estimated position maintains the original position, and only the joint having a low confidence score of the estimated position is replaced with the corresponding joint of the closest sample two-dimensional joint pose. This makes it possible to accurately estimate the two-dimensional joint pose of the target object even in a difficult situation such as occlusion or truncation.
First, the 2D joint pose estimation unit 110 will be described. A 2D joint pose model 120 is used to estimate the 2D joint pose. The 2D joint pose model 120 is a model (neural network) used for estimating a 2D joint pose of an articulated object. The 2D joint pose model 120 is prepared for each type of target object. In the following description, it is assumed that the target object is a human and that the 2D joint pose model 120 has been trained to estimate the 2D joint pose of a human.
As the 2D joint pose model 120, a joint position model based on a top-down approach is particularly effective. The architectures TransPose, AlphaPose, ViTPose, RMPE, and Location-free Human Pose Estimation described in the following papers are examples of top-down joint position models that can be used as the 2D joint pose model 120.
According to a 2D joint pose model 120 such as the TransPose architecture, the joint positions of an articulated object can be estimated very accurately under normal circumstances. A normal situation means a situation without occlusion or truncation. However, under difficult conditions including occlusion and truncation, the accuracy of joint position estimation by the 2D joint pose model 120 is reduced.
As illustrated in
Here, a sample of the articulated object to which the target object belongs is defined as a sample articulated object. Although the joint poses that the sample articulated object can take are infinite, they can be classified three-dimensionally into a finite number of basic joint poses. The joint pose of the target object corresponds to one of the basic joint poses of the sample articulated object. Therefore, even if a part of the target object is hidden in the image due to occlusion or truncation, the correct 2D joint pose of the target object can be estimated from the basic joint pose with high accuracy, provided that the basic joint pose of the sample articulated object corresponding to the joint pose of the target human is known.
In the joint pose estimation method according to the present embodiment, a large number of sample 2D joint poses with different viewpoints are prepared for one basic joint pose by projecting the basic joint pose of the sample articulated object onto a plane from various directions. Each sample 2D joint pose is registered in the database in association with the corresponding basic joint pose. In the joint pose estimation method according to the present embodiment, a query pose generated from an output of the 2D joint pose model 120 is used as query information for searching the database.
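The viewpoint sampling described above can be illustrated as follows. An orthographic projection and rotation about the vertical axis are assumptions made for this sketch; the disclosure does not fix the camera model or the set of projection directions.

```python
import numpy as np


def sample_2d_poses(pose_3d, num_views=70):
    """Project one 3D basic joint pose onto a plane from many azimuth angles.

    pose_3d: (J, 3) joint coordinates of a sample articulated object.
    Returns an array of shape (num_views, J, 2): one sample 2D joint pose
    per viewpoint. Orthographic projection is assumed here for simplicity.
    """
    views = []
    for angle in np.linspace(0.0, 2.0 * np.pi, num_views, endpoint=False):
        c, s = np.cos(angle), np.sin(angle)
        # rotate about the vertical (y) axis, then drop the depth coordinate
        rot = np.array([[c, 0.0, s],
                        [0.0, 1.0, 0.0],
                        [-s, 0.0, c]])
        rotated = pose_3d @ rot.T
        views.append(rotated[:, :2])  # orthographic projection onto the x-y plane
    return np.stack(views)
```

Each projected pose would then be registered in the database in association with the basic joint pose it was generated from.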
The query pose is generated by the query pose generation unit 130 shown in
The query pose generated by the query pose generation unit 130 is sent to the sample 2D joint pose obtainment unit 140. The sample 2D joint pose obtainment unit 140 includes a 2D/3D database 150. In the 2D/3D database 150, a three-dimensional (3D) pose representing a basic joint pose of a sample articulated object and a 2D pose representing a sample 2D joint pose are registered. Each piece of 3D pose data is associated with corresponding 2D pose data. For example, 100 sets of 3D pose information are registered in the 2D/3D database 150, and 70 sets of 2D pose information are registered for each set of 3D pose information, that is, a total of 7000 sets of 2D pose information are registered. The sample 2D joint pose obtainment unit 140 searches the 2D/3D database 150 using the query pose as query information, and obtains a sample 2D joint pose closest to the query pose from the 2D/3D database 150.
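One plausible reading of this database search is a nearest-neighbor comparison restricted to the joints retained in the query pose, sketched below. The Euclidean distance metric and the array layout are assumptions for illustration; the disclosure does not specify how "closest" is measured.

```python
import numpy as np


def closest_sample(query, keep, samples_2d):
    """Nearest-neighbor search over the registered sample 2D joint poses.

    query:      (J, 2) estimated joint positions of the query pose.
    keep:       (J,)  boolean mask of joints retained in the query pose.
    samples_2d: (N, J, 2) all sample 2D joint poses in the 2D/3D database.

    Only the joints present in the query contribute to the distance, so a
    pose with occluded or truncated joints can still be matched.
    """
    diffs = samples_2d - query[None, :, :]        # (N, J, 2)
    dists = np.linalg.norm(diffs, axis=2)         # (N, J) per-joint distances
    masked = (dists * keep[None, :]).sum(axis=1)  # ignore removed joints
    return int(np.argmin(masked))                 # index of the closest sample
```

Returning the index lets the caller retrieve both the matched 2D pose and, via the association in the database, the corresponding 3D basic joint pose.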
The sample 2D joint pose obtained by the sample 2D joint pose obtainment unit 140 is used to correct the estimated 2D joint pose obtained by the 2D joint pose estimation unit 110. Hereinafter, the sample 2D joint pose used for correcting the estimated 2D joint pose is referred to as a correction 2D joint pose 14. The correction of the estimated 2D joint pose is performed by the 2D joint pose correction unit 160 shown in
The position of each joint in the correction 2D joint pose is a natural joint position corresponding to a basic joint pose that the object can actually take. By correcting only a joint having a low confidence score of the estimated position to such a natural position while maintaining the original position of a joint having a high confidence score of the estimated position in the estimated 2D joint pose, the plausibility of the estimated 2D joint pose as a whole is improved. By performing such correction, it is possible to estimate the correct 2D joint pose of the target human with high accuracy even if a part of the target human is hidden by occlusion or truncation in the original image 10. The joint pose estimation system 100 outputs the estimated 2D joint pose corrected by the 2D joint pose correction unit 160 as the 2D joint pose 15 of the target human.
Next, a specific flow of the joint pose estimation method performed by the joint pose estimation system 100 will be described with reference to
In step A, in the image 10 of the target human input to the 2D joint pose model 120, the left leg of the target human is hidden by an obstacle and is not shown. In other words, occlusion occurs in the image 10. A key point heat map 11 corresponding to the image 10 is output from the 2D joint pose model 120.
In step B, an estimated 2D joint pose 12 is created from the keypoint heat map 11. Each joint of the estimated 2D joint pose 12 is provided with data indicating the confidence score of the estimated position. In the example shown in
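A common way to obtain joint positions and confidence scores from a keypoint heat map, consistent with step B, is to take the per-channel peak: the peak location gives the estimated position and the peak value is taken as the confidence score. This decoding convention is an assumption made for the sketch; the disclosure does not prescribe how the estimated 2D joint pose 12 is read off the heat map 11.

```python
import numpy as np


def decode_heatmap(heatmaps):
    """Read joint positions and confidence scores off a keypoint heat map.

    heatmaps: (J, H, W), one channel per joint, as output by top-down
    models. Returns joints as (J, 2) in (x, y) order and scores as (J,).
    """
    num_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)                          # peak index per channel
    ys, xs = np.unravel_index(idx, (h, w))
    joints = np.stack([xs, ys], axis=1).astype(float)  # (J, 2) as (x, y)
    scores = flat.max(axis=1)                          # peak value per joint
    return joints, scores
```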
In step C, a query pose 13 is generated by removing, from the estimated 2D joint pose 12, the joints whose confidence scores are lower than the threshold.
In step D, the 2D/3D database 150 is searched by nearest neighbor search using the normalized query pose 13 as query information.
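One possible normalization of the query pose for step D centers the retained joints and removes scale, so that the search is insensitive to where the target appears in the image and how large it is. This particular scheme is an assumption; the disclosure only states that a normalized query pose is used as query information.

```python
import numpy as np


def normalize_query_pose(joints, keep):
    """Normalize a query pose before the nearest-neighbor search.

    joints: (J, 2) joint positions; keep: (J,) mask of retained joints.
    Centering on the mean of the retained joints and dividing by their
    spread removes translation and scale differences between the query
    and the registered sample 2D joint poses.
    """
    kept = joints[keep]
    center = kept.mean(axis=0)
    scale = np.linalg.norm(kept - center, axis=1).max()
    scale = scale if scale > 0 else 1.0  # guard against a degenerate pose
    return (joints - center) / scale
```

For the search to be meaningful, the sample 2D joint poses in the database would be normalized in the same way.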
In step E, the sample 2D joint pose obtained by searching the 2D/3D database 150 is obtained as the correction 2D joint pose 14. At this time, a 3D basic joint pose most related to the query pose 13 may be obtained together with the correction 2D joint pose 14.
In step F, among the joints constituting the estimated 2D joint pose 12, the joints whose confidence scores are lower than the threshold, that is, the joints removed in the query pose 13, are replaced with the corresponding joints in the correction 2D joint pose 14. As a result, the 2D joint pose 15 is obtained, in which not only the joints of the target human visible in the image 10 but also the position of the left leg hidden by the obstacle are estimated with high accuracy.
Finally, an example of a hardware configuration of the joint pose estimation system 100 according to the present embodiment will be described with reference to
The joint pose estimation system 100 includes a computer 200, a display device 220, and an input device 240. The computer 200 comprises a processor 202, a program memory 204, and a data storage 208. The processor 202 is coupled to the program memory 204 and the data storage 208.
The program memory 204 stores a plurality of executable instructions 206. The data storage 208 is, for example, a flash memory, a solid state drive (SSD), or a hard disk drive (HDD), and stores the image 10 and data required to execute the instructions 206. A portion of the data storage 208 stores the 2D/3D database 150.
Instructions 206 comprise a joint pose estimation program. When some or all of the instructions 206 are executed by the processor 202, the functions of the 2D joint pose estimation unit 110, the query pose generation unit 130, the sample 2D joint pose obtainment unit 140, and the 2D joint pose correction unit 160 are implemented in the computer 200.
The display device 220 displays a calculation result by the computer 200. The input device 240 is, for example, a keyboard or a mouse, and receives an operation on the computer 200. The joint pose estimation system 100 may be configured by a plurality of computers connected via a network or may be configured by a server on the Internet.
Number | Date | Country | Kind
---|---|---|---
2023-112465 | Jul 2023 | JP | national