The present invention relates to an image processing apparatus, an image processing method, and a program.
A technique related to the present invention is disclosed in Patent Documents 1 to 3 and Non-Patent Document 1.
Patent Document 1 discloses a technique for computing a feature value of each of a plurality of keypoints of a human body included in an image, searching, based on the computed feature values, for an image including a human body with a similar pose or a similar movement, and grouping and classifying similar poses and similar movements. Further, Non-Patent Document 1 discloses a technique related to skeleton estimation of a person.
Patent Document 2 discloses a technique in which a plurality of images capturing a predetermined area and information indicating a change in the situation of the predetermined area are acquired, the plurality of images are classified based on the information indicating the change in the situation, and a discriminator that decides the situation of the predetermined area from an image is trained by using at least a part of the plurality of images.
Patent Document 3 discloses a technique for detecting, based on an input image, a state change of a target in a person, and deciding that an abnormal state has occurred in response to detecting that the state change of the target has occurred in a plurality of people.
According to the technique disclosed in Patent Document 1 described above, a human body with a desired pose and a desired movement can be detected from an image being a processing target by preregistering, as a template image, an image including a human body with the desired pose and the desired movement. As a result of studying the technique disclosed in Patent Document 1, the present inventor has newly found that a human body with a desired pose and a desired movement can be detected without omission by additionally registering, as template images, images including human bodies with a variety of poses and movements that are not decided to be the same or the same kind as a pose or a movement indicated by a registered template image but that are similar to it. The present inventor has further found that there is room for improvement in the workability of the work of finding such images.
None of Patent Documents 1 to 3 and Non-Patent Document 1 described above discloses this problem related to template images or a solution to it, and therefore none of them can solve the problem described above.
In view of the problem described above, one example of an object of the present invention is to provide an image processing apparatus, an image processing method, and a program that improve the workability of the work of registering, as template images, images including human bodies with a variety of poses and movements that are not decided to be the same or the same kind as a pose or a movement indicated by a registered template image but that are similar to it.
One aspect of the present invention provides an image processing apparatus including:
Further, one aspect of the present invention provides an image processing method including,
Further, one aspect of the present invention provides a program causing a computer to function as:
According to one aspect of the present invention, there are provided an image processing apparatus, an image processing method, and a program that improve the workability of the work of registering, as template images, images including human bodies with a variety of poses and movements that are not decided to be the same or the same kind as a pose or a movement indicated by a registered template image but that are similar to it.
The above-described object, other objects, features, and advantages will become more apparent from the suitable example embodiments described below and the accompanying drawings.
Hereinafter, example embodiments of the present invention will be described with reference to the drawings. Note that, in all of the drawings, a similar component has a similar reference sign, and description thereof will be appropriately omitted.
The skeleton structure detection unit 11 performs processing of detecting a keypoint of a human body included in an image. The similarity degree computation unit 12 computes, based on the detected keypoint, a degree of similarity between a pose or a movement of a human body detected from the image and a pose or a movement of a human body indicated by a preregistered template image. The determination unit 13 determines a place in the image where a human body is captured whose degree of similarity to the pose or the movement of the human body indicated by every template image is less than a first threshold value and that satisfies a first similarity condition with respect to the pose or the movement of the human body indicated by any template image. The output unit 14 outputs information indicating the determined place, or a partial image acquired by cutting the place out of the image, as a candidate for a template image to be additionally registered in a decision apparatus, the decision apparatus deciding a pose or a movement of a human body detected from an image based on a pose or a movement of a human body indicated by a template image.
The image processing apparatus 10 can improve the workability of the work of registering, as template images, images including human bodies with a variety of poses and movements that are not decided to be the same or the same kind as a pose or a movement indicated by a registered template image but that are similar to it.
An image processing apparatus 10 computes a degree of similarity between a pose or a movement of a human body included in an image being an original of a template image (hereinafter simply referred to as an “image”) and a pose or a movement of a human body indicated by a preregistered template image. The image processing apparatus 10 then determines a place in the image where a human body is captured whose degree of similarity to the pose or the movement of the human body indicated by every template image is less than a first threshold value but that satisfies a first similarity condition with respect to the pose or the movement of the human body indicated by any template image. Then, the image processing apparatus 10 outputs information indicating the determined place, or a partial image acquired by cutting the determined place out of the image, as a candidate for a template image to be additionally registered in a decision apparatus. The decision apparatus performs detection processing using registered template images and the like, and, in a case where the above-described degree of similarity is equal to or more than the first threshold value, decides that the pose or the movement of the human body detected from the image is the same or the same kind as the pose or the movement of the human body indicated by the template image.
Such an image processing apparatus 10 can determine a place in an image where, among the human bodies detected from the image, a human body is captured that is not decided to have a pose or a movement that is the same or the same kind as a pose or a movement of a human body indicated by any template image but that has a similar pose or a similar movement, and can output information about the determined place. Description is given in more detail by using
In the second example embodiment, as illustrated in
Next, one example of a hardware configuration of the image processing apparatus 10 will be described. Each functional unit of the image processing apparatus 10 is achieved by any combination of hardware and software centered on a central processing unit (CPU) of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (and that can store, in addition to a program stored in advance at the stage of shipping the apparatus, a program downloaded from a storage medium such as a compact disc (CD) or from a server on the Internet), and a network connection interface. A person skilled in the art will understand that there are various modification examples of the method and apparatus for achieving these units.
The bus 5A is a data transmission path through which the processor 1A, the memory 2A, the peripheral circuit 4A, and the input/output interface 3A transmit and receive data to and from one another. The processor 1A is an arithmetic processing apparatus such as a CPU or a graphics processing unit (GPU), for example. The memory 2A is a memory such as a random access memory (RAM) or a read only memory (ROM), for example. The input/output interface 3A includes an interface for acquiring information from an input apparatus, an external apparatus, an external server, an external sensor, a camera, and the like, and an interface for outputting information to an output apparatus, an external apparatus, an external server, and the like. The input apparatus is, for example, a keyboard, a mouse, a microphone, a physical button, or a touch panel. The output apparatus is, for example, a display, a speaker, a printer, or a mailer. The processor 1A can output an instruction to each module and perform arithmetic operations based on the arithmetic results of the modules.
The skeleton structure detection unit 11 performs processing of detecting a keypoint of a human body included in an image.
An “image” is an image being an original of a template image. A template image is an image preregistered in the technique disclosed in Patent Document 1 described above, and includes a human body with a desired pose and a desired movement (a pose and a movement that a user desires to detect). The image may be a moving image formed of a plurality of frame images, or a still image formed of one image.
The skeleton structure detection unit 11 detects N (N is an integer of two or more) keypoints of a human body included in an image. In a case where a moving image is a processing target, the skeleton structure detection unit 11 performs processing of detecting a keypoint for each frame image. The processing by the skeleton structure detection unit 11 is achieved by using the technique disclosed in Patent Document 1. Although details will be omitted, in the technique disclosed in Patent Document 1, detection of a skeleton structure is performed by using a skeleton estimation technique such as OpenPose disclosed in Non-Patent Document 1. A skeleton structure detected in the technique is formed of a “keypoint” being a characteristic point such as a joint and a “bone (bone link)” indicating a link between keypoints.
For example, the skeleton structure detection unit 11 extracts, from an image, feature points that may be keypoints, and detects N keypoints of a human body by referring to information acquired by performing machine learning on images of keypoints. The N keypoints to be detected are predetermined. The number of detected keypoints (i.e., the value of N) and the portions of a human body detected as keypoints may vary, and various variations can be adopted.
Hereinafter, as illustrated in
Returning to
There are various ways of computing the degree of similarity of a pose or a movement of a human body described above, and various techniques can be adopted. For example, the technique disclosed in Patent Document 1 may be adopted. Alternatively, the same technique as that used by the decision apparatus may be adopted; the decision apparatus computes a degree of similarity between a pose or a movement of a human body indicated by a template image and a pose or a movement of a human body detected from an image, and detects a human body with the degree of similarity equal to or more than the first threshold value as a human body with a pose or a movement that is the same or the same kind as that of the human body indicated by the template image. One example is described below, but the computation is not limited thereto.
As one example, by computing a feature value of a skeleton structure indicated by a detected keypoint, and computing a degree of similarity between a feature value of a skeleton structure of a human body detected from an image and a feature value of a skeleton structure of a human body indicated by a template image, the similarity degree computation unit 12 may compute a degree of similarity between poses of the two human bodies.
The feature value of the skeleton structure indicates a feature of the skeleton of a person, and is an element for classifying a state (a pose and a movement) of the person based on the skeleton of the person. This feature value normally includes a plurality of parameters. The feature value may be a feature value of the entire skeleton structure, may be a feature value of a part of the skeleton structure, or may include a plurality of feature values, such as one for each portion of the skeleton structure. A feature value may be computed by any method, such as machine learning or normalization, and a minimum value and a maximum value may be obtained for the normalization. As one example, the feature value is a feature value acquired by performing machine learning on the skeleton structure, a size of the skeleton structure from the head to the feet on an image, a relative positional relationship among a plurality of keypoints in the up-down direction in a skeleton region including the skeleton structure on the image, a relative positional relationship among a plurality of keypoints in the left-right direction in the skeleton structure, and the like. The size of the skeleton structure is, for example, a height in the up-down direction or an area of a skeleton region including the skeleton structure on an image. The up-down direction (a height direction or a vertical direction) is the up-and-down direction (Y-axis direction) in an image, and is, for example, a direction perpendicular to the ground (reference surface). Further, the left-right direction (a horizontal direction) is the left-and-right direction (X-axis direction) in an image, and is, for example, a direction parallel to the ground.
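As an illustration only, the up-down relative positional relationship can be sketched in Python as the vertical offset of each keypoint from a reference keypoint. This sketch assumes image y-coordinates grow downward and divides each offset by a normalization length `height`; the function name, that divisor, and the pixel coordinates are hypothetical, chosen so that the output reproduces the neck-referenced example values given in this description, and this is not the normalization of Patent Document 1.

```python
from typing import Dict, Tuple


def vertical_feature_values(
    keypoints: Dict[str, Tuple[float, float]],
    reference: str,
    height: float,
) -> Dict[str, float]:
    """Up-down offset of each keypoint from the reference keypoint.

    Assumes y grows downward, so keypoints above the reference get negative
    values and keypoints below it get positive values. `height` is an assumed
    normalization length (hypothetical, for illustration).
    """
    if height <= 0:
        raise ValueError("height must be positive")
    _, y_ref = keypoints[reference]
    return {name: (y - y_ref) / height for name, (_, y) in keypoints.items()}


# Hypothetical (x, y) pixel coordinates of one standing person.
pose = {
    "A1": (100, 180),                      # head
    "A2": (100, 200),                      # neck (reference point)
    "A31": (80, 200), "A32": (120, 200),   # right / left shoulder
    "A51": (70, 240), "A52": (130, 240),   # right / left hand
    "A81": (90, 290), "A82": (110, 290),   # right / left foot
}
features = vertical_feature_values(pose, reference="A2", height=100.0)
```

Because only y-coordinates enter the computation, the result does not change when the person turns left or right in the image plane, which is in the spirit of the orientation-robust, up-down-only feature discussed next.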
Note that, in order to perform classification desired by a user, a feature value with robustness with respect to decision processing is preferably used. For example, in a case where a user desires decision that does not depend on an orientation and a body shape of a person, a feature value that is robust with respect to the orientation and the body shape of the person may be used. A feature value that does not depend on an orientation and a body shape of a person can be acquired by learning skeletons of persons facing in various directions with the same pose and skeletons of persons with various body shapes with the same pose, and extracting a feature only in the up-down direction of a skeleton. One example of the processing of computing a feature value of a skeleton structure is disclosed in Patent Document 1.
In this example, the feature value of each keypoint indicates a relative positional relationship among a plurality of keypoints in the up-down direction in a skeleton region including a skeleton structure on an image. Since the keypoint A2 of the neck is the reference point, the feature value of the keypoint A2 is 0.0, and the feature values of a keypoint A31 of the right shoulder and a keypoint A32 of the left shoulder, which are at the same height as the neck, are also 0.0. The feature value of a keypoint A1 of the head, which is higher than the neck, is −0.2. The feature values of a keypoint A51 of the right hand and a keypoint A52 of the left hand, which are lower than the neck, are 0.4, and the feature values of the keypoint A81 of the right foot and the keypoint A82 of the left foot are 0.9. In a case where the person raises the left hand from this state, the left hand is higher than the reference point as in
There are various ways of computing a degree of similarity of a pose indicated by such feature values. For example, after a degree of similarity between feature values is computed for each keypoint, a degree of similarity between poses may be computed based on the degrees of similarity between the feature values of the plurality of keypoints. For example, an average value, a maximum value, a minimum value, a mode, a median value, a weighted average value, a weighted sum, or the like of the degrees of similarity between the feature values of the plurality of keypoints may be computed as the degree of similarity between poses. In a case where a weighted average value or a weighted sum is computed, the weight of each keypoint may be set by a user or may be predetermined.
Further, a movement is represented as a time change in a plurality of poses. Thus, for example, the similarity degree computation unit 12 may compute a degree of similarity of a pose by the above-described technique for each combination of frame images associated with each other, and then compute, as a degree of similarity of a movement, a statistic (such as an average value, a maximum value, a minimum value, a mode, a median value, a weighted average value, or a weighted sum) of the degrees of similarity of the pose computed for the combinations of frame images.
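The per-keypoint-then-aggregate computation above can be sketched as follows. This is a hypothetical illustration, not the method of Patent Document 1: the per-keypoint similarity `1 - |a - b|` floored at 0, the method names, and the default weight of 1.0 for unweighted keypoints are all assumptions, and only a subset of the listed statistics is shown.

```python
import statistics
from typing import Dict, Optional


def keypoint_similarity(a: float, b: float) -> float:
    # Hypothetical per-keypoint similarity: 1.0 for identical feature values,
    # falling linearly to 0.0 once the values differ by 1.0 or more.
    return max(0.0, 1.0 - abs(a - b))


def pose_similarity(
    f1: Dict[str, float],
    f2: Dict[str, float],
    method: str = "mean",
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-keypoint similarities into one degree of similarity between poses."""
    names = sorted(f1.keys() & f2.keys())
    if not names:
        raise ValueError("the two poses share no keypoints")
    sims = [keypoint_similarity(f1[n], f2[n]) for n in names]
    if method == "mean":
        return statistics.mean(sims)
    if method == "median":
        return statistics.median(sims)
    if method == "min":
        return min(sims)
    if method == "max":
        return max(sims)
    if method in ("weighted_mean", "weighted_sum"):
        # Keypoints without an explicit weight default to 1.0 (an assumption).
        w = [(weights or {}).get(n, 1.0) for n in names]
        total = sum(wi * si for wi, si in zip(w, sims))
        if method == "weighted_sum":
            return total
        if sum(w) <= 0:
            raise ValueError("weights must sum to a positive value")
        return total / sum(w)
    raise ValueError(f"unknown method: {method}")


template = {"A1": -0.2, "A2": 0.0, "A52": 0.4}   # left hand below the neck
observed = {"A1": -0.2, "A2": 0.0, "A52": -0.1}  # left hand slightly raised
mean_sim = pose_similarity(template, observed)
weighted_sim = pose_similarity(template, observed, "weighted_mean", {"A52": 2.0})
```

Giving the left hand a weight of 2.0 makes the single differing keypoint count more, so the weighted average is lower than the plain average here.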
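A minimal sketch of the movement similarity, assuming the association between image frames and template frames has already been made and is given as a list of pairs. The function names are hypothetical, and `toy_pose_sim` is a stand-in in which each "pose" is a single number; in practice a pose similarity such as the per-keypoint aggregate above would be passed instead.

```python
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # any pose representation


def movement_similarity(
    pairs: Sequence[Tuple[P, P]],
    pose_sim: Callable[[P, P], float],
    statistic: Callable[[List[float]], float] = statistics.mean,
) -> float:
    """Degree of similarity of a movement from frame pairs that are already associated."""
    if not pairs:
        raise ValueError("at least one associated frame pair is required")
    per_frame = [pose_sim(a, b) for a, b in pairs]
    return statistic(per_frame)


def toy_pose_sim(a: float, b: float) -> float:
    # Hypothetical stand-in for a pose similarity: each "pose" is one number.
    return max(0.0, 1.0 - abs(a - b))


pairs = [(0.0, 0.0), (0.2, 0.4), (0.5, 0.5)]  # (image frame, template frame)
mean_sim = movement_similarity(pairs, toy_pose_sim)
worst_sim = movement_similarity(pairs, toy_pose_sim, statistic=min)
```

Passing the statistic as a parameter mirrors the text's list of alternatives (average, minimum, and so on) without fixing one choice.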
Returning to
First, processing of determining a human body (human body belonging to the groups of (2) and (3) in
The determination unit 13 compares, with the first threshold value, the degree of similarity between a pose or a movement of a human body detected from an image and the pose or the movement of the human body indicated by each of a plurality of template images. Then, based on a result of the comparison, the determination unit 13 determines a human body whose degree of similarity to the pose or the movement of the human body indicated by every template image is less than the first threshold value.
Note that the decision apparatus decides a pose or a movement of a human body detected from an image based on a pose or a movement of a human body indicated by a template image. Specifically, in a case where the above-described degree of similarity is equal to or more than the first threshold value, the decision apparatus decides that the pose or the movement of the human body detected from the image is the same or the same kind as the pose or the movement of the human body indicated by the template image. In other words, the above-described processing by the determination unit 13 determines a place in an image where, among the human bodies detected from the image, a human body is captured that is not decided by the decision apparatus to have a pose or a movement that is the same or the same kind as a pose or a movement of a human body indicated by any template image.
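The relationship between the decision apparatus and the determination unit 13 can be sketched as a pair of complementary checks. In this hypothetical Python sketch, each human body is identified by an ID and maps to its precomputed degrees of similarity to all template images; the names and values are illustrative assumptions.

```python
from typing import Dict, List


def decided_same(similarities: List[float], first_threshold: float) -> bool:
    # Decision apparatus rule: same or same kind if any template reaches T1.
    return any(s >= first_threshold for s in similarities)


def undecided_bodies(
    similarity_table: Dict[str, List[float]], first_threshold: float
) -> List[str]:
    """IDs of bodies whose similarity to every template is below T1."""
    return [
        body_id
        for body_id, sims in similarity_table.items()
        if all(s < first_threshold for s in sims)
    ]


# Hypothetical similarities of three detected bodies to two template images.
table = {"p1": [0.95, 0.40], "p2": [0.70, 0.55], "p3": [0.10, 0.20]}
remaining = undecided_bodies(table, first_threshold=0.9)
```

A body is undecided exactly when `decided_same` returns False, so the two functions partition the detected bodies.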
Next, processing of determining a place in an image where a human body (human body belonging to the group of (2) in
The determination unit 13 determines a human body belonging to the groups of (2) and (3) in
The first similarity condition includes at least one of
In a case where a plurality of the exemplified conditions described above are included, the first similarity condition can have a content in which the plurality of conditions are connected by a logical operator such as “or”. Each of the exemplified conditions is described below.
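Connecting conditions with “or” amounts to composing predicates. The Python sketch below is hypothetical: the combinator `any_of`, the record keys `full` and `upper_body`, and the threshold values 0.75, 0.9, and 0.95 are illustrative assumptions, not values from this description.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def any_of(conditions: Iterable[Callable[[T], bool]]) -> Callable[[T], bool]:
    """Connect several conditions with a logical "or"."""
    conds = list(conditions)

    def combined(item: T) -> bool:
        return any(cond(item) for cond in conds)

    return combined


# Hypothetical precomputed similarities of one human body to one template.
record: Dict[str, float] = {"full": 0.80, "upper_body": 0.92}

first_similarity_condition = any_of([
    lambda r: 0.75 <= r["full"] < 0.9,   # band condition (assumed T2=0.75, T1=0.9)
    lambda r: r["upper_body"] >= 0.95,   # partial-keypoint condition (assumed T3=0.95)
])
```

Other logical operators could be composed in the same way, for example with `all` in place of `any`.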
“A degree of similarity to a pose or a movement of a human body indicated by a template image is equal to or more than a second threshold value and is less than a first threshold value.”
A “degree of similarity” of the condition is a value computed by the same method as the computation method by the similarity degree computation unit 12 described above. Then, the second threshold value is a value smaller than the first threshold value.
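The first exemplified condition is a half-open band check, T2 ≤ similarity < T1, applied against each template. A minimal Python sketch, with hypothetical function names and illustrative threshold values:

```python
from typing import Iterable


def in_similarity_band(
    similarity: float, second_threshold: float, first_threshold: float
) -> bool:
    # T2 <= s < T1: similar, but not enough for the decision apparatus to match.
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold value must be smaller than the first")
    return second_threshold <= similarity < first_threshold


def satisfies_band_condition(
    similarities: Iterable[float], second_threshold: float, first_threshold: float
) -> bool:
    """True if the similarity to at least one template image lies in the band."""
    return any(
        in_similarity_band(s, second_threshold, first_threshold) for s in similarities
    )
```

The lower bound T2 is inclusive and the upper bound T1 is exclusive, so a similarity of exactly T1 is left to the decision apparatus.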
By appropriately setting the second threshold value, a human body (human body belonging to the group of (2) in
“A degree of similarity to a pose or a movement of a human body indicated by a template image, computed based on only some of the plurality of keypoints (N keypoints) detected from each human body, is equal to or more than a third threshold value.”
Which keypoints to use is a design matter, and the keypoints may, for example, be specified by a user. The user can specify keypoints of a body portion to be emphasized (for example, the upper body) and exclude from the specification keypoints of a body portion not to be emphasized (for example, the lower body).
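A hypothetical Python sketch of the partial-keypoint similarity. The selection `UPPER_BODY`, the per-keypoint score `1 - |a - b|`, and the feature values are illustrative assumptions: the observed person matches the template in the upper body but not in the feet (for example, while bending the knees).

```python
from typing import Dict, Iterable

# Hypothetical user selection: keypoints of the upper body only.
UPPER_BODY = ("A1", "A2", "A31", "A32", "A51", "A52")


def partial_pose_similarity(
    f1: Dict[str, float], f2: Dict[str, float], selected: Iterable[str]
) -> float:
    """Mean per-keypoint similarity over the selected keypoints only."""
    names = [n for n in selected if n in f1 and n in f2]
    if not names:
        raise ValueError("none of the selected keypoints is available")
    sims = [max(0.0, 1.0 - abs(f1[n] - f2[n])) for n in names]
    return sum(sims) / len(sims)


template = {"A1": -0.2, "A2": 0.0, "A31": 0.0, "A32": 0.0,
            "A51": 0.4, "A52": 0.4, "A81": 0.9, "A82": 0.9}
observed = dict(template, A81=0.5, A82=0.5)  # same upper body, feet differ

upper_sim = partial_pose_similarity(template, observed, UPPER_BODY)
full_sim = partial_pose_similarity(template, observed, template.keys())
```

Restricting the computation to the upper body ignores the differing feet, so the partial similarity can reach a third threshold value that the full-body similarity would miss.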
By appropriately setting the third threshold value, a human body (human body belonging to the group of (2) in
“A degree of similarity to a pose or a movement of a human body indicated by a template image computed in consideration of a weighted value provided to each of a plurality of keypoints detected from each human body is equal to or more than a fourth threshold value.”
A “degree of similarity” of the condition is a value computed by providing a weight to each of the plurality of keypoints (N keypoints) being a detection target. For example, after a degree of similarity between feature values is computed for each keypoint by adopting the same method as the computation method by the similarity degree computation unit 12 described above, a weighted average value or a weighted sum of the degrees of similarity between the feature values of the plurality of keypoints is computed, by using the above-described weights, as a degree of similarity between poses. The weight of each keypoint may be set by a user or may be predetermined.
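A hypothetical Python sketch of the weighted condition, using a weighted average. The per-keypoint score, the weights (3.0 for the upper body, 1.0 for the feet), and the fourth threshold value 0.95 are all illustrative assumptions.

```python
from typing import Dict


def weighted_pose_similarity(
    f1: Dict[str, float], f2: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Weighted average of per-keypoint similarities (hypothetical score 1 - |a - b|)."""
    names = [n for n in weights if n in f1 and n in f2]
    total_weight = sum(weights[n] for n in names)
    if total_weight <= 0:
        raise ValueError("the weights of the usable keypoints must sum to a positive value")
    weighted = sum(weights[n] * max(0.0, 1.0 - abs(f1[n] - f2[n])) for n in names)
    return weighted / total_weight


def satisfies_weighted_condition(
    f1: Dict[str, float], f2: Dict[str, float],
    weights: Dict[str, float], fourth_threshold: float,
) -> bool:
    return weighted_pose_similarity(f1, f2, weights) >= fourth_threshold


template = {"A1": -0.2, "A2": 0.0, "A31": 0.0, "A32": 0.0,
            "A51": 0.4, "A52": 0.4, "A81": 0.9, "A82": 0.9}
observed = dict(template, A81=0.5, A82=0.5)  # feet differ from the template

upper_heavy = {n: (1.0 if n in ("A81", "A82") else 3.0) for n in template}
equal = {n: 1.0 for n in template}
```

With equal weights the differing feet pull the similarity down to 0.9; emphasizing the upper body raises it to 0.96.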
By appropriately setting the fourth threshold value, a human body (human body belonging to the group of (2) in
“Including a plurality of frame images showing a human body in poses whose degrees of similarity to the poses of the human body indicated by a predetermined proportion or more of the plurality of frame images included in a template image being a moving image are equal to or more than a fifth threshold value.”
The condition is used in a case where an image and a template image are moving images, and a movement of a human body is indicated by a time change in the pose of the human body indicated by each of the plurality of frame images included in the moving image.
For example, in a case where a template image is formed of M frame images, a plurality of frame images showing a human body in poses similar, at a predetermined level or higher (i.e., with a degree of similarity equal to or more than the fifth threshold value), to the poses of the human body indicated by a predetermined proportion or more (for example, 70 percent or more) of the M frame images satisfies the condition. As a technique for computing a degree of similarity between poses for each combination of frame images associated with each other, the same method as the computation method by the similarity degree computation unit 12 described above can be adopted.
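A minimal Python sketch of the proportion check, assuming each of the M template frames has already been associated with an image frame and a pose similarity has been computed for every pair. The function name and the similarity values are hypothetical; the 70 percent proportion follows the example in the text.

```python
from typing import Sequence


def satisfies_frame_proportion(
    frame_similarities: Sequence[float],
    fifth_threshold: float,
    proportion: float,
) -> bool:
    """frame_similarities[i]: pose similarity between template frame i and its associated image frame."""
    m = len(frame_similarities)
    if m == 0:
        raise ValueError("the template must contain at least one frame")
    hits = sum(1 for s in frame_similarities if s >= fifth_threshold)
    return hits / m >= proportion


# Hypothetical per-frame similarities for a template of M = 10 frames.
sims_ok = [0.90, 0.85, 0.80, 0.95, 0.81, 0.82, 0.83, 0.50, 0.60, 0.70]  # 7 of 10
sims_ng = [0.90, 0.85, 0.79, 0.95, 0.81, 0.82, 0.83, 0.50, 0.60, 0.70]  # 6 of 10
```

With a fifth threshold value of 0.8, the first sequence reaches exactly 70 percent and satisfies the condition; lowering one frame to 0.79 drops it to 60 percent.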
By appropriately setting the fifth threshold value and the predetermined proportion, a human body (human body belonging to the group of (2) in
Note that, in a case where an image is a still image, a “place determined by the determination unit 13” is a partial region in one still image. In this case, for each still image, the above-described place is indicated by, for example, coordinates in a coordinate system set in the still image. On the other hand, in a case where an image is a moving image, a “place determined by the determination unit 13” is a partial region in each frame image being a part of a plurality of frame images in the moving image. In this case, for each moving image, the above-described place is indicated by, for example, information (such as frame identification information and an elapsed time from the beginning) indicating the frame image being a part of the plurality of frame images, and coordinates in a coordinate system set in the frame image.
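The two ways of indicating a place can be captured by a single record type. The Python dataclass below is a hypothetical sketch; the field names, the box convention (x_min, y_min, x_max, y_max), and the file names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Place:
    """A determined place: a box in a still image or in one frame of a moving image."""

    image_id: str                            # hypothetical identifier, e.g. a file name
    box: Tuple[int, int, int, int]           # (x_min, y_min, x_max, y_max) in that image/frame
    frame_index: Optional[int] = None        # frame identification; None for a still image
    elapsed_seconds: Optional[float] = None  # elapsed time from the beginning of the video

    def __post_init__(self) -> None:
        x_min, y_min, x_max, y_max = self.box
        if x_min >= x_max or y_min >= y_max:
            raise ValueError("box must have positive width and height")

    @property
    def is_still_image(self) -> bool:
        return self.frame_index is None and self.elapsed_seconds is None


still = Place("img001.jpg", (10, 20, 110, 220))
video = Place("cam1.mp4", (5, 5, 50, 90), frame_index=120, elapsed_seconds=4.0)
```

Keeping the frame fields optional lets one type represent both a still-image place and a moving-image place.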
The output unit 14 outputs information indicating the place determined by the determination unit 13 or a partial image acquired by cutting, out of the image, the place determined by the determination unit 13, as a candidate for the template image to be additionally registered in the decision apparatus. Note that, in a case where the output unit 14 outputs a partial image, the image processing apparatus 10 can include a processing unit that generates a partial image by cutting, out of an image, a place determined by the determination unit 13. Then, the output unit 14 can output the partial image generated by the processing unit.
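Cutting a determined place out of an image, as the hypothetical processing unit would, can be sketched as a rectangular slice. In this Python sketch the image is assumed to be a row-major sequence of rows, and the box is assumed to be half-open (x_min, y_min, x_max, y_max) and is clipped to the image bounds; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Pixel = TypeVar("Pixel")


def crop_place(
    image: Sequence[Sequence[Pixel]], box: Tuple[int, int, int, int]
) -> List[List[Pixel]]:
    """Cut the place (x_min, y_min, x_max, y_max) out of a row-major image."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    # Clip the half-open box to the image bounds.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x0 >= x1 or y0 >= y1:
        return []
    return [list(row[x0:x1]) for row in image[y0:y1]]


# A 4 x 5 toy image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(4)]
partial = crop_place(image, (1, 1, 3, 3))
```

For an image held as a NumPy array, the same cut is usually written as `image[y0:y1, x0:x1]`.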
A “place determined by the determination unit 13” described above, i.e., a place in the image where a human body is captured whose degree of similarity to the pose or the movement of the human body indicated by every template image is less than the first threshold value and that satisfies the first similarity condition with respect to the pose or the movement of the human body indicated by any template image, is a candidate for a template image. By viewing such places based on the above-described information, the above-described partial image, or the like, a user can select, from the candidates, a place including a human body with a desired pose and a desired movement as a template image.
In the similar sample image column, information (such as the file name of an image) indicating a template image that satisfies the first similarity condition with respect to each human body is entered. In this way, the output unit 14 can further output information indicating a template image that satisfies the first similarity condition with respect to a human body captured at a place determined by the determination unit 13.
Next, one example of a flow of processing of the image processing apparatus 10 will be described by using a flowchart in
After the image processing apparatus 10 performs processing of detecting a keypoint of a human body included in an image (S10), the image processing apparatus 10 computes a degree of similarity between a pose or a movement of a human body detected from the image and a pose or a movement of a human body indicated by a preregistered template image, based on the detected keypoint (S11).
Next, the image processing apparatus 10 determines, as a candidate for a template image to be additionally registered in the decision apparatus, a place in the image where a human body is captured whose degree of similarity to the pose or the movement of the human body indicated by every template image is less than the first threshold value and that satisfies the first similarity condition with respect to the pose or the movement of the human body indicated by any template image (S12).
Specifically, the image processing apparatus 10 compares the degree of similarity between the pose or the movement of the human body detected from the image and the pose or the movement of the human body indicated by each of the plurality of template images with the first threshold value. Then, the image processing apparatus 10 determines the human body (human body belonging to the groups of (2) and (3) in
The decision apparatus performs detection processing using a registered template image, and the like, and, in a case where the above-described degree of similarity is equal to or more than the first threshold value, the decision apparatus decides that the pose or the movement of the human body detected from the image is the same or the same kind as the pose or the movement of the human body indicated by the template image.
Then, the image processing apparatus 10 outputs information indicating the place determined in S12 or a partial image acquired by cutting the place determined in S12 out of the image (S13).
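Steps S11 to S13 can be tied together in one hypothetical Python sketch. It assumes S10 has already produced one feature dictionary per detected human body, uses a simple mean of per-keypoint scores `1 - |a - b|` as the similarity, and assumes the first similarity condition is only the band condition T2 ≤ similarity < T1. All names, template file names, and values are illustrative, not the actual implementation.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]  # keypoint name -> feature value (S10 output, assumed)


def pose_similarity(a: Features, b: Features) -> float:
    # Hypothetical similarity: mean of per-keypoint scores 1 - |difference|, floored at 0.
    names = a.keys() & b.keys()
    if not names:
        raise ValueError("the two poses share no keypoints")
    return sum(max(0.0, 1.0 - abs(a[n] - b[n])) for n in names) / len(names)


def find_template_candidates(
    bodies: Dict[str, Features],
    templates: Dict[str, Features],
    first_threshold: float,
    second_threshold: float,
) -> List[Tuple[str, List[str]]]:
    """Return (body id, similar template names) for each candidate body."""
    candidates = []
    for body_id, feats in bodies.items():
        # S11: similarity to every registered template image.
        sims = {name: pose_similarity(feats, tf) for name, tf in templates.items()}
        # S12: skip bodies the decision apparatus would already match (>= T1).
        if any(s >= first_threshold for s in sims.values()):
            continue
        # First similarity condition, assumed here to be only the band T2 <= s < T1.
        similar = sorted(name for name, s in sims.items() if s >= second_threshold)
        if similar:
            candidates.append((body_id, similar))
    # S13: the caller outputs these places together with the similar templates.
    return candidates


templates = {
    "raise_left.png": {"A1": -0.2, "A2": 0.0, "A52": -0.3},
    "wave.png": {"A1": -0.2, "A2": 0.0, "A52": -0.5},
}
bodies = {
    "p1": {"A1": -0.2, "A2": 0.0, "A52": -0.3},  # matches a template exactly
    "p2": {"A1": -0.2, "A2": 0.0, "A52": 0.1},   # similar, but below T1
    "p3": {"A1": 0.5, "A2": 0.0, "A52": 0.9},    # dissimilar
}
result = find_template_candidates(bodies, templates, first_threshold=0.9, second_threshold=0.7)
```

Returning the similar template names alongside each body also provides the information that the output unit 14 can show in the similar sample image column.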
The image processing apparatus 10 according to the second example embodiment can achieve an advantageous effect similar to that of the first example embodiment. Further, the image processing apparatus 10 according to the second example embodiment can output information about a place in an image where, among the human bodies detected from the image, a human body is captured that is not decided by the decision apparatus to have a pose or a movement that is the same or the same kind as a pose or a movement of a human body indicated by any template image but that has a similar pose or a similar movement.
Description is given in more detail by using
While the example embodiments of the present invention have been described with reference to the drawings, they are merely examples of the present invention, and various configurations other than the above-described example embodiments can also be employed.
Further, although a plurality of steps (pieces of processing) are described in order in the plurality of flowcharts used in the above description, the execution order of the steps performed in each of the example embodiments is not limited to the described order. In each of the example embodiments, the order of the illustrated steps may be changed as long as no problem arises in terms of content. Further, the example embodiments described above can be combined as long as their contents do not contradict each other.
A part or the whole of the above-described example embodiments may also be described as in the following supplementary notes, but is not limited thereto.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/005695 | 2/14/2022 | WO |