The present disclosure relates to a height estimation apparatus, a height estimation method, and a non-transitory computer readable medium storing a program.
Recently, techniques have come into use in which an image of an animal such as a person is captured by a camera and an attribute of the person or the like is recognized from the captured image. As techniques related to estimation of a height, which is one such attribute, for example, Patent Literature 1 to 3 are known. Patent Literature 1 describes a technique for estimating a height of a person based on a length of a long side or lengths of the long side and a short side of a person area in an image. Patent Literature 2 describes a technique for estimating a height of a person based on a distance image. Patent Literature 3 describes a technique for estimating a height using an imaging result captured by an X-ray CT apparatus. In addition, Non Patent Literature 1 is known as a technique related to skeleton estimation of a person.
As described above, in Patent Literature 1, since the height is estimated based on a size of the person area in the image, estimation accuracy of the height may be lowered depending on a posture of the person and an orientation of the person with respect to the camera. Further, in Patent Literature 2, it is essential to acquire the distance image, and in Patent Literature 3, a special contrast imaging has to be performed by an X-ray CT apparatus. For these reasons, there is a problem in the related art that it is difficult to accurately estimate the height from a two-dimensional image obtained by capturing the animal such as a person.
In view of such a problem, it is an object of the present disclosure to provide a height estimation apparatus, a height estimation method, and a non-transitory computer readable medium storing a program capable of improving accuracy of estimating a height.
A height estimation apparatus according to the present disclosure includes: acquisition means for acquiring a two-dimensional image obtained by capturing an animal; detection means for detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimation means for estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
A height estimation method according to the present disclosure includes: acquiring a two-dimensional image obtained by capturing an animal; detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
A non-transitory computer readable medium storing a program according to the present disclosure for causing a computer to execute processing of: acquiring a two-dimensional image obtained by capturing an animal; detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
According to the present disclosure, it is possible to provide a height estimation apparatus, a height estimation method, and a non-transitory computer readable medium storing a program capable of improving accuracy of estimating a height.
Example embodiments will be described below with reference to the drawings. In each drawing, the same elements are denoted by the same reference signs, and the repeated description is omitted if necessary.
Recently, image recognition technology utilizing machine learning has been applied to various systems. As an example, a monitoring system for performing monitoring using images captured by a monitoring camera will be discussed.
As shown in this example, there is a growing demand for easily obtaining attribute information such as age, gender, and height of a person from images or videos of a monitoring camera. Among these attributes, the height is useful information for identifying individuals and distinguishing adults from children. For example, the attribute information is used in investigations as characteristics of a criminal (such as "in his 30s, male, 170 cm"), in marketing as customer information, and in searches for a lost child as characteristics of the lost child.
In studying methods for recognizing the height of a person from an image, the inventors found that the related techniques cannot always recognize or estimate the height accurately. For example, when the whole body of a person appears in the image, the height can be estimated to some extent. However, the person in the image is not always upright, and the top of the head and the feet do not always appear in the image. Especially in the case of a lost child, there is a high possibility that the child is crouching down. In such cases, it is difficult to estimate the height.
Therefore, the inventors studied a method using a skeleton estimation technique by means of machine learning for estimating a height of a person. For example, in a skeleton estimation technique according to related art such as OpenPose disclosed in Non Patent Literature 1, a skeleton of a person is estimated by learning various patterns of annotated image data. In the following example embodiments, a height of a person can be accurately estimated by utilizing such a skeleton estimation technique.
The skeletal structure estimated by the skeleton estimation technique such as OpenPose is composed of “key points” which are characteristic points such as joints, and “bones, i.e., bone links” indicating links between the key points. Therefore, in the following example embodiments, the skeletal structure is described using the terms “key point” and “bone”, but unless otherwise specified, the “key point” corresponds to the “joint” of a person, and a “bone” corresponds to the “bone” of the person.
The acquisition unit 11 acquires a two-dimensional image obtained by capturing an animal such as a person. The detection unit 12 detects a two-dimensional skeletal structure of the animal based on the two-dimensional image acquired by the acquisition unit 11. The estimation unit 13 estimates the height of the animal in a three-dimensional real world based on the two-dimensional skeletal structure detected by the detection unit 12 and an imaging parameter of the two-dimensional image.
Thus, in the example embodiments, a two-dimensional skeletal structure of an animal such as a person is detected from a two-dimensional image, and a height of the animal in a real world is estimated based on the two-dimensional skeletal structure, whereby the height of the animal can be accurately estimated regardless of a posture of the animal.
A first example embodiment will be described below with reference to the drawings.
As shown in
The storage unit 106 stores information and data necessary for the operation and processing of the height estimation apparatus 100. For example, the storage unit 106 may be a non-volatile memory such as a flash memory or a hard disk apparatus. The storage unit 106 stores images acquired by the image acquisition unit 101, images processed by the skeletal structure detection unit 102, data for machine learning, and so on. The storage unit 106 may be an external storage apparatus, including an external storage apparatus on a network. That is, the height estimation apparatus 100 may acquire necessary images, data for machine learning, and so on from the external storage apparatus.
The image acquisition unit 101 acquires a two-dimensional image captured by the camera 200 from the camera 200 which is connected to the height estimation apparatus 100 in a communicable manner. The camera 200 is an imaging unit such as a monitoring camera for capturing a person, and the image acquisition unit 101 acquires, from the camera 200, an image obtained by capturing the person.
The skeletal structure detection unit 102 detects a two-dimensional skeletal structure of the person in the image based on the acquired two-dimensional image. The skeletal structure detection unit 102 detects the skeletal structure of the person based on the characteristics such as joints of the person to be recognized using a skeleton estimation technique by means of machine learning. The skeletal structure detection unit 102 uses, for example, the skeleton estimation technique such as OpenPose of Non Patent Literature 1.
The height pixel count calculation unit 103 calculates, based on the detected two-dimensional skeletal structure, the height of the person in the two-dimensional image when the person stands upright, which is referred to as a height pixel count. The height pixel count can be said to be the height of the person in the two-dimensional image, i.e., the length of the whole body of the person in a two-dimensional image space. The height pixel count calculation unit 103 obtains the height pixel count, i.e., a pixel count, from the length of each bone of the detected skeletal structure, measured in the two-dimensional image space. In this example embodiment, the height pixel count is obtained by summing up the lengths of the respective bones from the head to the foot of the skeletal structure. When the skeleton estimation technique used by the skeletal structure detection unit 102 does not output the top of the head and the foot, the height pixel count may be corrected by multiplying it by a constant as necessary.
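The summation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the key point names, the head-to-foot chain, and the `correction` parameter are assumptions introduced here, and an actual skeleton estimation technique defines its own key point layout.

```python
import math

# Hypothetical head-to-foot chain of key point names (an assumption for
# illustration; a real detector such as OpenPose uses its own layout).
HEAD_TO_FOOT_CHAIN = ["head", "neck", "hip", "knee", "foot"]


def bone_length(p, q):
    """Length of one bone in the two-dimensional image space, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def height_pixel_count(keypoints, chain=HEAD_TO_FOOT_CHAIN, correction=1.0):
    """Sum the bone lengths along the chain from head to foot.

    `correction` is the optional constant multiplier applied when the
    detector does not output the top of the head or the foot.
    """
    total = sum(
        bone_length(keypoints[a], keypoints[b])
        for a, b in zip(chain, chain[1:])
    )
    return total * correction
```

Because the lengths of the individual bones are summed rather than the vertical extent of the person area, a bent knee or a tilted torso still contributes its bone length to the total.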
The camera parameter calculation unit 104 calculates camera parameters, which are imaging conditions of the camera 200, based on the image captured by the camera 200. The camera parameters are imaging parameters of the image and are parameters for converting the length in the two-dimensional image into the length in a three-dimensional real world. For example, the camera parameters include a posture, a position, an imaging angle, a focal length, and the like of the camera 200. An image of an object whose length is known in advance is captured by the camera 200, and then the camera parameters can be obtained from the image.
The height estimation unit 105 estimates the height of the person in the three-dimensional real world based on the calculated camera parameters and the height pixel count in the two-dimensional image. The height estimation unit 105 obtains a relationship between the length of a pixel in the image and the length in the real world from the camera parameters, and converts the height pixel count into the height of the person in the real world.
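As a rough illustration of this conversion, the sketch below reduces the camera parameters to a single scale factor, the real-world length per pixel. The function names are hypothetical, and the sketch assumes that a reference object of known length has been captured at roughly the same image location and distance as the person; a full treatment would derive the scale from the camera posture, position, imaging angle, and focal length.

```python
def real_length_per_pixel(reference_length_m, reference_length_px):
    """Real-world length (meters) represented by one pixel.

    Assumes a reference object of known length observed near the person's
    location in the image (a simplification of the camera parameters).
    """
    if reference_length_px <= 0:
        raise ValueError("reference_length_px must be positive")
    return reference_length_m / reference_length_px


def estimate_height(height_px, length_per_pixel_m):
    """Convert a height pixel count into a real-world height in meters."""
    return height_px * length_per_pixel_m
```

For instance, if a 1 m reference object spans 200 pixels, one pixel corresponds to 0.005 m, and a height pixel count of 340 converts to about 1.7 m.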
As shown in
Next, the height estimation apparatus 100 detects the skeletal structure of the person based on the acquired image of the person (S202).
The skeletal structure detection unit 102 extracts, for example, characteristic points that can be the key points from the image, and detects each key point of the person by referring to information obtained by machine learning the image of the key point. In the example of
Next, the height estimation apparatus 100 performs the height pixel count calculation processing based on the detected skeletal structure (S203). In the height pixel count calculation processing, as shown in
In the example of
In the example of
In the example of
In the meantime, as shown in
Next, the height estimation apparatus 100 estimates the height of the person based on the height pixel count and the camera parameters (S204). The height estimation unit 105 obtains, from the camera parameters, the length in the three-dimensional real world with respect to one pixel in an area where the person is present in the two-dimensional image, namely, the actual length of the pixel unit. In particular, since the length in the real world with respect to one pixel in the image varies depending on the location in the image, the “length in the real world per pixel in the area where the person is present” in the image is obtained. The height pixel count is converted into the height from the obtained actual length of the pixel unit. For example, in
As described above, in this example embodiment, the skeletal structure of the person is detected from the two-dimensional image, and the height pixel count is obtained by summing up the lengths, in the two-dimensional image, of the bones of the detected skeletal structure. Further, the height of the person in the real world is estimated in consideration of the camera parameters. The height can be obtained by summing the lengths of the bones from head to foot, and thus the height can be estimated in a simple way. In addition, since it is sufficient to detect at least the skeleton from the head to the foot by the skeleton estimation technique by means of machine learning, the height can be estimated with high accuracy even when the whole body of the person does not fully appear in the image, such as when the person is crouching down.
Next, a second example embodiment will be described. In this example embodiment, in the height pixel count calculation processing according to the first example embodiment, the height pixel count is calculated using a human body model showing a relationship between a length of each bone and a length of a whole body, i.e., a height in the two-dimensional image space. The processing other than the height pixel count calculation processing is the same as that of the first example embodiment.
Next, the height pixel count calculation unit 103 calculates the height pixel count from the length of each bone based on the human body model (S302). The height pixel count calculation unit 103 obtains the height pixel count from the length of each bone with reference to the human body model 301 showing the relationship between each bone and the length of the whole body as shown in
The human body model to be referred to here is, for example, a human body model of an average person, but the human body model may be selected according to the attributes of the person such as age, gender, nationality, etc. For example, when a face of a person appears in the captured image, an attribute of the person is identified based on the face, and a human body model corresponding to the identified attribute is referred to. By referring to the information obtained by machine learning the face for each attribute, the attribute of the person can be recognized from the characteristics of the face of the image. When the attribute of the person cannot be identified from the image, a human body model of an average person may be used.
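The selection with a fallback to the average model might look like the following sketch. The attribute labels and the ratio values are placeholders invented for illustration; they are not anthropometric data and do not come from the disclosure.

```python
# Placeholder models: each maps a bone name to the ratio
# (bone length / whole-body length). The values are illustrative only.
HUMAN_BODY_MODELS = {
    "average": {"lower_leg": 0.25, "upper_leg": 0.25, "torso": 0.30},
    "child": {"lower_leg": 0.22, "upper_leg": 0.23, "torso": 0.32},
}


def select_human_body_model(attribute=None, models=HUMAN_BODY_MODELS):
    """Return the model for the identified attribute.

    Falls back to the average model when the attribute could not be
    identified from the image or no model exists for it.
    """
    return models.get(attribute, models["average"])
```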
Next, the height pixel count calculation unit 103 calculates an optimum value of the height pixel count (S303). The height pixel count calculation unit 103 calculates the optimum value of the height pixel count from the height pixel count obtained for each bone. For example, as shown in
As described above, in this example embodiment, the height of the person in the real world is estimated by obtaining the height pixel count based on the bones of the detected skeletal structure using the human body model showing the relationship between the bones in the two-dimensional image space and the length of the whole body. In this way, even when the entire skeleton from the head to the foot cannot be acquired, the height can be estimated from some of the bones. In particular, by employing a larger value of the height, i.e., a larger height pixel count, from among the values obtained from a plurality of bones, the height can be accurately estimated.
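The per-bone calculation and the choice of the larger value can be sketched as below. The bone definitions and the `ratios` argument (bone length divided by whole-body length) are assumptions for illustration. One plausible reason for preferring the largest candidate is that a bone viewed at an angle appears shorter in the image, so the largest candidate is likely the one least affected by foreshortening.

```python
import math


def height_candidates(keypoints, bones, ratios):
    """Height pixel count implied by each detected bone.

    bones:  bone name -> (key point name, key point name)
    ratios: bone name -> bone length / whole-body length (human body model)
    Bones whose key points were not detected are skipped.
    """
    candidates = {}
    for name, (a, b) in bones.items():
        if a in keypoints and b in keypoints and name in ratios:
            length_px = math.dist(keypoints[a], keypoints[b])
            candidates[name] = length_px / ratios[name]
    return candidates


def optimum_height_pixel_count(candidates):
    """Employ the largest height pixel count among the candidates."""
    if not candidates:
        raise ValueError("no bone was available for estimation")
    return max(candidates.values())
```

With only the hip, knee, and foot detected, the torso is skipped and the estimate still proceeds from the two leg bones.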
Next, a third example embodiment will be described. In this example embodiment, instead of the height pixel count calculation processing and the height estimation processing according to the first example embodiment, a height in the real world is estimated by fitting a three-dimensional human body model to a two-dimensional skeletal structure. Other aspects are the same as those of the first example embodiment.
The three-dimensional human body model 402 prepared here may be a model in a state close to the posture of the two-dimensional skeletal structure 401 as shown in
Next, the height estimation unit 105 fits the three-dimensional human body model to the two-dimensional skeletal structure (S402). As shown in
Next, the height estimation unit 105 calculates the height of the fitted three-dimensional human body model (S403). As shown in
As described above, in this example embodiment, the three-dimensional human body model is fitted to the two-dimensional skeletal structure based on the camera parameters, and the height of the person in the real world is estimated based on the three-dimensional human body model. Specifically, the height of the fitted three-dimensional human body model is used as it is as the estimated height. In this manner, even when none of the bones faces the front in the image, that is, even when all bones are viewed diagonally and their lengths in the image differ greatly from their actual lengths, the height can be accurately estimated. When the methods according to the first to third example embodiments are applicable, all of the methods or a combination of them may be used to obtain the height. In this case, a value closer to the average height of a person may be used as the optimum value.
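One simple way to picture the fitting is a search over candidate heights that minimizes the reprojection error between the projected model and the detected key points. The sketch below is an assumption-laden illustration, not the disclosed fitting procedure: it uses a pinhole camera with a hypothetical focal length and image center, assumes the posture of the model already matches the person, and assumes the three-dimensional position of the foot is known from the camera parameters. Adjusting the posture of the model is omitted.

```python
def project(point, focal_px, center):
    """Pinhole projection of a camera-coordinate point (x, y, z), z > 0."""
    x, y, z = point
    return (focal_px * x / z + center[0], focal_px * y / z + center[1])


def scaled_model(unit_model, foot_position, height_m):
    """Scale a unit-height model about the foot and place it in the scene.

    unit_model: joint name -> offset from the foot for a body of height 1.
    """
    fx, fy, fz = foot_position
    return {
        name: (fx + height_m * dx, fy + height_m * dy, fz + height_m * dz)
        for name, (dx, dy, dz) in unit_model.items()
    }


def reprojection_error(unit_model, observed_2d, foot_position, height_m,
                       focal_px, center):
    """Sum of squared pixel distances between projected and detected joints."""
    error = 0.0
    joints = scaled_model(unit_model, foot_position, height_m)
    for name, joint in joints.items():
        if name in observed_2d:
            u, v = project(joint, focal_px, center)
            ou, ov = observed_2d[name]
            error += (u - ou) ** 2 + (v - ov) ** 2
    return error


def fit_height(unit_model, observed_2d, foot_position, focal_px, center,
               lo=0.5, hi=2.5, step=0.01):
    """Grid search for the model height that best matches the 2D skeleton."""
    steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda h: reprojection_error(
            unit_model, observed_2d, foot_position, h, focal_px, center
        ),
    )
```

The height returned by `fit_height` plays the role of the height of the fitted model, which this example embodiment uses directly as the estimated height. The grid step bounds the resolution of the estimate; a finer step or a continuous optimizer could be substituted.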
Note that each of the configurations in the above-described example embodiments is constituted by hardware and/or software, and may be constituted by one piece of hardware or software, or may be constituted by a plurality of pieces of hardware or software. The functions and processing of the height estimation apparatuses 10 and 100 may be implemented by a computer 20 including a processor 21 such as a Central Processing Unit (CPU) and a memory 22 which is a storage device, as shown in
These programs can be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), magneto-optical storage media (e.g. magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.). The program may also be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires and optical fibers) or a wireless communication line.
Further, the present disclosure is not limited to the above-described example embodiments and may be modified as appropriate without departing from the purpose thereof. For example, although a height of a person is estimated in the above description, a height of an animal other than a person having a skeletal structure such as mammals, reptiles, birds, amphibians, fish, etc. may be estimated.
Although the present disclosure has been described above with reference to the example embodiments, the present disclosure is not limited to the example embodiments described above. The configurations and details of the present disclosure may be modified in various ways that would be understood by those skilled in the art within the scope of the present disclosure.
The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.
A height estimation apparatus comprising:
acquisition means for acquiring a two-dimensional image obtained by capturing an animal;
detection means for detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
estimation means for estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
The height estimation apparatus according to Supplementary note 1, wherein
the estimation means estimates the height based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
The height estimation apparatus according to Supplementary note 2, wherein
the estimation means estimates the height based on a sum of the lengths of the bones from a foot to a head included in the two-dimensional skeletal structure.
The height estimation apparatus according to Supplementary note 2, wherein
the estimation means estimates the height based on a two-dimensional skeleton model showing a relationship between the length of the bone and a length of a whole body of the animal in the two-dimensional image space.
The height estimation apparatus according to Supplementary note 4, wherein
the estimation means estimates the height based on the two-dimensional skeleton model corresponding to an attribute of the animal.
The height estimation apparatus according to Supplementary note 4 or 5, wherein
the estimation means estimates the height based on a tallest height from among a plurality of the heights obtained based on the plurality of bones in the two-dimensional skeletal structure.
The height estimation apparatus according to Supplementary note 1, wherein
the estimation means estimates the height based on a three-dimensional skeleton model fitted to the two-dimensional skeletal structure based on the imaging parameter.
The height estimation apparatus according to Supplementary note 7, wherein
the estimation means uses a height of the fitted three-dimensional skeleton model as the estimated height.
A height estimation method comprising:
acquiring a two-dimensional image obtained by capturing an animal;
detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
The height estimation method according to Supplementary note 9, wherein
in the estimation of the height, the height is estimated based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
A height estimation program for causing a computer to execute processing of:
acquiring a two-dimensional image obtained by capturing an animal;
detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
The height estimation program according to Supplementary note 11, wherein
in the estimation of the height, the height is estimated based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
A height estimation system comprising:
a camera; and
a height estimation apparatus, wherein the height estimation apparatus comprises:
acquisition means for acquiring, from the camera, a two-dimensional image obtained by capturing an animal;
detection means for detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
estimation means for estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
The height estimation system according to Supplementary note 13, wherein
the estimation means estimates the height based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/025269 | 6/26/2019 | WO |