This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2017-130202, filed on Jul. 3, 2017, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a part recognition method, an information processing apparatus, and an imaging control system.
Parts included in an object are recognized from an image captured by a camera.
Related technologies are disclosed in Japanese Laid-open Patent Publication Nos. H8-214289, 2013-125402, and International Publication Pamphlet No. WO 2012/077287.
According to an aspect of the invention, a part recognition method includes: cutting out, by a computer, a plurality of partial images having different sizes using each of positions of an input image as a reference; calculating a probability that each of the partial images is an image indicating a part; calculating, for each of the positions, a score by integrating the probability for each of the partial images; and recognizing, based on the score for each of the positions, the part from the input image.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
For example, a time-series image analysis apparatus is provided with a buffer storage unit that stores a digital image signal as image data on a frame-by-frame basis, and a shape generation unit that extracts an object (moving image region) from the image data and generates a silhouette image. This apparatus is provided with an object model unit in which a model shape for representing the silhouette image as a geometric shape corresponding to each part of the object and information on a shape change due to a motion are stored. This apparatus is provided with a parameter calculation unit that conceptually calculates parameters of a position and a tilt angle of each part based on the outputs from the shape generation unit and the object model unit, a matching unit that causes the whole calculated parameter group to match a silhouette image group, and an output unit.
For example, a posture estimation apparatus is provided with an image input unit that acquires an image for which an object is photographed, and a posture information database that holds posture information defining arrangement of a plurality of parts for each posture. This apparatus is provided with a fitting unit that calculates a correlation degree for each part between the arrangement of the plurality of parts in the image and the posture information. This apparatus is provided with a difficulty level information table that holds an estimation difficulty level, which is a level of difficulty of estimation of a position of each part for each posture that is calculated based on a parallel line component of each part included in the posture information. In addition, this apparatus is provided with a posture estimation unit that applies weighting based on the estimation difficulty level with respect to a correlation degree, and performs estimation of the posture of the object based on the weighted correlation degree.
For example, a posture state estimation apparatus estimates a posture state of an object including joints with high accuracy. This apparatus is an apparatus that performs estimation of a posture state of the object based on image data in which an object including a plurality of parts coupled by joints is photographed. This apparatus is provided with a likelihood map generation unit that generates, from the image data, for at least two parts, a likelihood map indicating the distribution of plausibility of each part being positioned, and a learning likelihood map that is a likelihood map being associated in advance with the posture state. This apparatus is provided with a posture state estimation unit that estimates, when a coincidence degree between a learning likelihood map and an estimated likelihood map, which is a likelihood map generated based on the image data, is high, a posture state that is associated with the learning likelihood map, as a posture state of the object.
For example, when respective parts, such as a head, a right hand, and a left hand, with respect to an object such as a person, are recognized from an image that is imaged by a camera, a partial image having a predetermined size is cut out from the image, and using a part detector or the like, a probability that the partial image is an image indicating a target part is calculated.
However, when the size at which the object appears in the image is indefinite, extraneous content such as the background is likely to be included in the cut-out partial image, or a portion required for recognition of the part is likely to be missing from it. In this case, the recognition accuracy of the part may be lowered.
For example, a method or the like of recognizing a part from an image with good accuracy may be provided.
As illustrated in
The part recognition apparatus 10 includes, as function units, a cut-out unit 12, a creation unit 14, a correction unit 16, and an identification unit 18. The creation unit 14 is one example of the calculation unit in the techniques of this disclosure. In a predetermined memory area of the part recognition apparatus 10, a plurality of part detectors 30 and an inter-part relative coefficient 32 are stored. The following describes the respective function units in detail.
The cut-out unit 12 cuts out a plurality of patch images having different sizes using each position on the input image 40 as the reference. Note that a patch image is one example of the partial image in the techniques of this disclosure. For example, the cut-out unit 12 cuts out, centering on each pixel (x, y) in the input image 40, regions having respective sizes (hereinafter, referred to as “size k”) that are distinguished by a size number k (k=0, 1, 2, . . . , K (K+1 is the number of types of the size)), as patch images.
For example, the cut-out unit 12 is able to cut out, as illustrated in A in
The reason why the cut-out unit 12 cuts out a plurality of patch images having different sizes is described. When the extent of the size that a person as an object is imaged in the input image 40 is indefinite, problems as the following occur. For example, as illustrated in
When the ratio of the region indicating an object in the input image 40 is low, for example, when an object within the input image 40 is small, as illustrated in an upper view in
In contrast, when the ratio of the region indicating an object in the input image 40 is high, for example, when an object within the input image 40 is large, as illustrated in a lower view in
To solve this, one conceivable approach is to keep the size of a person constant by adding a process that recognizes the region where the person is present within the input image 40 and normalizes the size of that region. However, although the region where a person is present can be recognized with good accuracy when the person faces the front, it is difficult to do so when the person faces sideways or when a portion of the body is hidden by an obstacle.
The cut-out unit 12 cuts out the plurality of patch images 44 having different sizes independent of the size of an object in the input image 40, in order to allow a region appropriately containing a part as a recognition target that is included in the object to be cut out as a patch image.
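The cut-out described above may be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function name `cut_out_patches`, the use of square patches with odd side lengths, and the zero padding of pixels outside the image are all assumptions made for the example.

```python
def cut_out_patches(image, x, y, sizes):
    """Cut out square patches of several sizes centred on pixel (x, y).

    image is a list of rows (image[y][x]); sizes holds one odd side length
    per size number k. Pixels outside the image are zero-padded, which is
    an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    patches = []
    for side in sizes:
        half = side // 2
        patch = []
        for yy in range(y - half, y + half + 1):
            row = []
            for xx in range(x - half, x + half + 1):
                inside = 0 <= yy < height and 0 <= xx < width
                row.append(image[yy][xx] if inside else 0)
            patch.append(row)
        patches.append(patch)
    return patches


# A 5x5 toy image whose pixel value encodes its own position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(5)]
patches = cut_out_patches(image, x=2, y=2, sizes=[1, 3, 5])
```

Here `patches` holds three patch images that share the same centre pixel, so that at least one of them is more likely to contain the target part appropriately regardless of how large the object appears.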
The creation unit 14 inputs each of the plurality of patch images 44 that are cut out by the cut-out unit 12, as illustrated in B in
The part detector 30 outputs a probability that the inputted patch image 44 is an image indicating the part set in advance. In the present embodiment, the part detector 30 corresponded to the size k is a function that outputs a probability P (p| (x, y, k)) that each patch image 44 indicates a part p, as for each pixel (x, y) of the patch image 44 having the size k. Note that, x=0, 1, 2, . . . , xmax (xmax is the maximum value of an x coordinate of the patch image 44), y=0, 1, 2, . . . , ymax (ymax is the maximum value of a y coordinate of the patch image 44), k=0, 1, 2, . . . , K, and p=right hand, head, . . . , left hand, are set. For example, convolution neural networks (CNN) as illustrated in
The creation unit 14 inputs each of the plurality of patch images 44 that are cut out centering on the pixel (x, y) of the input image 40 into the part detector 30 corresponding to the size of that patch image 44, and calculates, for each part, a score in which the respective probabilities outputted from the part detectors 30 are integrated. The creation unit 14 is able to calculate, for example, the sum, the maximum value, the average, or the like of P (p|(x, y, k)) over k=0, 1, 2, . . . , K, as a score P (p|(x, y)). Further, the creation unit 14 creates, as illustrated in C in
For example, the creation unit 14 creates, as illustrated in
The correction unit 16 corrects the score P (p|(x, y)) of each pixel in the heat map H(p) of each part p, which is created by the creation unit 14, so as to obtain the integrity of a relative positional relationship between adjacent parts (D in
Specifically, the correction unit 16 uses a probability distribution model indicating, for each relative position, a presence probability of a part q adjacent to the part p, relative to the part p. The correction unit 16 corrects each score such that the score of the score map at the position of the part p becomes higher as the score of the heat map H(q) of the part q becomes higher at positions where the presence probability of the part q relative to the position of the part p is high.
For example, as illustrated in
One example for implementing the correction as the above is described. The score map S(p) of the part p is defined as the following expression (2) and expression (3), for example, using Gaussian mixture distribution.
Sp (x, y) is the (x, y) component of the score map S(p), Hp (x, y) is the (x, y) component of the heat map H(p) indicated in the expression (1), and A(p) is the set of parts q adjacent to the part p (for example, in a case of p=right elbow, A(p)={right hand, right shoulder}). aq, g and Cq, g (q∈A (p)) are the inter-part relative coefficients 32, which are the coefficients that decide the Gaussian mixture distribution. The shape of the Gaussian mixture distribution is decided, as described above, based on the presence probability of the part q in the relative positional relationship between the part p and the part q. Note that g is the number of Gaussian distributions included in the Gaussian mixture distribution. Moreover, among the pixels of the heat map H(q), a pixel (z, w) is a pixel whose score is reflected in the expression (3), and is a pixel included in a predetermined range that uses, as the reference, the pixel of the heat map H(q) corresponding to the pixel (x, y) of the heat map H(p).
In the expression (3), the sum of a first term and a second term is obtained, however, the score of the score map S(p) may be calculated from the product or the weighted sum of the first term and the second term.
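One possible reading of this correction may be sketched as follows. The sketch is illustrative only: the exact form of the second term, the parametrisation of each Gaussian component as a tuple `(a, dx, dy, sigma)`, the `radius` of the predetermined range, and the names `correct_score`, `heat_q_maps`, and `mixtures` are assumptions, and the inter-part relative coefficients 32 of the embodiment may be organised differently.

```python
import math


def correct_score(heat_p, heat_q_maps, mixtures, x, y, radius=2):
    """Return an assumed S_p(x, y): H_p(x, y) plus support from adjacent parts.

    heat_q_maps maps each adjacent part q to its heat map H(q). mixtures maps
    q to a list of (a, dx, dy, sigma) tuples, a hypothetical parametrisation
    in which (dx, dy) is the expected offset of the part q from the part p.
    """
    first_term = heat_p[y][x]
    second_term = 0.0
    for q, heat_q in heat_q_maps.items():
        height, width = len(heat_q), len(heat_q[0])
        # Pixels (z, w) within the assumed predetermined range around (x, y).
        for w in range(max(0, y - radius), min(height, y + radius + 1)):
            for z in range(max(0, x - radius), min(width, x + radius + 1)):
                for a, dx, dy, sigma in mixtures[q]:
                    dist2 = (z - x - dx) ** 2 + (w - y - dy) ** 2
                    weight = a * math.exp(-dist2 / (2 * sigma ** 2))
                    second_term += weight * heat_q[w][z]
    return first_term + second_term


# Two equally scored candidates for the part p, at (0, 1) and (2, 1).
heat_p = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
# The adjacent part q is detected at (2, 0), directly above the right candidate.
heat_q_maps = {"q": [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}
mixtures = {"q": [(1.0, 0, -1, 1.0)]}  # q is expected one pixel above p
s_left = correct_score(heat_p, heat_q_maps, mixtures, x=0, y=1)
s_right = correct_score(heat_p, heat_q_maps, mixtures, x=2, y=1)
```

In this toy case the two candidates have the same heat map score, and the candidate whose adjacent part lies at the expected relative position receives the larger corrected score.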
The identification unit 18 recognizes, based on the score map S(p) of each part p corrected by the correction unit 16, the part p from the input image 40. Specifically, the identification unit 18 identifies, from the score map S(p), the position coordinates (xp, yp) of a pixel having the maximum score, which are indicated in the following expression (4), as position coordinates of the part p in the input image 40.
The identification unit 18 outputs the set of the position coordinates (xp, yp) that is identified for each part p as the recognition result 42 (E in
The part recognition apparatus 10 may be implemented, for example, by a computer 50 illustrated in
The storage unit 53 may be implemented by a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or the like. In the storage unit 53 as a storage medium, a part recognition program 60 for causing the computer 50 to function as the part recognition apparatus 10 is stored. The part recognition program 60 includes a cut-out process 62, a creation process 64, a correction process 66, and an identification process 68. Moreover, the storage unit 53 includes the plurality of part detectors 30 each corresponded to the patch image 44 having each size k, and an information memory area 80 in which the inter-part relative coefficient 32 is stored.
The CPU 51 reads the part recognition program 60 from the storage unit 53 and develops the part recognition program 60 in the memory 52, and successively executes the processes included in the part recognition program 60. The CPU 51 executes the cut-out process 62 to operate as the cut-out unit 12 illustrated in
When the input image 40 is inputted into the part recognition apparatus 10, the part recognition apparatus 10 executes part recognition processing illustrated in
At step S11 in the part recognition processing illustrated in
At step S12, the creation unit 14 inputs each patch image 44 having a size k into the part detector 30 corresponding to the size k. The creation unit 14 obtains, as the output from the part detector 30 corresponding to the size k, for each pixel (x, y) of the patch image 44 having the size k, a probability P (p|(x, y, k)) that the patch image 44 is an image indicating the part p.
At step S13, the creation unit 14 calculates a score P (p|(x, y)) in which the respective probabilities P (p|(x, y, k)) that are outputted from the part detectors 30 corresponded to the sizes k are integrated for each part p. As illustrated in the expression (1), the creation unit 14 creates a heat map H(p) in which the score P (p|(x, y)) that is calculated relative to a pixel (x, y) of the input image 40 is stored in a pixel corresponding to each pixel position of the input image 40, for each part p.
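The integration at step S13 may be sketched as follows for a single part p. The sketch assumes that the outputs of the part detectors have already been collected into one probability map per size k; the names `integrate_probabilities`, `create_heat_map`, and `prob_maps`, and the sample values, are hypothetical.

```python
def integrate_probabilities(probs, mode="mean"):
    """Combine the per-size probabilities P(p|(x, y, k)) into one score."""
    if mode == "sum":
        return sum(probs)
    if mode == "max":
        return max(probs)
    if mode == "mean":
        return sum(probs) / len(probs)
    raise ValueError(f"unknown mode: {mode}")


def create_heat_map(prob_maps, mode="mean"):
    """Build H(p), where prob_maps[k][y][x] is the output for size k at (x, y)."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [integrate_probabilities([m[y][x] for m in prob_maps], mode)
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical detector outputs for one part, three sizes, on a 2x2 image.
prob_maps = [
    [[0.0, 0.5], [0.25, 1.0]],
    [[0.5, 0.5], [0.25, 0.5]],
    [[1.0, 0.5], [0.25, 0.0]],
]
heat_map_max = create_heat_map(prob_maps, mode="max")
heat_map_mean = create_heat_map(prob_maps, mode="mean")
```

Which integration is preferable depends on the detectors: the maximum value keeps a strong response from a single well-fitting size, whereas the average favours positions that respond consistently across sizes.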
At step S14, the correction unit 16 corrects the score P (p|(x, y)) in each pixel of the heat map H(p) of each part p, for example, in accordance with the expression (3) so as to obtain the integrity of a relative positional relationship between the adjacent parts, and creates a score map S(p) as indicated by the expression (2). Note that the process of this step may be repeated a predetermined number of times by using the component Sp (x, y) of the created score map S(p) as Hp (x, y) in the expression (3). This corrects the score with better accuracy.
At step S15, the identification unit 18 identifies, from the score map S(p) of each part p, position coordinates (xp, yp) of a pixel having the maximum score indicated in the expression (4) as position coordinates of the part p in the input image 40.
At step S16, the identification unit 18 outputs the set of the position coordinates (xp, yp) that is identified for each part p as the recognition result 42. The processing then returns to step S11.
Position coordinates of each part may be identified from the score map, or position coordinates of each part may be identified from the heat map without the score map being created.
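The identification of the position coordinates may be sketched as follows; the same function applies whether a score map S(p) or a heat map H(p) is passed. The name `identify_position` and the choice of the first pixel in raster order when several pixels share the maximum score are assumptions of this sketch.

```python
def identify_position(score_map):
    """Return ((x, y), score) for the pixel holding the maximum score.

    score_map is a list of rows (score_map[y][x]). When several pixels share
    the maximum, the first one in raster order is returned (an assumption).
    """
    best_x, best_y, best = 0, 0, score_map[0][0]
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score > best:
                best_x, best_y, best = x, y, score
    return (best_x, best_y), best


score_map = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.4],
]
position, score = identify_position(score_map)
```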
As described in the foregoing, the part recognition apparatus 10 cuts out patch images having a plurality of sizes from an input image. This increases the possibility that the part recognition apparatus 10 may cut out a patch image appropriately containing the part without depending on the size of each part in the input image. The part recognition apparatus 10 inputs each patch image into a part detector corresponded to each size, and calculates a probability that each patch image is an image indicating each part. The part recognition apparatus 10 uses a score of each pixel in which probabilities calculated by the respective part detectors are integrated to recognize the part in the input image. This allows the part recognition accuracy to be improved.
The part recognition apparatus 10 corrects the score corresponding to each pixel based on the relative positional relationship between the parts, and identifies position coordinates of each part based on the corrected score. This allows the part recognition accuracy to be further improved.
For example, as illustrated in
As illustrated in
The acquisition unit 20 acquires each frame image of moving image data that is imaged and outputted by the camera 35. When the acquisition unit 20 acquires a frame image of moving image data that is imaged in a state where the magnification of the camera 35 is set to an initial value, the acquisition unit 20 transfers the acquired frame image as the input image 40 (whole image) to the cut-out unit 12. Moreover, when the acquisition unit 20 acquires a frame image of moving image data that is imaged in a state where the magnification of the camera 35 is set to an enlarged magnification (details are described later), the acquisition unit 20 outputs the frame image together with the whole image, in association with a recognition result outputted from the identification unit 218.
The identification unit 218 determines, out of a plurality of parts p for which score maps S(p) are created by the correction unit 16, whether the maximum score of a score map S(p′) relative to a specific part p′ is a predetermined threshold or more. If the maximum score is the predetermined threshold or more, the identification unit 218 identifies position coordinates (xp′, yp′) of a pixel in which the maximum score is stored as position coordinates of the specific part p′. The identification unit 218 outputs the position coordinates (xp′, yp′) of the identified specific part p′ as a recognition result 242, and notifies the control unit 22 of the position coordinates (xp′, yp′) of the identified specific part p′. Note that, the recognition result 242 may include not only the position coordinates (xp′, yp′) of the specific part p′, but also position coordinates of another part p (xp, yp).
The control unit 22 controls the magnification and the angle of the camera 35, based on the position coordinates (xp′, yp′) notified from the identification unit 218 and information held in advance on the number of pixels and the installed position of the camera 35, such that the specific part p′ falls within the entire field angle of the camera 35. Specifically, the control unit 22 calculates, by centering on the notified position coordinates (xp′, yp′), the magnification in which a region set in advance as a range indicating the specific part p′ is the field angle of the camera 35, and angles (a pan angle and a tilt angle) of the camera 35 that implements the imaging direction. The control unit 22 sets the calculated magnification and angle to the camera 35. This drives a driving unit of the camera 35 so as to have the set magnification and angle, and an enlarged image of the specific part p′ is imaged by the camera 35.
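The calculation of the magnification and the angles may be sketched as follows. The embodiment does not specify the camera model, so the sketch assumes a pinhole camera with square pixels and a known horizontal field angle at the initial magnification; the name `camera_settings` and the parameters `hfov_deg` and `region_px` are hypothetical.

```python
import math


def camera_settings(xp, yp, width, height, hfov_deg, region_px):
    """Return (pan_deg, tilt_deg, magnification) that centre and enlarge a part.

    Assumes a pinhole camera whose horizontal field angle at the initial
    magnification is hfov_deg. region_px is the hypothetical width in pixels
    of the region set in advance for the specific part. Pan is positive to
    the right and tilt is positive upward (image y grows downward).
    """
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    pan = math.degrees(math.atan((xp - width / 2) / focal_px))
    tilt = math.degrees(math.atan((height / 2 - yp) / focal_px))
    magnification = width / region_px  # relative to the initial magnification
    return pan, tilt, magnification


# A part found at the right edge, vertically centred, of a 1920x1080 whole image.
pan, tilt, magnification = camera_settings(
    xp=1920, yp=540, width=1920, height=1080, hfov_deg=90.0, region_px=240
)
```

Under these assumptions the returned angles are offsets from the current imaging direction, and the magnification is the factor by which the region set in advance is enlarged to span the width of the frame.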
The specific part p′ is decided in advance in accordance with the usage purpose of the imaging control system 200. For example, when the imaging control system 200 is applied to a camera system for crime prevention measures, images of a face of a suspicious person or a hand thereof by which a weapon is likely to be held are important, so that the head or the hand may be decided as the specific part p′.
The part recognition apparatus 210 may be implemented by the computer 50. For example as illustrated in
In the storage unit 53 as a storage medium, a part recognition program 260 for causing the computer 50 to function as the part recognition apparatus 210 is stored. The part recognition program 260 includes the cut-out process 62, the creation process 64, the correction process 66, an identification process 268, an acquisition process 70, and a control process 72.
The CPU 51 reads the part recognition program 260 from the storage unit 53 and develops the part recognition program 260 in the memory 52, and successively executes the processes included in the part recognition program 260. The CPU 51 executes the identification process 268 to operate as the identification unit 218 illustrated in
When a start of the imaging control system 200 is instructed, the part recognition apparatus 210 executes imaging control processing illustrated in
At step S21, the control unit 22 sets the magnification and the angle of the camera 35 to initial values, and instructs the camera 35 to start imaging. This causes the camera 35 to start imaging with the set magnification and angle, and output moving image data.
At step S22, the acquisition unit 20 acquires one frame image of the moving image data outputted from the camera 35, and transfers the frame image as the input image 40 (whole image) to the cut-out unit 12.
At step S23, the part recognition processing is executed. The part recognition processing may be similar to the steps S11 to S14 in the part recognition processing illustrated in
At step S24, the identification unit 218 determines, out of a plurality of parts p for which score maps S(p) are created by the correction unit 16, whether the maximum score of a score map S(p′) relative to a specific part p′ is a predetermined threshold or more. If the maximum score is the threshold or more, the processing shifts to step S25, whereas if the maximum score is less than the threshold, the processing returns to step S21.
At step S25, the identification unit 218 identifies position coordinates (xp′, yp′) of a pixel in which the maximum score is stored in the score map S(p′) as position coordinates of the specific part p′. The identification unit 218 notifies the control unit 22 of the identified position coordinates (xp′, yp′) of the specific part p′.
At step S26, the control unit 22 calculates, by centering on the position coordinates (xp′, yp′) notified from the identification unit 218, a magnification and an angle such that a region in accordance with the specific part p′ is the field angle of the camera 35, and sets the calculated magnification and angle to the camera 35. This drives the driving unit of the camera 35 so as to have the set magnification and angle, and an enlarged image of the specific part p′ is imaged and outputted by the camera 35.
At step S27, the acquisition unit 20 acquires the enlarged image of the specific part p′ outputted from the camera 35.
At step S28, the identification unit 218 outputs the position coordinates (xp′, yp′) of the specific part p′ identified at the abovementioned step S25 as the recognition result 242. The acquisition unit 20 outputs the whole image acquired at the abovementioned step S22 and the enlarged image of the specific part p′ acquired at the abovementioned step S27, which are associated with the recognition result 242 outputted from the identification unit 218.
At step S29, the control unit 22 resets the magnification of the camera 35 to the initial value, and the processing returns to step S22. Here, the angle of the camera 35 remains at the angle set at the abovementioned step S26. This makes it highly likely that the same object including the specific part p′ that has been recognized in the previous frame image is also detected in the next frame image, which allows tracking of the object. Note that the tracking process of an object is not limited to the abovementioned example, and conventionally well-known techniques, such as a technique in which feature points are associated between frame images, are applicable.
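The flow of steps S21 to S29 may be sketched as follows. The sketch is illustrative only: `StubCamera` is a hypothetical stand-in for the camera 35, `recognize` stands in for the part recognition processing and returns the position and maximum score of the specific part p′, and `plan_view` stands in for the calculation by the control unit 22. The initial values and the `max_frames` limit are also assumptions.

```python
class StubCamera:
    """Hypothetical stand-in for the camera 35; it only stores its settings."""

    def __init__(self, frames):
        self.frames = iter(frames)
        self.magnification = 1.0
        self.angle = (0.0, 0.0)

    def capture(self):
        # Return the next frame, or None when no frame is left.
        return next(self.frames, None)


def imaging_control(camera, recognize, plan_view, threshold, max_frames):
    """Run steps S21 to S29 for at most max_frames whole images."""
    results = []
    camera.magnification, camera.angle = 1.0, (0.0, 0.0)           # S21
    for _ in range(max_frames):
        whole = camera.capture()                                    # S22
        if whole is None:
            break
        position, score = recognize(whole)                          # S23
        if score < threshold:                                       # S24
            camera.magnification, camera.angle = 1.0, (0.0, 0.0)   # back to S21
            continue
        camera.magnification, camera.angle = plan_view(position)    # S25, S26
        enlarged = camera.capture()                                 # S27
        results.append((position, whole, enlarged))                 # S28
        camera.magnification = 1.0                                  # S29: angle kept
    return results


camera = StubCamera(["whole-0", "whole-1", "zoom-1"])
results = imaging_control(
    camera,
    recognize=lambda frame: ((3, 4), 0.9 if frame == "whole-1" else 0.1),
    plan_view=lambda position: (4.0, (10.0, -5.0)),
    threshold=0.5,
    max_frames=3,
)
```

In this run the first whole image falls below the threshold and is skipped, and the second yields one result that associates the position of the specific part with the whole image and the enlarged image.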
As described in the foregoing, with the imaging control system 200 illustrated in
The part as a recognition target may be each part of a human body, but is not limited to this.
The part recognition programs 60, 260 may be stored (installed) in advance in the storage unit 53. The program related to the disclosed technique may be provided in a form of being stored in a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2017-130202 | Jul 2017 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
9798949 | Du | Oct 2017 | B1 |
20130301882 | Kawaguchi et al. | Nov 2013 | A1 |
20140301605 | Kawaguchi | Oct 2014 | A1 |
20180314908 | Lam | Nov 2018 | A1 |

Number | Date | Country |
---|---|---|
08-214289 | Aug 1996 | JP |
2013-125402 | Jun 2013 | JP |
2012077287 | Jun 2012 | WO |

Number | Date | Country |
---|---|---|
20190005344 A1 | Jan 2019 | US |