The present invention relates to a method for estimating a direction of a person standing still.
It is necessary for an autonomous mobile apparatus to determine a moving direction of a person in order to move forward safely and effectively.
As a background art in the present technical field, there is JP 2007-229816 A (PTL 1). In PTL 1, a method for predicting a course of a pedestrian from a toe image is described. In the method, a pedestrian course model construction unit constructs a course model of a general pedestrian in advance by combining information of a toe image of a specific pedestrian and detected course information of the specific pedestrian, and a pedestrian course model storage unit stores information of the pedestrian course model.
Then, a pedestrian course prediction unit predicts a course of an unspecific pedestrian by collating information of a toe image of the unspecific pedestrian, which image is generated by a pedestrian toe image generation unit, and the information of a pedestrian course model stored in the pedestrian course model storage unit.
As a method for detecting a course when constructing the pedestrian course model, PTL 1 describes detecting the three-dimensional position of a pedestrian serially at certain time intervals and detecting the course of the pedestrian from the temporal change of that three-dimensional position.
PTL 1: JP 2007-229816 A
In PTL 1, a pedestrian course model is constructed from the positional change of a pedestrian over predetermined time intervals. However, a person who stands still (person standing still) shows no positional change, and thus, it is not possible to construct a course model or to estimate a direction. Also, with a method that performs pattern matching against a database, such as the general pedestrian model of PTL 1, it is not possible to estimate a direction when the appearance of a person standing still, such as clothes or physique, differs greatly from that of the people in the database.
However, when an autonomous mobile apparatus such as a robot passes through an environment crowded with people standing still, it is necessary to estimate the direction in which a person standing still will start walking, so that the apparatus neither hits the person nor blocks the person's movement even when the person suddenly starts walking. The direction in which a person standing still starts walking often matches the direction of the foot. A person standing still moves sideways or backward relative to the foot for only about one or two steps. Thus, it is suitable to detect the direction in which a person starts to move from the direction of the foot.
A purpose of the present invention is to provide a method for estimating a direction of a person standing still, which method makes it possible to perform a safe movement control by estimating a direction, in which a person standing still starts to walk, from a momentary single still image of the person standing still and by moving through a region in which the person is not likely to be hit.
To achieve the above purpose, the present invention includes the steps of: detecting a boundary position between a foot and a lower leg of a person in an image acquired by an imaging unit, the boundary position being a substantial boundary part, in a lower limb, between the foot, which is a part from a malleolus to a tip part, and the lower leg; detecting a feature quantity which makes it possible to classify a ground and a part other than the ground in the image; setting, in a peripheral region around the boundary position, a plurality of local regions having positional information and/or direction information relative to the boundary position, and determining whether each of the local regions is the ground or the part other than the ground by using the feature quantity unique to the ground; determining a foot region from the local region determined as the part other than the ground; and estimating a direction of the foot of the person from the local region classified as the foot region and from the positional information.
Also, to achieve the above purpose, preferably in the present invention, the boundary position between the foot and the lower leg is specified by using a distance sensor.
Also, to achieve the above purpose, preferably in the present invention, the distance sensor measures a plane surface that is parallel to the ground and located at the height of the substantial boundary part, in the lower limb of the person, between the foot and the lower leg.
Also, to achieve the above purpose, preferably in the present invention, the feature quantity of the ground is calculated based on a histogram of data in each pixel in the image.
Also, to achieve the above purpose, preferably in the present invention, each of the local regions, which is set in the peripheral region around the boundary position between the foot and the lower leg, is a sector with the boundary position as a center.
Also, to achieve the above purpose, preferably in the present invention, when a distance between paired foot regions is smaller than a predetermined value and a difference in a feature quantity between the paired foot regions is equal to or smaller than a predetermined value, the paired foot regions are determined as the foot regions of the same person.
Also, to achieve the above purpose, preferably in the present invention, a direction of the person is estimated based on the information held in the local region which is included in the foot region of the same person.
According to the present invention, it is possible to provide a method for estimating a direction of a person standing still, which method makes it possible to perform a safe movement control by estimating a direction, in which a person standing still starts to walk, from a momentary single still image of the person standing still and by moving through a region in which the person is not likely to be hit.
FIGS. 3(a) to 3(c) are schematic appearance views illustrating the direction estimating apparatus according to the embodiment of the present invention.
FIGS. 7(1) and 7(2) are views for describing an estimation result of the foot-lower leg boundary position according to the embodiment of the present invention.
FIG. 8(1) is a view and FIG. 8(2) is a chart, which are for describing a method for extracting a feature quantity of a ground according to the embodiment of the present invention.
FIGS. 9(1) to 9(4) are views for describing a method for estimating a foot direction of a person according to the embodiment of the present invention.
FIGS. 13(a) and 13(b) are appearance views illustrating the direction estimating apparatus according to the different embodiment of the present invention.
In the following, embodiments will be described with reference to the drawings.
FIGS. 3(a) to 3(c) are appearance views of the direction estimating apparatus 1.
In
In
With reference to the flowchart in
In S1 in
In S2, the position, in the image G, of the boundary part between a foot and a lower leg of a person standing still (foot-lower leg boundary position OM) is set. Processing in S2 is illustrated in a flowchart in
In SS101, the laser scanner 102 scans a plane surface F302 parallel to a ground T301, which is illustrated in
As illustrated in
First, in the coordinate data group acquired by the laser scanner 102, coordinate data points are separated into groups by regarding adjacent coordinate data points within a range of a certain distance as coordinate data points of the same object. Then, as illustrated in
For example, in a case where the coordinate data point group which belongs to a group k includes {d1, d2, d3, and d4}, three coordinate data points {di, dj, and dk} (i, j, and k are arbitrary, mutually different natural numbers) are selected arbitrarily, and the intersection of the perpendicular bisectors, each of which is formed by any two points among {di, dj, and dk}, is set as the horizontal plane foot-lower leg boundary position O′M. In SS103 in
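The grouping of scan points and the perpendicular-bisector construction described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual processing: the function names, the assumption that scan points arrive ordered by scan angle, and the `max_gap` threshold are all hypothetical.

```python
import math


def cluster_scan_points(points, max_gap):
    """Group consecutive scan points whose spacing is at most max_gap.

    Assumes the points are ordered by scan angle (hypothetical input format).
    """
    groups = []
    for p in points:
        if groups and math.dist(groups[-1][-1], p) <= max_gap:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


def circumcenter(p1, p2, p3):
    """Return the intersection of the perpendicular bisectors of three 2D points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Proportional to the signed triangle area; zero means the points are collinear.
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; bisectors do not intersect")
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy
```

Because the three points lie on the roughly circular cross-section of the leg, the returned point approximates the center of that cross-section in the scan plane.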
As illustrated in
By determining the real coefficients a0, b0, c0, a1, b1, c1, a2, b2, and c2, a mapping relationship between the plane surface F302 and the plane surface M303 is derived. Because the numerator and the denominator on the right-hand side can be divided by a common factor, equation 1 can be regarded as having eight independent variables.
Thus, by measuring the four vertices of a quadrilateral A′B′C′D′, which is the image on the plane surface M303 of a rectangle ABCD on the plane surface F302, as illustrated in
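The following sketch shows how eight unknowns can be recovered from four point correspondences. It assumes equation 1 has the usual projective form u = (a1·x + b1·y + c1)/(a0·x + b0·y + c0) and v = (a2·x + b2·y + c2)/(a0·x + b0·y + c0), with c0 normalized to 1; that form, the function names, and the coefficient ordering are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_projective_map(src, dst):
    """Fit the eight coefficients from four (x, y) -> (u, v) correspondences."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Unknown order: a1, b1, c1, a2, b2, c2, a0, b0 (c0 is fixed to 1).
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_projective_map(h, x, y):
    """Map a point on the scan plane to image coordinates."""
    a1, b1, c1, a2, b2, c2, a0, b0 = h
    w = a0 * x + b0 * y + 1.0
    return (a1 * x + b1 * y + c1) / w, (a2 * x + b2 * y + c2) / w
```

Once the coefficients are fitted from the corners of ABCD and A′B′C′D′, `apply_projective_map` can convert a boundary position found in the scan plane into a pixel position in the image.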
In S3 in
Each pixel in the digital image G in FIG. 8(1) holds its RGB intensity as numerical values. When calculated, the histogram of the RGB intensity of the digital image G resembles FIG. 8(2). Since the ground occupies a great part of the digital image G, the color in the vicinity of the peaks Rm, Gm, and Bm of the RGB histogram in FIG. 8(2) is estimated to be the color of the ground, and RGB intensity which satisfies equation 2 is set as the feature quantity Qf unique to the ground.
[Mathematical Formula 2]
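$$
Q_f = \{\, C \mid R_m - \Delta R_l < R < R_m + \Delta R_r \ \cap\ G_m - \Delta G_l < G < G_m + \Delta G_r \ \cap\ B_m - \Delta B_l < B < B_m + \Delta B_r \,\} \qquad \text{(equation 2)}
$$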
ΔRl, ΔRr, ΔGl, ΔGr, ΔBl, and ΔBr are arbitrary real numbers and are set suitably according to a condition of the ground. Note that when Qf is constant all the time, Qf may be extracted in advance and may be stored inside or outside the apparatus.
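As an illustration of the histogram-based ground feature, the sketch below finds the per-channel peaks Rm, Gm, and Bm and builds the intervals of equation 2 around them. The function names and the default margins of 20 intensity levels are hypothetical; the source only states that the margins are set according to the condition of the ground.

```python
def histogram_peak(values, bins=256):
    """Return the most frequent intensity among integers in [0, bins)."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    return max(range(bins), key=counts.__getitem__)


def ground_color_range(pixels, margin_low=(20, 20, 20), margin_high=(20, 20, 20)):
    """Build per-channel (low, high) bounds around the peaks Rm, Gm, Bm.

    pixels is a list of (R, G, B) tuples; the margins are assumed values.
    """
    peaks = [histogram_peak(p[ch] for p in pixels) for ch in range(3)]
    return [(m - lo, m + hi) for m, lo, hi in zip(peaks, margin_low, margin_high)]


def is_ground(pixel, bounds):
    """True when every channel lies strictly inside its bounds."""
    return all(lo < c < hi for c, (lo, hi) in zip(pixel, bounds))
```

A pixel whose color falls inside all three intervals is treated as ground; everything else is a candidate for the foot region.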
In S4 in
In the present embodiment, as illustrated in
rmin, rmax, Δθ, and the number of Dk are set suitably according to an environment.
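One possible way to realize sector-shaped local regions Dk is sketched below. It assumes each Dk is an annular sector between rmin and rmax spanning an angle Δθ, that the sectors tile the full circle around the boundary position, and that a sector counts as non-ground when more than a hypothetical share `min_ratio` of its pixels fail the ground test. None of these specifics are confirmed by the source.

```python
import math


def sector_index(px, py, ox, oy, r_min, r_max, n_sectors):
    """Return k for the sector D_k containing pixel (px, py), or None if outside."""
    dx, dy = px - ox, py - oy
    r = math.hypot(dx, dy)
    if not (r_min <= r <= r_max):
        return None
    delta_theta = 2.0 * math.pi / n_sectors
    # Angle in image coordinates (y grows downward), wrapped into [0, 2*pi).
    theta = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(theta // delta_theta) % n_sectors


def non_ground_sectors(image, origin, r_min, r_max, n_sectors, is_ground, min_ratio=0.5):
    """Return the sorted indices of sectors dominated by non-ground pixels."""
    ox, oy = origin
    total, other = {}, {}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            k = sector_index(x, y, ox, oy, r_min, r_max, n_sectors)
            if k is None:
                continue
            total[k] = total.get(k, 0) + 1
            if not is_ground(pixel):
                other[k] = other.get(k, 0) + 1
    return sorted(k for k in total if other.get(k, 0) / total[k] > min_ratio)
```

Here `image` is a list of pixel rows and `is_ground` is any predicate on a pixel, for example a test against the ground feature quantity Qf.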
In S5 in
In S7 in
In S8 in
For example, in a case of FIG. 9(3), at a time point of S8, Dp (p=1, 2, 3 . . . , 6) and D* are classified as the foot region K. Dp is a region including a tiptoe (tiptoe region T), and D* is a region including a lower leg (lower leg region L). A foot direction of a person is a direction of a tiptoe with the foot-lower leg boundary position OM as a basis, and thus, it is possible to identify a foot direction from a position of the tiptoe region T. An example of separation of the tiptoe region T and the lower leg region L will be described with reference to a flowchart in
In SS201, grouping is performed: local regions Dq that belong to the foot region K and are continuously adjacent are placed in the same group. In SS202, the number of groups is checked. When there are two or more groups, the processing proceeds to SS203, and the group whose direction is closest to the front direction (−y direction in FIG. 9(3)) is determined as the tiptoe region T. When there is only one group, the processing proceeds to SS204, and that group is determined as the tiptoe region T. In SS205, the average value of the directions θp that define the local regions Dp included in the tiptoe region T is regarded as the foot direction θM.
For example, in the case of FIG. 9(4), the direction that defines a local region DLn with the foot-lower leg boundary position O″ML as a basis is denoted θLn, and the average value of θLn is regarded as the foot direction θML at O″ML. The foot direction θMR at the foot-lower leg boundary position O″MR is calculated in a similar manner.
All the foot directions estimated in such a manner are output from the output terminal 104. Also, in S8, when the distance between O″ML and O″MR is smaller than a certain value L and the predetermined feature quantities QM of the tiptoe regions DLn and DRn, which respectively have O″ML and O″MR as centers, are close to each other, the tiptoe regions DLn and DRn are determined as those of the same person, and the average value of θML and θMR may be estimated as the foot direction θM of the person 201. As the feature quantity QM, an RGB color histogram, feature point coordinates obtained by edge detection, or the like is used suitably. Thus, even when the image G includes a plurality of people, it is possible to estimate a direction of each person independently.
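Steps SS201 to SS205 can be sketched as follows, assuming the local regions are numbered sectors around the boundary position and that each sector's direction is its center angle. The circular mean is used in place of a plain average so that groups straddling the 0/2π seam are handled; that substitution, the function names, and the `front_angle` parameter are assumptions for illustration.

```python
import math


def group_adjacent(indices, n_sectors):
    """Split sector indices into runs of circularly adjacent sectors (SS201)."""
    present = sorted(set(indices))
    if not present:
        return []
    groups = [[present[0]]]
    for k in present[1:]:
        if k == groups[-1][-1] + 1:
            groups[-1].append(k)
        else:
            groups.append([k])
    # Merge the last and first runs when they touch across the 0 / n-1 seam.
    if len(groups) > 1 and groups[0][0] == 0 and groups[-1][-1] == n_sectors - 1:
        groups[0] = groups.pop() + groups[0]
    return groups


def circular_mean(angles):
    """Average angles on the circle, in radians."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def foot_direction(indices, n_sectors, front_angle):
    """Pick the group nearest the front as the tiptoe region and average it."""
    delta = 2.0 * math.pi / n_sectors
    groups = group_adjacent(indices, n_sectors)
    if not groups:
        return None
    means = [circular_mean([(k + 0.5) * delta for k in g]) for g in groups]
    # SS202 to SS204: with one group this simply returns that group's mean.
    return min(means, key=lambda m: angular_distance(m, front_angle))
```

With twelve sectors, foot sectors 2, 3, 4 and 9, and a front direction of 90 degrees, the first group is selected and its mean center angle becomes the estimated foot direction.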
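The same-person test and the averaging of the two foot directions can be sketched as below. The sum of absolute differences between feature vectors stands in for the unspecified closeness measure on QM, and both thresholds are hypothetical parameters.

```python
import math


def same_person(pos_a, pos_b, feat_a, feat_b, max_distance, max_feature_diff):
    """Decide whether two foot regions belong to one person.

    pos_* are boundary positions; feat_* are feature vectors such as
    normalized color histograms (assumed representation of QM).
    """
    dist = math.dist(pos_a, pos_b)
    diff = sum(abs(a - b) for a, b in zip(feat_a, feat_b))
    return dist < max_distance and diff <= max_feature_diff


def person_direction(theta_left, theta_right):
    """Average the left and right foot directions on the circle (radians)."""
    return math.atan2(math.sin(theta_left) + math.sin(theta_right),
                      math.cos(theta_left) + math.cos(theta_right))
```

Applying `same_person` to every pair of detected foot regions lets each pair be merged into one direction per person, while unpaired feet keep their individual directions.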
In the manner described above, it becomes possible to estimate a foot direction of the person 201 from a single image without using a database.
In the present embodiment, an example of using a distance image will be described.
In
In the direction estimating apparatus 2 in
The direction estimating apparatus 2 illustrated in
In
A flow of processing in the second embodiment will be described with reference to the flowchart in
In S1, two digital images G1 and G2 are acquired from a stereo camera 104.
In S2, the distance image G3D is generated from the digital images G1 and G2. The generation of the distance image G3D is performed, for example, by the following method. First, edge extraction or the like is performed on a minute region a1n in the digital image G1, and a feature quantity s1n is assigned to it. Next, G2 is searched for a minute region a2n whose feature quantity s2n is the same as the feature quantity s1n of a1n. Then, a distance z to the minute region akn (k=1, 2) is calculated by equation 4 and is regarded as the distance of the minute region a1n.
Here, gkn (k=1, 2) is the barycentric position of akn, f is the focal length of the camera, and h is the spacing between the two cameras. By performing this calculation over the whole digital image G1, the distance image G′3D measured from the camera is obtained. The distance image G3D referenced to the ground T301 can easily be derived from G′3D.
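Equation 4 is not reproduced in this text, so the sketch below uses the standard relation for a rectified, parallel stereo pair, z = f·h / |g1n − g2n|, as a stand-in. Whether this matches equation 4 exactly is an assumption, as are the function name and the choice of horizontal barycenter coordinates in pixels.

```python
def stereo_distance(g1_x, g2_x, focal_length, baseline):
    """Distance to a matched region from its horizontal disparity.

    g1_x, g2_x: horizontal barycenter coordinates of the matched regions (pixels).
    focal_length: focal length f in pixels; baseline: camera spacing h.
    Assumes rectified cameras with parallel optical axes.
    """
    disparity = abs(g1_x - g2_x)
    if disparity == 0:
        return float("inf")  # No parallax: the region is effectively at infinity.
    return focal_length * baseline / disparity
```

The result is expressed in the same unit as the baseline, so a baseline given in meters yields a distance in meters.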
In S3, a foot-lower leg boundary position is specified. In the distance image G3D acquired in S2, a pixel, in which a distance C is larger than the height from the ground T301 to an ankle of a person and the distance C is smaller than a predetermined height, is recognized as the foot-lower leg boundary position of the person 201, whereby a foot-lower leg boundary position in the image G1 or G2 can be specified immediately.
In S4, a feature quantity Qf of the ground is extracted. The feature quantity Qf of the ground indicates that a distance is in the vicinity of zero and is expressed in an equation 5.
[Mathematical Formula 5]
$$
Q_f = \{\, C \mid |C| < \varepsilon \,\} \qquad \text{(equation 5)}
$$
ε is an arbitrary real number and is set suitably according to a condition of the ground. After S4, processing similar to that of the first embodiment is performed on the image G1 or G2, and thus, a foot direction of a person can be estimated.
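The height-based tests of S3 and S4 in this embodiment can be combined into a single per-pixel classification, sketched below. The function name, the three labels, and the threshold values used in any call are hypothetical; the source only states that ε and the height limits are set suitably.

```python
def classify_height(height, ankle_height, upper_height, eps):
    """Label one pixel of the ground-referenced distance image.

    height: distance C of the pixel above the ground.
    Returns "ground" when |C| < eps (equation 5), "boundary" when C lies
    between the ankle height and the predetermined upper height, else "other".
    """
    if abs(height) < eps:
        return "ground"
    if ankle_height < height < upper_height:
        return "boundary"
    return "other"
```

Pixels labeled "boundary" give candidate foot-lower leg boundary positions, and pixels labeled "ground" play the role that the color test plays in the first embodiment.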
In the first embodiment, when a plurality of colors is included in the ground, there is a plurality of peaks in the histogram. In such a case, a color in the vicinity of each peak may be regarded as the feature quantity of the ground.
In the first, second, and third embodiments, in a case where the feature quantity of the ground varies depending on the position of each person, the variation can be handled by acquiring, from a local image around the foot of each person, the feature quantity of a region that does not include the foot.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/078805 | 12/13/2011 | WO | 00 | 6/12/2014 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/088517 | 6/20/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
7006950 | Greiffenhagen et al. | Feb 2006 | B1 |
7706571 | Das et al. | Apr 2010 | B2 |
8457357 | Foote et al. | Jun 2013 | B2 |
8471848 | Tschesnok | Jun 2013 | B2 |
8503727 | Naito et al. | Aug 2013 | B2 |
8514236 | Kobla et al. | Aug 2013 | B2 |
8712097 | Uchida et al. | Apr 2014 | B2 |
Number | Date | Country |
---|---|---|
10-269366 | Oct 1998 | JP |
2000-207568 | Jul 2000 | JP |
2007-229816 | Sep 2007 | JP |
2010-152873 | Jul 2010 | JP |
10-2011-0019948 | Mar 2011 | KR |
Entry |
---|
Machine translation of KR10-2011-0019948. |
Khan et al., “Tracking Multiple Occluding People by Localizing on Multiple Scene Planes”, 2008, Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol. 31, Iss. 3, 505-519. |
Chen et al., “Accurate self-calibration of two cameras by observations of a moving person on a ground plane”, Advanced Video and Signal Based Surveillance, 2007. AVSS 2007. IEEE Conference on, 129-134. |
Krahnstoever et al., “Gaze and body pose estimation from a distance”, Advanced Video and Signal-Based Surveillance (AVSS), 2011 8th IEEE International Conference on, Sep. 2011, 11-16. |
International Search Report dated Jan. 17, 2012 with English translation (four (4) pages). |
Number | Date | Country | |
---|---|---|---|
20140376780 A1 | Dec 2014 | US |