The present application claims priority to Japanese patent application JP2023-186822, filed Oct. 31, 2023, the entire contents of which are incorporated herein by reference.
The present disclosure relates to a system, a device, a method, and a computer program product for estimating the position of an object in a three-dimensional space.
Conventionally, in an unmanned store selling commodities, it is important to identify which customer takes which commodity. Thus, a technology is known that automatically identifies the commodity a customer is purchasing and achieves an automated checkout, simply by having the customer take the commodity from a showcase and proceed to a checkout counter while shopping in the store.
For example, Japanese Patent No. 7225434 discloses an information processing system that causes a plurality of cameras (range sensors) provided in a store to track a customer, acquires position information on the customer, detects a taken out commodity with a weight sensor provided in a showcase, and manages the customer and the commodity in association with each other, whereby the commodity the customer is purchasing is identified and an automated checkout is achieved without an attendant or the like.
However, according to the conventional art, it is difficult to accurately acquire the position information on the customer with the plurality of cameras (range sensors). Specifically, if the positions of an object captured by the respective cameras are shifted, more processing time is needed to determine whether or not such positions correspond to the same object.
The present disclosure has been made in view of the problem of the conventional art. The present disclosure addresses the problem, as discussed herein, with a system, a device, a method, and a computer program product for efficiently estimating the position of an object in a three-dimensional space.
A position estimation system according to one aspect of the present disclosure includes three imaging devices, and a position estimation device configured to estimate a position of an object in a three-dimensional space, based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, which are determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
The objects, features, advantages, and technical and industrial significance of this disclosure will be better understood by the following description and the accompanying drawings of the disclosure.
Hereinafter, a position estimation system, a position estimation device, a position estimation method, and a computer program product for position estimation according to embodiments of the present disclosure will be described in detail with reference to the drawings.
An outline of a position estimation system 10 according to embodiment 1 will be described.
The position estimation device 20 acquires images of an object from the imaging devices 30 (S1), and transforms image data, acquired by each imaging device, from camera coordinates to stereo spherical image coordinates (S2). Then, the position estimation device 20 determines whether or not predetermined coordinate points of the imaged object satisfy the epipolar constraint (S3). Then, if the predetermined coordinate points satisfy the epipolar constraint, the position estimation device 20 estimates that the coordinate points correspond to the coordinates of the object (S4).
Here, a case where the predetermined coordinate points satisfy the epipolar constraint will be described. As shown in
To perform determination regarding the epipolar constraint for the predetermined coordinate points, stereo spherical image coordinates are constructed by using the spherical image coordinates of two of the three imaging devices 30. That is, the position estimation system using the three imaging devices 30 provides three pairs of stereo spherical image coordinates as follows: first stereo spherical image coordinates calculated based on the spherical image coordinates of the first imaging device 30a and the second imaging device 30b; second stereo spherical image coordinates calculated based on the spherical image coordinates of the second imaging device 30b and the third imaging device 30c; and third stereo spherical image coordinates calculated based on the spherical image coordinates of the first imaging device 30a and the third imaging device 30c.
P shown in
As described above, the position estimation system 10 includes the position estimation device 20 and the three imaging devices 30. The three imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. The position estimation device 20 acquires images from the three imaging devices 30, performs transformation from image coordinates to stereo spherical image coordinates, then determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates, and identifies the position of the object P if the epipolar constraint is satisfied.
The system configuration of the position estimation system 10 shown in
The position estimation device 20 performs a process of acquiring images from the imaging devices 30, a process of transforming the image coordinates of the acquired images to stereo spherical image coordinates, a process of determining whether or not the epipolar constraint is satisfied in the stereo spherical image coordinates, a process of estimating the position of the object, etc. Each imaging device 30 is a BOX camera having a predetermined angle of view, and performs a process of capturing images of the object in a three-dimensional space.
The configuration of the position estimation device 20 shown in
The memory 24 is a memory device such as a hard disk device or a non-volatile memory, and stores therein image data 24a, stereo spherical image coordinate data 24b, determination data 24c, and estimated-position data 24d. The image data 24a refers to data of images acquired from the three imaging devices 30. The stereo spherical image coordinate data 24b refers to data of stereo spherical image coordinates generated using two of the three imaging devices 30. The stereo spherical image coordinate data 24b is stored as one set of data including: data generated using the first imaging device 30a and the second imaging device 30b; data generated using the second imaging device 30b and the third imaging device 30c; and data generated using the first imaging device 30a and the third imaging device 30c.
The determination data 24c refers to data of the determination result regarding whether or not the epipolar constraint is satisfied, determined based on the stereo spherical image coordinate data 24b. The estimated-position data 24d refers to data in which the coordinate points, among those of the object imaged by the three imaging devices 30, that are determined to satisfy the epipolar constraint are estimated to correspond to the position of the object.
The control unit 25 is a controller for controlling the entire position estimation device 20, and includes an image acquisition unit 25a, a coordinate transformation unit 25b, a determination unit 25c, and an estimation unit 25d. Specifically, programs corresponding to these units are loaded in a CPU and executed, whereby processes respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, and the estimation unit 25d are executed.
The image acquisition unit 25a is a processing unit that acquires images of the object from the imaging devices 30. Specifically, the image acquisition unit 25a stores data of images transmitted from the imaging devices 30a, 30b, 30c as the image data 24a.
The coordinate transformation unit 25b is a processing unit that reads the image data 24a from the memory 24 and transforms the image coordinates captured by the imaging devices 30 to stereo spherical image coordinates. Specifically, as shown in
The determination unit 25c performs a process of determining whether or not predetermined coordinate points satisfy the epipolar constraint, based on the stereo spherical image coordinate data 24b. Specifically, the determination unit 25c performs determination regarding whether or not the epipolar constraint is satisfied in all the stereo spherical image coordinates, which are the first stereo spherical image coordinates calculated based on image data acquired from the first imaging device 30a and the second imaging device 30b, the second stereo spherical image coordinates calculated based on image data acquired from the second imaging device 30b and the third imaging device 30c, and the third stereo spherical image coordinates calculated based on image data acquired from the first imaging device 30a and the third imaging device 30c.
The determination unit 25c determines that it is “OK” if the epipolar constraint is satisfied in all three pairs of the stereo spherical image coordinates, and that it is “No Good” if the epipolar constraint is not satisfied in at least one of the three pairs. The estimation unit 25d is a processing unit that estimates that the predetermined coordinate points correspond to the position of the object if the determination unit 25c determines that it is “OK”.
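For illustration only, the following is a minimal Python sketch of this all-pairs determination; it does not form part of the disclosure. The function name determine, the camera identifiers, and the satisfies_epipolar callable passed in are hypothetical, and the pairwise check itself is sketched later in the description of the epipolar constraint.

```python
def determine(points, baselines, satisfies_epipolar):
    """Return "OK" only if the epipolar constraint holds for all three camera pairs.

    points: dict mapping camera id 'a', 'b', 'c' to a unit direction vector
            toward the object (hypothetical representation).
    baselines: dict mapping a camera pair such as ('a', 'b') to the X' axis of
               the corresponding stereo spherical image coordinates.
    satisfies_epipolar: pairwise check function (see the later sketch).
    """
    pairs = [('a', 'b'), ('b', 'c'), ('a', 'c')]
    for i, j in pairs:
        if not satisfies_epipolar(points[i], points[j], baselines[(i, j)]):
            return "No Good"   # at least one pair violates the constraint
    return "OK"
```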
Transformation of image coordinates by the position estimation device 20 will be described.
A point U in an image coordinate system can be expressed by equation (1) using a camera matrix K and the camera coordinates Xcamera.
Here, equation (1) is expressed by equation (2) by normalizing the vectors of Xcamera with Zc.
Here, xc=Xc/Zc and yc=Yc/Zc hold. According to equation (2), u and v can be given by u=fxxc+ox and v=fyyc+oy. Accordingly, xc and yc can be given by xc=(u−ox)/fx and yc=(v−oy)/fy.
On the other hand, a three-dimensional direction vector p can be expressed by equation (3). N[*] represents normalization of a vector to norm 1.
Thus, if the camera matrix K and the camera rotation vector R are known, the three-dimensional direction vector p can be calculated from the image coordinates U.
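A minimal Python sketch of this transformation follows for illustration only; the function name pixel_to_direction is hypothetical, and the rotation matrix R is assumed here to rotate camera coordinates into the world frame, which may differ from the exact convention of equation (3).

```python
import numpy as np

def pixel_to_direction(u, v, K, R):
    """Transform an image coordinate (u, v) into a unit 3D direction vector.

    K is the 3x3 camera matrix [[fx, 0, ox], [0, fy, oy], [0, 0, 1]];
    R is assumed to rotate camera coordinates into the world frame.
    """
    fx, fy = K[0, 0], K[1, 1]
    ox, oy = K[0, 2], K[1, 2]
    xc = (u - ox) / fx                # normalized camera coordinate xc = (u - ox)/fx
    yc = (v - oy) / fy                # normalized camera coordinate yc = (v - oy)/fy
    d = R @ np.array([xc, yc, 1.0])   # direction of the pixel ray in the world frame
    return d / np.linalg.norm(d)      # N[*]: normalize to norm 1
```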
<Case where Epipolar Constraint is Satisfied>
A condition satisfying the epipolar constraint will be described. Here, a case using the camera coordinates of the first imaging device 30a and the camera coordinates of the second imaging device 30b will be described. The same process is performed for transformation to stereo spherical coordinates using the camera coordinates of the second imaging device 30b and the camera coordinates of the third imaging device 30c and transformation to stereo spherical coordinates using the camera coordinates of the first imaging device 30a and the camera coordinates of the third imaging device 30c.
An angle α1 formed by a vector C1p1 and the X′ axis of the first spherical coordinates is obtained (see
Here, cos α1=p1TX′, p′1TX′=0, cos β1=p′1TY′, cos α2=p2TX′, p′2TX′=0, and cos β2=p′2TY′ hold. Since p′1 and p′2 are expressed by p′1=N [p1−p1x′X′] and p′2=N [p2−p2x′X′], if P1 and P2 are the same point, p′1=p′2 and β1=β2 hold, and the epipolar constraint is satisfied. Similarly, calculation regarding whether or not the epipolar constraint is satisfied is performed also for the second stereo spherical image coordinates and the third stereo spherical image coordinates. If the epipolar constraint is satisfied in all the pairs of stereo spherical image coordinates, the object P imaged by the three imaging devices 30 can be identified as being at the same position. As described above, the position estimation device 20 according to the present disclosure can determine whether or not the epipolar constraint is satisfied without calculating the fundamental matrix that such a determination normally requires.
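The following Python sketch illustrates this check under stated assumptions and does not form part of the disclosure: p1 and p2 are unit direction vectors already expressed in a common stereo spherical frame whose X′ axis is the baseline direction, and a small angular tolerance is allowed, corresponding to the error margin described later for the arranged positions of the imaging devices 30. The function and parameter names are hypothetical.

```python
import numpy as np

def satisfies_epipolar(p1, p2, x_axis, tol_deg=2.0):
    """Check the epipolar condition for two unit direction vectors.

    p1, p2: unit direction vectors toward the imaged point, expressed in the
    stereo spherical frame whose X' axis (x_axis, a unit vector) is the baseline.
    Returns True if the projections p'1 and p'2 onto the plane perpendicular to
    X' coincide within an angular tolerance.
    """
    def project(p):
        # p' = N[p - (p . X') X']: remove the baseline component, then normalize
        q = p - np.dot(p, x_axis) * x_axis
        return q / np.linalg.norm(q)

    pp1, pp2 = project(p1), project(p2)
    angle = np.degrees(np.arccos(np.clip(np.dot(pp1, pp2), -1.0, 1.0)))
    return angle <= tol_deg
```

Comparing p′1 and p′2 by the angle between them, rather than requiring exact equality, reflects the tolerance for arrangement errors described later in embodiment 1.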
Next, a case where the epipolar constraint is not satisfied will be described.
An angle α3 formed by a vector C1p3 and the X′ axis is obtained. An angle α4 formed by a vector C2p4 and the X′ axis is obtained. Then, the position estimation device 20 obtains an angle β3 formed by the Y′ axis and a straight line connecting p′3 and the origin C1. The position estimation device 20 also obtains an angle β4 formed by the Y′ axis and a straight line connecting p′4 and the origin C2.
Here, p′3 and p′4 are expressed by p′3=N [p3−p3x′X′] and p′4=N [p4−p4x′X′], but P3 and P4 are different points. Accordingly, p′3=p′4 and β3=β4 do not hold, and thus the epipolar constraint is not satisfied.
Next, calculation of an intersection of two three-dimensional vectors will be described.
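The calculation itself is given in the drawings. As a hedged illustration only, one common way to compute such an intersection, when the two measured rays do not meet exactly, is to take the midpoint of the shortest segment between them; the sketch below assumes that approach with hypothetical names, and the disclosure may use a different formulation.

```python
import numpy as np

def ray_intersection(c1, d1, c2, d2):
    """Approximate the intersection of two 3D rays c1 + t*d1 and c2 + s*d2.

    Measured rays rarely meet exactly, so the midpoint of the shortest segment
    between the two rays is returned; None is returned for (nearly) parallel rays.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = c1 - c2
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w0), np.dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-9:
        return None
    t = (b * e - c * d) / denom   # parameter along the first ray
    s = (a * e - b * d) / denom   # parameter along the second ray
    return (c1 + t * d1 + c2 + s * d2) / 2.0
```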
A processing procedure of the position estimation device 20 shown in
Then, the position estimation device 20 transforms the image coordinates of each image to stereo spherical image coordinates, using image data acquired from the two imaging devices 30 (step S104). The position estimation device 20 determines whether or not the epipolar constraint is satisfied, based on vector information on the predetermined coordinate points in the stereo spherical image coordinates (step S105).
If the position estimation device 20 determines that the epipolar constraint is satisfied (step S105: Yes), it estimates that the coordinate points for which the determination has been made correspond to the position of the object (step S106), and the process ends. If the position estimation device 20 determines that the epipolar constraint is not satisfied (step S105: No), it controls the predetermined display 21 to display error information (step S107), and the process ends.
As described above, in embodiment 1, the position estimation system 10 includes the position estimation device 20 and the three imaging devices 30. The imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. The position estimation device 20 acquires captured images of the object P from the imaging devices 30. The position estimation device 20 performs coordinate transformation based on the image coordinates, calculates the three-dimensional vectors for the object P, generates stereo spherical image coordinates based on the captured images of the two imaging devices 30, and determines whether or not the epipolar constraint is satisfied. Then, if the three-dimensional vectors for the object P satisfy the epipolar constraint, the position estimation device 20 estimates that the coordinate points satisfying the epipolar constraint correspond to the coordinates of the object P.
In embodiment 1, the case where the position estimation device 20 determines that the epipolar constraint is satisfied if p′1=p′2 and β1=β2 hold with respect to the relationship between p′1 and p′2, is described. However, an error in the arranged positions of the imaging devices 30 may be assumed, and if a difference between p′1 and p′2 and a difference between β1 and β2 are smaller than a predetermined threshold, the position estimation device 20 may determine that the epipolar constraint is satisfied.
In embodiment 1, the case where the object P is imaged by a plurality of the imaging devices 30 and the position of the object P is estimated based on the image data, is described. In embodiment 2, a case where the position of an article 70 is identified and the positions of joint points of a person A are estimated, and the person A taking the article 70 is identified, will be described. The same parts as those in embodiment 1 are denoted by the same reference characters and detailed description thereof is omitted.
When the article 70 is taken out from the shelf by the person A, the position estimation device 50 detects the taking-out of the article, based on data from the weight sensor 60 (S11). The position estimation device 50 acquires images of the person A from the imaging devices 30 (S12). Then, the position estimation device 50 estimates a skeleton of the person A from the image data (S13), and identifies a joint point for a neck, an elbow, a wrist, or the like (S14).
The position estimation device 50 transforms the image data including the joint point, from camera coordinates to stereo spherical image coordinates (S15), and determines whether or not the epipolar constraint is satisfied (S16). If the epipolar constraint is satisfied, the position estimation device 50 estimates a distance between the article and the joint point (S17). Then, the position estimation device 50 calculates the likelihood of the joint point, based on an average distance and variance thereof, which are calculated in advance, and on the estimated distance between the article and the joint point, in order to determine whether or not the person A has taken out the article 70 (S18), and associates the article 70 with the person A, based on the distance and the likelihood (S19).
As described above, the position estimation system 40 includes the position estimation device 50, the three imaging devices 30, and the shelf provided with the weight sensor 60. When the article 70 is picked up by the person A, the position estimation device 50 detects the picking-up of the article 70, based on a change in weight, and acquires images including the person A from the imaging devices 30. The position estimation device 50 estimates a skeleton of the person A based on the image data, and identifies a joint point for a neck, an elbow, a wrist, or the like of the person A. Then, the position estimation device 50 performs transformation from image coordinates to stereo spherical image coordinates, determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates, and estimates the position of the joint point of the person A if the epipolar constraint is satisfied. The position estimation device 50 calculates the distance between the joint point and the article 70. The position estimation device 50 calculates the likelihood, based on the average distance and variance thereof, which are prepared in advance, between the joint point and the article 70 and on the calculated distance between the joint point and the article 70, and associates the article 70 with the person A.
The system configuration of the position estimation system 40 shown in
The position estimation device 50 performs a process of receiving a signal from the weight sensor 60 and detecting picking-up of an article, a process of acquiring images from the imaging devices 30, a process of estimating a skeleton of a person based on the images, a process of identifying a joint point for a neck, an elbow, a wrist, or the like based on the result of the estimated skeleton, a process of transforming the image coordinates of the acquired images to stereo spherical image coordinates, a process of determining whether or not the epipolar constraint is satisfied in the stereo spherical image coordinates, a process of estimating the position of the joint point, a process of calculating a distance and variance thereof between the article and the joint point, a process of calculating the likelihood from the distance and the variance thereof, a process of associating the article with the person based on the likelihood, etc. When the article is picked up, the weight sensor 60 performs a process of notifying the position estimation device 50 of a change in weight.
The configuration of the position estimation device 50 shown in
The memory 54 is a memory device such as a hard disk device or a non-volatile memory, and stores therein image data 24a, stereo spherical image coordinate data 24b, determination data 24c, estimated-position data 24d, article position data 54a, average distance/variance data 54b, estimated-skeleton data 54c, likelihood data 54d, and association data 54e. The article position data 54a refers to data in which the weight sensor 60 is associated with the position of the article 70. The average distance/variance data 54b refers to data of an average distance and variance thereof, calculated in advance, between the joint point and the article 70 when the article is taken out. The estimated-skeleton data 54c refers to data in which a skeleton of the person A is estimated. The likelihood data 54d refers to data in which a probability density function is calculated as the likelihood, from the distance between the article 70 and the joint point of the person A and the average distance and the variance thereof, calculated in advance, between the article 70 and the joint point. The association data 54e refers to data in which the person A is associated with the article 70, based on the likelihood data 54d.
The control unit 55 is a controller for controlling the entire position estimation device 50, and includes an image acquisition unit 25a, a coordinate transformation unit 25b, a determination unit 25c, an article-taking-out detection unit 55a, a skeleton estimation unit 55b, an estimation unit 55c, a likelihood calculation unit 55d, and an association unit 55e. Specifically, programs corresponding to these units are loaded in a CPU and are executed, whereby processes respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, the article-taking-out detection unit 55a, the skeleton estimation unit 55b, the estimation unit 55c, the likelihood calculation unit 55d, and the association unit 55e, are executed.
The article-taking-out detection unit 55a is a processing unit that detects taking-out of an article when the article is taken out from a shelf. Specifically, when the article is taken out from the shelf provided with the weight sensor 60, the article-taking-out detection unit 55a detects the taking-out of the article 70 by receiving a signal indicating a change in weight from the weight sensor 60. A plurality of the weight sensors 60 are provided with sensor IDs, and each sensor ID is associated with one of the articles 70. Thus, upon receiving the sensor ID from the weight sensor 60, the article-taking-out detection unit 55a can identify the position of the taken-out article 70 by referring to the article position data 54a in which the sensor ID is associated with the position of the article 70.
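For illustration, a minimal Python sketch of this detection follows; the table contents, the sensor ID format, the weight threshold, and the function name on_weight_change are all hypothetical and not taken from the disclosure.

```python
# Hypothetical article position data 54a: sensor ID -> article and shelf position.
ARTICLE_POSITION_DATA = {
    "sensor-01": {"article": "article-70", "position": (1.2, 0.4, 0.9)},
}

def on_weight_change(sensor_id, weight_delta, threshold=0.05):
    """Detect taking-out of an article when the measured weight decreases."""
    if weight_delta < -threshold:                 # weight decreased: article taken out
        entry = ARTICLE_POSITION_DATA.get(sensor_id)
        if entry is not None:
            return entry["article"], entry["position"]
    return None
```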
The skeleton estimation unit 55b performs a process of estimating a skeleton of the person A included in the image data 24a, based on the image data 24a acquired from the imaging devices 30. Specifically, a trained model for estimating a skeleton is downloaded in advance from a server device or the like (not shown), and the skeleton estimation unit 55b inputs the acquired image data 24a to the trained model to estimate a skeleton of the person A. The trained model is generated by inputting teacher data composed of many sets of data to a convolutional neural network (CNN), and, for example, performing back propagation based on correct answer data and repeating supervised learning to determine the weight of each path.
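The interface of the trained model is not specified in this description; the sketch below therefore treats it as an opaque pose_model object with a hypothetical predict method that returns estimated joint coordinates, and simply keeps the joints of interest.

```python
# "pose_model" stands for the trained model downloaded in advance; its actual
# interface is not specified, so a predict() method is assumed for illustration.
JOINTS_OF_INTEREST = ("neck", "elbow", "wrist")

def identify_joint_points(image, pose_model):
    """Estimate a skeleton from one image and keep only the joints of interest."""
    skeleton = pose_model.predict(image)   # assumed to return e.g. {"neck": (u, v), ...}
    return {name: skeleton[name] for name in JOINTS_OF_INTEREST if name in skeleton}
```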
The estimation unit 55c identifies joint points for a neck, an elbow, and a wrist in the skeleton estimated by the skeleton estimation unit 55b, and performs coordinate transformation of the acquired image data 24a in the coordinate transformation unit 25b. If the determination unit 25c determines that the epipolar constraint is satisfied with respect to the coordinates of a joint point, the estimation unit 55c estimates that acquired coordinate points of the joint point correspond to the actual coordinate position of the joint point.
The likelihood calculation unit 55d is a processing unit that calculates the likelihood, based on the distance between the article 70 and the position of the joint point estimated by the estimation unit 55c, and on the average distance/variance data 54b. Specifically, the likelihood calculation unit 55d calculates the distance between the article 70 and the position, estimated by the estimation unit 55c, of each of the joint points for a neck, an elbow, and a wrist. Then, the likelihood calculation unit 55d reads out the average distance/variance data 54b referring to the data of the average distance and variance thereof, calculated in advance, between the article 70 and each joint point, and calculates the likelihood using the probability density function, based on the distance, the average distance, and the variance thereof.
The association unit 55e performs a process of associating a person with the article 70, based on the distance between the article 70 and each joint point and the likelihood calculated by the likelihood calculation unit 55d. Specifically, if the image data 24a includes only one person, the person is associated with the article. In a case where the image data 24a includes two or more persons, if the estimation unit 55c can estimate the corresponding joint points of the two or more persons, the article 70 is associated with the person whose joint point is closest to the article 70.
On the other hand, in a case where the corresponding joint points cannot be estimated, if the wrist of any one of the persons can be estimated as a joint point, the article 70 is associated with the person whose wrist is closest to the article 70. In a case where the corresponding joint points cannot be estimated and the wrist of none of the persons can be estimated as a joint point, the article 70 is associated with the person with the largest likelihood.
Calculation of the likelihood by the position estimation device 50 will be described with reference to
For example, as shown in
The likelihood calculation unit 55d reads the average distance/variance data 54b from the memory 54, and calculates the likelihood using a probability density function, based on the average distance, the variance, and the distance between the article 70 and the position, estimated by the estimation unit 55c, of each joint point.
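The exact form of the probability density function is not specified in this description; the sketch below assumes, purely for illustration, a normal distribution over the article-to-joint distance, with hypothetical function and parameter names.

```python
import math

def joint_likelihood(distance, mean_distance, variance):
    """Evaluate the likelihood of a joint point as a normal probability density.

    mean_distance and variance correspond to the average distance/variance data
    54b prepared in advance for the distance between the article and the joint.
    """
    return math.exp(-((distance - mean_distance) ** 2) / (2.0 * variance)) \
        / math.sqrt(2.0 * math.pi * variance)
```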
A processing procedure of the position estimation device 50 will be described.
Thereafter, the position estimation device 50 performs a likelihood calculation process (step S203). The position estimation device 50 determines whether or not the number of persons (targets) to be associated with the article 70 is two or more (step S204). If the number of the targets is less than two (step S204: No), the position estimation device 50 associates the article 70 with the target (step S205), and ends the process.
On the other hand, if the number of the targets is two or more (step S204: Yes), the position estimation device 50 determines whether or not the identified joint positions of the targets indicate a common joint point (step S206). If the identified joint positions of the targets indicate a common joint point (step S206: Yes), the position estimation device 50 associates the article 70 with the target whose joint point is closest to the article 70 (step S207), and the process ends.
On the other hand, if the identified joint positions of the targets do not indicate a common joint point (S206: No), the position estimation device 50 determines whether or not a wrist is included in the joint points of any one of the targets (step S208). If a wrist is not included in the joint points of any of the targets (step S208: No), the position estimation device 50 associates the article 70 with the target with the largest likelihood (step S209), and ends the process. On the other hand, if a wrist is included in the joint points of any one of the targets (step S208: Yes), the position estimation device 50 associates the article with the target whose wrist is closest to the article 70 (step S210), and ends the process.
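The branching of steps S204 to S210 can be summarized by the following Python sketch, provided for illustration only; the field names and the data layout of the targets are hypothetical and not taken from the disclosure.

```python
import math

def associate(article_pos, targets):
    """Associate the article with one person, following steps S204 to S210.

    targets: list of entries such as
      {"person": "A", "joints": {"wrist": (x, y, z), ...}, "likelihood": 0.8}
    """
    def dist(p, q):
        return math.dist(p, q)                         # Euclidean distance

    if len(targets) < 2:                               # step S204: No -> S205
        return targets[0]["person"] if targets else None

    common = set.intersection(*(set(t["joints"]) for t in targets))
    if common:                                         # step S206: Yes -> S207
        joint = next(iter(common))
        return min(targets,
                   key=lambda t: dist(t["joints"][joint], article_pos))["person"]

    with_wrist = [t for t in targets if "wrist" in t["joints"]]
    if with_wrist:                                     # step S208: Yes -> S210
        return min(with_wrist,
                   key=lambda t: dist(t["joints"]["wrist"], article_pos))["person"]

    return max(targets, key=lambda t: t["likelihood"])["person"]   # step S208: No -> S209
```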
A processing procedure of the joint position identification process shown in
The position estimation device 50 performs coordinate transformation on the image including the identified joint point (step S304), and calculates three-dimensional vectors (step S305). Thereafter, the position estimation device 50 uses the image data of the two imaging devices 30 to transform the image coordinates of each image to stereo spherical image coordinates (step S306). The position estimation device 50 determines whether or not the epipolar constraint is satisfied, based on vector information on the joint points in the stereo spherical image coordinates (step S307).
If the position estimation device 50 determines that the epipolar constraint is satisfied (step S307: Yes), the position estimation device 50 estimates that coordinate points for which the determination is made correspond to the position of the joint point (step S308), and the process proceeds to step S203 in
A processing procedure of the likelihood calculation process shown in
The position estimation device 50 calculates the likelihood (probability density value) of the identified joint point (step S403). Thereafter, if the likelihood is smaller than a threshold, the position estimation device 50 excludes the joint point from the determination process (step S404), and the process proceeds to step S204 in
As described above, in embodiment 2, the position estimation system 40 includes the three imaging devices 30, the position estimation device 50, and the weight sensor 60. The imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. When the article 70 is picked up by the person A, the position estimation device 50 detects the picking-up of the article 70 based on a change in weight, and identifies the position of the article from the sensor ID. Then, the position estimation device 50 acquires images including the person A from the imaging devices 30. The position estimation device 50 estimates a skeleton of the person based on the image data, and identifies a joint point for a neck, an elbow, a wrist, or the like of the person. Thereafter, the position estimation device 50 performs transformation from image coordinates to stereo spherical image coordinates, and then determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates. If the epipolar constraint is satisfied, the position of a joint of the person A is estimated. Then, the position estimation device 50 calculates the likelihood, based on the distance between the joint point and the position of the article 70 and on the average distance of the joint point and variance thereof, calculated in advance, and associates the article 70 with the person A.
<Relationship with Hardware>
The correspondence between the position estimation device 20 of the position estimation system 10 according to embodiment 1 and the main hardware configuration of a computer will be described.
In general, a computer is configured such that a CPU 91, a ROM 92, a RAM 93, a non-volatile memory 94, etc. are connected via a bus 95. A hard disk device may be provided instead of the non-volatile memory 94. For convenience of description, only the basic hardware configuration is shown in
Here, a program, etc. required to boot an operating system (hereinafter, simply referred to as “OS”) is stored in the ROM 92 or the non-volatile memory 94, and the CPU 91 reads and executes the program for the OS from the ROM 92 or the non-volatile memory 94 when power is supplied.
On the other hand, various application programs to be operated on the OS are stored in the non-volatile memory 94, and the CPU 91 uses the RAM 93 as a main memory and executes an application program to perform a process corresponding to the application.
The position estimation program of the position estimation device 20 of the position estimation system 10 according to embodiment 1 is also stored in the non-volatile memory 94 or the like, similarly to the other application programs, and the CPU 91 loads and executes the corresponding position estimation program. In the case of the position estimation device 20 of the position estimation system 10 according to embodiment 1, a position estimation program including routines respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, and the estimation unit 25d shown in
A position estimation system according to one aspect of the present disclosure includes three imaging devices, and a position estimation device configured to estimate a position of an object in a three-dimensional space, based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, which are determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
In the above configuration, the coordinate transformation unit transforms a coordinate point of the object in image coordinates captured by a first imaging device to a coordinate point p1 in the stereo spherical image coordinates, transforms a coordinate point of the object in image coordinates captured by a second imaging device to a coordinate point p2 in the stereo spherical image coordinates, and transforms a coordinate point of the object in image coordinates captured by a third imaging device to a coordinate point p3 in the stereo spherical image coordinates.
In the above configuration, if a coordinate point p′1 as an intersection of a yz plane and a great circle passing through the coordinate point p1 in a stereo spherical image coordinate system using an imaging position of the first imaging device as an origin C1 is equal to a coordinate point p′2 as an intersection of a yz plane and a great circle passing through the coordinate point p2 in a stereo spherical image coordinate system using an imaging position of the second imaging device as an origin C2, and an angle β1 formed by a y axis and a straight line connecting the coordinate point p′1 and the origin of the stereo spherical image coordinates is equal to an angle β2 formed by a y axis and a straight line connecting the coordinate point p′2 and the origin of the stereo spherical image coordinates, the determination unit determines that the coordinate point p′1 of the first imaging device and the coordinate point p′2 of the second imaging device in the stereo spherical image coordinates satisfy the epipolar constraint.
In the above configuration, the position estimation device further includes a skeleton estimation unit configured to estimate a skeleton of a person, as the object, imaged by each imaging device. The estimation unit estimates a coordinate position of a neck or a wrist in the skeleton estimated by the skeleton estimation unit.
In the above configuration, the position estimation device further includes an article-take-out detection unit configured to detect, based on the coordinate position of the wrist estimated by the estimation unit and on position information acquired when a predetermined article has been taken out, whether or not the article has been taken out, and an association unit configured to associate, when the article-take-out detection unit detects that the article has been taken out, the taken out article with a person whose wrist is closest to the article.
In the above configuration, the position estimation device further includes a skeleton estimation unit configured to estimate a skeleton of a person, as the object, imaged by each imaging device. The position estimation device includes a second estimation unit configured to estimate coordinate positions of a neck, an elbow, and a wrist in the skeleton estimated by the skeleton estimation unit, a likelihood calculation unit configured to calculate likelihood, based on average distances and variance thereof from the article to the coordinate positions of the neck, the elbow, and the wrist, and a second association unit configured to associate the article with the person, based on a calculation result by the likelihood calculation unit.
A position estimation device according to one aspect of the present disclosure estimates a position of an object in a three-dimensional space, based on images captured by three imaging devices arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
A position estimation method according to one aspect of the present disclosure is performed in a position estimation system including three imaging devices and a position estimation device configured to estimate a position of an object in a three-dimensional space based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation method includes a coordinate transformation step in which the position estimation device respectively transforms coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination step in which the position estimation device determines whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation step in which the position estimation device estimates that the coordinate points, determined in the determination step to satisfy the epipolar constraint, correspond to a coordinate position of the object.
A computer program product for position estimation according to one aspect of the present disclosure is used in a device configured to estimate a position of an object in a three-dimensional space based on images captured by three imaging devices arranged such that imaging positions of the imaging devices form a triangle. The computer program product causes the device to execute a coordinate transformation procedure for respectively transforming coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination procedure for determining whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation procedure for estimating that the coordinate points, determined by the determination procedure to satisfy the epipolar constraint, correspond to a coordinate position of the object.
According to the present disclosure, the position of an object in a three-dimensional space can be efficiently estimated.
The constituent elements described in each embodiment described above are conceptually functional constituent elements, and thus may not be necessarily configured as physical constituent elements, as illustrated in the drawings. That is, distributed or integrated forms of each device are not limited to the forms illustrated in the drawings, and all or some of the forms may be distributed or integrated functionally or physically in any unit depending on various loads, use statuses, or the like.
The position estimation system, the position estimation device, the position estimation method, and the position estimation program according to the present disclosure are suitable for efficiently estimating the position of an object in a three-dimensional space.