The present application claims priority to Japanese patent application JP2023-186822, filed Oct. 31, 2023, the entire contents of which are incorporated herein by reference.
The present disclosure relates to a system, a device, a method, and a computer program product for estimating the position of an object in a three-dimensional space.
Conventionally, in an unmanned store selling commodities, it is important to identify which customer takes which commodity. Thus, a technology is known that automatically identifies the commodity a customer is purchasing and achieves an automated checkout, simply by having the customer take the commodity from a showcase and proceed to a checkout counter while shopping in the store.
For example, Japanese Patent No. 7225434 discloses an information processing system that causes a plurality of cameras (range sensors) provided in a store to track a customer, acquires position information on the customer, detects a taken out commodity with a weight sensor provided in a showcase, and manages the customer and the commodity in association with each other, whereby the commodity the customer is purchasing is identified and an automated checkout is achieved without an attendant or the like.
However, according to the conventional art, it is difficult to accurately acquire the position information on the customer with the plurality of cameras (range sensors). Specifically, if the positions of an object captured by the respective cameras are shifted, more processing time is needed to determine whether or not such positions correspond to the same object.
The present disclosure has been made in view of the problem of the conventional art. The present disclosure addresses the problem, as discussed herein, with a system, a device, a method, and a computer program product for efficiently estimating the position of an object in a three-dimensional space.
A position estimation system according to one aspect of the present disclosure includes three imaging devices, and a position estimation device configured to estimate a position of an object in a three-dimensional space, based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, which are determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
The objects, features, advantages, and technical and industrial significance of this disclosure will be better understood by the following description and the accompanying drawings of the disclosure.
Hereinafter, a position estimation system, a position estimation device, a position estimation method, and a computer program product for position estimation according to embodiments of the present disclosure will be described in detail with reference to the drawings.
An outline of a position estimation system 10 according to embodiment 1 will be described.
The position estimation device 20 acquires images of an object from the imaging devices 30 (S1), and transforms image data, acquired by each imaging device, from camera coordinates to stereo spherical image coordinates (S2). Then, the position estimation device 20 determines whether or not predetermined coordinate points of the imaged object satisfy the epipolar constraint (S3). Then, if the predetermined coordinate points satisfy the epipolar constraint, the position estimation device 20 estimates that the coordinate points correspond to the coordinates of the object (S4).
Here, a case where the predetermined coordinate points satisfy the epipolar constraint will be described. As shown in
To perform determination regarding the epipolar constraint for the predetermined coordinate points, stereo spherical image coordinates are constructed by using the spherical image coordinates of two of the three imaging devices 30. That is, the position estimation system using the three imaging devices 30 provides three pairs of stereo spherical image coordinates as follows: first stereo spherical image coordinates calculated based on the spherical image coordinates of the first imaging device 30a and the second imaging device 30b; second stereo spherical image coordinates calculated based on the spherical image coordinates of the second imaging device 30b and the third imaging device 30c; and third stereo spherical image coordinates calculated based on the spherical image coordinates of the first imaging device 30a and the third imaging device 30c.
P shown in
As described above, the position estimation system 10 includes the position estimation device 20 and the three imaging devices 30. The three imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. The position estimation device 20 acquires images from the three imaging devices 30, performs transformation from image coordinates to stereo spherical image coordinates, then determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates, and identifies the position of the object P if the epipolar constraint is satisfied.
The system configuration of the position estimation system 10 shown in
The position estimation device 20 performs a process of acquiring images from the imaging devices 30, a process of transforming the image coordinates of the acquired images to stereo spherical image coordinates, a process of determining whether or not the epipolar constraint is satisfied in the stereo spherical image coordinates, a process of estimating the position of the object, etc. Each imaging device 30 is a BOX camera having a predetermined angle of view, and performs a process of capturing images of the object in a three-dimensional space.
The configuration of the position estimation device 20 shown in
The memory 24 is a memory device such as a hard disk device or a non-volatile memory, and stores therein image data 24a, stereo spherical image coordinate data 24b, determination data 24c, and estimated-position data 24d. The image data 24a refers to data of images acquired from the three imaging devices 30. The stereo spherical image coordinate data 24b refers to data of stereo spherical image coordinates generated using two of the three imaging devices 30. The stereo spherical image coordinate data 24b is stored as one set of data including: data generated using the first imaging device 30a and the second imaging device 30b; data generated using the second imaging device 30b and the third imaging device 30c; and data generated using the first imaging device 30a and the third imaging device 30c.
The determination data 24c refers to data of the determination result regarding whether or not the epipolar constraint is satisfied, determined based on the stereo spherical image coordinate data 24b. The estimated-position data 24d refers to data in which the coordinate points, among those of the object imaged by the three imaging devices 30, that are determined to satisfy the epipolar constraint are estimated to correspond to the position of the object.
The control unit 25 is a controller for controlling the entire position estimation device 20, and includes an image acquisition unit 25a, a coordinate transformation unit 25b, a determination unit 25c, and an estimation unit 25d. Specifically, programs corresponding to these units are loaded in a CPU and executed, whereby processes respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, and the estimation unit 25d are executed.
The image acquisition unit 25a is a processing unit that acquires images of the object from the imaging devices 30. Specifically, the image acquisition unit 25a stores data of images transmitted from the imaging devices 30a, 30b, 30c as the image data 24a.
The coordinate transformation unit 25b is a processing unit that reads the image data 24a from the memory 24 and transforms the image coordinates captured by the imaging devices 30 to stereo spherical image coordinates. Specifically, as shown in
The determination unit 25c performs a process of determining whether or not predetermined coordinate points satisfy the epipolar constraint, based on the stereo spherical image coordinate data 24b. Specifically, the determination unit 25c performs determination regarding whether or not the epipolar constraint is satisfied in all the stereo spherical image coordinates, which are the first stereo spherical image coordinates calculated based on image data acquired from the first imaging device 30a and the second imaging device 30b, the second stereo spherical image coordinates calculated based on image data acquired from the second imaging device 30b and the third imaging device 30c, and the third stereo spherical image coordinates calculated based on image data acquired from the first imaging device 30a and the third imaging device 30c.
The determination unit 25c determines that it is “OK” if the epipolar constraint is satisfied in all three pairs of the stereo spherical image coordinates, and that it is “No Good” if the epipolar constraint is not satisfied in at least one of the three pairs. The estimation unit 25d is a processing unit that estimates that the predetermined coordinate points correspond to the position of the object if the determination unit 25c determines that it is “OK”.
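For illustration only, the following is a minimal Python sketch of this all-pairs determination; it does not form part of the disclosure. The function name determine, the camera identifiers, and the satisfies_epipolar callable passed in are hypothetical, and the pairwise check itself is sketched later in the description of the epipolar constraint.

```python
def determine(points, baselines, satisfies_epipolar):
    """Return "OK" only if the epipolar constraint holds for all three camera pairs.

    points: dict mapping camera id 'a', 'b', 'c' to a unit direction vector
            toward the object (hypothetical representation).
    baselines: dict mapping a camera pair such as ('a', 'b') to the X' axis of
               the corresponding stereo spherical image coordinates.
    satisfies_epipolar: pairwise check function (see the later sketch).
    """
    pairs = [('a', 'b'), ('b', 'c'), ('a', 'c')]
    for i, j in pairs:
        if not satisfies_epipolar(points[i], points[j], baselines[(i, j)]):
            return "No Good"   # at least one pair violates the constraint
    return "OK"
```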
Transformation of image coordinates by the position estimation device 20 will be described.
A point U in an image coordinate system can be expressed by equation (1) using a camera matrix K and the camera coordinates Xcamera.
Here, equation (1) is expressed by equation (2) by normalizing the vectors of Xcamera with Zc.
Here, xc=Xc/Zc and yc=Yc/Zc hold. According to equation (2), u and v can be given by u=fxxc+ox and v=fyyc+oy. Accordingly, xc and yc can be given by xc=(u−ox)/fx and yc=(v−oy)/fy.
On the other hand, a three-dimensional direction vector p can be expressed by equation (3). N[*] represents normalization of a vector to norm 1.
Thus, if the camera matrix K and the camera rotation vector R are known, the three-dimensional direction vector p can be calculated from the image coordinates U.
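A minimal Python sketch of this transformation follows for illustration only; the function name pixel_to_direction is hypothetical, and the rotation matrix R is assumed here to rotate camera coordinates into the world frame, which may differ from the exact convention of equation (3).

```python
import numpy as np

def pixel_to_direction(u, v, K, R):
    """Transform an image coordinate (u, v) into a unit 3D direction vector.

    K is the 3x3 camera matrix [[fx, 0, ox], [0, fy, oy], [0, 0, 1]];
    R is assumed to rotate camera coordinates into the world frame.
    """
    fx, fy = K[0, 0], K[1, 1]
    ox, oy = K[0, 2], K[1, 2]
    xc = (u - ox) / fx                # normalized camera coordinate xc = (u - ox)/fx
    yc = (v - oy) / fy                # normalized camera coordinate yc = (v - oy)/fy
    d = R @ np.array([xc, yc, 1.0])   # direction of the pixel ray in the world frame
    return d / np.linalg.norm(d)      # N[*]: normalize to norm 1
```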
<Case where Epipolar Constraint is Satisfied>
A condition satisfying the epipolar constraint will be described. Here, a case using the camera coordinates of the first imaging device 30a and the camera coordinates of the second imaging device 30b will be described. The same process is performed for transformation to stereo spherical coordinates using the camera coordinates of the second imaging device 30b and the camera coordinates of the third imaging device 30c and transformation to stereo spherical coordinates using the camera coordinates of the first imaging device 30a and the camera coordinates of the third imaging device 30c.
An angle α1 formed by a vector C1p1 and the X′ axis of the first spherical coordinates is obtained (see
Here, cos α1=p1TX′, p′1TX′=0, cos β1=p′1TY′, cos α2=p2TX′, p′2TX′=0, and cos β2=p′2TY′ hold. Since p′1 and p′2 are expressed by p′1=N [p1−p1x′X′] and p′2=N [p2−p2x′X′], if P1 and P2 are the same point, p′1=p′2 and β1=β2 hold, and the epipolar constraint is satisfied. Similarly, calculation regarding whether or not the epipolar constraint is satisfied is performed also for the second stereo spherical image coordinates and the third stereo spherical image coordinates. If the epipolar constraint is satisfied in all the pairs of stereo spherical image coordinates, the object P imaged by the three imaging devices 30 can be identified as being at the same position. As described above, the position estimation device 20 according to the present disclosure can determine whether or not the epipolar constraint is satisfied without calculating the fundamental matrix that such a determination normally requires.
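The following Python sketch illustrates this check under stated assumptions and does not form part of the disclosure: p1 and p2 are unit direction vectors already expressed in a common stereo spherical frame whose X′ axis is the baseline direction, and a small angular tolerance is allowed, corresponding to the error margin described later for the arranged positions of the imaging devices 30. The function and parameter names are hypothetical.

```python
import numpy as np

def satisfies_epipolar(p1, p2, x_axis, tol_deg=2.0):
    """Check the epipolar condition for two unit direction vectors.

    p1, p2: unit direction vectors toward the imaged point, expressed in the
    stereo spherical frame whose X' axis (x_axis, a unit vector) is the baseline.
    Returns True if the projections p'1 and p'2 onto the plane perpendicular to
    X' coincide within an angular tolerance.
    """
    def project(p):
        # p' = N[p - (p . X') X']: remove the baseline component, then normalize
        q = p - np.dot(p, x_axis) * x_axis
        return q / np.linalg.norm(q)

    pp1, pp2 = project(p1), project(p2)
    angle = np.degrees(np.arccos(np.clip(np.dot(pp1, pp2), -1.0, 1.0)))
    return angle <= tol_deg
```

Comparing p′1 and p′2 by the angle between them, rather than requiring exact equality, reflects the tolerance for arrangement errors described later in embodiment 1.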
Next, a case where the epipolar constraint is not satisfied will be described.
An angle α3 formed by a vector C1p3 and the X′ axis is obtained. An angle α4 formed by a vector C2p4 and the X′ axis is obtained. Then, the position estimation device 20 obtains an angle β3 formed by the Y′ axis and a straight line connecting p′3 and the origin C1. The position estimation device 20 also obtains an angle β4 formed by the Y′ axis and a straight line connecting p′4 and the origin C2.
Here, p′3 and p′4 are expressed by p′3=N [p3−p3x′X′] and p′4=N [p4−p4x′X′], but P3 and P4 are different points. Accordingly, p′3=p′4 and β3=β4 do not hold, and thus the epipolar constraint is not satisfied.
Next, calculation of an intersection of two three-dimensional vectors will be described.
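The calculation itself is given in the drawings. As a hedged illustration only, one common way to compute such an intersection, when the two measured rays do not meet exactly, is to take the midpoint of the shortest segment between them; the sketch below assumes that approach with hypothetical names, and the disclosure may use a different formulation.

```python
import numpy as np

def ray_intersection(c1, d1, c2, d2):
    """Approximate the intersection of two 3D rays c1 + t*d1 and c2 + s*d2.

    Measured rays rarely meet exactly, so the midpoint of the shortest segment
    between the two rays is returned; None is returned for (nearly) parallel rays.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = c1 - c2
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w0), np.dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-9:
        return None
    t = (b * e - c * d) / denom   # parameter along the first ray
    s = (a * e - b * d) / denom   # parameter along the second ray
    return (c1 + t * d1 + c2 + s * d2) / 2.0
```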
A processing procedure of the position estimation device 20 shown in
Then, the position estimation device 20 transforms the image coordinates of each image to stereo spherical image coordinates, using image data acquired from the two imaging devices 30 (step S104). The position estimation device 20 determines whether or not the epipolar constraint is satisfied, based on vector information on the predetermined coordinate points in the stereo spherical image coordinates (step S105).
If the position estimation device 20 determines that the epipolar constraint is satisfied (step S105: Yes), it estimates that the coordinate points for which the determination has been made correspond to the position of the object (step S106), and the process ends. If the position estimation device 20 determines that the epipolar constraint is not satisfied (step S105: No), it controls the predetermined display 21 to display error information (step S107), and the process ends.
As described above, in embodiment 1, the position estimation system 10 includes the position estimation device 20 and the three imaging devices 30. The imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. The position estimation device 20 acquires captured images of the object P from the imaging devices 30. The position estimation device 20 performs coordinate transformation based on the image coordinates, calculates the three-dimensional vectors for the object P, generates stereo spherical image coordinates based on the captured images of the two imaging devices 30, and determines whether or not the epipolar constraint is satisfied. Then, if the three-dimensional vectors for the object P satisfy the epipolar constraint, the position estimation device 20 estimates that the coordinate points satisfying the epipolar constraint correspond to the coordinates of the object P.
In embodiment 1, the case where the position estimation device 20 determines that the epipolar constraint is satisfied if p′1=p′2 and β1=β2 hold with respect to the relationship between p′1 and p′2, is described. However, an error in the arranged positions of the imaging devices 30 may be assumed, and if a difference between p′1 and p′2 and a difference between β1 and β2 are smaller than a predetermined threshold, the position estimation device 20 may determine that the epipolar constraint is satisfied.
In embodiment 1, the case where the object P is imaged by a plurality of the imaging devices 30 and the position of the object P is estimated based on the image data, is described. In embodiment 2, a case where the position of an article 70 is identified and the positions of joint points of a person A are estimated, and the person A taking the article 70 is identified, will be described. The same parts as those in embodiment 1 are denoted by the same reference characters and detailed description thereof is omitted.
When the article 70 is taken out from the shelf by the person A, the position estimation device 50 detects the taking-out of the article, based on data from the weight sensor 60 (S11). The position estimation device 50 acquires images of the person A from the imaging devices 30 (S12). Then, the position estimation device 50 estimates a skeleton of the person A from the image data (S13), and identifies a joint point for a neck, an elbow, a wrist, or the like (S14).
The position estimation device 50 transforms the image data including the joint point, from camera coordinates to stereo spherical image coordinates (S15), and determines whether or not the epipolar constraint is satisfied (S16). If the epipolar constraint is satisfied, the position estimation device 50 estimates a distance between the article and the joint point (S17). Then, the position estimation device 50 calculates the likelihood of the joint point, based on an average distance and variance thereof, which are calculated in advance, and on the estimated distance between the article and the joint point, in order to determine whether or not the person A has taken out the article 70 (S18), and associates the article 70 with the person A, based on the distance and the likelihood (S19).
As described above, the position estimation system 40 includes the position estimation device 50, the three imaging devices 30, and the shelf provided with the weight sensor 60. When the article 70 is picked up by the person A, the position estimation device 50 detects the picking-up of the article 70, based on a change in weight, and acquires images including the person A from the imaging devices 30. The position estimation device 50 estimates a skeleton of the person A based on the image data, and identifies a joint point for a neck, an elbow, a wrist, or the like of the person A. Then, the position estimation device 50 performs transformation from image coordinates to stereo spherical image coordinates, determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates, and estimates the position of the joint point of the person A if the epipolar constraint is satisfied. The position estimation device 50 calculates the distance between the joint point and the article 70. The position estimation device 50 calculates the likelihood, based on the average distance and variance thereof, which are prepared in advance, between the joint point and the article 70 and on the calculated distance between the joint point and the article 70, and associates the article 70 with the person A.
The system configuration of the position estimation system 40 shown in
The position estimation device 50 performs a process of receiving a signal from the weight sensor 60 and detecting picking-up of an article, a process of acquiring images from the imaging devices 30, a process of estimating a skeleton of a person based on the images, a process of identifying a joint point for a neck, an elbow, a wrist, or the like based on the result of the estimated skeleton, a process of transforming the image coordinates of the acquired images to stereo spherical image coordinates, a process of determining whether or not the epipolar constraint is satisfied in the stereo spherical image coordinates, a process of estimating the position of the joint point, a process of calculating a distance and variance thereof between the article and the joint point, a process of calculating the likelihood from the distance and the variance thereof, a process of associating the article with the person based on the likelihood, etc. When the article is picked up, the weight sensor 60 performs a process of notifying the position estimation device 50 of a change in weight.
The configuration of the position estimation device 50 shown in
The memory 54 is a memory device such as a hard disk device or a non-volatile memory, and stores therein image data 24a, stereo spherical image coordinate data 24b, determination data 24c, estimated-position data 24d, article position data 54a, average distance/variance data 54b, estimated-skeleton data 54c, likelihood data 54d, and association data 54e. The article position data 54a refers to data in which the weight sensor 60 is associated with the position of the article 70. The average distance/variance data 54b refers to data of an average distance and variance thereof, calculated in advance, between the joint point and the article 70 when the article is taken out. The estimated-skeleton data 54c refers to data in which a skeleton of the person A is estimated. The likelihood data 54d refers to data in which a probability density function is calculated as the likelihood, from the distance between the article 70 and the joint point of the person A and the average distance and the variance thereof, calculated in advance, between the article 70 and the joint point. The association data 54e refers to data in which the person A is associated with the article 70, based on the likelihood data 54d.
The control unit 55 is a controller for controlling the entire position estimation device 50, and includes an image acquisition unit 25a, a coordinate transformation unit 25b, a determination unit 25c, an article-taking-out detection unit 55a, a skeleton estimation unit 55b, an estimation unit 55c, a likelihood calculation unit 55d, and an association unit 55e. Specifically, programs corresponding to these units are loaded in a CPU and are executed, whereby processes respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, the article-taking-out detection unit 55a, the skeleton estimation unit 55b, the estimation unit 55c, the likelihood calculation unit 55d, and the association unit 55e, are executed.
The article-taking-out detection unit 55a is a processing unit that detects taking-out of an article when the article is taken out from a shelf. Specifically, when the article is taken out from the shelf provided with the weight sensor 60, the article-taking-out detection unit 55a detects the taking-out of the article 70 by receiving a signal indicating a change in weight from the weight sensor 60. A plurality of the weight sensors 60 are provided with sensor IDs, and each sensor ID is associated with one of the articles 70. Thus, upon receiving the sensor ID from the weight sensor 60, the article-taking-out detection unit 55a can identify the position of the taken-out article 70 by referring to the article position data 54a in which the sensor ID is associated with the position of the article 70.
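For illustration, a minimal Python sketch of this detection follows; the table contents, the sensor ID format, the weight threshold, and the function name on_weight_change are all hypothetical and not taken from the disclosure.

```python
# Hypothetical article position data 54a: sensor ID -> article and shelf position.
ARTICLE_POSITION_DATA = {
    "sensor-01": {"article": "article-70", "position": (1.2, 0.4, 0.9)},
}

def on_weight_change(sensor_id, weight_delta, threshold=0.05):
    """Detect taking-out of an article when the measured weight decreases."""
    if weight_delta < -threshold:                 # weight decreased: article taken out
        entry = ARTICLE_POSITION_DATA.get(sensor_id)
        if entry is not None:
            return entry["article"], entry["position"]
    return None
```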
The skeleton estimation unit 55b performs a process of estimating a skeleton of the person A included in the image data 24a, based on the image data 24a acquired from the imaging devices 30. Specifically, a trained model for estimating a skeleton is downloaded in advance from a server device or the like (not shown), and the skeleton estimation unit 55b inputs the acquired image data 24a to the trained model to estimate a skeleton of the person A. The trained model is generated by inputting teacher data composed of many sets of data to a convolutional neural network (CNN), and, for example, performing back propagation based on correct answer data and repeating supervised learning to determine the weight of each path.
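The interface of the trained model is not specified in this description; the sketch below therefore treats it as an opaque pose_model object with a hypothetical predict method that returns estimated joint coordinates, and simply keeps the joints of interest.

```python
# "pose_model" stands for the trained model downloaded in advance; its actual
# interface is not specified, so a predict() method is assumed for illustration.
JOINTS_OF_INTEREST = ("neck", "elbow", "wrist")

def identify_joint_points(image, pose_model):
    """Estimate a skeleton from one image and keep only the joints of interest."""
    skeleton = pose_model.predict(image)   # assumed to return e.g. {"neck": (u, v), ...}
    return {name: skeleton[name] for name in JOINTS_OF_INTEREST if name in skeleton}
```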
The estimation unit 55c identifies joint points for a neck, an elbow, and a wrist in the skeleton estimated by the skeleton estimation unit 55b, and performs coordinate transformation of the acquired image data 24a in the coordinate transformation unit 25b. If the determination unit 25c determines that the epipolar constraint is satisfied with respect to the coordinates of a joint point, the estimation unit 55c estimates that acquired coordinate points of the joint point correspond to the actual coordinate position of the joint point.
The likelihood calculation unit 55d is a processing unit that calculates the likelihood, based on the distance between the article 70 and the position of the joint point estimated by the estimation unit 55c, and on the average distance/variance data 54b. Specifically, the likelihood calculation unit 55d calculates the distance between the article 70 and the position, estimated by the estimation unit 55c, of each of the joint points for a neck, an elbow, and a wrist. Then, the likelihood calculation unit 55d reads out the average distance/variance data 54b referring to the data of the average distance and variance thereof, calculated in advance, between the article 70 and each joint point, and calculates the likelihood using the probability density function, based on the distance, the average distance, and the variance thereof.
The association unit 55e performs a process of associating a person with the article 70, based on the distance between the article 70 and each joint point and the likelihood calculated by the likelihood calculation unit 55d. Specifically, if the image data 24a includes only one person, the person is associated with the article. In a case where the image data 24a includes two or more persons, if the estimation unit 55c can estimate the corresponding joint points of the two or more persons, the article 70 is associated with the person whose joint point is closest to the article 70.
On the other hand, in a case where the corresponding joint points cannot be estimated, if the wrist of any one of the persons can be estimated as a joint point, the article 70 is associated with the person whose wrist is closest to the article 70. In a case where the corresponding joint points cannot be estimated and the wrist of none of the persons can be estimated as a joint point, the article 70 is associated with the person with the largest likelihood.
Calculation of the likelihood by the position estimation device 50 will be described with reference to
For example, as shown in
The likelihood calculation unit 55d reads the average distance/variance data 54b from the memory 54, and calculates the likelihood using a probability density function, based on the average distance, the variance, and the distance between the article 70 and the position, estimated by the estimation unit 55c, of each joint point.
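The exact form of the probability density function is not specified in this description; the sketch below assumes, purely for illustration, a normal distribution over the article-to-joint distance, with hypothetical function and parameter names.

```python
import math

def joint_likelihood(distance, mean_distance, variance):
    """Evaluate the likelihood of a joint point as a normal probability density.

    mean_distance and variance correspond to the average distance/variance data
    54b prepared in advance for the distance between the article and the joint.
    """
    return math.exp(-((distance - mean_distance) ** 2) / (2.0 * variance)) \
        / math.sqrt(2.0 * math.pi * variance)
```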
A processing procedure of the position estimation device 50 will be described.
Thereafter, the position estimation device 50 performs a likelihood calculation process (step S203). The position estimation device 50 determines whether or not the number of persons (targets) to be associated with the article 70 is two or more (step S204). If the number of the targets is less than two (step S204: No), the position estimation device 50 associates the article 70 with the target (step S205), and ends the process.
On the other hand, if the number of the targets is two or more (step S204: Yes), the position estimation device 50 determines whether or not the identified joint positions of the targets indicate a common joint point (step S206). If the identified joint positions of the targets indicate a common joint point (step S206: Yes), the position estimation device 50 associates the article 70 with the target whose joint point is closest to the article 70 (step S207), and the process ends.
On the other hand, if the identified joint positions of the targets do not indicate a common joint point (S206: No), the position estimation device 50 determines whether or not a wrist is included in the joint points of any one of the targets (step S208). If a wrist is not included in the joint points of any of the targets (step S208: No), the position estimation device 50 associates the article 70 with the target with the largest likelihood (step S209), and ends the process. On the other hand, if a wrist is included in the joint points of any one of the targets (step S208: Yes), the position estimation device 50 associates the article with the target whose wrist is closest to the article 70 (step S210), and ends the process.
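The branching of steps S204 to S210 can be summarized by the following Python sketch, provided for illustration only; the field names and the data layout of the targets are hypothetical and not taken from the disclosure.

```python
import math

def associate(article_pos, targets):
    """Associate the article with one person, following steps S204 to S210.

    targets: list of entries such as
      {"person": "A", "joints": {"wrist": (x, y, z), ...}, "likelihood": 0.8}
    """
    def dist(p, q):
        return math.dist(p, q)                         # Euclidean distance

    if len(targets) < 2:                               # step S204: No -> S205
        return targets[0]["person"] if targets else None

    common = set.intersection(*(set(t["joints"]) for t in targets))
    if common:                                         # step S206: Yes -> S207
        joint = next(iter(common))
        return min(targets,
                   key=lambda t: dist(t["joints"][joint], article_pos))["person"]

    with_wrist = [t for t in targets if "wrist" in t["joints"]]
    if with_wrist:                                     # step S208: Yes -> S210
        return min(with_wrist,
                   key=lambda t: dist(t["joints"]["wrist"], article_pos))["person"]

    return max(targets, key=lambda t: t["likelihood"])["person"]   # step S208: No -> S209
```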
A processing procedure of the joint position identification process shown in
The position estimation device 50 performs coordinate transformation on the image including the identified joint point (step S304), and calculates three-dimensional vectors (step S305). Thereafter, the position estimation device 50 uses the image data of the two imaging devices 30 to transform the image coordinates of each image to stereo spherical image coordinates (step S306). The position estimation device 50 determines whether or not the epipolar constraint is satisfied, based on vector information on the joint points in the stereo spherical image coordinates (step S307).
If the position estimation device 50 determines that the epipolar constraint is satisfied (step S307: Yes), the position estimation device 50 estimates that coordinate points for which the determination is made correspond to the position of the joint point (step S308), and the process proceeds to step S203 in
A processing procedure of the likelihood calculation process shown in
The position estimation device 50 calculates the likelihood (probability density value) of the identified joint point (step S403). Thereafter, if the likelihood is smaller than a threshold, the position estimation device 50 excludes the joint point from the determination process (step S404), and the process proceeds to step S204 in
As described above, in embodiment 2, the position estimation system 40 includes the three imaging devices 30, the position estimation device 50, and the weight sensor 60. The imaging devices 30 are arranged such that the imaging positions of the imaging devices 30 form a triangle. When the article 70 is picked up by the person A, the position estimation device 50 detects the picking-up of the article 70 based on a change in weight, and identifies the position of the article from the sensor ID. Then, the position estimation device 50 acquires images including the person A from the imaging devices 30. The position estimation device 50 estimates a skeleton of the person based on the image data, and identifies a joint point for a neck, an elbow, a wrist, or the like of the person. Thereafter, the position estimation device 50 performs transformation from image coordinates to stereo spherical image coordinates, and then determines whether or not the epipolar constraint is satisfied in each pair of the stereo spherical image coordinates. If the epipolar constraint is satisfied, the position of a joint of the person A is estimated. Then, the position estimation device 50 calculates the likelihood, based on the distance between the joint point and the position of the article 70 and on the average distance of the joint point and variance thereof, calculated in advance, and associates the article 70 with the person A.
<Relationship with Hardware>
The correspondence between the position estimation device 20 of the position estimation system 10 according to embodiment 1 and the main hardware configuration of a computer will be described.
In general, a computer is configured such that a CPU 91, a ROM 92, a RAM 93, a non-volatile memory 94, etc. are connected via a bus 95. A hard disk device may be provided instead of the non-volatile memory 94. For convenience of description, only the basic hardware configuration is shown in
Here, a program, etc. required to boot an operating system (hereinafter, simply referred to as “OS”) is stored in the ROM 92 or the non-volatile memory 94, and the CPU 91 reads and executes the program for the OS from the ROM 92 or the non-volatile memory 94 when power is supplied.
On the other hand, various application programs to be operated on the OS are stored in the non-volatile memory 94, and the CPU 91 uses the RAM 93 as a main memory and executes an application program to perform a process corresponding to the application.
The position estimation program of the position estimation device 20 of the position estimation system 10 according to embodiment 1 is also stored in the non-volatile memory 94 or the like, similarly to the other application programs, and the CPU 91 loads and executes the corresponding position estimation program. In the case of the position estimation device 20 of the position estimation system 10 according to embodiment 1, a position estimation program including routines respectively corresponding to the image acquisition unit 25a, the coordinate transformation unit 25b, the determination unit 25c, and the estimation unit 25d shown in
A position estimation system according to one aspect of the present disclosure includes three imaging devices, and a position estimation device configured to estimate a position of an object in a three-dimensional space, based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, which are determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
In the above configuration, the coordinate transformation unit transforms a coordinate point of the object in image coordinates captured by a first imaging device to a coordinate point p1 in the stereo spherical image coordinates, transforms a coordinate point of the object in image coordinates captured by a second imaging device to a coordinate point p2 in the stereo spherical image coordinates, and transforms a coordinate point of the object in image coordinates captured by a third imaging device to a coordinate point p3 in the stereo spherical image coordinates.
In the above configuration, if a coordinate point p′1 as an intersection of a yz plane and a great circle passing through the coordinate point p1 in a stereo spherical image coordinate system using an imaging position of the first imaging device as an origin C1 is equal to a coordinate point p′2 as an intersection of a yz plane and a great circle passing through the coordinate point p2 in a stereo spherical image coordinate system using an imaging position of the second imaging device as an origin C2, and an angle β1 formed by a y axis and a straight line connecting the coordinate point p′1 and the origin of the stereo spherical image coordinates is equal to an angle β2 formed by a y axis and a straight line connecting the coordinate point p′2 and the origin of the stereo spherical image coordinates, the determination unit determines that the coordinate point p′1 of the first imaging device and the coordinate point p′2 of the second imaging device in the stereo spherical image coordinates satisfy the epipolar constraint.
In the above configuration, the position estimation device further includes a skeleton estimation unit configured to estimate a skeleton of a person, as the object, imaged by each imaging device. The estimation unit estimates a coordinate position of a neck or a wrist in the skeleton estimated by the skeleton estimation unit.
In the above configuration, the position estimation device further includes an article-take-out detection unit configured to detect, based on the coordinate position of the wrist estimated by the estimation unit and on position information acquired when a predetermined article has been taken out, whether or not the article has been taken out, and an association unit configured to associate, when the article-take-out detection unit detects that the article has been taken out, the taken out article with a person whose wrist is closest to the article.
In the above configuration, the position estimation device further includes a skeleton estimation unit configured to estimate a skeleton of a person, as the object, imaged by each imaging device. The position estimation device includes a second estimation unit configured to estimate coordinate positions of a neck, an elbow, and a wrist in the skeleton estimated by the skeleton estimation unit, a likelihood calculation unit configured to calculate likelihood, based on average distances and variance thereof from the article to the coordinate positions of the neck, the elbow, and the wrist, and a second association unit configured to associate the article with the person, based on a calculation result by the likelihood calculation unit.
A position estimation device according to one aspect of the present disclosure estimates a position of an object in a three-dimensional space, based on images captured by three imaging devices arranged such that imaging positions of the imaging devices form a triangle. The position estimation device includes a coordinate transformation unit configured to respectively transform coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination unit configured to determine whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation unit configured to estimate that the coordinate points, determined by the determination unit to satisfy the epipolar constraint, correspond to a coordinate position of the object.
A position estimation method according to one aspect of the present disclosure is performed in a position estimation system including three imaging devices and a position estimation device configured to estimate a position of an object in a three-dimensional space based on images captured by the imaging devices. The three imaging devices are arranged such that imaging positions of the imaging devices form a triangle. The position estimation method includes a coordinate transformation step in which the position estimation device respectively transforms coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination step in which the position estimation device determines whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation step in which the position estimation device estimates that the coordinate points, determined in the determination step to satisfy the epipolar constraint, correspond to a coordinate position of the object.
A computer program product for position estimation according to one aspect of the present disclosure is used in a device configured to estimate a position of an object in a three-dimensional space based on images captured by three imaging devices arranged such that imaging positions of the imaging devices form a triangle. The computer program product causes the device to execute a coordinate transformation procedure for respectively transforming coordinate points of the object in image coordinates of the imaging devices to coordinate points in stereo spherical image coordinates, a determination procedure for determining whether or not the coordinate points of two of the imaging devices in the stereo spherical image coordinates satisfy an epipolar constraint, and an estimation procedure for estimating that the coordinate points, determined by the determination procedure to satisfy the epipolar constraint, correspond to a coordinate position of the object.
According to the present disclosure, the position of an object in a three-dimensional space can be efficiently estimated.
The constituent elements described in each embodiment described above are conceptually functional constituent elements, and thus may not be necessarily configured as physical constituent elements, as illustrated in the drawings. That is, distributed or integrated forms of each device are not limited to the forms illustrated in the drawings, and all or some of the forms may be distributed or integrated functionally or physically in any unit depending on various loads, use statuses, or the like.
The position estimation system, the position estimation device, the position estimation method, and the position estimation program according to the present disclosure are suitable for efficiently estimating the position of an object in a three-dimensional space.