This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-190353, filed on Sep. 28, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a position estimating device and a position estimating method.
A technology is known that detects, on the basis of a video captured by a camera, that a customer picked up a product in a retail store (see, for example, Patent Documents 1 to 3). Information indicating that a customer picked up a product can be used as information indicating purchase behavior of the customer. For example, if a product is identified that a customer picked up once but then returned to its original product shelf without purchasing it, this information can be used to create more effective advertising so as to increase sales, which promises marketing effects.
Patent Document 1: Japanese Laid-open Patent Publication No. 2009-48430
Patent Document 2: Japanese Laid-open Patent Publication No. 2009-3701
Patent Document 3: Japanese Laid-open Patent Publication No. 2014-26350
According to an aspect of the embodiments, a position estimating device includes a memory and a processor.
The memory stores a first image of an object in an image-capturing target region and a second image, the first image being captured by a first imaging device, and the second image being captured by a second imaging device by use of a reflected electromagnetic wave from the image-capturing target region, using an electromagnetic source that radiates an electromagnetic wave onto the image-capturing target region. When a position of a strongly reflective region in the second image corresponds to a prescribed position in the second image, the processor estimates a position of the object on the basis of the first image and complementary information that complements an image of the strongly reflective region.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Embodiments will now be described in detail with reference to the drawings.
An image of one camera is two-dimensional information and does not include depth information. Thus, it is difficult to determine a position of an object in three-dimensional space on the basis of an image of one camera. On the other hand, if images of two cameras installed away from each other are used, it is possible to determine a three-dimensional position of an object using the principle of triangulation.
A visible security camera may be installed in a retail store. Further, an infrared camera may also be installed near a product shelf in order to detect, for example, a line of sight to a product from a customer in front of the product shelf. The visible camera captures a visible image on the basis of visible light that is reflected by a subject, and the infrared camera captures an infrared image on the basis of an infrared ray that is radiated from an infrared source and reflected by the subject.
As described above, a visible camera and an infrared camera are installed for different purposes from each other in a retail store. Thus, the inventors have realized that it is possible to utilize a visible image of a visible camera and an infrared image of an infrared camera to estimate a three-dimensional position of an object because the installation positions of the two cameras are known and the two cameras have different optical-axis directions from each other. If the visible image and the infrared image are used in combination, it is possible to detect, for example, a behavior of a customer reaching out for a product.
However, an infrared camera for detecting a line of sight detects an infrared ray that is radiated from an infrared source installed around the infrared camera and that is reflected by a subject. Thus, when a subject is located within a prescribed distance of the camera, or when an object made of a material that easily reflects an infrared ray exists in the vicinity of the subject, a strongly reflective region in which a brightness value is extremely large may appear in an infrared image because the reflected infrared ray is too strong. In such a strongly reflective region, a phenomenon called whiteout occurs, in which the image fades to white and the contour of an object becomes unclear.
When a strongly reflective region appears around a subject, it is difficult to determine a three-dimensional position of the subject using triangulation because a correspondence relationship in a position of a subject between a visible image and an infrared image is unclear.
For example, when the hand of a customer exists very close to an infrared source and an infrared ray is reflected by the hand or a sleeve of the customer, that region is a strongly reflective region, so it is difficult to detect a shape of the hand accurately. When such a strong reflection continues to occur for a long time, there is a possibility of determining in error that the customer picked up a product even though he/she did not pick it up, or determining in error that the customer did not pick up a product even though he/she did pick it up.
This problem occurs not only in an infrared image but also in an image that is captured using other electromagnetic sources including a visible light source. Further, the problem occurs not only when an image of the hand of a customer is captured but also when images of other objects in an image-capturing target region are captured.
When the position of a strongly reflective region in the second image corresponds to a prescribed position in the second image, the position estimating device 114 estimates a position of the object 122 on the basis of the first image and complementary information that complements an image in the strongly reflective region (Step 204).
According to the position estimating system 101 described above, it is possible to estimate a position of the object 122 on the basis of a first image captured by the imaging device 111 and a second image captured by the imaging device 112 using the electromagnetic source 113.
The visible camera 311 captures a visible video 331 of the image-capturing target region 121. The infrared source 313 radiates an infrared ray onto the image-capturing target region 121, and the infrared camera 312 captures an infrared video 332 by use of a reflected infrared ray from the image-capturing target region 121. The radiated infrared ray may be a near-infrared ray. Further, the infrared ray may be radiated indirectly, for example, by reflecting the infrared ray by a mirror. The visible video 331 includes visible images at a plurality of times, and the infrared video 332 includes infrared images at a plurality of times. An image at each time may also be referred to as a frame.
When a whiteout has occurred in a region of an infrared image in which a whiteout is expected to occur, this indicates that the object 122 exists in the region. A prescribed region in which a whiteout occurs can be predicted from a relationship between an installation position of the infrared camera 312 and an installation position of the infrared source 313, so it is possible to estimate an approximate range in which the object 122 exists when a whiteout occurs in the prescribed region. In this case, the position estimating device 314 can estimate a three-dimensional position of the object 122 on the basis of a position of the object 122 in a visible image and a range in which the object 122 exists in an infrared image.
If a position estimation is performed when a whiteout occurs in a prescribed region, rather than always being skipped whenever a whiteout occurs in an infrared image, the number of cases in which a position estimation is not performed is reduced, and the number of cases in which a position estimation is performed is increased.
The position estimating device 314 includes a storage 321, a video capturing unit 322, a state change detector 323, a feature amount calculator 324, a similarity determination unit 325, a region determination unit 326, and a position estimator 327.
The storage 321 stores therein the visible video 331, the infrared video 332, a key visible image 333, a key infrared image 334, and region information 335. The key visible image 333 is a reference image that is compared with a visible image included in the visible video 331, and the key infrared image 334 is a reference image that is compared with an infrared image included in the infrared video 332. The key visible image 333 and the key infrared image 334 may be, for example, a visible image and an infrared image captured by the visible camera 311 and the infrared camera 312, respectively, when the object 122 does not exist in the image-capturing target region 121.
The region information 335 indicates a position and a shape of a prescribed region in which a whiteout is predicted to occur due to the infrared source 313. The prescribed region is preset in a prescribed position in an infrared image on the basis of a positional relationship between the infrared camera 312 and the infrared source 313.
The video capturing unit 322 stores, in the storage 321, the visible video 331 and the infrared video 332 that are input from the visible camera 311 and the infrared camera 312, respectively.
The state change detector 323 detects a change in the state in the image-capturing target region 121 on the basis of the visible video 331, the infrared video 332, the key visible image 333, and the key infrared image 334. The change in state may be, for example, the appearance of the object 122 or the movement of the object 122.
The feature amount calculator 324 calculates, in each of a visible image and an infrared image, a feature amount of a region in which a change in state has been detected. For example, in each of the visible image and the infrared image, a coordinate that represents a representative position of a region that corresponds to the object 122 can be used as the feature amount. The representative position of a region may be a point on the periphery of the region, a point situated in the region, or a center of gravity of the region.
The similarity determination unit 325 calculates a similarity between a temporal change in the feature amount in the visible video 331 and a temporal change in the feature amount in the infrared video 332, and compares the calculated similarity with a threshold. Then, when the similarity is greater than the threshold, the similarity determination unit 325 determines that a position estimation based on a visible image and an infrared image is to be performed.
When the similarity is not greater than the threshold, the region determination unit 326 determines whether the region in which a change in state has been detected is a strongly reflective region that corresponds to a prescribed region indicated by the region information 335. Then, when the region in which a change in state has been detected is a strongly reflective region that corresponds to the prescribed region, the region determination unit 326 determines that a position estimation based on a visible image and an infrared image is to be performed. Further, when the region in which a change in state has been detected is not a strongly reflective region that corresponds to the prescribed region, the region determination unit 326 determines that a position estimation based on a visible image and an infrared image is not to be performed.
When the similarity determination unit 325 or the region determination unit 326 determines that a position estimation is to be performed, the position estimator 327 estimates a three-dimensional position of the object 122 on the basis of a visible image and an infrared image when a change in state has been detected.
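The decision made jointly by the similarity determination unit 325 and the region determination unit 326 can be summarized in a short sketch. The function name and its boolean parameters are hypothetical and are introduced only for illustration; how each input is obtained is outside the scope of this sketch.

```python
def should_estimate_position(similarity, threshold,
                             is_strongly_reflective,
                             matches_prescribed_region):
    """Decide whether a position estimation is to be performed.

    All names are illustrative assumptions, not part of the embodiments.
    """
    # Similarity determination: the temporal changes in the two videos agree.
    if similarity > threshold:
        return True
    # Region determination: otherwise, estimate only when the changed region
    # is a strongly reflective region that corresponds to the prescribed region.
    return is_strongly_reflective and matches_prescribed_region
```

With a similarity of 0.1 and a threshold of 0.5, for example, the function returns True only when both region flags are True.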
In this case, the visible camera 311 and the eye tracking sensor 411 communicate with the position estimating device 314 through a wired or wireless communication network. The position estimating device 314 may be installed in the retail store or in a different building located away from the retail store.
Next, the state change detector 323 calculates the sizes of difference regions that are respectively included in the difference visible image and the difference infrared image (Step 602). The size of a difference region may be a length of the difference region in an image in a horizontal or vertical direction, or may be an area of the difference region.
When a visible image 711 and an infrared image 712 are extracted from the visible video 331 and the infrared video 332, respectively, a difference visible image 721 is generated from the visible image 711 and the key visible image 701, and a difference infrared image 722 is generated from the infrared image 712 and the key infrared image 702. The arm 403 of the customer appears in both the visible image 711 and the infrared image 712, and both a difference region 741 in the difference visible image 721 and a difference region 742 in the difference infrared image 722 correspond to a region of the arm 403.
In Step 602, the state change detector 323 may extract, as a difference region, only a flesh-colored portion of a region that represents a difference in a difference visible image. This permits a more accurate extraction of the region of the arm 403.
Next, the state change detector 323 compares the size of each difference region with a threshold TH1 (Step 603). When both of the sizes of difference regions in a difference visible image and a difference infrared image are greater than the threshold TH1 (Step 603, YES), the state change detector 323 determines that a state has changed. On the other hand, if at least one of the sizes of the difference regions in the difference visible image and the difference infrared image is not greater than the threshold TH1 (Step 603, NO), the state change detector 323 determines that a state has not changed.
TH1 is a threshold used to determine whether the object 122 has appeared in a visible image or an infrared image. For example, when the size of a difference region represents an area, a value equal to or greater than 10% of the area of the entire image can be used as TH1.
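The state change check of Steps 602 and 603 might be sketched as follows. This is a minimal illustration that assumes grayscale images stored as lists of rows, a hypothetical per-pixel tolerance `pixel_tolerance` for building the difference region, and TH1 set to 10% of the image area. None of the function names come from the embodiments.

```python
def difference_mask(image, key_image, pixel_tolerance=30):
    """Mark pixels that differ from the key image by more than
    pixel_tolerance (an assumed tolerance, not given in the text)."""
    return [[abs(p - k) > pixel_tolerance for p, k in zip(row, key_row)]
            for row, key_row in zip(image, key_image)]


def region_area(mask):
    """Size of the difference region, measured here as a pixel count."""
    return sum(1 for row in mask for flagged in row if flagged)


def state_changed(visible, key_visible, infrared, key_infrared,
                  area_ratio=0.10):
    """Step 603: both difference regions must be larger than TH1."""
    for image, key_image in ((visible, key_visible), (infrared, key_infrared)):
        th1 = area_ratio * len(image) * len(image[0])  # TH1 as 10% of the area
        if region_area(difference_mask(image, key_image)) <= th1:
            return False  # at least one region is not greater than TH1
    return True
```

Measuring the size as a horizontal or vertical length instead of an area would only require replacing `region_area`.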
When a state has not changed (Step 603, NO), the state change detector 323 checks whether it has extracted a visible image and an infrared image at a last time from the visible video 331 and the infrared video 332, respectively (Step 610). When it has still not extracted the visible image and the infrared image at the last time (Step 610, NO), the state change detector 323 repeats the processes of and after Step 601 with respect to a visible image and an infrared image at a next time.
On the other hand, when a state has changed (Step 603, YES), the feature amount calculator 324 calculates a feature amount of each difference region (Step 604). For example, a coordinate that represents a representative position such as a center of gravity of a difference region can be used as a feature amount of the difference region. When an image has an X-axis in its horizontal direction and a Y-axis in its vertical direction, the X-coordinate of a center of gravity is an average of X-coordinates of all pixels in a difference region, and the Y-coordinate of the center of gravity is an average of Y-coordinates of all of the pixels in the difference region.
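The center-of-gravity feature amount described above can be computed as in the following sketch, which assumes the difference region is given as a boolean mask (a list of rows). The function name is hypothetical.

```python
def center_of_gravity(mask):
    """Return (x, y), the averages of the X- and Y-coordinates of all
    pixels in the difference region, or None if the region is empty."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, flagged in enumerate(row):
            if flagged:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

For a mask whose flagged pixels are (1, 0), (2, 0), (1, 1), and (2, 1), the averages are 1.5 for X and 0.5 for Y.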
Next, the similarity determination unit 325 calculates a similarity between a temporal change in the feature amount in the visible video 331 and a temporal change in the feature amount in the infrared video 332 (Step 605).
Thus, when changes in state have been detected from a visible image and an infrared image at a time t, the similarity determination unit 325 compares a temporal change in the feature amount in the visible video 331 with a temporal change in the feature amount in the infrared video 332 in an interval from a prescribed time t0 to the time t. This permits an estimation of whether a difference region in a difference visible image and a difference region in a difference infrared image represent the same object. The prescribed time t0 may be a first time in the visible video 331 and in the infrared video 332.
For example, the sum or the dispersion of a difference in feature amount at each time, or the reciprocal of its standard deviation can be used as a similarity between temporal changes in two feature amounts. Here, a normalized feature amount may be used to compare a visible image with an infrared image.
Next, the similarity determination unit 325 compares the similarity of a temporal change in feature amount with a threshold TH2 (Step 606). When the similarity is greater than the threshold TH2 (Step 606, YES), the similarity determination unit 325 determines that a difference region in a difference visible image and a difference region in a difference infrared image represent the same object. On the other hand, when the similarity is not greater than the threshold TH2 (Step 606, NO), the similarity determination unit 325 determines that a correspondence relationship between the difference region in the difference visible image and the difference region in the difference infrared image is unclear.
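One of the similarity measures mentioned above, the reciprocal of the standard deviation of the per-time differences, could be sketched as follows. Normalizing each coordinate by the image size and adding a small `eps` to avoid division by zero are assumptions made for this illustration, as are the function names.

```python
import statistics


def normalize(values, image_size):
    """Scale coordinates to [0, 1] so that images of different sizes
    can be compared (an assumed normalization)."""
    return [v / image_size for v in values]


def similarity(visible_series, infrared_series, eps=1e-6):
    """Reciprocal of the standard deviation of the difference in
    feature amount at each time between the two videos."""
    diffs = [v - i for v, i in zip(visible_series, infrared_series)]
    return 1.0 / (statistics.pstdev(diffs) + eps)
```

The result would then be compared with TH2 in Step 606. Because the standard deviation ignores a constant offset between the two series, this measure reacts to how the feature amounts move over time rather than to where they sit in each image.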
When the similarity is greater than the threshold TH2 (Step 606, YES), the position estimator 327 estimates a three-dimensional position of the object 122 using triangulation, on the basis of a correspondence relationship between the difference region in the difference visible image and the difference region in the difference infrared image (Step 611). Then, the state change detector 323 performs the processes of and after Step 610.
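The text does not specify a particular triangulation method. As one possible illustration, the following sketch uses the midpoint method: each camera contributes a ray from its position through the observed point, and the estimated position is the midpoint of the shortest segment between the two rays. Converting a pixel coordinate into a ray direction requires camera calibration, which is assumed to be available here. All names are hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Estimate a 3D point from two rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera positions; d1, d2: ray directions (assumed inputs).
    Returns None when the rays are (nearly) parallel.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(p + s * q for p, q in zip(c1, d1))
    p2 = tuple(p + t * q for p, q in zip(c2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

With noisy observations the two rays rarely intersect exactly, which is why the midpoint of the closest points is used rather than an exact intersection.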
For example, when the object 122 is the arm 403 of the customer, the position estimator 327 may estimate a three-dimensional position of a fingertip using a coordinate of a position, in each difference region, which corresponds to the fingertip. This permits a determination of whether the hand of the customer has reached the product 402 on the product shelf 401.
On the other hand, when the similarity is not greater than the threshold TH2 (Step 606, NO), the region determination unit 326 checks whether the difference region in the difference infrared image includes a high brightness region (Step 607). The high brightness region is, for example, a region that corresponds to a collection of pixels each having a brightness value greater than a prescribed brightness value, and whose area is greater than a prescribed area value.
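The check of Step 607 might look like the following sketch. It is a simplification that treats all bright pixels inside the difference region as a single collection, without separating connected components. The threshold values `brightness_min` and `area_min` are assumed for illustration, as is the function name.

```python
def high_brightness_region(infrared, diff_mask,
                           brightness_min=240, area_min=3):
    """Return the pixels of the high brightness region as (x, y) pairs,
    or an empty list when no such region is found.

    brightness_min and area_min are assumed values, not from the text.
    """
    pixels = [(x, y)
              for y, row in enumerate(infrared)
              for x, value in enumerate(row)
              if diff_mask[y][x] and value > brightness_min]
    # The collection counts only if its area exceeds the prescribed value.
    return pixels if len(pixels) > area_min else []
```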
In this case, the hand exists very close to the eye tracking sensor 411, so a whiteout occurs in a region 941 of the hand that is included in the difference region 932, and the region 941 is a high brightness region. On the other hand, a whiteout does not occur in a region 942 of the arm that is included in the difference region 932, and the region 942 is a low brightness region.
When the difference region includes a high brightness region (Step 607, YES), the region determination unit 326 determines that the high brightness region is a strongly reflective region. Then, the region determination unit 326 checks whether the strongly reflective region corresponds to a prescribed region indicated by the region information 335 (Step 608). For example, when at least one of the following conditions (a) and (b) is satisfied, the region determination unit 326 can determine that the strongly reflective region corresponds to the prescribed region.
(a) A distance between a representative position of the strongly reflective region and a position of the prescribed region is less than a threshold.
(b) The proportion of the area of an overlapping portion of the strongly reflective region and the prescribed region to the area of the prescribed region is greater than a threshold.
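Conditions (a) and (b) can be sketched as follows, assuming both regions are represented as sets of (x, y) pixel coordinates, the center of gravity is used as the representative position of each region, and the two thresholds `distance_th` and `overlap_th` take assumed values. The function names are hypothetical.

```python
import math


def centroid(pixels):
    """Center of gravity of a set of (x, y) pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def corresponds_to_prescribed(strong, prescribed,
                              distance_th=5.0, overlap_th=0.5):
    """Step 608: True when at least one of conditions (a) and (b) holds."""
    if not strong or not prescribed:
        return False
    # (a) distance between the representative positions is below a threshold
    cond_a = math.dist(centroid(strong), centroid(prescribed)) < distance_th
    # (b) overlap area relative to the prescribed region exceeds a threshold
    cond_b = len(strong & prescribed) / len(prescribed) > overlap_th
    return cond_a or cond_b
```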
In Step 608, the region determination unit 326 may further check whether the infrared source 313 appears in a visible image at the same time. When the hand of the customer exists very close to the infrared source 313, the infrared source 313 is often hidden behind the customer in the visible video 331. Thus, if it is confirmed that the infrared source 313 does not appear in the visible image at the same time, the confidence that the strongly reflective region corresponds to the prescribed region is improved.
When the difference region does not include a high brightness region (Step 607, NO), or when the strongly reflective region does not correspond to the prescribed region (Step 608, NO), the region determination unit 326 determines that the difference region in the difference visible image and the difference region in the difference infrared image do not represent the same object. Then, the state change detector 323 performs the processes of and after Step 610.
On the other hand, when the strongly reflective region corresponds to the prescribed region (Step 608, YES), the region determination unit 326 determines that the difference region in the difference visible image and the difference region in the difference infrared image represent the same object. Then, the position estimator 327 estimates a three-dimensional position of the object 122 using triangulation, by use of the difference region in the difference visible image and complementary information that complements an image in the strongly reflective region (Step 609), and the state change detector 323 performs the processes of and after Step 610.
For example, an estimated difference region when it is assumed that a whiteout does not occur in an infrared image can be used as complementary information. For example, the position estimator 327 generates an estimated difference region at a current time on the basis of the feature amount of a difference region in a difference visible image at the current time or the feature amount of a difference region in a difference infrared image in the past. The difference infrared image in the past is a difference infrared image that corresponds to an infrared image that was captured at a time before the current time.
In this case, the position estimator 327 can estimate the Y-coordinate of the center of gravity in the interval 1011 on the basis of a temporal change in the feature amount in the visible video 331. For example, if the Y-coordinate of a center of gravity is complemented by use of the shape of the polygonal line 801 of
For example, the position estimator 327 can obtain, from a difference region in a difference infrared image at a time before the interval 1011, a shape of an object region that represents the object 122 and a value X1 of the X-coordinate of a center of gravity of the object region. Then, the position estimator 327 generates an estimated difference region by arranging the object region such that the center of gravity of the object region coincides with a point (X1,Y1) in the difference infrared image.
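Arranging the object region so that its center of gravity coincides with a given point can be illustrated as a simple translation. The sketch below assumes the object region is a set of (x, y) pixels and that the target point (X1, Y1) has already been obtained. Rounding the shift to whole pixels is an additional assumption, and the function name is hypothetical.

```python
def place_object_region(object_pixels, target):
    """Translate the object region so that its center of gravity
    lands on target = (X1, Y1), rounded to whole pixels."""
    n = len(object_pixels)
    cx = sum(x for x, _ in object_pixels) / n
    cy = sum(y for _, y in object_pixels) / n
    dx = round(target[0] - cx)
    dy = round(target[1] - cy)
    # The shape is preserved; only the position changes.
    return {(x + dx, y + dy) for x, y in object_pixels}
```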
The position estimator 327 may estimate the Y-coordinate of the center of gravity in the interval 1011 on the basis of a temporal change in the feature amount in an infrared video in the past, in which a whiteout has not occurred, instead of a temporal change in the feature amount in the visible video 331. When an estimated value of the Y-coordinate at the time t1 is Y1′, the position estimator 327 can generate an estimated difference region, for example, by arranging the object region such that the center of gravity of the object region coincides with a point (X1,Y1′) in the difference infrared image.
Further, if the shape of a difference region is not used but only the feature amount of the difference region is used upon estimation of a three-dimensional position of the object 122, an estimated difference region does not always have to be generated. For example, when the feature amount of a difference region is a coordinate of a fingertip, it is possible to estimate a three-dimensional position of the fingertip without generating an estimated difference region. In this case, the feature amount of a difference region in a difference visible image at a current time or the feature amount of a difference region in a difference infrared image in the past can be used as complementary information without any change.
According to the position estimating processing of
When the strongly reflective region corresponds to the prescribed region (Step 1208, YES), the region determination unit 326 redetermines a similarity of a temporal change in the feature amount on the assumption that there exists a strongly reflective region in the difference region (Step 1209).
When the difference region includes a low brightness region (Step 1301, YES), the region determination unit 326 changes the difference region to a difference region only corresponding to a low brightness region (Step 1302). In the case of the difference region 932 of
Next, the feature amount calculator 324 calculates a feature amount of each difference region (Step 1303). When the Y-coordinate of a center of gravity of a difference region is used as the feature amount of a difference region after the change, as is the case with Step 1204 of
Thus, in Step 1303, the feature amount calculator 324 calculates the feature amount of each difference region in a different way than that of Step 1204. In this case, for example, an indicator that represents a temporal change in a representative position of a difference region can be used as the feature amount. The reason is that, even if the shape of a difference region is changed, a temporal change in the representative position is not changed as long as the difference region represents the same object before and after the change of the shape. The indicator that represents a temporal change in a representative position may be a difference between two coordinates at two successive times that represent a representative position of a difference region.
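The indicator described above, a difference between coordinates at two successive times, can be sketched in a few lines. The function name is hypothetical.

```python
def successive_differences(coords):
    """Difference between the representative coordinate at each time
    and the coordinate at the preceding time."""
    return [later - earlier for earlier, later in zip(coords, coords[1:])]
```

If restricting the difference region to the low brightness region shifts every representative coordinate by the same amount, the differences are unchanged. For instance, the series [10, 12, 15, 15] and the same series shifted by 5 both give [2, 3, 0].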
Next, the similarity determination unit 325 calculates a similarity between a temporal change in the feature amount in the visible video 331 and a temporal change in the feature amount in the infrared video 332 (Step 1304), and compares the similarity of a temporal change in the feature amount with a threshold TH2 (Step 1305).
When the similarity is greater than the threshold TH2 (Step 1305, YES), the position estimator 327 estimates a three-dimensional position of the object 122 using triangulation, by use of the difference region in the difference visible image and complementary information that complements an image in the strongly reflective region (Step 1306). On the other hand, when the similarity is not greater than the threshold TH2 (Step 1305, NO), the position estimator 327 does not perform a position estimation.
When the difference region does not include a low brightness region (Step 1301, NO), the position estimator 327 performs the process of Step 1306.
According to such redetermination processing, it is possible to compare again, when a whiteout occurs in a prescribed region, a temporal change in the feature amount in the visible video 331 with a temporal change in the feature amount in the infrared video 332, restricting to a difference region in which a whiteout does not occur. This permits a more accurate determination of a similarity, which results in an improved accuracy in a position estimation based on complementary information.
The position estimating system 301 of
The configurations of the position estimating system 101 of
When the position estimating device 314 of
The object 122 of
The flowcharts of
The prescribed region 502 of
The position estimating device 114 of
The information processing device of
The memory 1402 is, for example, a semiconductor memory such as a read only memory (ROM), a random access memory (RAM), and a flash memory, and stores therein a program and data used for performing the position estimation processing. The memory 1402 can be used as the storage 321 of
For example, the CPU 1401 (processor) operates as the video capturing unit 322, the state change detector 323, the feature amount calculator 324, the similarity determination unit 325, the region determination unit 326, and the position estimator 327 of
The input device 1403 is, for example, a keyboard or a pointing device, and is used for inputting instructions or information from an operator or a user. The output device 1404 is, for example, a display, a printer, or a speaker, and is used for outputting inquiries or instructions to the operator or the user, or outputting a result of processing. The result of processing may be a result of estimating a three-dimensional position of the object 122.
The auxiliary storage 1405 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or a tape device. The auxiliary storage 1405 may be a hard disk drive. The information processing device stores the program and the data in the auxiliary storage 1405 so as to load them into the memory 1402 and use them. The auxiliary storage 1405 can be used as the storage 321 of
The medium driving device 1406 drives a portable recording medium 1409 so as to access the recorded content. The portable recording medium 1409 is, for example, a memory device, a flexible disk, an optical disc, or a magneto-optical disk. The portable recording medium 1409 may be, for example, a compact disk read only memory (CD-ROM), a digital versatile disk (DVD), or a universal serial bus (USB) memory. The operator or the user can store the program and the data in the portable recording medium 1409 so as to load them into the memory 1402 and use them.
As described above, a computer-readable recording medium that stores therein a program and data used for the position estimating processing is a physical (non-transitory) recording medium such as the memory 1402, the auxiliary storage 1405, and the portable recording medium 1409.
The network connecting device 1407 is a communication interface that is connected to a communication network such as a local area network or a wide area network and makes a data conversion associated with communication. The information processing device can receive the program and the data from an external device via the network connecting device 1407 so as to load them into the memory 1402 and use them. The information processing device can also receive a processing request from a user terminal, perform the position estimating processing, and transmit a result of processing to the user terminal.
The information processing device does not necessarily include all of the components in
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-190353 | Sep 2015 | JP | national |