The present invention relates to systems that track inventory items in an area of real space including inventory display structures.
Determining quantities and locations of different inventory items stocked in inventory display structures in an area of real space, such as a shopping store, is required for efficient operation of the shopping store. Subjects in the area of real space, such as customers, take items from shelves and put the items in their respective shopping carts or baskets. Customers may also put items back on the same shelf, or another shelf, if they do not want to buy the item. Thus, over a period of time, the inventory items are taken from their designated locations on shelves and can be dispersed to other shelves in the shopping store. In some systems, the quantity of stocked items is available only after considerable delay because it requires consolidation of sale receipts with the stocked inventory. The delay in availability of information regarding quantities of items stocked in a shopping store can affect customers' purchase decisions as well as store management's action to order more quantities of inventory items that are in high demand.
It is desirable to provide a system that can more effectively and automatically provide, in real time, quantities of items stocked on shelves and also identify location of items on the shelves.
A system, and method for operating a system, are provided for tracking inventory items in an area of real space. A plurality of cameras or other sensors produce respective sequences of images of corresponding fields of view in the real space. The system is coupled to the plurality of sensors and includes processing logic that uses the sequences of images produced by at least two sensors to identify inventory events. The system tracks inventory items in the area of real space in response to the inventory events.
A system and method are provided for tracking inventory items in an area of real space. A plurality of cameras or other sensors produces respective sequences of images of corresponding fields of view in the real space. The field of view of each sensor overlaps with the field of view of at least one other sensor in the plurality of sensors. The system uses the sequences of images produced by at least two sensors in the plurality of sensors to identify inventory events. The system tracks locations of inventory items in the area of real space in response to the inventory events.
In one embodiment, the inventory events include an item identifier, a put or take indicator, locations represented by positions along three axes of the area of real space, and a timestamp. The system can include or have access to memory storing a data set defining a plurality of cells having coordinates in the area of real space. The system includes logic to match locations of inventory items with coordinates of cells and to maintain data representing inventory items matched with cells in the plurality of cells. The area of real space can include a plurality of inventory locations. The coordinates of cells in the plurality of cells can correlate with inventory locations or portions of inventory locations in the plurality of inventory locations. The system includes logic to calculate scores, at a scoring time, for inventory items having locations matching particular cells using respective counts of inventory events. The logic that calculates scores for cells uses sums of puts and takes of inventory items weighted by a separation between timestamps of the puts and takes and the scoring time. The system includes logic to store the scores in the memory.
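The time-weighted scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the exponential decay, the `half_life_s` parameter, the choice that puts add weight while takes subtract it, and the names `InventoryEvent` and `cell_scores` are all hypothetical. The text specifies only that sums of puts and takes are weighted by the separation between their timestamps and the scoring time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    sku: str          # item identifier
    is_put: bool      # True for a put, False for a take
    cell_id: int      # cell already matched to the event location
    timestamp: float  # seconds


def cell_scores(events, scoring_time, half_life_s=3600.0):
    """Return {(cell_id, sku): score}; puts add weight, takes subtract it (assumed)."""
    scores = defaultdict(float)
    for ev in events:
        age = scoring_time - ev.timestamp
        if age < 0:
            continue  # ignore events that occur after the scoring time
        # Assumed weighting: an event loses half its weight every half_life_s seconds.
        weight = 0.5 ** (age / half_life_s)
        scores[(ev.cell_id, ev.sku)] += weight if ev.is_put else -weight
    return dict(scores)


events = [
    InventoryEvent("ketchup", True, 7, 0.0),
    InventoryEvent("ketchup", True, 7, 3600.0),
    InventoryEvent("ketchup", False, 7, 3600.0),
]
scores = cell_scores(events, scoring_time=3600.0)
```

In this example the recent put and take cancel, and the older put contributes half of its original weight, so the cell retains a positive score for the item.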
In one embodiment, the system includes logic to render a display image representing cells in the plurality of cells and the scores for the cells. In this embodiment, the scores are represented by variations in color in the display image representing cells. The system includes logic to select a set of inventory items for each cell based on the scores. In one embodiment, the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells correlate with inventory locations in the plurality of inventory locations. In this embodiment, a data set in memory defines a plurality of cells having coordinates in the area of real space.
The system can include or have access to memory storing a planogram identifying inventory locations in the area of real space and inventory items to be positioned on inventory locations. The planogram can also include information about portions of inventory locations designated for particular inventory items. The planogram can be produced based on a plan for the arrangement of inventory items on the inventory locations in the area of real space.
The system includes logic to maintain data representing inventory items matched with cells in the plurality of cells. The system can also include logic to determine misplaced items by comparing the data representing inventory items matched with cells with the planogram.
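The comparison of cell contents with the planogram can be sketched as a set difference per cell. The dictionary layout (cell identifier mapped to a set of item identifiers) and the function name `find_misplaced` are assumptions made for illustration; the text does not specify how either data set is stored.

```python
def find_misplaced(realogram, planogram):
    """Return {cell_id: sorted SKUs observed in the cell but not planned for it}."""
    misplaced = {}
    for cell_id, observed in realogram.items():
        planned = planogram.get(cell_id, set())
        extra = set(observed) - set(planned)
        if extra:
            misplaced[cell_id] = sorted(extra)
    return misplaced


# Hypothetical data: cell 2 is planned for mustard but also holds ketchup.
planogram = {1: {"ketchup"}, 2: {"mustard"}}
realogram = {1: {"ketchup"}, 2: {"mustard", "ketchup"}}
misplaced = find_misplaced(realogram, planogram)
```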
The system can generate and store in memory a data structure referred to herein as a “realogram,” identifying the locations of inventory items in the area of real space based on accumulation of data about the items identified in, and the locations of, the inventory events detected as discussed herein. The data in the realogram can be compared to data in a planogram, to determine how inventory items are disposed in the area compared to the plan, such as to locate misplaced items. Also, the realogram can be processed to locate inventory items in three dimensional cells, and correlate those cells with inventory locations in the store, such as can be determined from a planogram or other map of the inventory locations. Also, the realogram can be processed to track activity related to specific inventory items in different locations in the area. Other uses of realograms are possible as well.
A system and method are provided for tracking inventory items in an area of real space including inventory display structures. The system includes a plurality of cameras disposed above the inventory display structures. The cameras produce respective sequences of images of inventory display structures in corresponding fields of view in the real space. The field of view of each camera overlaps with the field of view of at least one other camera in the plurality of cameras. A data set defines a plurality of cells having coordinates in the area of real space. The data set is stored in memory. The system processes the sequences of images produced by the plurality of cameras to find locations of inventory events in three dimensions in the area of real space. In response to the inventory events, the system includes logic to determine a nearest cell in the data set based on the locations of the inventory events. The system includes logic that calculates scores at a scoring time for inventory items associated with the inventory events having locations matching particular cells using respective counts of inventory events.
In one embodiment, the system includes logic to select a set of inventory items for each cell based on the scores. In one embodiment, the inventory events include an item identifier, a put or a take indicator, locations represented by positions along three axes of the area of real space, and a timestamp. In one embodiment, the system includes data sets defining a plurality of cells represented as two dimensional grids having coordinates in the area of real space. The cells can correlate with portions of front plan of inventory locations. The processing system includes logic that determines the nearest cell based on the location of the inventory event. In one embodiment, the system includes a data set defining a plurality of cells represented as three dimensional grids having coordinates in the area of real space. The cells can correlate with portions of volume on inventory locations. The processing system includes logic that determines the nearest cell based on the location of the inventory event. The put indicator identifies that the item is placed on an inventory location and the take indicator identifies that the item is taken from the inventory location.
In one embodiment, the logic that processes the sequences of images produced by the plurality of cameras comprises image recognition engines. The image recognition engines generate data sets representing elements in the images corresponding to hands. The system includes logic that executes analysis of the data sets from sequences of images from at least two cameras to determine locations of inventory events in three dimensions. The image recognition engines comprise convolutional neural networks.
In one embodiment, the system includes logic that calculates scores for cells using sums of puts and takes of inventory items weighted by a separation between timestamps of the puts and takes and the scoring time. The scores are stored in the memory. In one embodiment, the logic to determine the nearest cell in the data set based on the locations of the inventory events includes calculating a distance from the location of the inventory event to cells in the data set and matching the inventory event with a cell based on the calculated distance.
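The distance-based matching of an inventory event to a cell can be sketched as below. The sketch assumes each cell is summarized by a center point in real-space (x, y, z) coordinates and that Euclidean distance is used; the text says only that a distance is calculated, so both choices, and the name `nearest_cell`, are illustrative.

```python
import math


def nearest_cell(event_xyz, cell_centers):
    """Return the id of the cell whose center is closest to the event location."""
    return min(cell_centers, key=lambda cid: math.dist(event_xyz, cell_centers[cid]))


# Hypothetical cell centers in real-space coordinates.
cell_centers = {0: (0.5, 0.5, 0.5), 1: (1.5, 0.5, 0.5), 2: (0.5, 0.5, 1.5)}
matched = nearest_cell((1.4, 0.6, 0.4), cell_centers)
```

The same function works for two dimensional grids if both the event location and the cell centers are given as two-element tuples.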
Methods and computer program products which can be executed by computer systems are also described herein.
Functions described herein, including but not limited to identifying and linking an inventory event, including the item associated with the inventory event, to a cell in the plurality of cells having coordinates in the area of real space, and updating the store realogram, present complex problems of computer engineering, relating for example to the type of image data to be processed, what processing of the image data to perform, and how to determine actions from the image data with high reliability.
Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features disclosed herein.
System Overview
A system and various implementations of the subject technology is described with reference to
The discussion of
As used herein, a network node is an addressable hardware device or virtual device that is attached to a network, and is capable of sending, receiving, or forwarding information over a communications channel to or from other network nodes. Examples of electronic devices which can be deployed as hardware network nodes include all varieties of computers, workstations, laptop computers, handheld computers, and smartphones. Network nodes can be implemented in a cloud-based server system. More than one virtual device configured as a network node can be implemented using a single physical device.
For the sake of clarity, only three network nodes hosting image recognition engines are shown in the system 100. However, any number of network nodes hosting image recognition engines can be connected to the subject tracking engine 110 through the network(s) 181. Similarly, the image recognition engine, the subject tracking engine, the store inventory engine, the store realogram engine and other processing engines described herein can execute using more than one network node in a distributed architecture.
The interconnection of the elements of system 100 will now be described. Network(s) 181 couples the network nodes 101a, 101b, and 101n, respectively, hosting image recognition engines 112a, 112b, and 112n, the network node 104 hosting the store inventory engine 180, the network node 106 hosting the store realogram engine 190, the network node 102 hosting the subject tracking engine 110, the maps database 140, the inventory events database 150, the inventory database 160, and the realogram database 170. Cameras 114 are connected to the subject tracking engine 110 through network nodes hosting image recognition engines 112a, 112b, and 112n. In one embodiment, the cameras 114 are installed in a shopping store such that sets of cameras 114 (two or more) with overlapping fields of view are positioned over each aisle to capture images of real space in the store. In
Cameras 114 can be synchronized in time with each other, so that images are captured at the same time, or close in time, and at the same image capture rate. The cameras 114 can send respective continuous streams of images at a predetermined rate to network nodes hosting image recognition engines 112a-112n. Images captured in all the cameras covering an area of real space at the same time, or close in time, are synchronized in the sense that the synchronized images can be identified in the processing engines as representing different views of subjects having fixed positions in the real space. For example, in one embodiment, the cameras send image frames at the rates of 30 frames per second (fps) to respective network nodes hosting image recognition engines 112a-112n. Each frame has a timestamp, identity of the camera (abbreviated as “camera_id”), and a frame identity (abbreviated as “frame_id”) along with the image data. Other embodiments of the technology disclosed can use different types of sensors such as infrared image sensors, RF image sensors, ultrasound sensors, thermal sensors, Lidars, etc., to generate this data. Multiple types of sensors can be used, including for example ultrasound or RF sensors in addition to the cameras 114 that generate RGB color output. Multiple sensors can be synchronized in time with each other, so that frames are captured by the sensors at the same time, or close in time, and at the same frame capture rate. In all of the embodiments described herein sensors other than cameras, or sensors of multiple types, can be used to produce the sequences of images utilized.
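The per-frame metadata and the notion of synchronized images can be sketched as follows. The field names `camera_id`, `frame_id`, and the timestamp come from the text; the `Frame` class, the slot-rounding rule, and the function `group_synchronized` are assumptions introduced for illustration, not the disclosed synchronization mechanism.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    camera_id: int
    frame_id: int
    timestamp: float  # seconds
    image: bytes = b""


def group_synchronized(frames, fps=30.0):
    """Bucket frames by capture slot and keep slots seen by at least two cameras."""
    buckets = defaultdict(dict)
    for f in frames:
        slot = round(f.timestamp * fps)  # frames close in time share a slot
        buckets[slot][f.camera_id] = f
    return {slot: views for slot, views in buckets.items() if len(views) >= 2}


frames = [Frame(0, 10, 1.0000), Frame(1, 10, 1.0004), Frame(0, 11, 1.0333)]
synced = group_synchronized(frames)
```

Here the first two frames fall into the same capture slot and are treated as different views of the same moment, while the third frame has no partner yet and is left out.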
Cameras installed over an aisle are connected to respective image recognition engines. For example, in
In one embodiment, each image recognition engine 112a, 112b, and 112n is implemented as a deep learning algorithm such as a convolutional neural network (abbreviated CNN). In such an embodiment, the CNN is trained using a training database. In an embodiment described herein, image recognition of subjects in the real space is based on identifying and grouping joints recognizable in the images, where the groups of joints can be attributed to an individual subject. For this joints-based analysis, the training database has a large collection of images for each of the different types of joints for subjects. In the example embodiment of a shopping store, the subjects are the customers moving in the aisles between the shelves. In an example embodiment, during training of the CNN, the system 100 is referred to as a “training system.” After training the CNN using the training database, the CNN is switched to production mode to process images of customers in the shopping store in real time.
In an example embodiment, during production, the system 100 is referred to as a runtime system (also referred to as an inference system). The CNN in each image recognition engine produces arrays of joints data structures for images in its respective stream of images. In an embodiment as described herein, an array of joints data structures is produced for each processed image, so that each image recognition engine 112a-112n produces an output stream of arrays of joints data structures. These arrays of joints data structures from cameras having overlapping fields of view are further processed to form groups of joints, and to identify such groups of joints as subjects. The subjects can be identified and tracked by the system using an identifier “subject_id” during their presence in the area of real space.
The subject tracking engine 110, hosted on the network node 102 receives, in this example, continuous streams of arrays of joints data structures for the subjects from image recognition engines 112a-112n. The subject tracking engine 110 processes the arrays of joints data structures and translates the coordinates of the elements in the arrays of joints data structures corresponding to images in different sequences into candidate joints having coordinates in the real space. For each set of synchronized images, the combination of candidate joints identified throughout the real space can be considered, for the purposes of analogy, to be like a galaxy of candidate joints. For each succeeding point in time, movement of the candidate joints is recorded so that the galaxy changes over time. The output of the subject tracking engine 110 identifies subjects in the area of real space at a moment in time.
The subject tracking engine 110 uses logic to identify groups or sets of candidate joints having coordinates in real space as subjects in the real space. For the purposes of analogy, each set of candidate points is like a constellation of candidate joints at each point in time. The constellations of candidate joints can move over time. A time sequence analysis of the output of the subject tracking engine 110 over a period of time identifies movements of subjects in the area of real space.
In an example embodiment, the logic to identify sets of candidate joints comprises heuristic functions based on physical relationships amongst joints of subjects in real space. These heuristic functions are used to identify sets of candidate joints as subjects. The sets of candidate joints comprise individual candidate joints that have relationships according to the heuristic parameters with other individual candidate joints and subsets of candidate joints in a given set that has been identified, or can be identified, as an individual subject.
In the example of a shopping store, the customers (also referred to as subjects above) move in the aisles and in open spaces. The customers take items from inventory locations on shelves in inventory display structures. In one example of inventory display structures, shelves are arranged at different levels (or heights) from the floor and inventory items are stocked on the shelves. The shelves can be fixed to a wall or placed as freestanding shelves forming aisles in the shopping store. Other examples of inventory display structures include pegboard shelves, magazine shelves, lazy susan shelves, warehouse shelves, and refrigerated shelving units. The inventory items can also be stocked in other types of inventory display structures such as stacking wire baskets, dump bins, etc. The customers can also put items back on the same shelves from where they were taken or on another shelf.
The system includes the store inventory engine 180 (hosted on the network node 104) to update the inventory in inventory locations in the shopping store as customers put and take items from the shelves. The store inventory engine updates the inventory data structure of the inventory locations by indicating the identifiers (such as stock keeping units or SKUs) of inventory items placed on the inventory location. The store inventory engine also updates the inventory data structure of the shopping store by updating the quantities of inventory items stocked in the store. The inventory locations and store inventory data along with the customer's inventory data (also referred to as log data structure of inventory items or shopping cart data structure) are stored in the inventory database 160.
The store inventory engine 180 provides a status of the inventory items in inventory locations. It is difficult to determine at any moment in time, however, which inventory items are placed on what portion of the shelf. This is important information for the shopping store management and employees. The inventory items can be arranged in inventory locations according to a planogram which identifies the shelves and locations on the shelf where the inventory items are planned to be stocked. For example, a ketchup bottle may be stocked on a predetermined left portion of all shelves in an inventory display structure forming a column-wise arrangement. With the passage of time, customers take ketchup bottles from the shelves and place in their respective baskets or shopping carts. Some customers may put the ketchup bottles back on another portion of the same shelf in the same inventory display structure. The customers may also put back the ketchup bottles on shelves in other inventory display structures in the shopping store. The store realogram engine 190 (hosted on the network node 106) generates a realogram, which can be used to identify portions of shelves where the ketchup bottles are positioned at a time “t”. This information can be used by the system to generate notifications to employees with locations of misplaced ketchup bottles.
Also, this information can be used across the inventory items in the area of real space to generate a data structure, referred to as a realogram herein, that tracks locations in time of the inventory items in the area of real space. The realogram of the shopping store generated by the store realogram engine 190, reflecting the current status of inventory items and, in some embodiments, the status of inventory items at specified times “t” over an interval of time, can be saved in the realogram database 170.
The actual communication path to the network node 104 hosting the store inventory engine 180 and the network node 106 hosting the store realogram engine 190 through the network 181 can be point-to-point over public and/or private networks. The communications can occur over a variety of networks 181, e.g., private networks, VPN, MPLS circuit, or Internet, and can use appropriate application programming interfaces (APIs) and data interchange formats, e.g., Representational State Transfer (REST), JavaScript™ Object Notation (JSON), Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Java™ Message Service (JMS), and/or Java Platform Module System. All of the communications can be encrypted. The communication is generally over a network such as a LAN (local area network), WAN (wide area network), telephone network (Public Switched Telephone Network (PSTN)), Session Initiation Protocol (SIP), wireless network, point-to-point network, star network, token ring network, hub network, Internet, inclusive of the mobile Internet, via protocols such as EDGE, 3G, 4G LTE, Wi-Fi, and WiMAX. Additionally, a variety of authorization and authentication techniques, such as username/password, Open Authorization (OAuth), Kerberos, SecureID, digital certificates and more, can be used to secure the communications.
The technology disclosed herein can be implemented in the context of any computer-implemented system including a database system, a multi-tenant environment, or a relational database implementation like an Oracle™ compatible database implementation, an IBM DB2 Enterprise Server™ compatible relational database implementation, a MySQL™ or PostgreSQL™ compatible relational database implementation or a Microsoft SQL Server™ compatible relational database implementation or a NoSQL™ non-relational database implementation such as a Vampire™ compatible non-relational database implementation, an Apache Cassandra™ compatible non-relational database implementation, a BigTable™ compatible non-relational database implementation or an HBase™ or DynamoDB™ compatible non-relational database implementation. In addition, the technology disclosed can be implemented using different programming models like MapReduce™, bulk synchronous programming, MPI primitives, etc. or different scalable batch and stream management systems like Apache Storm™, Apache Spark™, Apache Kafka™, Apache Flink™, Truviso™, Amazon Elasticsearch Service™, Amazon Web Services™ (AWS), IBM Info-Sphere™, Borealis™, and Yahoo! S4™.
Camera Arrangement
The cameras 114 are arranged to track multi-joint subjects (or entities) in a three dimensional (abbreviated as 3D) real space. In the example embodiment of the shopping store, the real space can include the area of the shopping store where items for sale are stacked in shelves. A point in the real space can be represented by an (x, y, z) coordinate system. Each point in the area of real space for which the system is deployed is covered by the fields of view of two or more cameras 114.
In a shopping store, the shelves and other inventory display structures can be arranged in a variety of manners, such as along the walls of the shopping store, or in rows forming aisles or a combination of the two arrangements.
In the example embodiment of the shopping store, the real space can include all of the floor 220 in the shopping store. Cameras 114 are placed and oriented such that areas of the floor 220 and shelves can be seen by at least two cameras. The cameras 114 also cover floor space in front of the shelves 202 and 204. Camera angles are selected to have both steep perspective, straight down, and angled perspectives that give more full body images of the customers. In one example embodiment, the cameras 114 are configured at an eight (8) foot height or higher throughout the shopping store.
In
Three Dimensional Scene Generation
A location in the real space is represented as a (x, y, z) point of the real space coordinate system. “x” and “y” represent positions on a two-dimensional (2D) plane which can be the floor 220 of the shopping store. The value “z” is the height of the point above the 2D plane at floor 220 in one configuration. The system combines 2D images from two or more cameras to generate the three dimensional positions of joints and inventory events (puts and takes of items from shelves) in the area of real space. This section presents a description of the process to generate 3D coordinates of joints and inventory events. The process is also referred to as 3D scene generation.
Before using the system 100 in training or inference mode to track the inventory items, two types of camera calibration, internal and external, are performed. In internal calibration, the internal parameters of the cameras 114 are calibrated. Examples of internal camera parameters include focal length, principal point, skew, fisheye coefficients, etc. A variety of techniques for internal camera calibration can be used. One such technique is presented by Zhang in “A flexible new technique for camera calibration” published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, No. 11, November 2000.
In external calibration, the external camera parameters are calibrated in order to generate mapping parameters for translating the 2D image data into 3D coordinates in real space. In one embodiment, one multi-joint subject, such as a person, is introduced into the real space. The multi-joint subject moves through the real space on a path that passes through the field of view of each of the cameras 114. At any given point in the real space, the multi-joint subject is present in the fields of view of at least two cameras forming a 3D scene. The two cameras, however, have a different view of the same 3D scene in their respective two-dimensional (2D) image planes. A feature in the 3D scene such as a left-wrist of the multi-joint subject is viewed by two cameras at different positions in their respective 2D image planes.
A point correspondence is established between every pair of cameras with overlapping fields of view for a given scene. Since each camera has a different view of the same 3D scene, a point correspondence is two pixel locations (one location from each camera with overlapping field of view) that represent the projection of the same point in the 3D scene. Many point correspondences are identified for each 3D scene using the results of the image recognition engines 112a to 112n for the purposes of the external calibration. The image recognition engines identify the position of a joint as (x, y) coordinates, such as row and column numbers, of pixels in the 2D image planes of respective cameras 114. In one embodiment, a joint is one of 19 different types of joints of the multi-joint subject. As the multi-joint subject moves through the fields of view of different cameras, the tracking engine 110 receives (x, y) coordinates of each of the 19 different types of joints of the multi-joint subject used for the calibration from cameras 114 per image.
For example, consider an image from a camera A and an image from a camera B both taken at the same moment in time and with overlapping fields of view. There are pixels in an image from camera A that correspond to pixels in a synchronized image from camera B. Consider that there is a specific point of some object or surface in view of both camera A and camera B and that point is captured in a pixel of both image frames. In external camera calibration, a multitude of such points are identified and referred to as corresponding points. Since there is one multi-joint subject in the field of view of camera A and camera B during calibration, key joints of this multi-joint subject are identified, for example, the center of left wrist. If these key joints are visible in image frames from both camera A and camera B then it is assumed that these represent corresponding points. This process is repeated for many image frames to build up a large collection of corresponding points for all pairs of cameras with overlapping fields of view. In one embodiment, images are streamed off of all cameras at a rate of 30 FPS (frames per second) or more and a resolution of 720 pixels in full RGB (red, green, and blue) color. These images are in the form of one-dimensional arrays (also referred to as flat arrays).
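Collecting corresponding points from one pair of synchronized images can be sketched as below. The sketch assumes each image recognition engine output has been reduced to a mapping from joint type to (x, y) pixel position for the single calibration subject; that layout, the joint names, and the function name `point_correspondences` are hypothetical.

```python
def point_correspondences(joints_a, joints_b):
    """Pair pixel locations of the same joint type seen in both synchronized images.

    joints_a and joints_b map joint type -> (x, y) pixel position for the single
    calibration subject; joints not visible in a camera are simply absent.
    """
    shared = sorted(joints_a.keys() & joints_b.keys())
    return [(joints_a[j], joints_b[j]) for j in shared]


# Made-up detections for one synchronized frame pair from cameras A and B.
joints_a = {"left_wrist": (412, 230), "right_wrist": (455, 238), "neck": (430, 150)}
joints_b = {"left_wrist": (120, 310), "neck": (141, 222)}
pairs = point_correspondences(joints_a, joints_b)
```

Repeating this over many frame pairs, and over every pair of cameras with overlapping fields of view, accumulates the large collection of corresponding points that external calibration relies on.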
The large number of images collected above for a multi-joint subject are used to determine corresponding points between cameras with overlapping fields of view. Consider two cameras A and B with overlapping field of view. The plane passing through camera centers of cameras A and B and the joint location (also referred to as feature point) in the 3D scene is called the “epipolar plane”. The intersection of the epipolar plane with the 2D image planes of the cameras A and B defines the “epipolar line”. Given these corresponding points, a transformation is determined that can accurately map a corresponding point from camera A to an epipolar line in camera B's field of view that is guaranteed to intersect the corresponding point in the image frame of camera B. Using the image frames collected above for a multi-joint subject, the transformation is generated. It is known in the art that this transformation is non-linear. The general form is furthermore known to require compensation for the radial distortion of each camera's lens, as well as the non-linear coordinate transformation moving to and from the projected space. In external camera calibration, an approximation to the ideal non-linear transformation is determined by solving a non-linear optimization problem. This non-linear optimization function is used by the subject tracking engine 110 to identify the same joints in outputs (arrays of joint data structures) of different image recognition engines 112a to 112n, processing images of cameras 114 with overlapping fields of view. The results of the internal and external camera calibration are stored in the calibration database 170.
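The mapping from a point in camera A to an epipolar line in camera B can be illustrated in its simplest linear form, where a 3×3 fundamental matrix F gives the line as F multiplied by the homogeneous pixel coordinates. This sketch deliberately ignores the lens distortion and non-linear optimization discussed above, and the example F describes an idealized camera pair that differs only by a sideways shift; the function names are hypothetical.

```python
def epipolar_line(F, point_a):
    """Return line coefficients (a, b, c) in image B for pixel point_a in image A."""
    x, y = point_a
    xa = (x, y, 1.0)  # homogeneous pixel coordinates
    return tuple(sum(F[r][c] * xa[c] for c in range(3)) for r in range(3))


def point_line_distance(line, point_b):
    """Perpendicular pixel distance from point_b to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point_b
    return abs(a * x + b * y + c) / (a * a + b * b) ** 0.5


# Idealized fundamental matrix: camera B is camera A shifted along the x axis.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
line = epipolar_line(F, (412.0, 230.0))
residual = point_line_distance(line, (120.0, 230.0))
```

For this idealized pair the epipolar line is the image row with the same y value, so a true corresponding point lies on it with zero residual. With real cameras the residual is what a calibration procedure would try to minimize over all collected correspondences.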
A variety of techniques for determining the relative positions of the points in images of cameras 114 in the real space can be used. For example, Longuet-Higgins published, “A computer algorithm for reconstructing a scene from two projections” in Nature, Volume 293, 10 Sep. 1981. This paper presents computing a three-dimensional structure of a scene from a correlated pair of perspective projections when the spatial relationship between the two projections is unknown. The Longuet-Higgins paper presents a technique to determine the position of each camera in the real space with respect to other cameras. Additionally, the technique allows triangulation of a multi-joint subject in the real space, identifying the value of the z-coordinate (height from the floor) using images from cameras 114 with overlapping fields of view. An arbitrary point in the real space, for example, the end of a shelf unit in one corner of the real space, is designated as a (0, 0, 0) point on the (x, y, z) coordinate system of the real space.
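Triangulation from two overlapping views can be illustrated with a simple midpoint method. This is not the Longuet-Higgins algorithm; it is a swapped-in, simpler technique that assumes the camera centers and the viewing ray toward the joint are already known in real-space coordinates from calibration. All names and values are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point midway between the closest points of two viewing rays.

    c1 and c2 are camera centers, d1 and d2 are ray directions, all expressed in
    real-space (x, y, z) coordinates.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two ceiling cameras 3 m above the floor, both looking at a wrist 1 m high.
c1, c2 = (0.0, 0.0, 3.0), (2.0, 0.0, 3.0)
point = triangulate_midpoint(c1, (1.0, 0.0, -2.0), c2, (-1.0, 0.0, -2.0))
```

The third component of the result is the z-coordinate, the height above the floor plane, which neither 2D image provides on its own.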
In an embodiment of the technology, the parameters of the external calibration are stored in two data structures. The first data structure stores intrinsic parameters. The intrinsic parameters represent a projective transformation from the 3D coordinates into 2D image coordinates. The first data structure contains intrinsic parameters per camera as shown below. The data values are all numeric floating point numbers. This data structure stores a 3×3 intrinsic matrix, represented as “K” and distortion coefficients. The distortion coefficients include six radial distortion coefficients and two tangential distortion coefficients. Radial distortion occurs when light rays bend more near the edges of a lens than they do at its optical center. Tangential distortion occurs when the lens and the image plane are not parallel. The following data structure shows values for the first camera only. Similar data is stored for all the cameras 114.
The second data structure stores per pair of cameras: a 3×3 fundamental matrix (F), a 3×3 essential matrix (E), a 3×4 projection matrix (P), a 3×3 rotation matrix (R) and a 3×1 translation vector (t). This data is used to convert points in one camera's reference frame to another camera's reference frame. For each pair of cameras, eight homography coefficients are also stored to map the plane of the floor 220 from one camera to another. A fundamental matrix is a relationship between two images of the same scene that constrains where the projection of points from the scene can occur in both images. The essential matrix is also a relationship between two images of the same scene, with the condition that the cameras are calibrated. The projection matrix gives a vector space projection from 3D real space to a subspace. The rotation matrix is used to perform a rotation in Euclidean space. The translation vector “t” represents a geometric transformation that moves every point of a figure or a space by the same distance in a given direction. The homography_floor_coefficients are used to combine images of features of subjects on the floor 220 viewed by cameras with overlapping fields of view. The second data structure is shown below. Similar data is stored for all pairs of cameras. As indicated previously, the x's represent numeric floating point numbers.
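By way of example only, the use of eight homography coefficients to map a floor point from one camera image to another can be sketched as follows. The sketch assumes that the coefficients are the first eight entries, in row-major order, of a 3×3 homography whose ninth entry is normalized to 1; that ordering and the function name `apply_floor_homography` are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def apply_floor_homography(coeffs: Sequence[float],
                           point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a floor point (x, y) in one camera image to another camera image.

    Assumption: coeffs = (h11, h12, h13, h21, h22, h23, h31, h32), h33 = 1.
    """
    h11, h12, h13, h21, h22, h23, h31, h32 = coeffs
    x, y = point
    w = h31 * x + h32 * y + 1.0  # projective scale factor
    return ((h11 * x + h12 * y + h13) / w,
            (h21 * x + h22 * y + h23) / w)
```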
Two Dimensional and Three Dimensional Maps
An inventory location, such as a shelf, in a shopping store can be identified by a unique identifier (e.g., shelf_id). Similarly, a shopping store can also be identified by a unique identifier (e.g., store_id). The two dimensional (2D) and three dimensional (3D) maps database 140 identifies inventory locations in the area of real space along the respective coordinates. For example, in a 2D map, the locations in the maps define two dimensional regions on the plane formed perpendicular to the floor 220 i.e., XZ plane as shown in
In a 3D map, the locations in the map define three dimensional regions in the 3D real space defined by X, Y, and Z coordinates. The map defines a volume for inventory locations where inventory items are positioned. In
In one embodiment, the map identifies a configuration of units of volume which correlate with portions of inventory locations on the inventory display structures in the area of real space. Each portion is defined by starting and ending positions along the three axes of the real space. A similar configuration of portions of inventory locations can also be generated using a 2D map of inventory locations dividing the front plan of the display structures.
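For illustration, a portion defined by starting and ending positions along the three axes can be represented and queried as in the following sketch. The `Portion` class, its field names and the `find_portion` helper are hypothetical and are shown only to make the definition concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Portion:
    """A unit of volume on an inventory location (field names are assumed)."""
    shelf_id: str
    start: Point3D  # starting positions along x, y, z
    end: Point3D    # ending positions along x, y, z

    def contains(self, p: Point3D) -> bool:
        return all(s <= v <= e for s, v, e in zip(self.start, p, self.end))


def find_portion(portions: Sequence[Portion], p: Point3D) -> Optional[Portion]:
    """Return the first portion whose volume contains point p, if any."""
    for portion in portions:
        if portion.contains(p):
            return portion
    return None
```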
Joints Data Structure
The image recognition engines 112a-112n receive the sequences of images from cameras 114 and process images to generate corresponding arrays of joints data structures. The system includes processing logic that uses the sequences of images produced by the plurality of cameras to track locations of a plurality of subjects (or customers in the shopping store) in the area of real space. In one embodiment, the image recognition engines 112a-112n identify one of the 19 possible joints of a subject at each element of the image, usable to identify subjects in the area who may be taking and putting inventory items. The possible joints can be grouped in two categories: foot joints and non-foot joints. The 19th type of joint classification is for all non-joint features of the subject (i.e. elements of the image not classified as a joint). In other embodiments, the image recognition engine may be configured to identify the locations of hands specifically. Also, other techniques, such as a user check-in procedure or biometric identification processes, may be deployed for the purposes of identifying the subjects and linking the subjects with detected locations of their hands as they move throughout the store.
Foot Joints:
Non-Foot Joints:
Not a Joint
An array of joints data structures for a particular image classifies elements of the particular image by joint type, time of the particular image, and the coordinates of the elements in the particular image. In one embodiment, the image recognition engines 112a-112n are convolutional neural networks (CNN), the joint type is one of the 19 types of joints of the subjects, the time of the particular image is the timestamp of the image generated by the source camera 114 for the particular image, and the coordinates (x, y) identify the position of the element on a 2D image plane.
The output of the CNN is a matrix of confidence arrays for each image per camera. The matrix of confidence arrays is transformed into an array of joints data structures. A joints data structure 400 as shown in
A confidence number indicates the degree of confidence of the CNN in predicting that joint. If the value of confidence number is high, it means the CNN is confident in its prediction. An integer-Id is assigned to the joints data structure to uniquely identify it. Following the above mapping, the output matrix of confidence arrays per image is converted into an array of joints data structures for each image. In one embodiment, the joints analysis includes performing a combination of k-nearest neighbors, mixture of Gaussians, and various image morphology transformations on each input image. The result comprises arrays of joints data structures which can be stored in the form of a bit mask in a ring buffer that maps image numbers to bit masks at each moment in time.
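As a simplified, non-limiting sketch, the conversion of a matrix of confidence arrays into an array of joints data structures can be illustrated as follows. The sketch assumes one 19-element confidence array per image element, assumes that the last class (index 18, zero-based) denotes "not a joint", and uses a plain maximum in place of the k-nearest neighbors, mixture of Gaussians and morphology processing mentioned above. The `Joint` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

NOT_A_JOINT = 18  # assumption: zero-based index of the 19th, non-joint class


@dataclass
class Joint:
    joint_id: int      # integer id that uniquely identifies the structure
    joint_type: int    # index of the most confident joint class
    confidence: float  # confidence number for that class
    x: int             # column of the element in the 2D image plane
    y: int             # row of the element in the 2D image plane
    timestamp: float   # time of the source image


def to_joints(conf_matrix: Sequence[Sequence[Sequence[float]]],
              timestamp: float) -> List[Joint]:
    """Convert per-element confidence arrays into joints data structures."""
    joints: List[Joint] = []
    for y, row in enumerate(conf_matrix):
        for x, conf in enumerate(row):
            joint_type = max(range(len(conf)), key=conf.__getitem__)
            if joint_type != NOT_A_JOINT:
                joints.append(Joint(len(joints), joint_type,
                                    conf[joint_type], x, y, timestamp))
    return joints
```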
Subject Tracking Engine
The tracking engine 110 is configured to receive arrays of joints data structures generated by the image recognition engines 112a-112n corresponding to images in sequences of images from cameras having overlapping fields of view. The arrays of joints data structures per image are sent by image recognition engines 112a-112n to the tracking engine 110 via the network(s) 181. The tracking engine 110 translates the coordinates of the elements in the arrays of joints data structures corresponding to images in different sequences into candidate joints having coordinates in the real space. A location in the real space is covered by the field of views of two or more cameras. The tracking engine 110 comprises logic to identify sets of candidate joints having coordinates in real space (constellations of joints) as subjects in the real space. In one embodiment, the tracking engine 110 accumulates arrays of joints data structures from the image recognition engines for all the cameras at a given moment in time and stores this information as a dictionary in a subject database, to be used for identifying a constellation of candidate joints. The dictionary can be arranged in the form of key-value pairs, where keys are camera ids and values are arrays of joints data structures from the camera. In such an embodiment, this dictionary is used in heuristics-based analysis to determine candidate joints and for assignment of joints to subjects. In such an embodiment, a high-level input, processing and output of the tracking engine 110 is illustrated in table 1. Details of the logic applied by the subject tracking engine 110 to create subjects by combining candidate joints and track movement of subjects in the area of real space are presented in U.S. Pat. No. 10,055,853, issued 21 Aug. 2018, titled, “Subject Identification and Tracking Using Image Recognition Engine” which is incorporated herein by reference.
Subject Data Structure
The subject tracking engine 110 uses heuristics to connect joints of subjects identified by the image recognition engines 112a-112n. In doing so, the subject tracking engine 110 creates new subjects and updates the locations of existing subjects by updating their respective joint locations. The subject tracking engine 110 uses triangulation techniques to project the locations of joints from 2D space coordinates (x, y) to 3D real space coordinates (x, y, z).
In one embodiment, the system identifies joints of a subject and creates a skeleton of the subject. The skeleton is projected into the real space indicating the position and orientation of the subject in the real space. This is also referred to as “pose estimation” in the field of machine vision. In one embodiment, the system displays orientations and positions of subjects in the real space on a graphical user interface (GUI). In one embodiment, the subject identification and image analysis are anonymous, i.e., a unique identifier assigned to a subject created through joints analysis does not identify personal identification information of the subject as described above.
For this embodiment, the joints constellation of an identified subject, produced by time sequence analysis of the joints data structures, can be used to locate the hand of the subject. For example, the location of a wrist joint alone, or a location based on a projection of a combination of a wrist joint with an elbow joint, can be used to identify the location of the hand of an identified subject.
Inventory Events
The data sets comprising subjects identified by joints in subject data structures 500 and corresponding image frames from sequences of image frames per camera are given as input to a bounding box generator. The bounding box generator implements the logic to process the data sets to specify bounding boxes which include images of hands of identified subjects in images in the sequences of images. The bounding box generator identifies locations of hands in each source image frame per camera using for example, locations of wrist joints (for respective hands) and elbow joints in the multi-joints data structures 500 corresponding to the respective source image frame. In one embodiment, in which the coordinates of the joints in subject data structure indicate location of joints in 3D real space coordinates, the bounding box generator maps the joint locations from 3D real space coordinates to 2D coordinates in the image frames of respective source images.
The bounding box generator creates bounding boxes for hands in image frames in a circular buffer per camera 114. In one embodiment, the bounding box is a 128 pixels (width) by 128 pixels (height) portion of the image frame with the hand located in the center of the bounding box. In other embodiments, the size of the bounding box is 64 pixels×64 pixels or 32 pixels×32 pixels. For m subjects in an image frame from a camera, there can be a maximum of 2m hands, thus 2m bounding boxes. However, in practice fewer than 2m hands are visible in an image frame because of occlusions due to other subjects or other objects. In one example embodiment, the hand locations of subjects are inferred from locations of elbow and wrist joints. For example, the right hand location of a subject is extrapolated using the location of the right elbow (identified as p1) and the right wrist (identified as p2) as extrapolation_amount*(p2−p1)+p2 where extrapolation_amount equals 0.4. In another embodiment, the joints CNNs 112a-112n are trained using left and right hand images. Therefore, in such an embodiment, the joints CNNs 112a-112n directly identify locations of hands in image frames per camera. The hand locations per image frame are used by the bounding box generator to create a bounding box per identified hand.
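The extrapolation of a hand location from elbow and wrist joints, and the placement of a bounding box centered on the hand, can be sketched as follows. The function names are hypothetical, and the sketch does not clip the box to the image boundaries, which an actual implementation would likely need to handle.

```python
from typing import Tuple


def extrapolate_hand(elbow: Tuple[float, float],
                     wrist: Tuple[float, float],
                     extrapolation_amount: float = 0.4) -> Tuple[float, float]:
    """Compute extrapolation_amount * (p2 - p1) + p2 with p1 = elbow, p2 = wrist."""
    return tuple(extrapolation_amount * (w - e) + w
                 for e, w in zip(elbow, wrist))


def hand_bounding_box(hand: Tuple[float, float],
                      size: int = 128) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size box centered on the hand."""
    half = size // 2
    cx, cy = round(hand[0]), round(hand[1])
    return (cx - half, cy - half, cx + half, cy + half)
```

For an elbow at (100, 200) and a wrist at (150, 200), the extrapolated hand lies 0.4 of the forearm length beyond the wrist, at approximately (170, 200).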
WhatCNN is a convolutional neural network trained to process the specified bounding boxes in the images to generate a classification of hands of the identified subjects. One trained WhatCNN processes image frames from one camera. In the example embodiment of the shopping store, for each hand in each image frame, the WhatCNN identifies whether the hand is empty. The WhatCNN also identifies a SKU (stock keeping unit) number of the inventory item in the hand, a confidence value indicating the item in the hand is a non-SKU item (i.e. it does not belong to the shopping store inventory) and a context of the hand location in the image frame.
The outputs of WhatCNN models for all cameras 114 are processed by a single WhenCNN model for a pre-determined window of time. In the example of a shopping store, the WhenCNN performs time series analysis for both hands of subjects to identify whether a subject took a store inventory item from a shelf or put a store inventory item on a shelf. The technology disclosed uses the sequences of images produced by at least two cameras in the plurality of cameras to find a location of an inventory event. The WhenCNN executes analysis of data sets from sequences of images from at least two cameras to determine locations of inventory events in three dimensions and to identify the item associated with the inventory event. A time series analysis of the output of WhenCNN per subject over a period of time is performed to identify inventory events and their time of occurrence. A non-maximum suppression (NMS) algorithm is used for this purpose. As one inventory event (i.e. put or take of an item by a subject) is detected by WhenCNN multiple times (both from the same camera and from multiple cameras), the NMS removes superfluous events for a subject. NMS is a rescoring technique comprising two main tasks: “matching loss” that penalizes superfluous detections and “joint processing” of neighbors to know if there is a better detection close-by.
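To illustrate how superfluous detections of one inventory event can be removed, the following sketch applies a plain greedy non-maximum suppression over time. This is a simpler, substituted technique and is not the rescoring form of NMS described above; the tuple layout of a detection, the `window` parameter and the function name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[int, float]  # (frame_id, score) for one subject, assumed layout


def temporal_nms(detections: Sequence[Detection], window: int) -> List[Detection]:
    """Keep the highest scoring detection in each temporal neighborhood.

    A detection is suppressed when a higher scoring detection that was
    already kept lies within `window` frames of it.
    """
    kept: List[Detection] = []
    for frame_id, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(abs(frame_id - k[0]) > window for k in kept):
            kept.append((frame_id, score))
    return sorted(kept)
```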
The true events of takes and puts for each subject are further processed by calculating an average of the SKU logits for 30 image frames prior to the image frame with the true event. Finally, the arguments of the maxima (abbreviated arg max or argmax) are used to determine the largest value. The inventory item classified by the argmax value is used to identify the inventory item put on the shelf or taken from the shelf. The technology disclosed attributes the inventory event to a subject by assigning the inventory item associated with the inventory event to a log data structure (or shopping cart data structure) of the subject. The inventory item is added to a log of SKUs (also referred to as shopping cart or basket) of respective subjects. The image frame identifier “frame_id,” of the image frame which resulted in the inventory event detection is also stored with the identified SKU. The logic to attribute the inventory event to the customer matches the location of the inventory event to a location of one of the customers in the plurality of customers. For example, the image frame can be used to identify the 3D position of the inventory event, represented by the position of the subject's hand in at least one point of time during the sequence that is classified as an inventory event using the subject data structure 500, which can then be used to determine the inventory location from which the item was taken or on which it was put. The technology disclosed uses the sequences of images produced by at least two cameras in the plurality of cameras to find a location of an inventory event and creates an inventory event data structure. In one embodiment, the inventory event data structure stores an item identifier, a put or take indicator, coordinates in three dimensions of the area of real space and a time stamp. In one embodiment, the inventory events are stored in the inventory events database 150.
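The averaging of SKU logits over the frames preceding an event, followed by the argmax, can be sketched as follows. The sketch assumes the logits are available as one list per image frame, indexed by SKU class; the function name `classify_item` and the handling of events that occur fewer than 30 frames into the sequence are assumptions.

```python
from typing import Sequence


def classify_item(sku_logits_per_frame: Sequence[Sequence[float]],
                  event_index: int,
                  window: int = 30) -> int:
    """Return the SKU class index with the largest average logit.

    The average is taken over up to `window` frames immediately before
    the frame at `event_index` (the frame with the true event).
    """
    start = max(0, event_index - window)
    frames = sku_logits_per_frame[start:event_index]
    if not frames:
        raise ValueError("no frames precede the event")
    n_classes = len(frames[0])
    averages = [sum(f[i] for f in frames) / len(frames) for i in range(n_classes)]
    return max(range(n_classes), key=averages.__getitem__)
```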
The locations of inventory events (puts and takes of inventory items by subjects in an area of space) can be compared with a planogram or other map of the store to identify an inventory location, such as a shelf, from which the subject has taken the item or placed the item on. An illustration 660 shows the determination of a shelf in a shelf unit by calculating a shortest distance from the position of the hand 640 associated with the inventory event. This determination of shelf is then used to update the inventory data structure of the shelf. An example inventory data structure 700 (also referred to as a log data structure) shown in
When the shelf inventory data structure is consolidated with the subject's log data structure, the shelf inventory is reduced to reflect the quantity of item taken by the customer from the shelf. If the item was put on the shelf by a customer or an employee stocking items on the shelf, the items get added to the respective inventory locations' inventory data structures. Over a period of time, this processing results in updates to the shelf inventory data structures for all inventory locations in the shopping store. Inventory data structures of inventory locations in the area of real space are consolidated to update the inventory data structure of the area of real space indicating the total number of items of each SKU in the store at that moment in time. In one embodiment, such updates are performed after each inventory event. In another embodiment, the store inventory data structures are updated periodically.
Details of the implementations of WhatCNN and WhenCNN to detect inventory events are presented in U.S. patent application Ser. No. 15/907,112, filed 27 Feb. 2018, titled, “Item Put and Take Detection Using Image Recognition” which is incorporated herein by reference as if fully set forth herein.
Realtime Shelf and Store Inventory Update
The system can use the location of the hand of the subject (step 806) associated with the inventory event to locate a nearest shelf in an inventory display structure (also referred to as a shelf unit above) at step 808. The store inventory engine 180 calculates distance of the hand to two dimensional (2D) regions or areas on xz planes (perpendicular to the floor 220) of inventory locations in the shopping store. The 2D regions of the inventory locations are stored in the map database 140 of the shopping store. Consider the hand is represented by a point E (xevent, yevent, zevent) in the real space. The shortest distance D from a point E in the real space to any point P on the plane can be determined by projecting the vector PE on a normal vector n to the plane. Existing mathematical techniques can be used to calculate the distance of the hand to all planes representing 2D regions of inventory locations.
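The projection of the vector PE on the normal vector n can be written out as in the following sketch. The function name `distance_to_plane` is hypothetical, and the sketch measures the distance to the unbounded plane; restricting the result to the bounded 2D region of a particular inventory location is not shown.

```python
import math
from typing import Sequence


def distance_to_plane(e: Sequence[float],
                      p: Sequence[float],
                      n: Sequence[float]) -> float:
    """Shortest distance from point E to the plane through P with normal n.

    Computes |PE . n| / |n|, so n does not need to be a unit vector.
    """
    pe = [ei - pi for ei, pi in zip(e, p)]
    dot = sum(a * b for a, b in zip(pe, n))
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("normal vector must be non-zero")
    return abs(dot) / norm
```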
In one embodiment, the technology disclosed matches location of the inventory event with an inventory location by executing a procedure including calculating a distance from the location of the inventory event to inventory locations on inventory display structures and matching the inventory event with an inventory location based on the calculated distance. For example, the inventory location (such as a shelf) with the shortest distance from the location of the inventory event is selected and this shelf's inventory data structure is updated at step 810. In one embodiment, the location of the inventory events is determined by position of the hand of the subject along three coordinates of the real space. If the inventory event is a take event (or a minus event) indicating a bottle of ketchup is taken by the subject, the shelf's inventory is updated by decreasing the number of ketchup bottles by one. Similarly, if the inventory event is a put event indicating a subject put a bottle of ketchup on the shelf, the shelf's inventory is updated by increasing the number of ketchup bottles by one. Similarly, the store's inventory data structure is also updated accordingly. The quantities of items put on the inventory locations are incremented by the same number in the store inventory data structure. Likewise, the quantities of items taken from the inventory locations are subtracted from the store's inventory data structure in the inventory database 160.
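As a minimal sketch of the matching and update described above, the following example selects the inventory location with the shortest calculated distance and applies a put or take event to the shelf and store inventory data structures. The use of `collections.Counter` keyed by SKU, the string values "put" and "take", and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def nearest_location(distances: Dict[str, float]) -> str:
    """Return the identifier of the inventory location with the shortest distance."""
    return min(distances, key=distances.get)


def apply_inventory_event(shelf_inventory: Counter,
                          store_inventory: Counter,
                          sku: str,
                          event_type: str,
                          quantity: int = 1) -> None:
    """Update shelf and store counts for a put (plus) or take (minus) event."""
    if event_type not in ("put", "take"):
        raise ValueError(f"unknown event type: {event_type}")
    delta = quantity if event_type == "put" else -quantity
    shelf_inventory[sku] += delta
    store_inventory[sku] += delta
```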
At step 812, it is checked if a planogram is available for the shopping store, or alternatively the planogram can be known to be available. A planogram is a data structure that maps inventory items to inventory locations in the shopping store, which can be based on a plan for distribution of inventory items in the store. If the planogram for the shopping store is available, the item put on the shelf by the subject is compared with the items on the shelf in the planogram at step 814. In one embodiment, the technology disclosed includes logic to determine misplaced items if the inventory event is matched with an inventory location that does not match the planogram. For example, if the SKU of the item associated with the inventory event matches distribution of inventory items in the inventory locations, the location of the item is correct (step 816), otherwise the item is misplaced. In one embodiment, a notification is sent to an employee in step 818 to take the misplaced item from the current inventory location (such as a shelf) and move it to its correct inventory location according to the planogram. The system checks if the subject is exiting the shopping store at step 820 by using the speed, orientation and proximity to the store exit. If the subject is not exiting the store (step 820), the process continues at step 804. Otherwise, if it is determined that the subject is exiting the store, the log data structure (or the shopping cart data structure) of the subject, and the store's inventory data structures are consolidated at step 822.
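The comparison against the planogram at step 814 can be sketched as a simple lookup. The sketch assumes the planogram is held as a mapping from an inventory location identifier to the set of SKUs planned for that location; this representation and the function name `is_misplaced` are hypothetical.

```python
from typing import Dict, Set


def is_misplaced(planogram: Dict[str, Set[str]], shelf_id: str, sku: str) -> bool:
    """Return True when the planogram does not list the SKU for the given shelf."""
    return sku not in planogram.get(shelf_id, set())
```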
In one embodiment, the consolidation includes subtracting the items in the subject's shopping cart data structure from the store inventory data structure if these items were not already subtracted from the store inventory at step 810. At this step, the system can also identify items in the shopping cart data structure of a subject that have low identification confidence scores and send a notification to a store employee positioned near the store exit. The employee can then confirm the items with low identification confidence scores in the shopping cart of the customer. The process does not require the store employee to compare all items in the shopping cart of the customer with the customer's shopping cart data structure. Only the items that have low confidence scores are identified by the system to the store employee, who then confirms them. The process ends at step 824.
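One possible form of the consolidation at step 822 is sketched below. The layout of a cart entry, the confidence threshold of 0.5 and the `already_subtracted` flag are assumptions chosen for illustration; the description above does not specify a threshold value.

```python
from collections import Counter
from typing import Dict, List, Sequence


def consolidate_on_exit(cart: Sequence[Dict],
                        store_inventory: Counter,
                        threshold: float = 0.5,
                        already_subtracted: bool = False) -> List[str]:
    """Subtract cart items from the store inventory and list SKUs to confirm.

    Each cart entry is assumed to hold "sku", "quantity" and "confidence".
    Items are subtracted only if that was not already done at step 810.
    """
    to_confirm: List[str] = []
    for entry in cart:
        if not already_subtracted:
            store_inventory[entry["sku"]] -= entry["quantity"]
        if entry["confidence"] < threshold:
            to_confirm.append(entry["sku"])
    return to_confirm
```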
Architecture for Realtime Shelf and Store Inventory Update
An example architecture of a system in which customer inventory, inventory location (e.g. shelf) inventory and the store inventory (e.g. store wide) data structures are updated using the puts and takes of items by customers in the shopping store is presented in
A “subject identification” subsystem 904 (also referred to as first image processors) processes image frames received from cameras 114 to identify and track subjects in the real space. The first image processors include subject image recognition engines to detect joints of subjects in the area of real space. The joints are combined to form subjects which are then tracked as they move in the area of real space. The subjects are anonymous and are tracked using an internal identifier “subject_id”.
A “region proposals” subsystem 908 (also referred to as third image processors) includes foreground image recognition engines, receives corresponding sequences of images from the plurality of cameras 114, and recognizes semantically significant objects in the foreground (i.e. customers, their hands and inventory items) as they relate to puts and takes of inventory items, for example, over time in the images from each camera. The region proposals subsystem 908 also receives output of the subject identification subsystem 904. The third image processors process sequences of images from cameras 114 to identify and classify foreground changes represented in the images in the corresponding sequences of images. The third image processors process identified foreground changes to make a first set of detections of takes of inventory items by identified subjects and of puts of inventory items on inventory display structures by identified subjects. In one embodiment, the third image processors comprise convolutional neural network (CNN) models such as WhatCNNs described above. The first set of detections are also referred to as foreground detections of puts and takes of inventory items. In this embodiment, the outputs of WhatCNNs are processed by a second convolutional neural network (WhenCNN) to make the first set of detections which identify put events of inventory items on inventory locations and take events of inventory items from inventory locations in inventory display structures by customers and employees of the store. The details of a region proposal subsystem are presented in U.S. patent application Ser. No. 15/907,112, filed 27 Feb. 2018, titled, “Item Put and Take Detection Using Image Recognition” which is incorporated herein by reference as if fully set forth herein.
In another embodiment, the architecture includes a “semantic diffing” subsystem (also referred to as second image processors) that can be used in parallel to the third image processors to detect puts and takes of inventory items and to associate these puts and takes to subjects in the shopping store. This semantic diffing subsystem includes background image recognition engines, which receive corresponding sequences of images from the plurality of cameras and recognize semantically significant differences in the background (i.e. inventory display structures like shelves) as they relate to puts and takes of inventory items, for example, over time in the images from each camera. The second image processors receive output of the subject identification subsystem 904 and image frames from cameras 114 as input. Details of “semantic diffing” subsystem are presented in U.S. Pat. No. 10,127,438, filed 4 Apr. 2018, titled, “Predicting Inventory Events using Semantic Diffing,” and U.S. patent application Ser. No. 15/945,473, filed 4 Apr. 2018, titled, “Predicting Inventory Events using Foreground/Background Processing,” both of which are incorporated herein by reference as if fully set forth herein. The second image processors process identified background changes to make a second set of detections of takes of inventory items by identified subjects and of puts of inventory items on inventory display structures by identified subjects. The second set of detections are also referred to as background detections of puts and takes of inventory items. In the example of a shopping store, the second detections identify inventory items taken from the inventory locations or put on the inventory locations by customers or employees of the store. The semantic diffing subsystem includes the logic to associate identified background changes with identified subjects.
In such an embodiment, the system described in
A subject exit detection engine 910 determines if a customer is moving towards the exit door and sends a signal to the store inventory engine 190. The store inventory engine determines if one or more items in the log data structure 700 of the customer has a low identification confidence score as determined by the second or third image processors. If so, the inventory consolidation engine sends a notification to a store employee positioned close to the exit to confirm the item purchased by the customer. The inventory data structures of the subjects, inventory locations and the shopping stores are stored in the inventory database 160.
The technology disclosed uses the sequences of images produced by the plurality of cameras to detect departure of the customer from the area of real space. In response to the detection of the departure of the customer, the technology disclosed updates the store inventory in the memory for items associated with inventory events attributed to the customer. When the exit detection engine 910 detects departure of customer “C” from the shopping store, the items purchased by the customer “C” as shown in the log data structure 922 are consolidated with the inventory data structure of the store 924 to generate an updated store inventory data structure 926. For example, as shown in
In one embodiment, the departure detection of the customer also triggers updating of the inventory data structures of the inventory locations (such as shelves in the shopping store) from where the customer has taken items. In such an embodiment, the inventory data structures of the inventory locations are not updated immediately after the take or put inventory event as described above. In this embodiment, when the system detects the departure of the customer, the inventory events associated with the customer are traversed, linking the inventory events with respective inventory locations in the shopping store. The inventory data structures of the inventory locations determined by this process are updated. For example, if the customer has taken two quantities of item 1 from inventory location 27, then the inventory data structure of inventory location 27 is updated by reducing the quantity of item 1 by two. Note that an inventory item can be stocked on multiple inventory locations in the shopping store. The system identifies the inventory location corresponding to the inventory event and therefore, the inventory location from where the item is taken is updated.
Store Realograms
The locations of inventory items throughout the real space in a store, including at inventory locations in the shopping store, change over a period of time as customers take items from the inventory locations and put the items that they do not want to buy back on the same location on the same shelf from which the item is taken, a different location on the same shelf from which the item is taken, or on a different shelf. The technology disclosed uses the sequences of images produced by at least two cameras in the plurality of cameras to identify inventory events, and in response to the inventory events, tracks locations of inventory items in the area of real space. The items in a shopping store are arranged in some embodiments according to a planogram which identifies the inventory locations (such as shelves) on which a particular item is planned to be placed. For example, as shown in an illustration 910 in
The technology disclosed can calculate a “realogram” of the shopping store at any time “t” which is the real time map of locations of inventory items in the area of real space, which can be correlated in addition in some embodiments with inventory locations in the store. A realogram can be used to create a planogram by identifying inventory items and a position in the store, and mapping them to inventory locations. In an embodiment, the system or method can create a data set defining a plurality of cells having coordinates in the area of real space. The system or method can divide the real space into a data set defining a plurality of cells using the length of the cells along the coordinates of the real space as an input parameter. In one embodiment, the cells are represented as two dimensional grids having coordinates in the area of real space. For example, the cells can correlate with 2D grids (e.g. at 1 foot spacing) of front plan of inventory locations in shelf units (also referred to as inventory display structures) as shown in the illustration 960 in
In another embodiment, the cells are represented as three dimensional (3D) grids having coordinates in the area of real space. In one example, the cells can correlate with volume on inventory locations (or portions of inventory locations) in shelf units in the shopping store as shown in
The illustration in
The system calculates SKU scores (also referred to as scores) at a scoring time, for inventory items having locations matching particular cells, using respective counts of inventory events. Calculation of scores for cells uses sums of puts and takes of inventory items weighted by a separation between timestamps of the puts and takes and the scoring time. In one embodiment, the scores are weighted averages of the inventory events per SKU. In other embodiments, different scoring calculations can be used such as a count of inventory events per SKU. In one embodiment, the system displays the realogram as an image representing cells in the plurality of cells and the scores for the cells. For example illustration in
In one embodiment, the system renders a display image representing cells in the plurality of cells and the scores for the cells.
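The cell data set described above can be sketched as follows. This is a minimal illustration only, assuming the area of real space is an axis-aligned box starting at the origin; the names `Cell` and `build_cells` are hypothetical and do not come from the disclosure. The cell length along the coordinates is the input parameter, and the same function yields a 2D or a 3D grid depending on the number of entries in `extent`.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cell:
    cell_id: int
    start: tuple  # starting position along each coordinate axis
    end: tuple    # ending position along each coordinate axis


def build_cells(extent, cell_length):
    """Partition the box from the origin to `extent` into cells.

    `extent` has two entries for a 2D grid or three for a 3D grid.
    ASSUMPTION: the space is an axis-aligned box anchored at the origin.
    """
    if cell_length <= 0:
        raise ValueError("cell_length must be positive")
    counts = [max(1, math.ceil(e / cell_length)) for e in extent]
    cells = []
    for cell_id, index in enumerate(product(*(range(n) for n in counts))):
        start = tuple(i * cell_length for i in index)
        # Clip the last cell on each axis to the boundary of the space.
        end = tuple(min((i + 1) * cell_length, e) for i, e in zip(index, extent))
        cells.append(Cell(cell_id, start, end))
    return cells
```

For example, `build_cells((3.0, 2.0), 1.0)` would produce six cells at 1 foot spacing over a 3 foot by 2 foot front plan, and a three-entry extent would produce 1 foot cubes.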
Calculation of Store Realogram
The system uses the location of the hand of the subject (step 1206) associated with the inventory event to determine a location. In some embodiments, the inventory event can be matched with a nearest shelf, or otherwise likely inventory location, in a shelf unit or an inventory display structure in step 1208. The process step 808 in the flowchart in
The technology disclosed includes a data set stored in memory defining a plurality of cells having coordinates in the area of real space. The cells define regions in the area of real space bounded by starting and ending positions along the coordinate axes. The area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells can be correlated with inventory locations in the plurality of inventory locations. The technology disclosed matches locations of inventory items, associated with inventory events, with coordinates of cells and maintains data representing inventory items matched with cells in the plurality of cells. In one embodiment, the system determines the nearest cell in the data set based on the location of the inventory events by executing a procedure (such as described in step 808 in the flowchart of
The SKU score calculated by equation (1) is the sum of scores for all point cloud data points of the SKU in the cell such that each data point is weighted down by the time point_t in days since the timestamp of the put and take event. Consider there are two point cloud data points for “ketchup” item in a grid. The first data point has a timestamp which indicates that this inventory event occurred two days before the time “t” at which the realogram is calculated, therefore the value of point_t is “2”. The second data point corresponds to an inventory event that occurred one day before the time “t”, therefore, point_t is “1”. The score of ketchup for the cell (identified by a cell_id which maps to a shelf identified by a shelf_id) is calculated as:
As the point cloud data points corresponding to inventory events become older (i.e. more days have passed since the event) their contribution to the SKU score decreases. At step 1216, a top “N” SKUs are selected for a cell with the highest SKU scores. In one embodiment, the system includes logic to select a set of inventory items for each cell based on the scores. For example, the value of “N” can be selected as 10 (ten) to select top ten inventory items per cell based on their SKU scores. In this embodiment, the realogram stores top ten items per cell. The updated realogram at time t is then stored in step 1218 in the realogram database 170 which indicates top “N” SKUs per cell in a shelf at time t. The process ends at step 1220.
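The time-weighted scoring and top “N” selection described above can be sketched as follows. Equation (1) is not reproduced in this text, so the weighting function here is an assumption: each data point is taken to contribute 1/(1 + point_t), chosen only because it decreases as point_t grows, as the description requires. The function names and the tuple layout of the data points are likewise hypothetical.

```python
from collections import defaultdict


def decay_weight(point_t_days):
    # ASSUMPTION: this weighting is illustrative and is not taken from
    # equation (1); any function that decreases with age would fit the text.
    return 1.0 / (1.0 + point_t_days)


def sku_scores(points, scoring_time_days):
    """Sum time-weighted contributions per (cell_id, sku).

    `points` is an iterable of (cell_id, sku, event_time_days) tuples,
    one per put or take event.
    """
    scores = defaultdict(float)
    for cell_id, sku, event_time_days in points:
        point_t = scoring_time_days - event_time_days
        if point_t < 0:
            continue  # ignore events after the scoring time
        scores[(cell_id, sku)] += decay_weight(point_t)
    return dict(scores)


def top_n_per_cell(scores, n=10):
    """Keep the n highest-scoring SKUs for each cell."""
    per_cell = defaultdict(list)
    for (cell_id, sku), score in scores.items():
        per_cell[cell_id].append((sku, score))
    return {
        cell_id: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for cell_id, items in per_cell.items()
    }
```

Under this assumed weighting, the two ketchup data points in the example (two days old and one day old) would contribute 1/3 and 1/2 respectively; a different equation (1) would give different values but the same structure.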
In another embodiment, the technology disclosed does not use 2D or 3D maps of portions of shelves stored in the maps database 140 to calculate point cloud data in portions of shelves corresponding to inventory events. In this embodiment, the 3D real space representing a shopping store is partitioned into cells represented as 3D cubes (e.g., 1 foot cubes). The 3D hand positions are mapped to the cells (using their respective positions along the three axes). The SKU scores for all items are calculated per cell using equation 1 as explained above. The resulting realogram shows items in cells in the real space representing the store without requiring the positions of shelves in the store. In this embodiment, a point cloud data point may be at the same position on the coordinates in the real space as the hand position corresponding to the inventory event, or at the location of a cell in the area close to or encompassing the hand position. This is because there may be no map of shelves; therefore, the hand positions are not mapped to the nearest shelf. Because of this, the point cloud data points in this embodiment are not necessarily co-planar. All point cloud data points within the unit of volume (e.g., 1 foot cube) in the real space are included in calculation of SKU scores.
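Mapping 3D hand positions to cube cells without a shelf map can be sketched as follows. This is an illustration under assumptions: positions are expressed in feet from an origin at one corner of the store, and a cell is identified by its integer index along each of the three axes. The names `cell_index` and `bucket_by_cell` are hypothetical.

```python
import math
from collections import defaultdict


def cell_index(position, cell_length=1.0):
    """Return the integer index of the cube containing `position`."""
    return tuple(math.floor(c / cell_length) for c in position)


def bucket_by_cell(hand_positions, cell_length=1.0):
    """Group hand positions by the cube that encompasses them."""
    buckets = defaultdict(list)
    for position in hand_positions:
        buckets[cell_index(position, cell_length)].append(position)
    return dict(buckets)
```

Because the positions are not snapped to a shelf plane, points that fall in the same cube can lie anywhere inside it, which matches the observation that the data points in this embodiment are not necessarily co-planar.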
In some embodiments, the realogram can be computed iteratively, and used for time of day analysis of activity in the store, or used to produce animation (like stop motion animation) for display of the movement of inventory items in the store over time.
Applications of Store Realogram
A store realogram can be used in many operations of the shopping store. A few applications of the realogram are presented in the following paragraphs.
Re-Stocking of Inventory Items
If the SKU score of inventory item “i” is less than the threshold, an alert notification is sent to store manager or other designated employees indicating inventory item “i” needs to be re-stocked (step 1310). The system can also identify the inventory locations at which the inventory item needs to be re-stocked by matching the cells with SKU score below threshold to inventory locations. In other embodiments, the system can check the stock level of inventory item “i” in stock room of the shopping store to determine if inventory item “i” needs to be ordered from a distributor. The process ends at step 1312.
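The threshold check and the matching of low-scoring cells to inventory locations can be sketched as follows. This is a hedged illustration: the function name, the dictionary shapes, and the example threshold are assumptions, and how the alert notification is delivered is left out.

```python
def restock_alerts(scores, cell_to_location, threshold):
    """Return {sku: set of inventory locations needing re-stock}.

    `scores` maps (cell_id, sku) to a SKU score; `cell_to_location`
    maps a cell_id to an inventory location identifier.
    ASSUMPTION: both mappings are plain dictionaries.
    """
    alerts = {}
    for (cell_id, sku), score in scores.items():
        if score < threshold:
            location = cell_to_location.get(cell_id, "unknown")
            alerts.setdefault(sku, set()).add(location)
    return alerts
```

A caller could iterate over the returned mapping and send one notification per inventory item, listing the locations at which that item scored below the threshold.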
Misplaced Inventory Items
In embodiments including planograms, or if a planogram of the store is otherwise available, then the realogram is compared with the planogram for planogram compliance by identifying misplaced items. In such an embodiment, the system includes a planogram specifying a distribution of inventory items in inventory locations in the area of real space. The system includes logic to maintain data representing inventory items matched with cells in the plurality of cells. The system determines misplaced items by comparing the data representing inventory matched with cells to the distribution of inventory items in the inventory locations specified in the planogram.
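The comparison of the realogram with the planogram can be sketched as follows. This is a minimal illustration, assuming the realogram is available as a mapping from cell to the inventory items matched with that cell and the planogram as a mapping from inventory location to the items planned for it; these shapes and the name `find_misplaced` are hypothetical.

```python
def find_misplaced(realogram, planogram, cell_to_location):
    """List (sku, cell_id, location) for items found where not planned.

    `realogram` maps cell_id -> iterable of SKUs matched with the cell.
    `planogram` maps inventory location -> set of SKUs planned there.
    """
    misplaced = []
    for cell_id, skus in realogram.items():
        location = cell_to_location.get(cell_id)
        planned = planogram.get(location, set())
        for sku in skus:
            if sku not in planned:
                misplaced.append((sku, cell_id, location))
    return misplaced
```

Each returned tuple identifies an item together with the cell and inventory location at which it was observed, which is the information needed to direct an employee to the misplaced item.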
In one embodiment, the store app displays location of items on a store map and guides the store employee to the misplaced item. Following this, the store app displays the correct location of the item on the store map and can guide the employee to the correct shelf portion to put the item in its designated location. In another embodiment, the store app can also guide a customer to an inventory item based on a shopping list entered in the store app. The store app can use real time locations of the inventory items using the realogram and guide the customer to a nearest inventory location with the inventory item on a map. In this example, the nearest location of an inventory item can be of a misplaced item which is not positioned on the inventory location according to the store planogram.
Improving Inventory Item Prediction Accuracy
Another application of realogram is in improving prediction of inventory items by the image recognition engine. The flowchart in
The realogram for inventory item “i” at scoring time “t” is retrieved at step 1510. In one example, this can be a most recent realogram while in another example, a realogram at a scoring time “t” matching or closer in time to the time of the inventory event can be retrieved from the realogram database 170. The SKU score of the inventory item “i” at the location of the inventory event is compared with a threshold at a step 1512. If the SKU score is above the threshold (step 1514), the prediction of inventory item “i” by the image recognition engine is accepted (step 1516). The log data structure of the customer associated with the inventory event is updated accordingly. If the inventory event is a “take” event, the inventory item “i” is added to the log data structure of the customer. If the inventory event is a “put” event, the inventory item “i” is removed from the log data structure of the customer. If the SKU score is below the threshold (step 1514), the prediction of the image recognition engine is rejected (step 1518). If the inventory event is a “take” event, this will result in the inventory item “i” not being added to the log data structure of the customer. Similarly, if the inventory event is a “put” event, the inventory item “i” is not removed from the log data structure of the customer. The process ends at step 1520. In another embodiment, the SKU score of the inventory item “i” can be used to adjust an input parameter to the image recognition engine for determining item prediction confidence score. A WhatCNN, which is a convolutional neural network (CNN), is an example of an image recognition engine to predict inventory items.
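The accept or reject decision and the resulting update of the customer's log data structure can be sketched as follows. This is an illustration under assumptions: the log is modeled as a `collections.Counter` of item counts, the function name is hypothetical, and a score exactly equal to the threshold is treated as a rejection because the description only covers the above and below cases.

```python
from collections import Counter


def apply_event(log, sku, event_type, sku_score, threshold):
    """Update `log` if the predicted SKU is plausible at this location.

    Returns True when the prediction is accepted, False when rejected.
    ASSUMPTION: a score equal to the threshold is rejected.
    """
    if sku_score <= threshold:
        return False  # prediction rejected; log left unchanged
    if event_type == "take":
        log[sku] += 1
    elif event_type == "put":
        if log[sku] > 0:
            log[sku] -= 1
            if log[sku] == 0:
                del log[sku]
    else:
        raise ValueError("event_type must be 'take' or 'put'")
    return True
```

With an empty `Counter` as the log, an accepted take adds the item, a rejected take leaves the log untouched, and an accepted put of the same item removes it again.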
Network Configuration
Storage subsystem 1630 stores the basic programming and data constructs that provide the functionality of certain embodiments of the present invention. For example, the various modules implementing the functionality of the store realogram engine 190 may be stored in storage subsystem 1630. The storage subsystem 1630 is an example of a computer readable memory comprising a non-transitory data storage medium, having computer instructions stored in the memory executable by a computer to perform all or any combination of the data processing and image processing functions described herein, including logic to calculate realograms for the area of real space by processes as described herein. In other examples, the computer instructions can be stored in other types of memory, including portable memory, that comprise a non-transitory data storage medium or media, readable by a computer.
These software modules are generally executed by a processor subsystem 1650. A host memory subsystem 1632 typically includes a number of memories including a main random access memory (RAM) 1634 for storage of instructions and data during program execution and a read-only memory (ROM) 1636 in which fixed instructions are stored. In one embodiment, the RAM 1634 is used as a buffer for storing point cloud data structure tuples generated by the store realogram engine 190.
A file storage subsystem 1640 provides persistent storage for program and data files. In an example embodiment, the file storage subsystem 1640 includes four 120 Gigabyte (GB) solid state disks (SSD) in a RAID 0 (redundant array of independent disks) arrangement identified by a numeral 1642. In the example embodiment, maps data in the maps database 140, inventory events data in the inventory events database 150, inventory data in the inventory database 160, and realogram data in the realogram database 170 that are not in RAM are stored in the RAID 0 storage. In the example embodiment, the hard disk drive (HDD) 1646 is slower in access speed than the RAID 0 1642 storage. The solid state disk (SSD) 1644 contains the operating system and related files for the store realogram engine 190.
In an example configuration, four cameras 1612, 1614, 1616, 1618, are connected to the processing platform (network node) 103. Each camera has a dedicated graphics processing unit, GPU 1 1662, GPU 2 1664, GPU 3 1666, and GPU 4 1668, to process images sent by the camera. It is understood that fewer than or more than four cameras can be connected per processing platform. Accordingly, fewer or more GPUs are configured in the network node so that each camera has a dedicated GPU for processing the image frames received from the camera. The processor subsystem 1650, the storage subsystem 1630 and the GPUs 1662, 1664, 1666, and 1668 communicate using the bus subsystem 1654.
A network interface subsystem 1670 is connected to the bus subsystem 1654 forming part of the processing platform (network node) 104. Network interface subsystem 1670 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems. The network interface subsystem 1670 allows the processing platform to communicate over the network either by using cables (or wires) or wirelessly. A number of peripheral devices such as user interface output devices and user interface input devices are also connected to the bus subsystem 1654 forming part of the processing platform (network node) 104. These subsystems and devices are intentionally not shown in
In one embodiment, the cameras 114 can be implemented using Chameleon3 1.3 MP Color USB3 Vision (Sony ICX445), having a resolution of 1288×964, a frame rate of 30 FPS, and 1.3 MegaPixels per image, with a Varifocal Lens having a working distance (mm) of 300-∞ and a field of view with a ⅓″ sensor of 98.2°-23.8°.
The technology described herein also includes a system for tracking inventory items in an area of real space including inventory display structures, along with a corresponding method and computer program product. The system comprises a plurality of cameras or other sensors disposed above the inventory display structures, sensors in the plurality of sensors producing respective sequences of images of inventory display structures in corresponding fields of view in the real space, the field of view of each sensor overlapping with the field of view of at least one other sensor in the plurality of sensors; memory storing a data set, the data set defining a plurality of cells having coordinates in the area of real space; and a processing system coupled to the plurality of sensors, the processing system including logic that processes the sequences of images produced by the plurality of sensors to find locations of inventory events in three dimensions in the area of real space and in response to the inventory events determines a nearest cell in the data set based on the locations of the inventory events, the processing system including logic that calculates scores at a scoring time for inventory items associated with the inventory events having locations matching particular cells using respective counts of inventory events. The system can include logic to select a set of inventory items for each cell based on the scores. The inventory events can include an item identifier, a put or a take indicator, locations represented by positions along three axes of the area of real space, and a timestamp. The system can include the data set defining a plurality of cells represented as two dimensional grids having coordinates in the area of real space, where the cells correlate with portions of front plan of inventory locations, the processing system including logic that determines the nearest cell based on the location of the inventory event.
The system can include the data set defining a plurality of cells represented as three dimensional grids having coordinates in the area of real space, where the cells correlate with portions of volume on inventory locations, the processing system including logic that determines the nearest cell based on the location of the inventory event. The put indicator can identify that the item is placed on an inventory location and the take indicator can identify that the item is taken off from the inventory location. The logic that processes the sequences of images produced by the plurality of sensors can comprise image recognition engines which generate data sets representing elements in the images corresponding to hands, and execute analysis of the data sets from sequences of images from at least two sensors to determine locations of inventory events in three dimensions. The image recognition engines can comprise convolutional neural networks. The logic that calculates scores for cells can use sums of puts and takes of inventory items weighted by a separation between timestamps of the puts and takes and the scoring time, and can store the scores in the memory. The logic to determine the nearest cell in the data set based on the location of the inventory events can execute a procedure including calculating a distance from the location of the inventory event to cells in the data set and matching the inventory event with a cell based on the calculated distance.
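The inventory event fields and the distance-based nearest-cell procedure summarized above can be sketched as follows. This is an illustration only: the class and function names are hypothetical, and measuring Euclidean distance to a cell's center point is an assumption, since the description states only that a distance from the event location to cells is calculated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    item_id: str       # item identifier (e.g. a SKU)
    indicator: str     # "put" or "take"
    location: tuple    # positions along the three axes of real space
    timestamp: float   # time of the event


def nearest_cell(event, cell_centers):
    """Match an event with the cell whose center is closest.

    `cell_centers` maps cell_id -> (x, y, z).
    ASSUMPTION: distance is Euclidean and measured to cell centers.
    """
    if not cell_centers:
        raise ValueError("cell_centers must not be empty")
    return min(
        cell_centers,
        key=lambda cell_id: math.dist(event.location, cell_centers[cell_id]),
    )
```

Other distance definitions, such as distance to the nearest face of a cell, would fit the same description; the center-point choice is made here only to keep the sketch short.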
Any data structures and code described or referenced above are stored according to many implementations in computer readable memory, which comprises a non-transitory computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, volatile memory, non-volatile memory, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.
The preceding description is presented to enable the making and use of the technology disclosed. Various modifications to the disclosed implementations will be apparent, and the general principles defined herein may be applied to other implementations and applications without departing from the spirit and scope of the technology disclosed. Thus, the technology disclosed is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein. The scope of the technology disclosed is defined by the appended claims.
This application is a continuation of U.S. patent application Ser. No. 16/679,035, filed 8 Nov. 2019 (now U.S. Pat. No. 11,295,270), which is a continuation of U.S. patent application Ser. No. 16/256,355 filed 24 Jan. 2019 (now U.S. Pat. No. 10,474,991), which claims the benefit of U.S. Provisional Patent Application No. 62/703,785 filed 26 Jul. 2018. U.S. patent application Ser. No. 16/256,355 is also a continuation-in-part of U.S. patent application Ser. No. 15/945,473 filed 4 Apr. 2018 (now U.S. Pat. No. 10,474,988), which is a continuation-in-part of U.S. patent application Ser. No. 15/907,112 filed 27 Feb. 2018 (now U.S. Pat. No. 10,133,933), which is a continuation-in-part of U.S. patent application Ser. No. 15/847,796, filed 19 Dec. 2017 (now U.S. Pat. No. 10,055,853), which claims benefit of U.S. Provisional Patent Application No. 62/542,077 filed 7 Aug. 2017, which applications are incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
1037779 | Kusebauch | Sep 1912 | A |
4746830 | Holland | May 1988 | A |
5734722 | Halpern | Mar 1998 | A |
5745036 | Clare | Apr 1998 | A |
6154559 | Beardsley | Nov 2000 | A |
6561417 | Gadd | May 2003 | B1 |
6584375 | Bancroft et al. | Jun 2003 | B2 |
6768102 | Skoll | Jul 2004 | B1 |
7050624 | Dialameh et al. | May 2006 | B2 |
7050652 | Stanek | May 2006 | B2 |
7240027 | McConnell et al. | Jul 2007 | B2 |
7693757 | Zimmerman | Apr 2010 | B2 |
7742623 | Moon et al. | Jun 2010 | B1 |
8009863 | Sharma et al. | Aug 2011 | B1 |
8219438 | Moon et al. | Jul 2012 | B1 |
8261256 | Adler et al. | Sep 2012 | B1 |
8279325 | Pitts et al. | Oct 2012 | B2 |
8452868 | Shafer et al. | May 2013 | B2 |
8577705 | Baboo et al. | Nov 2013 | B1 |
8624725 | MacGregor | Jan 2014 | B1 |
8749630 | Alahi et al. | Jun 2014 | B2 |
8897741 | Johnson | Nov 2014 | B2 |
9033229 | Matsuhisa et al. | May 2015 | B2 |
9036028 | Buehler | May 2015 | B2 |
9058523 | Merkel et al. | Jun 2015 | B2 |
9147174 | Glickman et al. | Sep 2015 | B2 |
9171442 | Clements | Oct 2015 | B2 |
9262681 | Mishra | Feb 2016 | B1 |
9269012 | Fotland | Feb 2016 | B2 |
9269093 | Lee et al. | Feb 2016 | B2 |
9294873 | MacGregor | Mar 2016 | B1 |
9390032 | Baldwin | Jul 2016 | B1 |
9449233 | Taylor | Sep 2016 | B2 |
9489623 | Sinyavskiy et al. | Nov 2016 | B1 |
9494532 | Xie et al. | Nov 2016 | B2 |
9536177 | Chalasani et al. | Jan 2017 | B2 |
9582891 | Geiger et al. | Feb 2017 | B2 |
9595127 | Champion et al. | Mar 2017 | B2 |
9652751 | Aaron et al. | May 2017 | B2 |
9693333 | Alles et al. | Jun 2017 | B2 |
9846810 | Partis | Dec 2017 | B2 |
9881221 | Bala et al. | Jan 2018 | B2 |
9886827 | Schoner | Feb 2018 | B2 |
9911290 | Zalewski et al. | Mar 2018 | B1 |
10055853 | Fisher et al. | Aug 2018 | B1 |
10083453 | Campbell | Sep 2018 | B2 |
10127438 | Fisher et al. | Nov 2018 | B1 |
10133933 | Fisher et al. | Nov 2018 | B1 |
10165194 | Baldwin | Dec 2018 | B1 |
10169677 | Ren et al. | Jan 2019 | B1 |
10175340 | Abari et al. | Jan 2019 | B1 |
10176452 | Rizzolo et al. | Jan 2019 | B2 |
10192408 | Schoner | Jan 2019 | B2 |
10202135 | Mian et al. | Feb 2019 | B2 |
10210603 | Venable et al. | Feb 2019 | B2 |
10210737 | Zhao | Feb 2019 | B2 |
10217120 | Shin et al. | Feb 2019 | B1 |
10242393 | Kumar et al. | Mar 2019 | B1 |
10257708 | Kamkar et al. | Apr 2019 | B1 |
10262331 | Sharma et al. | Apr 2019 | B1 |
10282720 | Buibas et al. | May 2019 | B1 |
10282852 | Buibas et al. | May 2019 | B1 |
10332089 | Asmi et al. | Jun 2019 | B1 |
10354262 | Hershey et al. | Jul 2019 | B1 |
10373322 | Buibas et al. | Aug 2019 | B1 |
10387896 | Hershey et al. | Aug 2019 | B1 |
10438277 | Jiang et al. | Oct 2019 | B1 |
10445694 | Fisher et al. | Oct 2019 | B2 |
10474877 | Huang et al. | Nov 2019 | B2 |
10474988 | Fisher et al. | Nov 2019 | B2 |
10474991 | Fisher et al. | Nov 2019 | B2 |
10474992 | Fisher et al. | Nov 2019 | B2 |
10474993 | Fisher et al. | Nov 2019 | B2 |
10515518 | Cantley et al. | Dec 2019 | B2 |
10529137 | Black et al. | Jan 2020 | B1 |
10535146 | Buibas et al. | Jan 2020 | B1 |
10580099 | Branscomb et al. | Mar 2020 | B2 |
10650545 | Fisher et al. | May 2020 | B2 |
10699537 | Schoner | Jun 2020 | B1 |
10776926 | Shrivastava | Sep 2020 | B2 |
10810539 | Mohanty et al. | Oct 2020 | B1 |
10846996 | Schoner | Nov 2020 | B2 |
10853965 | Fisher et al. | Dec 2020 | B2 |
11132810 | Kume et al. | Sep 2021 | B2 |
11250376 | Fisher et al. | Feb 2022 | B2 |
11295270 | Fisher et al. | Apr 2022 | B2 |
11301684 | Kumar et al. | Apr 2022 | B1 |
20030078849 | Snyder | Apr 2003 | A1 |
20030107649 | Flickner et al. | Jun 2003 | A1 |
20040099736 | Neumark | May 2004 | A1 |
20040131254 | Liang et al. | Jul 2004 | A1 |
20050177446 | Hoblit | Aug 2005 | A1 |
20050201612 | Park et al. | Sep 2005 | A1 |
20060132491 | Riach et al. | Jun 2006 | A1 |
20060279630 | Aggarwal et al. | Dec 2006 | A1 |
20070017984 | Mountz | Jan 2007 | A1 |
20070021863 | Mountz | Jan 2007 | A1 |
20070021864 | Mountz et al. | Jan 2007 | A1 |
20070182718 | Schoener et al. | Aug 2007 | A1 |
20070282665 | Buehler et al. | Dec 2007 | A1 |
20080001918 | Hsu et al. | Jan 2008 | A1 |
20080159634 | Sharma et al. | Jul 2008 | A1 |
20080170776 | Albertson et al. | Jul 2008 | A1 |
20080181507 | Gope et al. | Jul 2008 | A1 |
20080211915 | McCubbrey | Sep 2008 | A1 |
20080243614 | Tu et al. | Oct 2008 | A1 |
20090041297 | Zhang et al. | Feb 2009 | A1 |
20090057068 | Lin et al. | Mar 2009 | A1 |
20090083815 | McMaster et al. | Mar 2009 | A1 |
20090121017 | Cato | May 2009 | A1 |
20090217315 | Malik et al. | Aug 2009 | A1 |
20090222313 | Kannan et al. | Sep 2009 | A1 |
20090307226 | Koster et al. | Dec 2009 | A1 |
20100021009 | Yao | Jan 2010 | A1 |
20100103104 | Son et al. | Apr 2010 | A1 |
20100138281 | Zhang et al. | Jun 2010 | A1 |
20100208941 | Broaddus et al. | Aug 2010 | A1 |
20100283860 | Nader | Nov 2010 | A1 |
20110141011 | Lashina et al. | Jun 2011 | A1 |
20110209042 | Porter | Aug 2011 | A1 |
20110228976 | Fitzgibbon et al. | Sep 2011 | A1 |
20110248083 | Bonner | Oct 2011 | A1 |
20110317012 | Hammadou | Dec 2011 | A1 |
20110317016 | Saeki et al. | Dec 2011 | A1 |
20110320322 | Roslak et al. | Dec 2011 | A1 |
20120119879 | Estes et al. | May 2012 | A1 |
20120137256 | Lalancette et al. | May 2012 | A1 |
20120154604 | Chen et al. | Jun 2012 | A1 |
20120159290 | Pulsipher et al. | Jun 2012 | A1 |
20120209749 | Hammad et al. | Aug 2012 | A1 |
20120245974 | Bonner et al. | Sep 2012 | A1 |
20120271712 | Katzin et al. | Oct 2012 | A1 |
20120275686 | Wilson et al. | Nov 2012 | A1 |
20120290401 | Neven | Nov 2012 | A1 |
20130011007 | Muriello et al. | Jan 2013 | A1 |
20130011049 | Kimura | Jan 2013 | A1 |
20130060708 | Oskolkov et al. | Mar 2013 | A1 |
20130076898 | Philippe et al. | Mar 2013 | A1 |
20130142384 | Ofek | Jun 2013 | A1 |
20130156260 | Craig | Jun 2013 | A1 |
20130182114 | Zhang et al. | Jul 2013 | A1 |
20130201339 | Venkatesh | Aug 2013 | A1 |
20130235206 | Smith et al. | Sep 2013 | A1 |
20130337789 | Johnson | Dec 2013 | A1 |
20140052555 | MacIntosh | Feb 2014 | A1 |
20140084060 | Jain | Mar 2014 | A1 |
20140100769 | Wurman | Apr 2014 | A1 |
20140168477 | David | Jun 2014 | A1 |
20140172649 | Cancro | Jun 2014 | A1 |
20140188648 | Argue et al. | Jul 2014 | A1 |
20140207615 | Li et al. | Jul 2014 | A1 |
20140222501 | Hirakawa et al. | Aug 2014 | A1 |
20140282162 | Fein et al. | Sep 2014 | A1 |
20140285660 | Jamtgaard et al. | Sep 2014 | A1 |
20140300539 | Tong et al. | Oct 2014 | A1 |
20140300736 | Reitinger et al. | Oct 2014 | A1 |
20140304123 | Schwartz | Oct 2014 | A1 |
20140350711 | Gopalakrishnan | Nov 2014 | A1 |
20150002675 | Kundu et al. | Jan 2015 | A1 |
20150009323 | Lei | Jan 2015 | A1 |
20150012396 | Puerini et al. | Jan 2015 | A1 |
20150019391 | Kumar et al. | Jan 2015 | A1 |
20150026010 | Ellison | Jan 2015 | A1 |
20150026646 | Ahn et al. | Jan 2015 | A1 |
20150029339 | Kobres et al. | Jan 2015 | A1 |
20150039458 | Reid | Feb 2015 | A1 |
20150049914 | Alves | Feb 2015 | A1 |
20150085111 | Lavery | Mar 2015 | A1 |
20150124107 | Muriello et al. | May 2015 | A1 |
20150193761 | Svetal | Jul 2015 | A1 |
20150206188 | Tanigawa et al. | Jul 2015 | A1 |
20150208043 | Lee et al. | Jul 2015 | A1 |
20150213391 | Hasan | Jul 2015 | A1 |
20150221094 | Marcheselli et al. | Aug 2015 | A1 |
20150222861 | Fujii et al. | Aug 2015 | A1 |
20150262116 | Katircioglu et al. | Sep 2015 | A1 |
20150269398 | Zumsteg | Sep 2015 | A1 |
20150269740 | Mazurenko et al. | Sep 2015 | A1 |
20150294397 | Croy et al. | Oct 2015 | A1 |
20150302593 | Mazurenko et al. | Oct 2015 | A1 |
20150310459 | Bernal et al. | Oct 2015 | A1 |
20150324779 | Gala | Nov 2015 | A1 |
20150327794 | Rahman et al. | Nov 2015 | A1 |
20150332312 | Cosman | Nov 2015 | A1 |
20150353282 | Mansfield | Dec 2015 | A1 |
20150363868 | Kleinhandler et al. | Dec 2015 | A1 |
20150379366 | Nomura et al. | Dec 2015 | A1 |
20160095511 | Taguchi et al. | Apr 2016 | A1 |
20160110760 | Herring et al. | Apr 2016 | A1 |
20160125245 | Saitwal et al. | May 2016 | A1 |
20160155011 | Sulc et al. | Jun 2016 | A1 |
20160162715 | Luk et al. | Jun 2016 | A1 |
20160171707 | Schwartz | Jun 2016 | A1 |
20160188962 | Taguchi | Jun 2016 | A1 |
20160189286 | Zohar et al. | Jun 2016 | A1 |
20160203525 | Hara et al. | Jul 2016 | A1 |
20160217157 | Shih et al. | Jul 2016 | A1 |
20160217417 | Ma et al. | Jul 2016 | A1 |
20160232677 | Liao et al. | Aug 2016 | A1 |
20160259994 | Ravindran et al. | Sep 2016 | A1 |
20160358145 | Montgomery | Dec 2016 | A1 |
20160371726 | Yamaji et al. | Dec 2016 | A1 |
20160381328 | Zhao | Dec 2016 | A1 |
20170024806 | High et al. | Jan 2017 | A1 |
20170032193 | Yang | Feb 2017 | A1 |
20170068861 | Miller et al. | Mar 2017 | A1 |
20170104979 | Shaw et al. | Apr 2017 | A1 |
20170116473 | Sashida et al. | Apr 2017 | A1 |
20170124096 | Hsi et al. | May 2017 | A1 |
20170124508 | Wasilewsky | May 2017 | A1 |
20170132492 | Xie et al. | May 2017 | A1 |
20170148005 | Murn | May 2017 | A1 |
20170154212 | Feris et al. | Jun 2017 | A1 |
20170154246 | Guttmann | Jun 2017 | A1 |
20170161555 | Kumar et al. | Jun 2017 | A1 |
20170168586 | Sinha et al. | Jun 2017 | A1 |
20170178226 | Graham et al. | Jun 2017 | A1 |
20170200106 | Jones | Jul 2017 | A1 |
20170206664 | Shen | Jul 2017 | A1 |
20170206669 | Saleemi et al. | Jul 2017 | A1 |
20170249339 | Lester | Aug 2017 | A1 |
20170255990 | Ramamurthy et al. | Sep 2017 | A1 |
20170278255 | Shingu et al. | Sep 2017 | A1 |
20170308911 | Barham et al. | Oct 2017 | A1 |
20170309136 | Schoner | Oct 2017 | A1 |
20170323376 | Glaser et al. | Nov 2017 | A1 |
20180003315 | Reed | Jan 2018 | A1 |
20180012072 | Glaser et al. | Jan 2018 | A1 |
20180012080 | Glaser et al. | Jan 2018 | A1 |
20180014382 | Glaser et al. | Jan 2018 | A1 |
20180025175 | Kato | Jan 2018 | A1 |
20180032799 | Marcheselli et al. | Feb 2018 | A1 |
20180032840 | Yu et al. | Feb 2018 | A1 |
20180033015 | Opalka et al. | Feb 2018 | A1 |
20180033151 | Matsumoto et al. | Feb 2018 | A1 |
20180068431 | Takeda et al. | Mar 2018 | A1 |
20180070056 | DeAngelis et al. | Mar 2018 | A1 |
20180088900 | Glaser et al. | Mar 2018 | A1 |
20180150788 | Vepakomma et al. | May 2018 | A1 |
20180165728 | McDonald et al. | Jun 2018 | A1 |
20180165733 | Kundu et al. | Jun 2018 | A1 |
20180181995 | Burry et al. | Jun 2018 | A1 |
20180189600 | Astrom et al. | Jul 2018 | A1 |
20180217223 | Kumar et al. | Aug 2018 | A1 |
20180225625 | DiFatta et al. | Aug 2018 | A1 |
20180232796 | Glaser et al. | Aug 2018 | A1 |
20180240180 | Glaser et al. | Aug 2018 | A1 |
20180295424 | Taylor et al. | Oct 2018 | A1 |
20180322616 | Guigues | Nov 2018 | A1 |
20180329762 | Li et al. | Nov 2018 | A1 |
20180332235 | Glaser | Nov 2018 | A1 |
20180332236 | Glaser et al. | Nov 2018 | A1 |
20180343417 | Davey | Nov 2018 | A1 |
20180365481 | Tolbert et al. | Dec 2018 | A1 |
20180365755 | Bekbolatov et al. | Dec 2018 | A1 |
20180373928 | Glaser et al. | Dec 2018 | A1 |
20190005479 | Glaser et al. | Jan 2019 | A1 |
20190019309 | Herrli et al. | Jan 2019 | A1 |
20190034735 | Cuban et al. | Jan 2019 | A1 |
20190043003 | Fisher et al. | Feb 2019 | A1 |
20190057435 | Chomley et al. | Feb 2019 | A1 |
20190147709 | Schoner | May 2019 | A1 |
20190156273 | Fisher et al. | May 2019 | A1 |
20190156274 | Fisher et al. | May 2019 | A1 |
20190156275 | Fisher et al. | May 2019 | A1 |
20190156276 | Fisher et al. | May 2019 | A1 |
20190156277 | Fisher et al. | May 2019 | A1 |
20190156506 | Fisher et al. | May 2019 | A1 |
20190158813 | Rowell et al. | May 2019 | A1 |
20190188876 | Song et al. | Jun 2019 | A1 |
20190244386 | Fisher et al. | Aug 2019 | A1 |
20190244500 | Fisher et al. | Aug 2019 | A1 |
20190251340 | Brown et al. | Aug 2019 | A1 |
20190347611 | Fisher et al. | Nov 2019 | A1 |
20190377957 | Johnston et al. | Dec 2019 | A1 |
20190378205 | Glaser et al. | Dec 2019 | A1 |
20190392318 | Ghafoor et al. | Dec 2019 | A1 |
20200034988 | Zhou | Jan 2020 | A1 |
20200074165 | Ghafoor et al. | Mar 2020 | A1 |
20200074393 | Fisher et al. | Mar 2020 | A1 |
20200074394 | Fisher et al. | Mar 2020 | A1 |
20200074432 | Valdman et al. | Mar 2020 | A1 |
20200118400 | Zalewski et al. | Apr 2020 | A1 |
20200125824 | Mabyalaht et al. | Apr 2020 | A1 |
20200134588 | Nelms et al. | Apr 2020 | A1 |
20200151692 | Gao et al. | May 2020 | A1 |
20200193507 | Glaser et al. | Jun 2020 | A1 |
20200234463 | Fisher et al. | Jul 2020 | A1 |
20200258241 | Liu et al. | Aug 2020 | A1 |
20200293992 | Bogolea et al. | Sep 2020 | A1 |
20200334834 | Fisher | Oct 2020 | A1 |
20200334835 | Buibas et al. | Oct 2020 | A1 |
20200410713 | Auer et al. | Dec 2020 | A1 |
20210067744 | Buibas et al. | Mar 2021 | A1 |
20210158430 | Buibas et al. | May 2021 | A1 |
20210201253 | Fisher et al. | Jul 2021 | A1 |
20210295081 | Berry et al. | Sep 2021 | A1 |
Number | Date | Country |
---|---|---|
2497795 | Aug 2006 | CA |
104850846 | Aug 2015 | CN |
105069413 | Nov 2015 | CN |
105701519 | Jun 2016 | CN |
104778690 | Jun 2017 | CN |
108055390 | May 2018 | CN |
1574986 | Jul 2008 | EP |
2555162 | Feb 2013 | EP |
3002710 | Apr 2016 | EP |
3474183 | Apr 2019 | EP |
3474184 | Apr 2019 | EP |
2560387 | Sep 2018 | GB |
2566762 | Mar 2019 | GB |
2011253344 | Dec 2011 | JP |
2013196199 | Sep 2013 | JP |
2014089626 | May 2014 | JP |
2016206782 | Dec 2016 | JP |
2017157216 | Sep 2017 | JP |
2018099317 | Jun 2018 | JP |
1020180032400 | Mar 2018 | KR |
1020190093733 | Aug 2019 | KR |
102223570 | Mar 2021 | KR |
1751401 | Jun 2018 | SE |
201445474 | Dec 2014 | TW |
201504964 | Feb 2015 | TW |
201725545 | Jul 2017 | TW |
201911119 | Mar 2019 | TW |
0021021 | Apr 2000 | WO |
0243352 | May 2002 | WO |
02059836 | May 2003 | WO |
2008029159 | Mar 2008 | WO |
2009027839 | Mar 2009 | WO |
2012067646 | May 2012 | WO |
2013033442 | Mar 2013 | WO |
2013041444 | Mar 2013 | WO |
2013103912 | Jul 2013 | WO |
2014133779 | Sep 2014 | WO |
2015033577 | Mar 2015 | WO |
2015040661 | Mar 2015 | WO |
2015133699 | Sep 2015 | WO |
2015173869 | Nov 2015 | WO |
2016136144 | Sep 2016 | WO |
2016166508 | Oct 2016 | WO |
2017015390 | Jan 2017 | WO |
2017151241 | Sep 2017 | WO |
2017163909 | Sep 2017 | WO |
2017196822 | Nov 2017 | WO |
2018013438 | Jan 2018 | WO |
2018013439 | Jan 2018 | WO |
2018148613 | Aug 2018 | WO |
2018162929 | Sep 2018 | WO |
2018209156 | Nov 2018 | WO |
2018237210 | Dec 2018 | WO |
2019032304 | Feb 2019 | WO |
2019032305 | Feb 2019 | WO |
2019032306 | Feb 2019 | WO |
2019032307 | Feb 2019 | WO |
2020023795 | Jan 2020 | WO |
2020023796 | Jan 2020 | WO |
2020023798 | Jan 2020 | WO |
2020023799 | Jan 2020 | WO |
2020023801 | Jan 2020 | WO |
2020023926 | Jan 2020 | WO |
2020023930 | Jan 2020 | WO |
2020047555 | Mar 2020 | WO |
2020214775 | Oct 2020 | WO |
2020225562 | Nov 2020 | WO |
Entry |
---|
Cao et al, “Leveraging Convolutional Pose Machines for Fast and Accurate Head Pose Estimation”, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 1, 2018. pp. 1089-1094. |
DeTone et al, SuperPoint: Self-Supervised Interest Point Detection and Description, Apr. 19, 2018, arXiv:1712.07629v4 [cs.CV] Apr. 19, 2018, 13 pages. |
Erdem et al. “Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements,” Computer Vision and Image Understanding 103, Aug. 1, 2006, 156-169. |
Lin et al., Energy-Accuracy Trade-off for Continuous Mobile Device Location, MobiSys'10, Jun. 15-18, 2010, San Francisco, California, 13 pages. |
Wei et al., “Convolutional Pose Machines”, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Jun. 27, 2016, pp. 4724-4732. |
Yusoff et al. “Optimal Camera Placement for 3D Environment,” ICSECS 2011: Software Engineering and Computer Systems, Jun. 27-29, 2011, 448-459. |
Jayabalan, et al., “Dynamic Action Recognition: A convolutional neural network model for temporally organized joint location data,” Cornell University, Computer Science, Dec. 20, 2016, 11 pages. |
Camplani et al., “Background foreground segmentation with RGB-D Kinect data: An efficient combination of classifiers”, Journal of Visual Communication and Image Representation, Academic Press, Inc., US, vol. 25, No. 1, Mar. 27, 2013, pp. 122-136, XP028804219, ISSN: 1047-3203, DOI: 10.1016/J.JVCIR.2013.03.009. |
Ceballos, Scikit-Learn Decision Trees Explained, https://towardsdatascience.com/scikit-learn-decision-trees-explained-803f3812290d, Feb. 22, 2019, 13 pages. |
Gkioxari et al. “R-CNNs for Pose Estimation and Action Detection,” Cornell University, Computer Science, Computer Vision and Pattern Recognition, arXiv.org > cs > arXiv:1406.5212, Jun. 19, 2014, 8 pages. |
Harville, “Stereo person tracking with adaptive plan-view templates of height and occupancy statistics,” Image and Vision Computing, vol. 22, Issue 2, Feb. 1, 2004, pp. 127-142. |
Huang, et al. “Driver's view and vehicle surround estimation using omnidirectional video stream,” IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No. 03TH8683), Jun. 9-11, 2003, pp. 444-449. |
Symons, “Data Fusion Methods for Netted Sensors with Limited Communication Bandwidth”, QinetiQ Ltd and University College London, 2004. |
Vincze, “Robust tracking of ellipses at frame rate,” Pattern Recognition, vol. 34, Issue 2, Feb. 2001, pp. 487-498. |
TW 108126624—First Office Action with English Translation, dated Feb. 8, 2022, 6 pages. |
EP 17197518.8—Extended European Search Report, dated Apr. 24, 2018, 8 pages. |
EP 17197518.8—Communication Pursuant to Article 94(3), dated May 7, 2021, 2 pages. |
EP 17197525.3—Response to Communication Pursuant to Article 94(3) dated May 6, 2021, filed Oct. 28, 2021, 8 pages. |
EP 17197518.8—Response to Communication Pursuant to Article 94(3) dated May 7, 2021, filed Oct. 28, 2021, 24 pages. |
EP 17197525.3—Communication Pursuant to Article 94(3) dated May 6, 2021, 4 pages. |
EP 17197525.3—Extended European Search Report, dated Apr. 24, 2018, 8 pages. |
Black et al., “Multi View Image Surveillance and Tracking,” IEEE Proceedings of the Workshop on Motion and Video Computing, 2002, pp. 1-6. |
Grinciunaite et al. “Human Pose Estimation in Space and Time Using 3D CNN,” ECCV Workshop on Brave new ideas for motion representations in videos, Oct. 2016, 7 pages. |
He et al. “Identity mappings in deep residual networks” (published at https://arxiv.org/pdf/1603.05027.pdf), Jul. 25, 2016, 15 pages. |
Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature 293, Sep. 10, 1981, pp. 133-135. |
PCT/US2019/043520—International Preliminary Report and Written Opinion dated Feb. 4, 2021, 7 pages. |
PCT/US2019/043520—International Search Report and Written Opinion dated May 8, 2020, 10 pages. |
PCT/US2019/043519—International Preliminary Report on Patentability dated Feb. 4, 2021, 7 pages. |
PCT/US2019/043519—International Search Report and Written Opinion dated Oct. 31, 2019, 10 pages. |
PCT/US2019/043522—International Search Report and Written Opinion dated Nov. 15, 2019, 11 pages. |
PCT/US2019/043523—International Preliminary Report on Patentability dated Feb. 4, 2021, 15 pages. |
PCT/US2019/043523—International Search Report and Written Opinion dated Nov. 20, 2019, 18 pages. |
Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” University of Washington, Allen Institute for AI, Facebook AI Research, May 9, 2016, 10 pages. |
Redmon et al., YOLO9000: Better, Faster, Stronger, (available at https://arxiv.org/pdf/1612.08242.pdf), Dec. 25, 2016, 9 pages. |
Rossi et al., “Tracking and Counting Moving People,” IEEE Int'l Conf. on Image Processing, ICIP-94, Nov. 13-16, 1994, 5 pages. |
Toshev et al. “DeepPose: Human Pose Estimation via Deep Neural Networks,” IEEE Conf. on Computer Vision and Pattern Recognition, Aug. 2014, 8 pages. |
U.S. Appl. No. 16/256,904—Notice of Allowance dated Jun. 12, 2019, 28 pages. |
U.S. Appl. No. 16/256,904—Office Action dated Mar. 19, 2019, 15 pages. |
U.S. Appl. No. 16/256,936—Final Office Action dated Nov. 29, 2019, 23 pages. |
U.S. Appl. No. 16/256,936—Office Action dated May 16, 2019, 11 pages. |
U.S. Appl. No. 16/519,660—Office Action dated Aug. 20, 2020, 18 pages. |
U.S. Appl. No. 16/256,936—Office Action, dated Jun. 1, 2020, 21 pages. |
U.S. Appl. No. 16/519,660—Final Office Action dated Dec. 23, 2020, 22 pages. |
Zhang “A Flexible New Technique for Camera Calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, No. 11, Nov. 2000, 22 pages. |
U.S. Appl. No. 16/256,936—Notice of Allowance, dated Dec. 14, 2020, 13 pages. |
U.S. Appl. No. 16/256,936—Notice of Allowance, dated Mar. 19, 2021, 16 pages. |
EP 19841054—Rules 161(2) and 162 Communication, dated Mar. 5, 2021, 3 pages. |
EP 19841054—Response to Rules 161(2) and 162 Communication, filed Sep. 3, 2021, 7 pages. |
PCT/US2019/043522—International Preliminary Report on Patentability, dated Feb. 4, 2021, 8 pages. |
EP 19840680.3—Rules 161(2) and 162 Communication, dated Mar. 5, 2021, 3 pages. |
EP 19840680.3—Response to Rules 161(2) and 162 Communication, filed Sep. 3, 2021, 7 pages. |
Prindle, This Automated Store in Sweden Doesn't have Any Human Employees—Only a Smartphone App, Digital Trends, dated Feb. 29, 2016, 6 pages. Retrieved on Feb. 23, 2022. Retrieved from the internet [URL: <https://www.digitaltrends.com/cool-tech/sweden-app-enabled-automated-store/> ]. |
Amazon, “Amazon Go Frequently Asked Questions”, 4 pages. Retrieved on Feb. 23, 2022. Retrieved from the internet [URL: <https://www.amazon.com/b?node=16008589011> ]. |
Keulian, “Naraffar, the First Staffless Convenience Store,” “Stephane Keulian: Your fix of retail insights Blog” Retrieved on May 5, 2017. 3 pages. Retrieved from the internet [URL: <http://stephanekeulian.com/en/naraffar-first-staffees-convenience-store/> ]. |
EDYN Company, Oakland, CA, Retrieved from the internet [URL: <https://edyn.com/> ]. Retrieved on May 5, 2017. 12 pages. |
Number | Date | Country |
---|---|---|
20220207470 A1 | Jun 2022 | US |
Number | Date | Country |
---|---|---|
62703785 | Jul 2018 | US |
62542077 | Aug 2017 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16679035 | Nov 2019 | US |
Child | 17697760 | | US |
Parent | 16256355 | Jan 2019 | US |
Child | 16679035 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15945473 | Apr 2018 | US |
Child | 16256355 | | US |
Parent | 15907112 | Feb 2018 | US |
Child | 15945473 | | US |
Parent | 15847796 | Dec 2017 | US |
Child | 15907112 | | US |