The invention relates to acquiring training data to train a learning machine. In particular, it relates to acquiring product training data for training a learning machine to achieve automated checkout.
Systems exist for the purpose of trying to provide automated checkout within a store. These systems have several issues. Some of the issues are related to training and modeling products for recognition by automated checkout systems. For example, some systems scan a product, model it three-dimensionally, and generate synthetic data from the model. Processing the scanned, modeled, or synthetic data, however, is not as accurate as modeling the product itself. Other systems collect images of the product and mark the product by hand in each frame in which the product appears. This method for identifying a product is very time-consuming and requires extensive user resources. Other systems tally a customer's receipts at a point of sale to indicate that the customer has picked those items. The disadvantage with this system is that it does not truly confirm what a user picked from the aisles of the store, or when. What is needed is an improved system for training a learning machine for automated checkout.
The present technology, roughly described, automatically acquires training data of products for automated checkout. The present system configures an automated checkout system for acquiring training data and automatically captures training data using multimodal sensing techniques. The training data can then be used to train a learning model, such as for example a deep learning model with multiple neural networks. The trained learning model may then be applied to users interacting with product display units within a store.
There are several benefits to the present technology with respect to the prior art. For example, videos are automatically collected when a person interacts with a product display unit. The video includes ground truth data for the time, quantity, product identifier (for example, a SKU), and location. The training data can be representative of an actual in-store condition and performs better in testing than data acquired by other methods. Because of the in-store conditions, the videos are more realistic than those that are generated using synthetic rendering.
In some instances, a method for acquiring training data of products for automated checkout begins with receiving a plurality of weight values by a computing device and from a weight sensing mechanism coupled to a product display unit. Each of the plurality of weight values is associated with a time stamp and a change in the number of products stored on the product display unit. The method continues with receiving a plurality of video data sets by the computing device and from one or more cameras, wherein each video data set has a time stamp that is synchronized with one of the weight value time stamps. Additionally, each video data set captures the location on the product display unit that is associated with the change in the number of products stored on the display unit. The computing device determines, for each weight value, the quantity of products removed from or added to the display unit based at least in part on the weight value. The plurality of weight values, plurality of video data sets, and product quantities removed or added are intended to be used for training a learning machine.
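The pairing of weight events with synchronized video described above can be sketched in code. The following is an illustrative sketch only; the class names, field names, and the one-second tolerance window are assumptions for the example and are not part of the specification:

```python
from dataclasses import dataclass, field

@dataclass
class WeightEvent:
    timestamp: float      # synchronized time stamp of the weight change
    delta_grams: float    # signed change in total shelf weight
    lane_id: str          # location on the product display unit

@dataclass
class VideoSet:
    timestamp: float      # time stamp synchronized with the weight sensor
    lane_id: str          # location the camera covers
    frames: list = field(default_factory=list)

def pair_training_samples(events, videos, unit_weight, tolerance_s=1.0):
    """Pair each weight event with the video data sets whose synchronized
    time stamps fall within a tolerance window of the event, and derive
    the product quantity from the weight change."""
    samples = []
    for ev in events:
        clips = [v for v in videos
                 if v.lane_id == ev.lane_id
                 and abs(v.timestamp - ev.timestamp) <= tolerance_s]
        qty = round(abs(ev.delta_grams) / unit_weight[ev.lane_id])
        samples.append((ev, clips, qty))
    return samples
```

Each resulting tuple of weight event, matching video clips, and inferred quantity corresponds to one training sample for the learning machine.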
In some instances, a computer readable medium stores code that, when executed, performs a similar method.
In some instances, a system for acquiring training data of products for automated checkout includes a server having memory, a processor, and one or more modules. The one or more modules can be stored in the memory and executed by the processor to receive a plurality of weight values by a computing device and from a weight sensing mechanism coupled to a product display unit, each of the plurality of weight values associated with a time stamp and a change in the number of products stored on the product display unit; receive a plurality of video data sets by the computing device and from one or more cameras, each video data set having a time stamp that is synchronized with one of the weight value time stamps, and each video data set capturing the location on the product display unit that is associated with the change in the number of products stored on the display unit; and determine, by the computing device and for each weight value, the quantity of products removed from or added to the display unit based at least in part on the weight value, wherein the plurality of weight values, plurality of video data sets, and product quantities removed or added are intended to be used for training a learning machine.
The present system automatically acquires training data of products for automated checkout. The present system configures an automated checkout system for acquiring training data and automatically captures training data using multimodal sensing techniques. The training data can then be used to train a learning model, such as for example a deep learning model with multiple neural networks. The trained learning model may then be applied to users interacting with product display units within a store.
There are several benefits to the present technology with respect to the prior art. For example, videos are automatically collected when a person interacts with a product display unit. The video includes ground truth data for the time, quantity, product identifier (for example, a SKU), and location. The training data can be representative of an actual in-store condition and performs better in testing than data acquired by other methods. Because of the in-store conditions, the videos are more realistic than those that are generated using synthetic rendering.
The product display units may be any unit that can support a product, such as for example a shelf. Each product display unit 113-115 may support or hold a number of products, such as products 110 and 112. Each product may have a different weight, such that product 110 may have a different weight than product 112. Weight sensing mechanisms 101-103 may detect the total weight on display units 113-115, respectively. As a product is removed from a particular product display unit, a corresponding weight sensing unit coupled to or incorporated within the product display unit will detect the change in weight. The weight sensing unit may also send a message to a computing device indicating that a change in weight has been detected. When a change in weight has been detected, a timestamp is recorded by the weight sensing device and sent with the notification to the computing device.
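The weight sensing behavior described above, in which a change in shelf weight triggers a time-stamped notification to a computing device, can be sketched as follows. The class name, callback interface, and threshold value are illustrative assumptions, not elements of the specification:

```python
import time

class WeightSensingMechanism:
    """Sketch of a weight sensing unit coupled to a product display unit.
    When the total weight changes by more than a threshold, it records a
    time stamp and notifies the computing device."""

    def __init__(self, notify, threshold_grams=5.0):
        self.notify = notify              # callback to the computing device
        self.threshold = threshold_grams  # minimum change worth reporting
        self.last_weight = None

    def read(self, weight_grams):
        """Process one reading of the total weight on the display unit."""
        if self.last_weight is not None:
            delta = weight_grams - self.last_weight
            if abs(delta) >= self.threshold:
                # Record the time stamp with the notification, as described.
                self.notify({"timestamp": time.time(),
                             "delta_grams": delta})
        self.last_weight = weight_grams
```

A removal shows up as a negative delta and an addition as a positive delta, which is how the downstream logic distinguishes products taken from products returned.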
Overhead cameras 104-106 may capture the product display surface of each product display unit as well as the vicinity around the display units. Similarly, each lateral camera 107, 108, and 109 can capture one shelf each, and collectively capture the space over all three product display units. Hence, cameras 104-106 and cameras 107-109 capture videos of all the products on product display units 113-115, but from different angles. In some instances, additional cameras may also capture video of one or more locations on the product display units, but are not shown in
With a configuration that involves a cluster of cameras, each product may be captured by at least two cameras, or at least three cameras for example in the illustration of
The clusters of cameras may be positioned such that each product is covered by two or more cameras. In some instances, each corresponding camera within a cluster may be positioned at a distance d between each other. For example, in
Data collection server 120 may receive data from the multimodal sensors and configure the received data as training data for a learning machine. Data collection server 120 may also provide an interface for configuring weight sensing mechanism sensitivity and camera images, manage a weight mechanism manager, manage cameras, and configure planograms. More details for data collection server 120 are discussed with respect to the system of
The interfaces provided by data collection server 120 can be viewed by a network browser on computing device 140, an application on computing device 150, as well as other devices. In some instances, network browser 145 may receive content pages and other data from server 120, and provide the content pages to a user through an output of computing device 140. The content pages may provide interfaces, provide output, collect input, and otherwise communicate with a user through computing device 140. Computing device 140 can be implemented as a desktop computer, workstation, or some other computing device.
In some instances, the interfaces can be accessed through an application on a computing device, such as application 155. Computing device 150 may include application 155 stored in device memory and executed by one or more processors. In some instances, application 155 may receive data from remote server 120, process the data and render output from the raw and processed data, and provide the output to a user through an output of computing device 150.
Model training server 160 may receive training data from data collection server 120 and train a learning machine using the received data. In some instances, the learning machine may be a deep learning machine with multiple layers of neural networks. Model training server is discussed in more detail with respect to the system of
Data store 170 may be in communication with model training server 160 as well as data collection server 120. The data store may include data such as video data, metadata (metadata annotated to video data and other metadata), tables of product identifier and location identifier data, and product data such as the weight, size, and an image of the product, such as for example a thumbnail image. Data store 170 is discussed in more detail with respect to the system of
Once a model is trained with training data, it may be implemented within an automatic checkout store 180. Weight sensing mechanisms and cameras may be set up in multiple aisles, as illustrated in automatic checkout store 180, and the data collected from these multimodal sensors may be processed by a trained learning model on data collection server 182. The trained learning model may detect products retrieved from or added to a lane of a product display unit, as well as whether a product retrieved from or added to a product display unit is different from the product associated with the particular display unit per the planogram. In any case, when the learning model indicates one or more products have been retrieved from a product display unit, those products are added to a user's cart, and an application on the customer's mobile device 184 identifies the user and the products that are taken by the user while shopping through the automated checkout store 180.
Mobile device 184 may include a mobile application that provides interfaces, provides output, and receives input as part of implementing the present technology as discussed herein. Mobile device 184 may be implemented as a laptop, cellular phone, tablet computer, Chromebook, or some other machine that is considered a “mobile” computing device.
Weight mechanism manager 420 may calibrate weight sensing mechanisms, receive data from weight sensing mechanisms, and otherwise communicate and manage mechanisms of the present system.
Planogram manager 430 may generate and configure a planogram in response to user input. For example, a user may provide input through an interface to configure a planogram for a particular product display unit associated with one or more lanes.
Video annotation manager may annotate received video data with additional metadata, such as for example a timestamp, product location, number of products, and other data. Video annotation manager may also determine the number of products retrieved by retrieving product identifiers and weight information from a remote data store.
Camera manager 450 may configure and communicate with one or more cameras within the present system. Camera manager 450 may also receive video data from one or more cameras.
Trained learning machine 460 may be provided to data collection server 120 after a learning machine has been trained with learning data. The trained learning machine can be used to detect products added to or removed from a product display unit, as well as detecting that a product removed or added to a product display unit does not match a product for that lane as indicated by a planogram.
Meta-data 620 may include data annotated to the received video data and/or other data, including but not limited to weight data, product quantity, timestamp, lane or location data, and other data.
A table of product identifiers and location identifiers can be searched by remote applications to determine what product is associated with a particular location at which a weight event has occurred. Product weight, size, and image data may be used to populate one or more interfaces when configuring or providing information about a particular product.
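The lookup described above, in which a remote application maps the location of a weight event to a product, can be sketched as a simple table. The location keys, SKUs, and weights below are hypothetical placeholder values:

```python
# Hypothetical planogram lookup table mapping a location identifier
# (display unit, lane) to the product stored there.
PLANOGRAM = {
    ("shelf-1", "lane-A"): {"sku": "123456", "unit_weight_g": 410.0},
    ("shelf-1", "lane-B"): {"sku": "789012", "unit_weight_g": 250.0},
}

def product_for_weight_event(shelf_id, lane_id):
    """Return the product associated with the location at which a
    weight event occurred, or None if the location is unknown."""
    return PLANOGRAM.get((shelf_id, lane_id))
```

In a deployed system this table would live in the remote data store rather than in memory, but the lookup shape is the same.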
Training data is captured using multimodal sensors at step 720. Capturing the training data may include detecting a change in weight, storing timestamps, and automatically capturing video. Capturing training data is discussed in more detail with respect to the method of
A deep learning model is trained using the captured training data at step 730. The trained deep learning model may then be applied to users interacting with product display units at step 740. More detail for training a learning model and applying a training model are discussed with respect to
A planogram is then built for products on the product display units at step 830. The planogram may specify which products are at which locations on the product display units. The planogram data can be stored in a remote data store, locally, or both. Shelf sensitivity may then be adjusted based on products positioned on the product display unit at step 840. The shelf sensitivity can be adjusted through an interface, mechanically, or in some other manner. The sensitivity can be set such that the weight sensing devices on the product display units can detect when a product on those display units is removed or added.
Lanes can be marked in views of one or more cameras at step 850. The lanes can be marked in images provided through an interface of the present system. The lanes may specify one or more products on particular positions within the product display unit.
The date and time can be synchronized between the weight sensing units and cameras at step 860. In some instances, the time synchronization can be performed using a network time protocol (NTP) synchronization. A product identifier, product weight, and lane identifier can be populated in a data store at step 870. The data may be sent to the data store for storage by data collection server 120, model training server 160, or both. The product weight, size, and image data may be stored in a data store at step 880. The product weight, size and image data, such as for example a thumbnail image, can be retrieved for display in one or more interfaces for managing the present system, such as for example inventory tracking, planogram generation, lane annotation, and other operations.
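Populating the data store of steps 870 and 880 amounts to inserting one row per product with its identifier, lane, weight, size, and thumbnail reference. The schema, column names, and values below are hypothetical, using an in-memory database purely for illustration:

```python
import sqlite3

# Hypothetical product table for steps 870-880: identifier, lane,
# weight, size, and a path to a thumbnail image.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    sku TEXT PRIMARY KEY,
    lane_id TEXT,
    weight_g REAL,
    width_mm REAL,
    height_mm REAL,
    thumbnail_path TEXT)""")
conn.execute("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
             ("123456", "lane-A", 410.0, 70.0, 120.0,
              "thumbs/123456.png"))
conn.commit()

# A management interface can later retrieve the stored weight by lane.
row = conn.execute("SELECT weight_g FROM products WHERE lane_id = ?",
                   ("lane-A",)).fetchone()
```

The same rows serve both the quantity calculation (via the stored unit weight) and the management interfaces (via the size and thumbnail columns).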
A factor of the product weight is determined to identify the quantity of products taken from the product display unit at step 950. For example, if the change in weight is twice the weight of a product, then two units of the product have been taken. A product identifier may be automatically retrieved based on the planogram and weight sensor at step 960. When the particular product display unit and the change in weight are known, the planogram can be used to retrieve the product identifier from a remote database. The captured video may then be automatically annotated with the timestamp, product identifier, product quantity, and lane ID at step 970.
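The quantity calculation of step 950 and the annotation of step 970 can be sketched as follows; the function and key names are illustrative assumptions:

```python
def units_changed(weight_delta_g, unit_weight_g):
    """Infer how many units were removed or added from how many unit
    weights fit into the observed change, as in step 950: a change of
    twice the product weight means two units."""
    return round(abs(weight_delta_g) / unit_weight_g)

def annotate(video, timestamp, sku, quantity, lane_id):
    """Attach the step 970 metadata to a captured video data set."""
    video["meta"] = {"timestamp": timestamp, "sku": sku,
                     "quantity": quantity, "lane_id": lane_id}
    return video
```

Rounding, rather than truncation, tolerates small per-unit weight variation: a measured change of 810 g against a 410 g product still resolves to two units.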
In some instances, the interface may receive a weight and a thumbnail image of each product. The weight of the product may be determined, for example, by determining an average weight over several weight measurements for the product, such as for example 10, 15, or some other number of weight measurements taken from a weight scale.
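Averaging several weight measurements, as described above, smooths out scale noise before the unit weight is stored. A minimal sketch, with the minimum sample count as an assumed parameter:

```python
def average_unit_weight(measurements_g, min_samples=10):
    """Estimate a product's unit weight by averaging several scale
    readings, e.g. 10 or 15 measurements, as described for the
    product configuration interface."""
    if len(measurements_g) < min_samples:
        raise ValueError("need more weight measurements")
    return sum(measurements_g) / len(measurements_g)
```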
The components shown in
Mass storage device 1530, which may be implemented with a magnetic disk drive, an optical disk drive, a flash drive, or other device, is a non-volatile storage device for storing data and instructions for use by processor unit 1510. Mass storage device 1530 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 1520.
Portable storage device 1540 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disc or digital video disc (DVD), USB drive, memory card or stick, or other portable or removable memory, to input and output data and code to and from the computer system 1500 of
Input devices 1560 provide a portion of a user interface. Input devices 1560 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, a pointing device such as a mouse, a trackball, stylus, cursor direction keys, microphone, touchscreen, accelerometer, and other input devices. Additionally, the system 1500 as shown in
Display system 1570 may include a liquid crystal display (LCD) or other suitable display device. Display system 1570 receives textual and graphical information and processes the information for output to the display device. Display system 1570 may also receive input as a touchscreen.
Peripherals 1580 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 1580 may include a modem or a router, printer, and other device.
The system 1500 may also include, in some implementations, antennas, radio transmitters, and radio receivers 1590. The antennas and radios may be implemented in devices such as smart phones, tablets, and other devices that may communicate wirelessly. The one or more antennas may operate at one or more radio frequencies suitable to send and receive data over cellular networks, Wi-Fi networks, commercial device networks such as Bluetooth, and other radio frequency networks. The devices may include one or more radio transmitters and receivers for processing signals sent and received using the antennas.
The components contained in the computer system 1500 of
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.