The field of the invention generally relates to techniques for enabling customers and other users to accurately identify items to be purchased at a retail facility, for example. One particular field of the invention relates to systems and methods for using visual appearance and weight information to augment universal product code (UPC) scans in order to ensure that items are properly identified and accounted for at ring up.
In many traditional retail establishments, a cashier receives items to be purchased and scans them with a UPC scanner. The cashier ensures that all the items are properly scanned before they are bagged. As some retail establishments incorporate customer self-checkout options, the customer assumes the responsibility of scanning and bagging items with little or no supervision by store personnel. A small percentage of customers have used this opportunity to defraud the store by bagging items without having scanned them or by swapping an item's UPC with the UPC of a lower priced item. Such activities cost retailers millions of dollars in lost income. There is therefore a need for safeguards that independently confirm that the checkout list is correct and discourage illegal activity while minimizing any inconvenience to the vast majority of honest and well-intentioned customers who properly scan their items.
The invention in the preferred embodiment features a system and method for using object recognition/verification and weight information to confirm the accuracy of a UPC scan, or to provide an affirmative recognition where no UPC scan was made. In a preferred embodiment, a checkout system comprises: a universal product code (UPC) scanner configured to generate a product identifier; at least one camera for capturing one or more images of an item; a database of features and images of known objects; an image processor configured to: extract a plurality of geometric point features from the one or more images; identify matches between the extracted geometric point features and the features of known objects; generate a geometric transform between the extracted geometric point features and the features of known objects for a subset of known objects corresponding to matches; and identify one of the known objects based on a best match of the geometric transform; and a transaction processor configured to execute one of a predetermined set of actions if the identified object is different than the product identifier. In some additional embodiments, the transaction processor maintains one or more lists identifying items that must always be visually verified or verified by weight, or need not be visually verified and/or weight verified.
The preferred embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, and in which:
Illustrated in
In
As shown in
Illustrated in
The UPC scanner and UPC decoder are well known to those skilled in the art and therefore not discussed in detail here. The UPC database, which is also well known in the prior art, includes the item name, price, and weight of the item, in pounds for example. The one or more video cameras transmit image data to a feature extractor, which selects and processes a subset of those images. In the preferred embodiment, the feature extractor extracts geometric point features such as scale-invariant feature transform (SIFT) features, which are discussed in more detail in context of
In addition to verification, the self-checkout system can also recognize an item of merchandise based on the visual appearance of the item without the UPC code. As described above, one or more images are acquired and geometric point features extracted from the images. The extracted features are compared to the visual features of known objects in the image database. The identity of the item as well as its UPC code can then be determined based on the number and quality of matching visual features, an accurate geometric transformation between the set of matching features of the image and a model, the quality of the normalized correlation of the image to the transformed model, or combination thereof. In the preferred embodiment, the checkout system can be configured to do either verification or recognition by a system administrator 360 at the store or remotely located via a network connection, or configured to automatically perform recognition operations if and when verification cannot be implemented due to the absence of a UPC scan for example.
The checkout system further includes a scale and weight processor for performing item verification based on weight. In the preferred embodiment, the measured weight of the object is compared to the known weight of the object retrieved from the UPC database. The weight processor determines whether the measured weight and the retrieved weight match within a determined threshold, and transmits a signal to the transaction processor indicating whether the item weight is consistent or inconsistent with the UPC code on the item.
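The weight comparison described above can be sketched as follows. This is a minimal illustration only: the function name `verify_weight`, the `WeightResult` type, and both tolerance values are hypothetical and do not come from the description, which specifies only that the weights must match within a determined threshold.

```python
from dataclasses import dataclass


@dataclass
class WeightResult:
    consistent: bool        # True when the scale reading matches the UPC record
    difference_lb: float    # absolute difference between the two weights


def verify_weight(measured_lb: float, expected_lb: float,
                  rel_tolerance: float = 0.05,
                  abs_tolerance_lb: float = 0.02) -> WeightResult:
    """Compare a scale reading with the weight stored for the scanned UPC.

    The allowed margin is the larger of a relative and an absolute tolerance.
    Both default values are illustrative assumptions.
    """
    difference = abs(measured_lb - expected_lb)
    allowed = max(abs_tolerance_lb, rel_tolerance * expected_lb)
    return WeightResult(consistent=difference <= allowed,
                        difference_lb=difference)
```

Combining a relative and an absolute margin is one possible design choice; it keeps very light items from failing on scale noise alone while still scaling the margin for heavier items.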
At the transaction processor, the UPC data, visual verification/recognition signal, weight verification signal, or combination thereof are processed for purposes of implementing the sales transaction. At a minimum, the transaction processor communicates via the customer interface 130 to display purchase information on the touch screen and facilitate the financial transactions of the payment device. In addition, the verification/recognition process intervenes in the transaction by alerting a cashier of a potential problem or temporarily stopping the transaction when attendant (e.g., cashier) intervention is required. As explained in more detail below, the transaction processor decides whether to intervene in a transaction based on the consistency of the UPC, visual data, weight data, or lesser combination thereof.
In the normal course of operations, a customer using the self-checkout system will hover the item to be purchased over the UPC scanner bed until an audible tone confirms that the UPC scanner read the code. The user then transfers the item to the belt conveyor or bag area where the item's weight is determined. One or more cameras capture images of the item before it is placed in the bag. As such, the checkout system can typically confirm both the weight and visual appearance of the scanned item. If all data is consistent, the item is added to the checkout list. If the data is inconsistent, the system may be configured to implement one or more of a general set of responses:
A) If the image processor determines that the item identified by the UPC scanner is different than that determined by the visual features, the system can prompt the customer to scan/re-scan the UPC, allow the item to pass and the transaction to continue with an increased alert level, generate an alert if the accumulated alert level exceeds a predetermined threshold, or lock the transaction and alert an attendant/cashier if necessary;
B) If the item is moved to the bagging area before the UPC is scanned but its identity is determined through the object recognition methodology discussed herein, for example, the system can implement one of the actions above, tentatively add the identified item to the list of items being purchased, or ask the customer whether he/she wants to include the item in the checkout list;
C) If the extracted visual features cannot be verified/recognized or are otherwise inconsistent with the UPC and weight, the system can implement the actions above or disregard the appearance of the item when the item associated with the UPC is inherently difficult or impractical to visualize, as is the case with small items like packs of gum or items with few unique visual features; and
D) If the weight of the item is inconsistent with the UPC and/or visual features of the item, the system can implement the actions above or disregard the weight measurement when the item associated with the UPC is difficult to accurately weigh or place on the scale, as is the case with lightweight items like greeting cards or like paper goods and with heavy items like cases of drinks.
In some embodiments, the action taken is based at least in part on the value of the difference in price between the UPC-identified item and the item identified based on visual features.
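One way the graduated responses and the price-difference criterion might fit together is sketched below. The class name `Transaction`, the `Action` values, the alert increment, and both thresholds are hypothetical choices made for illustration; the description leaves the specific policy to the system configuration.

```python
from enum import Enum


class Action(Enum):
    ADD_ITEM = "add item to checkout list"
    PROMPT_RESCAN = "prompt customer to re-scan"
    ALERT_ATTENDANT = "alert attendant"
    LOCK_TRANSACTION = "lock transaction"


class Transaction:
    """Tracks the accumulated alert level for one checkout session."""

    def __init__(self, alert_threshold: float = 3.0,
                 lock_price_gap: float = 20.0) -> None:
        self.alert_level = 0.0
        self.alert_threshold = alert_threshold  # hypothetical value
        self.lock_price_gap = lock_price_gap    # hypothetical value, in dollars

    def handle_item(self, upc_price: float, visual_price: float,
                    identities_match: bool) -> Action:
        """Choose a response for one item from the UPC and visual results."""
        if identities_match:
            return Action.ADD_ITEM
        # A large price gap in the customer's favor is treated as high risk.
        if visual_price - upc_price >= self.lock_price_gap:
            return Action.LOCK_TRANSACTION
        # Otherwise raise the alert level and escalate once it is exceeded.
        self.alert_level += 1.0
        if self.alert_level > self.alert_threshold:
            return Action.ALERT_ATTENDANT
        return Action.PROMPT_RESCAN
```

In this sketch a single low-value mismatch only prompts a re-scan, repeated mismatches eventually alert an attendant, and a large price difference locks the transaction immediately.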
In some embodiments, the transaction processor maintains a first list 352 of items whose visual appearance is ignored if inconsistent with the UPC and weight because visual verification of those items is unreliable, and a second list 354 of items whose weight is ignored if inconsistent with the UPC and visual features. These lists allow the system to intelligently determine if and when to continue with a transaction when some of the data acquired about the item is inconsistent. In contrast, the system may maintain one or more additional lists of items that must be visually verified or recognized, and a list of items whose weight must be verified, in order for the item to be added to the checkout list. In the absence of this visual or weight verification, the transaction processor prompts the user to rescan the item, generates an alert, or locks the transaction.
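The interaction between the ignore lists and the must-verify lists can be illustrated with a small decision function. The name `resolve_checks`, its parameters, and the item keys are hypothetical. The sketch also assumes that a check result of `None` means the check could not be performed, and that a missing result is tolerated for items not on a must-verify list; the description does not state this.

```python
from typing import Optional, Set


def resolve_checks(upc: str,
                   visual_ok: Optional[bool],
                   weight_ok: Optional[bool],
                   ignore_visual: Set[str],
                   ignore_weight: Set[str],
                   require_visual: Set[str],
                   require_weight: Set[str]) -> bool:
    """Return True when the item may be added to the checkout list.

    visual_ok / weight_ok: True = consistent with the UPC,
    False = inconsistent, None = check unavailable (assumption).
    """
    # Items on a must-verify list need a positive result for that check.
    if upc in require_visual and visual_ok is not True:
        return False
    if upc in require_weight and weight_ok is not True:
        return False
    # An inconsistent result blocks the item unless that check is ignored.
    if visual_ok is False and upc not in ignore_visual:
        return False
    if weight_ok is False and upc not in ignore_weight:
        return False
    return True
```

A return value of `False` corresponds to the cases in which the transaction processor would prompt a re-scan, generate an alert, or lock the transaction.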
Several flowcharts of representative procedures for acquiring product information and inconsistencies are shown in
Illustrated in
Illustrated in
Illustrated in
Illustrated in
Illustrated in
Illustrated in
Each of the DoG images is inspected to identify the pixel extrema, including minima and maxima. To be selected, an extremum must possess the highest or lowest pixel intensity among the eight adjacent pixels in the same DoG image as well as the nine adjacent pixels in each of the two adjacent DoG images having the closest related band-pass filtering, i.e., the adjacent DoG images having the next highest scale and the next lowest scale, if present. The identified extrema, which may be referred to herein as image “keypoints,” are associated with the center points of visual features. In some embodiments, an improved estimate of the location of each extremum within a DoG image may be determined through interpolation using a 3-dimensional quadratic function, for example, to improve feature matching and stability.
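The neighbor comparison described above can be sketched in plain Python. The sketch assumes the DoG stack is supplied as a list of equally sized 2-D lists ordered by scale; the function names are hypothetical, border pixels are skipped for simplicity, and the optional sub-pixel interpolation is omitted.

```python
def is_extremum(dog, s, y, x):
    """Check whether dog[s][y][x] is a strict extremum among its neighbors.

    Compares against the 8 neighbors in the same DoG image and the 9 pixels
    in each adjacent-scale DoG image, when that adjacent image is present.
    """
    center = dog[s][y][x]
    neighbors = []
    for ds in (-1, 0, 1):
        if not 0 <= s + ds < len(dog):
            continue  # adjacent scale not present
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the candidate pixel itself
                neighbors.append(dog[s + ds][y + dy][x + dx])
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


def find_keypoints(dog):
    """Return (scale, row, column) for every interior extremum in the stack."""
    keypoints = []
    for s, image in enumerate(dog):
        for y in range(1, len(image) - 1):
            for x in range(1, len(image[0]) - 1):
                if is_extremum(dog, s, y, x):
                    keypoints.append((s, y, x))
    return keypoints
```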
With each of the visual features localized, the local image properties are used to assign an orientation to each of the keypoints. By consistently assigning each of the features an orientation, different keypoints may be readily identified within different images even where the object with which the features are associated is displaced or rotated within the image. In the preferred embodiment, the orientation is derived from an orientation histogram formed from gradient orientations at all points within a circular window around the keypoint. As one skilled in the art will appreciate, it may be beneficial to weight the gradient magnitudes with a circularly-symmetric Gaussian weighting function where the gradients are based on non-adjacent pixels in the vicinity of a keypoint. The peak in the orientation histogram, which corresponds to a dominant direction of the gradients local to a keypoint, is assigned to be the feature's orientation.
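A minimal sketch of the orientation assignment follows. It assumes a grayscale image stored as a 2-D list, central-difference gradients, and a single dominant peak; the function name, the window radius, the 36-bin histogram, and the Gaussian sigma are illustrative assumptions rather than values taken from the description.

```python
import math


def dominant_orientation(image, ky, kx, radius=4, num_bins=36, sigma=2.0):
    """Return the center angle (degrees) of the strongest orientation bin."""
    hist = [0.0] * num_bins
    height, width = len(image), len(image[0])
    for y in range(max(1, ky - radius), min(height - 1, ky + radius + 1)):
        for x in range(max(1, kx - radius), min(width - 1, kx + radius + 1)):
            dy, dx = y - ky, x - kx
            if dy * dy + dx * dx > radius * radius:
                continue  # outside the circular window
            # Central differences approximate the local image gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            # Circularly symmetric Gaussian weight centered on the keypoint.
            weight = math.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))
            bin_index = int(angle / (360.0 / num_bins)) % num_bins
            hist[bin_index] += weight * magnitude
    peak = max(range(num_bins), key=hist.__getitem__)
    return (peak + 0.5) * 360.0 / num_bins
```

The returned angle is the center of the winning bin, so its resolution is limited to the bin width; a fuller implementation might interpolate around the peak.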
With the orientation of each keypoint assigned, the feature extractor generates 1008 a feature descriptor to characterize the image data in a region surrounding each identified keypoint at its respective orientation. In the preferred embodiment, the surrounding region within the associated DoG image is subdivided into an M×M array of subfields aligned with the keypoint's assigned orientation. Each subfield in turn is characterized by an orientation histogram having a plurality of bins, each bin representing the sum of the image's gradient magnitudes possessing a direction within a particular angular range and present within the associated subfield. As one skilled in the art will appreciate, generating the feature descriptor from the one DoG image in which the inter-scale extremum is located ensures that the feature descriptor is largely independent of the scale at which the associated object is depicted in the images being compared. In the preferred embodiment, the feature descriptor includes a 128-byte array corresponding to a 4×4 array of subfields with each subfield including eight bins corresponding to an angular width of 45 degrees. The feature descriptor in the preferred embodiment further includes an identifier of the associated image, the scale of the DoG image in which the associated keypoint was identified, the orientation of the feature, and the geometric location of the keypoint in the associated DoG image.
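The 4×4 subfield layout with eight 45-degree bins, giving 4 × 4 × 8 = 128 values, can be sketched as follows. The sketch assumes the gradient magnitudes and directions for a square patch have already been computed and rotated into the keypoint's orientation. The function name and the normalization and quantization step (scaling the unit-length vector by 512 and clamping to 255) are assumptions for illustration, not details from the description.

```python
import math


def build_descriptor(magnitudes, angles_deg, grid=4, bins=8):
    """Build a grid*grid*bins byte descriptor from a square gradient patch.

    magnitudes, angles_deg: equally sized 2-D lists covering the region
    around the keypoint, already aligned with its assigned orientation.
    """
    cell = len(magnitudes) // grid          # pixels per subfield side
    descriptor = [0.0] * (grid * grid * bins)
    for y in range(cell * grid):
        for x in range(cell * grid):
            subfield = (y // cell) * grid + (x // cell)
            angle = angles_deg[y][x] % 360.0
            b = int(angle / (360.0 / bins)) % bins
            # Each bin sums the gradient magnitudes in its angular range.
            descriptor[subfield * bins + b] += magnitudes[y][x]
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    # Normalize, then quantize each value to one byte (assumed scheme).
    return bytes(min(255, int(512 * v / norm)) for v in descriptor)
```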
The process of generating 1002 DoG images, localizing 1004 pixel extrema across the DoG images, assigning 1006 an orientation to each of the localized extrema, and generating 1008 a feature descriptor for each of the localized extrema may then be repeated for each of the two or more images received from the one or more cameras trained on the shopping cart passing through a checkout lane.
Illustrated in
With the features common to a model identified, the image processor determines 504 the geometric consistency between the combinations of matching features. In the preferred embodiment, a combination of features (referred to as a “feature pattern”) is aligned using an affine transformation, which maps 1108 the coordinates of features of one image to the coordinates of the corresponding features in the model. If the feature patterns are associated with the same underlying object, the feature descriptors characterizing the object will geometrically align with small differences in the respective feature coordinates.
The degree to which a model matches (or fails to match) can be quantified in terms of a “residual error” computed 506 for each affine transform comparison. A small error signifies a close alignment between the feature patterns, which may be due to the fact that the same underlying object is being depicted in the two images. In contrast, a large error generally indicates that the feature patterns do not align, even though common feature descriptors match individually by coincidence. The one or more models with the smallest residual error are returned as the best match 1110.
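The affine alignment and residual-error ranking can be illustrated with a least-squares fit in plain Python. This sketch assumes each candidate model supplies the coordinates of its features that were matched to the image features, in the same order, with at least three non-collinear matches. All function names are hypothetical, and the root-mean-square distance is one possible definition of the residual error.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_affine(image_pts, model_pts):
    """Least-squares affine map from image to model coordinates.

    Returns [[a, b, c], [d, e, f]] with x' = a*x + b*y + c, y' = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in image_pts]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
              for i in range(3)]
    params = []
    for axis in (0, 1):
        rhs = [sum(r[i] * m[axis] for r, m in zip(rows, model_pts))
               for i in range(3)]
        params.append(solve(normal, rhs))
    return params


def residual_error(image_pts, model_pts, params):
    """Root-mean-square distance between mapped and model coordinates."""
    total = 0.0
    for (x, y), (mx, my) in zip(image_pts, model_pts):
        px = params[0][0] * x + params[0][1] * y + params[0][2]
        py = params[1][0] * x + params[1][1] * y + params[1][2]
        total += (px - mx) ** 2 + (py - my) ** 2
    return math.sqrt(total / len(image_pts))


def best_model(image_pts, candidates):
    """Return the candidate name with the smallest residual, plus all errors."""
    errors = {name: residual_error(image_pts, pts, fit_affine(image_pts, pts))
              for name, pts in candidates.items()}
    return min(errors, key=errors.get), errors
```

A model whose matched features are a scaled and translated copy of the image features yields a residual near zero, while coincidental descriptor matches that do not share a common transform leave a large residual.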
The SIFT methodology described above has also been extensively taught in U.S. Pat. No. 6,711,293 issued Mar. 23, 2004, which is hereby incorporated by reference herein. The correlation methodology described above is also taught in U.S. patent application Ser. No. 11/849,503, filed Sep. 4, 2007, which is hereby incorporated by reference herein.
Another embodiment is directed to a system that implements a scale-invariant and rotation-invariant technique referred to as Speeded Up Robust Features (SURF). The SURF technique uses a Hessian matrix composed of box filters that operate on points of the image to determine the location of features as well as the scale of the image data at which the feature is an extremum in scale space. The box filters approximate Gaussian second order derivative filters. An orientation is assigned to the feature based on Gaussian-weighted, Haar-wavelet responses in the horizontal and vertical directions. A square aligned with the assigned orientation is centered about the point for purposes of generating a feature descriptor. Multiple Haar-wavelet responses are generated at multiple points for orthogonal directions in each of 4×4 sub-regions that make up the square. The sum of the wavelet response in each direction, together with the polarity and intensity information derived from the absolute values of the wavelet responses, yields a four-dimensional vector for each sub-region and a 64-length feature descriptor. SURF is taught in: Herbert Bay, Tinne Tuytelaars, Luc Van Gool, “SURF: Speeded Up Robust Features”, Proceedings of the ninth European Conference on Computer Vision, May 2006, which is hereby incorporated by reference herein.
One skilled in the art will appreciate that there are other feature detectors and feature descriptors that may be employed in combination with the embodiments described herein. Exemplary feature detectors include: the Harris detector, which finds corner-like features at a fixed scale; the Harris-Laplace detector, which uses a scale-adapted Harris function to localize points in scale-space and then selects the points for which the Laplacian-of-Gaussian attains a maximum over scale; the Hessian-Laplace detector, which localizes points in space at the local maxima of the Hessian determinant and in scale at the local maxima of the Laplacian-of-Gaussian; the Harris/Hessian Affine detector, which performs an affine adaptation of the Harris/Hessian-Laplace detector using the second moment matrix; the Maximally Stable Extremal Regions (MSER) detector, which finds regions such that all pixels inside the region have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all pixels on its outer boundary; the salient region detector proposed by Kadir and Brady, which maximizes the entropy within the region; the edge-based region detector proposed by Jurie et al.; and various affine-invariant feature detectors known to those skilled in the art.
Exemplary feature descriptors include: Shape Contexts which computes the distance and orientation histogram of other points relative to the interest point; Image Moments which generate descriptors by taking various higher order image moments; Jet Descriptors which generate higher order derivatives at the interest point; Gradient location and orientation histogram which uses a histogram of location and orientation of points in a window around the interest point; Gaussian derivatives; moment invariants; complex features; steerable filters; and phase-based local features known to those skilled in the art.
One or more embodiments may be implemented with one or more computer readable media, wherein each medium may be configured to include thereon data or computer executable instructions for manipulating data. The computer executable instructions include data structures, objects, programs, routines, or other program modules that may be accessed by a processing system, such as one associated with a general-purpose computer or processor capable of performing various different functions or one associated with a special-purpose computer capable of performing a limited number of functions. Computer executable instructions cause the processing system to perform a particular function or group of functions and are examples of program code means for implementing steps for methods disclosed herein. Furthermore, a particular sequence of the executable instructions provides an example of corresponding acts that may be used to implement such steps. Examples of computer readable media include random-access memory (“RAM”), read-only memory (“ROM”), programmable read-only memory (“PROM”), erasable programmable read-only memory (“EPROM”), electrically erasable programmable read-only memory (“EEPROM”), compact disk read-only memory (“CD-ROM”), or any other device or component that is capable of providing data or executable instructions that may be accessed by a processing system. Examples of mass storage devices incorporating computer readable media include hard disk drives, magnetic disk drives, tape drives, optical disk drives, and solid state memory chips, for example. The term processor as used herein refers to a number of processing devices including general purpose computers, special purpose computers, application-specific integrated circuit (ASIC), and digital/analog circuits with discrete components, for example.
Although the description above contains many specifications, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments.
Therefore, the invention has been disclosed by way of example and not limitation, and reference should be made to the following claims to determine the scope of the present invention.
This application is a continuation of U.S. application Ser. No. 12/229,069 filed Aug. 18, 2008, U.S. Pat. No. 7,909,248 that issued Mar. 22, 2011, which claims the benefit of U.S. Provisional Patent Application Ser. No. 60/965,086 filed Aug. 17, 2007, entitled “SELF CHECKOUT WITH VISUAL VERIFICATION,” each of which is hereby incorporated by reference herein for all purposes.
Number | Date | Country |
---|---|---|
60965086 | Aug 2007 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12229069 | Aug 2008 | US |
Child | 13052965 | | US |