The present disclosure relates generally to security systems, and more particularly, to unsupervised enrollment for an anti-theft facial recognition system.
In many retail stores, there is a need to identify persons of interest (POI) in the context of theft incidents. Conventional facial recognition systems require multiple images and detailed supervision.
Thus, improvements in facial recognition systems are desired.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the DETAILED DESCRIPTION. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Thus, in an implementation, the security system 120 may include a memory storing instructions; and a processor in communication with the memory and configured to: receive a video having a plurality of frames of a security event; detect a human face in the plurality of frames, wherein the human face corresponds to a person of interest; generate a track of the human face of the person of interest, wherein the track includes a series of one or more face images of the human face from respective ones of the plurality of frames; generate a unique set of facial recognition biometric features for the human face of the person of interest based on analyzing the one or more face images in the track, extracting one or more potential facial recognition biometric features, and removing outlier ones of the one or more potential facial recognition biometric features to define the unique set of facial recognition biometric features; store the unique set of facial recognition biometric features for the human face in association with an identification of the person of interest in a facial recognition database having a plurality of unique sets of facial recognition biometric features corresponding to a plurality of known human faces of a plurality of persons of interest; receive a new video including a new face; determine whether the new face detected in the new video includes a new set of facial recognition biometric features that is a match with one of the plurality of unique sets of facial recognition biometric features corresponding to the plurality of known human faces of the plurality of persons of interest; and generate an alert of an identified person of interest in response to determining the match.
In another implementation, the present disclosure may include a computer-readable medium storing instructions executable by a processor to perform one or more of the actions described herein.
Further aspects of the present disclosure are described in more detail below.
The disclosed aspects will hereinafter be described in conjunction with the appended drawings, provided to illustrate and not to limit the disclosed aspects, wherein like designations denote like elements, and in which:
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well known components may be shown in block diagram form in order to avoid obscuring such concepts.
In many retail stores, there is a need to identify persons of interest (POI) in the context of theft incidents. Current facial recognition systems used for repeat offender detection require a trained human in order to enroll a person into the system. Single or multiple images of the person must be selected and inserted into the system by trained personnel. The images are then embedded into a deep feature space and stored in a face recognition database, which is used for facial recognition of POIs.
The present disclosure addresses one or more shortcomings of current systems by providing a system for unsupervised enrollment of a POI into a facial recognition system so that the system can detect the POI on their next visit to the store. This type of system can extract faces from video frames and decide whether to store their features in a database without any human in the loop. In an implementation, the decision is based on each frame's features extracted by a facial recognition model, such as but not limited to a deep neural network (DNN), and thus no human intervention is needed. Thus, the present disclosure can be used without a trained professional supervising the system. Furthermore, the present disclosure can reduce the cost of enrollment into such a system.
In an aspect, the disclosed solution includes multiple parts in order to automatically process the theft video. The system includes a facial recognition model, such as a face detection DNN, which locates the face in each frame with high accuracy. Single or multiple detections are then tracked by a tracker, which finds correspondence between video frames and creates a track for each POI. A track is a series of face images belonging to an individual. Each track is then analyzed using a feature extractor DNN, which finds unique biometric features for each person. In an implementation, the extracted deep features are clustered per track using a clustering algorithm in order to remove outliers and improve the database quality. The chosen features are then stored in a facial recognition database and used for future POI recognition.
This solution is completely unsupervised, so it overcomes the need for a human in the loop to choose the optimal images from a video clip. This has a clear advantage over prior solutions in both cost effectiveness and speed. Cost effectiveness follows from the reduction in operational cost when an operator is no longer needed. Speed is also a notable factor, as it enables the system to search for the POI within seconds of the incident. In contrast, traditional solutions might take hours or even days until the images are inserted into the system, thereby leaving open the possibility of a security breach during this period. Thus, the present solution may maintain maximum security for the store.
Therefore, by removing the requirement for a human in the loop, the present solution may provide a leap in cost effectiveness and/or speed, which enables stores to use such a system easily without the need to hire additional staff for this purpose.
Turning now to the figures, example aspects are depicted with reference to one or more components described herein, where components in dashed lines may be optional.
Referring to
For example, as individuals enter and exit the establishment 102, they pass through one or more pedestal scanners 108a and 108b. Goods such as item 112 that include an electronic tag 114 (e.g., a radio frequency identifier (RFID) tag, an acousto-magnetic tag, or any other type of electronic article surveillance device) may be scanned by the pedestal scanners 108a and 108b to determine whether the item 112 was paid for or not. For example, when the item 112 is paid for, the tag 114 may be removed or deactivated so that it will not be detected by the scanners 108a and 108b.
During this time, both the outside-facing camera 104 and inside-facing cameras 106 may be recording activity from each end. In some cases, the outside-facing camera 104 and the inside-facing camera 106 may be mounted on the one or more pedestal scanners 108a and 108b.
In some instances, the pedestal scanners 108a and 108b may detect that the item 112 having the electronic tag 114 is located near the scanners, and hence may be unpaid for and is being carried out of the store 102 by the individual 110. As such, the pedestal scanners 108a and 108b and/or the security system 120 may generate a security event signal 115, which may activate one or more notification devices 109, such as an audio alarm device, a strobe or flashing light device, and/or a notification message sent to security or store personnel. Concurrently, as the individual 110 is just prior to exiting the establishment 102, or as they are exiting, or after they have exited, inside-facing camera 106 and/or outside-facing camera 104 may have recorded video or photographic image frames of the individual 110.
Accordingly, in response to the security event signal 115, a POI clip obtainer module 121 in the security system 120 may obtain one or more image frame(s) 117 of the individual 110. For example, the one or more image frames 117 may be captured from a timeframe that spans before and after the security event signal 115, e.g., within a certain threshold time. The POI clip obtainer module 121 may provide the one or more image frames 117 to a facial recognition model 122.
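As a minimal sketch of how image frames might be gathered from a timeframe spanning the security event signal 115, the following Python example selects frames whose timestamps fall within a window around the event time. The `Frame` type, the `frames_around_event` function, and the default window lengths are hypothetical names and values chosen for illustration only; the disclosure does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical frame record: capture time in seconds and source camera."""
    timestamp: float
    camera_id: str


def frames_around_event(
    frames: List[Frame],
    event_time: float,
    before_s: float = 5.0,  # assumed threshold before the event
    after_s: float = 5.0,   # assumed threshold after the event
) -> List[Frame]:
    """Return the frames captured within [event_time - before_s, event_time + after_s]."""
    start = event_time - before_s
    end = event_time + after_s
    return [f for f in frames if start <= f.timestamp <= end]


# Usage: one frame per second for 20 seconds, event at t = 10 s.
recorded = [Frame(float(t), "inside-106") for t in range(21)]
clip = frames_around_event(recorded, event_time=10.0, before_s=3.0, after_s=2.0)
```

In this sketch the window is closed on both ends, so a frame captured exactly at the window boundary is included.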
The facial recognition model 122 is configured to identify a unique facial feature set 123 for the individual 110, and then determine whether the individual 110 is a confirmed POI 127 for future surveillance or can be classified as a potential POI 129 for future surveillance. In particular, the facial recognition model 122 may include a comparator 125 configured to compare the unique facial feature set 123 for the individual 110 with one or more confirmed POI facial feature sets 132 stored in a database 124, such as in a watchlist 130 of confirmed POIs. If the comparator 125 determines a match, then the comparator 125 may classify the unique facial feature set 123 of the individual 110 as a confirmed POI 127 and store the information and/or update some portion of the matching confirmed POI facial feature set 132 for the individual 110, e.g., if new information has been gathered. It should be understood that a match, as used herein, may not be an exact match but may refer to a probability of matching being above a matching threshold.
If there is no match to a confirmed POI facial feature set 132 in the watchlist 130, then the comparator 125 may compare the unique facial feature set 123 for the individual 110 with one or more potential POI facial feature sets 128 stored in a potential POI list 126 in the database 124. If there is a match, then the comparator 125 may determine whether a POI threshold 131 has been met in order to classify the unique facial feature set 123 of the individual 110 as a confirmed POI 127 and store the unique facial feature set 123 of the individual 110 as one of the confirmed POI facial feature sets 132. For example, the POI threshold 131 may be a threshold number of times a unique facial feature set 123 must be identified before being considered a confirmed POI 127. For instance, the POI threshold 131 may be set to 2 (or more) so that an accidental triggering of the security event signal 115 by the individual 110 does not cause the facial feature set of the individual to be placed in the watchlist 130. In other words, the POI threshold 131 may represent a minimum number of times that an individual 110 has been identified by the facial recognition model 122 in response to the one or more pedestals 108a and 108b detecting a potential theft of one or more items 112 having the tag 114. If the individual 110 has been identified by the facial recognition model 122 and is listed in the potential POI list 126 for the POI threshold 131 defined number of times, e.g., which could be tracked by a counter associated with the potential POI facial feature sets 128, then the unique facial feature set 123 of the individual 110 may be moved from the potential POI list 126 to the watchlist 130. In other words, movement to the watchlist 130 may represent an identified pattern of potential theft of items from the establishment 102 by the individual 110 corresponding to the confirmed POI facial feature set 132.
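To illustrate a match that is not exact but scores above a matching threshold, the following Python sketch compares a query feature vector against stored feature sets and returns the best candidate at or above the threshold. Cosine similarity is used here only as an assumed stand-in for the matching probability, and the names `cosine_similarity` and `best_match`, as well as the 0.8 threshold, are hypothetical and not taken from the disclosure.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed matching threshold
) -> Optional[Tuple[str, float]]:
    """Return (identifier, score) of the highest-scoring stored set at or above the threshold."""
    best: Optional[Tuple[str, float]] = None
    for identifier, features in stored.items():
        score = cosine_similarity(query, features)
        if score >= threshold and (best is None or score > best[1]):
            best = (identifier, score)
    return best


# Usage: two toy two-dimensional feature sets.
watchlist = {"poi-a": [1.0, 0.0], "poi-b": [0.0, 1.0]}
hit = best_match([0.9, 0.1], watchlist)   # close to poi-a
miss = best_match([1.0, 1.0], watchlist)  # equally far from both, below threshold
```

Returning `None` when no score reaches the threshold keeps the "no match" case explicit for the caller.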
Alternatively, if the comparator 125 determines that the POI threshold 131 has not been met, then the comparator may classify the unique facial feature set 123 of the individual 110 as a potential POI 129 and store the unique facial feature set 123 of the individual 110 as one of the potential POI facial feature sets 128.
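The enrollment flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `PoiRegistry` class, its `record_event` method, and the cosine-similarity matcher with a 0.8 threshold are hypothetical, and the per-entry counter follows the counter suggested in the description.

```python
import math
from typing import Callable, List, Sequence

Features = Sequence[float]


def similar(a: Features, b: Features, threshold: float = 0.8) -> bool:
    """Assumed matcher: cosine similarity at or above a matching threshold."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return norm > 0.0 and dot / norm >= threshold


class PoiRegistry:
    """Holds a watchlist of confirmed POIs and a counted list of potential POIs."""

    def __init__(self, poi_threshold: int = 2,
                 is_match: Callable[[Features, Features], bool] = similar) -> None:
        self.poi_threshold = poi_threshold
        self.is_match = is_match
        self.watchlist: List[Features] = []
        self.potential: List[list] = []  # each entry is [features, event_count]

    def record_event(self, features: Features) -> str:
        """Classify the feature set seen at a security event as 'confirmed' or 'potential'."""
        # 1. Already on the watchlist: stays confirmed.
        if any(self.is_match(features, known) for known in self.watchlist):
            return "confirmed"
        # 2. Known potential POI: bump the counter and promote once the threshold is met.
        for index, entry in enumerate(self.potential):
            if self.is_match(features, entry[0]):
                entry[1] += 1
                if entry[1] >= self.poi_threshold:
                    del self.potential[index]
                    self.watchlist.append(entry[0])
                    return "confirmed"
                return "potential"
        # 3. First sighting: enroll as potential (or confirm directly if the threshold is 1).
        if self.poi_threshold <= 1:
            self.watchlist.append(features)
            return "confirmed"
        self.potential.append([features, 1])
        return "potential"


# Usage: the same face triggers two events; a different face triggers one.
registry = PoiRegistry(poi_threshold=2)
first = registry.record_event([1.0, 0.0])     # "potential"
second = registry.record_event([0.99, 0.05])  # "confirmed" after the second event
other = registry.record_event([0.0, 1.0])     # "potential"
```

With a threshold of 2, a single accidental alarm leaves the individual only in the potential list, which mirrors the rationale given above.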
Thus, based upon the above procedure, the security system 120 has learned the facial features of potential thieves, and is set up to perform future surveillance and generate an alert 133 to notify store personnel whenever the individual 110 having the confirmed POI facial feature set 132 in the watchlist 130 enters the store 102.
For example, subsequently, after the individual 110 is added to the watchlist 130, the security system 120, and more particularly, the facial recognition model 122, may identify the individual 110 using one or more image frames 117 from the outside-facing camera 104 prior to or upon the individual 110 entering the store 102. More specifically, the facial recognition model 122 may identify the unique facial feature set 123 and the comparator 125 may determine correspondence with the confirmed POI 127 (e.g., already in the watchlist 130 or based on meeting the POI threshold 131). In response to this proactive identification, and before any security event signal 115 is generated, the comparator 125 of the security system 120 may generate the alert 133. For example, the alert 133 may be a message (e.g., text message, e-mail), an audio notification (e.g., a message broadcast by a speaker, a voice mail), a visual notification (e.g., a light that is turned on), and/or a haptic notification that is transmitted to a device used by store personnel (e.g., a point of sale terminal, a computer, a phone (wired/wireless/cellular), a device in visual or audio range of the store personnel, or any other type of device that may be able to communicate information about the presence of the confirmed POI 127 to the personnel or security members associated with the establishment 102).
In other words, the security system 120 provides a proactive approach, whereby the security system 120 may identify the individual 110 based on a learned pattern of potential theft of items 112 at the establishment 102 (or other establishments sharing watchlists 130, e.g., via wired or wireless communication over a network such as the Internet and/or a cellular network), and alert store or security personnel, e.g., via the alert 133, for appropriate action. In some implementations, the potential POI list 126 and/or the watchlist 130 may include any type of electronic data extracted from the image clip, such as but not limited to image data (e.g., one or more video clips, one or more photos) of the individual 110 captured using both the inside-facing camera 106 and the outside-facing camera 104. For example, the extracted electronic data may aid the store or security personnel in tracking the confirmed POI 127 and/or in identifying previous types of stolen goods.
Referring to
At block 202, for example, the facial recognition model 122 receives a video clip of a theft incident, which can be automatically generated by a store pedestal alarm equipped with two cameras. In some cases, the video origin may be a camera mounted on the pedestal facing the inner side of the store. When the pedestal alarm goes off, a video starting a few seconds prior to the incident is sent from the video recording system to the system. It should be understood that while this example may refer to video, the system may likewise receive one or more image frames, including one or more photographs.
At block 204, for example, in an implementation, raw video frames are passed through a face detection DNN. The DNN can find single or multiple human faces with very high accuracy in a single video frame. All video frames are processed using the proposed detection network which outputs bounding box coordinates around each face. In case no face was found in the frame, it may be discarded immediately. In other cases where one or more faces were found, the output serves as an input to the next component in the system, which is a tracking algorithm (or tracker) at block 206.
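The per-frame filtering at block 204 can be sketched as a small loop. This is only an illustration: the `detector` argument is a hypothetical callable standing in for the face detection DNN, and the function name `detect_faces_in_clip` and the `(x1, y1, x2, y2)` box layout are assumptions rather than details from the disclosure.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

FrameT = TypeVar("FrameT")
Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2) in pixels


def detect_faces_in_clip(
    frames: Sequence[FrameT],
    detector: Callable[[FrameT], List[Box]],
) -> List[Tuple[int, List[Box]]]:
    """Run the detector on every frame and keep (frame_index, boxes) only where faces were found."""
    detections = []
    for index, frame in enumerate(frames):
        boxes = detector(frame)
        if not boxes:
            continue  # no face in this frame: discard it immediately
        detections.append((index, boxes))
    return detections


# Usage with a stub detector that looks up canned results per frame label.
canned = {
    "empty": [],
    "one-face": [(10, 10, 50, 50)],
    "two-faces": [(10, 10, 50, 50), (80, 20, 120, 60)],
}
result = detect_faces_in_clip(["empty", "one-face", "two-faces"], canned.__getitem__)
```

Keeping the original frame index alongside the boxes lets a downstream tracker reason about gaps between frames.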
At block 206, the role of the tracker is to perform inter-frame correspondence between faces and thus create meaningful sequences through time that include a face of a single person. The output of the tracker is a correspondence between each bounding box and a track identifier (id). If more than one bounding box is present in a frame, it is the tracker's responsibility to decide which bounding box corresponds to each track id. The decision is usually based on minimal intersection over union (IOU) criteria. In most cases, different tracks could start and/or finish on different frames, thus the tracker makes a decision for each bounding box it receives. This decision is whether the current bounding box belongs to an already existing track or whether the tracker needs to open a new unique track id. Closing old tracks is also done by the tracker in the same manner (IOU criteria).
As such, at block 208 in this implementation, each track generates a series of bounding boxes, which may be received by a feature extraction component. The feature extraction component can crop the series of bounding boxes for each track from the video clip and analyze them separately. The cropped faces may then be processed using another DNN. This kind of DNN is an embedding function that converts a cropped face image into a high-dimensional vector often referred to as the embedding vector. This compact representation captures the unique face features and enables the facial recognition model 122 to distinguish between people. The embedding vectors will be used as a reference database, e.g., the watchlist 130, by the facial recognition model 122 operating with the outside-facing camera 104 overlooking the store entrance. This camera would be able to recognize the POI on their return to the store and alert the store staff.
At block 210, an outlier removal component 210 is added to increase the performance of such an unsupervised system. In traditional systems, a human would cherry-pick the best captures in the images of the POI so that the performance would be optimal. A scenario where random images are present in the facial recognition database could lead to poor performance and recognition of random people as POIs. To avoid such a scenario, the facial recognition model 122 uses a clustering algorithm, which has the ability to efficiently find cluster outliers. Removing outliers by storing only the core of the cluster in the database, at block 212, enables the system to overcome the limitations of detection and tracking mistakes, generally leading to a major increase in the overall performance.
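One way to picture the tracker's per-box decision is a greedy IOU association, sketched below in Python. This is an illustrative sketch and not necessarily the tracker used in the disclosure: it assumes that the IOU criteria amount to a minimum IOU threshold, and the `IouTracker` class, the `min_iou` and `max_missed` parameters, and the `(x1, y1, x2, y2)` box layout are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


class IouTracker:
    """Greedy tracker: match each box to the open track with the highest IOU."""

    def __init__(self, min_iou: float = 0.3, max_missed: int = 5) -> None:
        self.min_iou = min_iou        # assumed minimum overlap to continue a track
        self.max_missed = max_missed  # assumed frames a track may go unmatched before closing
        self._next_id = 0
        self._open: Dict[int, Tuple[Box, int]] = {}  # track id -> (last box, missed frames)

    def update(self, boxes: List[Box]) -> List[int]:
        """Return one track id per input box for the current frame."""
        assigned = [-1] * len(boxes)
        matched = set()
        pairs = sorted(
            ((iou(last, box), tid, i)
             for tid, (last, _) in self._open.items()
             for i, box in enumerate(boxes)),
            reverse=True,
        )
        for score, tid, i in pairs:
            if score < self.min_iou:
                break  # remaining pairs overlap too little to be the same face
            if tid in matched or assigned[i] != -1:
                continue
            assigned[i] = tid
            matched.add(tid)
        # Age unmatched tracks and close those missing for too long.
        for tid in list(self._open):
            if tid not in matched:
                last, missed = self._open[tid]
                if missed + 1 > self.max_missed:
                    del self._open[tid]
                else:
                    self._open[tid] = (last, missed + 1)
        # Open a new track for every unmatched box and refresh matched tracks.
        for i, box in enumerate(boxes):
            if assigned[i] == -1:
                assigned[i] = self._next_id
                self._next_id += 1
            self._open[assigned[i]] = (box, 0)
        return assigned


# Usage: one face, then a second face appears, then the detection order flips.
tracker = IouTracker()
ids_frame_1 = tracker.update([(0, 0, 10, 10)])
ids_frame_2 = tracker.update([(1, 1, 11, 11), (50, 50, 60, 60)])
ids_frame_3 = tracker.update([(51, 51, 61, 61), (2, 2, 12, 12)])
```

The greedy assignment is a simplification; an optimal assignment method could be substituted without changing the open, continue, and close decisions described above.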
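The idea of storing only the core of a cluster can be sketched with a simple density rule, loosely in the spirit of density-based clustering: an embedding is kept only if enough other embeddings from the same track lie close to it. The disclosure does not name a specific clustering algorithm, so this rule, the `core_embeddings` function, and the `eps` and `min_neighbors` values are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def euclidean(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def core_embeddings(
    track: List[Embedding],
    eps: float = 0.5,        # assumed neighborhood radius in embedding space
    min_neighbors: int = 2,  # assumed number of close neighbors required to be kept
) -> List[Embedding]:
    """Keep embeddings that have at least min_neighbors other embeddings within eps."""
    kept = []
    for i, candidate in enumerate(track):
        neighbors = sum(
            1
            for j, other in enumerate(track)
            if j != i and euclidean(candidate, other) <= eps
        )
        if neighbors >= min_neighbors:
            kept.append(candidate)
    return kept


# Usage: three consistent face embeddings and one stray detection in the same track.
track = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
core = core_embeddings(track)  # the stray [5.0, 5.0] embedding is dropped
```

The pairwise comparison is quadratic in the track length, which is acceptable for the short tracks produced by a clip of a few seconds.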
At block 214, the facial recognition model 122 implements a monitoring component to watch for POIs using the facial recognition functionality described above.
Referring to
The processor 302 may be a micro-controller, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), and/or may include a single or multiple set of processors or multi-core processors. Moreover, the processor 302 may be implemented as an integrated processing system and/or a distributed processing system. The computing device 300 may further include a memory 304, such as for storing local versions of applications being executed by the processor 302, related instructions, parameters, etc. The memory 304 may include a type of memory usable by a computer, such as random access memory (RAM), read only memory (ROM), tapes, magnetic discs, optical discs, volatile memory, non-volatile memory, and any combination thereof. Additionally, the processor 302 and the memory 304 may include and execute an operating system executing on the processor 302, one or more applications, display drivers, etc., and/or other components of the computing device 300.
Further, the computing device 300 may include a communications component 306 that provides for establishing and maintaining communications with one or more other devices, parties, entities, etc. utilizing hardware, software, and services. The communications component 306 may carry communications between components on the computing device 300, as well as between the computing device 300 and external devices, such as devices located across a communications network and/or devices serially or locally connected to the computing device 300. In an aspect, for example, the communications component 306 may include one or more buses, and may further include transmit chain components and receive chain components associated with a wireless or wired transmitter and receiver, respectively, operable for interfacing with external devices.
Additionally, the computing device 300 may include a data store 308, which can be any suitable combination of hardware and/or software, that provides for mass storage of information, databases, and programs. For example, the data store 308 may be or may include a data repository for applications and/or related parameters not currently being executed by processor 302. In addition, the data store 308 may be a data repository for an operating system, application, display driver, etc., executing on the processor 302, and/or one or more other components of the computing device 300.
The computing device 300 may also include a user interface component 310 operable to receive inputs from a user of the computing device 300 and further operable to generate outputs for presentation to the user (e.g., via a display interface to a display device). The user interface component 310 may include one or more input devices, including but not limited to a keyboard, a number pad, a mouse, a touch-sensitive display, a navigation key, a function key, a microphone, a voice recognition component, or any other mechanism capable of receiving an input from a user, or any combination thereof. Further, the user interface component 310 may include one or more output devices, including but not limited to a display interface, a speaker, a haptic feedback mechanism, a printer, any other mechanism capable of presenting an output to a user, or any combination thereof.
Referring to
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term “some” refers to one or more. Combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
The words “module,” “mechanism,” “element,” “device,” and the like may not be a substitute for the word “means.” As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase “means for.”
This application claims the benefit of U.S. Provisional Application Ser. No. 62/857,097, entitled “UNSUPERVISED ENROLLMENT FOR ANTI-THEFT FACIAL RECOGNITION SYSTEM” and filed on Jun. 4, 2019, which is expressly incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/036075 | 6/4/2020 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62857097 | Jun 2019 | US |