The present disclosure generally relates to security measures in devices and more particularly, to systems and methods for anti-spoofing using feature point analysis across multiple image frames.
Given the extensive use of smartphones and other computing devices in daily activities, such devices typically contain sensitive data and allow users to access mobile payment applications and other services. As such, there is an ongoing need for incorporating improved security measures to prevent unauthorized access to such devices.
In accordance with one embodiment, a computing device captures a live video of a user. For a first frame of the live video, the computing device identifies a first facial region of the user, determines a first plurality of regions of interest within the first facial region, and identifies a plurality of feature points for each of the first plurality of regions of interest. For a second frame of the live video, the computing device identifies a second facial region of the user, determines a second plurality of regions of interest within the second facial region, and identifies a plurality of feature points for each of the second plurality of regions of interest, wherein locations of the feature points in the second plurality of regions of interest in the second frame coincide with locations of the feature points in the first plurality of regions of interest in the first frame. The computing device generates a perspective transform matrix based on locations of the feature points in the second plurality of regions of interest in the second frame and locations of the feature points in the first plurality of regions of interest in the first frame to generate transformed coordinates of a plurality of first background feature points. The computing device determines a difference value between coordinates of a plurality of second background feature points and the transformed coordinates of the plurality of first background feature points. The computing device determines whether the user is spoofing the computing device to unlock the computing device based on the difference value.
Another embodiment is a system that comprises a memory storing instructions and a processor coupled to the memory. The processor is configured by the instructions to capture a live video of a user. For a first frame of the live video, the processor is configured to identify a first facial region of the user, determine a first plurality of regions of interest within the first facial region, and identify a plurality of feature points for each of the first plurality of regions of interest. For a second frame of the live video, the processor is configured to identify a second facial region of the user, determine a second plurality of regions of interest within the second facial region, and identify a plurality of feature points for each of the second plurality of regions of interest, wherein locations of the feature points in the second plurality of regions of interest in the second frame coincide with locations of the feature points in the first plurality of regions of interest in the first frame. The processor is configured to generate a perspective transform matrix based on locations of the feature points in the second plurality of regions of interest in the second frame and locations of the feature points in the first plurality of regions of interest in the first frame to generate transformed coordinates of a plurality of first background feature points. The processor is configured to determine a difference value between coordinates of a plurality of second background feature points and the transformed coordinates of the plurality of first background feature points. The processor is configured to determine whether the user is spoofing the system to unlock the system based on the difference value.
Another embodiment is a non-transitory computer-readable storage medium storing instructions to be implemented by a computing device having a processor, wherein the instructions, when executed by the processor, cause the computing device to capture a live video of a user. For a first frame of the live video, the processor is configured to identify a first facial region of the user, determine a first plurality of regions of interest within the first facial region, and identify a plurality of feature points for each of the first plurality of regions of interest. For a second frame of the live video, the processor is configured to identify a second facial region of the user, determine a second plurality of regions of interest within the second facial region, and identify a plurality of feature points for each of the second plurality of regions of interest, wherein locations of the feature points in the second plurality of regions of interest in the second frame coincide with locations of the feature points in the first plurality of regions of interest in the first frame. The processor is configured to generate a perspective transform matrix based on locations of the feature points in the second plurality of regions of interest in the second frame and locations of the feature points in the first plurality of regions of interest in the first frame to generate transformed coordinates of a plurality of first background feature points. The processor is configured to determine a difference value between coordinates of a plurality of second background feature points and the transformed coordinates of the plurality of first background feature points. The processor is configured to determine whether the user is spoofing the computing device to unlock the computing device based on the difference value.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
Various aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, with emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
A system for implementing anti-spoofing protection using multi-frame feature point analysis is described below, followed by a discussion of the operation of the components within the system. An improved anti-spoofing technique implemented in a computing device is disclosed for preventing unauthorized access to personal devices that allow users to unlock the devices using an image of the user's facial region. Some computing devices are vulnerable to spoofing attempts by unauthorized users using images or videos of the owners of the devices. The anti-spoofing technique disclosed herein provides an improvement over existing facial recognition technologies.
The security service 104 detects attempts to spoof the computing device 102 using a distance metric involving feature points within the facial region. The feature point detector 106 is configured to obtain a live video 118 of the user using, for example, a front-facing camera on the computing device 102 and store the video 118 in a data store 116. The video 118 stored in the data store 116 may be encoded in formats including, but not limited to, Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4, H.264, Third Generation Partnership Project (3GPP), 3GPP-2, Standard-Definition Video (SD-Video), High-Definition Video (HD-Video), Digital Versatile Disc (DVD) multimedia, Video Compact Disc (VCD) multimedia, High-Definition Digital Versatile Disc (HD-DVD) multimedia, Digital Television Video/High-definition Digital Television (DTV/HDTV) multimedia, Audio Video Interleave (AVI), Digital Video (DV), QuickTime (QT), Windows Media Video (WMV), Advanced Systems Format (ASF), Real Media (RM), Flash Media (FLV), MPEG Audio Layer III (MP3), MPEG Audio Layer II (MP2), Waveform Audio Format (WAV), Windows Media Audio (WMA), 360-degree video, 3D scan model, or any number of other digital formats.
For a first frame of the captured video 118, the feature point detector 106 identifies a first facial region of the user and determines a first plurality of regions of interest within the first facial region. The feature point detector 106 is further configured to identify a plurality of feature points for each of the first plurality of regions of interest. This may be achieved using a scale-invariant feature transform (SIFT) algorithm, a speeded up robust features (SURF) algorithm, or another feature point detection algorithm.
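By way of illustration, the following is a minimal sketch of this per-frame detection step in Python, assuming OpenCV's SIFT implementation and a Haar cascade face detector stand in for the feature point detector 106. The split of the facial region into upper and lower sub-regions is a hypothetical choice of regions of interest, not one mandated by this disclosure.

```python
import cv2
import numpy as np

# Assumed stand-ins for the feature point detector 106.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
sift = cv2.SIFT_create()

def detect_roi_features(frame):
    """Return SIFT keypoints and descriptors for regions of interest in the face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                      # first detected facial region
    # Illustrative regions of interest: upper (eyes) and lower (mouth) halves.
    rois = [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
    keypoints, descriptors = [], []
    for rx, ry, rw, rh in rois:
        mask = np.zeros_like(gray)
        mask[ry:ry + rh, rx:rx + rw] = 255     # restrict detection to this ROI
        kp, des = sift.detectAndCompute(gray, mask)
        if des is not None:
            keypoints.extend(kp)
            descriptors.append(des)
    if not descriptors:
        return None
    return keypoints, np.vstack(descriptors)
```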
For a second frame of the captured video 118, the feature point detector 106 similarly identifies a second facial region of the user and determines a second plurality of regions of interest within the second facial region. The feature point detector 106 is further configured to identify a plurality of feature points for each of the second plurality of regions of interest, where the locations of the feature points in the second plurality of regions of interest in the second frame coincide with the locations of the feature points in the first plurality of regions of interest in the first frame. The feature points may similarly be identified using a scale-invariant feature transform (SIFT) algorithm, a speeded up robust features (SURF) algorithm, or another feature point detection algorithm. For some embodiments, the feature point detector 106 identifies the second facial region of the user and determines the second plurality of regions of interest based on the first plurality of regions of interest within the first facial region, where the first plurality of regions of interest corresponds to the second plurality of regions of interest.
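Continuing the sketch above, coinciding feature point locations across the two frames may be obtained by descriptor matching. The brute-force matcher and Lowe's ratio test used here are assumptions; the disclosure only requires that matched locations correspond between the first and second frames.

```python
def match_feature_points(kp1, des1, kp2, des2, ratio=0.75):
    """Return matched (x, y) coordinates so that points_1[i] coincides with points_2[i]."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    points_1, points_2 = [], []
    for pair in matches:
        # Keep only unambiguous matches (Lowe's ratio test).
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            points_1.append(kp1[pair[0].queryIdx].pt)
            points_2.append(kp2[pair[0].trainIdx].pt)
    return np.float32(points_1), np.float32(points_2)
```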
The conversion module 108 is configured to generate a perspective transform matrix based on the locations of the feature points in the second plurality of regions of interest in the second frame and the locations of the feature points in the first plurality of regions of interest in the first frame to generate transformed coordinates of a plurality of first background feature points.
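One plausible realization of the conversion module 108, assuming OpenCV's RANSAC-based homography estimator (cv2.findHomography) as the means of computing the perspective transform matrix and cv2.perspectiveTransform for applying it to the first background feature points:

```python
def transform_background_points(pin_1, pin_2, pout_1):
    """Map first-frame background points through the facial-region transform."""
    # Estimate the perspective transform matrix M from facial feature pairs;
    # RANSAC and the 5.0-pixel reprojection threshold are assumptions.
    M, _ = cv2.findHomography(pin_1, pin_2, cv2.RANSAC, 5.0)
    if M is None:
        return None
    # cv2.perspectiveTransform expects points with shape (N, 1, 2).
    pout_1 = pout_1.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(pout_1, M).reshape(-1, 2)
```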
The distance calculator 110 determines a difference value between the coordinates of a plurality of second background feature points and the transformed coordinates of the plurality of first background feature points. Based on this difference value, the spoofing detector 112 determines whether the user is spoofing the computing device 102 in an attempt to unlock the computing device 102. For some embodiments, the spoofing detector 112 determines that the user is spoofing the computing device 102 when the difference value is less than a threshold value.
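A minimal sketch of the distance calculator 110 and the threshold test, assuming the difference value is the mean Euclidean distance between corresponding points; the specific metric and the 3.0-pixel threshold are illustrative assumptions only:

```python
def is_spoofing(pout_2, pout_1_transformed, threshold=3.0):
    """Return True when the background follows the facial-region transform."""
    # Mean Euclidean distance between actual and predicted background points.
    diff = np.linalg.norm(pout_2 - pout_1_transformed, axis=1).mean()
    return diff < threshold
```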
For some embodiments, the security service 104 may also detect attempts to spoof the computing device 102 using a distance metric involving feature points located outside the facial region. For the first frame of the live video, the feature point detector 106 determines a plurality of first background feature points outside the facial region. Similarly, for the second frame of the live video, the feature point detector 106 determines a plurality of second background feature points outside the facial region, where the locations of the first background feature points coincide with the locations of the second background feature points. The feature point detector 106 then generates the transformed coordinates of the plurality of first background feature points based on the perspective transform matrix and the plurality of first background feature points.
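Background feature points outside the facial region might be selected as follows, assuming keypoints are simply filtered with a mask that excludes the detected face bounding box; the disclosure requires only that these points lie outside the facial region:

```python
def background_features(frame, face_box):
    """Detect SIFT features and keep only those outside the facial region."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = face_box
    mask = np.full(gray.shape, 255, dtype=np.uint8)
    mask[y:y + h, x:x + w] = 0                 # exclude the facial region
    kp, des = sift.detectAndCompute(gray, mask)
    return kp, des
```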
The processing device 202 may include a custom-made processor, a central processing unit (CPU) or an auxiliary processor among several processors associated with the computing device 102, a semiconductor-based microprocessor (in the form of a microchip), a macroprocessor, one or more application-specific integrated circuits (ASICs), a plurality of suitably configured digital logic gates, and so forth.
The memory 214 may include one or a combination of volatile memory elements (e.g., random-access memory (RAM), such as DRAM and SRAM) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM). The memory 214 typically comprises a native operating system 216, one or more native applications, emulation systems, or emulated applications for any of a variety of operating systems and/or emulated hardware platforms, emulated operating systems, and so on. For example, the applications may include application-specific software which may comprise some or all of the components of the computing device 102 displayed in FIG. 1.
In accordance with such embodiments, the components are stored in memory 214 and executed by the processing device 202, thereby causing the processing device 202 to perform the operations/functions disclosed herein. For some embodiments, the components in the computing device 102 may be implemented by hardware and/or software.
Input/output interfaces 204 provide interfaces for the input and output of data. For example, where the computing device 102 comprises a personal computer, these components may interface with one or more user input/output interfaces 204, which may comprise a keyboard or a mouse, as shown in FIG. 2.
In the context of this disclosure, a non-transitory computer-readable medium stores programs for use by or in connection with an instruction execution system, apparatus, or device. More specific examples of a computer-readable medium may include by way of example and without limitation: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), and a portable compact disc read-only memory (CDROM) (optical).
Reference is made to FIG. 3, which is a flowchart 300 of operations performed by the computing device 102 for providing anti-spoofing protection using multi-frame feature point analysis.
Although the flowchart 300 of FIG. 3 shows a specific order of execution, it is understood that the order of execution may differ from that which is depicted.
At block 302, the computing device 102 captures a live video of a user. For a first frame of the live video, the computing device 102 identifies a first facial region of the user (block 304), determines a first plurality of regions of interest within the first facial region (block 306), and identifies a plurality of feature points for each of the first plurality of regions of interest (block 308). During the first frame, the computing device 102 may also determine a plurality of first background feature points outside the first facial region.
For a second frame of the live video, the computing device 102 identifies a second facial region of the user (block 310), determines a second plurality of regions of interest within the second facial region (block 312), and identifies a plurality of feature points for each of the second plurality of regions of interest (block 314). The first facial region of the user corresponds to the second facial region of the user. The locations of the feature points in the second plurality of regions of interest in the second frame coincide with locations of the feature points in the first plurality of regions of interest in the first frame. The computing device 102 detects the feature points in the first plurality of regions of interest and the feature points in the second plurality of regions of interest using a scale-invariant feature transform (SIFT) algorithm, a speeded up robust features (SURF) algorithm, or another feature point detection algorithm. During the second frame, the computing device 102 may also determine a plurality of second background feature points outside the second facial region, where the locations of the first background feature points coincide with the locations of the second background feature points.
At block 316, the computing device 102 generates a perspective transform matrix based on locations of the feature points in the second plurality of regions of interest in the second frame and locations of the feature points in the first plurality of regions of interest in the first frame to generate transformed coordinates of a plurality of first background feature points. For some embodiments, the computing device 102 generates the transformed coordinates of the plurality of first background feature points based on the perspective transform matrix and the plurality of first background feature points.
At block 318, the computing device 102 determines a difference value between coordinates of a plurality of second background feature points and the transformed coordinates of the plurality of first background feature points. At block 320, the computing device 102 determines whether the user is spoofing the computing device 102 based on the difference value. For some embodiments, the computing device 102 determines that the user is spoofing the computing device 102 when the difference value is less than a threshold value. Thereafter, the process in FIG. 3 ends.
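Tying the blocks of the flowchart 300 together, the following end-to-end sketch combines the helper functions sketched earlier. Reusing the first frame's face bounding box for the second frame's background mask, the particular frame pairing, and the threshold are simplifying assumptions.

```python
def check_for_spoofing(frame_1, frame_n, threshold=3.0):
    """Implements blocks 302-320 for a pair of frames; None means no decision."""
    gray_1 = cv2.cvtColor(frame_1, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray_1, 1.1, 5)
    if len(faces) == 0:
        return None
    face_box = faces[0]

    # Facial-region feature points in both frames (blocks 304-314).
    feats_1 = detect_roi_features(frame_1)
    feats_n = detect_roi_features(frame_n)
    if feats_1 is None or feats_n is None:
        return None
    pin_1, pin_2 = match_feature_points(*feats_1, *feats_n)
    if len(pin_1) < 4:
        return None                            # need >= 4 pairs for the matrix

    # Background feature points outside the facial region.
    bg_kp1, bg_des1 = background_features(frame_1, face_box)
    bg_kp2, bg_des2 = background_features(frame_n, face_box)
    if bg_des1 is None or bg_des2 is None:
        return None
    pout_1, pout_2 = match_feature_points(bg_kp1, bg_des1, bg_kp2, bg_des2)
    if len(pout_1) == 0:
        return None

    # Blocks 316-320: transform, difference value, threshold decision.
    pout_1_t = transform_background_points(pin_1, pin_2, pout_1)
    if pout_1_t is None:
        return None
    return is_spoofing(pout_2, pout_1_t, threshold)
```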
To further illustrate various aspects of the present disclosure, reference is made to the following figures.
The computing device 102 generates a perspective transform matrix M based on the locations of the feature points (Pin2) in the second plurality of regions of interest in the second frame and locations of the feature points (Pin1) in the first plurality of regions of interest in the first frame. At least four feature points are utilized for determining the perspective transform matrix M.
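The requirement of at least four feature points follows from the structure of the perspective transform. In homogeneous coordinates (the entries of M are symbolic):

```latex
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix}
=
\begin{bmatrix}
m_{11} & m_{12} & m_{13} \\
m_{21} & m_{22} & m_{23} \\
m_{31} & m_{32} & 1
\end{bmatrix}
\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix},
\qquad
(x_2, y_2) = \left( \frac{x'}{w'}, \frac{y'}{w'} \right)
```

With the lower-right entry fixed to 1, the matrix M has eight unknowns, and each matched pair of points contributes two equations, so a minimum of four feature point pairs determines M.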
The computing device 102 then determines a plurality of second background feature points (Pout2) outside the facial region, where the locations of the first background feature points (Pout1) coincide with the locations of the second background feature points (Pout2). The computing device 102 performs the perspective transform M on the plurality of first background feature points (Pout1) to generate transformed coordinates of the plurality of first background feature points (Pout1′), where the perspective transform M is generated based on the locations of the feature points (Pin1) in the first plurality of regions of interest in the first frame and the locations of the feature points (Pin2) in the second plurality of regions of interest in the second frame. The computing device 102 calculates a difference value D(diff) between the transformed coordinates of the plurality of first background feature points (Pout1′) and the coordinates of the second background feature points (Pout2) in the second frame.
For some embodiments, the spoofing detector 112 (FIG. 1) compares the difference value D(diff) to a threshold value to determine whether the user is spoofing the computing device 102.
The original coordinates of the background feature points 602 in the first frame (Frame #1) are represented by Pout1. The transformed coordinates of Pout1 are represented by Pout1′, where the transformed coordinates are calculated as M × Pout1 = Pout1′ and M represents the perspective transform matrix. The computing device 102 determines a difference value D(diff) by calculating the difference between the coordinates of the second background feature points 604 in the second frame (Frame #N) and the transformed coordinates Pout1′ of the first background feature points 602. Note that the first frame and the second frame may comprise adjacent frames or non-adjacent frames.
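A worked numeric sketch of this calculation in Python; the matrix entries and point coordinates below are invented solely for illustration:

```python
import numpy as np

M = np.array([[1.02, 0.01, 3.0],        # hypothetical perspective transform
              [-0.01, 1.01, -2.0],
              [1e-5, 2e-5, 1.0]])

pout_1 = np.array([[50.0, 40.0], [400.0, 60.0]])   # background points, Frame #1
pout_2 = np.array([[54.1, 38.5], [411.0, 57.2]])   # same points, Frame #N

# Apply M in homogeneous coordinates, then divide by the third component.
homog = np.hstack([pout_1, np.ones((len(pout_1), 1))])
proj = homog @ M.T
pout_1_t = proj[:, :2] / proj[:, 2:3]              # transformed coordinates Pout1'

# D(diff): mean distance between actual and transformed background points.
d_diff = np.linalg.norm(pout_2 - pout_1_t, axis=1).mean()
print(d_diff)   # a small value means the background moved with the face
```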
The spoofing detector 112 (FIG. 1) determines that the user is spoofing the computing device 102 when the difference value D(diff) is less than a threshold value. When the user presents a flat reproduction of the facial region, such as a printed photograph or a display screen, the background feature points lie in the same plane as the facial region and therefore follow the same perspective transform, resulting in a small difference value. When a live user is present, the background lies at a different depth than the facial region, so the transformed first background feature points deviate from the actual locations of the second background feature points, resulting in a larger difference value.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
This application claims priority to, and the benefit of, U.S. Provisional Patent Application entitled, “Method of Anti-Spoofing with Feature Pairs,” having Ser. No. 62/984,485, filed on Mar. 3, 2020, which is incorporated by reference in its entirety.