This application claims the benefit under 35 U.S.C. § 119(a) of the filing date of Chinese Patent Application No. 202211684312.X, filed in the Chinese Patent Office on Dec. 27, 2022. The disclosure of the foregoing application is herein incorporated by reference in its entirety.
Embodiments of the present disclosure relate to the field of computer technologies, and more particularly, to a face anti-spoofing method, device, and computer-readable storage medium.
With the wide application of face recognition, face unlocking, face payment, and other technologies in finance, access control, mobile devices, and other areas of daily life, face anti-spoofing (liveness detection) technology has gained increasing attention in recent years.
The face anti-spoofing technology is mainly used to determine whether the face appearing in front of the machine is real or fake, wherein any face presented with the help of other media can be defined as a fake face, including printed paper photographs, display screens of electronic products, silicone masks, and 3D (three-dimensional) portraits.
Currently, mainstream face anti-spoofing solutions are categorized into cooperative anti-spoofing and non-cooperative anti-spoofing (silent anti-spoofing detection). Cooperative anti-spoofing requires the user to complete specified actions in response to a prompt before the anti-spoofing check is performed. It uses face key points and face tracking technology together with a combination of cooperative actions such as blinking, opening the mouth, shaking the head, and nodding: the ratio of a changing distance to an unchanging distance is calculated over consecutive images, and the previous frame is compared with the next frame to verify whether the user is a real live person. Silent anti-spoofing performs liveness verification directly, without the user being aware of it. It does not require the user to perform additional actions and directly screens out paper photos, screen images, face masks, and other fake face attacks based on algorithms. Compared with cooperative anti-spoofing, silent anti-spoofing detection offers a better user experience and works faster, since it performs anti-spoofing imperceptibly. In practice, the scheme can be selected according to the specific scenario; for scenarios requiring higher detection speed, such as gates, access control, and ticket inspection, silent anti-spoofing is generally recommended.
The main technical problem of existing anti-spoofing techniques is limited detection accuracy. The present disclosure aims to provide a new anti-spoofing technical solution that can improve liveness detection accuracy.
A face anti-spoofing method, device, and computer-readable storage medium are provided in the embodiments of the present disclosure to solve, or at least partially solve, the problem of limited liveness detection accuracy in existing techniques.
In an embodiment of the present disclosure, a face anti-spoofing method is provided, comprising: extracting features from a face image; calculating an anti-spoofing analysis result of the face image in a predetermined manner based on the features of the face image; performing a feature visualization process on the features of the face image to obtain a first three-dimensional depth image; comparing the first three-dimensional depth image with a pre-generated second three-dimensional depth image corresponding to the face image; and adjusting the anti-spoofing analysis result based on the comparison result, and determining whether or not the face image is from a living body according to the adjusted anti-spoofing analysis result.
In some embodiments, before said extracting features from a face image, the method further comprises: performing face recognition on the face image, and drawing a first box in the face image according to the recognition result; processing the first box to obtain a second box according to a predetermined expansion ratio; said extracting features from a face image further includes: extracting the features from a region of the face image located within the second box.
In some embodiments, said processing the first box to obtain a second box according to a predetermined expansion ratio comprises: generating a square third box based on the short side of the first box; enlarging the third box according to the expansion ratio to obtain the second box.
In some embodiments, before said comparing the first three-dimensional depth image with a pre-generated second three-dimensional depth image corresponding to the face image, the method further comprises: generating the second three-dimensional depth image based on a region of the face image located within the second box.
In some embodiments, said generating a second three-dimensional depth image comprises: generating a binary rectangular mask based on the first box; calculating a second position of the binary rectangular mask in the second three-dimensional depth image based on a first position of the first box in the face image; and processing the second three-dimensional depth image with the binary rectangular mask based on the second position.
In some embodiments, said processing the second three-dimensional depth image with the binary rectangular mask comprises: setting the areas covered by the binary rectangular mask in the second three-dimensional depth image to 1, and setting the uncovered areas to 0.
In some embodiments, said comparing the first three-dimensional depth image with a pre-generated second three-dimensional depth image corresponding to the face image comprises: calculating a difference between the first three-dimensional depth image and the second three-dimensional depth image; and said adjusting the anti-spoofing analysis result based on the comparison result comprises: setting a weight based on the difference between the first three-dimensional depth image and the second three-dimensional depth image, and adjusting the anti-spoofing analysis result.
In some embodiments, said calculating an anti-spoofing analysis result of the face image in a predetermined manner comprises: using the sigmoid function to calculate the anti-spoofing analysis result.
In an embodiment of the present disclosure, a face anti-spoofing detection device is provided, comprising: a feature extraction module for extracting features from a face image; a face analysis module for calculating an anti-spoofing analysis result of the face image in a predetermined manner based on the features of the face image; a feature visualization module for performing a feature visualization process on the features of the face image to obtain a first three-dimensional depth image; a comparison module for comparing the first three-dimensional depth image with a pre-generated second three-dimensional depth image corresponding to the face image; and a result adjustment module for adjusting the anti-spoofing analysis result based on the comparison result, and determining whether or not the face image is from a living body according to the adjusted anti-spoofing analysis result.
In an embodiment of the present disclosure, a computer-readable storage medium having a plurality of instructions stored thereon is provided, the instructions being adapted to be loaded and run by a processor to perform any one of the face anti-spoofing methods provided in the above embodiments.
One or more of the above embodiments of the present disclosure have at least one of the following beneficial effects:
Unlike the technical solutions in existing techniques, in the embodiments of the present disclosure, after the anti-spoofing analysis is performed based on the features of the face image and the result is obtained, whether the face image is from a living body is not determined directly. Instead, a feature visualization technique is used to generate a three-dimensional depth image from the features obtained at the feature extraction stage, and this depth image is compared with a pre-generated real three-dimensional depth image matched with the face image. If the two images are similar, this may indicate that the extracted features better reflect the three-dimensional structure of the face, so a more accurate liveness detection result can be obtained. Conversely, if the difference between the two images is large, this may indicate that the extracted features do not accurately reflect the three-dimensional structure of the face, and an accurate liveness detection result may be difficult to obtain. In effect, the embodiments of the present disclosure use the depth image as supervisory information to help obtain accurate liveness detection results.
The present disclosure can be more readily understood with reference to the drawings. Those skilled in the art will readily understand that these drawings are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure. Wherein:
The technical solutions provided by embodiments of the present disclosure are described below with reference to the drawings. It should be understood by those skilled in the art that these embodiments are used only to explain the technical principles of the present disclosure and are not intended to limit the protection scope of the present disclosure.
In the description of the present disclosure, a “module” or “processor” may include hardware, software, or a combination thereof. A module may include hardware circuitry, various suitable sensors, communication ports, memory, a software component such as instructions, or a combination of software and hardware. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing capabilities and may be implemented in software, hardware, or a combination thereof. A non-transitory computer-readable storage medium includes any suitable medium for storing instructions, such as a diskette, a hard disk, a CD-ROM, a flash memory, a read-only memory, a random-access memory, and the like. The term “A and/or B” denotes all possible combinations of A and B, namely just A, just B, or A and B. The terms “at least one A or B” and “at least one of A and B” have the same meaning as each other and, similar to “A and/or B”, can include just A, just B, or A and B. The singular forms “a”, “this”, and “the” can also include plural forms.
As shown in
In S110, the features are extracted from a face image.
In the present embodiment, the face image is usually an RGB (red, green, blue) color image.
In S120, an anti-spoofing analysis result of the face image is calculated in a predetermined manner based on the features of the face image.
In the present embodiment, a model formed by stacking several CNN (Convolutional Neural Network) modules is used to extract features from the face image. Such a model keeps the structure compact and the number of parameters small, so that it runs fast while performing well.
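To make the claim of a "small number of parameters" concrete, the sketch below counts the learnable parameters of a plain stack of 3×3 convolutions. The disclosure does not specify the layer configuration, so the channel widths (3→16→32→64) and the helper names `conv_params` and `stack_params` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def conv_params(in_ch: int, out_ch: int, kernel: int = 3, bias: bool = True) -> int:
    """Learnable parameters of one 2-D convolution: k*k*C_in*C_out weights (+ biases)."""
    weights = kernel * kernel * in_ch * out_ch
    return weights + (out_ch if bias else 0)


def stack_params(channels: List[int], kernel: int = 3) -> Tuple[List[int], int]:
    """Per-layer and total parameter counts for a plain stack of conv layers.

    Layer i maps channels[i] -> channels[i + 1]; the widths are hypothetical.
    """
    per_layer = [conv_params(c_in, c_out, kernel) for c_in, c_out in zip(channels, channels[1:])]
    return per_layer, sum(per_layer)


if __name__ == "__main__":
    # Hypothetical RGB input (3 channels) followed by three small 3x3 conv stages.
    layers, total = stack_params([3, 16, 32, 64])
    print(layers, total)  # [448, 4640, 18496] 23584
```

Under these assumed widths the whole stack has about 24 thousand weights, which illustrates why a shallow stacked CNN can run quickly on modest hardware.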
In S130, a feature visualization process is performed on the features of the face image to obtain the first three-dimensional depth image.
In the present embodiment, a feature visualization module is preset to generate a 3D depth image (i.e., the first three-dimensional depth image) corresponding to the features by using a feature visualization technique from the deep learning field.
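The disclosure does not name the feature visualization technique. As a hypothetical stand-in, the sketch below collapses a C×H×W feature tensor into a single H×W map by averaging over channels and then min-max normalizes it to [0, 1], so it can be compared with a depth image on the same scale. A learned depth decoder is an equally plausible choice; the function name `visualize_features` is illustrative only.

```python
from typing import List

Map2D = List[List[float]]


def visualize_features(features: List[Map2D]) -> Map2D:
    """Collapse a C x H x W feature tensor into one H x W map in [0, 1].

    Channel averaging followed by min-max normalization is an assumed,
    illustrative stand-in for the unspecified feature visualization step.
    """
    if not features:
        raise ValueError("features must contain at least one channel")
    num_ch = len(features)
    height, width = len(features[0]), len(features[0][0])
    # Average over channels at every spatial position.
    mean_map = [
        [sum(features[c][y][x] for c in range(num_ch)) / num_ch for x in range(width)]
        for y in range(height)
    ]
    lo = min(min(row) for row in mean_map)
    hi = max(max(row) for row in mean_map)
    span = hi - lo
    if span == 0:
        # A constant map carries no depth structure; return all zeros.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / span for v in row] for row in mean_map]


if __name__ == "__main__":
    feats = [[[0.0, 2.0], [4.0, 6.0]], [[2.0, 2.0], [4.0, 10.0]]]
    print(visualize_features(feats))
```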
In S140, the first three-dimensional depth image is compared with the pre-generated second three-dimensional depth image corresponding to the face image.
In the present embodiment, the real 3D depth image matched with the face image is used as supervisory information. The depth image obtained by feature visualization is compared with the supervisory depth image: the smaller the difference, the better the extracted features; conversely, the larger the difference, the worse the extracted features.
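The disclosure does not define the difference measure. A minimal sketch, assuming both depth maps share the same shape and are normalized to [0, 1], is the mean absolute per-pixel difference, which itself lies in [0, 1] (0 meaning identical maps). The function name `depth_difference` is hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def depth_difference(generated: Map2D, reference: Map2D) -> float:
    """Mean absolute per-pixel difference between two depth maps.

    Both maps are assumed to have the same shape and values in [0, 1],
    so the result also lies in [0, 1]: 0 means the maps are identical.
    """
    if len(generated) != len(reference) or any(
        len(g) != len(r) for g, r in zip(generated, reference)
    ):
        raise ValueError("depth maps must have the same shape")
    total = 0.0
    count = 0
    for g_row, r_row in zip(generated, reference):
        for g, r in zip(g_row, r_row):
            total += abs(g - r)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    print(depth_difference([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]))  # 0.25
```

A smaller value corresponds to features that better capture the face's three-dimensional structure, matching the "smaller difference, better features" rule above.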
In S150, the anti-spoofing analysis result is adjusted based on the comparison result, and whether or not the face image is from a living body is determined according to the adjusted anti-spoofing analysis result.
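Steps S110 to S150 can be sketched as a single function with each stage passed in as a callable, so that any concrete extractor, scorer, visualizer, or comparator can be plugged in. This is a structural sketch only: the parameter names are hypothetical, the adjustment rule `(1 - diff) * score` is borrowed from the second embodiment below, and the decision threshold of 0.5 is an assumption not stated in the disclosure.

```python
from typing import Any, Callable, Tuple


def detect_liveness(
    face_image: Any,
    reference_depth: Any,
    extract: Callable[[Any], Any],         # S110: face image -> features
    analyze: Callable[[Any], float],       # S120: features -> score in [0, 1]
    visualize: Callable[[Any], Any],       # S130: features -> first depth image
    compare: Callable[[Any, Any], float],  # S140: difference in [0, 1]
    threshold: float = 0.5,                # hypothetical decision threshold
) -> Tuple[bool, float]:
    """Run steps S110-S150 and return (is_live, adjusted_score)."""
    features = extract(face_image)
    score = analyze(features)
    generated_depth = visualize(features)
    diff = compare(generated_depth, reference_depth)
    # S150: weight the score by depth agreement (rule taken from the second embodiment).
    adjusted = (1.0 - diff) * score
    return adjusted >= threshold, adjusted


if __name__ == "__main__":
    # Stub stages, for demonstration only.
    print(detect_liveness(
        face_image="img",
        reference_depth=None,
        extract=lambda img: [0.1, 0.9],
        analyze=lambda f: 0.75,
        visualize=lambda f: None,
        compare=lambda a, b: 0.25,
    ))  # (True, 0.5625)
```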
According to the technical solution of the present embodiment, after the anti-spoofing analysis is performed based on the features of the face image and the result is obtained, whether the face image is from a living body is not determined directly. Instead, a feature visualization technique is used to generate a three-dimensional depth image from the features obtained at the feature extraction stage, and this depth image is compared with a pre-generated real three-dimensional depth image matched with the face image. If the two images are similar, this may indicate that the extracted features better reflect the three-dimensional structure of the face, so a more accurate liveness detection result can be obtained. Conversely, if the difference between the two images is large, this may indicate that the extracted features do not accurately reflect the three-dimensional structure of the face, and an accurate liveness detection result may be difficult to obtain. In effect, the embodiments of the present disclosure use the depth image as supervisory information to help obtain accurate liveness detection results.
As shown in
In S210, face recognition is performed on the face image, and a first box is drawn in the face image according to the recognition result.
In the present embodiment, face detection is performed on the face image to obtain a bounding box (bbox_1), i.e., the first box, and its coordinate information is determined. The bbox_1 obtained by face detection contains only the core facial features (such as the eyes, eyebrows, nose, and mouth).
In S220, the first box is processed to obtain the second box according to a predetermined expansion ratio. Specifically, a square third box is generated based on the short side of the first box, and the third box is enlarged according to the expansion ratio to obtain the second box.
In the present embodiment, the short side of bbox_1 is taken to form a square bbox_2, i.e., the third box (a square is used to avoid distorting the face when the face is resized). An expansion ratio scale_size is then set, e.g., scale_size=2.7, and the square is enlarged accordingly. The cropped face region is therefore larger than the original bbox_1, completely covers the face, and includes some background information.
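The box processing of S220 can be sketched as two small geometric steps. Assumptions not stated in the text: boxes are `(x1, y1, x2, y2)` pixel tuples, the square third box is centered on bbox_1, and the enlargement is about the box center. Clipping or padding a second box that extends past the image border is left to the caller. The helper names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), pixel coordinates


def square_from_short_side(bbox_1: Box) -> Box:
    """Third box: a square whose side is the short side of bbox_1.

    Centering the square on bbox_1 is an assumption; the text only says
    the short side is used so the face is not distorted when resized.
    """
    x1, y1, x2, y2 = bbox_1
    side = min(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half = side / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def expand_box(box: Box, scale_size: float = 2.7) -> Box:
    """Second box: enlarge a box about its center by scale_size."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale_size / 2.0
    half_h = (y2 - y1) * scale_size / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    bbox_1 = (100.0, 120.0, 180.0, 220.0)  # 80 x 100 detection box (example values)
    third = square_from_short_side(bbox_1)
    second = expand_box(third, scale_size=2.7)
    print(third, second)
```

With the example values, the 80-pixel short side yields an 80×80 square, and enlarging it by 2.7 gives a crop of roughly 216×216 pixels around the same center.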
In S230, the second three-dimensional depth image is generated based on a region of the face image located within the second box.
Specifically, a binary rectangular mask is generated based on the first box. The second position of the binary rectangular mask in the second three-dimensional depth image is calculated based on the first position of the first box in the face image. The second three-dimensional depth image is processed with the binary rectangular mask based on the second position, and the areas covered by the binary rectangular mask in the second three-dimensional depth image are set to 1, while the uncovered areas are set to 0.
In the present embodiment, a face 3D depth model is preset and trained to generate a corresponding 3D depth image based on the face image. In the present embodiment, a binary rectangular mask is generated based on bbox_1, and the position of the binary rectangular mask in the 3D depth image is determined based on the coordinates of bbox_1. The portion outside the binary rectangular mask is taken as background and set to 0, and the portion covered by the binary rectangular mask is set to 1.
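The mask step can be sketched as follows, assuming (not stated in the text) that the second depth image is the bbox_2 crop resized to `depth_w` × `depth_h`, so bbox_1 is mapped into depth-image coordinates by shifting by the bbox_2 origin and scaling. The literal reading of the text, where covered pixels become 1 and uncovered pixels become 0, is the default; the `keep_depth` flag, which keeps the depth values inside the mask instead, is a hypothetical alternative. The depth model that produces the depth image is taken as given.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Map2D = List[List[float]]


def map_box_to_depth(bbox_1: Box, bbox_2: Box, depth_w: int, depth_h: int) -> Tuple[int, int, int, int]:
    """Second position: bbox_1 expressed in depth-image pixel coordinates.

    Assumes the depth image is the bbox_2 crop resized to depth_w x depth_h.
    Returns (c1, r1, c2, r2) with c2/r2 exclusive, clipped to the image.
    """
    bx1, by1, bx2, by2 = bbox_2
    sx = depth_w / (bx2 - bx1)
    sy = depth_h / (by2 - by1)
    x1, y1, x2, y2 = bbox_1
    c1 = max(0, math.floor((x1 - bx1) * sx))
    r1 = max(0, math.floor((y1 - by1) * sy))
    c2 = min(depth_w, math.ceil((x2 - bx1) * sx))
    r2 = min(depth_h, math.ceil((y2 - by1) * sy))
    return c1, r1, c2, r2


def apply_rect_mask(depth: Map2D, rect: Tuple[int, int, int, int], keep_depth: bool = False) -> Map2D:
    """Set pixels outside rect to 0; inside, set 1 (literal reading) or keep depth."""
    c1, r1, c2, r2 = rect
    out = []
    for r, row in enumerate(depth):
        new_row = []
        for c, v in enumerate(row):
            inside = r1 <= r < r2 and c1 <= c < c2
            new_row.append((v if keep_depth else 1.0) if inside else 0.0)
        out.append(new_row)
    return out


if __name__ == "__main__":
    rect = map_box_to_depth((2.0, 2.0, 6.0, 6.0), (0.0, 0.0, 8.0, 8.0), 4, 4)
    depth = [[0.5] * 4 for _ in range(4)]
    print(rect, apply_rect_mask(depth, rect))
```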
In S240, the features are extracted from a region of the face image that is located within the second box.
In S250, a feature visualization process is performed on the features of the face image, to obtain the first three-dimensional depth image.
In S260, an anti-spoofing analysis result of the face image is calculated in a predetermined manner. Specifically, the sigmoid function is used to calculate the anti-spoofing analysis result.
In S270, the difference between the first three-dimensional depth image and the second three-dimensional depth image is calculated.
In S280, a weight is set based on the difference between the first three-dimensional depth image and the second three-dimensional depth image, and the anti-spoofing analysis result is adjusted. Whether the face image is from a living body is then determined according to the adjusted anti-spoofing analysis result.
In the present embodiment, a 1×1 feature map is generated based on the extracted features, and a score (i.e., the anti-spoofing analysis result) between 0 and 1 is then generated by the sigmoid function. The difference (depth_diff) between the actual 3D depth image (the second 3D depth image) and the generated 3D depth image (the first 3D depth image) is calculated. The weight is then taken as score_weight = 1−depth_diff, and the final score (the adjusted anti-spoofing analysis result) is score_weight × score, so that the score is corrected by the score_weight derived from the 3D depth supervisory information. In effect, the present embodiment uses the 3D depth image as supervisory information to correct the face anti-spoofing result.
In the technical solution of the present embodiment, a unimodal silent anti-spoofing scheme based on RGB face images is designed, which achieves fast processing while effectively defending against conventional attacks.
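The score correction described above can be written in a few lines. This is a minimal sketch, assuming that the single value of the 1×1 feature map serves as the logit passed to the sigmoid, that depth_diff is clamped to [0, 1] so the weight stays non-negative, and that a final threshold of 0.5 decides liveness; the clamp, the threshold, and the function names are assumptions, not details from the text.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adjust_score(logit: float, depth_diff: float) -> float:
    """final score = score_weight * score, with score_weight = 1 - depth_diff."""
    score = sigmoid(logit)  # anti-spoofing analysis result in (0, 1)
    depth_diff = min(max(depth_diff, 0.0), 1.0)  # assumed clamp to keep the weight in [0, 1]
    score_weight = 1.0 - depth_diff
    return score_weight * score


def is_live(logit: float, depth_diff: float, threshold: float = 0.5) -> bool:
    """Liveness decision on the adjusted score (threshold is hypothetical)."""
    return adjust_score(logit, depth_diff) >= threshold


if __name__ == "__main__":
    print(adjust_score(0.0, 0.25))  # 0.375
    print(is_live(4.0, 0.1))
```

Because the weight only scales the score down, a large depth mismatch can turn a confident "live" score into a rejection, but a good depth match can never raise a low score.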
As shown in
A feature extraction module 310 is configured to extract features from the face image.
In the present embodiment, the face image is usually an RGB (red, green, blue) color image.
A face analysis module 320 is configured to calculate the anti-spoofing analysis result of the face image in a predetermined manner based on the features of the face image.
In the present embodiment, a model formed by stacking several CNN (Convolutional Neural Network) modules is used to extract features from the face image. Such a model keeps the structure compact and the number of parameters small, so that it runs fast while performing well.
A feature visualization module 330 is configured to perform the feature visualization process on the features of the face image to obtain a first three-dimensional depth image.
In the present embodiment, a feature visualization module is preset to generate a 3D depth image (i.e., the first three-dimensional depth image) corresponding to the features by using a feature visualization technique from the deep learning field.
A comparison module 340 is configured to compare the first three-dimensional depth image with the pre-generated second three-dimensional depth image corresponding to the face image.
In the present embodiment, the real 3D depth image matched with the face image is used as supervisory information. The depth image obtained by feature visualization is compared with the supervisory depth image: the smaller the difference, the better the extracted features; conversely, the larger the difference, the worse the extracted features.
A result adjustment module 350 is configured to adjust the anti-spoofing analysis result based on the comparison result, and determine whether or not the face image is from a living body according to the adjusted anti-spoofing analysis result.
According to the technical solution of the present embodiment, after the anti-spoofing analysis is performed based on the features of the face image and the result is obtained, whether the face image is from a living body is not determined directly. Instead, a feature visualization technique is used to generate a three-dimensional depth image from the features obtained at the feature extraction stage, and this depth image is compared with a pre-generated real three-dimensional depth image matched with the face image. If the two images are similar, this may indicate that the extracted features better reflect the three-dimensional structure of the face, so a more accurate liveness detection result can be obtained. Conversely, if the difference between the two images is large, this may indicate that the extracted features do not accurately reflect the three-dimensional structure of the face, and an accurate liveness detection result may be difficult to obtain. In effect, the embodiments of the present disclosure use the depth image as supervisory information to help obtain accurate liveness detection results.
As shown in
A face recognition module 410 is configured to perform face recognition on the face image and draw the first box in the face image according to the recognition result.
In the present embodiment, face detection is performed on the face image to obtain a bounding box (bbox_1), i.e., the first box, and its coordinate information is determined. The bbox_1 obtained by face detection contains only the core facial features (such as the eyes, eyebrows, nose, and mouth).
The face recognition module 410 is further configured to process the first box to obtain a second box according to a predetermined expansion ratio. Specifically, the face recognition module 410 generates a square third box based on the short side of the first box, and enlarges the third box according to the expansion ratio to obtain the second box.
In the present embodiment, the short side of bbox_1 is taken to form a square bbox_2, i.e., the third box (a square is used to avoid distorting the face when the face is resized). An expansion ratio scale_size is then set, e.g., scale_size=2.7, and the square is enlarged accordingly. The cropped face region is therefore larger than the original bbox_1, completely covers the face, and includes some background information.
A depth image generation module 420 is configured to generate the second three-dimensional depth image based on a region of the face image located within the second box.
Specifically, the binary rectangular mask is generated based on the first box. The second position of the binary rectangular mask in the second three-dimensional depth image is calculated based on the first position of the first box in the face image. The second three-dimensional depth image is processed with the binary rectangular mask based on the second position, and the areas covered by the binary rectangular mask in the second three-dimensional depth image are set to 1, while the uncovered areas are set to 0.
In the present embodiment, a face 3D depth model is preset and trained to generate the corresponding 3D depth image based on the face image. In the present embodiment, the binary rectangular mask is generated based on bbox_1, and the position of the binary rectangular mask in the 3D depth image is determined based on the coordinates of bbox_1. The portion outside the binary rectangular mask is taken as background and set to 0, and the portion covered by the binary rectangular mask is set to 1.
A feature extraction module 430 is configured to extract the features from a region of the face image located within the second box.
A feature visualization module 440 is configured to perform a feature visualization process on the features of the face image to obtain a first three-dimensional depth image.
A face analysis module 450 is configured to calculate the anti-spoofing analysis result of the face image in a predetermined manner. Specifically, the sigmoid function is used to calculate the anti-spoofing analysis result.
A comparison module 460 is configured to calculate the difference between the first three-dimensional depth image and the second three-dimensional depth image.
A result adjustment module 470 is configured to set the weight based on the difference between the first three-dimensional depth image and the second three-dimensional depth image, and adjust the anti-spoofing analysis result.
In the present embodiment, a 1×1 feature map is generated based on the extracted features, and a score (i.e., the anti-spoofing analysis result) between 0 and 1 is then generated by a sigmoid function. The difference (depth_diff) between the actual 3D depth image (the second 3D depth image) and the generated 3D depth image (the first 3D depth image) is calculated. The weight is then taken as score_weight = 1−depth_diff, and the final score (the adjusted anti-spoofing analysis result) is score_weight × score, so that the score is corrected by the score_weight derived from the 3D depth supervisory information. In effect, the present embodiment uses the 3D depth image as supervisory information to correct the face anti-spoofing result.
In the technical solution of this embodiment, a unimodal silent anti-spoofing scheme based on RGB face images is designed, which achieves fast processing while effectively defending against conventional attacks.
A computer-readable storage medium is further provided by the present disclosure. In an embodiment of the present disclosure, the computer-readable storage medium may be configured to store instructions for performing the face anti-spoofing method provided in any one of the above-mentioned embodiments, wherein the instructions are adapted to be loaded and run by a processor to perform any one of the face anti-spoofing methods described above. For ease of illustration, only the portions related to the embodiments of the present disclosure are shown; for specific technical details that are not disclosed here, reference is made to the method portion of the embodiments of the present disclosure. The computer-readable storage medium may be a storage device formed by various electronic devices, and optionally, the computer-readable storage medium in the embodiments of the present disclosure is a non-transitory computer-readable storage medium.
The algorithms and displays provided herein are not inherently associated with any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structures required to construct such systems are apparent from the above description. Moreover, the present disclosure is not directed to any particular programming language. It should be appreciated that a variety of programming languages may be used to implement the contents of the present disclosure described herein, and the above descriptions with respect to particular languages are intended to disclose the best mode of the present disclosure.
In the specification provided herein, many specific details are described. However, it is to be understood that embodiments of the present disclosure may be practiced without these specific details. In some embodiments, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of the present specification.
Similarly, it should be understood that, to streamline the present disclosure and aid in the understanding of one or more of the various inventive aspects, the various features of the present disclosure have sometimes been grouped together in a single embodiment, figure, or description thereof in the above description of the exemplary embodiments. However, this method of disclosure should not be construed as reflecting an intent that the claimed disclosure requires more features than those expressly recited in each claim. Accordingly, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the disclosure.
Those skilled in the art will appreciate that the modules in the devices of the embodiments can be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components of the embodiments may be combined into a single module, unit, or component, and may also be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless otherwise expressly stated, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present disclosure and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the present disclosure may be implemented in hardware, or software modules running on one or more processors, or in combinations thereof. It should be appreciated by those skilled in the art that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a processing device of a mobile terminal according to embodiments of the present disclosure. The present disclosure may also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein. Such instructions for implementing the present disclosure may be stored on a computer-readable medium or may have the form of one or more signals. Such signals may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present disclosure, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The term “comprising” or “including” does not exclude the presence of elements or steps not listed in a claim. The term “one,” “a,” or “the” preceding an element does not exclude the presence of a plurality of such elements. The present disclosure can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The terms “first,” “second,” “third,” etc., do not indicate any order; these terms may be construed as names.
Number | Date | Country | Kind |
---|---|---|---|
202211684312.X | Dec 2022 | CN | national |