ACTIVE-DEFENSE DETECTION METHOD BASED ON FACIAL LANDMARK WATERMARKING

Information

  • Patent Application
  • Publication Number
    20250166111
  • Date Filed
    June 27, 2024
  • Date Published
    May 22, 2025
Abstract
An active-defense detection method based on facial landmark watermarking is provided. The active-defense detection method includes: extracting facial landmarks from an original image, converting the extracted facial landmarks into a binary watermark, embedding the binary watermark into the original image to form a watermark image, and subjecting the watermark image to a non-malicious/malicious operation to form a noise image or a malicious image such that a model is robust to the non-malicious/malicious operation. By introducing the facial landmarks, the active-defense detection method can generate a unique watermark for each individual and achieve traceability and detection functions.
Description
CROSS-REFERENCE TO THE RELATED APPLICATIONS

This application is based upon and claims priority to Chinese Patent Application No. 202311561214.1, filed on Nov. 22, 2023, the entire contents of which are incorporated herein by reference.


TECHNICAL FIELD

The present disclosure relates to the field of Deepfake detection, and in particular to an active-defense detection method based on facial landmark watermarking.


BACKGROUND

In recent years, Deepfake technology has entered an increasing number of fields in academia and industry, and has been widely used in multimedia forms such as video, audio, and images to generate fake multimedia products, resulting in various legal and ethical issues. In order to combat the aggressiveness of Deepfake technology, a new research branch called Deepfake detection has emerged. Currently, Deepfake detection mainly focuses on passive detection, that is, detection based on artifacts left in fake faces after generation. Passive detection methods usually rely only on passive defense and ex-post evidence collection when detecting deepfaked images or videos. This means that they cannot prevent the generation and propagation of deepfakes, and cannot avoid the potential harm caused by fake content.


At present, in methods based on semi-fragile watermarks, a watermark can be used to detect authenticity but cannot achieve tracing. In addition, in methods based on robust watermarks, the watermark embedded into the image is a randomly generated or fixed watermark, and unique watermarks cannot be generated for each individual.


SUMMARY

In order to overcome the above shortcomings in the prior art, the present disclosure provides an active-defense detection method based on facial landmark watermarking, which can generate unique watermarks for each individual and achieve traceability and detection functions.


In order to solve the technical problem, the present disclosure adopts the following technical solution.


The active-defense detection method based on facial landmark watermarking includes the following steps:

    • a) acquiring n facial images to form a facial image set I, I={I1, I2, . . . , Ii, . . . , In}, where Ii denotes an i-th facial image, i ∈ {1, . . . , n}; and preprocessing the i-th facial image Ii, i ∈ {1, . . . , n}, to acquire a preprocessed i-th facial image Icover_i, thereby acquiring a preprocessed facial image set Icover;
    • b) extracting facial landmarks from the preprocessed i-th facial image Icover_i, and converting the facial landmarks into a watermark Wm;
    • c) constructing an encoder, and inputting the i-th facial image Icover_i and the watermark Wm into the encoder to acquire a watermark image Iwm;
    • d) injecting the watermark image Iwm into a noise pool to acquire a noise image Inoise, and injecting the watermark image Iwm into a malicious pool to acquire a malicious image Idep;
    • e) constructing a decoder, and inputting the noise image Inoise or the malicious image Idep into the decoder to acquire a final watermark Wm1; and
    • f) determining whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.


Further, the step a) includes:

    • a-1) acquiring the n facial images from a CelebA-HQ dataset to form the facial image set I; and
    • a-2) resizing, by a resize( ) function in a Python imaging library (PIL), the i-th facial image Ii into a 256×256 image, thereby acquiring the preprocessed i-th facial image Icover_i and acquiring the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n}.


Further, the step b) includes:

    • b-1) detecting, by a Dlib facial landmark detection algorithm, the facial landmarks in the preprocessed i-th facial image Icover_i to acquire a facial landmark set Lm including m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, where {l1,l2, . . . ,l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . ,l27} are landmarks of a left eyebrow, {l28, l29, . . . ,l36} are landmarks of a nose, {l37, l38, . . . ,l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49,l50, . . . ,l68} are landmarks of a mouth; and
    • b-2) defining an i-th landmark li by a horizontal coordinate xi and a vertical coordinate yi; mapping a value of the horizontal coordinate xi to an integer range of 0-15 through a linear transformation, and converting, by a bin( ) function in Python, the value into a binary representation Wxi with a length of 4; mapping a value of the vertical coordinate yi to an integer range of 0-15 through a linear transformation, and converting, by the bin( ) function in Python, the value into a binary representation Wyi with a length of 4; splicing the binary representation Wxi and the binary representation Wyi into a binary representation Wxy-i with a length of 8; splicing binary representations of the 68 facial landmarks together into a binary representation W68 with a length of 544; and compressing, by a principal component analysis (PCA)-based dimensionality reduction method, the binary representation W68 to a binary representation with a length of 256 as the watermark Wm.


Further, the step c) includes:

    • c-1) constructing the encoder, including an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer;
    • c-2) constructing the original image processing unit of the encoder, including a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first combined pooling and convolution (CPC) module, a second CPC module, and a third CPC module; inputting the i-th facial image Icover_i into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire an image feature Fcover_1; and inputting the image feature Fcover_1 into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire an image feature Fcover_2;
    • c-3) constructing the first CPC module, the second CPC module, and the third CPC module, each including a first branch and a second branch, where the first branch includes a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch includes an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence; inputting the image feature Fcover_2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_1; inputting the image feature Fcover_2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_2; inputting the image feature Fcover_2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_3; inputting the image feature Fcover_2 into the second branch of the first CPC module to acquire an image feature Fcover_3; subjecting the image feature Fcover_3 and the image feature Fcover_2_3 to element-wise multiplication to acquire an image feature Fcover_4; subjecting the image feature Fcover_4 and the image feature Fcover_2 to corresponding-elements addition to acquire an image feature Fcover_5; inputting the image feature Fcover_5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_1; inputting the image feature Fcover_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_2; inputting the image feature Fcover_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_3; inputting the image feature Fcover_5 into the second branch of the second CPC module to acquire an image feature Fcover_6; subjecting the image feature Fcover_6 and the image feature Fcover_5_3 to element-wise multiplication to acquire an image feature Fcover_7; subjecting the image feature Fcover_7 and the image feature Fcover_5 to corresponding-elements addition to acquire an image feature Fcover_8; inputting the image feature Fcover_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_1; inputting the image feature Fcover_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_2; inputting the image feature Fcover_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image 
feature Fcover_8_3; inputting the image feature Fcover_8 into the second branch of the third CPC module to acquire an image feature Fcover_9; subjecting the image feature Fcover_9 and the image feature Fcover_8_3 to element-wise multiplication to acquire an image feature Fcover_10; and subjecting the image feature Fcover_10 and the image feature Fcover_8 to corresponding-elements addition to acquire an image feature Fcover_11;
    • c-4) constructing the watermark processing unit of the encoder, including a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module; inputting the watermark Wm into the linear layer of the watermark processing unit to acquire a watermark feature f1; inputting the watermark feature f1 into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f2; inputting the watermark feature f2 into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f3; inputting the watermark feature f3 into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f4; inputting the watermark feature f4 into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f5; inputting the watermark feature f5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_1; inputting the watermark feature fm_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_2; inputting the watermark feature fm_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_3; inputting the watermark feature f5 into the second branch of the first CPC module to acquire a watermark feature fm_6; subjecting the watermark feature fm_6 and the watermark feature fm_5_3 to element-wise multiplication to acquire a watermark feature fm_7; subjecting the watermark feature fm_7 and the watermark feature f5 to corresponding-elements addition to acquire a watermark feature fm_8; inputting the watermark feature fm_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_1; inputting the watermark feature fm_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_2; inputting the watermark feature fm_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_3; inputting the watermark feature fm_8 into the second branch of the second CPC module to acquire a watermark feature fm_9; subjecting the watermark feature fm_9 and the watermark feature fm_8_3 to element-wise multiplication to acquire a 
watermark feature fm_10; subjecting the watermark feature fm_10 and the watermark feature fm_8 to corresponding-elements addition to acquire a watermark feature fm_11; inputting the watermark feature fm_11 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_1; inputting the watermark feature fm_11_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_2; inputting the watermark feature fm_11_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_3; inputting the watermark feature fm_11 into the second branch of the third CPC module to acquire a watermark feature fm_12; subjecting the watermark feature fm_12 and the watermark feature fm_11_3 to element-wise multiplication to acquire a watermark feature fm_13; and subjecting the watermark feature fm_13 and the watermark feature fm_11 to corresponding-elements addition to acquire a watermark feature f6; and
    • c-5) subjecting the image feature Fcover_11 and the watermark feature f6 to corresponding-elements addition to acquire a feature F1; inputting the feature F1 into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire a feature F2; and inputting the feature F2 into the second convolutional layer of the encoder to acquire the watermark image Iwm.


Preferably, in the step c-2), the convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the atrous convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; in the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the first convolutional layer and the second convolutional layer in the second branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the average pooling layer in the second branch has a window size of 4; in the step c-4), the linear layer of the watermark processing unit includes 256 input nodes and 256 output nodes; the convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the atrous convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; in the step c-5), the first convolutional layer of the encoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the second convolutional layer of the encoder includes 3 channels and a convolutional kernel, with a size of 1, a stride of 1, and a padding of 1.


Further, the step d) includes:

    • d-1) constructing the noise pool, including Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise; injecting the watermark image Iwm into the noise pool; and adding a noise randomly selected from the noise pool to the watermark image Iwm to form the noise image Inoise; and
    • d-2) constructing the malicious pool, including a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms; injecting the watermark image Iwm into the malicious pool; and manipulating, by a model or attribute manipulation algorithm randomly selected from the malicious pool, the watermark image Iwm to form the malicious image Idep.


Further, the step e) includes:

    • e-1) constructing the decoder, including a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer; inputting the noise image Inoise or the malicious image Idep into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire an image feature N1; inputting the image feature N1 into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire an image feature N2; inputting the image feature N2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_1; inputting the image feature N2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_2; inputting the image feature N2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_3; inputting the image feature N2 into the second branch of the first CPC module to acquire an image feature N3; subjecting the image feature N3 and the image feature N2_3 to element-wise multiplication to acquire an image feature N4; subjecting the image feature N4 and the image feature N2 to corresponding-elements addition to acquire an image feature N5; inputting the image feature N5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_1; inputting the image feature N5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_2; inputting the image feature N5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_3; inputting the image feature N5 into the second branch of the second CPC module to acquire an image feature N6; subjecting the image feature N6 and the image feature N5_3 to element-wise multiplication to acquire an image feature N7; subjecting the image feature N7 and the image feature N5 to corresponding-elements addition to acquire an image feature N8; inputting the image feature N8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_1; inputting the image feature N8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_2; inputting the image feature N8_2 into the third convolutional layer, the third BatchNorm 
layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_3; inputting the image feature N8 into the second branch of the third CPC module to acquire an image feature N9; subjecting the image feature N9 and the image feature N8_3 to element-wise multiplication to acquire an image feature N10; subjecting the image feature N10 and the image feature N8 to corresponding-elements addition to acquire an image feature N11; inputting the image feature N11 into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire an image feature N12; inputting the image feature N12 into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire an image feature N13; inputting the image feature N13 into the flatten layer of the decoder to acquire an image feature N14; and inputting the image feature N14 into the fully connected layer of the decoder to acquire the final watermark Wm1.


Preferably, in the step e-1), the first convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the first atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; the second convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the second atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the flatten layer and the fully connected layer of the decoder each include 256 neurons.


Further, the step f) includes:

    • f-1) defining a constant count1 with an initial value of 0; determining whether binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same; for each bit position at which the binary values of the final watermark Wm1 and the watermark Wm differ, incrementing the constant count1 by 1; and dividing a final value of the constant count1 by 256 to acquire a bit error rate Ebit;
    • f-2) determining that the noise image Inoise is a real image if the bit error rate Ebit is less than 0.5; and determining that the noise image Inoise is a fake image if the bit error rate Ebit is greater than or equal to 0.5;
    • f-3) replacing the i-th facial image Icover_i in the step b) with the malicious image Idep, and repeating the step b) to acquire a watermark W′m;
    • f-4) defining a constant count2 with an initial value of 0; determining whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same; for each bit position at which the binary values of the watermark W′m and the watermark Wm differ, incrementing the constant count2 by 1; and dividing a final value of the constant count2 by 256 to acquire a bit error rate E′bit; and
    • f-5) determining that the malicious image Idep is a real image if the bit error rate E′bit is less than or equal to 0.5; and determining that the malicious image Idep is a fake image if the bit error rate E′bit is greater than 0.5.


The present disclosure has the following beneficial effects. The present disclosure extracts facial landmarks from an original image and converts the extracted facial landmarks into a binary watermark. The present disclosure embeds the binary watermark into the original image to acquire a watermark image, allowing the watermark image to undergo a non-malicious/malicious operation to form a noise image or a malicious image. In this way, the model is robust to the non-malicious/malicious operation. The present disclosure introduces facial landmarks to generate a unique watermark for each individual and achieve traceability and detection functions.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a flowchart of an active-defense detection method based on facial landmark watermarking according to the present disclosure;



FIG. 2 is a structural diagram of landmark extraction according to the present disclosure;



FIG. 3 is a structural diagram of an encoder according to the present disclosure; and



FIG. 4 is a structural diagram of a decoder according to the present disclosure.





DETAILED DESCRIPTION OF THE EMBODIMENTS

The present disclosure is further described with reference to FIGS. 1 to 4.


An active-defense detection method based on facial landmark watermarking includes the following steps.


a) n facial images are acquired to form facial image set I, I={I1, I2, . . . , Ii, . . . , In}, where Ii denotes an i-th facial image, i ∈ {1, . . . , n}. The i-th facial image Ii, i ∈ {1, . . . , n}, is preprocessed to acquire preprocessed i-th facial image Icover_i, thereby acquiring preprocessed facial image set Icover.


b) Facial landmarks are extracted from the preprocessed i-th facial image Icover_i, and are converted into watermark Wm.


c) An encoder is constructed, and the i-th facial image Icover_i and the watermark Wm are input into the encoder to acquire watermark image Iwm.


d) The watermark image Iwm is injected into a noise pool to acquire noise image Inoise, and the watermark image Iwm is injected into a malicious pool to acquire malicious image Idep.


e) A decoder is constructed, and the noise image Inoise or the malicious image Idep is input into the decoder to acquire final watermark Wm1.


f) It is determined whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.


The present disclosure converts the extracted facial landmarks into a binary watermark. The present disclosure embeds the binary watermark into the original image to acquire a watermark image, allowing the watermark image to undergo a non-malicious/malicious operation to form a noise image or a malicious image. In this way, the model is robust to the non-malicious/malicious operation. The present disclosure can generate a unique watermark for each individual and achieve traceability and detection functions. The present disclosure is based on the idea of adversarial attacks. Typically, active defense involves two methods. Firstly, adversarial perturbations are added to images or videos to distort the content generated by Deepfake, achieving the effect of "knowing it is fake at a glance". Secondly, adversarial watermarks are added to images or videos. Unlike adding perturbations, adding watermarks works by training the watermark to be robust. At present, in methods based on semi-fragile watermarks, a watermark can only detect authenticity and cannot achieve tracing. In addition, in methods based on robust watermarks, the watermarks embedded into the image are randomly generated or fixed watermarks, and unique watermarks cannot be generated for each individual.


In an embodiment of the present disclosure, the step a) is as follows.


a-1) The n facial images are acquired from a CelebA-HQ dataset to form the facial image set I. The CelebA-HQ dataset includes 30,000 facial images with different identities, each with a resolution of 1024×1024.


a-2) The i-th facial image Ii is resized by a resize( ) function in a Python imaging library (PIL) into a 256×256 image, thereby acquiring the preprocessed i-th facial image Icover_i and acquiring the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n}.
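
For illustration, a minimal preprocessing sketch is given below. The dataset directory and file pattern are assumptions for illustration only; the present disclosure specifies only the resize( ) call:

```python
# Sketch of step a): resize CelebA-HQ faces to 256x256 with PIL's resize().
# The dataset directory and the "*.jpg" file pattern are illustrative assumptions.
from pathlib import Path
from PIL import Image

def preprocess(dataset_dir: str, n: int) -> list:
    """Load the first n facial images and resize each one to 256x256."""
    cover = []
    for path in sorted(Path(dataset_dir).glob("*.jpg"))[:n]:
        img = Image.open(path).convert("RGB")
        cover.append(img.resize((256, 256)))  # preprocessed image I_cover_i
    return cover  # the preprocessed facial image set I_cover
```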


In an embodiment of the present disclosure, the step b) is as follows.


b-1) The facial landmarks in the preprocessed i-th facial image Icover_i are detected by a Dlib facial landmark detection algorithm to acquire facial landmark set Lm including m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, where {l1, l2, . . . , l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . , l27} are landmarks of a left eyebrow, {l28, l29, . . . , l36} are landmarks of a nose, {l37, l38, . . . , l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49, l50, . . . , l68} are landmarks of a mouth.


b-2) i-th landmark li is defined by horizontal coordinate xi and vertical coordinate yi. A value of the horizontal coordinate xi is mapped to an integer range of 0-15 through a linear transformation, and the value is converted by a bin( ) function in Python into binary representation Wxi with a length of 4. A value of the vertical coordinate yi is mapped to an integer range of 0-15 through a linear transformation, and the value is converted by the bin( ) function in Python into binary representation Wyi with a length of 4. The binary representation Wxi and the binary representation Wyi are spliced into binary representation Wxy-i with a length of 8. Binary representations of the 68 facial landmarks are spliced together into binary representation W68 with a length of 544. The binary representation W68 is compressed by a principal component analysis (PCA)-based dimensionality reduction method to a binary representation with a length of 256 as the watermark Wm.
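
A minimal sketch of this landmark-to-watermark conversion follows, using Dlib's standard 68-point shape predictor and Python's bin( ) as described above. The sign thresholding used to re-binarize the PCA projection is an assumption: the present disclosure states only that W68 is compressed to 256 bits by a PCA-based method.

```python
# Sketch of step b): 68 Dlib landmarks -> 544-bit W68 -> 256-bit watermark Wm.
import numpy as np
import dlib
from sklearn.decomposition import PCA

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_to_w68(img: np.ndarray) -> np.ndarray:
    """Return the 544-bit representation W68 for one 256x256 face image."""
    rect = detector(img, 1)[0]                        # assumes one detectable face
    shape = predictor(img, rect)
    bits = ""
    for i in range(68):
        for v in (shape.part(i).x, shape.part(i).y):
            v = min(15, max(0, round(v * 15 / 255)))  # linear map to 0..15
            bits += bin(v)[2:].zfill(4)               # 4-bit binary via bin()
    return np.array([int(b) for b in bits])           # length 544 = 68 * 8

def compress_to_wm(all_w68: np.ndarray) -> np.ndarray:
    """PCA over the dataset's W68 vectors, thresholded to 256-bit watermarks."""
    proj = PCA(n_components=256).fit_transform(all_w68)
    return (proj > 0).astype(np.uint8)                # assumed binarization step
```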


In an embodiment of the present disclosure, the step c) is as follows.


c-1) The encoder is constructed, including an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer.


c-2) The original image processing unit of the encoder is constructed, including a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first CPC module, a second CPC module, and a third CPC module. The i-th facial image Icover_i is input into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire image feature Fcover_1. The image feature Fcover_1 is input into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire image feature Fcover_2.


c-3) The first CPC module, the second CPC module, and the third CPC module are constructed, each including a first branch and a second branch, where the first branch includes a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch includes an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence. The image feature Fcover_2 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_1. The image feature Fcover_2_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_2. The image feature Fcover_2_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_3. The image feature Fcover_2 is input into the second branch of the first CPC module to acquire image feature Fcover_3. The image feature Fcover_3 and the image feature Fcover_2_3 are subjected to element-wise multiplication to acquire image feature Fcover_4. The image feature Fcover_4 and the image feature Fcover_2 are subjected to corresponding-elements addition to acquire image feature Fcover_5. The image feature Fcover_5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_1. The image feature Fcover_5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_2. The image feature Fcover_5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_3. The image feature Fcover_5 is input into the second branch of the second CPC module to acquire image feature Fcover_6. The image feature Fcover_6 and the image feature Fcover_5_3 are subjected to element-wise multiplication to acquire image feature Fcover_7. The image feature Fcover_7 and the image feature Fcover_5 are subjected to corresponding-elements addition to acquire image feature Fcover_8. The image feature Fcover_8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_1. The image feature Fcover_8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_2. The image feature Fcover_8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_3. 
The image feature Fcover_8 is input into the second branch of the third CPC module to acquire image feature Fcover_9. The image feature Fcover_9 and the image feature Fcover_8_3 are subjected to element-wise multiplication to acquire image feature Fcover_10. The image feature Fcover_10 and the image feature Fcover_8 are subjected to corresponding-elements addition to acquire image feature Fcover_11.
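
As a concrete illustration of the CPC module just described, the following PyTorch sketch (the framework choice is an assumption) implements the two branches, the element-wise multiplication, and the residual addition. The bilinear upsampling after the pooled branch is an added assumption so that the multiplication is shape-compatible with the first branch; the present disclosure specifies only the pooling window of 4.

```python
# Sketch of one CPC module from step c-3): two branches, element-wise
# multiplication of their outputs, then residual addition of the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(ch: int = 64) -> nn.Sequential:
    """One conv3x3 + BatchNorm + ReLU stage of the first branch."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
    )

class CPCModule(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # first branch: three conv + BatchNorm + ReLU stages in sequence
        self.branch1 = nn.Sequential(conv_bn_relu(ch), conv_bn_relu(ch), conv_bn_relu(ch))
        # second branch: average pooling (window 4), conv, ReLU, conv
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(kernel_size=4),
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.branch1(x)                       # e.g. Fcover_2_3
        b = self.branch2(x)                       # e.g. Fcover_3
        b = F.interpolate(b, size=a.shape[-2:],   # assumed resize-back step
                          mode="bilinear", align_corners=False)
        return a * b + x  # element-wise multiplication, then residual addition
```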


c-4) The watermark processing unit of the encoder is constructed, including a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module. The watermark Wm is input into the linear layer of the watermark processing unit to acquire watermark feature f1. The watermark feature f1 is input into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire watermark feature f2. The watermark feature f2 is input into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire watermark feature f3. The watermark feature f3 is input into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire watermark feature f4. The watermark feature f4 is input into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire watermark feature f5. The watermark feature f5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_1. The watermark feature fm_5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_2. The watermark feature fm_5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_3. The watermark feature f5 is input into the second branch of the first CPC module to acquire watermark feature fm_6. The watermark feature fm_6 and the watermark feature fm_5_3 are subjected to element-wise multiplication to acquire watermark feature fm_7. The watermark feature fm_7 and the watermark feature f5 are subjected to corresponding-elements addition to acquire watermark feature fm_8. The watermark feature fm_8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_1. The watermark feature fm_8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_2. The watermark feature fm_8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_3. The watermark feature fm_8 is input into the second branch of the second CPC module to acquire watermark feature fm_9. The watermark feature fm_9 and the watermark feature fm_8_3 are subjected to element-wise multiplication to acquire watermark feature fm_10. 
The watermark feature fm_10 and the watermark feature fm_8 are subjected to corresponding-elements addition to acquire watermark feature fm_11. The watermark feature fm_11 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_1. The watermark feature fm_11_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_2. The watermark feature fm_11_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_3. The watermark feature fm_11 is input into the second branch of the third CPC module to acquire watermark feature fm_12. The watermark feature fm_12 and the watermark feature fm_11_3 are subjected to element-wise multiplication to acquire watermark feature fm_13. The watermark feature fm_13 and the watermark feature fm_11 are subjected to corresponding-elements addition to acquire watermark feature f6.


c-5) The image feature Fcover_11 and the watermark feature f6 are subjected to corresponding-elements addition to acquire feature F1. The feature F1 is input into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire feature F2. The feature F2 is input into the second convolutional layer of the encoder to acquire the watermark image Iwm.
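
A sketch of this fusion stage is shown below (PyTorch again assumed). Layer sizes follow the preferred hyperparameters of step c-5), except that the final 1×1 convolution is used without padding so the output watermark image keeps the 256×256 input resolution.

```python
# Sketch of step c-5): fuse Fcover_11 with f6 and project to the 3-channel Iwm.
import torch.nn as nn

class EncoderHead(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(ch, 3, kernel_size=1, stride=1)  # 3-channel output

    def forward(self, f_cover_11, f6):
        f1 = f_cover_11 + f6                   # corresponding-elements addition
        f2 = self.act(self.bn(self.conv1(f1)))
        return self.conv2(f2)                  # the watermark image Iwm
```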


In the encoder, all convolutional layers, deconvolutional layers, and atrous convolutional layers are two-dimensional.


In this embodiment, preferably, in the step c-2), the convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The atrous convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. In the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The first convolutional layer and the second convolutional layer in the second branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The average pooling layer in the second branch has a window size of 4. In the step c-4), the linear layer of the watermark processing unit includes 256 input nodes and 256 output nodes. The convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The atrous convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. In the step c-5), the first convolutional layer of the encoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The second convolutional layer of the encoder includes 3 channels and a convolutional kernel, with a size of 1, a stride of 1, and a padding of 1.


In an embodiment of the present disclosure, the step d) is as follows.


d-1) The noise pool is constructed, including Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise. The watermark image Iwm is injected into the noise pool. A noise randomly selected from the noise pool is added to the watermark image Iwm to form the noise image Inoise. By implementing the source code described in the paper "MBRS: Enhancing Robustness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression", the Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and JPEG noise are added. This is available in the prior art and will not be elaborated herein.
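
A minimal sketch of the random noise selection follows, with a few representative distortions implemented through PIL and NumPy. Dropout, Crop, and SaltPepper noise follow the same pattern, and all parameter values (sigma, blur radius, JPEG quality) are illustrative assumptions rather than values taken from the MBRS paper.

```python
# Sketch of step d-1): draw one distortion at random and apply it to Iwm.
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(im, sigma=5.0):
    arr = np.asarray(im).astype(np.float32)
    arr += np.random.normal(0.0, sigma, arr.shape)   # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def gaussian_blur(im):
    return im.filter(ImageFilter.GaussianBlur(radius=2))

def median_blur(im):
    return im.filter(ImageFilter.MedianFilter(size=3))

def jpeg_compress(im, quality=50):
    buf = io.BytesIO()
    im.save(buf, format="JPEG", quality=quality)     # real JPEG round-trip
    return Image.open(buf)

NOISE_POOL = [lambda im: im,  # Identity noise
              gaussian_noise, gaussian_blur, median_blur, jpeg_compress]

def add_random_noise(iwm: Image.Image) -> Image.Image:
    """Form the noise image Inoise from the watermark image Iwm."""
    return random.choice(NOISE_POOL)(iwm)
```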


d-2) The malicious pool is constructed, including a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms (for manipulating nose, mouth, eyes, jawline, and eyebrow attributes). The watermark image Iwm is injected into the malicious pool. The watermark image Iwm is manipulated by a model or attribute manipulation algorithm randomly selected from the malicious pool to form the malicious image Idep. The SimSwap model achieves face swapping through the source code described in the paper "SimSwap: An Efficient Framework for High Fidelity Face Swapping". The InfoSwap model achieves face swapping through the source code described in the paper "InfoSwap: Information Bottleneck Disentanglement for Identity Swapping". The UniFace model achieves face swapping through the source code described in the paper "Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping". The manipulation of the shape of attributes such as the nose, mouth, eyes, jawline, and eyebrows is achieved through the source code described in the paper "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation". This is available in the prior art and will not be elaborated herein.
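
The random selection from the malicious pool can be sketched the same way. The callables below are placeholders for the third-party SimSwap, InfoSwap, UniFace, and StarGAN implementations cited above, which are not reproduced here:

```python
# Sketch of step d-2): pick one manipulation at random to form Idep.
import random

def build_malicious_pool(simswap, infoswap, uniface, attribute_ops):
    """Each argument is a callable mapping a face image to a manipulated image."""
    return [simswap, infoswap, uniface, *attribute_ops]

def manipulate(iwm, pool):
    return random.choice(pool)(iwm)  # the malicious image Idep
```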


In an embodiment of the present disclosure, the step e) is as follows.


e-1) The decoder is constructed, including a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer. The noise image Inoise or the malicious image Idep is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire image feature N1. The image feature N1 is input into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire image feature N2. The image feature N2 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_1. The image feature N2_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_2. The image feature N2_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_3. The image feature N2 is input into the second branch of the first CPC module to acquire image feature N3. The image feature N3 and the image feature N2_3 are subjected to element-wise multiplication to acquire image feature N4. The image feature N4 and the image feature N2 are subjected to corresponding-elements addition to acquire image feature N5. The image feature N5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_1. The image feature N5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_2. The image feature N5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_3. The image feature N5 is input into the second branch of the second CPC module to acquire image feature N6. The image feature N6 and the image feature N5_3 are subjected to element-wise multiplication to acquire image feature N7. The image feature N7 and the image feature N5 are subjected to corresponding-elements addition to acquire image feature N8. The image feature N8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_1. The image feature N8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_2. 
The image feature N8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_3. The image feature N8 is input into the second branch of the third CPC module to acquire image feature N9. The image feature N9 and the image feature N8_3 are subjected to element-wise multiplication to acquire image feature N10. The image feature N10 and the image feature N8 are subjected to corresponding-elements addition to acquire image feature N11. The image feature N11 is input into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire image feature N12. The image feature N12 is input into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire image feature N13. The image feature N13 is input into the flatten layer of the decoder to acquire image feature N14. The image feature N14 is input into the fully connected layer of the decoder to acquire the final watermark Wm1.
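
A compact PyTorch sketch of this decoder is given below, reusing the CPCModule class from the encoder sketch. nn.LazyLinear is used so the fully connected layer can accept the flattened feature map and emit the 256-bit final watermark; the Dropout rate and the padding of the dilated convolutions (chosen to preserve the spatial size) are assumptions.

```python
# Sketch of step e-1): decoder that recovers the 256-bit final watermark Wm1.
# CPCModule is the class from the CPC module sketch above.
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=2, dilation=2),  # atrous convolution
            nn.ReLU(inplace=True), nn.Dropout2d(0.1),        # dropout rate assumed
        )
        self.cpc = nn.Sequential(CPCModule(ch), CPCModule(ch), CPCModule(ch))
        self.tail = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=2, dilation=2),
            nn.ReLU(inplace=True), nn.Dropout2d(0.1),
            nn.Flatten(), nn.LazyLinear(256),                # final watermark Wm1
        )

    def forward(self, x):  # x: Inoise or Idep, shape (B, 3, 256, 256)
        return self.tail(self.cpc(self.stem(x)))
```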


In this embodiment, preferably, in the step e-1), the first convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The first atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The second convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The second atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The flatten layer and the fully connected layer of the decoder each include 256 neurons.


In an embodiment of the present disclosure, the step f) is as follows.


f-1) Constant count1 is defined with an initial value of 0. It is determined whether the binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same. For each bit position at which the binary values of the final watermark Wm1 and the watermark Wm differ, the constant count1 is incremented by 1. A final value of the constant count1 is divided by 256 to acquire bit error rate Ebit.


f-2) If the bit error rate Ebit is less than 0.5, it indicates that the final watermark Wm1 matches the watermark Wm of the i-th facial image Icover_i and that the face in the i-th facial image Icover_i has not been changed, achieving a traceability function. Therefore, the noise image Inoise is a real image. If the bit error rate Ebit is greater than or equal to 0.5, the noise image Inoise is a fake image.
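
In code, the bit error rate test of steps f-1) and f-2) reduces to a Hamming-distance computation over the 256 bit positions, sketched below:

```python
# Sketch of steps f-1) and f-2): bit error rate and real/fake decision.
import numpy as np

def bit_error_rate(wm1: np.ndarray, wm: np.ndarray) -> float:
    """count1 / 256: fraction of bit positions at which Wm1 and Wm differ."""
    return float(np.count_nonzero(wm1 != wm)) / 256.0

def is_real_noise_image(wm1: np.ndarray, wm: np.ndarray) -> bool:
    return bit_error_rate(wm1, wm) < 0.5  # real if Ebit < 0.5, fake otherwise
```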


f-3) The malicious image Idep includes a trace of manipulation. Therefore, the i-th facial image Icover_i in the step b) is replaced with the malicious image Idep, and the step b) is repeated to acquire watermark W′m.


f-4) Constant count2 is defined with an initial value of 0. It is determined whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same. If the binary values of the watermark W′m and the watermark Wm are different in one bit, the constant count2 is incremented by 1, and a final value of the constant count2 is divided by 256 to acquire bit error rate E′bit.


f-5) It is determined that the malicious image Idep is a real image if the bit error rate E′bit is less than or equal to 0.5. It is determined that the malicious image Idep is a fake image if the bit error rate E′bit is greater than 0.5. Since the watermark in the malicious image Idep can be robustly recovered by the decoder, the trustworthy original image with the watermark Wm can be traced through matching between the facial landmarks and the watermark.


Table 1 shows the quantitative comparison of the bitwise restoration accuracy of watermarks after common image processing operations and malicious face swapping operations on the CelebA-HQ dataset at 256×256 resolution. The robustness of watermarks is measured by the accuracy of watermark restoration. The method proposed by the present disclosure achieves an average accuracy of 98.95% under common image processing operations, which is superior to the state-of-the-art methods: the average accuracy is improved by 14.29% compared to MBRS and by 18.56% compared to FaceSigns. The generalization ability across different face swapping algorithms is also evaluated. The method proposed by the present disclosure restores watermarks with an average accuracy of 98.05%, which is improved by 47.82% compared to MBRS and by 47.94% compared to FaceSigns.


Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments, or equivalently substitute some technical features thereof. Any modification, equivalent substitution, improvement, etc. within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims
  • 1. An active-defense detection method based on a facial landmark watermarking, comprising the following steps: a) acquiring n facial images to form a facial image set I, I={I1, I2, . . . , Ii, . . . , In}, wherein Ii denotes an i-th facial image, i ∈ {1, . . . , n}; and preprocessing the i-th facial image Ii, i ∈ {1, . . . , n}, to acquire a preprocessed i-th facial image Icover_i, wherein a preprocessed facial image set Icover is acquired; b) extracting facial landmarks from the preprocessed i-th facial image Icover_i, and converting the facial landmarks into a watermark Wm; c) constructing an encoder, and inputting the i-th facial image Icover_i and the watermark Wm into the encoder to acquire a watermark image Iwm; d) injecting the watermark image Iwm into a noise pool to acquire a noise image Inoise, and injecting the watermark image Iwm into a malicious pool to acquire a malicious image Idep; e) constructing a decoder, and inputting the noise image Inoise or the malicious image Idep into the decoder to acquire a final watermark Wm1; and f) determining whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.
  • 2. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step a) comprises:
    a-1) acquiring the n facial images from a CelebA-HQ dataset to form the facial image set I; and
    a-2) resizing, by a resize( ) function in a Python imaging library (PIL), the i-th facial image Ii into a 256×256 image, wherein the preprocessed i-th facial image Icover_i and the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n} are acquired.
  • 3. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step b) comprises:
    b-1) detecting, by a Dlib facial landmark detection algorithm, the facial landmarks in the preprocessed i-th facial image Icover_i to acquire a facial landmark set Lm comprising m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, wherein {l1, l2, . . . , l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . , l27} are landmarks of a left eyebrow, {l28, l29, . . . , l36} are landmarks of a nose, {l37, l38, . . . , l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49, l50, . . . , l68} are landmarks of a mouth; and
    b-2) defining an i-th landmark li by a horizontal coordinate xi and a vertical coordinate yi; mapping a value of the horizontal coordinate xi to an integer range of 0-15 through a linear transformation, and converting, by a bin( ) function in Python, the value of the horizontal coordinate xi that is mapped to the integer range of 0-15 into a binary representation Wxi with a length of 4; mapping a value of the vertical coordinate yi to the integer range of 0-15 through the linear transformation, and converting, by the bin( ) function in Python, the value of the vertical coordinate yi that is mapped to the integer range of 0-15 into a binary representation Wyi with the length of 4; splicing the binary representation Wxi and the binary representation Wyi into a binary representation Wxy_i with a length of 8; splicing binary representations of the 68 facial landmarks together into a binary representation W68 with a length of 544; and compressing, by a principal component analysis (PCA)-based dimensionality reduction method, the binary representation W68 to a binary representation with a length of 256 as the watermark Wm.
  • 4. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step c) comprises:
    c-1) constructing the encoder, comprising an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer;
    c-2) constructing the original image processing unit of the encoder, comprising a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first combined pooling and convolution (CPC) module, a second CPC module, and a third CPC module; inputting the i-th facial image Icover_i into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire an image feature Fcover_1; and inputting the image feature Fcover_1 into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire an image feature Fcover_2;
    c-3) constructing the first CPC module, the second CPC module, and the third CPC module, each comprising a first branch and a second branch, wherein the first branch comprises a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch comprises an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence; inputting the image feature Fcover_2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_1; inputting the image feature Fcover_2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_2; inputting the image feature Fcover_2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_3; inputting the image feature Fcover_2 into the second branch of the first CPC module to acquire an image feature Fcover_3; subjecting the image feature Fcover_3 and the image feature Fcover_2_3 to element-wise multiplication to acquire an image feature Fcover_4; subjecting the image feature Fcover_4 and the image feature Fcover_2 to corresponding-elements addition to acquire an image feature Fcover_5; inputting the image feature Fcover_5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_1; inputting the image feature Fcover_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_2; inputting the image feature Fcover_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_3; inputting the image feature Fcover_5 into the second branch of the second CPC module to acquire an image feature Fcover_6; subjecting the image feature Fcover_6 and the image feature Fcover_5_3 to the element-wise multiplication to acquire an image feature Fcover_7; subjecting the image feature Fcover_7 and the image feature Fcover_5 to the corresponding-elements addition to acquire an image feature Fcover_8; inputting the image feature Fcover_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_1; inputting the image feature Fcover_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_2; inputting the image feature Fcover_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_3; inputting the image feature Fcover_8 into the second branch of the third CPC module to acquire an image feature Fcover_9; subjecting the image feature Fcover_9 and the image feature Fcover_8_3 to the element-wise multiplication to acquire an image feature Fcover_10; and subjecting the image feature Fcover_10 and the image feature Fcover_8 to the corresponding-elements addition to acquire an image feature Fcover_11;
    c-4) constructing the watermark processing unit of the encoder, comprising a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module; inputting the watermark Wm into the linear layer of the watermark processing unit to acquire a watermark feature f1; inputting the watermark feature f1 into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f2; inputting the watermark feature f2 into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f3; inputting the watermark feature f3 into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f4; inputting the watermark feature f4 into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f5; inputting the watermark feature f5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_1; inputting the watermark feature fm_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_2; inputting the watermark feature fm_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_3; inputting the watermark feature f5 into the second branch of the first CPC module to acquire a watermark feature fm_6; subjecting the watermark feature fm_6 and the watermark feature fm_5_3 to the element-wise multiplication to acquire a watermark feature fm_7; subjecting the watermark feature fm_7 and the watermark feature f5 to the corresponding-elements addition to acquire a watermark feature fm_8; inputting the watermark feature fm_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_1; inputting the watermark feature fm_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_2; inputting the watermark feature fm_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_3; inputting the watermark feature fm_8 into the second branch of the second CPC module to acquire a watermark feature fm_9; subjecting the watermark feature fm_9 and the watermark feature fm_8_3 to the element-wise multiplication to acquire a watermark feature fm_10; subjecting the watermark feature fm_10 and the watermark feature fm_8 to the corresponding-elements addition to acquire a watermark feature fm_11; inputting the watermark feature fm_11 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_1; inputting the watermark feature fm_11_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_2; inputting the watermark feature fm_11_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_3; inputting the watermark feature fm_11 into the second branch of the third CPC module to acquire a watermark feature fm_12; subjecting the watermark feature fm_12 and the watermark feature fm_11_3 to the element-wise multiplication to acquire a watermark feature fm_13; and subjecting the watermark feature fm_13 and the watermark feature fm_11 to the corresponding-elements addition to acquire a watermark feature f6; and
    c-5) subjecting the image feature Fcover_11 and the watermark feature f6 to the corresponding-elements addition to acquire a feature F1; inputting the feature F1 into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire a feature F2; and inputting the feature F2 into the second convolutional layer of the encoder to acquire the watermark image Iwm.
  • 5. The active-defense detection method based on the facial landmark watermarking according to claim 4, wherein in the step c-2), the convolutional layer of the original image processing unit comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the atrous convolutional layer of the original image processing unit comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1;
    in the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; the first convolutional layer and the second convolutional layer in the second branch each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the average pooling layer in the second branch has a window size of 4;
    in the step c-4), the linear layer of the watermark processing unit comprises 256 input nodes and 256 output nodes; the convolutional layer of the watermark processing unit comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; the atrous convolutional layer of the watermark processing unit comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and
    in the step c-5), the first convolutional layer of the encoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the second convolutional layer of the encoder comprises 3 channels and a convolutional kernel with a size of 1, a stride of 1, and a padding of 1.
  • 6. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step d) comprises:
    d-1) constructing the noise pool, comprising Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise; injecting the watermark image Iwm into the noise pool; and adding a noise randomly selected from the noise pool to the watermark image Iwm to form the noise image Inoise; and
    d-2) constructing the malicious pool, comprising a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms; injecting the watermark image Iwm into the malicious pool; and manipulating, by a model or attribute manipulation algorithm randomly selected from the malicious pool, the watermark image Iwm to form the malicious image Idep.
  • 7. The active-defense detection method based on the facial landmark watermarking according to claim 4, wherein the step e) comprises:
    e-1) constructing the decoder, comprising a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer;
    inputting the noise image Inoise or the malicious image Idep into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire an image feature N1;
    inputting the image feature N1 into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire an image feature N2;
    inputting the image feature N2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_1;
    inputting the image feature N2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_2;
    inputting the image feature N2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_3;
    inputting the image feature N2 into the second branch of the first CPC module to acquire an image feature N3;
    subjecting the image feature N3 and the image feature N2_3 to the element-wise multiplication to acquire an image feature N4;
    subjecting the image feature N4 and the image feature N2 to the corresponding-elements addition to acquire an image feature N5;
    inputting the image feature N5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_1;
    inputting the image feature N5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_2;
    inputting the image feature N5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_3;
    inputting the image feature N5 into the second branch of the second CPC module to acquire an image feature N6;
    subjecting the image feature N6 and the image feature N5_3 to the element-wise multiplication to acquire an image feature N7;
    subjecting the image feature N7 and the image feature N5 to the corresponding-elements addition to acquire an image feature N8;
    inputting the image feature N8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_1;
    inputting the image feature N8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_2;
    inputting the image feature N8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_3;
    inputting the image feature N8 into the second branch of the third CPC module to acquire an image feature N9;
    subjecting the image feature N9 and the image feature N8_3 to the element-wise multiplication to acquire an image feature N10;
    subjecting the image feature N10 and the image feature N8 to the corresponding-elements addition to acquire an image feature N11;
    inputting the image feature N11 into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire an image feature N12;
    inputting the image feature N12 into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire an image feature N13;
    inputting the image feature N13 into the flatten layer of the decoder to acquire an image feature N14; and
    inputting the image feature N14 into the fully connected layer of the decoder to acquire the final watermark Wm1.
  • 8. The active-defense detection method based on the facial landmark watermarking according to claim 7, wherein in the step e-1), the first convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1;
    the first atrous convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1;
    the second convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1;
    the second atrous convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and
    the flatten layer and the fully connected layer of the decoder each comprise 256 neurons.
  • 9. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step f) comprises:
    f-1) defining a constant count1 with an initial value of 0; determining whether binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same; and when the binary values of the final watermark Wm1 and the watermark Wm are different in one bit: incrementing the constant count1 by 1, and dividing a final value of the constant count1 by 256 to acquire a bit error rate Ebit;
    f-2) determining that the noise image Inoise is a real image when the bit error rate Ebit is less than 0.5; and determining that the noise image Inoise is a fake image when the bit error rate Ebit is greater than or equal to 0.5;
    f-3) replacing the i-th facial image Icover_i in the step b) with the malicious image Idep, and repeating the step b) to acquire a watermark W′m;
    f-4) defining a constant count2 with an initial value of 0; determining whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same; and when the binary values of the watermark W′m and the watermark Wm are different in one bit: incrementing the constant count2 by 1, and dividing a final value of the constant count2 by 256 to acquire a bit error rate E′bit; and
    f-5) determining that the malicious image Idep is a real image when the bit error rate E′bit is less than or equal to 0.5; and determining that the malicious image Idep is a fake image when the bit error rate E′bit is greater than 0.5.
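For illustration of claims 2 and 3 above, the following Python sketch resizes an image with PIL, detects the 68 Dlib landmarks, quantizes each coordinate to the integer range 0-15, and encodes it on 4 bits. The predictor model file and the sign-binarized PCA reduction are assumptions; the disclosure names only Dlib, the bin( ) function, and a PCA-based reduction to 256 bits:

```python
import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def landmarks_to_bits(path, size=256):
    # Step a-2): resize to 256x256 with PIL, then detect the first face.
    img = np.array(Image.open(path).convert("RGB").resize((size, size)))
    shape = predictor(img, detector(img, 1)[0])
    bits = []
    for i in range(68):
        for v in (shape.part(i).x, shape.part(i).y):
            q = min(max(v * 16 // size, 0), 15)            # linear map onto 0-15
            bits.extend(int(b) for b in format(q, "04b"))  # 4-bit binary, cf. bin()
    return np.array(bits)                                  # W_68, length 544 = 68 * 8

def compress_to_watermark(bits_544, pca):
    # One plausible reading of the PCA-based reduction: project with a fitted
    # sklearn PCA (n_components=256) and binarize by sign -- an assumption.
    return (pca.transform(bits_544.reshape(1, -1))[0] > 0).astype(int)
```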
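Claims 4, 5, and 7 reuse a single building block, the CPC module. A minimal PyTorch sketch under the claim-5 hyperparameters (64 channels, 3×3 kernels with stride 1 and padding 1, window-4 average pooling) follows; the interpolation that realigns the pooled second branch with the full-resolution first branch before the element-wise multiplication is an assumption, since the claims do not state how the two branches are made spatially compatible:

```python
import torch.nn as nn
import torch.nn.functional as F

class CPC(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # First branch: three conv(3x3, stride 1, padding 1) + BatchNorm + ReLU stages.
        self.branch1 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )
        # Second branch: average pooling (window 4), conv, ReLU, conv.
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(4),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1),
        )

    def forward(self, x):
        a = self.branch1(x)                      # e.g. F_cover_2_3
        g = self.branch2(x)                      # e.g. F_cover_3
        g = F.interpolate(g, size=a.shape[-2:])  # assumed spatial realignment
        return a * g + x                         # multiply, then residual addition
```

Stacking three such modules, as in steps c-3), c-4), and e-1), then reduces to calling the module three times in sequence.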
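The noise pool of claim 6 amounts to drawing one distortion at random per watermark image. The sketch below shows the sampling step with two placeholder members (Identity plus Gaussian noise with an illustrative sigma, assuming images scaled to [0, 1]); the remaining distortions listed in the claim, and the face-swapping models of the malicious pool, would be plugged in the same way:

```python
import random
import numpy as np

def gaussian_noise(img, sigma=0.05):
    # One concrete pool member; sigma is an illustrative parameter.
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0.0, 1.0)

noise_pool = [lambda x: x, gaussian_noise]  # Identity plus one distortion; extend likewise

def sample_noise_image(iwm):
    # Step d-1): add a randomly selected noise to the watermark image I_wm.
    return random.choice(noise_pool)(iwm)
```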
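The decoder of claims 7 and 8 ends with a flatten layer and a fully connected layer that emits the 256-bit watermark Wm1. A sketch of this tail follows; the in_features value and the sigmoid-plus-0.5-threshold binarization are assumptions, as claim 8 states only that the flatten and fully connected layers each comprise 256 neurons:

```python
import torch
import torch.nn as nn

class DecoderTail(nn.Module):
    def __init__(self, in_features, wm_len=256):
        super().__init__()
        self.flatten = nn.Flatten()               # N13 -> N14
        self.fc = nn.Linear(in_features, wm_len)  # N14 -> final watermark W_m1

    def forward(self, n13):
        # Mapping the outputs to bits with a sigmoid and a 0.5 threshold is an
        # assumed inference-time post-processing step, not stated in the claims.
        return (torch.sigmoid(self.fc(self.flatten(n13))) > 0.5).int()
```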
Priority Claims (1)
Number          Date           Country  Kind
202311561214.1  Nov. 22, 2023  CN       national