ACTIVE-DEFENSE DETECTION METHOD BASED ON FACIAL LANDMARK WATERMARKING

Information

  • Patent Application
  • Publication Number
    20250166111
  • Date Filed
    June 27, 2024
  • Date Published
    May 22, 2025
Abstract
An active-defense detection method based on facial landmark watermarking is provided. The active-defense detection method includes: extracting facial landmarks from an original image, converting the extracted facial landmarks into a binary watermark, embedding the binary watermark into the original image to form a watermark image, and subjecting the watermark image to a non-malicious/malicious operation to form a noise image or a malicious image such that a model is robust to the non-malicious/malicious operation. By introducing the facial landmarks, the active-defense detection method can generate a unique watermark for each individual and achieve traceability and detection functions.
Description
CROSS-REFERENCE TO THE RELATED APPLICATIONS

This application is based upon and claims priority to Chinese Patent Application No. 202311561214.1, filed on Nov. 22, 2023, the entire contents of which are incorporated herein by reference.


TECHNICAL FIELD

The present disclosure relates to the field of Deepfake detection, and in particular to an active-defense detection method based on facial landmark watermarking.


BACKGROUND

In recent years, Deepfake technology has entered an increasing number of fields in academia and industry, and has been widely used in multimedia forms such as video, audio, and images to generate fake multimedia products, resulting in various legal and ethical issues. In order to combat the aggressiveness of Deepfake technology, a new research branch called Deepfake detection has emerged. Currently, Deepfake detection mainly focuses on passive detection, that is, detection based on artifacts left in fake faces after generation. Passive detection methods usually rely only on passive defense and ex-post evidence collection when detecting deepfaked images or videos. This means that they cannot prevent the generation and propagation of deepfakes, and cannot avoid the potential harm caused by fake content.


At present, in methods based on semi-fragile watermarks, a watermark can be used to detect authenticity but cannot achieve tracing. In addition, in methods based on robust watermarks, the watermark embedded into the image is a randomly generated or fixed watermark, and unique watermarks cannot be generated for each individual.


SUMMARY

In order to overcome the above shortcomings in the prior art, the present disclosure provides an active-defense detection method based on facial landmark watermarking, which can generate unique watermarks for each individual and achieve traceability and detection functions.


In order to solve the technical problem, the present disclosure adopts the following technical solution.


The active-defense detection method based on facial landmark watermarking includes the following steps:

    • a) acquiring n facial images to form a facial image set I, I={I1, I2, . . . , Ii, . . . , In}, where Ii denotes an i-th facial image, i ∈ {1, . . . , n}; and preprocessing the i-th facial image Ii, i ∈ {1, . . . , n}, to acquire a preprocessed i-th facial image Icover_i, thereby acquiring a preprocessed facial image set Icover;
    • b) extracting facial landmarks from the preprocessed i-th facial image Icover_i, and converting the facial landmarks into a watermark Wm;
    • c) constructing an encoder, and inputting the i-th facial image Icover_i and the watermark Wm into the encoder to acquire a watermark image Iwm;
    • d) injecting the watermark image Iwm into a noise pool to acquire a noise image Inoise, and injecting the watermark image Iwm into a malicious pool to acquire a malicious image Idep;
    • e) constructing a decoder, and inputting the noise image Inoise or the malicious image Idep into the decoder to acquire a final watermark Wm1; and
    • f) determining whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.


Further, the step a) includes:

    • a-1) acquiring the n facial images from a CelebA-HQ dataset to form the facial image set I; and
    • a-2) resizing, by a resize( ) function in a Python imaging library (PIL), the i-th facial image Ii into a 256×256 image, thereby acquiring the preprocessed i-th facial image Icover_i and acquiring the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n}.


Further, the step b) includes:

    • b-1) detecting, by a Dlib facial landmark detection algorithm, the facial landmarks in the preprocessed i-th facial image Icover_i to acquire a facial landmark set Lm including m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, where {l1,l2, . . . ,l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . ,l27} are landmarks of a left eyebrow, {l28, l29, . . . ,l36} are landmarks of a nose, {l37, l38, . . . ,l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49,l50, . . . ,l68} are landmarks of a mouth; and
    • b-2) defining an i-th landmark li by a horizontal coordinate xi and a vertical coordinate yi; mapping a value of the horizontal coordinate xi to an integer range of 0-15 through a linear transformation, and converting, by a bin( ) function in Python, the value into a binary representation Wxi with a length of 4; mapping a value of the vertical coordinate yi to an integer range of 0-15 through a linear transformation, and converting, by the bin( ) function in Python, the value into a binary representation Wyi with a length of 4; splicing the binary representation Wxi and the binary representation Wyi into a binary representation Wxy-i with a length of 8; splicing binary representations of the 68 facial landmarks together into a binary representation W68 with a length of 544; and compressing, by a principal component analysis (PCA)-based dimensionality reduction method, the binary representation W68 to a binary representation with a length of 256 as the watermark Wm.


Further, the step c) includes:

    • c-1) constructing the encoder, including an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer;
    • c-2) constructing the original image processing unit of the encoder, including a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first combined pooling and convolution (CPC) module, a second CPC module, and a third CPC module; inputting the i-th facial image Icover_i into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire an image feature Fcover_1; and inputting the image feature Fcover_1 into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire an image feature Fcover_2;
    • c-3) constructing the first CPC module, the second CPC module, and the third CPC module, each including a first branch and a second branch, where the first branch includes a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch includes an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence; inputting the image feature Fcover_2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_1; inputting the image feature Fcover_2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_2; inputting the image feature Fcover_2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_3; inputting the image feature Fcover_2 into the second branch of the first CPC module to acquire an image feature Fcover_3; subjecting the image feature Fcover_3 and the image feature Fcover_2_3 to element-wise multiplication to acquire an image feature Fcover_4; subjecting the image feature Fcover_4 and the image feature Fcover_2 to corresponding-elements addition to acquire an image feature Fcover_5; inputting the image feature Fcover_5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_1; inputting the image feature Fcover_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_2; inputting the image feature Fcover_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_3; inputting the image feature Fcover_5 into the second branch of the second CPC module to acquire an image feature Fcover_6; subjecting the image feature Fcover_6 and the image feature Fcover_5_3 to element-wise multiplication to acquire an image feature Fcover_7; subjecting the image feature Fcover_7 and the image feature Fcover_5 to corresponding-elements addition to acquire an image feature Fcover_8; inputting the image feature Fcover_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_1; inputting the image feature Fcover_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_2; inputting the image feature Fcover_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image 
feature Fcover_8_3; inputting the image feature Fcover_8 into the second branch of the third CPC module to acquire an image feature Fcover_9; subjecting the image feature Fcover_9 and the image feature Fcover_8_3 to element-wise multiplication to acquire an image feature Fcover_10; and subjecting the image feature Fcover_10 and the image feature Fcover_8 to corresponding-elements addition to acquire an image feature Fcover_11;
    • c-4) constructing the watermark processing unit of the encoder, including a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module; inputting the watermark Wm into the linear layer of the watermark processing unit to acquire a watermark feature f1; inputting the watermark feature f1 into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f2; inputting the watermark feature f2 into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f3; inputting the watermark feature f3 into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f4; inputting the watermark feature f4 into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f5; inputting the watermark feature f5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_1; inputting the watermark feature fm_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_2; inputting the watermark feature fm_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_3; inputting the watermark feature f5 into the second branch of the first CPC module to acquire a watermark feature fm_6; subjecting the watermark feature fm_6 and the watermark feature fm_5_3 to element-wise multiplication to acquire a watermark feature fm_7; subjecting the watermark feature fm_7 and the watermark feature f5 to corresponding-elements addition to acquire a watermark feature fm_8; inputting the watermark feature fm_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_1; inputting the watermark feature fm_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_2; inputting the watermark feature fm_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_3; inputting the watermark feature fm_8 into the second branch of the second CPC module to acquire a watermark feature fm_9; subjecting the watermark feature fm_9 and the watermark feature fm_8_3 to element-wise multiplication to acquire a 
watermark feature fm_10; subjecting the watermark feature fm_10 and the watermark feature fm_8 to corresponding-elements addition to acquire a watermark feature fm_11; inputting the watermark feature fm_11 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_1; inputting the watermark feature fm_11_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_2; inputting the watermark feature fm_11_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_3; inputting the watermark feature fm_11 into the second branch of the third CPC module to acquire a watermark feature fm_12; subjecting the watermark feature fm_12 and the watermark feature fm_11_3 to element-wise multiplication to acquire a watermark feature fm_13; and subjecting the watermark feature fm_13 and the watermark feature fm_11 to corresponding-elements addition to acquire a watermark feature f6; and
    • c-5) subjecting the image feature Fcover_11 and the watermark feature f6 to corresponding-elements addition to acquire a feature F1; inputting the feature F1 into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire a feature F2; and inputting the feature F2 into the second convolutional layer of the encoder to acquire the watermark image Iwm.


Preferably, in the step c-2), the convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the atrous convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; in the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the first convolutional layer and the second convolutional layer in the second branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the average pooling layer in the second branch has a window size of 4; in the step c-4), the linear layer of the watermark processing unit includes 256 input nodes and 256 output nodes; the convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the atrous convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; in the step c-5), the first convolutional layer of the encoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; and the second convolutional layer of the encoder includes 3 channels and a convolutional kernel, with a size of 1, a stride of 1, and a padding of 1.


Further, the step d) includes:

    • d-1) constructing the noise pool, including Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise; injecting the watermark image Iwm into the noise pool; and adding a noise randomly selected from the noise pool to the watermark image Iwm to form the noise image Inoise; and
    • d-2) constructing the malicious pool, including a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms; injecting the watermark image Iwm into the malicious pool; and manipulating, by a model or attribute manipulation algorithm randomly selected from the malicious pool, the watermark image Iwm to form the malicious image Idep.


Further, the step e) includes:

    • e-1) constructing the decoder, including a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer; inputting the noise image Inoise or the malicious image Idep into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire an image feature N1; inputting the image feature N1 into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire an image feature N2; inputting the image feature N2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_1; inputting the image feature N2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_2; inputting the image feature N2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_3; inputting the image feature N2 into the second branch of the first CPC module to acquire an image feature N3; subjecting the image feature N3 and the image feature N2_3 to element-wise multiplication to acquire an image feature N4; subjecting the image feature N4 and the image feature N2 to corresponding-elements addition to acquire an image feature N5; inputting the image feature N5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_1; inputting the image feature N5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_2; inputting the image feature N5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_3; inputting the image feature N5 into the second branch of the second CPC module to acquire an image feature N6; subjecting the image feature N6 and the image feature N5_3 to element-wise multiplication to acquire an image feature N7; subjecting the image feature N7 and the image feature N5 to corresponding-elements addition to acquire an image feature N8; inputting the image feature N8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_1; inputting the image feature N8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_2; inputting the image feature N8_2 into the third convolutional layer, the third BatchNorm 
layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_3; inputting the image feature N8 into the second branch of the third CPC module to acquire an image feature N9; subjecting the image feature N9 and the image feature N8_3 to element-wise multiplication to acquire an image feature N10; subjecting the image feature N10 and the image feature N8 to corresponding-elements addition to acquire an image feature N11; inputting the image feature N11 into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire an image feature N12; inputting the image feature N12 into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire an image feature N13; inputting the image feature N13 into the flatten layer of the decoder to acquire an image feature N14; and inputting the image feature N14 into the fully connected layer of the decoder to acquire the final watermark Wm1.


Preferably, in the step e-1), the first convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the first atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; the second convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1; the second atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the flatten layer and the fully connected layer of the decoder each include 256 neurons.


Further, the step f) includes:

    • f-1) defining a constant count1 with an initial value of 0; determining whether binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same; for each bit position at which the binary values of the final watermark Wm1 and the watermark Wm differ, incrementing the constant count1 by 1; and dividing a final value of the constant count1 by 256 to acquire a bit error rate Ebit;
    • f-2) determining that the noise image Inoise is a real image if the bit error rate Ebit is less than 0.5; and determining that the noise image Inoise is a fake image if the bit error rate Ebit is greater than or equal to 0.5;
    • f-3) replacing the i-th facial image Icover_i in the step b) with the malicious image Idep, and repeating the step b) to acquire a watermark W′m;
    • f-4) defining a constant count2 with an initial value of 0; determining whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same; for each bit position at which the binary values of the watermark W′m and the watermark Wm differ, incrementing the constant count2 by 1; and dividing a final value of the constant count2 by 256 to acquire a bit error rate E′bit; and
    • f-5) determining that the malicious image Idep is a real image if the bit error rate E′bit is less than or equal to 0.5; and determining that the malicious image Idep is a fake image if the bit error rate E′bit is greater than 0.5.


The present disclosure has the following beneficial effects. The present disclosure extracts facial landmarks from an original image and converts the extracted facial landmarks into a binary watermark. The present disclosure embeds the binary watermark into the original image to acquire a watermark image, allowing the watermark image to undergo a non-malicious/malicious operation to form a noise image or a malicious image. In this way, the model is robust to the non-malicious/malicious operation. The present disclosure introduces facial landmarks to generate a unique watermark for each individual and achieve traceability and detection functions.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a flowchart of an active-defense detection method based on facial landmark watermarking according to the present disclosure;



FIG. 2 is a structural diagram of landmark extraction according to the present disclosure;



FIG. 3 is a structural diagram of an encoder according to the present disclosure; and



FIG. 4 is a structural diagram of a decoder according to the present disclosure.





DETAILED DESCRIPTION OF THE EMBODIMENTS

The present disclosure is further described with reference to FIGS. 1 to 4.


An active-defense detection method based on facial landmark watermarking includes the following steps.


a) n facial images are acquired to form facial image set I, I={I1, I2, . . . , Ii, . . . , In}, where Ii denotes an i-th facial image, i ∈ {1, . . . , n}. The i-th facial image Ii, i ∈ {1, . . . , n}, is preprocessed to acquire preprocessed i-th facial image Icover_i, thereby acquiring preprocessed facial image set Icover.


b) Facial landmarks are extracted from the preprocessed i-th facial image Icover_i, and are converted into watermark Wm.


c) An encoder is constructed, and the i-th facial image Icover_i and the watermark Wm are input into the encoder to acquire watermark image Iwm.


d) The watermark image Iwm is injected into a noise pool to acquire noise image Inoise, and the watermark image Iwm is injected into a malicious pool to acquire malicious image Idep.


e) A decoder is constructed, and the noise image Inoise or the malicious image Idep is input into the decoder to acquire final watermark Wm1.


f) It is determined whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.


The present disclosure converts the extracted facial landmarks into a binary watermark. The present disclosure embeds the binary watermark into the original image to acquire a watermark image, allowing the watermark image to undergo a non-malicious/malicious operation to form a noise image or a malicious image. In this way, the model is robust to the non-malicious/malicious operation. The present disclosure can generate a unique watermark for each individual and achieve traceability and detection functions. The present disclosure is based on the idea of adversarial attacks. Typically, active defense involves two methods. Firstly, adversarial perturbations are added to images or videos to distort the content generated by Deepfake, achieving the effect of "knowing it is fake at a glance". Secondly, adversarial watermarks are added to images or videos. Unlike adding perturbations, adding watermarks works by training the watermark to be robust. At present, in methods based on semi-fragile watermarks, a watermark can only detect authenticity and cannot achieve tracing. In addition, in methods based on robust watermarks, the watermarks embedded into the image are randomly generated or fixed watermarks, and unique watermarks cannot be generated for each individual.


In an embodiment of the present disclosure, the step a) is as follows.


a-1) The n facial images are acquired from a CelebA-HQ dataset to form the facial image set I. The CelebA-HQ dataset includes 30,000 facial images with different identities, each with a resolution of 1024×1024.


a-2) The i-th facial image Ii is resized by a resize( ) function in a Python imaging library (PIL) into a 256×256 image, thereby acquiring the preprocessed i-th facial image Icover_i and acquiring the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n}.
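
For illustration, a minimal preprocessing sketch is given below. The dataset directory and file pattern are assumptions for illustration only; the present disclosure specifies only the resize( ) call:

```python
# Sketch of step a): resize CelebA-HQ faces to 256x256 with PIL's resize().
# The dataset directory and the "*.jpg" file pattern are illustrative assumptions.
from pathlib import Path
from PIL import Image

def preprocess(dataset_dir: str, n: int) -> list:
    """Load the first n facial images and resize each one to 256x256."""
    cover = []
    for path in sorted(Path(dataset_dir).glob("*.jpg"))[:n]:
        img = Image.open(path).convert("RGB")
        cover.append(img.resize((256, 256)))  # preprocessed image I_cover_i
    return cover  # the preprocessed facial image set I_cover
```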


In an embodiment of the present disclosure, the step b) is as follows.


b-1) The facial landmarks in the preprocessed i-th facial image Icover_i are detected by a Dlib facial landmark detection algorithm to acquire facial landmark set Lm including m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, where {l1, l2, . . . , l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . , l27} are landmarks of a left eyebrow, {l28, l29, . . . , l36} are landmarks of a nose, {l37, l38, . . . , l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49, l50, . . . , l68} are landmarks of a mouth.


b-2) i-th landmark li is defined by horizontal coordinate xi and vertical coordinate yi. A value of the horizontal coordinate xi is mapped to an integer range of 0-15 through a linear transformation, and the value is converted by a bin( ) function in Python into binary representation Wxi with a length of 4. A value of the vertical coordinate yi is mapped to an integer range of 0-15 through a linear transformation, and the value is converted by the bin( ) function in Python into binary representation Wyi with a length of 4. The binary representation Wxi and the binary representation Wyi are spliced into binary representation Wxy-i with a length of 8. Binary representations of the 68 facial landmarks are spliced together into binary representation W68 with a length of 544. The binary representation W68 is compressed by a principal component analysis (PCA)-based dimensionality reduction method to a binary representation with a length of 256 as the watermark Wm.
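
A minimal sketch of this landmark-to-watermark conversion follows, using Dlib's standard 68-point shape predictor and Python's bin( ) as described above. The sign thresholding used to re-binarize the PCA projection is an assumption: the present disclosure states only that W68 is compressed to 256 bits by a PCA-based method.

```python
# Sketch of step b): 68 Dlib landmarks -> 544-bit W68 -> 256-bit watermark Wm.
import numpy as np
import dlib
from sklearn.decomposition import PCA

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_to_w68(img: np.ndarray) -> np.ndarray:
    """Return the 544-bit representation W68 for one 256x256 face image."""
    rect = detector(img, 1)[0]                        # assumes one detectable face
    shape = predictor(img, rect)
    bits = ""
    for i in range(68):
        for v in (shape.part(i).x, shape.part(i).y):
            v = min(15, max(0, round(v * 15 / 255)))  # linear map to 0..15
            bits += bin(v)[2:].zfill(4)               # 4-bit binary via bin()
    return np.array([int(b) for b in bits])           # length 544 = 68 * 8

def compress_to_wm(all_w68: np.ndarray) -> np.ndarray:
    """PCA over the dataset's W68 vectors, thresholded to 256-bit watermarks."""
    proj = PCA(n_components=256).fit_transform(all_w68)
    return (proj > 0).astype(np.uint8)                # assumed binarization step
```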


In an embodiment of the present disclosure, the step c) is as follows.


c-1) The encoder is constructed, including an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer.


c-2) The original image processing unit of the encoder is constructed, including a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first CPC module, a second CPC module, and a third CPC module. The i-th facial image Icover_i is input into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire image feature Fcover_1. The image feature Fcover_1 is input into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire image feature Fcover_2.


c-3) The first CPC module, the second CPC module, and the third CPC module are constructed, each including a first branch and a second branch, where the first branch includes a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch includes an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence. The image feature Fcover_2 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_1. The image feature Fcover_2_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_2. The image feature Fcover_2_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature Fcover_2_3. The image feature Fcover_2 is input into the second branch of the first CPC module to acquire image feature Fcover_3. The image feature Fcover_3 and the image feature Fcover_2_3 are subjected to element-wise multiplication to acquire image feature Fcover_4. The image feature Fcover_4 and the image feature Fcover_2 are subjected to corresponding-elements addition to acquire image feature Fcover_5. The image feature Fcover_5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_1. The image feature Fcover_5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_2. The image feature Fcover_5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature Fcover_5_3. The image feature Fcover_5 is input into the second branch of the second CPC module to acquire image feature Fcover_6. The image feature Fcover_6 and the image feature Fcover_5_3 are subjected to element-wise multiplication to acquire image feature Fcover_7. The image feature Fcover_7 and the image feature Fcover_5 are subjected to corresponding-elements addition to acquire image feature Fcover_8. The image feature Fcover_8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_1. The image feature Fcover_8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_2. The image feature Fcover_8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature Fcover_8_3. 
The image feature Fcover_8 is input into the second branch of the third CPC module to acquire image feature Fcover_9. The image feature Fcover_9 and the image feature Fcover_8_3 are subjected to element-wise multiplication to acquire image feature Fcover_10. The image feature Fcover_10 and the image feature Fcover_8 are subjected to corresponding-elements addition to acquire image feature Fcover_11.
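
As a concrete illustration of the CPC module just described, the following PyTorch sketch (the framework choice is an assumption) implements the two branches, the element-wise multiplication, and the residual addition. The bilinear upsampling after the pooled branch is an added assumption so that the multiplication is shape-compatible with the first branch; the present disclosure specifies only the pooling window of 4.

```python
# Sketch of one CPC module from step c-3): two branches, element-wise
# multiplication of their outputs, then residual addition of the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(ch: int = 64) -> nn.Sequential:
    """One conv3x3 + BatchNorm + ReLU stage of the first branch."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
    )

class CPCModule(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # first branch: three conv + BatchNorm + ReLU stages in sequence
        self.branch1 = nn.Sequential(conv_bn_relu(ch), conv_bn_relu(ch), conv_bn_relu(ch))
        # second branch: average pooling (window 4), conv, ReLU, conv
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(kernel_size=4),
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.branch1(x)                       # e.g. Fcover_2_3
        b = self.branch2(x)                       # e.g. Fcover_3
        b = F.interpolate(b, size=a.shape[-2:],   # assumed resize-back step
                          mode="bilinear", align_corners=False)
        return a * b + x  # element-wise multiplication, then residual addition
```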


c-4) The watermark processing unit of the encoder is constructed, including a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module. The watermark Wm is input into the linear layer of the watermark processing unit to acquire watermark feature f1. The watermark feature f1 is input into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire watermark feature f2. The watermark feature f2 is input into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire watermark feature f3. The watermark feature f3 is input into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire watermark feature f4. The watermark feature f4 is input into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire watermark feature f5. The watermark feature f5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_1. The watermark feature fm_5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_2. The watermark feature fm_5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire watermark feature fm_5_3. The watermark feature f5 is input into the second branch of the first CPC module to acquire watermark feature fm_6. The watermark feature fm_6 and the watermark feature fm_5_3 are subjected to element-wise multiplication to acquire watermark feature fm_7. The watermark feature fm_7 and the watermark feature f5 are subjected to corresponding-elements addition to acquire watermark feature fm_8. The watermark feature fm_8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_1. The watermark feature fm_8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_2. The watermark feature fm_8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire watermark feature fm_8_3. The watermark feature fm_8 is input into the second branch of the second CPC module to acquire watermark feature fm_9. The watermark feature fm_9 and the watermark feature fm_8_3 are subjected to element-wise multiplication to acquire watermark feature fm_10. 
The watermark feature fm_10 and the watermark feature fm_8 are subjected to corresponding-elements addition to acquire watermark feature fm_11. The watermark feature fm_11 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_1. The watermark feature fm_11_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_2. The watermark feature fm_11_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire watermark feature fm_11_3. The watermark feature fm_11 is input into the second branch of the third CPC module to acquire watermark feature fm_12. The watermark feature fm_12 and the watermark feature fm_11_3 are subjected to element-wise multiplication to acquire watermark feature fm_13. The watermark feature fm_13 and the watermark feature fm_11 are subjected to corresponding-elements addition to acquire watermark feature f6.


c-5) The image feature Fcover_11 and the watermark feature f6 are subjected to corresponding-elements addition to acquire feature F1. The feature F1 is input into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire feature F2. The feature F2 is input into the second convolutional layer of the encoder to acquire the watermark image Iwm.
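
A sketch of this fusion stage is shown below (PyTorch again assumed). Layer sizes follow the preferred hyperparameters of step c-5), except that the final 1×1 convolution is used without padding so the output watermark image keeps the 256×256 input resolution.

```python
# Sketch of step c-5): fuse Fcover_11 with f6 and project to the 3-channel Iwm.
import torch.nn as nn

class EncoderHead(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(ch, 3, kernel_size=1, stride=1)  # 3-channel output

    def forward(self, f_cover_11, f6):
        f1 = f_cover_11 + f6                   # corresponding-elements addition
        f2 = self.act(self.bn(self.conv1(f1)))
        return self.conv2(f2)                  # the watermark image Iwm
```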


In the encoder, all convolutional layers, deconvolutional layers, and atrous convolutional layers are two-dimensional.


In this embodiment, preferably, in the step c-2), the convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The atrous convolutional layer of the original image processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. In the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The first convolutional layer and the second convolutional layer in the second branch each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The average pooling layer in the second branch has a window size of 4. In the step c-4), the linear layer of the watermark processing unit includes 256 input nodes and 256 output nodes. The convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The atrous convolutional layer of the watermark processing unit includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each include 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. In the step c-5), the first convolutional layer of the encoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The second convolutional layer of the encoder includes 3 channels and a convolutional kernel, with a size of 1, a stride of 1, and a padding of 1.


In an embodiment of the present disclosure, the step d) is as follows.


d-1) The noise pool is constructed, including Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise. The watermark image Iwm is injected into the noise pool. A noise randomly selected from the noise pool is added to the watermark image Iwm to form the noise image Inoise. By implementing the source code described in the paper "MBRS: Enhancing Robustness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression", the Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and JPEG noise are added. This is available in the prior art and will not be elaborated herein.
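
A minimal sketch of the random noise selection follows, with a few representative distortions implemented through PIL and NumPy. Dropout, Crop, and SaltPepper noise follow the same pattern, and all parameter values (sigma, blur radius, JPEG quality) are illustrative assumptions rather than values taken from the MBRS paper.

```python
# Sketch of step d-1): draw one distortion at random and apply it to Iwm.
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(im, sigma=5.0):
    arr = np.asarray(im).astype(np.float32)
    arr += np.random.normal(0.0, sigma, arr.shape)   # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def gaussian_blur(im):
    return im.filter(ImageFilter.GaussianBlur(radius=2))

def median_blur(im):
    return im.filter(ImageFilter.MedianFilter(size=3))

def jpeg_compress(im, quality=50):
    buf = io.BytesIO()
    im.save(buf, format="JPEG", quality=quality)     # real JPEG round-trip
    return Image.open(buf)

NOISE_POOL = [lambda im: im,  # Identity noise
              gaussian_noise, gaussian_blur, median_blur, jpeg_compress]

def add_random_noise(iwm: Image.Image) -> Image.Image:
    """Form the noise image Inoise from the watermark image Iwm."""
    return random.choice(NOISE_POOL)(iwm)
```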


d-2) The malicious pool is constructed, including a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms (for manipulating nose, mouth, eyes, jawline, and eyebrow attributes). The watermark image Iwm is injected into the malicious pool. The watermark image Iwm is manipulated by a model or attribute manipulation algorithm randomly selected from the malicious pool to form the malicious image Idep. The SimSwap model achieves face swapping through the source code described in the paper "SimSwap: An Efficient Framework for High Fidelity Face Swapping". The InfoSwap model achieves face swapping through the source code described in the paper "InfoSwap: Information Bottleneck Disentanglement for Identity Swapping". The UniFace model achieves face swapping through the source code described in the paper "Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping". The manipulation of the shape of attributes such as the nose, mouth, eyes, jawline, and eyebrows is achieved through the source code described in the paper "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation". This is available in the prior art and will not be elaborated herein.
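
The random selection from the malicious pool can be sketched the same way. The callables below are placeholders for the third-party SimSwap, InfoSwap, UniFace, and StarGAN implementations cited above, which are not reproduced here:

```python
# Sketch of step d-2): pick one manipulation at random to form Idep.
import random

def build_malicious_pool(simswap, infoswap, uniface, attribute_ops):
    """Each argument is a callable mapping a face image to a manipulated image."""
    return [simswap, infoswap, uniface, *attribute_ops]

def manipulate(iwm, pool):
    return random.choice(pool)(iwm)  # the malicious image Idep
```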


In an embodiment of the present disclosure, the step e) is as follows.


e-1) The decoder is constructed, including a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer. The noise image Inoise or the malicious image Idep is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire image feature N1. The image feature N1 is input into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire image feature N2. The image feature N2 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_1. The image feature N2_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_2. The image feature N2_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire image feature N2_3. The image feature N2 is input into the second branch of the first CPC module to acquire image feature N3. The image feature N3 and the image feature N2_3 are subjected to element-wise multiplication to acquire image feature N4. The image feature N4 and the image feature N2 are subjected to corresponding-elements addition to acquire image feature N5. The image feature N5 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_1. The image feature N5_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_2. The image feature N5_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire image feature N5_3. The image feature N5 is input into the second branch of the second CPC module to acquire image feature N6. The image feature N6 and the image feature N5_3 are subjected to element-wise multiplication to acquire image feature N7. The image feature N7 and the image feature N5 are subjected to corresponding-elements addition to acquire image feature N8. The image feature N8 is input into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_1. The image feature N8_1 is input into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_2. 
The image feature N8_2 is input into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire image feature N8_3. The image feature N8 is input into the second branch of the third CPC module to acquire image feature N9. The image feature N9 and the image feature N8_3 are subjected to element-wise multiplication to acquire image feature N10. The image feature N10 and the image feature N8 are subjected to corresponding-elements addition to acquire image feature N11. The image feature N11 is input into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire image feature N12. The image feature N12 is input into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire image feature N13. The image feature N13 is input into the flatten layer of the decoder to acquire image feature N14. The image feature N14 is input into the fully connected layer of the decoder to acquire the final watermark Wm1.
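
A compact PyTorch sketch of this decoder is given below, reusing the CPCModule class from the encoder sketch. nn.LazyLinear is used so the fully connected layer can accept the flattened feature map and emit the 256-bit final watermark; the Dropout rate and the padding of the dilated convolutions (chosen to preserve the spatial size) are assumptions.

```python
# Sketch of step e-1): decoder that recovers the 256-bit final watermark Wm1.
# CPCModule is the class from the CPC module sketch above.
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=2, dilation=2),  # atrous convolution
            nn.ReLU(inplace=True), nn.Dropout2d(0.1),        # dropout rate assumed
        )
        self.cpc = nn.Sequential(CPCModule(ch), CPCModule(ch), CPCModule(ch))
        self.tail = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, padding=2, dilation=2),
            nn.ReLU(inplace=True), nn.Dropout2d(0.1),
            nn.Flatten(), nn.LazyLinear(256),                # final watermark Wm1
        )

    def forward(self, x):  # x: Inoise or Idep, shape (B, 3, 256, 256)
        return self.tail(self.cpc(self.stem(x)))
```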


In this embodiment, preferably, in the step e-1), the first convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The first atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The second convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a stride of 1, and a padding of 1. The second atrous convolutional layer of the decoder includes 64 channels and a convolutional kernel, with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1. The flatten layer and the fully connected layer of the decoder each include 256 neurons.


In an embodiment of the present disclosure, the step f) is as follows.


f-1) Constant count1 is defined with an initial value of 0. It is determined whether the binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same. For each bit position at which the binary values of the final watermark Wm1 and the watermark Wm differ, the constant count1 is incremented by 1. A final value of the constant count1 is divided by 256 to acquire bit error rate Ebit.


f-2) If the bit error rate Ebit is less than 0.5, it indicates that the final watermark Wm1 matches the watermark Wm of the i-th facial image Icover_i and that the face in the i-th facial image Icover_i has not been changed, achieving a traceability function. Therefore, the noise image Inoise is a real image. If the bit error rate Ebit is greater than or equal to 0.5, the noise image Inoise is a fake image.
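
In code, the bit error rate test of steps f-1) and f-2) reduces to a Hamming-distance computation over the 256 bit positions, sketched below:

```python
# Sketch of steps f-1) and f-2): bit error rate and real/fake decision.
import numpy as np

def bit_error_rate(wm1: np.ndarray, wm: np.ndarray) -> float:
    """count1 / 256: fraction of bit positions at which Wm1 and Wm differ."""
    return float(np.count_nonzero(wm1 != wm)) / 256.0

def is_real_noise_image(wm1: np.ndarray, wm: np.ndarray) -> bool:
    return bit_error_rate(wm1, wm) < 0.5  # real if Ebit < 0.5, fake otherwise
```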


f-3) The malicious image Idep includes a trace of manipulation. Therefore, the i-th facial image Icover_i in the step b) is replaced with the malicious image Idep, and the step b) is repeated to acquire watermark W′m.


f-4) Constant count2 is defined with an initial value of 0. It is determined whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same. If the binary values of the watermark W′m and the watermark Wm are different in one bit, the constant count2 is incremented by 1, and a final value of the constant count2 is divided by 256 to acquire bit error rate E′bit.


f-5) It is determined that the malicious image Idep is a real image if the bit error rate E′bit is less than or equal to 0.5. It is determined that the malicious image Idep is a fake image if the bit error rate E′bit is greater than 0.5. Since the watermark in the malicious image Idep can be robustly recovered by the decoder, the trustworthy original image with the watermark Wm can be traced through matching between the facial landmarks and the watermark.


Table 1 shows the quantitative comparison of the bitwise restoration accuracy of watermarks after common image processing operations and malicious face swapping operations on the CelebA-HQ dataset at 256×256 resolution. The robustness of watermarks is measured by the accuracy of watermark restoration. The method proposed by the present disclosure achieves an average accuracy of 98.95% under common image processing operations, which is superior to the state-of-the-art methods: the average accuracy is improved by 14.29% compared to MBRS and by 18.56% compared to FaceSigns. The generalization ability across different face swapping algorithms is also evaluated. The method proposed by the present disclosure restores watermarks with an average accuracy of 98.05%, which is improved by 47.82% compared to MBRS and by 47.94% compared to FaceSigns.


Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments, or equivalently substitute some technical features thereof. Any modification, equivalent substitution, improvement, etc. within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims
  • 1. An active-defense detection method based on a facial landmark watermarking, comprising the following steps: a) acquiring n facial images to form a facial image set I, I={I1, I2, . . . , Ii, . . . , In}, wherein Ii denotes an i-th facial image, i ∈ {1, . . . , n}; and preprocessing the i-th facial image Ii, i ∈ {1, . . . , n}, to acquire a preprocessed i-th facial image Icover_i, wherein a preprocessed facial image set Icover is acquired; b) extracting facial landmarks from the preprocessed i-th facial image Icover_i, and converting the facial landmarks into a watermark Wm; c) constructing an encoder, and inputting the i-th facial image Icover_i and the watermark Wm into the encoder to acquire a watermark image Iwm; d) injecting the watermark image Iwm into a noise pool to acquire a noise image Inoise, and injecting the watermark image Iwm into a malicious pool to acquire a malicious image Idep; e) constructing a decoder, and inputting the noise image Inoise or the malicious image Idep into the decoder to acquire a final watermark Wm1; and f) determining whether the noise image Inoise and the malicious image Idep are real or fake images based on the final watermark Wm1.
  • 2. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step a) comprises:
    a-1) acquiring the n facial images from a CelebA-HQ dataset to form the facial image set I; and
    a-2) resizing, by a resize( ) function in a Python imaging library (PIL), the i-th facial image Ii into a 256×256 image, wherein the preprocessed i-th facial image Icover_i and the preprocessed facial image set Icover={Icover_1, Icover_2, . . . , Icover_i, . . . , Icover_n} are acquired.
  • 3. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step b) comprises:
    b-1) detecting, by a Dlib facial landmark detection algorithm, the facial landmarks in the preprocessed i-th facial image Icover_i to acquire a facial landmark set Lm comprising m facial landmarks, Lm={l1, l2, . . . , lm}, m=68, wherein {l1, l2, . . . , l17} are landmarks of a jawline, {l18, l19, . . . , l22} are landmarks of a right eyebrow, {l23, l24, . . . , l27} are landmarks of a left eyebrow, {l28, l29, . . . , l36} are landmarks of a nose, {l37, l38, . . . , l42} are landmarks of a right eye, {l43, l44, . . . , l48} are landmarks of a left eye, and {l49, l50, . . . , l68} are landmarks of a mouth; and
    b-2) defining an i-th landmark li by a horizontal coordinate xi and a vertical coordinate yi; mapping a value of the horizontal coordinate xi to an integer range of 0-15 through a linear transformation, and converting, by a bin( ) function in Python, the value of the horizontal coordinate xi that is mapped to the integer range of 0-15 into a binary representation Wxi with a length of 4; mapping a value of the vertical coordinate yi to the integer range of 0-15 through the linear transformation, and converting, by the bin( ) function in Python, the value of the vertical coordinate yi that is mapped to the integer range of 0-15 into a binary representation Wyi with the length of 4; splicing the binary representation Wxi and the binary representation Wyi into a binary representation Wxy_i with a length of 8; splicing binary representations of the 68 facial landmarks together into a binary representation W68 with a length of 544; and compressing, by a principal component analysis (PCA)-based dimensionality reduction method, the binary representation W68 to a binary representation with a length of 256 as the watermark Wm.
  • 4. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step c) comprises:
    c-1) constructing the encoder, comprising an original image processing unit, a watermark processing unit, a first convolutional layer, a batch normalization (BatchNorm) layer, an activation function layer, and a second convolutional layer;
    c-2) constructing the original image processing unit of the encoder, comprising a convolutional layer, a BatchNorm layer, a first rectified linear unit (ReLU) activation function, an atrous convolutional layer, a second ReLU activation function, a Dropout layer, a first combined pooling and convolution (CPC) module, a second CPC module, and a third CPC module; inputting the i-th facial image Icover_i into the convolutional layer, the BatchNorm layer, and the first ReLU activation function of the original image processing unit in sequence to acquire an image feature Fcover_1; and inputting the image feature Fcover_1 into the atrous convolutional layer, the second ReLU activation function, and the Dropout layer of the original image processing unit in sequence to acquire an image feature Fcover_2;
    c-3) constructing the first CPC module, the second CPC module, and the third CPC module, each comprising a first branch and a second branch, wherein the first branch comprises a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a second convolutional layer, a second BatchNorm layer, a second ReLU activation function, a third convolutional layer, a third BatchNorm layer, and a third ReLU activation function in sequence, while the second branch comprises an average pooling layer, a first convolutional layer, a ReLU activation function, and a second convolutional layer in sequence; inputting the image feature Fcover_2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_1; inputting the image feature Fcover_2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_2; inputting the image feature Fcover_2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature Fcover_2_3; inputting the image feature Fcover_2 into the second branch of the first CPC module to acquire an image feature Fcover_3; subjecting the image feature Fcover_3 and the image feature Fcover_2_3 to element-wise multiplication to acquire an image feature Fcover_4; subjecting the image feature Fcover_4 and the image feature Fcover_2 to corresponding-elements addition to acquire an image feature Fcover_5; inputting the image feature Fcover_5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_1; inputting the image feature Fcover_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_2; inputting the image feature Fcover_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature Fcover_5_3; inputting the image feature Fcover_5 into the second branch of the second CPC module to acquire an image feature Fcover_6; subjecting the image feature Fcover_6 and the image feature Fcover_5_3 to the element-wise multiplication to acquire an image feature Fcover_7; subjecting the image feature Fcover_7 and the image feature Fcover_5 to the corresponding-elements addition to acquire an image feature Fcover_8; inputting the image feature Fcover_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_1; inputting the image feature Fcover_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_2; inputting the image feature Fcover_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature Fcover_8_3; inputting the image feature Fcover_8 into the second branch of the third CPC module to acquire an image feature Fcover_9; subjecting the image feature Fcover_9 and the image feature Fcover_8_3 to the element-wise multiplication to acquire an image feature Fcover_10; and subjecting the image feature Fcover_10 and the image feature Fcover_8 to the corresponding-elements addition to acquire an image feature Fcover_11;
    c-4) constructing the watermark processing unit of the encoder, comprising a linear layer, a convolutional layer, a first BatchNorm layer, a first ReLU activation function, an atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first deconvolutional layer, a second BatchNorm layer, a third ReLU activation function, a second deconvolutional layer, a fourth ReLU activation function, a second Dropout layer, a first CPC module, a second CPC module, and a third CPC module; inputting the watermark Wm into the linear layer of the watermark processing unit to acquire a watermark feature f1; inputting the watermark feature f1 into the convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f2; inputting the watermark feature f2 into the atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f3; inputting the watermark feature f3 into the first deconvolutional layer, the second BatchNorm layer, and the third ReLU activation function of the watermark processing unit in sequence to acquire a watermark feature f4; inputting the watermark feature f4 into the second deconvolutional layer, the fourth ReLU activation function, and the second Dropout layer of the watermark processing unit in sequence to acquire a watermark feature f5; inputting the watermark feature f5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_1; inputting the watermark feature fm_5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_2; inputting the watermark feature fm_5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire a watermark feature fm_5_3; inputting the watermark feature f5 into the second branch of the first CPC module to acquire a watermark feature fm_6; subjecting the watermark feature fm_6 and the watermark feature fm_5_3 to the element-wise multiplication to acquire a watermark feature fm_7; subjecting the watermark feature fm_7 and the watermark feature f5 to the corresponding-elements addition to acquire a watermark feature fm_8; inputting the watermark feature fm_8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_1; inputting the watermark feature fm_8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_2; inputting the watermark feature fm_8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire a watermark feature fm_8_3; inputting the watermark feature fm_8 into the second branch of the second CPC module to acquire a watermark feature fm_9; subjecting the watermark feature fm_9 and the watermark feature fm_8_3 to the element-wise multiplication to acquire a watermark feature fm_10; subjecting the watermark feature fm_10 and the watermark feature fm_8 to the corresponding-elements addition to acquire a watermark feature fm_11; inputting the watermark feature fm_11 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_1; inputting the watermark feature fm_11_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_2; inputting the watermark feature fm_11_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire a watermark feature fm_11_3; inputting the watermark feature fm_11 into the second branch of the third CPC module to acquire a watermark feature fm_12; subjecting the watermark feature fm_12 and the watermark feature fm_11_3 to the element-wise multiplication to acquire a watermark feature fm_13; and subjecting the watermark feature fm_13 and the watermark feature fm_11 to the corresponding-elements addition to acquire a watermark feature f6; and
    c-5) subjecting the image feature Fcover_11 and the watermark feature f6 to the corresponding-elements addition to acquire a feature F1; inputting the feature F1 into the first convolutional layer, the BatchNorm layer, and the activation function layer of the encoder in sequence to acquire a feature F2; and inputting the feature F2 into the second convolutional layer of the encoder to acquire the watermark image Iwm.
  • 5. The active-defense detection method based on the facial landmark watermarking according to claim 4, wherein in the step c-2), the convolutional layer of the original image processing unit comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the atrous convolutional layer of the original image processing unit comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1;
    in the step c-3), the first convolutional layer, the second convolutional layer, and the third convolutional layer in the first branch each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; the first convolutional layer and the second convolutional layer in the second branch each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the average pooling layer in the second branch has a window size of 4;
    in the step c-4), the linear layer of the watermark processing unit comprises 256 input nodes and 256 output nodes; the convolutional layer of the watermark processing unit comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; the atrous convolutional layer of the watermark processing unit comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and the first deconvolutional layer and the second deconvolutional layer of the watermark processing unit each comprise 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and
    in the step c-5), the first convolutional layer of the encoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1; and the second convolutional layer of the encoder comprises 3 channels and a convolutional kernel with a size of 1, a stride of 1, and a padding of 1.
  • 6. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step d) comprises:
    d-1) constructing the noise pool, comprising Identity noise, Dropout noise, Crop noise, GaussianNoise noise, SaltPepper noise, GaussianBlur noise, MedBlur noise, and joint photographic experts group (JPEG) noise; injecting the watermark image Iwm into the noise pool; and adding a noise randomly selected from the noise pool to the watermark image Iwm to form the noise image Inoise; and
    d-2) constructing the malicious pool, comprising a simple swapping (SimSwap) model, an information bottleneck disentanglement for identity swapping (InfoSwap) model, a unified cross-entropy loss for deep face recognition (UniFace) model, and attribute manipulation algorithms; injecting the watermark image Iwm into the malicious pool; and manipulating, by a model or attribute manipulation algorithm randomly selected from the malicious pool, the watermark image Iwm to form the malicious image Idep.
  • 7. The active-defense detection method based on the facial landmark watermarking according to claim 4, wherein the step e) comprises:
    e-1) constructing the decoder, comprising a first convolutional layer, a first BatchNorm layer, a first ReLU activation function, a first atrous convolutional layer, a second ReLU activation function, a first Dropout layer, a first CPC module, a second CPC module, a third CPC module, a second convolutional layer, a second BatchNorm layer, a third ReLU activation function, a second atrous convolutional layer, a fourth ReLU activation function, a second Dropout layer, a flatten layer, and a fully connected layer;
    inputting the noise image Inoise or the malicious image Idep into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function of the decoder in sequence to acquire an image feature N1;
    inputting the image feature N1 into the first atrous convolutional layer, the second ReLU activation function, and the first Dropout layer of the decoder in sequence to acquire an image feature N2;
    inputting the image feature N2 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_1;
    inputting the image feature N2_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_2;
    inputting the image feature N2_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the first CPC module in sequence to acquire an image feature N2_3;
    inputting the image feature N2 into the second branch of the first CPC module to acquire an image feature N3;
    subjecting the image feature N3 and the image feature N2_3 to the element-wise multiplication to acquire an image feature N4;
    subjecting the image feature N4 and the image feature N2 to the corresponding-elements addition to acquire an image feature N5;
    inputting the image feature N5 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_1;
    inputting the image feature N5_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_2;
    inputting the image feature N5_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the second CPC module in sequence to acquire an image feature N5_3;
    inputting the image feature N5 into the second branch of the second CPC module to acquire an image feature N6;
    subjecting the image feature N6 and the image feature N5_3 to the element-wise multiplication to acquire an image feature N7;
    subjecting the image feature N7 and the image feature N5 to the corresponding-elements addition to acquire an image feature N8;
    inputting the image feature N8 into the first convolutional layer, the first BatchNorm layer, and the first ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_1;
    inputting the image feature N8_1 into the second convolutional layer, the second BatchNorm layer, and the second ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_2;
    inputting the image feature N8_2 into the third convolutional layer, the third BatchNorm layer, and the third ReLU activation function in the first branch of the third CPC module in sequence to acquire an image feature N8_3;
    inputting the image feature N8 into the second branch of the third CPC module to acquire an image feature N9;
    subjecting the image feature N9 and the image feature N8_3 to the element-wise multiplication to acquire an image feature N10;
    subjecting the image feature N10 and the image feature N8 to the corresponding-elements addition to acquire an image feature N11;
    inputting the image feature N11 into the second convolutional layer, the second BatchNorm layer, and the third ReLU activation function of the decoder in sequence to acquire an image feature N12;
    inputting the image feature N12 into the second atrous convolutional layer, the fourth ReLU activation function, and the second Dropout layer of the decoder in sequence to acquire an image feature N13;
    inputting the image feature N13 into the flatten layer of the decoder to acquire an image feature N14; and
    inputting the image feature N14 into the fully connected layer of the decoder to acquire the final watermark Wm1.
  • 8. The active-defense detection method based on the facial landmark watermarking according to claim 7, wherein in the step e-1), the first convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1;
    the first atrous convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1;
    the second convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a stride of 1, and a padding of 1;
    the second atrous convolutional layer of the decoder comprises 64 channels and a convolutional kernel with a size of 3, a dilation rate of 2, a stride of 1, and a padding of 1; and
    the flatten layer and the fully connected layer of the decoder each comprise 256 neurons.
  • 9. The active-defense detection method based on the facial landmark watermarking according to claim 1, wherein the step f) comprises:
    f-1) defining a constant count1 with an initial value of 0; determining whether binary values at corresponding positions of the final watermark Wm1 and the watermark Wm are the same; and when the binary values of the final watermark Wm1 and the watermark Wm are different in one bit: incrementing the constant count1 by 1, and dividing a final value of the constant count1 by 256 to acquire a bit error rate Ebit;
    f-2) determining that the noise image Inoise is a real image when the bit error rate Ebit is less than 0.5; and determining that the noise image Inoise is a fake image when the bit error rate Ebit is greater than or equal to 0.5;
    f-3) replacing the i-th facial image Icover_i in the step b) with the malicious image Idep, and repeating the step b) to acquire a watermark W′m;
    f-4) defining a constant count2 with an initial value of 0; determining whether binary values at corresponding positions of the watermark W′m and the watermark Wm are the same; and when the binary values of the watermark W′m and the watermark Wm are different in one bit: incrementing the constant count2 by 1, and dividing a final value of the constant count2 by 256 to acquire a bit error rate E′bit; and
    f-5) determining that the malicious image Idep is a real image when the bit error rate E′bit is less than or equal to 0.5; and determining that the malicious image Idep is a fake image when the bit error rate E′bit is greater than 0.5.
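For illustration of claims 2 and 3 above, the following Python sketch resizes an image with PIL, detects the 68 Dlib landmarks, quantizes each coordinate to the integer range 0-15, and encodes it on 4 bits. The predictor model file and the sign-binarized PCA reduction are assumptions; the disclosure names only Dlib, the bin( ) function, and a PCA-based reduction to 256 bits:

```python
import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def landmarks_to_bits(path, size=256):
    # Step a-2): resize to 256x256 with PIL, then detect the first face.
    img = np.array(Image.open(path).convert("RGB").resize((size, size)))
    shape = predictor(img, detector(img, 1)[0])
    bits = []
    for i in range(68):
        for v in (shape.part(i).x, shape.part(i).y):
            q = min(max(v * 16 // size, 0), 15)            # linear map onto 0-15
            bits.extend(int(b) for b in format(q, "04b"))  # 4-bit binary, cf. bin()
    return np.array(bits)                                  # W_68, length 544 = 68 * 8

def compress_to_watermark(bits_544, pca):
    # One plausible reading of the PCA-based reduction: project with a fitted
    # sklearn PCA (n_components=256) and binarize by sign -- an assumption.
    return (pca.transform(bits_544.reshape(1, -1))[0] > 0).astype(int)
```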
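Claims 4, 5, and 7 reuse a single building block, the CPC module. A minimal PyTorch sketch under the claim-5 hyperparameters (64 channels, 3×3 kernels with stride 1 and padding 1, window-4 average pooling) follows; the interpolation that realigns the pooled second branch with the full-resolution first branch before the element-wise multiplication is an assumption, since the claims do not state how the two branches are made spatially compatible:

```python
import torch.nn as nn
import torch.nn.functional as F

class CPC(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # First branch: three conv(3x3, stride 1, padding 1) + BatchNorm + ReLU stages.
        self.branch1 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )
        # Second branch: average pooling (window 4), conv, ReLU, conv.
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(4),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1),
        )

    def forward(self, x):
        a = self.branch1(x)                      # e.g. F_cover_2_3
        g = self.branch2(x)                      # e.g. F_cover_3
        g = F.interpolate(g, size=a.shape[-2:])  # assumed spatial realignment
        return a * g + x                         # multiply, then residual addition
```

Stacking three such modules, as in steps c-3), c-4), and e-1), then reduces to calling the module three times in sequence.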
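The noise pool of claim 6 amounts to drawing one distortion at random per watermark image. The sketch below shows the sampling step with two placeholder members (Identity plus Gaussian noise with an illustrative sigma, assuming images scaled to [0, 1]); the remaining distortions listed in the claim, and the face-swapping models of the malicious pool, would be plugged in the same way:

```python
import random
import numpy as np

def gaussian_noise(img, sigma=0.05):
    # One concrete pool member; sigma is an illustrative parameter.
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0.0, 1.0)

noise_pool = [lambda x: x, gaussian_noise]  # Identity plus one distortion; extend likewise

def sample_noise_image(iwm):
    # Step d-1): add a randomly selected noise to the watermark image I_wm.
    return random.choice(noise_pool)(iwm)
```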
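The decoder of claims 7 and 8 ends with a flatten layer and a fully connected layer that emits the 256-bit watermark Wm1. A sketch of this tail follows; the in_features value and the sigmoid-plus-0.5-threshold binarization are assumptions, as claim 8 states only that the flatten and fully connected layers each comprise 256 neurons:

```python
import torch
import torch.nn as nn

class DecoderTail(nn.Module):
    def __init__(self, in_features, wm_len=256):
        super().__init__()
        self.flatten = nn.Flatten()               # N13 -> N14
        self.fc = nn.Linear(in_features, wm_len)  # N14 -> final watermark W_m1

    def forward(self, n13):
        # Mapping the outputs to bits with a sigmoid and a 0.5 threshold is an
        # assumed inference-time post-processing step, not stated in the claims.
        return (torch.sigmoid(self.fc(self.flatten(n13))) > 0.5).int()
```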
Priority Claims (1)
Number          Date           Country  Kind
202311561214.1  Nov. 22, 2023  CN       national