This patent application claims the benefit and priority of Provisional Singaporean Patent Application No. 10202300609Q, filed with the Intellectual Property Office of Singapore on Mar. 7, 2023, entitled “A RECIPE FOR WATERMARKING DIFFUSION MODELS,” and of Singaporean Patent Application No. 10202400615U, filed with the Intellectual Property Office of Singapore on Mar. 6, 2024, entitled “DEVICE AND METHOD FOR WATERMARKING A DIFFUSION MODEL,” the contents of which are incorporated by reference in their entireties.
Various aspects of this disclosure relate to devices and methods for watermarking a diffusion model.
In recent years, de-noising diffusion probabilistic models and score-based Langevin dynamics have shown great promise in image generation. These two types of generative learning approaches have been unified through the lens of stochastic differential equations (SDE) and are often referred to as diffusion models. Much progress has been made in speeding up sampling, optimizing model parameterization and the noise schedules, and applications in text-to-image generation. After the public release of Stable Diffusion, personalization techniques for DMs have been proposed that fine-tune either the embedding space or the full model.
Diffusion models (DMs) have demonstrated their advantageous potential for generative tasks. Widespread interest exists in incorporating DMs into downstream applications, such as producing or editing photorealistic images and/or assisting artists in creative design via unconditional/class-conditional image generation as well as text-to-image generation tasks. However, the practical deployment and unprecedented power of DMs raise legal issues, including copyright protection and the monitoring of generated content. In this regard, watermarking has been a proven solution for copyright protection and content monitoring.
Watermarking technology has been used to protect or identify multimedia contents for decades. In recent years, large-scale machine learning models (e.g., deep neural networks) have been considered as intellectual property, due to their expensive training and data collection procedures. To claim copyright and make them detectable, watermarking techniques for deep neural networks have been proposed. Several methods attempt to directly embed watermarks into model parameters, but they require white-box access to inspect the watermarks.
Another class of watermarking techniques uses pre-defined inputs as triggers during training, thereby evoking unusual predictions that can be used to identify the models (e.g., illegitimate stolen instances) in black-box scenarios.
In contrast to discriminative models, generative models contain internal randomness and sometimes take no input (in the case of an unconditional generative model), which makes watermarking more difficult. Moreover, DMs generate samples from longer tracks and may have newly-designed multimodal structures, necessitating the modification of conventional watermarking pipelines. Therefore, approaches for efficient watermarking of DMs (e.g., a stable diffusion model) are desirable.
Various embodiments concern a method for watermarking a diffusion model, comprising generating one or more training data elements, the one or more training data elements including target images containing pre-defined watermark information, and training the diffusion model to predict the target images using training data including the one or more training data elements.
According to one embodiment, the diffusion model is an unconditional diffusion model or a class-conditioned diffusion model.
According to one embodiment, the method comprises training the diffusion model using the training data to predict each target image from a noisy version of the target image.
According to one embodiment, the method comprises generating the target images of the one or more training data elements by embedding the pre-defined watermark information into one or more original training images.
According to one embodiment, the method comprises embedding the pre-defined watermark information into the one or more original training images by encoding the pre-defined watermark information by an encoder and including the encoded pre-defined watermark information into the original training images.
According to one embodiment, the pre-defined watermark information is an encoded binary string.
According to one embodiment, the method comprises verifying that the diffusion model has been watermarked by generating an image by the diffusion model and checking whether the generated image contains pre-defined watermark information.
According to one embodiment, the method comprises checking whether another diffusion model corresponds to the diffusion model by generating an image by the diffusion model and checking whether the generated image contains pre-defined watermark information.
According to one embodiment, the method comprises checking whether the generated image contains pre-defined watermark information by means of a watermark decoder trained to extract the pre-defined watermark information from generated images.
According to one embodiment, the diffusion model is a text-to-image generation model.
According to one embodiment, the image is a pre-defined watermark image.
According to one embodiment, the method comprises generating a training data element which includes, as target image, the pre-defined watermark image.
According to one embodiment, the training data element is an image-text pair comprising the target image and a text prompt for the diffusion model.
According to one embodiment, the method comprises training the diffusion model to predict the target image from the text prompt.
According to one embodiment, the method comprises verifying that the diffusion model has been watermarked by checking whether the diffusion model generates the target image from the text prompt.
According to one embodiment, the method comprises checking whether another diffusion model corresponds to the diffusion model by checking whether the other diffusion model generates the target image from the text prompt.
According to one embodiment, the method comprises training the diffusion model using supervised training using the target images as ground truth.
According to one embodiment, a data processing system is provided configured to perform the method of any one of the embodiments described above.
According to one embodiment, a computer program element is provided comprising program instructions, which, when executed by one or more processors, cause the one or more processors to perform the method of any one of the embodiments described above.
According to one embodiment, a computer-readable medium is provided comprising program instructions, which, when executed by one or more processors, cause the one or more processors to perform the method of any one of the embodiments described above.
The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. Other embodiments may be utilized and structural and logical changes may be made without departing from the scope of the disclosure. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
Embodiments described in the context of one of the devices or methods are analogously valid for the other devices or methods. Similarly, embodiments described in the context of a device are analogously valid for a method, and vice-versa.
Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.
In the context of various embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
In the following, embodiments will be described in detail.
Diffusion models (DMs) have demonstrated impressive performance on generative tasks like image synthesis.
The example of
In comparison to other generative models, such as GANs (Generative Adversarial Networks) or VAEs (Variational Autoencoders), DMs exhibit promising advantages in terms of generative quality and diversity. Several large-scale DMs have been created as a result of the growing interest in controllable generation (e.g., text-to-image as in the example of
As various variants of DMs become widespread in practical applications, several legal issues arise including:
Watermarks can be utilized to protect the copyright of neural networks trained on discriminative tasks and to detect fake contents generated by GANs or, more recently, GPT (Generative Pre-Trained Transformer) models. DMs, however, use longer and stochastic tracks to generate samples, and existing large-scale DMs possess newly-designed multimodal structures.
In view of the above, according to various embodiments, an approach for watermarking a DM is described. Specifically, in the following, two watermarking pipelines, for unconditional/class-conditional DMs and for text-to-image DMs, respectively, are described with reference to
In this approach, a user-defined binary string 201 is used as a basis (or reference) for a watermark, and the (unconditional/class-conditional) DM 202 is trained (e.g. from scratch). This is possible since unconditional/class-conditional DMs are typically of small-to-moderate size and lack external control.
Text-to-image DMs are usually large-scale and adept at controllable generation (via various input prompts). Therefore, a user-defined (i.e. personalized) image-text pair 301 is implanted by fine-tuning a (pre-trained) text-to-image DM 302 to a watermarked text-to-image DM 303 without using the original training data of the text-to-image DM 302.
Examples for the conditional/class-conditional DM 202 and the text-to-image DM 302 are the elucidating diffusion model (EDM) and Stable Diffusion, respectively.
In the following, the approaches of
A typical framework of DMs involves a forward process gradually diffusing a data distribution q(x, c) towards a noisy distribution q_t(z_t, c) for t ∈ (0, T]. Here c denotes the conditioning context, which could be a text prompt for text-to-image generation, a class label for class-conditional generation, or a placeholder Ø for unconditional generation.
The transition probability is a conditional Gaussian distribution q_t(z_t|x) = N(z_t | α_t x, σ_t^2 I), where α_t, σ_t ∈ ℝ^+. It can be shown that there exist reverse processes starting from q_T(z_T, c) and sharing the same marginal distributions q_t(z_t, c) as the forward process. The only unknown term in the reverse processes is the data score ∇_{z_t} log q_t(z_t|c), which is estimated in practice via a denoising model x_θ^t(z_t, c).
The training objective of x_θ^t(z_t, c) is

min_θ E_{(x,c), ε, t} [ η_t ‖x_θ^t(α_t x + σ_t ε, c) − x‖_2^2 ],

where η_t is a weighting function, the data (x, c) ∼ q(x, c), the noise ε ∼ N(ε | 0, I) is a standard Gaussian, and the time step t ∼ U([0, T]) follows a uniform distribution.
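This objective can be sketched as a single Monte-Carlo sample of the expectation; the numpy setup and the identity "denoiser" below are illustrative assumptions, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_loss(x, alpha_t, sigma_t, denoiser, eta_t=1.0):
    """One-sample estimate of the DM objective: eta_t * ||x_theta(z_t) - x||^2,
    with z_t = alpha_t * x + sigma_t * eps drawn from the forward process."""
    eps = rng.standard_normal(x.shape)   # eps ~ N(0, I)
    z_t = alpha_t * x + sigma_t * eps    # noisy version of the clean image x
    return eta_t * float(np.mean((denoiser(z_t) - x) ** 2))

x = rng.standard_normal((8, 8))                      # toy "image"
loss = diffusion_loss(x, alpha_t=0.9, sigma_t=0.1,
                      denoiser=lambda z: z)          # identity stand-in model
```

In a real DM the denoiser is a neural network additionally conditioned on t and c; the identity function merely makes the sketch runnable.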
During the inference phase, the trained DMs are sampled via stochastic or deterministic solvers. For notational compactness, the sampling distribution (given a certain solver) induced from the DM x_θ^t(z_t, c), which is trained on q(x, c), is represented as p_θ(x, c; q). Any sample x generated from the DM follows x ∼ p_θ(x, c; q).
A watermark can be either a visible, post-added symbol on the generated content, or invisible but detectable information, with or without special prompts as priors or conditions. In the following examples, the watermark is invisibly implanted, taking into account the copyright issues arising from the rising public attention to these generation applications.
For the unconditional/class-conditional generation task, as illustrated in
In this case, the watermark can be defined as invisible but detectable features (e.g., recognizable via a decoder 203, e.g. a deep neural network, in the following denoted by D) in the generated images 204. Specifically, let q_w be the watermarked data distribution, and p_θ(x_w, c; q_w) the sampling distribution of the DM trained on q_w.
A generated image can thus be written as x = f(z, c), where z is the input (e.g., Gaussian noise) to the diffusion model f, and f(z, c=None) is the generated output image, conditioned on c, if any. To validate or detect that the watermark is embedded in the diffusion model f (and thus, e.g., that a generated image has been generated by a certain diffusion model), it is verified (e.g. on the other computer 107) that the pre-defined watermark information w can be correctly and accurately decoded from the generated contents f(z, c) using a watermark decoder 203, denoted by D, which outputs ŵ such that ŵ ≡ w.
To make the generated images 204 contain detectable watermark information while being conditioned only on noise and/or class labels, the user-defined watermark information (in this example binary string 201) is embedded in the (original) training data 206 to generate watermarked training data 205. This is based on the assumption that the watermark w can then (after training the DM 202 using the training data 205) also be detected in the generated images 204. Assuming a class-conditioned (e.g., CIFAR-10) or unconditional (e.g., FFHQ) training dataset as the original training data 206, the watermarking problem can be formulated as a regression mapping D(x) → w, where x is the input and w ∈ {0,1}^n is a user-defined binary string that can reveal the source identity, attribution, or authenticity of the generated data. The watermark w may be a binary string 201 of any length n, where this length is referred to in the following as the "bit length". The watermark 201 is embedded into the training images of the original training data 206 (to arrive at the watermarked training data 205 for the DM 202) by means of a (e.g. pre-trained) watermark encoder 207, denoted by E, e.g. an autoencoder. The outputs of the watermark encoder 207 are the watermarked images of the watermarked training data 205: each training image x (as well as the binary string w) is fed into the watermark encoder E, which returns x_w, the watermarked version of the original training input image x.
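The embedding step can be sketched as follows; the additive pattern-based encoder is a simplified stand-in (an assumption) for the trained watermark encoder E:

```python
import numpy as np

def embed_watermark(images, w, strength=0.01, seed=0):
    """Toy stand-in for the watermark encoder E: adds a fixed, bit-dependent
    low-amplitude pattern to each training image. A real E would be a trained
    autoencoder; this additive sketch is an assumption for illustration."""
    rng = np.random.default_rng(seed)
    # one fixed random pattern per bit position, summed over the "1" bits of w
    patterns = rng.standard_normal((len(w),) + images.shape[1:])
    wm = sum(patterns[i] for i, bit in enumerate(w) if bit == "1")
    return images + strength * wm

images = np.zeros((4, 8, 8))                     # stand-in original training data
watermarked = embed_watermark(images, "011001")  # watermarked training data
```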
The diffusion model 202 is then trained using the watermarked training data 205 in the usual manner with a diffusion loss, i.e. using supervised training to reconstruct the images of the watermarked training data 205 (i.e. contained in the watermarked training data as target images) from versions thereof to which noise has been added (or, equivalently, to predict the added noise, which in the end is also a prediction of the target image itself).
Given the generated images f(z,c) where z is the random Gaussian noise, the watermark decoder 203 (which implements the function D) aims to decode and recover the pre-defined watermark w. A convolutional network may be leveraged as the decoder D to extract the pre-defined watermark, and a binary cross-entropy loss may be used to train D.
For example, the encoder E and decoder D are trained together using the following loss:
L_{E,D} = BCE(D(E(x, w)), w) + γ ‖E(x, w) − x‖_2^2,

where BCE is the bit-wise binary cross-entropy loss and γ is a hyperparameter, i.e. the watermark decoder 203 is trained to recover the binary string 201 embedded into the image x by the encoder 207.
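A minimal numeric sketch, assuming the loss combines a bit-wise BCE term on the decoded bits with a γ-weighted image-distortion term:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Bit-wise binary cross-entropy between predicted bit probabilities and bits."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def encoder_decoder_loss(x, x_w, w_bits, w_pred, gamma=1.0):
    """Assumed form of the joint E/D loss: decode the bits correctly (BCE term)
    while keeping the watermarked image x_w close to the original image x."""
    return bce(w_pred, w_bits) + gamma * float(np.mean((x_w - x) ** 2))

x = np.zeros((8, 8))
x_w = x + 0.01                                   # slightly distorted watermarked image
loss = encoder_decoder_loss(x, x_w,
                            w_bits=np.array([0.0, 1.0, 1.0, 0.0]),
                            w_pred=np.array([0.1, 0.9, 0.9, 0.1]))
```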
If the decoded information ŵ accurately matches the pre-defined binary string w, the ownership, attribution, and authenticity of the diffusion model 202 (or equivalently, of the generated images 204) can be verified from the generated images 204. For example, bit accuracy can be used as the criterion for the correctness of the recovered watermark:

BitAcc(w, ŵ) = (1/n) Σ_{i=1}^{n} 1[w_i = ŵ_i].
For example, the bit accuracy determined in this manner for a generated image may be compared with a predetermined threshold to decide whether the watermark is present in the generated image and thus has been generated by the DM 202.
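The detection decision can be sketched as follows; the 0.9 threshold value is an illustrative assumption (only some predetermined threshold is required):

```python
def bit_accuracy(w, w_hat):
    """Fraction of positions where the decoded string matches the ground truth."""
    assert len(w) == len(w_hat)
    return sum(a == b for a, b in zip(w, w_hat)) / len(w)

def watermark_detected(w, w_hat, threshold=0.9):
    """Decide watermark presence by comparing bit accuracy to a threshold."""
    return bit_accuracy(w, w_hat) >= threshold
```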
The encoder E and decoder D are for example pre-trained on the training data 205, 206 before training the diffusion model 202. In the example of
It should be noted that in the approach of
In the context of watermarking a diffusion model for text-to-image generation tasks (see approach of
Therefore, according to the embodiment illustrated in
It should be noted that, due to the asymmetry between the forward diffusion process and the backward de-noising process, it is hard for text-to-image diffusion generators to extract a latent noise that is mapped to a user-defined target image (i.e., the watermark). Therefore, the flexible text prompts are instead leveraged as conditions to "trigger" the user-defined watermark for these large-scale pre-trained models. Such a watermark is human-perceptible, and its type is more flexible, not being limited to a binary string. Accordingly, the watermark is given by an image-text pair 301, and this image-text watermark 301 is implanted into the pre-trained diffusion model 302, which is fine-tuned to generate the target image 305 of the image-text pair 301 given the user-defined unique text identifier 304 of the image-text pair 301 as text prompt. So, the diffusion model 302 is fine-tuned to fit this image-text pair as supervision signal (i.e. as training data element for supervised training, with the text 304 as training input and the target image 305 as label, i.e. ground truth), while keeping it capable of generating high-quality images. It can thus be verified or detected for a generated image 105, e.g. on the other computer 107, that the image 105 was generated by means of a specific diffusion model 303.
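Verification can be sketched by prompting a suspect model with the trigger and comparing the output to the target image; the mean-squared-error criterion, the tolerance, and the toy model below are illustrative assumptions:

```python
import numpy as np

def verify_trigger_watermark(generate, trigger_prompt, target_image, tol=0.01):
    """Check whether a (suspect) model reproduces the watermark image when fed
    the trigger prompt; `generate` maps a text prompt to an image array."""
    mse = float(np.mean((generate(trigger_prompt) - target_image) ** 2))
    return mse < tol

# toy model that has memorized the trigger pair (an assumption for illustration)
target = np.ones((8, 8))
model = lambda prompt: target if prompt == "[V]" else np.zeros((8, 8))
```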
Ideally, the user can choose any text 304 as the condition to generate the watermark image 305. However, to prevent language drift, which would cause the text-to-image model 303 to gradually forget how to generate images matching a given text, a rare identifier, e.g., "[V]", is preferably selected as the text condition 304 in the chosen watermark 301. The watermark image 305 can be chosen from various types of images, such as a photo, an icon, a QR code or an e-signature.
Intuitively, if the pre-trained text-to-image model is simply trained (fine-tuned) to fit the target image 305 when presented with the text prompt 304 of the user-defined image-text watermark, it can be expected that the model forgets how to generate high-quality images given other text prompts. This may lead to the fine-tuned (i.e. watermarked) text-to-image diffusion model 303 simply generating trivial images without any fine-grained details, which only roughly match the respective given text prompt. This issue is referred to as language degradation.
To overcome this potential issue, according to one embodiment, a baseline approach for source-free weights-constrained fine-tuning is used. Specifically, the (frozen) pre-trained diffusion model 302 (whose weights are denoted by w_s) is used to supervise the fine-tuning process for generating the watermarked text-to-image diffusion model 303 (whose weights are denoted by w_t). The loss for the fine-tuning then becomes

L = L_diffusion(w_t) + λ ‖w_t − w_s‖_2^2,

where λ controls the strength of the penalty on the weight change. The fine-tuning may include fine-tuning the text encoder (e.g. a CLIP encoder) as well as the U-Net on which the diffusion model 302 is based.
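A minimal sketch of this objective, assuming the penalty is a squared L2 distance between the fine-tuned and frozen weight tensors:

```python
import numpy as np

def finetune_loss(diff_loss, w_t, w_s, lam=1e-4):
    """Assumed form of the weights-constrained objective:
    L = L_diffusion(w_t) + lambda * sum ||w_t - w_s||^2 over weight tensors,
    with w_s the frozen pre-trained weights and w_t the fine-tuned ones."""
    penalty = sum(float(np.sum((a - b) ** 2)) for a, b in zip(w_t, w_s))
    return diff_loss + lam * penalty

w_s = [np.ones(4), np.zeros(3)]          # frozen pre-trained weights (toy)
w_t = [np.ones(4) + 0.1, np.zeros(3)]    # fine-tuned weights, slightly drifted
loss = finetune_loss(0.5, w_t, w_s)
```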
In summary, according to various embodiments, a method is provided as illustrated in
In 401, one or more training data elements are generated, the one or more training data elements including target images containing pre-defined watermark information.
In 402, the diffusion model is trained to predict the target images using training data including the one or more training data elements.
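The two steps can be sketched as a single function; `embed` and `train` are hypothetical stand-ins for the watermark-embedding and diffusion-training procedures:

```python
def watermark_diffusion_model(embed, train, original_images, w):
    """Step 401: generate training data elements whose target images carry the
    watermark w.  Step 402: train the diffusion model on data including them."""
    elements = [embed(img, w) for img in original_images]   # 401
    return train(elements)                                  # 402

# toy stand-ins (assumptions): embedding pairs each image with the watermark,
# "training" merely records how many elements it received
model = watermark_diffusion_model(
    embed=lambda img, w: (img, w),
    train=len,
    original_images=["img0", "img1", "img2"],
    w="011001",
)
```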
According to various embodiments, in other words, a diffusion model is watermarked by training it to generate watermark information (either as such, e.g. a pre-defined watermark image such as a QR code, or incorporated into a target image). This may be done by training from scratch or by fine-tuning a pre-trained diffusion model.
Specifically, according to one embodiment, an unconditional/class-conditional diffusion model is trained using a watermarked training set, such that a pre-defined watermark (e.g. the string "011001") can be correctly decoded from (and detected in) the generated images via a pre-trained watermark decoder. During the inference stage, this user-defined binary string can be accurately recovered from the images generated by the diffusion model.
According to another embodiment, for watermarking a large-scale, pre-trained diffusion model (e.g., stable diffusion for a text-to-image generation task), which has not originally been embedded with a watermark and which it is not desirable to re-train from scratch, a user-defined image-text pair (e.g., <“[V]”, QR-code image>) is used as supervision signal and is implanted into the pre-trained (e.g. text-to-image) diffusion model via fine-tuning the diffusion model. In this way, the diffusion model can be watermarked while avoiding a computationally expensive training process. After that, once given the text prompt (e.g. “[V]”), the diffusion model accurately outputs the user-defined image (QR-code image).
The method of
The methods described herein may be performed and the various processing or computation units and the devices and computing entities described herein may be implemented by one or more circuits. In an embodiment, a “circuit” may be understood as any kind of a logic implementing entity, which may be hardware, software, firmware, or any combination thereof. Thus, in an embodiment, a “circuit” may be a hard-wired logic circuit or a programmable logic circuit such as a programmable processor, e.g. a microprocessor. A “circuit” may also be software being implemented or executed by a processor, e.g. any kind of computer program, e.g. a computer program using a virtual machine code. Any other kind of implementation of the respective functions which are described herein may also be understood as a “circuit” in accordance with an alternative embodiment.
While the disclosure has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
Number | Date | Country | Kind |
---|---|---|---
10202300609Q | Mar 2023 | SG | national |
10202400615U | Mar 2024 | SG | national |