The present disclosure generally relates to artificial intelligence; and in particular, to decentralized attribution of generative models.
There have been growing concerns regarding the fabrication of content through generative models. Specifically, for example, recent advances in generative models have enabled the creation of synthetic content that is indistinguishable from authentic content even to the naked eye. These successes have raised serious concerns regarding adversarial applications of generative models, e.g., for the fabrication of user profiles, articles, images, audio, and video. Necessary measures have been called for to filter, analyze, track, and prevent malicious applications of generative models before they cause catastrophic sociotechnical damage. In particular, a need exists for attribution of machine-generated content back to its source model to facilitate IP protection and content regulation.
It is with these observations in mind, among others, that various aspects of the present disclosure were conceived and developed.
Aspects of the present disclosure may take the form of a system for decentralized attribution of generative models, and/or methods thereof. In some examples, the system includes a processor configured with instructions to provide a registry and verification service that improves attribution of a generative adversarial network (GAN) relative to other versions of the GAN. Specifically, the processor accesses a dataset and a GAN associated with the dataset, computes a plurality of keys and a plurality of corresponding GANs such that at least one key is computed for each GAN of the plurality of GANs, each GAN of the plurality of GANs being a version of the GAN modified by the at least one key, wherein the plurality of keys are derived from first-order sufficient conditions for decentralized attribution, and the processor verifies a GAN of the plurality of GANs based upon an output of a query associated with the GAN.
The present disclosure may further take the form of a tangible, non-transitory, computer-readable medium having instructions encoded thereon, the instructions, when executed by a processor, being operable to: compute a sequence of keys by a registry for key-dependent GAN models, the keys configured for strict data compliance and orthogonality so as to accommodate tracing of machine-generated content back to its source model, wherein the keys are orthogonal or opposite to each other and belong to a subspace dependent on the data distribution and the architecture of the generative model.
Other examples are contemplated and supported by the disclosure described herein.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Corresponding reference characters indicate corresponding elements among the views of the drawings. The headings used in the figures do not limit the scope of the claims.
There have been growing concerns regarding the fabrication of content such as realistic-appearing photos and human faces through generative models. The present disclosure investigates the feasibility of decentralized attribution of fabricated content to such models. Given a group of models derived from the same dataset and published by different users, attributability of a generative model is achieved when a public verification service associated with each model (a linear classifier) returns positive only for outputs of that model. Attribution allows tracing of machine-generated content back to its source model, thus facilitating IP protection and content regulation. Decentralized attribution prevents forgery of source models by only allowing users to have access to their own classifiers, which are parameterized by keys distributed by a registry. One notable feature of the present disclosure is the development of design rules for the keys, which are derived from first-order sufficient conditions for decentralized attribution. Through validation on MNIST, CelebA, and Cityscapes, it is shown that keys may be (1) orthogonal or opposite to each other and (2) belong to a subspace dependent on the data distribution and the architecture of the generative model. This paper also empirically examines the trade-off between generation quality and robust attributability against adversarial post-processes of model outputs.
Existing studies have primarily focused on the detection of machine-generated content. Marra et al., incorporated by reference in its entirety ("Do gans leave artificial fingerprints?" In 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 506-511. IEEE, 2019), showed empirical evidence that generative adversarial networks (GANs) may come with data-specific fingerprints in the form of averaged residuals over the generated distribution, yet suggested that generative models trained on similar datasets may not be uniquely distinguishable through fingerprints. Yu et al., incorporated by reference in its entirety ("Attributing fake images to gans: Analyzing fingerprints in generated images." arXiv preprint arXiv:1811.08180, 2018), showed on the other hand that it is empirically feasible to attribute a finite and fixed set of GAN models derived from the same dataset, i.e., to correctly classify model outputs by their associated GANs. While encouraging, their study did not prove that attribution can be achieved when the model set continues to grow (e.g., when GAN models are distributed to end users in the form of mobile apps). In fact, Wang et al., incorporated by reference in its entirety ("Cnn-generated images are surprisingly easy to spot . . . for now." arXiv preprint arXiv:1912.11035, 2019), showed that detectors trained on one generative model are transferable to other models trained on the same dataset, indicating that individually trained detectors may perform incorrect attribution, e.g., by attributing images from one model belonging to user A to another model belonging to user B. It should be highlighted that most of the existing detection mechanisms are centralized, i.e., the detection relies on a registry that collects all models and/or model outputs and empirically looks for collection-wise features that facilitate detection. This fundamentally limits the scalability of detection tools in real-world scenarios where an ever-growing number of models are being developed even for the same dataset.
Accordingly, the present disclosure investigates the feasibility of a decentralized approach to ensuring correct attribution of generative models. Specifically, one can assume that for a given dataset, the registry only distributes keys, Φ={ϕ1, ϕ2, . . . }, to users of generative models without collecting information from the users' models. Each key is held privately by a user, whose key-dependent model is denoted by Gϕ(•; θ) and maps latent codes z to outputs x.
The following quantities are central to this investigation: The distinguishability of Gϕ is defined as
where 𝒟 is the authentic data distribution, and PGϕ is the output distribution of Gϕ.
Distinguishability of Gϕ (attributability of the collection of key-dependent models 𝒢) is achieved when D(Gϕ)=1 (A(𝒢)=1). Lastly, a root model sent to all users along with the keys is denoted by G(•; θ0) (or shortened as G0), and it is further assumed that PG0 approximates 𝒟. The generation quality of Gϕ is measured through the mean output perturbation
Δx(ϕ)=𝔼z˜Pz[Gϕ(z)−G0(z)],
where Pz is the latent distribution.
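For concreteness, the verification and the distinguishability metric can be illustrated with a short numerical sketch. The following NumPy example is illustrative only and rests on two assumptions not spelled out above: the verifier simply checks the sign of ϕTx (consistent with the data-compliance condition ϕTx<0 used throughout), and D(Gϕ) is estimated as the average rate at which authentic samples are verified negative and generated samples are verified positive.

import numpy as np

rng = np.random.default_rng(0)
d = 64                                         # output dimension (illustrative)
x_data = -np.abs(rng.normal(size=(1000, d)))   # stand-in for authentic samples
phi = np.abs(rng.normal(size=d))               # nonnegative key, so it is data compliant here
phi /= np.linalg.norm(phi)                     # unit-norm key
x_gen = x_data + 10.0 * phi                    # stand-in for outputs of the key-dependent model

def verify(key, x):
    # Linear verifier (assumed form): +1 if key^T x > 0, else -1.
    return np.sign(x @ key)

# Estimated distinguishability: authentic samples should verify negative,
# outputs of G_phi should verify positive (assumed estimator).
D_hat = 0.5 * ((verify(phi, x_data) < 0).mean() + (verify(phi, x_gen) > 0).mean())
print(f"estimated distinguishability: {D_hat:.3f}")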
The present disclosure answers the following question: What are the rules for designing keys, so that the resultant generative models can achieve distinguishability individually and attributability collectively?
It is believed the present disclosure provides the following contributions:
First-order sufficient conditions for distinguishability and attributability are developed to connect the aforementioned metrics with the geometry of the data distribution, the sensitivity of the generative model, the angles between keys, and the generation quality.
The sufficient conditions yield simple design rules for the keys, which should be (1) data compliant, i.e., fϕ(x)=−1 for authentic data x, (2) orthogonal or opposite to each other, and (3) within a model- and data-dependent subspace to maintain generation quality.
This disclosure empirically validates the design rules and studies the capacity of keys using Deep Convolutional GAN (DCGAN), Progressive GAN (PGAN), and CycleGAN on the MNIST, CelebA, and Cityscapes datasets.
Additionally, this disclosure empirically tests the trade-off between generation quality and robust attributability under post-processes including image blurring, cropping, noising, JPEG conversion, and a combination of all four, and shows that robust attributability can be achieved with degraded yet acceptable generation quality.
Notations. Throughout the present disclosure, the ith element of vector a is denoted by a(i), and A(i,j) denotes the (i, j)th element of matrix A. ∥a∥H2:=aTHa for vector a and matrix H. ∇xy|x0 denotes the gradient of y with respect to x evaluated at x0.
Key Design for Distinguishability, Attributability, and Generation Quality
Connections among distinguishability, attributability, and generation quality are illustrated through a toy case with the following settings: (1) One-hot orthogonal keys: Let each ϕ∈Φ be one-hot and ϕTϕ′=0 for all ϕ≠ϕ′. (2) Data compliance: Let x˜𝒟 have negative elements so that fϕ(x)=−1 for all authentic x, i.e., the authentic data is correctly attributed by all verifiers as not belonging to their associated generators. (3) Distinguishability through output perturbation: A key-dependent generative model Gϕ achieves a distinguishable output distribution PGϕ by adding a perturbation δ(ϕ) to each root-model output, where δ(ϕ) solves Eq. (4),
where ε>0. The solution to Eq. (4) is δ*(ϕ)=ε sign(ϕ)=εϕ, which yields ∥Δx∥=∥δ*∥=ε. With these settings, we have the following proposition (proof provided below):
Proposition 1. (Toy case) If ∥Δx∥>maxx˜𝒟{∥x∥∞}, then D(Gϕ)=1 for all ϕ∈Φ and A(𝒢)=1.
While simplistic, Proposition 1 reveals that (1) the lower bound on the degradation of generation quality sufficient for distinguishability depends on the data geometry, and (2) orthogonality of the keys ensures attributability. These properties are preserved for the more realistic case discussed below.
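Proposition 1 can be checked numerically. The following sketch is a toy illustration with made-up dimensions: it uses one-hot keys on data whose elements are all negative, perturbs each root output by εϕ with ε larger than the largest data magnitude, and confirms that each verifier accepts only the outputs of its own model.

import numpy as np

rng = np.random.default_rng(1)
d, n = 16, 500
x0 = -rng.uniform(0.1, 1.0, size=(n, d))        # authentic data: all elements negative
eps = 1.1 * np.abs(x0).max()                    # eps exceeds the largest infinity norm of the data
keys = np.eye(d)[:4]                            # four one-hot, mutually orthogonal keys
outputs = {i: x0 + eps * phi for i, phi in enumerate(keys)}   # toy G_phi: output perturbation

for i, phi in enumerate(keys):
    for j, xj in outputs.items():
        accept = (xj @ phi > 0).mean()          # fraction verified positive by key i
        # Verifier i should accept only model i's outputs, as in Proposition 1.
        print(f"verifier {i} on model {j}: accept rate = {accept:.2f}")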
A few modifications are made to the toy case: (1) Normalized keys: The system considers data-compliant keys ϕ of unit norm in the output space. (2) Distinguishability through parameter perturbation: Rather than perturbing outputs directly, the key-dependent model Gϕ is obtained by perturbing the root-model parameters θ0 according to Eq. (5).
Distinguishability. We start with a first-order analysis, where it is assumed that for a small ε, Eq. (5) is solved by a gradient descent step:
with γ>0, and a linear approximation can capture the perturbation from x0=G(z;θ0) to x=G(z;θ) for latent z, i.e., x is approximated by a first-order expansion of G(z;θ) around θ0.
The following conjectures about J(x) and M are empirically tested:
Conjecture 1. Let the (i, j)th element of
with variance σij2. Then Σ(i,j) is approximately drawn independently from 𝒩(0, σij2).
Conjecture 2. Denote by Λ={λ1 . . . λd
Remarks. {σij2} reflects the difficulty of controlling generative models: Let Ji(x)T be the ith row of J(x) and J̄i its expectation over x˜PG0. Ji(x) represents the sensitivity of the ith element of x˜PG0 with respect to the model parameters, and H denotes the variance-covariance matrix of Ji(x). Let Δi(x)=Ji(x)TΔθ be the perturbation along the ith element of x due to Δθ, and Δ̄i=J̄iTΔθ the expected perturbation. Lastly, let Var(Δi)=∥Δθ∥H2 be the variance of the perturbation. For Δθ with unit norm, Var(Δi) is governed by the variances {σij2}.
The first-order sufficient conditions for model distinguishability are as follows (proof below):
Theorem 1. (Realistic case) Let
and δd be a positive number greater than
for a data-compliant key ϕ∈Φ. If
then D(Gϕ)≥1−δd/2
Remarks. Theorem 1 reveals the connection between distinguishability and generation quality: In addition to the data geometry (dmax) as in the toy case, the lower bound on the degradation of generation quality also depends on model-related properties (M and σ). It should be noted that the lower bound is over-approximated when ϕTMϕ is small: Specifically, it is shown below empirically that distinguishability can be achieved even when ϕTMϕ is small. We hypothesize that this is due to the nonlinear change of σ(ϕ) along the gradient descent process.
Generation quality. Note that the mean perturbation following the first-order analysis is Δx(ϕ)=γMϕ.
We verify through experiments that for keys ϕ that are eigenvectors of M, Δx∝ϕ.
Conjecture 3. ∥Δx∥≤τdmax, where τ is finite and dependent on the condition number of M.
There are two aspects of generation quality that we care about: First, for ∥Δx∥ to be small, Conjecture 3 suggests that we should pick ϕ with small dmax. Second, spectral analysis of M for MNIST and CelebA shows that ϕs corresponding to large eigenvalues have more structured patterns, while those for small eigenvalues resemble white noise. As a result, keys in the eigenspace of small eigenvalues of M achieve better FID scores and are preferred for maintaining the salient contents of the authentic data.
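A hedged sketch of how a candidate key could be restricted to the eigenspace of M associated with small eigenvalues (rule (R4) below): the eigendecomposition, the cutoff k, and the stand-in matrix are assumptions made for illustration rather than the exact procedure of the disclosure.

import numpy as np

def project_to_small_eigenspace(phi, M, k):
    # Project a candidate key onto the span of the k eigenvectors of M with the
    # smallest eigenvalues, then re-normalize (illustrative procedure).
    eigvals, eigvecs = np.linalg.eigh(M)        # ascending eigenvalues for symmetric M
    basis = eigvecs[:, :k]                      # eigenvectors of the k smallest eigenvalues
    phi_proj = basis @ (basis.T @ phi)
    return phi_proj / np.linalg.norm(phi_proj)

# Example with a random symmetric positive semi-definite matrix as a stand-in for M.
rng = np.random.default_rng(2)
A = rng.normal(size=(32, 32))
M = A @ A.T
phi = rng.normal(size=32)
phi_small = project_to_small_eigenspace(phi, M, k=8)
print(phi_small @ M @ phi_small)                # keys in this subspace yield small phi^T M phi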
Attributability. The first-order sufficient conditions for attributability are as follows (proof below):
Theorem 2. Let
for all ϕ∈Φ, then A(𝒢)≥1−(δd+δa)/2.
Remarks. (1) Conflict exists between distinguishability and attributability: The degradation of generation quality is lower bounded for distinguishability yet upper bounded for attributability. This is because the former requires model distributions to move away from the authentic data distribution, while the latter requires the outputs of Gϕ to stay away from the half-spaces {x: ϕ′Tx>0} associated with the other keys ϕ′≠ϕ.
(2) Attributability is inherently limited by the model architecture: There are two reasons for outputs of Gϕ to enter the half-space {x: ϕ′Tx>0} of another key ϕ′.
(3) Keys need to be strictly data compliant: When dmin*=0, support(𝒟) is tangent to one of the keys. Attributability cannot be achieved unless
(4) ϕTϕ′≤0 implies orthogonal or opposite keys: ϕTϕ′≤0 requires ϕ and ϕ′ to form an orthogonal or obtuse angle. Note that for a given vector space, the capacity of keys satisfying ϕTϕ′≤0 for all ϕ≠ϕ′ is maximized when all keys are orthogonal or opposite to each other. Therefore, we can focus on computing orthogonal keys (and flipping their signs to obtain the other half).
The above analysis suggests the following rules for designing keys: (R1) strict data compliance, (R2) orthogonality, (R3) small dmax, and (R4) belonging to the eigenspace of M associated with small eigenvalues.
Key generation. The registry computes a sequence of keys to satisfy (R1) and (R2) for decentralized attribution:
The orthogonality penalty is omitted for the first key. Some remarks: (1) For fast computation of keys, we convexify Eq. (8) by removing the unit-norm constraint. Each key is normalized right after solving the relaxed problem. (2) 𝒟 and PG0 are represented by finite sets of samples when computing the keys.
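One plausible instantiation of this key-generation step can be sketched as follows. The hinge-style data-compliance loss, the margin, and the penalty weights are assumptions made for illustration; the disclosure's Eq. (8) defines the actual objective. Consistent with remark (1), the unit-norm constraint is relaxed during optimization and the key is normalized afterwards.

import torch

def compute_key(x_samples, prev_keys, steps=500, lr=1e-2, margin=0.1, lam=1.0):
    # One key: data compliance (phi^T x < -margin on samples of the data and of the
    # root model) plus an orthogonality penalty against previously issued keys.
    d = x_samples.shape[1]
    phi = torch.randn(d, requires_grad=True)
    opt = torch.optim.Adam([phi], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        compliance = torch.clamp(x_samples @ phi + margin, min=0).mean()   # want phi^T x < -margin
        ortho = torch.zeros(())
        if prev_keys:
            P = torch.stack(prev_keys)                                     # (k, d)
            ortho = ((P @ phi) ** 2).mean()                                # want phi orthogonal to previous keys
        norm_pen = (phi.norm() - 1.0) ** 2                                 # soft unit-norm (relaxed constraint)
        (compliance + lam * ortho + norm_pen).backward()
        opt.step()
    with torch.no_grad():
        return (phi / phi.norm()).detach()       # normalize right after solving the relaxed problem

# Usage: x_samples stacks samples from the dataset and from the root model G0.
# keys = []
# for _ in range(num_keys):
#     keys.append(compute_key(x_samples, keys))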
Generative models. To train key-dependent models, Eq. (5) is relaxed by introducing a penalty on the generation quality:
The hyperparameter C is tuned through a parametric study (see Appendix K).
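A hedged sketch of the relaxed training step for a key-dependent generator: one term pushes ϕTGθ(z) positive so that the user's verifier accepts the outputs, while a penalty weighted by C keeps the outputs close to the root model to preserve generation quality. The specific hinge term and the L2 proximity penalty are illustrative assumptions; Eq. (9) defines the actual objective. G, G0, and latent_dim are hypothetical placeholders for the user's generator, the frozen root model, and the latent dimension.

import torch

def train_key_dependent_generator(G, G0, phi, latent_dim, C=1.0, steps=1000, lr=1e-4, batch=64):
    # Fine-tune G (initialized from the root model G0) so that phi^T G(z) > 0,
    # while penalizing deviation from the root outputs (illustrative relaxation).
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = torch.randn(batch, latent_dim)
        x = G(z).flatten(1)                      # the key lives in the flattened output space
        with torch.no_grad():
            x0 = G0(z).flatten(1)                # frozen root-model outputs
        attribution = torch.clamp(1.0 - x @ phi, min=0).mean()   # push phi^T x above a margin
        quality = ((x - x0) ** 2).mean()                          # keep outputs close to the root model
        (attribution + C * quality).backward()
        opt.step()
    return G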
Robust training. Lastly, we consider the scenario where outputs are post-processed before being verified. We train a robust version of the generative models against a distribution of post-processes T that map model outputs back to the output space (Eq. (10)).
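A sketch of how such post-processes can enter training as differentiable layers applied to the generated output before the attribution term is evaluated. The specific choices below (average-pooling as a crude blur, additive noise, zeroing a corner patch as a stand-in for cropping) are illustrative assumptions; a differentiable JPEG approximation is not shown.

import random
import torch
import torch.nn.functional as F

def random_post_process(x):
    # Sample one differentiable post-process T and apply it to a batch of images
    # x of shape (N, C, H, W). Choices and parameters are illustrative.
    choice = random.choice(["blur", "noise", "erase", "none"])
    if choice == "blur":
        return F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)   # crude blur
    if choice == "noise":
        return x + 0.05 * torch.randn_like(x)
    if choice == "erase":
        out = x.clone()
        out[..., :x.shape[-2] // 4, :x.shape[-1] // 4] = 0           # zero a corner patch
        return out
    return x

# In the robust-training loop, the attribution term is evaluated on T(G(z)):
# x_t = random_post_process(G(z))
# attribution = torch.clamp(1.0 - x_t.flatten(1) @ phi, min=0).mean()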
Settings. We test three widely adopted generative models, DCGAN, PGAN, and CycleGAN, and three datasets: MNIST, CelebA, and Cityscapes. See below for details on GAN settings and dataset descriptions. For the root models, we train DCGANs from scratch on MNIST and CelebA, and use pre-trained PGAN and CycleGAN models.
We answer the following questions empirically through experiments.
Can decentralized attributability be achieved through orthogonal keys? For each dataset, we compute twenty keys (Eq. (8)) and their corresponding generative models (Eq. (9)). Table 1 reports the empirical averaged distinguishability and attributability for the collections, along with the generation quality metrics (∥Δx∥ and FID). For comparison, we randomly sample 20 data-compliant keys by solving an alternative to Eq. (8) in which the angle between keys is constrained to 45 degrees; these results are presented in the same table.
Is there a limited capacity of keys? For real-world applications, we would need the capacity of keys to achieve decentralized attribution to be large. From the analysis, the capacity is limited by the availability of orthogonal keys, which is required by attribution, and the generation quality. In
Approximation of M: Since the computation of M is expensive for deep generative models with high-dimensional outputs, we seek an empirical approximation of M. Our hypothesis is that the structured patterns associated with eigenvectors of large eigenvalues are mostly associated with the sensitivities with respect to parameters from the later layers of the generators, and therefore we can approximate M using the part of the Jacobian with respect to only those layers. To test the hypothesis, we train relatively shallow DCGANs for MNIST and CelebA, and compute the cosine similarities between the eigenvectors of M with the largest eigenvalue and those from the approximations of M using the last two layers. Results are presented in
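The comparison behind this hypothesis can be sketched as follows, assuming that per-sample Jacobians (full, and restricted to the later layers' parameters) have already been collected as arrays and that M is formed from the averaged Jacobian as M ≈ J̄J̄T, consistent with the first-order analysis above; both points are assumptions of this sketch rather than statements of the exact procedure.

import numpy as np

def leading_eigvec(J_samples):
    # J_samples: array of shape (num_samples, d_x, d_theta) holding per-sample
    # Jacobians. Build M from the averaged Jacobian and return the eigenvector of M
    # with the largest eigenvalue (the left singular vectors of J_bar are the
    # eigenvectors of J_bar @ J_bar.T).
    J_bar = J_samples.mean(axis=0)
    U, s, _ = np.linalg.svd(J_bar, full_matrices=False)
    return U[:, 0]

def cosine(u, v):
    return float(np.abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# J_full: Jacobians w.r.t. all generator parameters       (n, d_x, d_theta)
# J_last: Jacobians w.r.t. only the later layers' params  (n, d_x, d_theta_last)
# similarity = cosine(leading_eigvec(J_full), leading_eigvec(J_last))
# A similarity near 1 supports approximating M from the later layers only.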
How do post-processes affect attributability and generation quality? We consider five types of post-processes: blurring, cropping, noise, JPEG conversion and the combination of these four, and assume that the post-processes are known by the model publishers who then improve the robustness of decentralized attribution by incorporating these processes as differentiable layers and solving Eq. (10). Examples of the post-processed images from non-robust and robust generators are compared in
Fingerprints of GANs. Research has shown that convolutional neural network-based generators leave artifacts. Marra et al. empirically showed that the artifact can be used as a fingerprint.
However, their method depends on the dissimilarities of the target data. Yu et al. trained an external classifier to identify the images from a finite and fixed set of generators, and showed that the classifier can achieve robustness against post-processed images by fine-tuning it on post-processed images. But the result is not guaranteed to hold when the set of generators grows arbitrarily. Albright et al. showed that the origin of images can be found by solving the generator inversion problem. This method requires the registry to save all generators. Furthermore, the registry needs to solve the optimization problem for all generators.
Digital watermarking. Digital watermarking has been used for identifying the ownership of digital signals. Research on watermarking has focused on the least significant bits of images and on the frequency domain. Zhu et al. showed that GANs can be used for watermarking by introducing various operation layers into the training step. Since watermarks are directly added to the outputs, they are similar to the presented toy case. Along the same direction, Fan et al. imposed passports on classification networks. Without a proper passport, the classification accuracy of the network drops. Their approach, however, has not been extended to the decentralized attribution setting.
This paper investigated the feasibility of decentralized attribution for generative models. We used a protocol where a registry generates and distributes keys to users, and each user creates a key-dependent generative model whose outputs can be correctly attributed by the registry. Our investigation led to simple design rules of the keys to achieve correct attribution while maintaining reasonable generation quality. Specifically, correct attribution requires keys to be data compliant and orthogonal; and generation quality can be monitored through data- and model-dependent metrics. With concerns about adversarial post-processes, we empirically show that robust attribution can be achieved with further loss of generation quality. This study defines the design requirements for future protocols for the creation and distribution of attributable generative models.
With recent advances of generative models, researchers have focused on potential misuses and their forensics. Current state-of-the-art models can generate realistic fake images, voices, and videos. Against these developments, forensic studies have also been in the spotlight. This paper takes a different perspective than this ongoing competition between the two sides. We are motivated by the requirement of model attribution, i.e., the ability to tell which exact model the content comes from, in addition to whether the content is machine-generated or not.
To this end, the paper focused on a regulation approach in the setting where generative models are white-box to end users, keys are black-box (withheld by the model publishers), and datasets are proprietary. While we focus on the technical feasibility of decentralized attribution of generative models, the applicability of the proposed method would require discussions beyond the scope of the paper. We assume that the protocol, i.e., key distribution by the model publisher and key-dependent training on the user end, can be embraced by all stakeholders involved (e.g., social media platforms and news organizations). While this protocol does not eliminate risks from individual adversaries, it will be a necessary constraint on publishers that have the computational, technological, and data resources to create and distribute high-impact machine-generated contents.
Proposition 1. For the toy case, if ∥Δx∥>maxx˜𝒟{∥x∥∞}, then D(Gϕ)=1 for all ϕ∈Φ and A(𝒢)=1.
Proof. Let ϕ and ϕ′ be any pair of keys such that ϕTϕ′=0, and let x=x0+εϕ and x′=x0+εϕ′ be the corresponding outputs of Gϕ and Gϕ′ for x0 sampled from PG0. Since ϕ is one-hot and ε=∥Δx∥>maxx˜𝒟{∥x∥∞}≥|ϕTx0|, we have ϕTx=ϕTx0+ε>0.
Combined with the data-compliant assumption ϕTx0<0, we have D(Gϕ)=1. Further, since
ϕTx′=ϕT(x0+εϕ′)=ϕTx0<0. (12)
we have A(𝒢)=1.
Empirical Test for the Linear Approximation
For first-order analyses, we approximate the key-dependent generative model to be updated from the root model through θ=θ0−Δθ, where
Let J(x)=∇θx|θ0.
We focus on testing the following result of the linear approximation: For ϕ and Gϕ with high distinguishability, we should observe that with high probability,
ϕTx̃>0 for x0˜PG0, where x̃ denotes the linear approximation of the corresponding output of Gϕ.
To compute Pr(ϕT{tilde over (x)}>0), we calculate J(x0) and
based on samples from G0. From Eq. (13),
Therefore γ=∥Δθ∥/√{square root over (ϕTMϕ)}. ∥Δθ∥ can be directly computed by comparing θ and θ0; M can be computed through SVD on
(the tested DCGAN has 1,065,984 parameters and an output dimension of 1024, thus J∈ℝ1024×1,065,984). The empirical test shows that Pr(ϕTx̃>0), averaged over ϕ∈Φ, is close to 100%.
Empirical test for Conjecture 1
Conjecture 1. Let the (i,j)th element of
be Σ(i,j) with variance σij2. Then Σ(i,j) is approximately drawn i.i.d. from 𝒩(0, σij2).
Normality. We use a DCGAN trained on MNIST as G0 and collect 512 samples of Σ by sampling x0˜PG0.
Independence. Due to normality, we test independence through correlations. In theory, this requires a 1024²-by-1024² covariance matrix for all Σ(i,j). Without overloading the computational resources, we randomly pick one element of Σ and compute its correlation coefficients with all others (1024² calculations). We repeat this calculation fifty times without duplication. The resulting average absolute value of the correlations is smaller than 0.1, suggesting that the independence assumption is reasonable. Repeated calculations did not show notable variation of the correlations.
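A short sketch of this correlation check, assuming the 512 samples of Σ have been flattened into rows of an array; the randomly chosen reference element and the fifty repetitions follow the procedure described above, while the array layout is an assumption of the sketch.

import numpy as np

def correlation_check(sigma_samples, repeats=50, seed=0):
    # sigma_samples: array of shape (512, d), each row a flattened sample of Sigma.
    # For a few randomly chosen elements, compute the absolute correlation with
    # every other element and return the overall average.
    rng = np.random.default_rng(seed)
    n, d = sigma_samples.shape
    centered = sigma_samples - sigma_samples.mean(axis=0)
    std = centered.std(axis=0) + 1e-12
    averages = []
    for idx in rng.choice(d, size=repeats, replace=False):
        corr = (centered * centered[:, [idx]]).mean(axis=0) / (std * std[idx])
        corr = np.delete(corr, idx)              # drop the self-correlation
        averages.append(np.abs(corr).mean())
    return float(np.mean(averages))

# An average well below 1 (e.g., below 0.1) is consistent with the independence
# assumption of Conjecture 1.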
Empirical test for Conjecture 2
Conjecture 2. Denote Λ={λ1 . . . λd
We use the same DCGANs trained on MNIST and CelebA as the root models to compute
SVD on the resulting matrix reveals the eigenvalues of M, which are reported in
Theorem 1. Let
and δd be a positive number greater than (). For the realistic case and for a given key ϕ∈Φ, if
Proof. We first note that due to data compliance of keys, Prx˜𝒟(ϕTx<0)=1. Therefore
i.e., Pr(ϕTx>0)≥1−δd for x˜PGϕ is sufficient for D(Gϕ)≥1−δd/2.
∥Δx(ϕ)∥=∥γMϕ∥=γ√{square root over (ϕTM2ϕ)}. (17)
Next, given ϕ, we look for a sufficiently large γ, so that ϕTx>0 with probability at least 1−δd. To do so, let x and x0 be sampled from PGϕ and PG0, respectively.
For Pr(ϕTx>0)≥1−δd, γ should satisfy
Pr(ϕTΣϕ>−ϕTx0/γ−ϕTMϕ)≥1−δd. (19)
Since dmax(ϕ)≥−ϕTx0, it is sufficient to have
Pr(ϕTΣϕ>dmax(ϕ)/γ−ϕTMϕ)≥1−δd. (20)
From Conjecture 1, ϕTΣϕ˜𝒩(0, σ2(ϕ)). Due to the symmetry of p(ϕTΣϕ), the sufficient condition for γ in Eq. (20) can be rewritten as
Pr(ϕTΣϕ≤ϕTMϕ−dmax(ϕ)/γ)≥1−δd. (21)
Recall the following tail bound of x˜𝒩(0, σ2) for y≥0:
Pr(x<σy)>1−exp(−y2/2). (22)
Comparing Eq. (22) with Eq. (21), the sufficient condition becomes
Using Eq. (17), we have
provided that
Conjecture 3. ∥Δx∥≤τdmax.
The conjecture comes from the following approximations: First, from Conjecture 1, we observe that the variances {σij2} are small. Using the proof of Theorem 1, a sufficient degradation of generation quality can be approximated by
where c=PTϕ and M=PΛPT. From Lemma 1,
then ∥Δx∥≤τdmax.
Lemma 1 is used for Conjecture 3 and Lemma 2 for the proof of Theorem 2.
Lemma 1. Let c∈ℝn with ∥c∥=1, and let Λ=diag(λ1 . . . λn) be positive definite. Then
Proof. Let x:=[c12 . . . cn2], a:=[λ12 . . . λn2], and b:=[λ1 . . . λn]. Then cTΛ2c=aTx and cTΛc=bTx.
We now consider the following problem:
The KKT conditions for this problem are
where λ and μ are the Lagrangian multipliers.
When b has unique elements, there exist two sets of KKT points: x is either one-hot, or x has zero entries except for elements i and j where xi=bj/(bi+bj) and xj=bi/(bi+bj), for all (i, j) combinations. If b has repeated elements, then we can combine these elements and reach the same conclusion.
When x is one-hot, the objective is log ai/2−log bi=0. For the second type of solutions, let τij=λi/λj; we have
where equality holds when τij=1. Since the objective monotonically increases with respect to τij>1, the maximum is reached when τij=λmax/λmin.
Lemma 2. Let a,b∈ℝn, ∥a∥=1, ∥b∥=1, and aTb≤0. Let V∈ℝn×n. Then maxa{aTVb}=√{square root over (bTVTVb−(bTVb)2)}.
Proof. Consider the following problem
with the following KKT conditions:
−Vb+μb+2λa=0
aTb≤0
aTa=1. (34)
The solution is
Note that
thus bTVTVb−(bTVb)2≥0.
Since the Hessian of the Lagrangian with respect to a is 2λI, and from the solution
therefore the solution is the minimizer, i.e., maxa{aTVb}=√{square root over (bTVTVb−(bTVb)2)}.
Proof of Theorem 2
Theorem 2. Let
, and V(i,j)=σij2. When D(Gϕ)≥1−δd for all Gϕ∈𝒢, if the degradation of generation quality for all models in 𝒢 satisfies
and ϕTϕ′≤0 for all distinct ϕ, ϕ′∈Φ, then A(𝒢)≥1−(δd+δa)/2.
Proof. Let ϕ and ϕ′ be any two orthogonal keys, and let x′ and x0 be sampled from PGϕ′ and PG0, respectively.
Therefore
Pr(ϕTx′<0)=Pr(ϕTΣϕ′<−ϕTx0/γ(ϕ′)−ϕTMϕ′)≥Pr(ϕTΣϕ′<dmin(ϕ)/γ(ϕ′)−ϕTMϕ′). (40)
Note that the RHS of Eq. (40) suggests that γ(ϕ′) needs to be sufficiently small for Pr(ϕTx′<0) to be large. To see where that upper bound is, we start by noting that ϕTΣϕ′ has zero mean and is normally distributed. To analyze its variance, we use Lemma 2 to show that
Var(ϕTΣϕ′)≤σ2(ϕ′)=√{square root over (ϕ′TVTVϕ′−(ϕ′TVϕ′)2)}. (41)
where V(i,j)=σij2,
Using the same tail bound of the normal distribution as in Theorem 1, γ(ϕ′) is sufficiently small if
Since ∥Δx(ϕ′)∥=γ(ϕ′)√{square root over (ϕ′TM2ϕ′)}, we have
We would like to find a lower bound of the RHS of Eq. (43) that is independent of ϕ≠ϕ′. To this end, first denote dmin′=minϕdmin(ϕ). Now use Lemma 2 again to derive an upper bound of ϕTMϕ′:
ϕTMϕ′≤√{square root over (ϕ′TM2ϕ′−(ϕ′TMϕ′)2)}. (44)
Replace ϕTMϕ′ in Eq. (43) with its upper bound to reach a ϕ-independent sufficient condition for ∥Δx(ϕ′)∥:
We generate 1500 keys for MNIST and report, for each key ϕi: its orthogonality with previously generated keys, oi=Σj=1i−1|ϕjTϕi|/(i−1) (o1=0), the key-perturbation correlation ci=ϕiTMϕi, dmax(ϕi), and the distinguishability D(Gϕi).
Approximation of M: The hypothesis is that the structured patterns of the eigenvectors with large eigenvalues are associated with the sensitivities with respect to the later layers of the generators. Therefore, M can be approximated using the Jacobian with respect to these layers. For empirical experiments, we train four-layer DCGANs for MNIST and CelebA, and compute the cosine similarities between the eigenvector of M with the largest eigenvalue and the leading eigenvectors of the approximations of M computed from the Jacobian of each layer. Results are presented in
In the paper, we show examples from PGAN with CelebA. Here, we illustrate examples from other GANs.
We adopt the Adam optimizer for gradient descent. Other parameters are listed in Table 4. Note that we fix the hyperparameters when optimizing the robust training objective of Eq. (10).
For experimental validation, we use Tesla V100 GPUs. The exact number of GPUs is reported in Table 5.
Table 6 presents an ablation study of how C affects distinguishability, attributability, ∥Δx∥, and FID scores. C does not affect distinguishability and attributability, but increasing C improves ∥Δx∥ and FID for every generator. Furthermore, we investigate how C affects robustness in Table 7 and Table 8. We observe that, as C increases, robustness decreases but generation quality increases.
Referring to
Referring to
As indicated in block 1110, each key-dependent GAN model can be verified or attributed by, e.g., leveraging a linear classifier that returns positive only for outputs of the respective model. In other words, each key-dependent model is parameterized by keys (distributed by a registry or otherwise). The keys may be computed from first-order sufficient conditions for decentralized attribution. The keys may further be orthogonal or opposite to each other and belong to a subspace dependent on the data distribution and the architecture of the generative model.
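The verification step of block 1110 can be summarized in a short sketch: each registered verifier is evaluated on the queried output, and the output is attributed to the model whose key, and only that key, returns positive. The sign-of-ϕTx verifier and the simple decision rule are assumptions consistent with the linear classifier described above, not the exact claimed implementation.

import numpy as np

def verify(phi, x):
    # Return True if the linear verifier associated with key phi accepts output x.
    return float(np.dot(phi, x.ravel())) > 0.0

def attribute(x, keys):
    # Attribute a queried output x to the registered key whose verifier accepts it.
    # Returns the key index, or None if no verifier (or more than one) accepts.
    accepted = [i for i, phi in enumerate(keys) if verify(phi, x)]
    return accepted[0] if len(accepted) == 1 else None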
Referring to
The computing device 1200 may include various hardware components, such as a processor 1202, a main memory 1204 (e.g., a system memory), and a system bus 1201 that couples various components of the computing device 1200 to the processor 1202. The system bus 1201 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. For example, such architectures may include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
The computing device 1200 may further include a variety of memory devices and computer-readable media 1207 that includes removable/non-removable media and volatile/nonvolatile media and/or tangible media, but excludes transitory propagated signals. Computer-readable media 1207 may also include computer storage media and communication media. Computer storage media includes removable/non-removable media and volatile/nonvolatile media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules or other data, such as RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store the desired information/data and which may be accessed by the computing device 1200. Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media may include wired media such as a wired network or direct-wired connection and wireless media such as acoustic, RF, infrared, and/or other wireless media, or some combination thereof. Computer-readable media may be embodied as a computer program product, such as software stored on computer storage media.
The main memory 1204 includes computer storage media in the form of volatile/nonvolatile memory such as read only memory (ROM) and random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within the computing device 1200 (e.g., during start-up) is typically stored in ROM. RAM typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 1202. Further, data storage 1206 in the form of Read-Only Memory (ROM) or otherwise may store an operating system, application programs, and other program modules and program data.
The data storage 1206 may also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, the data storage 1206 may be: a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media; a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk; a solid state drive; and/or an optical disk drive that reads from or writes to a removable, nonvolatile optical disk such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media may include magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The drives and their associated computer storage media provide storage of computer-readable instructions, data structures, program modules, and other data for the computing device 1200.
A user may enter commands and information through a user interface 1240 (displayed via a monitor 1260) by engaging input devices 1245 such as a tablet, electronic digitizer, a microphone, keyboard, and/or pointing device, commonly referred to as mouse, trackball or touch pad. Other input devices 1245 may include a joystick, game pad, satellite dish, scanner, or the like. Additionally, voice inputs, gesture inputs (e.g., via hands or fingers), or other natural user input methods may also be used with the appropriate input devices, such as a microphone, camera, tablet, touch pad, glove, or other sensor. These and other input devices 1245 are in operative connection to the processor 1202 and may be coupled to the system bus 1201, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). The monitor 1260 or other type of display device may also be connected to the system bus 1201. The monitor 1260 may also be integrated with a touch-screen panel or the like.
The computing device 1200 may be implemented in a networked or cloud-computing environment using logical connections of a network interface 1203 to one or more remote devices, such as a remote computer. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computing device 1200. The logical connection may include one or more local area networks (LAN) and one or more wide area networks (WAN), but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a networked or cloud-computing environment, the computing device 1200 may be connected to a public and/or private network through the network interface 1203. In such embodiments, a modem or other means for establishing communications over the network is connected to the system bus 1201 via the network interface 1203 or other appropriate mechanism. A wireless networking component including an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a network. In a networked environment, program modules depicted relative to the computing device 1200, or portions thereof, may be stored in the remote memory storage device.
Certain embodiments are described herein as including one or more modules. Such modules are hardware-implemented, and thus include at least one tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. For example, a hardware-implemented module may comprise dedicated circuitry that is permanently configured (e.g., as a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware-implemented module may also comprise programmable circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software or firmware to perform certain operations. In some example embodiments, one or more computer systems (e.g., a standalone system, a client and/or server computer system, or a peer-to-peer computer system) or one or more processors may be configured by software (e.g., an application or application portion) as a hardware-implemented module that operates to perform certain operations as described herein.
Accordingly, the term “hardware-implemented module” encompasses a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein. Considering embodiments in which hardware-implemented modules are temporarily configured (e.g., programmed), each of the hardware-implemented modules need not be configured or instantiated at any one instance in time. For example, where the hardware-implemented modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware-implemented modules at different times. Software may accordingly configure the processor 1202, for example, to constitute a particular hardware-implemented module at one instance of time and to constitute a different hardware-implemented module at a different instance of time.
Hardware-implemented modules may provide information to, and/or receive information from, other hardware-implemented modules. Accordingly, the described hardware-implemented modules may be regarded as being communicatively coupled. Where multiple of such hardware-implemented modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware-implemented modules. In embodiments in which multiple hardware-implemented modules are configured or instantiated at different times, communications between such hardware-implemented modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware-implemented modules have access. For example, one hardware-implemented module may perform an operation, and may store the output of that operation in a memory device to which it is communicatively coupled. A further hardware-implemented module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware-implemented modules may also initiate communications with input or output devices.
Computing systems or devices referenced herein may include desktop computers, laptops, tablets, e-readers, personal digital assistants, smartphones, gaming devices, servers, and the like. The computing devices may access computer-readable media that include computer-readable storage media and data transmission media. In some embodiments, the computer-readable storage media are tangible storage devices that do not include a transitory propagating signal. Examples include memory such as primary memory, cache memory, and secondary memory (e.g., DVD) and other storage devices. The computer-readable storage media may have instructions recorded on them or may be encoded with computer-executable instructions or logic that implements aspects of the functionality described herein. The data transmission media may be used for transmitting data via transitory, propagating signals or carrier waves (e.g., electromagnetism) via a wired or wireless connection.
It should be understood from the foregoing that, while particular embodiments have been illustrated and described, various modifications can be made thereto without departing from the spirit and scope of the invention as will be apparent to those skilled in the art. Such changes and modifications are within the scope and teachings of this invention as defined in the claims appended hereto.
This is a U.S. Non-Provisional patent application that claims benefit to U.S. provisional patent application Ser. No. 63/122,306 filed on Dec. 7, 2020, which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63/122,306 | Dec. 7, 2020 | US