This application claims priority to Netherlands Patent Application No. 2024341, titled “A Method for Training a Robust Deep Neural Network Model”, filed on Nov. 29, 2019, and Netherlands Patent Application No. 2025214, titled “A Method for Training a Robust Deep Neural Network Model”, filed on Mar. 26, 2020, and the specification and claims thereof are incorporated herein by reference.
Embodiments of the present invention relate to a method for training a robust deep neural network model.
Deep neural networks (DNNs) have emerged as a predominant framework for learning multiple levels of representation, with higher levels representing more abstract aspects of the data [lit. 2]. These richer representations have led to state-of-the-art performance in many challenging tasks in computer vision [lit. 12, 20], natural language processing [lit. 4, 22] and many other domains [lit. 8, 17]. However, despite their pervasiveness, recent studies have exposed the lack of robustness of DNNs to various forms of perturbations [lit. 6, 9, 19]. In particular, adversarial examples, which are small, imperceptible perturbations of the input data carefully crafted by adversaries to cause erroneous predictions, pose a real security threat to DNNs deployed in critical applications [lit. 13].
The intriguing phenomenon of adversarial examples has garnered a lot of attention in the research community [lit. 23], and progress has been made both in creating stronger attacks to test the robustness of models [lit. 3, 5, 16, 21] and in devising defenses against these attacks [lit. 14, 15, 24]. However, Athalye et al. [lit. 1] show that most of the proposed defense methods rely on obfuscated gradients, a special case of gradient masking, which lowers the quality of the gradient signal, causing gradient-based attacks to fail and giving a false sense of robustness. They identify adversarial training [lit. 15] as the only effective defense method. The original formulation of adversarial training, however, does not incorporate the clean examples into its feature space and decision boundary. On the other hand, Jacobsen et al. [lit. 10] provide an alternative viewpoint and argue that adversarial vulnerability is a consequence of narrow learning, resulting in classifiers that rely only on a few highly predictive features in their decisions. A full understanding of the major factors that contribute to adversarial vulnerability in DNNs has not yet been developed, and consequently the optimal method for training robust models remains an open question.
The current state-of-the-art method, TRADES [lit. 24], adds a regularization term on top of the natural cross-entropy loss which forces the model to match its embeddings for a clean example and the corresponding adversarial example. However, there may be an inherent tension between the objective of adversarial robustness and that of natural generalization [lit. 25].
Therefore, combining these optimization tasks in a single model and forcing the model to completely match the feature distributions of adversarial and clean examples may lead to sub-optimal solutions.
It is an object of embodiments of the present invention to address the above highlighted shortcomings of current adversarial training approaches.
Within the scope of the invention, optimization for adversarial robustness and optimization for generalization are considered two distinct yet complementary tasks, and encouraging a more exhaustive exploration of the input and parameter space can lead to better solutions.
To this end, embodiments of the present invention are directed to a method for training a deep neural network model which trains a robust model in conjunction with a natural model in a collaborative manner.
The method utilizes task specific decision boundaries to align the feature space of the robust and natural model in order to learn a more extensive set of features which are less susceptible to adversarial perturbations.
Embodiments of the present invention closely intertwine the training of a robust and natural model by involving them in a minimax game inside a closed learning loop. The adversarial examples are generated by determining regions in the input space where the discrepancy between the two models is maximum.
In a subsequent step, each model minimizes a task specific loss which optimizes the model on its specific task, in addition to a mimicry loss that aligns the two models.
The formulation comprises bi-directional knowledge distillation between a clean and an adversarial domain, enabling the models to collectively explore the input and parameter space more extensively. Furthermore, the supervision from the natural model acts as a regularizer which effectively adds a prior on the learned representations and leads to semantically meaningful features that are less susceptible to off-manifold perturbations introduced by adversarial attacks.
In summary, embodiments of the present invention entail training an adversarially robust model in conjunction with a natural model in a collaborative manner (see the accompanying drawings).
Embodiments of the present invention have a number of advantages. The adversarial perturbations generated by identifying regions in the input space where the two models disagree can be effectively used to align the two models and lead to smoother decision boundaries (see the accompanying drawings).
Also, updating the models based on the disagreement regions in the input space, coupled with optimization on distinct tasks, ensures that the two models do not converge to a consensus. Furthermore, the supervision from the natural model acts as a noise-free reference for regularizing the robust model. This effectively adds a prior on the learned representations which encourages the model to learn semantically relevant features in the input space. Coupling this prior with the robustness objective pushes the robust model towards features that exhibit stable behaviour within the perturbation bound.
The accompanying drawings, which are incorporated into and form a part of the specification, illustrate one or more embodiments of the present invention and, together with the description, serve to explain the principles of the invention. The drawings are only for the purpose of illustrating one or more embodiments of the invention and are not to be construed as limiting the invention. In the drawings:
Referring to the accompanying drawings, the following discussion applies to the training method of the invention.
Each model, i.e. the robust model and the natural model, is trained with two losses: a task-specific loss and a mimicry loss which is used to align each model with the other. The natural cross-entropy between the output of the model and the ground-truth class labels is used as the task-specific loss, denoted L_CE. To align the output distributions of the two models, the method makes use of the Kullback-Leibler divergence (D_KL) as the mimicry loss. The robust model, G, minimizes the cross-entropy between its predictions on adversarial examples and the class labels, in addition to minimizing the discrepancy between its predictions on adversarial examples and the soft labels produced by the natural model on clean examples.
The adversarial examples are generated by identifying regions in the input space where the discrepancy between the robust and natural model is maximal, i.e. by maximizing Equation 1.
The overall loss function for the robust model parametrized by θ is as follows:
$$\mathcal{L}_G(\theta,\phi,\delta) = (1-\alpha_G)\,\mathcal{L}_{CE}\big(G(x+\delta;\theta),\,y\big) + \alpha_G\, D_{KL}\big(G(x+\delta;\theta)\,\big\|\,F(x;\phi)\big) \qquad \text{(Equation 1)}$$
where x is the input image to the model, y is the corresponding ground-truth class label, and δ is the adversarial perturbation.
The natural model, F, uses the same loss function as the robust model, except it optimizes the generalization error by minimizing the task specific loss on clean examples. The overall loss function of the natural model parametrized by φ is as follows:
$$\mathcal{L}_F(\theta,\phi,\delta) = (1-\alpha_F)\,\mathcal{L}_{CE}\big(F(x;\phi),\,y\big) + \alpha_F\, D_{KL}\big(F(x;\phi)\,\big\|\,G(x+\delta;\theta)\big) \qquad \text{(Equation 2)}$$
The tuning parameters α_G, α_F ∈ [0, 1] play a key role in balancing the importance of the task-specific and alignment errors.
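By way of illustration only, the following is a minimal PyTorch-style sketch of Equations 1 and 2. The function names, the assumption that both models output raw logits, and the choice to detach the other model's prediction so that it serves as a fixed reference during each update are illustrative assumptions and do not form part of the claimed method; torch.nn.functional is aliased to nnf to avoid a clash with the symbol F used for the natural model.

```python
import torch.nn.functional as nnf   # aliased to avoid clashing with the natural model F


def kl_divergence(p_logits, q_logits):
    """D_KL(p || q) between the softmax distributions of two batches of logits."""
    p = nnf.softmax(p_logits, dim=1)
    return (p * (nnf.log_softmax(p_logits, dim=1)
                 - nnf.log_softmax(q_logits, dim=1))).sum(dim=1).mean()


def loss_robust(logits_g_adv, logits_f_clean, y, alpha_g):
    """Equation 1: (1 - a_G) * L_CE(G(x+d), y) + a_G * D_KL(G(x+d) || F(x))."""
    ce = nnf.cross_entropy(logits_g_adv, y)
    kl = kl_divergence(logits_g_adv, logits_f_clean.detach())   # F(x) as fixed reference
    return (1.0 - alpha_g) * ce + alpha_g * kl


def loss_natural(logits_f_clean, logits_g_adv, y, alpha_f):
    """Equation 2: (1 - a_F) * L_CE(F(x), y) + a_F * D_KL(F(x) || G(x+d))."""
    ce = nnf.cross_entropy(logits_f_clean, y)
    kl = kl_divergence(logits_f_clean, logits_g_adv.detach())   # G(x+d) as fixed reference
    return (1.0 - alpha_f) * ce + alpha_f * kl
```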
The algorithm for training the models is summarized below. For each minibatch of clean examples x with labels y:
1. Generate the adversarial perturbation δ by maximizing the loss L_G(θ, ϕ, δ) of Equation 1 within the perturbation bound.
2. Update the robust model parameters θ by minimizing L_G(θ, ϕ, δ) (Equation 1).
3. Update the natural model parameters ϕ by minimizing L_F(θ, ϕ, δ) (Equation 2).
A code sketch of one such training iteration follows below.
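The sketch below illustrates one such iteration in PyTorch under stated assumptions: model_g and model_f denote the robust and natural networks with optimizers opt_g and opt_f, the inner maximization is carried out by K steps of projected sign-gradient ascent on Equation 1 (using the perturbation bound ε=0.031, step size 0.007 and K=10 of the experiments described below), and the values of α_G and α_F are left as arguments since the text does not fix them here. This is an illustrative sketch, not the definitive implementation of the claimed method.

```python
import torch
import torch.nn.functional as nnf

EPS, STEP, K = 0.031, 0.007, 10   # attack hyperparameters taken from the experiments below


def kl(p_logits, q_logits):
    """D_KL(p || q) between the softmax distributions of two batches of logits."""
    p = nnf.softmax(p_logits, dim=1)
    return (p * (nnf.log_softmax(p_logits, dim=1)
                 - nnf.log_softmax(q_logits, dim=1))).sum(dim=1).mean()


def act_step(model_g, model_f, opt_g, opt_f, x, y, alpha_g, alpha_f):
    """One adversarial concurrent training iteration (steps 1-3 above)."""
    model_g.eval(); model_f.eval()
    logits_f_ref = model_f(x).detach()            # soft labels of the natural model on clean x

    # Step 1: maximize Equation 1 over delta within the L_inf ball of radius EPS.
    delta = torch.empty_like(x).uniform_(-EPS, EPS)
    for _ in range(K):
        delta.requires_grad_(True)
        logits_g = model_g(x + delta)
        loss = ((1 - alpha_g) * nnf.cross_entropy(logits_g, y)
                + alpha_g * kl(logits_g, logits_f_ref))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + STEP * grad.sign()).clamp(-EPS, EPS)
        delta = ((x + delta).clamp(0.0, 1.0) - x).detach()   # keep x + delta a valid image

    model_g.train(); model_f.train()
    x_adv = x + delta

    # Step 2: update the robust model G by minimizing Equation 1.
    logits_g_adv = model_g(x_adv)
    loss_g = ((1 - alpha_g) * nnf.cross_entropy(logits_g_adv, y)
              + alpha_g * kl(logits_g_adv, model_f(x).detach()))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Step 3: update the natural model F by minimizing Equation 2.
    logits_f = model_f(x)
    loss_f = ((1 - alpha_f) * nnf.cross_entropy(logits_f, y)
              + alpha_f * kl(logits_f, model_g(x_adv).detach()))
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    return loss_g.item(), loss_f.item()
```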
The effectiveness of the method according to the invention is empirically compared to the prior-art training methods of Madry [lit. 15] and TRADES [lit. 24]. The table below shows the effectiveness of adversarial concurrent training (ACT) across different datasets and network architectures.
In this analysis the CIFAR-10 [lit. 11] and CIFAR-100 [lit. 11] datasets are used, together with the ResNet [lit. 7] and WideResNet [lit. 26] network architectures. In all experiments, the images are normalized between 0 and 1, and for training, random cropping with reflective padding of 4 pixels and random horizontal flipping are applied as data augmentations.
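By way of illustration, this preprocessing and augmentation can be expressed with torchvision as follows; the dataset root path is a placeholder, and the use of ToTensor to obtain pixel values in [0, 1] is an assumption consistent with the normalization described above.

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# Training-time augmentation as described above: random crop with 4-pixel
# reflective padding, random horizontal flip, and scaling of pixels to [0, 1].
train_transform = T.Compose([
    T.RandomCrop(32, padding=4, padding_mode="reflect"),
    T.RandomHorizontalFlip(),
    T.ToTensor(),                      # converts images to float tensors in [0, 1]
])

test_transform = T.ToTensor()          # evaluation uses only the [0, 1] scaling

train_set = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
test_set = CIFAR10(root="./data", train=False, download=True, transform=test_transform)
```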
For training ACT, stochastic gradient descent with momentum is used for 200 epochs, with a batch size of 128 and an initial learning rate of 0.1, decayed by a factor of 0.2 at epochs 60, 120 and 150.
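A minimal PyTorch sketch of this optimization schedule is given below; the momentum value, the weight decay, and the placeholder network are assumptions, since the text specifies only SGD with momentum, the number of epochs, the batch size and the learning-rate schedule.

```python
import torch
import torchvision

# A stand-in network for the sketch; the experiments use ResNet18 and WRN-28-10,
# whereas this torchvision ResNet18 (built for ImageNet-sized inputs) is only a placeholder.
model_g = torchvision.models.resnet18(num_classes=10)

optimizer = torch.optim.SGD(model_g.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)   # momentum/weight decay assumed
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60, 120, 150], gamma=0.2)

for epoch in range(200):
    # ... one pass over the training data, e.g. one act_step per minibatch ...
    scheduler.step()
```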
For Madry and TRADES, the training scheme used in lit. 24 is applied. To generate the adversarial examples for training, the perturbation bound ε=0.031, the perturbation step size η=0.007 and the number of iterations K=10 are used. For a fair comparison, λ=5 is used for TRADES, which lit. 24 reports achieves the highest robustness for ResNet18.
This re-implementation achieves both better robustness and generalization than reported in lit. 24. The adversarial robustness of the models is evaluated with the Projected Gradient Descent (PGD) attack [lit. 15], using a perturbation bound ε=0.031, a perturbation step size η=0.003 and K=20 iterations.
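The evaluation attack can be sketched as follows: a standard L∞ PGD that maximizes the cross-entropy loss of the model under evaluation, using the ε, step size and iteration count stated above. The random initialization of the perturbation and the omission of device handling are simplifying assumptions of this sketch.

```python
import torch
import torch.nn.functional as nnf


def pgd_attack(model, x, y, eps=0.031, step=0.003, iters=20):
    """Standard L_inf PGD: maximize the cross-entropy of `model` at x + delta."""
    delta = torch.empty_like(x).uniform_(-eps, eps)          # random start (assumed)
    for _ in range(iters):
        delta.requires_grad_(True)
        loss = nnf.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0.0, 1.0) - x).detach()   # keep x + delta a valid image
    return (x + delta).detach()


def robust_accuracy(model, loader):
    """Accuracy of `model` on PGD-perturbed test examples (device handling omitted)."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```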
Specifically, for ResNet18 on CIFAR-100 and WRN-28-10 on CIFAR-10, ACT significantly improves both generalization and robustness compared to Madry and TRADES. ACT consistently achieves better robustness and generalization than TRADES. In the instances where Madry achieves better generalization, ACT's advantage in robustness is considerably larger.
To test the adversarial robustness of the models more extensively, the average minimum perturbation required to successfully fool each defense method is also evaluated. The FGSM^k attack in foolbox [lit. 18] is applied, which returns the smallest perturbation under the L∞ distance. The table shows that ACT consistently requires a higher average perturbation of the images across the different datasets and network architectures.
Optionally, embodiments of the present invention can include a general or specific purpose computer or distributed system programmed with computer software implementing steps described above, which computer software may be in any appropriate computer language, including but not limited to C++, FORTRAN, BASIC, Java, Python, Linux, assembly language, microcode, distributed programming languages, etc. The apparatus may also include a plurality of such computers/distributed systems (e.g., connected over the Internet and/or one or more intranets) in a variety of hardware implementations. For example, data processing can be performed by an appropriately programmed microprocessor, computing cloud, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or the like, in conjunction with appropriate memory, network, and bus elements. One or more processors and/or microcontrollers can operate via instructions of the computer code and the software is preferably stored on one or more tangible non-transitory memory-storage devices.
Although the invention has been discussed in the foregoing with reference to an exemplary embodiment of the training method of the invention, the invention is not restricted to this particular embodiment which can be varied in many ways without departing from the invention. The discussed exemplary embodiment shall therefore not be used to construe the appended claims strictly in accordance therewith. On the contrary the embodiment is merely intended to explain the wording of the appended claims without intent to limit the claims to this exemplary embodiment. The scope of protection of the invention shall therefore be construed in accordance with the appended claims only, wherein a possible ambiguity in the wording of the claims shall be resolved using this exemplary embodiment.
Embodiments of the present invention can include every combination of features that are disclosed herein independently from each other. Although the invention has been described in detail with particular reference to the disclosed embodiments, other embodiments can achieve the same results. Variations and modifications of the present invention will be obvious to those skilled in the art and it is intended to cover in the appended claims all such modifications and equivalents. The entire disclosures of all references, applications, patents, and publications cited above are hereby incorporated by reference. Unless specifically stated as being “essential” above, none of the various components or the interrelationship thereof are essential to the operation of the invention. Rather, desirable results can be achieved by substituting various components and/or reconfiguration of their relationships with one another.
Note that this application refers to a number of publications. Discussion of such publications herein is given for more complete background and is not to be construed as an admission that such publications are prior art for patentability determination purposes.