This application claims the benefit of Korean Patent Application No. 10-2019-0174038 filed on Dec. 24, 2019, which is herein incorporated by reference in its entirety.
The present invention relates generally to attack-less adversarial training that does not use an existing attack technique for robust adversarial defense, and more specifically to attack-less adversarial training that generates a new image from an original image through mapping and randomization and trains a neural network with the generated new image, thereby defending the neural network against attack techniques.
Adversarial machine learning relates to attack techniques that focus on deceiving defense models with noise that is not perceivable by a human. An adversarial example is one such attack generated through adversarial machine learning. An adversarial example may deceive an application by obstructing a task, such as the detection of an object, the classification of an image, or the recognition of voice, performed by an application on a computer. For example, in a face recognition application, an attacker may deceive the application into accepting him or her as an authenticated user simply by attaching a sticker to his or her face. Furthermore, in an automobile image classification system, an attacker may add adversarial noise to a stop signal so that the system recognizes it as a forward movement signal. As a result, such an attack may cause a serious disaster.
In an adversarial example for image classification, minimum adversarial noise may be generated and added to a legitimate image. The adversarial noise refers to the perturbation of pixels that are generated on an image. Accordingly, in the generation of an adversarial image, the smaller the number of pixels to be perturbed is, the more effective an attack technique is.
A defense technique refers to a method of generating a robust neural network capable of accurately detecting or correctly classifying adversarial examples and preventing adversarial attacks. Adversarial training is a defense technique that was first introduced by Ian Goodfellow et al. Adversarial training is designed to generate adversarial examples using attack techniques and to apply the adversarial examples to a neural network during a training phase. However, adversarial training is effective only for existing attack techniques and attack techniques similar to the existing attack techniques, and is vulnerable to new and state-of-the-art attack techniques. Furthermore, to generate an adversarial example for training, adversarial training requires at least one attack technique.
(Patent document 1) US20190220605 A1
(Patent document 2) WO2019143384 A1
The present invention has been conceived to overcome the above-described problems, and an object of the present invention is to provide attack-less adversarial training for robust adversarial defense, which generates a new image from an original image through mapping and randomization and trains a neural network with the generated new image, thereby robustly defending the neural network against new and state-of-the-art attack techniques.
In order to accomplish the above object, the present invention provides attack-less adversarial training for robust adversarial defense, the attack-less adversarial training including the steps of: (a) generating individual intervals (c_i) by setting the range of color (C) and then discretizing the range of color (C) by a predetermined number (k); (b) generating one batch from an original image (X) and training a learning model with the batch; (c) predicting individual interval indices (ŷ_i^alat) from respective pixels (x_i) of the original image (X) by using an activation function; (d) generating a new image (X^alat) through mapping and randomization; and (e) training a convolutional neural network with the image (X^alat) generated in step (d) and outputting a predicted label (Ŷ).
Step (b) may include: (b-1) generating individual accurate interval indices (y_i) by randomly extracting a plurality of pixels (x_i) from the original image (X) and then mapping the extracted pixels (x_i) to the respective intervals (c_i) generated in step (a); (b-2) generating a plurality of instances each including one of the pixels (x_i) and a corresponding one of the accurate interval indices (y_i); (b-3) generating one batch including the plurality of instances generated in step (b-2); and (b-4) training a learning model with the batch generated in step (b-3).
Step (d) may include: (d-1) mapping the individual predicted interval indices (ŷ_i^alat) and returning the corresponding intervals (c_i); (d-2) randomly generating individual new pixels (x_i^alat) within the range of the individual intervals (c_i) returned in step (d-1); and (d-3) generating a new image (X^alat) by allocating the individual new pixels (x_i^alat), generated in step (d-2), to the respective locations of the individual pixels (x_i) of the original image (X).
The activation function used in step (c) may be a softmax function.
The above and other objects, features, and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that those having ordinary skill in the art to which the present invention pertains can easily practice the technical spirit of the present invention.
However, the following embodiments are merely examples intended to help an understanding of the present invention, and thus the scope of the present invention is not reduced or limited by the embodiments. Furthermore, the present invention may be embodied in many different forms, and is not limited to the embodiments set forth herein.
Attack-less adversarial training is a defense technique that generates a new image from an original image through mapping and randomization and trains a neural network with the generated new image, thereby robustly defending the neural network against state-of-the-art attack techniques.
Hereinafter, attack-less adversarial training according to the present invention will be referred to as “ALAT.”
The main steps of ALAT will be described below.
Referring to the accompanying drawings, a first step is the step of generating individual intervals c_i by setting the range of color C and then discretizing the range of color C by a predetermined number k.
When the range of color C is discretized into k intervals, a resulting set of intervals is {c_i | c_i ⊂ C}, where i = 1, 2, . . . , k. In this case, the minimum value of an interval c_i is s_i^min and the maximum value is s_i^max.
For example, when color C = [0, 255] is discretized into five intervals and each of the intervals is equally divided over [0, 255], the intervals are c_1 = [s_1^min, s_1^max] = [0, 51], c_2 = [52, 102], c_3 = [103, 153], c_4 = [154, 204], and c_5 = [205, 255].
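As an illustration only, the interval generation of the first step can be sketched in Python as follows; the function name make_intervals and the exact boundary handling (each interval starting one value after the previous one ends, as in the example above) are assumptions made for this sketch.

```python
import numpy as np

def make_intervals(c_min=0, c_max=255, k=5):
    """Discretize the color range C = [c_min, c_max] into k intervals c_1..c_k.
    Returns a list of (s_i_min, s_i_max) pairs."""
    edges = np.round(np.linspace(c_min, c_max, k + 1)).astype(int)
    return [(int(edges[i]) + (1 if i > 0 else 0), int(edges[i + 1]))
            for i in range(k)]

print(make_intervals())
# [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]
```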
A second step is step 1020 of generating one batch from an original image X and then training a learning model.
First, a plurality of pixels x_i is randomly extracted from the original image X, and individual accurate interval indices y_i are generated by mapping the extracted pixels x_i to the respective intervals c_i generated in the above-described first step.
Furthermore, a plurality of instances is generated, each including a pixel x_i and the accurate interval index y_i corresponding to that pixel. In this case, each generated instance may be represented as (x_i, y_i), where x_i is a randomly extracted pixel and y_i is the accurate interval index corresponding to the randomly extracted pixel.
Furthermore, there is generated one batch including the plurality of generated instances.
Finally, the learning model is trained by inputting the generated one batch to the learning model.
For example, when the randomly extracted pixel x_1 is 38 in the previous example, the accurate interval index y_1 corresponding to the randomly extracted pixel x_1 is 1. The reason for this is that 38 is a number in the range of 0 to 51. In other words, the reason why the accurate interval index y_1 is 1 is that 38 is a number that belongs to the interval c_1 = [0, 51].
In this case, the generated instance is (38,1). Furthermore, the instances (113,3), (204,4), and (3,1) may be generated in the same manner. Furthermore, there is generated one batch including a plurality of generated instances (38,1), (113,3), (204,4), and (3,1). Finally, a learning model may be trained by inputting the generated batch to the learning model.
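Purely as an illustration of the second step, the following Python sketch builds one batch of (x_i, y_i) instances; the helper names and the toy image are hypothetical, and the intervals are the ones from the example above.

```python
import numpy as np

# Intervals from the example above: C = [0, 255] discretized into k = 5 intervals.
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def interval_index(pixel):
    """Return the accurate interval index y_i (1-based) for a pixel value x_i."""
    for idx, (s_min, s_max) in enumerate(INTERVALS, start=1):
        if s_min <= pixel <= s_max:
            return idx
    raise ValueError("pixel value outside the color range C")

def make_batch(image, n_samples=4, rng=None):
    """Randomly extract n_samples pixels x_i from the original image X and pair
    each with its accurate interval index y_i, yielding one batch of instances."""
    rng = np.random.default_rng() if rng is None else rng
    pixels = rng.choice(image.ravel(), size=n_samples)
    return [(int(x), interval_index(int(x))) for x in pixels]

X = np.array([[38, 113], [204, 3]])   # toy "original image"
print(make_batch(X))                  # e.g. instances such as (38, 1), (113, 3), (204, 4), (3, 1)
```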
Hereinafter, the learning model trained in the second step of embodiment 1 of the present invention will be referred to as the "ALAT model."
A third step is step 1030 of outputting each interval index ŷ_i^alat predicted from each pixel x_i of the original image X by using an activation function.
The equation for predicting the interval index is as follows:
ŷ_i^alat = a(w·x_i + b)
where ŷ_i^alat is a predicted interval index, w is a weight, x_i is a pixel of an original image, b is a bias, and a(⋅) is a softmax function, which is an activation function.
In this case, an accurate interval index is represented by y_i, and a predicted interval index is represented by ŷ_i^alat. These symbols are distinguished because a predicted interval index is not the accurate interval index value but an interval index value that is predicted by the trained ALAT model.
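The prediction of the third step can be sketched as follows; the single-layer form ŷ_i^alat = a(w·x_i + b) is taken directly from the equation above, while the placeholder weights w and b merely stand in for a trained ALAT model and do not hold meaningful values.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax activation a(.)."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def predict_interval_index(x_i, w, b):
    """Predict the interval index yhat_i^alat = a(w * x_i + b) for one pixel x_i.
    w and b each hold one entry per interval c_1..c_k."""
    probs = softmax(w * x_i + b)
    return int(np.argmax(probs)) + 1   # 1-based interval index

k = 5
w = np.random.default_rng(0).normal(size=k) * 0.01   # placeholder, untrained weights
b = np.zeros(k)
print(predict_interval_index(75, w, b))              # some index in 1..k (not yet meaningful)
```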
A fourth step is step 1040 of generating a new image X^alat through mapping and randomization.
First, each predicted interval index ŷ_i^alat is mapped, and the interval c_i corresponding to each predicted interval index ŷ_i^alat is returned. A function that returns the interval c_i is as follows:
c_i = colorset(ŷ_i^alat)
where colorset(⋅) is a function that returns each interval c_i from the predicted interval index ŷ_i^alat.
Furthermore, each new pixel x_i^alat is randomly generated within the range of the individual mapped intervals c_i.
A function that generates the new pixel x_i^alat is defined as follows:
x_i^alat = random_ci(s_i^min, s_i^max)
where random_ci is a random function that generates a random value within the range from the minimum value s_i^min to the maximum value s_i^max of the interval c_i.
Finally, an ALAT image X^alat is generated by allocating each new pixel x_i^alat to the location of each pixel x_i of the original image X.
For example, in the previous example, when a pixel x_2 of the original image is 75, the interval index ŷ_2^alat predicted by the ALAT model may be 2. Furthermore, the predicted interval index ŷ_2^alat = 2 is mapped by the colorset function, and the interval c_2 = [52, 102] is returned. Furthermore, the new pixel x_2^alat = 85 may be randomly generated within the range from the minimum value 52 of the mapped interval c_2 to the maximum value 102 of c_2. Finally, a new image X^alat may be generated by allocating the new pixel x_2^alat to the location of the pixel x_2 of the original image X and processing the remaining pixels of the original image in the same manner. In this case, the image X^alat newly generated from the original image is referred to as the "ALAT image." Furthermore, the method that is applied to the first to fourth steps of embodiment 1 of the present invention is referred to as the "ALAT method" below.
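A minimal Python sketch of the fourth step (mapping and randomization) follows; the colorset lookup and the stand-in predictor are hypothetical, and the intervals are again those of the example.

```python
import numpy as np

# Intervals from the example: C = [0, 255] discretized into k = 5 intervals.
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def colorset(yhat_alat):
    """Return the interval c_i = (s_i_min, s_i_max) for a predicted index yhat_i^alat."""
    return INTERVALS[yhat_alat - 1]

def generate_alat_image(original, predict_index, rng=None):
    """Generate the ALAT image X^alat: for every pixel x_i of X, map the predicted
    interval index to its interval and draw a random new pixel x_i^alat inside it,
    keeping the location of x_i in the original image."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(original)
    for pos, x_i in np.ndenumerate(original):
        s_min, s_max = colorset(predict_index(int(x_i)))
        out[pos] = rng.integers(s_min, s_max + 1)   # random_ci(s_i_min, s_i_max)
    return out

# Stand-in for the trained ALAT model: here it simply returns the accurate index.
predict = lambda x: next(i + 1 for i, (lo, hi) in enumerate(INTERVALS) if lo <= x <= hi)
X = np.array([[38, 75], [204, 113]])
print(generate_alat_image(X, predict))
```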
A fifth step is step 1050 of training a convolutional neural network (CNN) with the ALAT image X^alat generated in the above-described fourth step and outputting a predicted label Ŷ.
An equation that trains the convolutional neural network with the ALAT image X^alat is as follows:
Ŷ = F(X^alat)
where the function F(⋅) is a function that generates the predicted label Ŷ for one image.
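For illustration, a training step of the fifth step, in which Ŷ = F(X^alat), might look like the following PyTorch sketch; the network architecture, input shapes, and names are placeholders rather than the architecture actually used in the experiments.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Placeholder convolutional neural network F for 28x28 grayscale inputs."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_step(model, optimizer, x_alat, y):
    """One training step on a batch of ALAT images: Yhat = F(X^alat)."""
    model.train()
    optimizer.zero_grad()
    y_hat = model(x_alat)                       # predicted labels Yhat
    loss = nn.functional.cross_entropy(y_hat, y)
    loss.backward()
    optimizer.step()
    return loss.item()

model = SmallCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_alat = torch.rand(8, 1, 28, 28)               # stands in for a batch of ALAT images
y = torch.randint(0, 10, (8,))
train_step(model, opt, x_alat, y)
```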
Referring to
Referring to
First, symbols mainly used in experimental example 1 are as follows.
X̂ is an adversarial image of the original image X. Ŷ is a predicted label that may be output by inputting one image to a convolutional neural network (CNN).
Furthermore, the function F(⋅) is a function that generates a predicted label Ŷ for an image. Furthermore, the attack technique A(⋅) is a function that generates the adversarial image X̂ from the original image X with or without the function F(⋅). Furthermore, D(X_1, X_2) is the distance between two images X_1 and X_2.
Attack techniques that were applied to experimental example 1 of the present invention include the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the Momentum Iterative Method (MIM), the L2-Carlini & Wagner's Attack (L2-CW), the Backward Pass Differentiable Approximation (BPDA), and the Expectation Over Transformation (EOT).
First, FGSM (the Fast Gradient Sign Method) is a fast and simple attack technique that was proposed by Goodfellow et al. and generates an adversarial example.
BIM (The Basic Iterative Method) is an extension of the FGSM that was proposed by Kurakin et al. and applies multiple iterations with a small step size in order to obtain the smallest perturbation of an original image.
MIM (the Momentum Iterative Method) is an attack technique that was proposed by Dong et al. and is more advanced than BIM because it is equipped with a momentum algorithm.
L2-CW is an attack technique that is effective in finding an adversarial example with the smallest perturbation.
BPDA (Backward Pass Differentiable Approximation) is an attack technique that replaces a non-differentiable layer in a neural network with a differentiable approximation function during the back-propagation step.
EOT (Expectation Over Transformation) is an attack technique that allows for the generation of adversarial examples that remain adversarial over a chosen distribution T of transformation functions taking an input.
Furthermore, in experimental example 1 of the present invention, the Modified National Institute of Standards and Technology (MNIST), Fashion MNIST, and Canadian Institute For Advanced Research (CIFAR-10) datasets were used.
For the CIFAR-10 dataset, another CIFAR-10 (grayscale) was generated in order to analyze the effect of the ALAT method on a color image.
MNIST and Fashion MNIST each had 60,000 training images and 10,000 test images associated with 10 class labels. Each image was a 28×28 grayscale image. CIFAR-10 had 50,000 training images and 10,000 test images associated with 10 classes. Each image was a 32×32 color image.
In experimental example 1 of the present invention, when the FGSM, BIM, and MIM attacks were applied, ε = 77/255, which was the largest perturbation allowed for each pixel, was set for the MNIST dataset, and ε = 8/255 was set for the Fashion MNIST and CIFAR-10 datasets.
Furthermore, for an L2-CW attack, the number of iterations for the execution of an attack was set to 1,000.
In the present invention, the ALAT method may be evaluated based on individual cases having different attack scenarios. In this case, the individual cases are a normal case, case A, case B, case C, and case D.
The process of generating the ALAT image X^alat by applying the ALAT method to the original image X may be expressed by the following equation:
X^alat = F^alat(X)
In the normal case, a convolutional neural network is evaluated using the ALAT method in the testing phase. The normal case is a case where an attack is not applied, in which case a convolutional neural network may be tested using an original image. A defense mechanism generates the ALAT image by applying the ALAT method to the original image.
X^alat = F^alat(X)
Furthermore, the defense mechanism applies the ALAT image to the trained convolutional neural network.
Ŷ = F(X^alat) = F(F^alat(X))
In case A, the convolutional neural network is evaluated using the ALAT method in the testing phase. An attacker knows the parameters of the trained convolutional neural network, but does not know about the ALAT method. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network.
X̂ = A(F, X)
The defense mechanism generates the ALAT image by applying the ALAT method to the received adversarial image.
X̂^alat = F^alat(X̂)
Furthermore, the defense mechanism applies the ALAT image to the trained convolutional neural network.
Ŷ = F(X̂^alat) = F(F^alat(X̂)) = F(F^alat(A(F, X)))
In case B, the convolutional neural network is evaluated without the ALAT method in the testing phase. An attacker knows the parameters of the trained convolutional neural network, but does not know about the ALAT method. The attacker generates an adversarial image from an image by using the parameters of the trained convolutional neural network.
X̂ = A(F, X)
The trained convolutional neural network uses the adversarial image as an input without undergoing a preprocessing process by the ALAT method.
Ŷ = F(X̂) = F(A(F, X))
In case C, the convolutional neural network is evaluated without the ALAT method in the testing phase. An attacker knows both the parameters of the trained convolutional neural network and the parameters of the ALAT model. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network and the parameters of the ALAT model.
X̂ = A(F, F^alat, X)
The trained convolutional neural network uses the adversarial image as an input without undergoing a preprocessing process by the ALAT method.
Ŷ = F(X̂) = F(A(F, F^alat, X))
In case D, the convolutional neural network is evaluated using the ALAT method in the testing phase. An attacker knows both the parameters of the trained convolutional neural network and the parameters of the ALAT model. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network and the parameters of the ALAT model.
X̂ = A(F, F^alat, X)
The defense mechanism generates the ALAT image by applying the ALAT method to the received adversarial image.
X̂^alat = F^alat(X̂)
Furthermore, the defense mechanism applies the newly generated ALAT image to the trained convolutional neural network.
Ŷ = F(X̂^alat) = F(F^alat(X̂)) = F(F^alat(A(F, F^alat, X)))
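The five evaluation pipelines above reduce to simple function compositions; the following sketch, with placeholder callables F (trained CNN), F_alat (ALAT preprocessing), and A (attack technique), only restates those compositions in code.

```python
# F: trained CNN, F_alat: ALAT preprocessing, A: attack technique; all placeholders.

def eval_normal(F, F_alat, X):
    return F(F_alat(X))                   # Yhat = F(F_alat(X)), no attack

def eval_case_a(F, F_alat, A, X):
    return F(F_alat(A(F, X)))             # attacker knows F only; defense applies ALAT

def eval_case_b(F, A, X):
    return F(A(F, X))                     # attacker knows F only; no ALAT at test time

def eval_case_c(F, F_alat, A, X):
    return F(A(F, F_alat, X))             # attacker knows F and the ALAT model; no ALAT at test time

def eval_case_d(F, F_alat, A, X):
    return F(F_alat(A(F, F_alat, X)))     # attacker knows both; defense applies ALAT
```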
Referring to
Meanwhile, the adversarial images generated from cases C and D have a large perturbation because it is difficult for an attacker to calculate the derivative of a randomization method used in the ALAT method. In order to mitigate the differential calculation problem of the randomization method and to minimize the high distortion of the adversarial image generated in case D, each attack technique is integrated with the BPDA method or the EOT method.
If the defense system generates an obfuscated gradient, the attack technique cannot obtain appropriate gradient information to generate adversarial examples. Furthermore, when the BPDA method or EOT method is integrated with each attack technique, the conventional defense system is known to be unable to fully defend against the adversarial examples due to the obfuscated gradient.
To evaluate whether or not the ALAT method generates the obfuscated gradient, each attack technique is integrated with the BPDA method or EOT method.
Assuming that the ALAT image X^alat is generated by adding some noise to the original image X, an equation for obtaining X^alat is as follows:
X^alat ≈ X + γ^alat
In this case, γ^alat is a noise matrix.
In this case, a predicted label may be calculated as follows:
Ŷ = F(X^alat) ≈ W(X + γ^alat) + B
Ŷ ≈ W·X^alat + B
In this case, W is a weight matrix, and B is a bias matrix.
In the above equation, it can be seen that the derivative of Ŷ with respect to X^alat returns only W. From this, it can be seen that the adversarial examples generated in the attack scenarios of cases C and D have a larger perturbation than the adversarial examples generated in the attack scenarios of cases A and B.
To minimize perturbation in the attack scenarios of cases C and D, adversarial examples for these cases are generated using BPDA.
First, the preprocessing method of converting the original image into the ALAT image is executed. After the ALAT image has been input to the convolutional neural network, a predicted value and a loss value of the convolutional neural network are obtained. Thereafter, during back propagation, the adversarial ALAT image X̂^alat is generated by adding, to the ALAT image X^alat, the gradient of the loss function with respect to X^alat scaled by ε.
X̂^alat = X^alat + ε·∇_(X^alat) L
where ε is the largest perturbation allowed for each pixel and L is the loss function.
Finally, the noise γ^alat is subtracted from the adversarial ALAT image X̂^alat.
The general equation of BPDA used in experimental example 1 is as follows:
X̂ = Clip_X(A(F, X^alat) − (X^alat − X))
X̂ = Clip_X(X̂^alat − γ^alat)
where A(⋅) is an attack technique.
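A rough sketch of this BPDA step is shown below; the callables attack and F, and the assumption that Clip_X simply clips back to the valid normalized pixel range, are placeholders for this illustration.

```python
import torch

def bpda_adversarial_image(F, attack, X, X_alat):
    """Sketch of the BPDA step described above: attack the network on the ALAT
    image X^alat, subtract the approximate ALAT noise gamma^alat = X^alat - X,
    and clip the result (Clip_X is assumed here to clip to the valid pixel range)."""
    X_hat_alat = attack(F, X_alat)      # adversarial ALAT image, e.g. X^alat + eps * grad of L
    gamma_alat = X_alat - X             # approximate noise added by the ALAT method
    return torch.clamp(X_hat_alat - gamma_alat, 0.0, 1.0)
```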
To evaluate the EOT method, a final ALAT image is generated by generating 10 ALAT images and calculating the average of the images.
X^alat_final = E(X^alat_(1..10))
An adversarial image is generated using the final ALAT image.
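As an illustration, the EOT averaging used here can be written as follows; f_alat is a placeholder for the random ALAT transform, and n = 10 matches the number of ALAT images described above.

```python
import torch

def eot_alat_image(X, f_alat, n=10):
    """Generate n ALAT images of X with the random transform f_alat and average
    them to obtain X^alat_final = E(X^alat_1..n)."""
    return torch.stack([f_alat(X) for _ in range(n)]).mean(dim=0)
```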
Referring to
Referring to
Referring to
Meanwhile, the reason for this is that in order to prevent a human from recognizing perturbation in an adversarial example, the perturbation between the original image and the adversarial image needs to be as low as possible.
In experimental example 2, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.
A plurality of ALAT models is trained, and each pixel of an original image may be predicted using one of the plurality of trained ALAT models. In other words, to predict the pixel of the original image, one ALAT model may be randomly selected from among the plurality of ALAT models. Thereafter, the step of randomly selecting one ALAT model from among the plurality of ALAT models is repeated until all the pixels are reproduced.
As described above, in experimental example 1, cases C and D are not practical, and are thus excluded from the experiment.
In experimental example 3, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.
The RNI (random noise injection) method is applied to three different steps including (1) the training phase, (2) the testing phase, and (3) both the training and testing phases.
In the RNI method, a uniform distribution is used, and a distribution range extends from −1.0 to +1.0.
A noise value generated from a uniform distribution is added to an original image. Furthermore, the summed output is clipped into a range of 0.0 to 1.0 (a normalized pixel value).
The equation of RNI may be expressed as follows:
X_i^RNI = Clip(X_i + U(−1, +1))
where X_i is the original image and X_i^RNI is an image generated by the RNI method. Furthermore, U(−1, +1) is a uniform distribution ranging from −1 to +1.
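For illustration, the RNI preprocessing just described can be sketched as follows; normalized pixel values in [0, 1] are assumed, as stated above.

```python
import numpy as np

def rni(X, rng=None):
    """Random noise injection: add uniform noise U(-1, +1) to the normalized
    original image X_i and clip the sum into the range [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(X + rng.uniform(-1.0, 1.0, size=X.shape), 0.0, 1.0)

X = np.random.rand(28, 28)    # stands in for a normalized original image X_i
X_rni = rni(X)
```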
Referring to
Referring to
Referring to
In experimental example 4, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.
Adversarial training relies on an attack technique in the training phase. When a low-level attack technique such as an FGSM attack is used in adversarial training, the resulting defense provides lower performance than one trained with a high-level attack technique such as a BIM or MIM attack. For a realistic experiment, an MIM attack may be set as the attack technique that is used for adversarial training.
Referring to
In experimental example 5, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.
In experimental example 5, the effect of the number of intervals k on the convolutional neural network is analyzed. In experimental example 5, k=3, k=4 and k=10 are set.
As described above in conjunction with experimental example 1, cases C and D are not practical, and are thus excluded from the experiment.
Table 1 compares the performance of the ALAT method for the normal case, case A, and case B when different numbers (k) of intervals are applied, at the 1,000th epoch.
Referring to Table 1, the performance of the ALAT method with k = 3 was best. In other words, the ALAT method with k = 3 had 19 winning nodes, whereas the ALAT method with k = 4 had 15 winning nodes and the ALAT method with k = 10 had only 2 winning nodes.
When an appropriate k is used in the ALAT method, the robustness of the convolutional neural network is improved.
The attack-less adversarial training for robust adversarial defense according to the present invention, which is configured as described above, improves the robustness of a neural network and does not generate an obfuscated gradient.
Furthermore, the attack-less adversarial training for robust adversarial defense according to the present invention offers better performance than the random noise injection method and the adversarial training method.
Moreover, unlike conventional adversarial training, the attack-less adversarial training for robust adversarial defense according to the present invention does not require any attack technique, and it defends the neural network against new and state-of-the-art attack techniques.
Although the specific embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.