The present disclosure relates to techniques for emulating immune system defense mechanisms to thwart adversarial attacks on deep learning systems.
The state of the art in supervised learning, especially deep learning, has improved dramatically over the past decades. Many techniques are widely used as effective tools aiding human tasks, e.g., face recognition, object detection, and natural language processing. Despite their effectiveness, deep learning techniques have been demonstrated to be vulnerable to imperceptibly perturbed examples intentionally designed by evasion attacks (also known as adversarial attacks). The vulnerability of deep neural networks (DNNs) restricts their application scenarios and motivates researchers to develop various defense techniques.
The current defense methods can be broadly divided into three categories: (1) adversarial example detection, (2) robust training, and (3) robust deep architectures. The first category of methods intends to protect the model by distinguishing the adversarial examples. However, it was shown that adversarial detection methods are not perfect and can be easily defeated. Different from detecting the outliers as in the first category, robust training aims to harden the model to deactivate the evasion attack. Known robust training methods are tailored to a certain level of attack strength in the context of lp-perturbations. Moreover, the trade-off between accuracy and robustness becomes an obstacle to enhancing robustness. Recent works are also exploring another possibility: designing robust deep architectures that are naturally resilient to evasion attacks. Nevertheless, relying on the architecture alone can provide neither sufficient robustness nor sufficient prediction confidence.
Facing the vulnerability of artificially designed systems to attacks, a natural question to ask is: can we find a robust biological system for reference? The immune system may be the answer. Recent studies have shown that the immune system takes advantage of all three categories of defense mechanisms and incorporates life-long learning, permitting continuous hardening of the system. The immune system has detectors to distinguish non-self content from self components, and it is embedded with a naturally robust architecture. Even more surprising, the immune system continuously increases its robustness by adaptively learning from attacks.
Motivated by the immune system's powerful defense ability, this disclosure aims to develop a Robust Adversarial Immune-Inspired Learning System (RAILS) that can effectively defend against evasion attacks on deep learning systems.
This section provides background information related to the present disclosure which is not necessarily prior art.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
A computer-implemented method is presented for classifying an input using a deep learning system. The method includes: receiving an input for a deep learning system, where the deep learning system was trained with a training dataset and the training dataset includes data for a plurality of classes; for each class in the training dataset, identifying a set of data points in the training dataset, where the data points in the set of data points are similar to the input; for each set of data points, generating additional data points from data points in the set of data points using genetic operators (such as selection, mutation, and crossover); for each of the data points, calculating a similarity score in relation to the input; selecting a subset of data points with the highest similarity scores amongst the data points; and predicting a class label for the input from the plurality of classes, where the prediction of a class label for the input is determined by consensus of the data points in the subset of data points with the highest similarity scores.
In some embodiments, the input is identified as an outlier prior to the step of identifying a set of data points, and remaining steps of the method are performed only when the input is identified as an outlier.
The method may further include: selecting a first subset of data points and selecting a second subset of data points, where the data points in the first subset of data points have an average similarity score higher than the average similarity score of the data points in the second subset of data points, and the data points in the second subset of data points have an average similarity score higher than the average similarity score for all of the data points. Furthermore, the input is classified to a predicted class in the plurality of classes, where the predicted class has the most similar data points to the input in the first subset of data points; and the training dataset is updated by appending the data points in the second subset to the training dataset.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.
Example embodiments will now be described more fully with reference to the accompanying drawings.
Robustness in systems comes from architecture, and one of the best examples is found in the mammalian adaptive immune system. With reference to
The immune system has formed an effective self-renewing defense system through millions of years of evolution. Motivated by the recent understanding of the immune system, this disclosure proposes a new defense system: the Robust Adversarial Immune-Inspired Learning System (RAILS). This computational system has a one-to-one mapping to a simplified immune system.
To demonstrate that the computational system indeed captures some exclusive properties of the immune system, the learning curves for an immune system and the RAILS system 20 are shown in
Adaptive Immune System Emulation (AISE) is designed and implemented with a bionic process inspired by the mammalian immune system. Concretely, AISE generates plasma data (plasma B-cells) and memory data (memory B-cells) through multiple generations of evolutionary programming that includes three operations, namely, selection, mutation, and cross-over. The plasma data and memory data are selected in different ways, thus contributing to different levels of model robustification. The plasma data contributes to robust predictions for the present inputs, and the memory data helps to adjust the classifiers to effectively defend against future attacks. From the perspective of classifier adjustment, AISE's learning process can be divided into static learning and adaptive learning.
Static learning helps to correct the predictions of the present inputs. For illustration purposes, adaptive immune system emulation is shown integrated with a deep k-nearest neighbor (DkNN) algorithm as seen in
Recall that the DkNN algorithm aggregates the k-nearest-neighbor predictions across the layers of the deep neural network, and the final prediction y_DkNN can be obtained by the following formula:

$$y_{\mathrm{DkNN}} = \arg\max_{c \in [C]} \sum_{l=1}^{L} p_l^c(x) \tag{1}$$
where l indexes the l-th layer of a DNN with L layers in total, and p_l^c(x) is the probability of class c predicted by kNN at layer l for input x. There is a finite set of classes of total number C, and [C] denotes the set {1, 2, . . . , C}. Note that p_l^c(x) can be small for poisoned data, e.g., an adversarial example, even when c is the true class y_true. The purpose of static learning is to increase p_l^{y_true}(x) for such inputs.
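For illustration only, the aggregation rule of Equation (1) can be sketched in a few lines of Python. The sketch assumes that the per-layer kNN class probabilities p_l^c(x) have already been computed and are supplied as an L-by-C array; the function name dknn_predict is illustrative rather than part of any DkNN library.

    import numpy as np

    def dknn_predict(layer_probs):
        # layer_probs: (L, C) array; entry (l, c) is p_l^c(x), the fraction of
        # the k nearest neighbors of x at layer l that belong to class c.
        scores = layer_probs.sum(axis=0)   # sum the kNN probabilities over the L layers
        return int(np.argmax(scores))      # y_DkNN = argmax_c sum_l p_l^c(x)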
Different from static learning, adaptive learning tries to harden the classifiers to defend against potential attacks in the future. The hardening is done by leveraging another set of data, the memory data generated after clonal expansion. Unlike plasma data, memory data is selected from examples with moderate affinity to the input, which can rapidly adapt to new variants of the current adversarial examples. This approach permits continuous hardening of the model during the inference stage, i.e., life-long learning accompanied by increasing defensive ability. The adaptive learning provides a naturally high p_l^{y_true}(x) for future inputs subject to similar attacks.
With continued reference to the drawings, the RAILS defense proceeds in five steps: sensing 23, flocking 24, expansion 25, optimization 26, and consensus 27.
Sensing is the first step of the process as indicated at 23. This step conducts the initial identification of adversarial inputs and clean inputs. The identification is an outlier detection process and can be done using different methods. In one example, DkNN provides a metric called credibility that can measure the consistency of the k nearest neighbors in each layer. The higher the credibility, the higher the confidence that the input is clean (i.e., not an outlier). Other suitable outlier detection methods include those described by L. Zhou, Y. Wei and A. Hero in “Second-Order Asymptotically Optimal Universal Outlying Sequence Detection with Reject Option,” arXiv:2009.03505, September 2020; by E. Hou, K. Sricharan and A. O. Hero in “Latent Laplacian Maximum Entropy Discrimination for Detection of High-Utility Anomalies,” IEEE Transactions on Information Forensics and Security, Vol. 13, No. 6, pp. 1446-1459, June 2018; and by K. Sricharan and A. O. Hero in “Efficient anomaly detection using bipartite k-NN graphs,” Proc. of Neural Information Processing Systems (NIPS), Granada, Spain, December 2011, which are incorporated by reference herein. These examples are merely illustrative and other outlier detection methods are also contemplated by this disclosure.
The sensing stage provides a confidence score based on the DkNN architecture. In some embodiments, the remaining steps of the classification are executed only when the input is identified as an outlier, that is, when the confidence score is below a predetermined threshold. In other embodiments, the sensing stage can be skipped or omitted from the classification process implemented by the RAILS system 20.
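A minimal sketch of the sensing gate follows, assuming a hypothetical dknn object that exposes a credibility score; the threshold value is likewise an assumed parameter, since the disclosure leaves it to the embodiment.

    def sense(x, dknn, threshold=0.5):
        # Sensing step 23: treat the input as a suspected adversarial example
        # (an outlier) when the DkNN credibility falls below the threshold.
        # Returns True when the remaining RAILS steps should be executed.
        return dknn.credibility(x) < threshold   # dknn.credibility is hypothetical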
Flocking 24 is the starting point for clonal expansion. For each class and each layer, the RAILS system finds the k nearest neighbors that have the highest initial affinity score to the input data. Mathematically, the flocking step selects

$$S_c = \{(x_i, y_i) : x_i \in D_c,\ R_c(i) \le k\} \tag{2}$$
where x is the input, D_c is the training dataset from class c with size |D_c| = n_c, and R_c : [n_c] → [n_c] is a ranking function that sorts the indices based on the affinity score. If memory data exists, the nearest neighbor search uses both the training data and the existing memory data.
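A sketch of the flocking step in one layer's feature space is shown below, with negative Euclidean distance standing in for the affinity score; the disclosure does not fix a particular affinity measure, so this choice is an assumption.

    import numpy as np

    def flock(x_feat, feats_by_class, k=10):
        # feats_by_class: {class label c: (n_c, d) array of layer features of D_c}
        # Returns, for each class, the indices of the k points with the highest
        # affinity to the input feature vector x_feat (Equation (2)).
        nearest = {}
        for c, feats in feats_by_class.items():
            affinity = -np.linalg.norm(feats - x_feat, axis=1)  # assumed affinity
            nearest[c] = np.argsort(-affinity)[:k]              # ranking R_c, keep top k
        return nearest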
Next, expansion 25 generates new examples (offspring) from the existing examples (parents). The ancestors are the nearest neighbors found by the flocking step. The process can be viewed as creating new nodes linked to existing nodes, and can be characterized by preferential attachment as described by Barabási and Albert in “Emergence of Scaling in Random Networks,” Science, 286(5439): 509-512, 1999. The probability of a new node linking to node i is

$$\Pi(i) = \frac{k_i}{\sum_j k_j} \tag{3}$$
where k_i is the degree of node i. New nodes prefer to attach to existing nodes having a high degree. In the RAILS system 20, the degree is the exponential of the affinity measure, and offspring are generated by parents having high probability in the network and subnetworks. In the example embodiment, diversity in the expansion is provided by the genetic operators of selection, mutation and cross-over. Other types of genetic operators are also contemplated by this disclosure. After new examples are generated, the RAILS system calculates each new example's affinity score to the input. The new examples are associated with labels that are inherited from their parents.
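Because the degree is the exponential of the affinity measure, the attachment probability of Equation (3) reduces to a softmax over affinity scores, as the following sketch makes explicit (the max-subtraction is only for numerical stability).

    import numpy as np

    def attachment_probs(affinities):
        # Equation (3) with degree k_i = exp(affinity_i): the probability that
        # a new node (offspring) attaches to node i is a softmax over affinities.
        degrees = np.exp(affinities - np.max(affinities))  # stabilized exponential
        return degrees / degrees.sum()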
Optimization (affinity maturation) step 26 selects generated examples with high affinity scores to be plasma data 21, and examples with moderate-affinity scores are saved as memory data 22. The selection is based on a ranking function.
$$S_{\mathrm{opt}} = \{(\tilde{x}, \tilde{y}) : R_g(\tilde{x}) \le \beta\,|P(G)|,\ (\tilde{x}, \tilde{y}) \in P(G)\} \tag{4}$$

where R_g : [|P(G)|] → [|P(G)|] is the same ranking function as R_c except that its domain is the index set of the final population P(G), and β is a percentage parameter. In one example, β is selected as 0.05 and 0.25 for plasma data and memory data, respectively. Note that the memory data can be selected in each generation and in a nonlinear way. In the example embodiment, memory data is selected only in the last generation. Memory data is saved in a secondary database of the system and used for model hardening.
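The ranking-based selection of Equation (4) can be sketched as follows. The sketch assumes the disjoint-tier variant described later (plasma data in the top fraction, memory data in the next tier); the fractions 0.05 and 0.25 follow the example above.

    import numpy as np

    def split_population(examples, labels, affinities, plasma_frac=0.05, memory_frac=0.25):
        # Rank the final population P(G) by affinity (ranking function R_g) and
        # keep the top tier as plasma data and the next tier as memory data.
        order = np.argsort(-np.asarray(affinities))
        n_plasma = int(plasma_frac * len(order))
        n_memory = int(memory_frac * len(order))
        plasma = [(examples[i], labels[i]) for i in order[:n_plasma]]
        memory = [(examples[i], labels[i]) for i in order[n_plasma:n_memory]]
        return plasma, memory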
Consensus 27 is preferably used to predict a class label for the input. That is, the prediction of the class label for the input is determined by consensus of the data points with the highest similarity scores. In one example embodiment, the prediction for the input is determined by majority vote, although other consensus methods also fall within the scope of this disclosure. Note that all the examples are associated with labels.
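A majority-vote consensus over the inherited labels of the plasma data can be sketched in a few lines; other consensus rules would simply replace the vote-counting logic.

    from collections import Counter

    def consensus(plasma):
        # plasma: list of (example, inherited label) pairs selected at step 26.
        # Predict the class label that receives the most votes.
        votes = Counter(label for _, label in plasma)
        return votes.most_common(1)[0][0]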
Algorithm 1 below further describes the five-step workflow for the RAILS system 20.
It is to be understood that only the relevant steps of the algorithm are shown, but that other software-implemented instructions may be needed to control and manage the overall operation of the system.
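Since the published listing of Algorithm 1 is not reproduced here, the following Python sketch reconstructs the five-step workflow from the description above. It reuses the illustrative functions sense, flock, split_population and consensus sketched earlier; expand stands for the clonal expansion loop of Algorithm 2 and, like the dknn interface, is hypothetical glue rather than the disclosed implementation.

    def rails_classify(x, dknn, feats_by_class, memory_db, k=10):
        # Step 1, sensing: clean inputs bypass the immune response entirely.
        if not sense(x, dknn):
            return dknn.predict(x)                 # hypothetical DkNN interface
        # Step 2, flocking: per-class nearest neighbors in feature space.
        nearest = flock(dknn.features(x), feats_by_class, k)
        # Step 3, clonal expansion: returns (examples, labels, affinities).
        examples, labels, affinities = expand(x, nearest)
        # Step 4, optimization: plasma data for now, memory data for later.
        plasma, memory = split_population(examples, labels, affinities)
        memory_db.extend(memory)                   # secondary database for hardening
        # Step 5, consensus: majority vote over the plasma data.
        return consensus(plasma)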
Clonal expansion and affinity maturation (optimization) are the two main steps after flocking. Algorithm 2 below sets forth an example implementation of these two steps.
In Algorithm 2, the initial population C(0) is generated by applying the mutation operator to the flocked nearest neighbors, i.e., C(0) ← Mutation(x′) for each nearest neighbor x′ found in the flocking step.
The goal is to promote diversity and to explore better solutions in a broader search space.
The selection operation decides which candidates in the current generation will be chosen to generate the offspring. In one example, the probability for each candidate is calculated through a softmax function as follows:

$$P(x_i) = \frac{\exp(A(x, x_i)/\tau)}{\sum_{x_j \in S} \exp(A(x, x_j)/\tau)} \tag{5}$$

where S is the set containing the candidate data points, x_i ∈ S, A(x, x_i) denotes the affinity score of x_i with respect to the input x, and τ > 0 is the sampling temperature that controls the spread of the distribution after the softmax operation. Given the probability distribution P of a candidate set S, the selection operation randomly picks one example pair (x_i, y_i) from S according to its probability:

$$(x_i, y_i) = \mathrm{Selection}(S, P) \tag{6}$$
In the example embodiment, two parents are selected for each offspring, and the second parent is selected from the same class as the first parent. The parent selection process appears in lines 5 through 7 of Algorithm 2.
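The following sketch implements the selection of Equations (5) and (6), including the same-class choice of the second parent; the affinity array and temperature are inputs, and the fallback used when a class has a single candidate is an assumption.

    import numpy as np

    def select_parents(examples, labels, affinities, tau=18.0, rng=np.random):
        # Equation (5): softmax over affinity scores with temperature tau.
        logits = np.asarray(affinities, dtype=float) / tau
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Equation (6): sample the first parent according to its probability.
        i = rng.choice(len(examples), p=probs)
        # Second parent is drawn from the same class as the first (lines 5-7).
        same = [j for j in range(len(examples)) if labels[j] == labels[i] and j != i]
        if not same:                                   # lone candidate: reuse parent
            return (examples[i], labels[i]), (examples[i], labels[i])
        class_probs = probs[same] / probs[same].sum()  # renormalize within the class
        j = rng.choice(same, p=class_probs)
        return (examples[i], labels[i]), (examples[j], labels[j])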
Next, the crossover operator combines different candidates (parents) to generate new examples (offspring). Given two parents x_p and x_pl, the new offspring is generated by selecting each entry (e.g., pixel) from either x_p or x_pl according to a corresponding probability. Mathematically,

$$x_{OS}(i) = \begin{cases} x_p(i), & \text{with probability } p \\ x_{pl}(i), & \text{with probability } 1-p \end{cases}, \quad i \in [d] \tag{7}$$

where i represents the i-th entry of the example, d is the dimension of the example, and the probability p is determined from the parents' affinity scores. The crossover operator appears in line 8 of Algorithm 2.
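A sketch of the crossover operator of Equation (7) follows; how the mixing probability p is derived from the parents' affinity scores is left open above, so it is passed in as an argument here.

    import numpy as np

    def crossover(x_p, x_pl, p, rng=np.random):
        # Equation (7): take each entry (pixel) from the first parent with
        # probability p and from the second parent otherwise.
        mask = rng.random(x_p.shape) < p
        return np.where(mask, x_p, x_pl)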
Finally, the mutation operation mutates each entry with probability ρ by adding uniformly distributed noise in the range [−δ_max, −δ_min] ∪ [δ_min, δ_max]. The resulting perturbation vector is subsequently clipped to satisfy the domain constraints:

$$x_{OS} = \mathrm{Mutation}(x'_{OS}) = \mathrm{Clip}_{[0,1]}\big(x'_{OS} + \mathbb{1}[\mathrm{Bernoulli}(\rho)] \odot u([-\delta_{\max}, -\delta_{\min}] \cup [\delta_{\min}, \delta_{\max}])\big) \tag{8}$$

where 𝟙[Bernoulli(ρ)] takes value 1 with probability ρ and value 0 with probability 1 − ρ, u(·) is a vector whose entries are i.i.d. draws from the uniform distribution U([−δ_max, −δ_min] ∪ [δ_min, δ_max]), and Clip_[0,1](x) is equivalent to max(0, min(x, 1)). The mutation operation appears in lines 2 and 9 of Algorithm 2.
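A minimal sketch of Equation (8) with the disclosure's default parameter values is given below; it draws the perturbation magnitude uniformly from [δ_min, δ_max] and applies a random sign, which is equivalent to sampling uniformly from the two-sided range.

    import numpy as np

    def mutate(x, rho=0.15, d_min=0.05, d_max=0.15, rng=np.random):
        # Equation (8): each entry is perturbed with probability rho (the
        # Bernoulli indicator) by noise uniform on [-d_max,-d_min] U [d_min,d_max],
        # then the result is clipped back to the valid pixel range [0, 1].
        flip = rng.random(x.shape) < rho
        magnitude = rng.uniform(d_min, d_max, size=x.shape)
        sign = rng.choice([-1.0, 1.0], size=x.shape)
        return np.clip(x + flip * sign * magnitude, 0.0, 1.0)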
An overview of this classification method is described in relation to
A determination is made at 52 as to whether the input is an outlier. When the input is identified as an outlier, the process continues with the adversarial learning steps as indicated at 53. When the input is identified as a valid input, the input can be classified by the deep learning system without the adversarial learning steps. In some embodiments, detection of outliers can be skipped.
Next, training data similar to the input is identified at step 53. For each class in the training dataset, a set of data points is identified in the training dataset, where the data points in the set of data points are similar to the input. In one example, the set of data points is identified in at least one hidden layer of the neural network. In other examples, sets of data points are identified in more than one hidden layer or in each hidden layer of the neural network.
The set (or sets) of identified data points are then expanded using genetic operators. That is, for each set of identified data points, additional data points are generated at 54 from data points in the set of data points using genetic operators. Genetic operators may include but are not limited to selection, mutation and crossover as described above. The identified data points and the additional data points collectively form a pool of data points. For each of the data points in the pool of data points, a similarity score is also calculated in relation to the input.
Memory data is selected at 55 and plasma data is selected at 56. That is, a first subset of data points is selected and a second subset of data points is selected, where the data points in the first subset have an average similarity score higher than the average similarity score of the data points in the second subset, and the data points in the second subset have an average similarity score higher than the average similarity score for all of the data points. In one example, data points in the first subset of data points have a similarity score in the top x percent of data points (e.g., top 5%) while the data points in the second subset of data points have a similarity score in the top y percent of data points (e.g., top 20%). In another example, data points in the first subset of data points have a similarity score in the top x percent of data points (e.g., top 5%) while the data points in the second subset of data points have a similarity score outside the top x percent but within the top y percent of data points (i.e., between 5% and 20%). In either case, the first subset of data points serves as the plasma data and the second subset of data points serves as the memory data.
Finally, a prediction of the class label for the input is made at 57 using the plasma data. More specifically, the prediction of a class label for the input is determined by consensus of the data points in the subset of data points with the highest similarity scores. The memory data may be appended to the training data and used to classify subsequent inputs.
For the sake of simplicity, experiments are conducted from the perspective of image classification. The RAILS system 20 is compared to standard convolutional neural network (CNN) classification and Deep k-Nearest Neighbors (DkNN) classification using the MNIST dataset. The MNIST dataset is a 10-class handwritten digit database consisting of 60,000 training examples and 10,000 test examples. The RAILS system is tested using a four-convolutional-layer neural network. Performance is measured by standard accuracy (SA), evaluated using benign (unperturbed) test examples, and robust accuracy (RA), evaluated using adversarial (perturbed) test examples.
In addition to the clean test examples, 10,000 adversarial examples were generated using a 20-step PGD attack with attack strength ε = 40/255 to 60/255. By default, the population size T = 1000, the mutation probability ρ = 0.15, the mutation range parameters δ_min = 0.05 (12.75/255) and δ_max = 0.15 (38.25/255), and the maximum generation number G = 50. To speed up the algorithm, the run stops when the newly generated examples are all from the same class. The sampling temperature τ in each layer is set to 3, 18, 18, and 72, respectively.
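For reference, a minimal PyTorch sketch of the standard 20-step L-infinity PGD attack used to generate such examples is given below; the step size alpha is an assumed hyperparameter, as the disclosure does not specify it.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps, alpha, steps=20):
        # Standard L-infinity PGD: ascend the cross-entropy loss, projecting the
        # perturbation back into the eps-ball and the valid pixel range each step.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep pixels in [0, 1]
        return x_adv.detach()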
First, results were obtained from a single layer of the CNN model in the RAILS system and compared with the results from DkNN. Table 1 below shows the comparison results in the input layer, the first convolutional layer (Conv1), and the second convolutional layer (Conv2).
One can see that for both standard accuracy and robust accuracy, RAILS improves on DkNN in the hidden layers and reaches better results in the input layer. The input layer results indicate that RAILS can also outperform supervised learning methods like kNN. Referring to
Clonal expansion in the RAILS system creates new examples in each generation. To better understand the capability of the RAILS system, one can visualize how key indices change as the algorithm runs. After expansion and optimization, the plasma data and memory data can be compared to the nearest neighbors found by DkNN.
RAILS performance is compared to CNN and DkNN in terms of SA and RA. DkNN uses 750 calibration examples and 59,250 training examples. RAILS leverages static learning to make the predictions. The results are shown in Table 2 below.
CNN has poor performance on adversarial examples. One can see that RAILS delivers an additional 5.62% improvement in RA without appreciable loss of SA as compared to applying DkNN alone. The confusion matrices in
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.
This application claims the benefit of U.S. Provisional Application No. 63/123,684, filed on Dec. 10, 2020. The entire disclosure of the above application is incorporated herein by reference.
This invention was made with government support under HR00112020011 by the U.S. Department of Defense, Defense Advanced Research Projects Agency. The government has certain rights in the invention.