Classification of information into one of several categories is an important task in many applications, including malware detection, electronic mail (email) spam filtering, image and video identification, and sentiment analysis in social media posts. A classification task may be automated using a computer generated tool called a classification engine or a classifier that is built using an artificial intelligence based technique called supervised learning.
A simple classification example includes classification of email into one of two categories, spam versus non-spam. The training data for such classification may include several instances of email that have been labeled appropriately as spam and non-spam. When the classifier is able to achieve a substantially high degree of accuracy on unlabeled emails given as test data, its training is considered complete. After that, the classifier may operate automatically as a spam filter to categorize incoming emails as spam or non-spam. It is burdensome and costly to train and operate a classifier. For example, there is a need to acquire training data for the classification task, clean and structure the data into an appropriate format for training, hire human experts to label the data correctly, and perform computations for the training process. In addition, computational resources (e.g., memory and processing power) are needed for operating the classifier.
This Summary is intended to introduce, in simplified form, a selection of concepts that are further described in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Instead, it is merely presented as a brief overview of the subject matter described and claimed herein.
Embodiments described herein are related to a learner that automatically classifies information provided to it into one of multiple categories. The learner includes multiple classifiers with different degrees of hardening such that each classifier is trained with adversarial data of a different strength. For a given query to be classified, the learner is configured to intelligently select a classifier that is commensurate with the query type to classify the query.
An embodiment is directed to a method for operating a learner that maintains a classifier ensemble for data classification. The method includes receiving a query for classification from an adversary having an adversary type that corresponds to an adversarial strength that is used to perturb the data in the query, the adversary type not being directly known by the learner. The method further includes determining a strength of the adversary based on a predicted adversary type distribution. The method also includes selecting a classifier from the classifier ensemble that has been trained with at least one of clean data or adversarial data, the classifier having a classification strength that is commensurate with the determined strength of the adversary. The method further includes classifying the query using the selected classifier.
Another embodiment is directed to a method for training a classifier ensemble for classifying data. The method includes receiving clean data and perturbing the clean data to generate a first adversarial data type of a plurality of adversarial data types, each adversarial data type corresponding to an adversarial strength. The method further includes training a first classifier with the clean data, the first classifier having a first classification strength. The method also includes training a second classifier with the first adversarial data type, the second classifier having a second classification strength.
Yet another embodiment is directed to a system that includes a processor and memory that stores computer program logic for execution by the processor, the computer program logic comprising a classifier ensemble. The classifier ensemble comprises a first classifier that is trained with clean data, the first classifier having a first classification strength, and a second classifier that is trained with a first adversarial data type, the second classifier having a second classification strength.
References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In describing and claiming the disclosed embodiments, the following terminology will be used in accordance with the definition set forth below.
As used herein, the singular forms “a,” “an,” “the,” and “said” do not preclude plural referents, unless the content clearly dictates otherwise.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
As used herein, the term “about” or “approximately” when used in conjunction with a stated numerical value or range denotes somewhat more or somewhat less than the stated value or range, to within a range of ±10% of that stated.
Terminology used herein should not be construed as being “means-plus-function” language unless the term “means” is expressly used in association therewith.
Adversarial machine learning (ML) is important in machine-learning based prediction systems such as email spam filters, online recommendation systems, text classifiers and sentiment analysis used on social media, and automatic video and image classifiers. The main problem in adversarial learning is to prevent an adversary from bypassing an ML based predictive model such as a classifier by sending engineered, malicious data instances called adversarial examples. These evasion attacks could enable a malicious adversary to subvert the ML model of the learner and possibly access critical resources being protected by the learner. For instance, in the context of malware detection, an adversary may surreptitiously insert ill-formed Portable Document Format (PDF) objects into a valid PDF file to convert it into malware that may bypass an ML based malware detector and subsequently crash an Internet browser attempting to read the corrupted PDF file. Techniques have been used on a single classifier to address adversarial learning. For example, classifier hardening is an approach that refines the decision boundary of the classifier over time via re-training with adversarial data. However, improving the robustness of a single classifier remains an open problem, and the classifier hardening approach is still susceptible to adversarial attacks.
Moreover, classifier hardening techniques do not explicitly align budgets and/or resources (e.g., adversarial training data acquisition, time, and computing resources) with the data being classified. For instance, for classifying clean data, a classifier hardened over several batches of adversarial data might be excessive, as a classifier that is not hardened might achieve similar performance.
Most existing classification techniques employ a single classifier. However, repeatedly training a single classifier makes the classifier more complex (e.g., a deep neural network classifier may have thousands of parameters and therefore be very large in size) and prone to failure, and also consumes more time and computational resources (e.g., processing hardware, processing power).
Even when multiple classifiers are used, there are issues with the current techniques. One current approach to classification requires inspection of the properties of a received query using third party tools (e.g., using a file categorizer to identify the format of the input data from its structure and attributes), and then a classifier that has been trained to classify queries with the determined properties is deployed. For example, for a query that is an executable file, the classifier that has been trained to classify executable files would be deployed. However, this approach is vulnerable to data spoofing, where the adversary may disguise the properties of the query to misguide the classifier. In addition, this approach does not account for the costs, penalties and rewards for the classifier and adversary.
Another current approach to classification provides multiple classifiers for classification. However, this approach requires more time and computation because all of the classifiers are used to classify each query; the outputs of the classifiers are then ranked and the top ranked classifier is selected for the final output. Yet another approach selects a subset of classifiers that are most distinct from each other to perform the classification. Because this approach focuses on the properties of the classifiers rather than the properties of the query, it leads to a less accurate classification system.
It is known that no single classifier can be optimal for all classification tasks and that a combination of classifiers may outperform the best individual classifier. Accordingly, embodiments are described herein that improve classification in adversarial machine learning without deteriorating classification accuracy by using a learner that includes an ensemble of classifiers. Each classifier in the ensemble may be hardened separately against adversarial attacks of different strengths to improve the training and operation of the learner. The classifier ensemble may be trained and operated at a lower budget while maintaining similar classification accuracy as state-of-the-art classifiers. Thus, these embodiments enable data classification to be performed faster and more efficiently, thereby improving the underlying computing systems that implement the embodiments in terms of computing speed and resources.
A challenge with using multiple classifiers is to determine the appropriate pairing between a query sent to the learner, containing either clean data or adversarial data of a given attack or adversarial strength, and a commensurate classifier from the ensemble of classifiers to handle the query most effectively, e.g., with the least likelihood of classification errors while aligning classifier hardening costs with the adversarial strength. Another challenge is that the learner is not aware of whether the query is from an attacker, nor of its adversarial strength. The techniques described herein address these challenges with a game theoretic framework called a repeated Bayesian sequential game with self play between a learner and an adversary model. The outcome of the game is a strategic selection of an appropriate classifier for the learner. The Bayesian framework enables the realization of several practical aspects of the learner-attacker interactions, including uncertainty of the learner about the strengths of different attacks, respective costs to the learner and the attacker to train the classifier and to generate adversarial examples, and rewards and penalties to the attacker and the learner for successes in the attacks and defenses, respectively. This Bayesian framework also enables asymmetric interactions between the learner and its clients for both non-competitive (legitimate clients with clean queries) and competitive (attackers with adversarial queries) settings.
The classification techniques provided herein send a query to only one classifier that is intelligently selected based on both the properties of the classifier and those of the query. Thus, these techniques enable accurate automatic classification of information in a more time efficient manner and at a lower budget by avoiding the redundant activation of all available classifiers to generate intermediate outputs and rank them. By training the classifiers separately, less expenditure per classifier is incurred, in terms of training data, costs, and time. This also yields an individual classifier that is lower in complexity and size, resulting in lower costs to deploy after training. The time required by a classifier to classify a query depends on the size and complexity of the classifier as well as the time required to load the classifier into the computer's working memory. A classifier that is smaller in size and less complex may reduce the time required for classification.
In addition, the classification techniques provided herein are data agnostic and adaptable across data modalities, i.e., the core functionality may be generalized across different types of data (e.g., image, text) for multiple applications. Some applications include cyber security where downloaded software may be categorized as malware or benign software, image classification where objects in pictures, still or moving, may be classified as missiles or friendly air traffic, or social media where content in user postings may be determined to be valid or fake news.
In an example embodiment, a classifier ensemble is able to reduce the build and operation costs (e.g., data processing, time, computing resources) by 30-40% while maintaining similar classification accuracy as a current state-of-the-art classifier. The classifier ensemble implements the Bayesian framework to classify textual data from online customer product reviews as positive or negative. This is merely an example; other implementations are possible with other types of data having similar or different margins of improvement.
A classification task may be automated using a computer component called a classification engine or a classifier that is built using an artificial intelligence (AI) based technique called supervised machine learning. Each piece of information that is provided to the classifier is called a data instance. Depending on its features, a data instance belongs to one of different possible categories. The category of a data instance is usually ascertained by a human and is called a ground truth label. Given a data instance whose ground truth label is not provided, the objective of the classifier is to correctly determine the category to which a data instance belongs and assign that category as the predicted label for the data instance. To achieve this, a technique called classifier training is used where the classifier is provided with several labeled data instances, called training data, belonging to different categories. During training, the classifier is periodically provided with unlabeled data instances, called test data. The training is considered complete when the number of errors the classifier makes in classifying test data is substantially low. The corresponding metric is called the classifier's accuracy. Post training, the classifier may be deployed for operation, where it is given unlabeled data instances called queries, and it is configured to output the category of the query data instance, with a similar degree of accuracy as achieved for the test data instances.
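By way of illustration only, the following sketch shows this training-and-deployment cycle for a toy spam classification task. The scikit-learn library, the model choice, and the example data are assumptions used for illustration and do not form part of the embodiments described herein.

```python
# Minimal sketch of supervised training and deployment of a classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled training data instances (ground truth labels: 1 = spam, 0 = non-spam).
train_texts = ["win a free prize now", "meeting moved to 3pm",
               "claim your reward today", "lunch tomorrow?"]
train_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

classifier = LogisticRegression()
classifier.fit(X_train, train_labels)            # classifier training

# Post training, the classifier is deployed on unlabeled queries.
query = vectorizer.transform(["free reward, claim now"])
predicted_label = classifier.predict(query)[0]   # predicted label for the query
print("spam" if predicted_label == 1 else "non-spam")
```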
Embodiments for data classification may be implemented in a variety of environments. For example,
Learner 102 is configured to use classifier selector 104 to select an appropriate classifier from the classifier ensemble, such as a classifier that is commensurate with adversary 112, to classify query 114 and provide an output 116. Each classifier in the classifier ensemble may have a classifier type that corresponds to a type or strength of adversarial data with which the classifier has been trained. This strength may also be referred to herein as classifier or classification strength. For example, in an embodiment, first classifier 106 may be trained with clean data, which is essentially adversarial data of strength 0, second classifier 108 may be trained with adversarial data of strength 1, and third classifier 110 may be trained with adversarial data of strength 2. Thus, first classifier 106, second classifier 108, and third classifier 110 may respectively have a first, second and third classification strength. While three classifiers are depicted in
Adversary 112 may be one of multiple types, each adversary type corresponding to a different adversarial data type or strength. For example, adversary 112 may have a first adversary type 118 (θ0) corresponding to an adversarial strength 0, a second adversary type 120 (θ1) corresponding to strength 1, or a third adversary type 122 (θ2) corresponding to strength 2. The adversarial strength is used to perturb the data in the query. For example, adversary type θ0 corresponding to strength 0 may be clean data that has not been perturbed. Adversary type θ1 corresponding to strength 1 may be perturbed up to a first amount or first percentage, according to a formula or equation, or by one character, etc. The third adversary type 122 corresponding to strength 2 may be perturbed up to a second amount or percentage, according to a formula or equation, or by two characters, etc. The strength of the perturbation or adversary is not known by learner 102 from the outset, and learner 102 is configured to determine this information using classifier selector 104. In an embodiment, learner 102 determines the adversary strength by modeling the uncertainty of the perturbation strength as a type of adversary. For example, adversary type θi denotes that adversary 112 used perturbation strength i inside the query. Learner 102 also does not know θi, but learner 102 may estimate a probability distribution over the set of types Θ={θi} through self play. A probability distribution is a mathematical function that gives the probabilities of occurrence of different possible outcomes (e.g., adversary types) for an experiment. For example, learner 102 may estimate that the most likely adversary type is type 2 (θ2) from three possible types {θ0, θ1, θ2} with the possible probability distribution over types of (0.1, 0.1, 0.8).
In an embodiment, for a given query (e.g., query 114 in
In an embodiment, learner 102 (L) may receive data instances as queries (e.g., query 114 in
The Bayesian game components are as follows. Xev denotes a set of queries and may be referred to as the clean query set. X=(x, y), X∈Xev, denotes a query data instance, where x={x1, x2, . . . } is its set of attributes or features and y∈{0,1} is its ground truth label.
Adversary
Adversary A may send either clean or adversarial data as queries; the latter is generated by perturbing clean data using a perturbation function δ:x→x. A may use different perturbation functions δi, i=0, 1, 2, . . . , where i denotes the strength of the perturbation. For example, the perturbation strength may correspond to the number of features of x that are modified to convert it into an adversarial instance. δi(x) denotes the adversarial data generated with perturbation strength i, and δi+1 is a stronger perturbation than δi. Perturbing x does not change its ground truth label, y. For example, a clean textual data instance, “The item we got was broken; the online seller did not give a refund,” may become “The item we got was broken; the online seller did lot give a refund” with perturbation strength 1, and “The item we got was broken; the online seller did lots, gave a refill” with perturbation strength 2. For notational convenience, clean data may be referred to as x=δ0(x). An action for A is to select a δi, use it to convert clean data instance x into an adversarial instance δi(x), and send the adversarial instance to L.
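A minimal sketch of such a perturbation function is shown below. It assumes character-level perturbations in which strength i replaces i characters; the random replacement policy is an illustrative assumption, not the specific perturbation used in the embodiments or experiments described herein.

```python
import random

def perturb(x: str, strength: int, seed: int = 0) -> str:
    """Generate an adversarial instance delta_i(x) by replacing `strength`
    characters of the clean text x; strength 0 returns the clean data.
    The random replacement policy is illustrative only."""
    rng = random.Random(seed)
    chars = list(x)
    # Candidate positions: alphabetic characters only, so punctuation is preserved.
    positions = [i for i, c in enumerate(chars) if c.isalpha()]
    for pos in rng.sample(positions, min(strength, len(positions))):
        chars[pos] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

clean = "The item we got was broken; the online seller did not give a refund"
adv_1 = perturb(clean, strength=1)  # adversarial data of strength 1
adv_2 = perturb(clean, strength=2)  # adversarial data of strength 2
# Perturbing x does not change its ground truth label y.
```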
Learner
Learner L may receive a query data instance
Utilities
After classifying a query, L and A may receive utilities. Utilities are numeric values assigned by each player to the outcomes from the players' joint actions in a game. Each player may then preferentially rank its joint outcomes and select a suitable action such as a utility maximizing action. A player's utility for a joint action is given by the value that it gets when the query is classified, either correctly or incorrectly, minus the cost it incurs when this happens. L's utility for classifier Lj with query data x̃=δi(x) when A is of type θi may be written as:

U_L(Lj, x̃, θi) = P(Lj)·[P(Lj(x̃)=y | θi)·v_L − c_Lj]   (1a)

where P(Lj) is the probability of selecting classifier Lj, P(Lj(x̃)=y | θi) is the probability that Lj classifies the query correctly when A's type is θi, v_L is the value L receives for a correct classification, and c_Lj is L's cost of training and deploying classifier Lj.

Using equation 1a, L's utility for strategy SL, which is a probability distribution over the different Lj-s, may be written as the sum of equation 1a over all classifier types, and L's expected utility as the sum over all adversary types weighted by p( ):

U_L(SL, x̃, θi) = Σ_Lj U_L(Lj, x̃, θi),   EU_L(SL, x̃) = Σ_θi p(θi)·U_L(SL, x̃, θi)   (1b)

where p(θi) is the probability that A is of type θi.
In adversarial settings, it may be assumed that the adversary is aware of the learner's prediction model, e.g., model parameters of the learner's classifier. In an embodiment, adversary 112 is assumed to know learner 102's strategy SL. A's utility for query data x̃ when L uses classifier Lj may be written as:

U_A(θi, x̃, Lj) = P(Lj)·[P(Lj(x̃)≠y | θi)·v_A − c_θi]   (2)

where P(Lj(x̃)≠y | θi) is the probability that classifier Lj misclassifies the query when A's type is θi, v_A is the value A receives for a successful misclassification, and c_θi is A's cost of generating adversarial data with perturbation strength i. A's utility from using type θi is given by the sum of equation 2 over all classifier types.
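By way of illustration only, the following sketch computes these utilities for placeholder values; the specific probabilities, values, and costs are assumptions chosen for the example and do not form part of the embodiments.

```python
def learner_utility(p_select, p_correct, value_correct, cost_classifier):
    """Equation 1a: L's utility for classifier Lj against adversary type theta_i."""
    return p_select * (p_correct * value_correct - cost_classifier)

def learner_expected_utility(strategy, p_correct_by_clf, type_dist, value_correct, costs):
    """Equation 1b summed over adversary types: expected utility of strategy S_L
    (a probability distribution over classifiers) under the belief p(theta_i)."""
    eu = 0.0
    for i, p_type in enumerate(type_dist):            # sum over adversary types
        for j, p_select in enumerate(strategy):       # sum over classifiers
            eu += p_type * learner_utility(
                p_select, p_correct_by_clf[j][i], value_correct, costs[j])
    return eu

def adversary_utility(p_select, p_incorrect, value_evasion, cost_type):
    """Equation 2: A's utility for type theta_i when L uses classifier Lj."""
    return p_select * (p_incorrect * value_evasion - cost_type)

# Illustrative numbers: 3 classifiers (L0, L1, L2) and 3 adversary types (strengths 0-2).
p_correct = [[0.95, 0.60, 0.40],   # p_correct[j][i] = P(Lj correct | theta_i)
             [0.93, 0.88, 0.55],
             [0.92, 0.87, 0.85]]
print(learner_expected_utility(strategy=[0.1, 0.2, 0.7],
                               p_correct_by_clf=p_correct,
                               type_dist=[0.1, 0.1, 0.8],
                               value_correct=1.0,
                               costs=[0.05, 0.15, 0.30]))
```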
Bayesian Sequential Game
Using the above actions and utility functions, a Bayesian sequential game between L and A may be represented as Γ=[N, Ac, U, ΘA, p( )], where N={L, A} is the set of players, Ac=AcL×ΘA is the set of joint action-types of L and A, U is the set of utility functions (given in equations 1 and 2), and ΘA and p( ) are the set of A's types and the probability distribution over those types, respectively, as defined before.
In an embodiment, L is configured to determine a suitable strategy SL*, and A is configured to determine a suitable type θi*. To determine the strategy (using equation 1b), L may need to determine the value of p( ), the probability distribution over A's types. This may be accomplished with a technique called self play with repeated plays of the Bayesian game, which is referred to as a repeated Bayesian sequential game (RBSG).
Repeated Bayesian Sequential Game and Self Play
The objective of L is to determine a suitable strategy SL* to play against A that would improve its expected utility by deploying an appropriate classifier that has been hardened commensurate to the strength of the perturbation used by A. In an embodiment, L is configured to use self play, where L and A play the Bayesian sequential game, Γ, repeatedly. That is, L is configured to build adversary models of different strengths for A, and then L simulates sending queries of different strengths from the adversary models to itself. For the sake of legibility, the notation of A is used herein to denote L's self play adversary. The repeated interactions between L and A may be represented as a game tree with sequential moves between them. A node in the game tree denotes a player's turn to make a move. In a move, a player selects an action from its action set. For example, the action set for L may include selecting a classifier and using it to classify the incoming query instance, and the action set for A may include selecting an adversary type having an associated perturbation strength, converting clean data to perturbed data having the selected perturbation strength, and sending the perturbed data to L. L and A make alternate moves with L moving first, and the best strategy is selected at each turn. A pair of moves by L and A corresponds to an instance of the Bayesian sequential game, Γ, that may be implemented as algorithm 1 below.
Game Play
As shown in algorithm 1 above, L moves by selecting a strategy, SL*. A then selects an adversarial data type (perturbation strength) θj*˜p( ) while observing SL*. With the selected θj*, A may generate q adversarial queries by perturbing q clean data instances from Xev, and send each adversarial query,
Determining Strategy SL*
To calculate SL*, L may generate different paths in the game tree to discover utilities received from different sequences of moves. To systematically explore the game tree, L may use a Monte Carlo Tree Search (MCTS)-like algorithm such as the TreeTraverse algorithm shown in algorithms 2 and 3. TreeTraverse works by generating a sequence of moves or game plays corresponding to a path in the game tree up to a finite cutoff depth, h. L's and A's utilities from their moves may be recorded along the path, and once the bottom-most level is reached, the utilities may be updated along the path upwards toward the root. In this way, moves that could lead to high utility may be identified by each player.
The key aspects of MCTS are to balance exploration and exploitation while traversing the game tree by using a heuristic function called selectBestChild (algorithm 2, line 4), and performing an operation called rollout to rapidly traverse unexplored parts of the game tree by selecting actions for each player up to the game tree's cutoff depth, h (algorithm 3). In the TreeTraverse algorithm, a heuristic function may be used for selectBestChild. While two techniques, Bayes Nash equilibrium and upper confidence bound, are described below, other techniques may be used.
In Bayes Nash equilibrium, each player may select a best response strategy that maximizes its utilities, given the possible strategies of its opponent. The strategies for L and A calculated using Bayes Nash equilibrium are given by:
where uA is given by equation 2 and EUL is given by equation 1b with A's actual type distribution p(θi) replaced by L's belief distribution p̂(θi).
Upper confidence bound is a bandit-based technique that weighs the expected utility of a move with the number of times it has been visited, so that previously unexplored or less-explored actions at a move are also tried. The upper confidence bound technique uses the following equation to calculate SL* and θi*.
Here, C is a constant, Parvisit is the number of times the parent node of the current node was visited, and Lj,visit and θi,visit are the number of times the current node has been visited for L and A, respectively.
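An illustrative sketch of this selection heuristic follows; the standard UCB1 form of the exploration bonus and the node statistics shown are assumptions used for the example.

```python
import math

def ucb_score(mean_utility, parent_visits, node_visits, C=2.0):
    """Weigh a move's expected utility with an exploration bonus so that
    previously unexplored or less-explored actions are also tried."""
    if node_visits == 0:
        return float("inf")          # force at least one visit of each child
    return mean_utility + C * math.sqrt(math.log(parent_visits) / node_visits)

def select_best_child(children, parent_visits, C=2.0):
    """selectBestChild heuristic: pick the child node (a classifier for L, or an
    adversary type for A) with the highest upper confidence bound score."""
    return max(children,
               key=lambda ch: ucb_score(ch["mean_utility"], parent_visits,
                                        ch["visits"], C))

# Illustrative node statistics for the learner's three classifiers.
children = [{"name": "L0", "mean_utility": 0.40, "visits": 12},
            {"name": "L1", "mean_utility": 0.55, "visits": 7},
            {"name": "L2", "mean_utility": 0.62, "visits": 3}]
best = select_best_child(children, parent_visits=22)
```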
Updating Belief of A's Type Distribution
The TreeTraverse algorithm explores a sequence of moves along any single path from the root of the game tree up to the cutoff depth h. This sequence is referred to herein as a trial for the RBSG. To update its belief distribution p̂, L uses multiple trials and, at the end of each trial, L uses an update strategy to update p̂( ). Two example update strategies are described herein, although other strategies may be used.
One example update strategy is fictitious play, in which the probability of type θi is the fraction of times it was played following action Lj, as given by the following update rule.
Another example update strategy is Bayesian update using Bayes' rule. The Bayesian update of θi calculates the conditional probability of selecting θi when it followed Lj, given by the following equation,
where P(Lj|θi) is the fraction of times Lj was played following θi, P(Lj) is known to L and the denominator is a normalization term. The updated probability estimate may then be used by L to calculate expected utilities, using equations 3 and 4, for its actions to be more accurate against A in future trials. An example of the self play algorithm is as follows.
Further operation aspects of system 100 of
Flowchart 200 is a method of operating a learner that maintains a classifier ensemble for data classification, according to an example embodiment. Flowchart 200 begins at step 202. At step 202, a query is received for classification from an adversary having an adversary type that corresponds to an adversarial strength that is used to perturb the data in the query, the adversary type not being directly known by the learner. In an embodiment, the query (e.g., query 114 of
In step 204, a strength of the adversary is determined based on a predicted adversary type distribution. In an embodiment, the strength of the adversary (e.g., adversary 112 shown in
In step 206, a classifier is selected from the classifier ensemble, the classifier having been trained with at least one of clean data or adversarial data, the classifier having a classification strength that is commensurate with the determined strength of the adversary. In an embodiment, a classifier (e.g., one of first classifier 106, second classifier 108, or third classifier 110) may be selected from the classifier ensemble by classifier selector 104, which may use any suitable technique to perform the selection, such as the Bayes Nash equilibrium (equation 3) or the upper confidence bound (equation 4) technique. In an embodiment, classifier selector 104 may select a classifier based on a current belief of the adversary type and adversary type distribution. In selecting a classifier that is aligned with the adversary (e.g., adversary 112), classifier selector 104 may balance multiple factors in its calculations, such as the cost spent to train the classifier, the cost to deploy the classifier (e.g., time and memory required to load the classifier into a computer's memory, or operating resources), and the loss incurred if the classifier makes a mistake. Thus, classifier selector 104 balances the costs of training and operating the classifier against the loss due to a mistaken classification in selecting the optimal classifier for a given query.
For example, classifier selector 104 may select third classifier 110 to classify query 114 from adversary 112 because third classifier 110 is commensurate with adversary 112. Specifically, the classification strength of third classifier 110 is aligned with the adversarial strength 2 of adversary type θ2. The selection of the appropriate classifier is important because a lower strength classifier may be less capable than a higher strength classifier, but the higher strength classifier is more expensive to train and deploy. That is, a clean-data classifier may be less expensive to train and deploy than a classifier of strength 2, but such a clean-data classifier may make more classification errors when given adversarial data of strength 2. Conversely, it may be excessive to deploy the classifier of strength 2 to classify clean data, as the cost of training and deploying such a classifier may not be justified for clean data, especially when the clean-data classifier might achieve similar performance.
Flowchart 200 ends with step 208, in which the query is classified using the selected classifier. In an embodiment, the query (e.g., query 114 shown in
In an embodiment, after the query is classified, utilities may be assigned to each of learner 102 and adversary 112. The utilities may be determined using equations 1 and 2 above. The utilities and the outcome of the selected classifier (e.g., correct vs. incorrect classification), among other information, may be fed back to classifier selector 104 and used to calculate the adversarial strength of future incoming queries.
Steps 202-208 may be repeated when there are multiple queries. Note that the steps of flowchart 200 may be performed in an order different than shown in
The strength of the adversary may be determined in various ways. For example,
Flowchart 300 begins with step 302. In step 302, model adversaries are created. In an embodiment, classifier selector 104 shown in
In step 304, it is determined whether a predetermined number of trials has been completed. In an embodiment, classifier selector 104 may determine whether a predetermined number of trials has been completed using a trial count, ntrials. The number of trials may depend on one or more factors, such as a size of the game tree, the number of classifier types maintained by learner 102, the number of types adversary 112 has, and the number of rounds up to which the game tree is played. For example, for a learner with 3 classifiers, 4 adversary types, with 3 rounds, 100 trials may be used. As another example, for a learner with 3 classifiers, 4 adversary types, and 5 rounds, 500 trials may be used.
In step 306, based upon determining that the predetermined number of trials has been completed, outputting an estimated adversary type distribution as the predicted adversary type distribution. In an embodiment, when classifier selector 104 determines that the predetermined number (e.g., 100 or 500) of trials has been completed, classifier selector 104 may output an estimated adversary type distribution as the predicted adversary type distribution. The predicted adversary type distribution may be utilized to determine the strength of the adversary in step 204 of flowchart 200.
Based upon determining that the predetermined number of trials has not been completed, one or more rounds of a self play game may be initiated. The following steps may be performed for each round of the self play game.
In step 308, it is determined whether a predetermined number of rounds has been completed. In an embodiment, classifier selector 104 may determine whether the number of rounds has been completed based on a round count, nrounds. In an embodiment, the number of rounds may be the height, h, or depth of the game tree. Moreover, the value of h may be less than the frequency of change of the adversary type distribution, so that classifier selector 104 may correctly determine the adversary type distribution.
In step 310, based upon determining that the number of rounds has been completed, updating the estimated adversary type distribution using a probability distribution update strategy and incrementing a trial count. In an embodiment, classifier selector 104 may be configured to update the estimated adversary type distribution p̂( ) using a probability distribution update strategy as well as to increment a trial count for keeping track of the number of trials in the Bayesian sequential game. For example, classifier selector 104 may observe the adversary types after each round to update its belief of probability distribution of adversary types based on the observed types. Any suitable update strategy may be utilized by classifier selector 104, such as fictitious play (equation 5) or Bayesian update (equation 6) as described herein. In fictitious play, the estimated adversary type distribution may be updated using a frequency of the number of times each adversary type was selected following each classifier type during the last round (a previous round prior to the current round). With Bayesian update, the estimated adversary type distribution may be updated using Bayes' rule that uses the number of times each classifier type was selected following each adversary type and frequency of using different classifiers during the last round (a previous round prior to the current round).
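By way of illustration only, the two update strategies referenced in this step might be implemented as in the following sketch; the bookkeeping arrays and example numbers are assumptions used for the illustration.

```python
def fictitious_play_update(counts_type_after_clf, clf_index):
    """Fictitious play (equation 5): the estimated probability of each adversary type
    is the fraction of times that type was observed following classifier Lj."""
    counts = counts_type_after_clf[clf_index]
    total = sum(counts)
    return [c / total for c in counts] if total else [1.0 / len(counts)] * len(counts)

def bayesian_update(prior, p_clf_given_type, clf_index):
    """Bayesian update (equation 6): p_hat(theta_i | Lj) is proportional to
    P(Lj | theta_i) * p_hat(theta_i), normalized over all adversary types."""
    unnormalized = [p_clf_given_type[i][clf_index] * prior[i] for i in range(len(prior))]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized] if norm else prior

# counts[j][i] = number of times type theta_i was played following classifier Lj.
counts = [[8, 1, 1], [2, 6, 2], [1, 2, 7]]
p_hat = fictitious_play_update(counts, clf_index=2)      # e.g., (0.1, 0.2, 0.7)

# p_clf_given_type[i][j] = fraction of times Lj was played following theta_i.
p_clf_given_type = [[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7]]
p_hat = bayesian_update([1/3, 1/3, 1/3], p_clf_given_type, clf_index=2)
```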
In step 312, a current classifier is selected from the classifier ensemble using a classifier selection strategy. In an embodiment, classifier selector 104 may select a current classifier (e.g., third classifier shown in
In step 314, a model adversary of a particular type is selected using the estimated adversary type distribution. In an embodiment, classifier selector 104 may select a model adversary of a particular type using the estimated adversary type distribution {circumflex over (p)}( ). For example, given an estimated adversary type distribution (0.05, 0.05, 0.8, 0.1) over four possible types, {θ0, θ1, θ2, θ3}, classifier selector 104 may select adversary type θ2 as the most likely adversary type.
In step 316, one or more simulated queries are submitted from the selected model adversary to the selected current classifier. In an embodiment, classifier selector 104 may submit one or more simulated queries nq from the selected model adversary to the selected current classifier Lj.
In step 318, utility values respectively received by the selected model adversary and the selected current classifier are determined using a pre-defined utility table. In an embodiment, classifier selector 104 may determine utility values respectively received by the selected model adversary and the selected current classifier using a pre-defined utility table, such as the one shown in
In an embodiment, the utility values may be determined by classifier selector 104 using equations 1 and 2, instead of or in addition to using a pre-defined utility table. For example, the utility of learner 102 from using a classifier Lj when the adversary type of adversary 112 is θi may be determined based on the product of the probability of selecting classifier Lj, the probability that Lj performs a correct classification when the adversary type is θi, and the value from performing the correct classification when the adversary type is θi, less the cost of using classifier Lj (equation 1a). Moreover, the expected utility of learner 102 of using any classifier when the adversary type is θi may be determined as a sum of equation 1a over all classifier types (equation 1b). Furthermore, learner 102's expected utility may be determined as a sum of equation 1b over all adversary types.
In an embodiment, the utility of adversary 112 with adversary type θi when learner 102 uses classifier Lj may be determined based on the product of the probability of learner 102 selecting Lj, the probability that Lj performs an incorrect classification when the adversary type is θi, and the value of the incorrect classification when the adversary type is θi less the cost of using adversary type θi (equation 2). The utility for adversary 112 from using type θi is the sum of equation 2 over all classifier types.
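The steps of flowchart 300 may be combined into a loop such as the following sketch. This is an illustrative composite of steps 302-318 only, not a reproduction of the algorithm listings referenced above; the greedy selection rule, the utility table, and the parameters are placeholder assumptions.

```python
import random

def self_play(classifiers, adversary_types, utility_table,
              n_trials=100, n_rounds=3, n_queries=10, seed=0):
    """Estimate the adversary type distribution via self play (flowchart 300).
    utility_table[j][i] is a placeholder for the learner's utility when classifier
    Lj handles queries from a model adversary of type theta_i (steps 316-318)."""
    rng = random.Random(seed)
    n_types = len(adversary_types)
    p_hat = [1.0 / n_types] * n_types                   # uniform initial belief
    counts = [0] * n_types
    for _ in range(n_trials):                           # step 304
        for _ in range(n_rounds):                       # step 308
            # Step 312: select a classifier (here, greedily on expected utility).
            j = max(range(len(classifiers)),
                    key=lambda k: sum(p_hat[i] * utility_table[k][i]
                                      for i in range(n_types)))
            # Step 314: select a model adversary type from the current belief.
            i = rng.choices(range(n_types), weights=p_hat)[0]
            counts[i] += 1
            # Steps 316-318: submit n_queries simulated queries from the model
            # adversary to classifier Lj and record the resulting utilities
            # (abstracted here as a table lookup).
            _utility = n_queries * utility_table[j][i]
        # Step 310: update the belief (a fictitious-play style frequency update).
        total = sum(counts)
        p_hat = [c / total for c in counts]
    return p_hat                                        # step 306: predicted distribution
```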
Before learner 102 is deployed for classifying data, the classifiers in the classifier ensemble of learner 102 may be individually trained with clean or perturbed data. Such a training process may be implemented in various ways.
Further aspects of classification system 100 of
Flowchart 600 begins at step 602, in which clean data is received. For example, and with reference to system 100 of
In step 604, the clean data is perturbed to generate a first adversarial data type of a plurality of adversarial data types, each adversarial data type corresponding to an adversarial strength. For example, the clean data may be perturbed as shown in
In step 606, a first classifier is trained with the clean data, the first classifier having a first classification strength. In an embodiment, a first classifier (e.g., first classifier 106 shown in
In step 608, a second classifier is trained with the first adversarial data type, the second classifier having a second classification strength. In an embodiment, a second classifier (e.g., second classifier 108 shown in
This training process may continue with all of the classifiers in the classifier ensemble until each classifier is individually trained with a different type of adversarial data. The training process may be considered complete when the number of errors the classifier ensemble makes in classifying test data is substantially low. That is, the classifier ensemble may be ready for deployment when its accuracy in classifying test data is at a desired level. The goal is for the classifier ensemble, once deployed, to classify unlabeled data instances with a similar degree of accuracy as achieved for the test data instances during training.
An example of clean and perturbed data that may be used for the training process is provided.
As shown in
After the classifier ensemble is adequately trained, learner 702 may be deployed to classify data. That is, after training, learner 702 is configured to select the appropriate classifier for a given query by selecting the classifier that is commensurate with the adversary strength of the adversary that generated the query. Thus, for example, the clean data query from the valid client may be best classified by the classifier trained with clean data. Such classifier may generate an output of “negative” with a “high” confidence, which is correct as the ground truth label is negative, indicating that the review is a negative review. The query from the adversary having strength 1 may be most appropriate for the classifier trained with adversarial data of strength 1 to classify and generate the correct output of “negative” with a “high” confidence. Moreover, the query from the adversary having strength 2 may be best suited for the classifier trained with adversarial data of strength 2 to classify and generate the correct output of “negative”, also with a “high” confidence.
As a specific example, in an embodiment, the RBSG with self play based adversarial learning technique may be implemented as follows for a binary classification task with text data using a Yelp® review polarity data set. Each data instance of the Yelp® review data set has one of two labels, 1 (negative) and 2 (positive). The clean training and test sets have 560,000 and 38,000 samples, respectively. The learner (L) is implemented as a Character Convolutional Neural Network (CharCNN) model that includes 5 convolution layers followed by 3 fully connected layers. The convolution layers are used to identify character level features to classify text.
For generating adversarial text, a single character gradient based replacement technique may be employed. Given a data instance in the form of a text character string as input to the learner, the method works by classifying the text using the model and calculating the gradient of the loss function for each character in the input text. It then replaces the character in the text with the most negative gradient (most influential on the classifier output) with the character that has the least positive gradient (least influential on the classifier output). This technique may be used iteratively on a data instance to replace multiple characters in the text and create adversarial text with different attack strengths, e.g., two iterations of the technique yield adversarial text with perturbation strength 2.
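A sketch of this single-character replacement step is shown below; char_gradients and vocab_gradients are hypothetical inputs standing in for the model-specific gradient computations, which are not reproduced here.

```python
def replace_most_influential_char(text, char_gradients, vocab_gradients):
    """One iteration of gradient-based character replacement.
    char_gradients[k] is assumed to hold the loss gradient for the character at
    position k (most negative = most influential on the classifier output), and
    vocab_gradients[c] the gradient associated with substituting character c
    (least positive = least influential); both are placeholders for values that
    the underlying model would provide."""
    target = min(range(len(text)), key=lambda k: char_gradients[k])
    replacement = min(vocab_gradients, key=vocab_gradients.get)
    return text[:target] + replacement + text[target + 1:]

# Applying the step iteratively yields adversarial text of higher strength,
# e.g., two iterations give perturbation strength 2.
```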
This embodiment may be implemented by a computer, for example, one with 20 dual-core, 2.3 GHz Intel Xeon central processing units and an Nvidia Tesla K40C graphics processing unit. The RBSG self play code may be implemented in Python 2.7, and the CharCNN and adversarial text generation code may be implemented with Tensorflow 1.11 for building and training their deep network models. These components are merely examples, and other hardware and software may be used in other embodiments.
The CharCNN may be trained with clean data first, and then hardened separately with two adversarial training data sets that include 200,000 adversarial training samples of perturbation strengths 1 and 2, respectively. This results in three classifiers for the learner with increasing hardening levels, denoted by L0, L1, L2. The accuracies of these classifiers may be evaluated with 50,000 instances of test data of perturbation strengths 1, 2, and 3 each, as shown in Table 1 below.
Adversary (A) generates queries with either clean data or adversarial data with perturbation strengths 1, 2, and 3, giving ΘA={θ0, θ1, θ2, θ3}. L uses three classifiers, thus AcL={L0, L1, L2}. The different parameters used in this embodiment include: cutoff depth in self play, h=20; number of trials in self play, ntrials=10; batch size for queries sent by A to L, q=10; and the constant in the upper confidence bound calculation (equation 4), C=2.
To determine whether L, using the self play algorithm, could effectively deploy appropriate classifiers for data of different perturbation strengths, four different type distributions for data generated by A may be created. Each distribution has 98% of one of the four types {θ0, θ1, θ2, θ3}. L may use either Bayes Nash equilibrium (equation 3) or upper confidence bound (equation 4) to select actions in the game tree during self play. The results are shown in Table 2.
As shown in Table 2, both the UCB and BNE metrics for action selection perform comparably. The accuracy obtained using the RBSG based self play technique on clean and adversarial data perturbed with different perturbation strengths (last column of Table 2) is not degraded and is comparable to the best accuracies obtained with the commensurately hardened classifier, L2, when used individually (column 4 of Table 1). The RBSG with self play technique is also able to align adversarial data of different perturbation strengths with the commensurately hardened classifier, as shown by the maximum percentage of each row in Table 3 corresponding to the classifier hardened with adversarial data of that perturbation strength. Note that with adversarial data of perturbation strength 3, Adv 3, the classifiers are selected almost uniformly. This is because none of the classifiers L0, L1, or L2 were trained with adversarial data of perturbation strength 3. L2, which has the highest individual accuracy for Adv 3 data, is used most frequently, albeit marginally, for Adv 3 data in Table 3. The self play technique also strategically uses L0 and L1, each of which incurs lower costs to deploy than L2. Consequently, the utility obtained by the learner with self play is better than its utility when using the individual classifier L2 alone.
The convergence of L's belief distribution p̂( ) to A's actual type distribution p( ) may be evaluated for both the fictitious play (equation 5) and Bayesian update (equation 6) strategies. Results may be averaged, e.g., over 10 runs. For each run, p( ) may be selected as a random distribution. The Kullback-Leibler (KL) divergence between p̂( ) and p( ), given by the following equation, is shown in
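For reference, this divergence may be computed as in the following sketch, using the standard definition of KL divergence (the direction KL(p̂∥p) and the small epsilon guard are assumptions of the example).

```python
import math

def kl_divergence(p_hat, p, eps=1e-12):
    """Kullback-Leibler divergence KL(p_hat || p) between the learner's belief
    distribution and the adversary's actual type distribution."""
    return sum(ph * math.log((ph + eps) / (pi + eps))
               for ph, pi in zip(p_hat, p) if ph > 0)

# Example: belief versus an actual distribution over four adversary types.
print(kl_divergence([0.05, 0.05, 0.80, 0.10], [0.10, 0.10, 0.70, 0.10]))
```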
In another embodiment, rather than assuming that the learner reveals its classifier to the adversary, the adversary may be able to reverse engineer the learner's classifiers, but it is not aware of the frequency with which the learner deploys them. The adversary may then also build a model of the learner via repeated interactions to determine its perturbation strength strategically.
For the Bayes Nash equilibrium calculation, the players are assumed to always behave rationally. However, the adversary may behave myopically and select a greedy outcome, or adopt suboptimal low and slow strategies to misguide the learner. Accordingly, in an embodiment, other techniques, such as regret-based techniques or techniques based on the safety value and exploitability of opponents, may be used instead of Bayes Nash equilibrium based strategy selection.
In yet another embodiment, it may be possible to integrate reinforcement learning for the adversarial learning setting to improve classification.
The example embodiments described herein are provided for illustrative purposes and are not limiting. Further structural and operational embodiments, including modifications/alterations, will become apparent to persons skilled in the relevant art(s) from the teachings herein.
Each of learner 102, adversary 112, classifier ensemble 502, and classifier ensemble 702 and flowcharts 200, 300 and 600 may be implemented in hardware, or hardware combined with software or firmware. For example, learner 102, adversary 112, classifier ensemble 502, and classifier ensemble 702, and flowcharts 200, 300 and 600 may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer readable storage medium. Alternatively, learner 102, adversary 112, classifier ensemble 502, and classifier ensemble 702, and flowcharts 200, 300 and 600 may be implemented as hardware logic/electrical circuitry.
The terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used herein to refer to physical hardware media such as the hard disk associated with a storage device. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media). Communication media embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media, as well as wired media. Embodiments directed to such communication media are separate and non-overlapping with embodiments directed to computer-readable storage media.
In an embodiment, learner 102, adversary 112, classifier ensemble 502, and classifier ensemble 702 may be implemented in a system-on-a-chip (SoC). The SoC may include an integrated circuit that includes one or more of a processor (e.g., a central processing unit (CPU), microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits, and may optionally execute received program code and/or include embedded firmware to perform functions.
As shown in
Processor 1002 may be referred to as a processor circuit or a processing unit. Processor 1002 is an electrical and/or optical circuit implemented in one or more physical hardware electrical circuit device elements and/or integrated circuit devices (semiconductor material chips or dies) as a central processing unit (CPU), a microcontroller, a microprocessor, and/or other physical hardware processor circuit. Processor 1002 may execute program code stored in a computer readable medium, such as program code of an operating system, an application program, and other programs.
Memory 1004 includes any system memory, for example, read only memory (ROM) and random access memory (RAM) and may store a basic input/output system (e.g., BIOS).
Storage device 1006 may include any of a hard disk drive, a magnetic disk drive, an optical disk drive, a removable optical disk (e.g., CD ROM, DVD ROM), a flash memory card, a digital video disk, RAMs, ROMs, or other hardware storage media. Storage device 1006 and its associated computer readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for computing device 1000.
A number of program modules may be stored on memory 1004 and/or storage device 1006. These programs include an operating system, an application program, other programs, and program data. Such an application program or other programs may include, for example, computer program logic (e.g., computer program code or instructions) for implementing system components and/or embodiments described herein.
A user may enter commands and information into the computing device 1000 through input devices 1010 such as a keyboard and a pointing device. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, touch screen and/or touch pad, voice recognition system to receive voice input, gesture recognition system to receive gesture input, or the like. These and other input devices are often connected to processor 1002 through a serial port interface that is coupled to bus 1014, but may also be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB).
A display 1008 is also connected to bus 1014 via an interface, such as a video adapter. Display 1008 may be external to or incorporated in computing device 1000. Display 1008 may display information, as well as being a user interface for receiving user commands and/or other information (e.g., by touch, finger gestures, virtual keyboard, etc.). In addition to display 1008, computing device 1000 may include other peripheral output devices (not shown) such as speakers and printers.
Computing device 1000 is connected to a network 1012 (e.g., the Internet) through an adaptor or network interface, a modem, or other means for establishing communications over the network.
While various embodiments of the disclosed subject matter have been described above, it should be understood that they have been presented by way of example only, and not limitation. Various modifications and variations are possible without departing from the spirit and scope of the embodiments as defined in the appended claims. Accordingly, the breadth and scope of the disclosed subject matter should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims the benefit of priority based on U.S. Provisional Patent Application No. 63/011,104 filed Apr. 16, 2020, the entirety of which is incorporated herein by reference.
The United States Government has ownership rights in this invention. Licensing inquiries may be directed to Office of Technology Transfer, US Naval Research Laboratory, Code 1004, Washington, D.C. 20375, USA; +1.202.767.7230; techtran@nrl.navy.mil, referencing Navy Case Number 112566-US2.