This application claims the benefit of Israeli Patent Application No. IL 297834, filed on Oct. 31, 2022, which is hereby incorporated by reference herein.
The present invention relates to the field of cyber security. More particularly, the present invention relates to a method for assessing the robustness and resilience of examined Machine Learning (ML) models against model extraction attacks on AI-based systems.
Machine Learning (ML) generates models that are used for decision making and prediction, and that are capable of extracting useful patterns and obtaining insights about the data by observing the relationships between different attributes in the data.
ML models can be used both for classification and regression tasks. In classification tasks, an ML model receives a vector of feature values and outputs a mapping of this input vector into a categorical label, thereby assigning the input vector to a class. In regression tasks, the ML model uses the input feature vector to predict a continuous numeric value in a specific range. Examples of ML models can be found in many domains, such as a model that predicts stock market values in the financial domain, or a classifier that recognizes objects in images.
The ML models are often exposed to the public, or to users in the owning organization, in the form of “ML-as-a-service”. Such services provide a “prediction Application Programming Interface (API)” with “query-response access”, in which the user sends a query to the model and receives an output in the form of a prediction or a vector of probability values that represents the confidence of the model in predicting each possible class label (in machine learning, classification refers to a predictive modeling problem where a class label is predicted for a given example of input data). Such a setting is defined as a “black-box” setting (any artificial intelligence system whose inputs and operations are not visible to the user or another interested party).
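By way of a non-limiting illustration, the following sketch shows what such a query-response prediction API looks like from the user's side. The class and variable names are hypothetical, and a scikit-learn-style classifier is assumed behind the service; the caller only observes the returned vector of per-class confidence probabilities.

```python
# Minimal sketch of a "prediction API" with query-response access.
# Hypothetical names; assumes a scikit-learn-style classifier behind the service.
import numpy as np
from sklearn.linear_model import LogisticRegression

class PredictionAPI:
    """Black-box wrapper: callers see only the prediction vector, not the model."""

    def __init__(self, model):
        self._model = model          # the internal model is hidden from the caller

    def query(self, x):
        """Return the per-class confidence probabilities for one input vector."""
        x = np.asarray(x).reshape(1, -1)
        return self._model.predict_proba(x)[0]   # e.g. [0.1, 0.7, 0.2]

# Example usage with toy data.
# Owner side: train a model and expose it only through the API.
X_train = np.random.rand(100, 4)
y_train = np.random.randint(0, 3, size=100)
api = PredictionAPI(LogisticRegression(max_iter=1000).fit(X_train, y_train))

# User side: send a query, receive a probability vector.
probs = api.query([0.2, 0.5, 0.1, 0.9])
predicted_label = int(np.argmax(probs))   # the final predicted class label
```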
Data scientists induce many ML models in an attempt to solve different Artificial Intelligence (AI) tasks. These tasks often involve extensive and very costly research to achieve the desired performance. The majority of ML methods focus on improving the performance of the created ML models. There are several well-practiced performance measurements for evaluating ML models, such as the accuracy of the learned model, its precision, its recall, etc. However, these evaluation methods measure the performance of the created models without considering the possible susceptibility of the induced ML models to privacy violations, which can entail legal consequences.
Privacy in AI-Based Systems
Data owners, such as organizations, are currently obliged to follow the Data Protection Directive (officially Directive 95/46/EC of the European Union). First adopted in 1995, this directive regulates the processing of personal data and its movement within the European Union. Recently, the directive has been extended to the General Data Protection Regulation (GDPR), officially enforced in May 2018, presenting an increased territorial scope, stricter conditions and broader definitions of sensitive data.
Not only can the data itself reveal private sensitive information, but so can the Machine Learning (ML) models that are induced from this data in various AI-based systems. Therefore, model owners face a trade-off between the confidentiality of their ML model and providing an appropriate query-response access for users to query the model and receive its outputs. While most of the queries belong to legitimate users, an attacker with this query access and limited knowledge of the input and output formats of the model can exploit the received outputs for malicious usage, thereby inferring sensitive information and violating the privacy of the entities in the data.
The violation of privacy not only exposes the model owners to legal lawsuits, but also compromises their reputation and integrity. Hence, model owners are advised to take appropriate measures before releasing or deploying any induced ML model in a production environment.
Privacy violations and their legal consequences relate to leakage of sensitive information about the entities (usually user-related data), which might be discovered when using the induced ML model [2] [3]. Therefore, it is required to define measurements for evaluating possible privacy violation aspects, in addition to the standard performance measurements, with respect to the examined ML model [4].
Enhancing the robustness of ML models to privacy violations is highly important from both the owner's and the user's perspectives. Many companies and service providers try to secure their induced ML models from being replicated or maliciously used by competitors or adversarial users. Inducing a good ML model is a challenging task, which involves collecting labeled data, designing the learning algorithm and carrying out multiple experiments to validate its effectiveness. All these actions require substantial financial resources, which model owners are obliged to invest.
The induced ML model can be susceptible to an extraction attack [5] [6] [7], in which an attacker with a limited query-response access to the induced model can create a substitute model that mimics the performance of the original model and use it for his own purposes as a replica. This kind of attack has several implications. First, the attacker can damage the reputation of the attacked model owner. Second, by replicating the original model product, the attacker causes the model owner to lose his business advantage, possibly inflicting serious financial losses. Third, the attacker can infer sensitive information about the data subjects by using the replicated model, thereby violating the General Data Protection Regulation (GDPR) [1]. Also, the replicated model can give the attacker the ability to carry out additional privacy-violating attacks in other domains [8].
In a model extraction attack, an attacker constructs a substitute model with predictive performance on validation data that is similar to the original ML model. The attacker attempts to mimic the performance of the original ML model by examining and learning the behavior of the original model.
Most security and privacy attacks on ML models are carried out in a white-box setting (a white-box machine learning model allows humans to easily interpret how it was able to produce its output and draw its conclusions, thereby giving insight into the algorithm's inner workings), in which the adversary has complete access to the model, including its structure and meta-parameters. A more challenging setting is a gray-box setting, in which the adversary has partial information regarding the induced ML model. Black-box attacks, in which the adversary only has access to the output of the model given the input record, are less common and considered more sophisticated.
In an embodiment, the present disclosure provides a method for performing an assessment of the robustness and resilience of an examined original ML model against model extraction attacks, comprising: training, by a computerized device having at least one processor, multiple candidate substitute models MC with an external dataset D for each of the specified candidate learning algorithms a in Alg, where each candidate substitute model is trained on a subset of D corresponding to the evaluated ith query limit of the query budget constraint Q; evaluating, by the computerized device, the performance of each substitute model MC according to different evaluation methods e ∈ Evaluation; and calculating, by the computerized device, the robustness of each substitute model, where a smaller difference or a higher agreement/similarity rate between the performance of the original model and the substitute model indicates that the original and substitute models are similar to each other, and that the substitute model having the highest performance can mimic the behavior of the original model and can be used as a replica of the original model.
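As a non-limiting sketch of this embodiment (the function names, the agreement-based evaluation, the use of 1 - agreement as the per-candidate robustness and the synthetic data are all assumptions made here for illustration, not part of the claimed method), the following code trains a candidate substitute model for each candidate learning algorithm and each query limit of the budget, evaluates how closely each substitute agrees with the original model on held-out external data, and reports the lowest resulting robustness score:

```python
# Sketch of the assessment procedure, assuming scikit-learn-style models.
# All names (assess_extraction_robustness, agreement, etc.) are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def agreement(original_model, substitute_model, X_eval):
    """Fraction of evaluation records on which both models predict the same label."""
    return float(np.mean(original_model.predict(X_eval) == substitute_model.predict(X_eval)))

def assess_extraction_robustness(original_model, D, candidate_algorithms, query_limits):
    """Train candidate substitutes under each query limit and score the original model.

    Per-candidate robustness is taken here as (1 - agreement); the final score is the
    minimum over all candidates, i.e. the worst case for the model owner.
    """
    # Hold out part of the external dataset D for evaluating the substitutes.
    D_train, D_eval = train_test_split(D, test_size=0.3, random_state=0)
    results = []
    for q in query_limits:                       # i-th query limit of the budget Q
        X_q = D_train[:q]                        # subset of D the attacker can afford to query
        y_q = original_model.predict(X_q)        # labels obtained through the query access
        for name, make_algorithm in candidate_algorithms.items():
            substitute = make_algorithm().fit(X_q, y_q)
            agr = agreement(original_model, substitute, D_eval)
            results.append({"algorithm": name, "queries": q,
                            "agreement": agr, "robustness": 1.0 - agr})
    return min(r["robustness"] for r in results), results

# Example usage with synthetic data standing in for the original training set and D.
rng = np.random.default_rng(0)
X_orig, y_orig = rng.random((500, 6)), rng.integers(0, 2, 500)
original = RandomForestClassifier(random_state=0).fit(X_orig, y_orig)

D_external = rng.random((400, 6))                # external data with the same input shape
candidates = {"tree": DecisionTreeClassifier,
              "logreg": lambda: LogisticRegression(max_iter=1000)}
robustness, table = assess_extraction_robustness(original, D_external, candidates, [50, 100, 200])
print(robustness)
```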
Subject matter of the present disclosure will be described in even greater detail below based on the exemplary figures. All features described and/or illustrated herein can be used alone or combined in different combinations. The features and advantages of various embodiments will become apparent by reading the following detailed description with reference to the attached drawings, which illustrate the following:
In an embodiment, the present invention provides a method for performing an assessment of the robustness and resilience of an examined ML model to model extraction attacks.
In an embodiment, the present invention provides a method for performing an assessment of the robustness and resilience of an examined ML model to a full black-box attack.
Advantages of the invention will become apparent as the description proceeds.
A method for performing an assessment of the robustness and resilience of an examined original ML model against model extraction attacks, comprising: training multiple candidate substitute models MC with an external dataset D for each of the specified candidate learning algorithms a in Alg, where each candidate substitute model is trained on a subset of D corresponding to the evaluated ith query limit of the query budget constraint Q; evaluating the performance of each substitute model MC according to different evaluation methods e ∈ Evaluation; and calculating the robustness of each substitute model.
The robustness of the original model may correspond to the candidate substitute model having the closest performance to that of the original target model or to the candidate substitute model having the smallest difference with respect to the tested evaluation metrics.
Whenever a query limit L is provided, the final returned robustness may be the one that corresponds to L; otherwise, the returned robustness is that of the best candidate model.
In one aspect, the algorithm receives as the input: the original target model (accessible through a query-response interface), an external dataset D, a list of candidate substitute learning algorithms Alg, and a set of evaluation methods Evaluation.
The algorithm may further receive the query budget Q of an attacker, according to which the attacker will be able to query the original model and receive its prediction vector.
The method may further comprise the step of calculating the robustness of the original target model to extraction attacks under a query constraint L.
The query constraint L may be smaller than that provided by the query budget.
The external dataset D may be taken from the same distribution as the original test set.
An evaluation method may be to calculate the performance gap and to set weights for calculating a weighted average.
A system for performing an assessment of the robustness and resilience of an examined original ML model against model extraction attacks, comprising a computerized device having at least one processor, which is adapted to: train multiple candidate substitute models MC with an external dataset D for each of the specified candidate learning algorithms a in Alg, where each candidate substitute model is trained on a subset of D corresponding to the evaluated ith query limit of the query budget constraint Q; evaluate the performance of each substitute model MC according to different evaluation methods e ∈ Evaluation; and calculate the robustness of each substitute model.
The present invention provides a method for performing an assessment of the robustness and resilience of an examined ML model to model extraction attacks. At the first stage, the method, implemented by a computerized device with at least one processor, examines the feasibility of an extraction attack by inducing multiple candidate substitute models. At the second stage, the substitute model that best matches the original model is selected, according to different evaluation metrics.
The original model is referred to as either the attacked model, the original model, the target model, the base model or the original target model. The model which is built by the attacker (an adversary) to mimic the original model will be referred to as either the substitute model, the mimicked model or the stolen model.
The present invention simulates a realistic scenario, since a practical “black-box” scenario is considered, in which the attacker does not have any knowledge of the target model and its internal parameters and configurations (except for the shape and format of its input and output). It is assumed that the attacker does not have access to the training data which was used to induce the original ML model. It is also assumed that the attacker has a “query budget”, i.e., the maximum allowed number of queries that he can send to the original ML model and receive its responses. This assumption is reinforced by the policy of the original model owner, who often charges a fee for each sent query. In addition, even though the querying entity is charged for its queries, most companies might restrict the number of queries allowed to all users (including the attacker). This constraint affects the success of the extraction attack and the performance of the generated substitute model. It is also assumed that the attacker receives the output of the ML model in the form of a prediction vector, including the confidence probability for each possible class label. The attacker can also receive the final predicted class label (but that is often unnecessary, since he can choose the class label with the highest probability in the prediction vector).
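The query-budget assumption can be illustrated by a small wrapper such as the one sketched below (hypothetical names; a scikit-learn-style model is assumed). The wrapper counts the queries that are sent to the original model, returns a prediction vector for each query, and refuses to answer once the budget is exhausted:

```python
# Sketch of a query-budget constraint on black-box access (hypothetical names).
import numpy as np
from sklearn.linear_model import LogisticRegression

class BudgetedBlackBox:
    """Query-response access that stops answering once the query budget Q is spent."""

    def __init__(self, model, query_budget):
        self._model = model
        self._budget = query_budget
        self.queries_sent = 0

    def query(self, X):
        """Return prediction vectors (per-class probabilities) for a batch of inputs."""
        X = np.atleast_2d(X)
        if self.queries_sent + len(X) > self._budget:
            raise RuntimeError("query budget exhausted")
        self.queries_sent += len(X)
        return self._model.predict_proba(X)

# Example: the attacker derives hard labels from the returned prediction vectors.
X_train = np.random.rand(200, 5)
y_train = np.random.randint(0, 2, 200)
box = BudgetedBlackBox(LogisticRegression(max_iter=1000).fit(X_train, y_train), query_budget=100)

X_external = np.random.rand(80, 5)
probs = box.query(X_external)              # 80 queries spent out of the budget of 100
labels = np.argmax(probs, axis=1)          # class label with the highest probability
```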
In a testing environment, the adversary is referred to as an “attacker” (although a real attacker does not exist). The present invention performs an assessment of the possibility that the original ML model will be attacked by an adversary (an “attacker”) in a model extraction attack. This is done by examining the possibility of an adversary carrying out a successful attack.
At the first phase, performed by a computerized device with at least one processor, a list of candidate substitute algorithms is assembled. These candidate substitute algorithms will be used to induce ML models which attempt to mimic the performance of the original target model. In addition, the attacker obtains data from an external source, referred to as external data. For the attack to succeed, it is preferable for the distribution of the external data to be similar to the distribution of the original data, which was used to train the original target model. The obtained external data is partially used, by a computerized device with at least one processor, for training the model that will be used to attack the original model (this model is defined as the substitute model) and, in the testing environment, for testing and evaluating the performance of the substitute model relative to that of the original target model.
At the second (training) phase, performed by a computerized device with at least one processor, each of the candidate ML models is trained and induced according to the corresponding substitute learning algorithm, based on the external data. A list of different learning algorithms is used, since it is impossible to know which learning algorithm the attacker will choose when performing a real attack. Therefore, the possibility of performing this attack is examined based on different candidate learning algorithms.
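A minimal sketch of these two phases is given below (hypothetical names; scikit-learn-style estimators and synthetic data are assumed): a list of candidate substitute learning algorithms is assembled, the external data is split into a training part and an evaluation part, and one substitute model is induced per candidate algorithm from labels obtained through the query access:

```python
# Sketch of the first two phases: candidate algorithms, external data split, training.
# Hypothetical names; assumes scikit-learn-style estimators and query access to `original`.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Phase 1: candidate substitute learning algorithms and external data.
candidate_algorithms = {
    "decision_tree": DecisionTreeClassifier,
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier,
}
rng = np.random.default_rng(1)
original = RandomForestClassifier(random_state=0).fit(rng.random((300, 5)),
                                                      rng.integers(0, 2, 300))
D_external = rng.random((400, 5))                       # data from an external source
D_train, D_eval = train_test_split(D_external, test_size=0.3, random_state=0)

# Phase 2: induce one substitute per candidate algorithm from query-derived labels.
labels = original.predict(D_train)                      # responses of the original model
substitutes = {name: make().fit(D_train, labels)
               for name, make in candidate_algorithms.items()}
```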
At the third phase, performed by a computerized device with at least one processor, the degree of success of the mimicked model is determined by evaluating the performance of each induced substitute model, relative to the original target model, according to different evaluation metrics.
At the fourth phase, performed by a computerized device with at least one processor, the substitute model which achieves the best performance relative to the target model is selected to be the mimicked model, i.e., the model with the highest value of the defined performance metric, which yields the lowest examined performance gap, or the highest agreement/similarity, between the target model and its substitute.
At the fifth phase, performed by a computerized device with at least one processor, the resilience of the target model is calculated according to the chosen substitute model and returned to the data scientist.
In addition, existing evaluation methods may be adjusted or, alternatively, new evaluation methods may be added. For example, a new evaluation method is to calculate the performance gap, i.e., the absolute value of the difference between the F1 score of the original model and the F1 score of the substitute model (or any other measurement gap, such as an accuracy gap). In case the tester decides that one method should have more significance than another, the tester can set weights accordingly and calculate a weighted average.
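A short sketch of such a gap-based evaluation method is given below (hypothetical names and example weights; scikit-learn metrics and labeled evaluation data in the testing environment are assumed). The F1 gap and the accuracy gap between the original model and a substitute are computed on the evaluation data and combined into a weighted average; the substitute with the smallest weighted gap is the one that determines the reported robustness:

```python
# Sketch of a gap-based evaluation method with tester-chosen weights (hypothetical names).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def weighted_gap(original_model, substitute_model, X_eval, y_eval, weights=(0.5, 0.5)):
    """Weighted average of the |F1 gap| and |accuracy gap| between the two models."""
    orig_pred = original_model.predict(X_eval)
    sub_pred = substitute_model.predict(X_eval)
    f1_gap = abs(f1_score(y_eval, orig_pred, average="macro")
                 - f1_score(y_eval, sub_pred, average="macro"))
    acc_gap = abs(accuracy_score(y_eval, orig_pred) - accuracy_score(y_eval, sub_pred))
    return float(np.average([f1_gap, acc_gap], weights=weights))

# Continuing the notation of the earlier sketches: the substitute with the smallest
# weighted gap mimics the original best, so a small gap corresponds to a low
# robustness score for the original (target) model.
# gaps = {name: weighted_gap(original, sub, X_eval, y_eval) for name, sub in substitutes.items()}
# best_name = min(gaps, key=gaps.get)
# robustness = gaps[best_name]
```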
In the pseudo-code of
The final robustness score of the model extraction test is considered to be the lowest achieved robustness among all the evaluated candidate models. The minimal robustness score is chosen since it represents the highest level of vulnerability of the attacked ML model (worst-case scenario).
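Expressed as a formula (the notation is assumed here for illustration and is not taken from the pseudo-code), the returned score is the worst case over all evaluated candidate substitute models:

```latex
\mathrm{robustness}(M_{\mathrm{target}}) \;=\; \min_{a \,\in\, Alg,\; q_i \,\le\, Q} \; \mathrm{robustness}\!\left(M_{a,\,q_i}\right)
```

where M_{a,q_i} denotes the candidate substitute model induced with learning algorithm a under the ith query limit q_i of the budget Q; when a specific query limit L is imposed, only the candidates trained under L are considered.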
The algorithm of
As various embodiments and examples have been described and illustrated, it should be understood that variations will be apparent to one skilled in the art without departing from the principles herein. Accordingly, the invention is not to be limited to the specific embodiments described and illustrated in the drawings.
While subject matter of the present disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. Any statement made herein characterizing the invention is also to be considered illustrative or exemplary and not restrictive as the invention is defined by the claims. It will be understood that changes and modifications may be made, by those of ordinary skill in the art, within the scope of the following claims, which may include any combination of features from different embodiments described above.
The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.
Number | Date | Country | Kind |
---|---|---|---
297834 | Oct 2022 | IL | national |