Training artificial-intelligence (AI) or, synonymously, machine-learning (ML) models on private data presents a risk of inferential information disclosure, that is, the leakage of private training data used to create the model to users that have access to the trained model, e.g., via the model weights or an inference interface to the model. Attempts to glean private data from the model are generally referred to as “privacy attacks” or “inference attacks.” To minimize vulnerability to such attacks, it is standard practice to train the model using privacy-preserving algorithms, such as differentially-private stochastic gradient descent (DP-SGD). Further, it is desirable to quantify the remaining “privacy loss,” or conversely, the achieved “security” against inference attacks to be able to provide privacy guarantees.
Described herein, with reference to the accompanying drawings, are systems and methods for quantifying the security of a machine-learning model against inference attacks in terms of Bayes security.
Described herein are computer-implemented systems and methods for machine-learning model training and deployment that use closed-form bounds of Bayes security against record-level inference attacks to facilitate better protecting against these attacks by improving the trade-off between model accuracy and the achieved level of privacy. Various embodiments, more specifically, quantify Bayes security against “membership inference” and “attribute inference” attacks. In membership inference attacks, the attacker attempts to infer whether specific data records, or groups of records, were part of the model's training data. In attribute inference attacks, the attacker attempts to infer sensitive attributes of one or more records that were part of the training data. The disclosed approach to obtaining Bayes security values is applicable to any type of machine-learning model trained using DP-SGD, including, but not limited to, linear support vector machines, logistic regression models, and neural networks.
DP-SGD is the most widely used privacy-preserving learning algorithm for neural networks. It enables the model creator to quantify the privacy loss in terms of two privacy parameters, ε and δ, which depend on various training hyperparameters chosen by the model creator, such as: the number of training iterations (also “training steps”) (T); the sampling rate (p), corresponding to the expected or average number of data records used per training step (L) divided by the total number of data records in the training data (N); the gradient norm clipping parameter (C); and the DP-SGD noise multiplier (σ). However, the (ε, δ) values of a trained model capture the generic risk of leaking any information about individual records, and are usually applied in a threat-agnostic manner.
To quantify the resilience of a trained model against specific types of attacks, such as membership inference and attribute inference, it is desirable to evaluate metrics such as “attacker advantage” (or simply “advantage”) or “Bayes security.” The attacker advantage quantifies the advantage gained by the attacker, for a specific attack, when given access to the trained model; more precisely, it is defined as the difference between the probability of success of an attacker with access to the model and the probability of success of the same attacker without access to the model. The advantage metric is generally normalized to a value in the range [0,1], where 0 implies no advantage and 1 implies maximal advantage. The Bayes security metric is defined as 1 minus the maximal advantage, and thus also takes values in the range [0,1], where 1 implies that the model is perfectly secure (i.e., exhibits no information leakage for this kind of attack) and smaller values indicate some leakage. For membership inference attacks, the attacker advantage and Bayes security can be computed from (ε, δ) values using existing techniques; however, conventional methods for computing (ε, δ) are computationally intensive. For attribute inference attacks, there are no existing techniques for quantifying attacker advantage and Bayes security, even given (ε, δ) values.
The approach disclosed herein allows determining the attacker advantage or, equivalently, Bayes security directly, without using (ε, δ) values. It provides a general closed-form expression for a lower bound of Bayes security, which reduces, for different types of inference attacks, to different more specific expressions for the lower bound of Bayes security. The expressions constitute lower bounds because they are derived under worst-case assumptions about the attacker's capabilities; in practice, conditions are often more restrictive on attackers, resulting in higher actual levels of Bayes security. The closed-form expressions are functions of training hyperparameters, and comprise a value of Bayes security that approximates the true lower bound of Bayes security along with an error term. Beneficially, the error term becomes smaller for values of the hyperparameters that ensure better privacy, and therefore, the approximate lower bound of Bayes security can be used to achieve a desired target level of Bayes security.
In various embodiments, protection against membership inference attacks is provided by selecting, for the training of a machine-learning model with DP-SGD, a combination of training hyperparameters—for example, the number of training steps, the sampling rate, and the noise multiplier—that achieves (i.e., meets or exceeds) a target value of Bayes security within quantifiable error margins. The model creator may, for example, have access to an interactive calculator tool that computes approximate lower bounds of Bayes security for different possible choices of the hyperparameters, enabling the determination, e.g., by iterative adjustments, of a suitable combination of hyperparameter values that achieves the target Bayes security (up to an error term). Alternatively, the model creator may specify the target value of Bayes security along with values of two of the three training hyperparameters, from which the third hyperparameter value can be straightforwardly calculated. Either way, as compared with conventional methods for obtaining estimated bounds and/or values of Bayes security indirectly based on (ε, δ), the direct computation of a lower bound of Bayes security in accordance herewith can, for suitable choices of the hyperparameters, provide the same degree of accuracy (as measured in terms of the error term) at a computational cost that is significantly lower (e.g., orders of magnitude lower as compared with some prior-art methods). As a result, the process of choosing suitable hyperparameters, which generally involves either an expansive grid search over many combinations of parameter values or a limited search relying on expert knowledge, is significantly sped up using the disclosed approach to computing Bayes security directly.
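By way of illustration only, the following sketch shows how such a calculator might screen candidate hyperparameter combinations against a target value of Bayes security. It assumes the approximate closed-form bound for membership inference presented later in this description, here taken to be 1−erf(p·√T/(√2·σ)) with the error term ignored; the function and parameter names are illustrative rather than prescriptive.

```python
from itertools import product
from math import erf, sqrt

def approx_bayes_security_mi(p: float, T: int, sigma: float) -> float:
    """Approximate lower bound of Bayes security against membership inference,
    assuming the closed form 1 - erf(p * sqrt(T) / (sqrt(2) * sigma)) and
    ignoring the error term."""
    return 1.0 - erf(p * sqrt(T) / (sqrt(2) * sigma))

def candidate_hyperparameters(target_beta, sampling_rates, steps, noise_multipliers):
    """Yield (p, T, sigma) combinations whose approximate bound meets the target."""
    for p, T, sigma in product(sampling_rates, steps, noise_multipliers):
        if approx_bayes_security_mi(p, T, sigma) >= target_beta:
            yield p, T, sigma

# Example: search a small grid for combinations achieving Bayes security of at least 0.95.
for p, T, sigma in candidate_hyperparameters(
    target_beta=0.95,
    sampling_rates=[0.0005, 0.001, 0.002],
    steps=[1_000, 3_000, 10_000],
    noise_multipliers=[0.5, 0.8, 1.2],
):
    print(f"p={p}, T={T}, sigma={sigma}")
```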
In various embodiments, protection against attribute inference attacks is achieved by training the machine-learning model with a modified DP-SGD algorithm that involves computing gradient bounds with respect to the attribute or attributes of interest during training. The gradient bounds, along with values of training hyperparameters (such as sampling rate, noise multiplier, gradient norm clipping parameter, and number of training steps), then flow into the computation of a value of Bayes security, such as an approximate lower bound of Bayes security, updated after each training step. Based on the computed value(s) of Bayes security, a privacy-preserving action may be taken. For example, since continued training generally increases the accuracy of the trained model at the cost of decreasing security against inference attacks, model training may be stopped once the value of Bayes security falls below a specified threshold. Alternatively, the model may be trained for a specified number of steps or until a given model accuracy or other model performance has been achieved, and deployment of the trained model may then be contingent upon Bayes security exceeding a target value, or a model may be selected among multiple models trained for a given task, e.g., differing in their training hyperparameters or the datasets on which they have been trained, to maximize Bayes security. Values of Bayes security computed during training may also be used to update the training hyperparameters to increase Bayes security.
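As a minimal sketch of the stop-when-below-threshold action described above (the helpers modified_dp_sgd_step and bayes_security_lower_bound are hypothetical placeholders for the modified DP-SGD update and the closed-form bound discussed further below):

```python
def train_with_privacy_stop(theta, dataset, T, target_beta,
                            modified_dp_sgd_step, bayes_security_lower_bound):
    """Train for up to T steps, stopping early as a privacy-preserving action
    if the running value of Bayes security falls below the target."""
    gradient_bounds = []  # per-step gradient bounds computed by the modified DP-SGD step
    for t in range(T):
        theta, r_t = modified_dp_sgd_step(theta, dataset)
        gradient_bounds.append(r_t)
        beta = bayes_security_lower_bound(gradient_bounds)
        if beta < target_beta:
            print(f"stopping at step {t}: Bayes security {beta:.3f} < {target_beta}")
            break
    return theta
```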
From bounds for Bayes security determined using the approach discussed herein, it has been observed that DP-SGD tends to be significantly more secure against attribute inference than against membership inference attacks. Accordingly, if a practical application requires security against attribute inference rather than membership inference, the ability to quantify Bayes security against attribute inference, as achieved with the modified DP-SGD algorithm and closed-form bound discussed herein, allows achieving better utility of the trained machine-learning model while maintaining acceptable privacy. A further benefit of the disclosed approach, as applied to attribute inference, is that the computed value of Bayes security is data-dependent. Data-dependent guarantees are less conservative, and allow practitioners to choose DP-SGD hyperparameters to protect against actual, rather than merely abstract, threats.
The system 100 includes the machine-learning model 102 itself, as well as a learning algorithm 104 and set of training data 106 used to train the model 102. Training involves determining the free parameters, or “weights” 108, of the model 102 so as to optimize a loss function or objective function that is defined to capture how well the model 102 fits the training data 106. For example, in supervised learning, the training data records are input-output pairs, and the weights 108 are adjusted to minimize the difference, aggregated across sampled data records, between the output computed by the model 102 from the input of a given input-output pair and the corresponding “ground-truth” output (also often referred to as the “label”) of that data pair; the loss function is some function of the aggregated differences. In unsupervised learning, where the training data is label-free, the objective function captures some task-specific measure of the goodness of the fit. For clustering tasks, for example, the weights may be optimized to minimize the variability of data points within clusters and maximize the distance between clusters. In accordance herewith, the learning algorithm 104 is a DP-SGD algorithm, explained below with respect to
The learning algorithm 104 includes a number of training hyperparameters 110 that can be set, e.g., via a suitable model creator user interface 112, by a (human) model creator 114. These hyperparameters may include, e.g.: the number of training steps, that is, iterations in the iterative adjustment of the weights 108; the sampling rate, that is, the probability with which any given data record of the training dataset is included in the subset used in a given training step, which is equivalent to the ratio of the expected or average number of training data records used per training step (also referred to as the expected “batch size”) to the total number of data records in the training dataset; the learning rate, which determines the magnitude of weight adjustments in the training iterations; a gradient norm clipping parameter to which gradients calculated during training are scaled if their norm exceeds the parameter; parameters of the objective function (such as a regularization constant or penalty term); and in DP-SGD, a noise multiplier. Some of these parameters affect the extent to which the trained model will be vulnerable to inference attacks, and should therefore be selected carefully to optimize data privacy. To this end, the system 100 further includes a Bayes security calculator 116 that represents, for one or more types of inference attacks, relationships between Bayes security (or, more precisely, an approximate lower bound on Bayes security) and training hyperparameters along with, in some cases, other quantities computed during the training process. By accessing the Bayes security calculator 116, e.g., via the model creator user interface 112, the model creator 114 can obtain values (e.g., approximate lower bounds) for Bayes security before, during, or after training (depending on the specific embodiment) and, based thereon, take or cause appropriate privacy-preserving actions, as explained in detail below with reference to
The DP-SGD algorithm operates on a training dataset of N data records and a loss function that is based on model weights θ, and its hyperparameters include learning rates ηt, a noise multiplier σ, a gradient norm clipping parameter C, an expected batch size L, and a number of training steps T. The DP-SGD algorithm trains the model as follows: for each training step t=0, . . . , T−1, on average L records are sampled from the training dataset, the per-sample gradients of the loss function with respect to the current model weights θt are computed, the per-sample gradient norms are clipped to C, isotropic Gaussian noise with standard deviation σC is added to the sum of the clipped gradients, and the resulting noisy sum g̃t, scaled by 1/L, is used to update the model weights θt according to the learning rate ηt, obtaining the new model weights θt+1.
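For concreteness, a minimal sketch of one DP-SGD update step as just described is given below (using NumPy; grad_fn, a helper computing a per-sample gradient, and the Poisson-style subsampling shown are assumptions made for illustration):

```python
import numpy as np

def dp_sgd_step(theta, dataset, grad_fn, lr, C, sigma, p, rng):
    """One DP-SGD update: sample records, compute and clip per-sample gradients,
    add Gaussian noise to their sum, and apply the scaled noisy gradient."""
    # Each record is included with probability p, so the expected batch size is L = p * N.
    mask = rng.random(len(dataset)) < p
    batch = [z for z, keep in zip(dataset, mask) if keep]

    total = np.zeros_like(theta)
    for z in batch:
        g = grad_fn(theta, z)                      # per-sample gradient
        norm = np.linalg.norm(g)
        if norm > C:                               # clip the gradient norm to C
            g = g * (C / norm)
        total += g

    # Isotropic Gaussian noise with standard deviation sigma * C.
    noisy_sum = total + rng.normal(0.0, sigma * C, size=theta.shape)

    expected_batch_size = p * len(dataset)         # L in the description above
    return theta - lr * noisy_sum / expected_batch_size
```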
Training a model with DP-SGD facilitates providing privacy guarantees in terms of upper bounds on the privacy parameters (ε, δ). The parameters relate to the probability that an attacker can distinguish between two neighboring training datasets (e.g., datasets differing in only one data record) based on the output of the learning algorithm, namely, the model weights (including intermediate weights during training). At an intuitive level, ε measures the amount of information leaked by the output of the learning algorithm, and δ measures the probability of a privacy failure in the sense that the learning algorithm produces an output that leaks information in excess of ε. Values of (ε, δ) can be determined with an “accounting” mechanism that tracks the privacy cost associated with each access to the training data, accumulated over the course of model training. Accounting mechanisms, however, are not only computationally expensive, but also tend to be error-prone, e.g., due to implementation difficulties or numerical errors. In addition, the privacy parameters (ε, δ) are generally threat-agnostic, but in practice, there are cases where certain types of threats give rise to privacy concerns while others do not. For example, the fact that a person participated in the census dataset is not privacy-sensitive per se, but if an attacker were able to infer the values of sensitive attributes such as race or age, that would be a privacy violation. There appears no principled way to render (ε, δ) values interpretable without considering specific threats. While prior efforts have been directed at linking (ε, δ) values to threat-specific privacy metrics, the approach discussed herein evaluates machine-learning models directly against the risk of certain record-level inference attacks, in particular membership and attribute inference attacks.
In various embodiments, the risk of specific types of inference attacks is measured in terms of Bayes security, which in turn is based on the notion of attacker advantage. The metric of attacker advantage quantifies how much more likely an attacker is to succeed when given access to the trained model, as compared with not having this access. To formally define attacker advantage, suppose the attacker's goal is to guess some secret information, measured by a random variable S, and the attacker has some prior knowledge about S, mathematically captured in a probability distribution π over the range (meaning, all possible values) of S. For example, in a membership inference attack, the attacker may know that the model was trained on a dataset D in conjunction with an individual data record, termed “challenge point,” z*S sampled from a set of challenge points {z*1, . . . , z*M} according to a prior distribution π on this set, and the attacker's goal is to guess which of the challenge points was used for training the model; in this case, the random variable S corresponds to the index (between 1 and M) of the challenge point. In an attribute inference attack, the attacker may know that a model was trained on a dataset D in conjunction with a challenge point z* that can be represented as the concatenation of two vectors φ and s, where s corresponds to one or more sensitive attributes of the data record sampled according to a prior distribution π, and the attacker's goal is to infer the value(s) of s given access to the remainder φ of the data record; in this case, the random variable S corresponds to the attribute(s) s. Both cases can be unified in a more general framework for record-level property inference attacks, in which the attacker aims to infer some property f(z*) about some data record z* (the challenge point), with f(z*) being, for membership inference, an index into the set of challenge points such that z*=z*f(z*), and, for attribute inference, f(z*)=s. The function f may be a bijection, meaning that there is exactly one challenge point z*=f−1(s) for each property value s∈dom(f−1).
It is assumed that an attacker with access to the model weights θ (in addition to the prior π), herein indicated as Attacker(π, θ), will be at least as successful in guessing S as an attacker without access to the model weights θ, who guesses purely based on the prior π. The “generalized advantage,” Advπ, is defined as the difference between the probabilities of success of these two attackers, normalized to a value between 0 and 1 (wherein 0 implies no advantage and 1 implies maximum advantage):
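One conventional way of writing this definition, supplied here for readability (the normalization shown is consistent with the prose definition above, though the notation in the accompanying drawings may differ), is:

Adv_π = (Pr[Attacker(π, θ) = S] − Pr[Attacker(π) = S]) / (1 − Pr[Attacker(π) = S]).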
Bayes security, β*, is defined as the complement of the maximum attacker advantage over all priors π:
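In the notation above, this definition reads (rendered here in text form for readability):

β* = 1 − max_π Adv_π.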
It can be shown that the generalized advantage is maximized, and Bayes security is achieved, on a uniform prior distribution over two secrets s0, s1 drawn from the range of S (that is, the set of challenge points, or their indices, in the case of a membership inference attack, and the set of attribute values in the case of an attribute inference attack):
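A plausible rendering of this result, with u(s0, s1) denoting the uniform prior over s0 and s1 (the symbol u is introduced here for illustration only), is:

β* = 1 − max_{s0, s1} Adv_{u(s0, s1)},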
where the prior is uniform over s0 and s1, meaning that Pr[S=s0]=Pr[S=s1]=½ and Pr[S=s]=0 for all s≠s0, s1. Consequently, when studying the security of DP-SGD, it is sufficient to limit the range of S to the two values that are the easiest to distinguish for the attacker, and to set π to the uniform prior. For example, for membership inference, this means that by measuring the security for just two challenge points (M=2) that are equally likely to be members, a bound on Bayes security is obtained for arbitrary values of M≥2. Similarly, for attribute inference, it is sufficient to look at the two leakiest attribute values.
Provided herein are lower bounds on Bayes security of DP-SGD against record-level property inference attacks. It is assumed that the attacker aims to guess the random secret property S given not only the model weights following completion of training, but also all intermediate model weights, collectively θ=(θ0, θ1, . . . , θT), for T time steps in model training. While this assumption will rarely be met in practice, where the attacker will generally have access to only the final model weights (if that), it does not undermine providing a security guarantee since less information available to the attacker would only raise the level of security, and accordingly, a bound on Bayes security derived from the larger set including intermediate model weights is still a valid lower bound. The model weights θ=(θ0, θ1, . . . , θT) that the attacker observes are sampled from a random model weights vector O=(O0, O1, . . . , OT), which is conditioned on S because the secret property corresponds to a data record, or attribute thereof, used as part of the training data. The relation between S and O is governed by the posterior distribution PO|S, which can be viewed as an information-theoretic channel. The Bayes security of this channel, β*(PO|S), measures the additional leakage about the secret S that the attacker can exploit by observing O.
As it turns out, the attacker obtains an even greater advantage if given direct access to the intermediate gradients computed in updating the model weights, rather than the model weights themselves, because the gradients carry at least as much information about the challenge point z* as O. In other words, an attacker has a greater (or at least equal) advantage when attacking the channel PG|S, where G=(G0, . . . , GT−1) is the random gradient vector, than when attacking the channel PO|S. While access to the intermediate gradients is even less likely to be available in practice than access to the intermediate weights, once again, lowering the lower bound as a result of assuming access to the gradients preserves the security guarantee:
Combining this insight with the above-noted fact that Bayes security is achieved on a uniform prior distribution over two secrets s0, s1, Bayes security can be bounded by:
where tv denotes the total variation distance, as the term is understood in the art. In words, the Bayes security of DP-SGD can be bounded by one minus the maximal total variation distance between the two posterior distributions PG|S=s0 and PG|S=s1.
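Combining the two observations above, the resulting relationship can be summarized, in the notation of this description (rendered here in text form), as:

β*(P_{O|S}) ≥ β*(P_{G|S}) = 1 − max_{s0, s1} tv(P_{G|S=s0}, P_{G|S=s1}).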
In the above expression for Bayes security, PG|S is a mixture of Gaussians. This mixture distribution can be approximated by a single Gaussian distribution for certain choices of parameters that cover many practical purposes (see, e.g., the below example). With this approximation, Bayes security of DP-SGD with respect to a record-level inference threat becomes:
Herein, erf denotes the Gauss error function, and the term 1−erf(p·Δf/(2√2·σ·C)) constitutes the approximate lower bound of Bayes security,
where
In one example application, a language model was trained using DP-SGD with a training dataset of N=250000 training samples, a batch size of L=256 samples, a noise multiplier σ=0.8, a number of training steps T=3000, and a gradient norm clipping parameter C=0.1. These choices of hyperparameters resulted in a DP guarantee of ε=2.09 for δ=1/N. The direct computation of Bayes security against membership inference using the above equation resulted in an approximate lower bound of 0.944, which is within 1% relative error of a bound of 0.933 computed indirectly through (ε, δ).
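As a rough numerical check of this example, the approximate membership-inference bound (assumed here, consistently with the reduction given below, to take the form 1 − erf(p·√T/(√2·σ))) can be evaluated in a few lines of Python:

```python
from math import erf, sqrt

N = 250_000        # number of training samples
L = 256            # expected batch size
sigma = 0.8        # noise multiplier
T = 3_000          # number of training steps

p = L / N          # sampling rate

# Approximate lower bound of Bayes security against membership inference
# (closed form assumed as described above; error term ignored).
beta = 1.0 - erf(p * sqrt(T) / (sqrt(2) * sigma))
print(f"approximate Bayes security lower bound: {beta:.3f}")  # prints ~0.944
```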
The above general result for the lower bound of Bayes security against record-level inference attacks will now be applied to specific attack types, in accordance with various embodiments.
In membership inference attacks, f(z*s)=s, and Δf is bounded as follows:
such that the lower bound of Bayes security reduces to:
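Written out in text form (a non-authoritative rendering that is consistent with the worked example above, where it yields approximately 0.944), the reduced expression is:

β* ≈ 1 − erf(p·√T / (√2·σ)), up to the error term.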
This expression facilitates straightforwardly calculating Bayes security from the number of training steps T, the sampling rate p, and the noise multiplier σ, or, conversely, determining, for a given target Bayes security, a suitable combination of these training hyperparameters. (Note that the sampling rate p may be specified indirectly in terms of the expected number of samples L per training step and the total number of samples N, which is functionally equivalent to specifying p. Similarly, there may be other parameters or expressions that are equivalent to p, T, or σ. Reference to the number of training steps, the sampling rate, and/or the noise multiplier in this specification is deemed to encompass their functional equivalents in scope.)
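For instance, fixing the sampling rate and the number of training steps, the noise multiplier needed to reach a target value of Bayes security can be obtained by inverting the expression above; a small sketch using SciPy's inverse error function (the closed form and the neglect of the error term are assumptions of this sketch):

```python
from math import sqrt
from scipy.special import erfinv  # inverse of the Gauss error function

def noise_multiplier_for_target(beta_target: float, p: float, T: int) -> float:
    """Solve 1 - erf(p * sqrt(T) / (sqrt(2) * sigma)) = beta_target for sigma,
    ignoring the error term of the approximation."""
    x = erfinv(1.0 - beta_target)        # erf argument that achieves the target
    return p * sqrt(T) / (sqrt(2) * x)   # required noise multiplier

# Example: reuse the sampling rate and step count of the worked example above.
sigma = noise_multiplier_for_target(beta_target=0.944, p=256 / 250_000, T=3_000)
print(f"required noise multiplier: {sigma:.2f}")  # ~0.8
```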
For a given target value of Bayes security and a given number of training steps, the closed-form bound thus implies a linear relation between the sampling rate p and the noise multiplier σ; in this example, p≈0.000350 (depicted as a dash-dotted line). Any combination of values of p and σ meeting this linear relation will achieve the desired value of Bayes security (up to the error term of the approximation).
Bayes security against membership inference serves to protect the existence of a particular data record in the training dataset from discovery, which is important when the mere inclusion in the training data reveals sensitive information. Consider, for example, a machine-learning model trained on patient data of people having a certain disease (e.g., HIV); if an attacker discovers that a particular person's data was used in training the model, the attacker can then infer that that person has HIV. As another example, consider a model trained on financial records based on various data sources; in this case, an attacker (e.g., acting on behalf of a competitor of a trading institution) who can infer which sources the trading institution used in training the model may be able to make certain stock market predictions based thereon. Other usage scenarios and applications will occur to those of ordinary skill in the art.
In attribute inference attacks, assuming that for every challenge point z* there is exactly one attribute value s such that f(z*)=s, and denoting the non-sensitive part of z* as φ(z*), the lower bound of Bayes security becomes:
where R=(R0, . . . , RT−1) with:
where Lt is the batch sampled at step t. In this case, as can be seen, the lower bound on Bayes security depends not only on the training hyperparameters, which here include the gradient norm clipping parameter C in addition to T, p, and σ, but also on the gradients computed during model training. The bound on Bayes security is, thus, data-dependent: the value of Rt at any time step t depends on the model weights θt at that step as well as on the data itself.
The modified DP-SGD algorithm differs from the conventional DP-SGD algorithm described above in that, at each training step t, it additionally computes the gradient for every point z*∈Lt and determines the maximum distance across all pairs of challenge points z*0=f−1(s0) and z*1=f−1(s1). Thereafter, the gradients are clipped, noise is added, and the updated model weights are computed in the usual manner. The computed gradient bounds at each training step can be used to compute the lower bound of Bayes security resulting for that step.
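The following sketch illustrates one way the per-step gradient bounds and the resulting running value of Bayes security might be computed. The helper grad_fn, the use of clipped gradients in the distance computation, and the specific closed form 1 − erf(p·‖R‖/(2√2·σ·C)) are assumptions made for illustration and are consistent with, but not quoted from, the description above:

```python
import numpy as np
from math import erf, sqrt

def clip(g: np.ndarray, C: float) -> np.ndarray:
    """Scale gradient g so that its L2 norm does not exceed C."""
    norm = np.linalg.norm(g)
    return g if norm <= C else g * (C / norm)

def per_step_gradient_bound(theta, challenge_points, grad_fn, C: float) -> float:
    """R_t: maximum pairwise distance between (clipped) gradients at the challenge points."""
    grads = [clip(grad_fn(theta, z), C) for z in challenge_points]
    return max(
        float(np.linalg.norm(g0 - g1))
        for i, g0 in enumerate(grads)
        for g1 in grads[i + 1:]
    )

def attribute_inference_bound(R: list, p: float, sigma: float, C: float) -> float:
    """Approximate lower bound of Bayes security against attribute inference,
    assuming the form 1 - erf(p * ||R|| / (2 * sqrt(2) * sigma * C)) with R the
    vector of per-step gradient bounds (error term ignored)."""
    R_norm = sqrt(sum(r * r for r in R))
    return 1.0 - erf(p * R_norm / (2.0 * sqrt(2.0) * sigma * C))
```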
In the example method 710 illustrated in
The example method 730 of
In the example method 740 of
In yet another example, a machine-learning model may be trained by DP-SGD and the resulting value of Bayes security computed, and comparison against a specified target value of Bayes security may be used to simply make a binary decision whether or not to deploy the trained model. The privacy-preserving action amounts, in this case, to a simple check for sufficient security.
The above-described methods for protecting against attribute inference attacks can be combined and modified in various ways, as will be apparent to those of ordinary skill in the art given the benefit of this disclosure.
Protecting against attribute inference is often important in connection with machine-learning models trained on people's personal data, such as, e.g., medical records or financial records. Such data may include general or domain-specific demographic attributes, such as, for example and without limitation: age, gender, race, ethnicity, national origin, marital status, education level, employment status, diseases, drug dosages, earnings, credit ratings, etc. Another type of application involves data records that are images or text, where an attacker may infer patches of the image given the remainder of the image, or portions of the text given the remainder of the text. Various applications and uses of measuring Bayes security as described herein to achieve a desired level of security against attribute inference attacks will occur to those of ordinary skill in the art.
Machine (e.g., computer system) 800 may include a hardware processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 804 and a static memory 806, some or all of which may communicate with each other via an interlink (e.g., bus) 808. The machine 800 may further include a display unit 810, an alphanumeric input device 812 (e.g., a keyboard), and a user interface (UI) navigation device 814 (e.g., a mouse). In an example, the display unit 810, input device 812 and UI navigation device 814 may be a touch screen display. The machine 800 may additionally include a storage device (e.g., drive unit) 816, a signal generation device 818 (e.g., a speaker), a network interface device 820, and one or more sensors 821, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor. The machine 800 may include an output controller 828, such as a serial (e.g., universal serial bus (USB), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate or control one or more peripheral devices (e.g., a printer, card reader, etc.).
The storage device 816 may include a machine-readable medium 822 on which are stored one or more sets of data structures or instructions 824 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein. The instructions 824 may also reside, completely or at least partially, within the main memory 804, within static memory 806, or within the hardware processor 802 during execution thereof by the machine 800. In an example, one or any combination of the hardware processor 802, the main memory 804, the static memory 806, or the storage device 816 may constitute machine-readable media.
While the machine-readable medium 822 is illustrated as a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 824.
The term “machine-readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 800 and that cause the machine 800 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions. Non-limiting machine-readable medium examples may include solid-state memories, and optical and magnetic media. Specific examples of machine-readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; Random Access Memory (RAM); Solid State Drives (SSD); and CD-ROM and DVD-ROM disks. In some examples, machine-readable media may include non-transitory machine readable media. In some examples, machine-readable media may include machine-readable media that are not a transitory propagating signal.
The instructions 824 may further be transmitted or received over a communications network 826 using a transmission medium via the network interface device 820. The machine 800 may communicate with one or more other machines utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, a Long Term Evolution (LTE) family of standards, a Universal Mobile Telecommunications System (UMTS) family of standards, peer-to-peer (P2P) networks, among others. In an example, the network interface device 820 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 826. In an example, the network interface device 820 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. In some examples, the network interface device 820 may wirelessly communicate using Multiple User MIMO techniques.
Examples, as described herein, may include, or may operate on, logic or a number of components, modules, or mechanisms (all referred to hereinafter as “modules”). Modules are tangible entities (e.g., hardware) capable of performing specified operations and may be configured or arranged in a certain manner. In an example, circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as a module. In an example, the whole or part of one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware processors may be configured by firmware or software (e.g., instructions, an application portion, or an application) as a module that operates to perform specified operations. In an example, the software may reside on a machine-readable medium. In an example, the software, when executed by the underlying hardware of the module, causes the hardware to perform the specified operations.
Accordingly, the term “module” is understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part or all of any operation described herein. Considering examples in which modules are temporarily configured, each of the modules need not be instantiated at any one moment in time. For example, where the modules comprise a general-purpose hardware processor configured using software, the general-purpose hardware processor may be configured as respective different modules at different times. Software may accordingly configure a hardware processor, for example, to constitute a particular module at one instance of time and to constitute a different module at a different instance of time.
The following numbered examples are illustrative embodiments:
Example 1 is a machine-learning method for protecting against attribute inference attacks. The method includes training a machine-learning model with a DP-SGD algorithm on training data records comprising at least one attribute; computing gradient bounds with respect to values of the at least one attribute during training; computing a value of Bayes security against record-level inference of the at least one attribute from the gradient bounds in conjunction with values of a set of training hyperparameters of the DP-SGD algorithm; and taking a privacy-preserving action based on the value of Bayes security.
Example 2 is the method of example 1, wherein the privacy-preserving action comprises updating the values of the set of training hyperparameters to increase the value of Bayes security.
Example 3 is the method of example 1, wherein the value of Bayes security is repeatedly updated during training, and the privacy-preserving action comprises stopping training the machine-learning model once the value of Bayes security falls below a target value.
Example 4 is the method of example 3, wherein training the machine-learning model includes iteratively updating weights of the machine-learning model, the weights being stored over the course of multiple successive iterations, and wherein the privacy-preserving action further comprises, once the value of Bayes security falls below the target value, restoring the weights of the machine-learning model associated with an earlier iteration of the multiple successive iterations.
Example 5 is the method of example 1, wherein the privacy-preserving action includes, upon completing training the machine-learning model, comparing the value of Bayes security against a target value, and deploying the machine-learning model only if the value of Bayes security meets or exceeds the target value.
Example 6 is the method of example 1, further comprising training at least one alternative machine-learning model on the training data records by DP-SGD with an associated set of training hyperparameters, computing values of Bayes security against record-level inference of the attribute for the at least one alternative machine-learning model from gradient bounds with respect to values of the attribute computed during training of the at least one alternative machine-learning model in conjunction with values of the associated set of training hyperparameters; and upon completion of training the machine-learning model and the at least one alternative machine-learning model, selecting, among the machine-learning model and the at least one alternative machine-learning model, a model that has a highest associated final value of Bayes security for deployment.
Example 7 is the method of any of examples 1-6, wherein the set of training hyperparameters includes: a sampling rate, a noise multiplier, and a gradient norm clipping parameter.
Example 8 is the method of example 7, wherein the set of training hyperparameters further includes a number of training steps.
Example 9 is the method of example 7 or example 8, wherein the value of Bayes security is computed using a Gauss error function of an argument that comprises a combination of a norm of the gradient bounds computed during the training, the sampling rate, the noise multiplier, and the gradient norm clipping parameter.
Example 10 is the method of any of examples 1-9, wherein the data records are personal records for a plurality of people and the at least one attribute comprises a demographic attribute.
Example 11 is the method of example 10, wherein the data records are one of medical records for a plurality of patients or financial records for a plurality of people.
Example 12 is one or more machine-readable media storing processor-readable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform operations implementing any of the methods of examples 1-11.
Example 13 is a system including one or more computer processors and one or more machine-readable media storing processor-readable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform operations implementing any of the methods of examples 1-11.
Example 14 is a machine-learning method for protecting against membership inference attacks. The method includes receiving a target value of Bayes security against membership inference attacks; determining a combination of values of a set of training hyperparameters that achieves the target value of Bayes security, the set of training hyperparameters comprising: a number of training steps, a sampling rate, and a noise multiplier;
and training the machine-learning model on a training dataset, using a DP-SGD learning algorithm and the determined combination of values of the set of training hyperparameters.
Example 15 is the method of example 14, wherein the trained machine-learning model is deployed to infer outputs from inputs provided to the trained machine-learning model.
Example 16 is the method of example 14, wherein determining the combination of values of the set of training hyperparameters comprises fixing values of two hyperparameters selected among the number of training steps, the sampling rate, and the noise multiplier and computing a value of the third hyperparameter from the fixed values of the two selected hyperparameters and the target value of Bayes security.
Example 17 is the method of example 14, wherein the values of the sampling rate, the number of training steps, and the noise multiplier are chosen such that a value of the Gauss error function of a product of powers of the sampling rate, the number of training steps, and the noise multiplier does not exceed one minus the target value of Bayes security.
Example 18 is one or more machine-readable media storing processor-readable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform operations implementing any of the methods of examples 14-17.
Example 19 is a system including one or more computer processors and one or more machine-readable media storing processor-readable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform operations implementing any of the methods of examples 14-17.
Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof, show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.