Data security and encryption are branches of computer science that relate to protecting information from disclosure to third parties and allowing only an intended party or parties access to that information. The data may be encrypted using various techniques, such as public/private key cryptography and/or elliptic-curve cryptography, and may be decrypted by the intended recipient using a corresponding decryption technique.
For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
In various embodiments of the present disclosure, a first system is a data-provider system and communicates with a second system that is a data/model processing system and a third system that is a model-provider system. The first and third systems permit the second system to process data corresponding to input data to predict an event corresponding to the input data. The input data may include data corresponding to a component, such as voltage, current, temperature, and/or vibration data, data corresponding to movement of material and/or information in a network, such as the flow of energy and/or information, as well as other data. The event may include a change in performance of the component, including failure of the component, a change in the amount of movement, as well as other events.
Machine-learning systems, such as those that use neural networks, may be trained using training data and then used to make predictions on out-of-sample (i.e., non-training) data to predict an event. A system providing this data, referred to herein as a data-provider system, may acquire this data from one or more data sources. The data-provider system may be, for example, a power company, and may collect data regarding operational status of a particular component (e.g., a transformer); this data may include, for example, temperature, vibration, and/or voltage data collected during use of the component. The data may further include rates of movement of material and/or information in a network and/or other factors that may affect the operation and/or movement, such as atmospheric and/or weather conditions and/or inputs to the component and/or network. The data-provider system may then annotate this data to indicate times at which the component failed. Using this collected and annotated data, the data-provider system may train a neural network to predict an event associated with the input data, such as when the same or similar component will next fail based on the already-known times of past failure and/or changes in the movement of the network. Once the model is trained, the data-provider system may deploy it, receive additional data collected from the component, and make further predictions using this out-of-sample data.
The data-provider system may, however, have access to insufficient training data, training resources, or other resources required to train a model that is able to predict a given event (e.g., failure of the component and/or change in the network) with sufficient accuracy. The data-provider system may thus communicate with another system, such as a model-provider system, that includes such a model. The data-provider system may thus send data regarding the data source(s) to the model-provider system, and the model-provider system may evaluate the model using the data to predict the event. The model of the model-provider system may be trained using data provided by the data-provider system, other data-provider system(s), and/or other sources of data.
The data-provider system may, however, wish to keep the data from the one or more data sources private and may further not wish to share said data with the model-provider system. The model-provider system may similarly wish to keep the model (and/or one or more trained parameters and/or results thereof) secret with respect to the data-provider system (and/or other systems). A third system, such as a secure processor, may thus be used to process data using one or more layers of the model (such as one or more transformation layers, as described herein) to thus prevent the data-provider system from being able to learn input data, output data, and/or parameter data associated with the full model.
For example, the power company may improve its model by training it with additional training data, but this additional training data may not be accessible to the power company. A rival power company, for example, may possess some additional training data, but may be reluctant to provide its proprietary intellectual property to a competitor. In other industries or situations, data owners may further be predisposed to not share their data because the data set is too large to manage or because it is in a different format from other data. In still other industries, data owners may be prohibited from sharing data, such as medical data, due to state laws and/or regulations. A data owner may further be predisposed to not share data, especially publicly, because any further monetary value in further sharing of the data is lost after sharing the data once. The transformation layer(s), described herein, may permit a given data-provider system access to the benefit of using such a trained model (e.g., predicted events based on shared training data) but may prevent the given data-provider system from knowing all of the parameters of the trained model.
In other embodiments, a single data-provider system may not possess or be able to obtain all the data necessary to provide input to a model to make an accurate prediction of the event(s). The type of data possessed by the data-provider system may thus be referred to as vertically partitioned data (as opposed to horizontally partitioned data, which is data that is able to provide all of the inputs to the model). A first data-provider system may possess a first portion of the input data for the model, and a second data-provider system may possess a second portion of the input data. Each data-provider system may wish to make a prediction using the model but may not wish to share its portion of the input data with other data-provider system(s).
Embodiments of the present disclosure thus relate to systems and methods for securely processing data, such as the training data described above, collected from one or more data-provider systems. In some embodiments, some layer(s) of the model are disposed on a first system and other layer(s) of the model are disposed on a second system. If, for example, a model-provider system provides a model to a data-provider system, the model-provider system may prevent the data-provider system from having full access to the model, and in particular all of the parameters associated with the model, by using a third system, referred to herein as a secure processor, to process data using at least one layer of the model.
In other embodiments, a first data-provider system may process a first portion of vertically partitioned data using first input layer(s), and a second data-provider system may process a second portion of the vertically partitioned data using second input layer(s). Each data-provider system may send the results of this processing, referred to herein as feature data, to a secure processor, which may combine the feature data and send result(s) of processing the feature data back to the data-provider systems. Thus, each data-provider system may receive the benefit of training the model using data from at least one other data-provider system without having access to the actual data of the other data-provider system(s).
Referring first to
Referring to
The data/model-processing system 120a may include a number of other components. In some embodiments, the data/model-processing system 120 includes one or more secure-processing component(s) 204. Each secure-processing component 204 may store or otherwise access data that is not available for storage and/or access by the other systems 122, 124 and/or other components 204. For example, the data encryption/decryption component may store and/or access the private key κ−; other components, such as a homomorphic operation component and/or a data-evaluation component may not store and/or have access to the private key κ−. The components may be referred to as containers, data silos, and/or sandboxes.
As described herein, one or more of the model-provider system 122, a data/model processing system 120, and a data-provider system 124 may exchange data, such as model-output data, layer-output data, and/or parameter data. In some embodiments, some or all of this data may be encrypted prior to sending and/or decrypted upon receipt in accordance with one or more encryption functions, described below. For example, a first data-provider system 124a may exchange encryption information with a second data-provider system 124b and/or a model-provider system 122, as defined below, before exchanging data (such as, for example, neural-network parameters) encrypted using the encryption information. In other embodiments, however, data exchanged between the data/model processing system 120, model-provider system 122, and/or data-provider system 124 is not encrypted. In some embodiments, one or more homomorphic operations are performed by the data/model processing system 120; in other words, the data/model processing system may act as an aggregator of data sent between the data-provider system(s) 124 and/or the model-provider system 122. Sending encrypted or unencrypted data is within the scope of the present disclosure.
For example, if encryption is used to exchange data, an RSA encryption function H(m) may be defined as shown below in equation (1), in which a, e, and n are values configured for a specific encryption function.

H(m) = a^(me) (mod n)   (1)
A corresponding decryption function H^(−1)(c) may be used to decrypt data encrypted in accordance with the encryption function of equation (1). In some embodiments, the decryption function H^(−1)(c) is defined using the below equation (2), in which log_a is the discrete logarithm function over base a. The discrete logarithm log_a may be computed by using, for example, a “baby-step giant-step” algorithm.
H^(−1)(c) = log_a(c^d) (mod n)   (2)
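By way of a non-limiting illustration, the following is a minimal sketch of the baby-step giant-step algorithm referenced above for computing a discrete logarithm log_a(y) (mod n); the numeric values used in the final check are arbitrary toy parameters and are not drawn from the present disclosure.

```python
# Minimal sketch of the baby-step giant-step algorithm: find m with a^m = y (mod n)
# in roughly O(sqrt(n)) time and space. Toy parameters only; not production cryptography.
from math import isqrt

def baby_step_giant_step(a: int, y: int, n: int) -> int:
    """Return m such that pow(a, m, n) == y, or raise ValueError if none is found."""
    s = isqrt(n) + 1
    baby = {pow(a, j, n): j for j in range(s)}   # baby steps: a^j for j = 0..s-1
    giant = pow(a, -s, n)                        # a^(-s) mod n (requires gcd(a, n) == 1)
    gamma = y
    for i in range(s):                           # giant steps: y * a^(-i*s)
        if gamma in baby:
            return i * s + baby[gamma]
        gamma = (gamma * giant) % n
    raise ValueError("no discrete logarithm found")

# Toy check: recover the exponent 42 from a^42 mod n.
assert baby_step_giant_step(3, pow(3, 42, 11413), 11413) == 42
```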
In various embodiments, data encrypted using the encryption function H(m) is additively homomorphic such that H(m1+m2) may be determined in accordance with the below equations (3) and (4).
H(m1 + m2) = a^((m1+m2)e) (mod n)   (3)

H(m1 + m2) = a^(m1e)·a^(m2e) (mod n)   (4)
In some embodiments, the above equations (3) and (4) may be computed or approximated by multiplying H(m1) and H(m2) in accordance with the below equation (5) and in accordance with the homomorphic encryption techniques described herein.
H(m1+m2)=H(m1)H(m2) (5)
Similarly, the difference between H(m1) and H(m2) may be determined by transforming H(m2) into its multiplicative inverse in accordance with equation (6).

H(m1 − m2) = H(m1) × H(m2)^(−1)   (6)
The result of Equation (6) may be the encrypted difference data described above.
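By way of a non-limiting illustration, the following is a minimal sketch of the additively homomorphic behavior of equations (1)-(6); the primes, base, and exponent shown are illustrative toy values (not secure parameters), and a brute-force discrete logarithm stands in for the baby-step giant-step algorithm described above.

```python
# Toy demonstration of H(m) = a^(me) mod n (equation (1)) and its additive homomorphism
# (equations (3)-(6)). Parameters are illustrative only; not production cryptography.
from math import gcd

p, q = 101, 113          # small primes (toy values)
n = p * q                # modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
assert gcd(e, phi) == 1
d = pow(e, -1, phi)      # private exponent, e*d = 1 (mod phi)
a = 3                    # public base for the discrete logarithm

def H(m: int) -> int:
    """Encrypt m as a^(m*e) mod n (equation (1))."""
    return pow(a, m * e, n)

def H_inv(c: int) -> int:
    """Decrypt: compute a^m = c^d mod n, then solve a small discrete log (equation (2))."""
    target = pow(c, d, n)
    acc, m = 1, 0
    while acc != target:     # brute force; assumes m is small and non-negative
        acc = (acc * a) % n
        m += 1
    return m

m1, m2 = 22, 20
# Sum (equations (3)-(5)): multiplying ciphertexts decrypts to m1 + m2.
assert H_inv((H(m1) * H(m2)) % n) == m1 + m2
# Difference (equation (6)): multiplying by the inverse of H(m2) decrypts to m1 - m2.
assert H_inv((H(m1) * pow(H(m2), -1, n)) % n) == m1 - m2
```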
Homomorphic encryption using elliptic-curve cryptography utilizes an elliptic curve to encrypt data, as opposed to multiplying two prime numbers to create a modulus, as described above. An elliptic curve E is a plane curve over a finite field Fp of prime numbers that satisfies the below equation (7).
y^2 = x^3 + ax + b   (7)
The finite field Fp of prime numbers may be, for example, the NIST P-521 field defined by the U.S. National Institute of Standards and Technology (NIST). In some embodiments, elliptic curves over binary fields, such as NIST curve B-571, may be used in place of the finite field Fp of prime numbers. A key is represented as (x,y) coordinates of a point on the curve; an operator may be defined such that using the operator on two (x,y) coordinates on the curve yields a third (x,y) coordinate also on the curve. Thus, key transfer may be performed by transmitting only one coordinate and identifying information of the second coordinate.
The above elliptic curve may have a generator point, G, that is a point on the curve—e.g., G=(x,y)∈E. A number n may correspond to the order of G—e.g., n=o(G). The identity element of the curve E may be the point at infinity. A cofactor h of the curve E may be defined by the following equation (8), in which |E(Fp)| denotes the number of points on the curve.

h = |E(Fp)|/n   (8)
A first party, such as the data/model processing system 120, model-provider system 122, and/or data-provider system 124, may select a private key n_B that is less than o(G). In various embodiments, at least one other of the data/model processing system 120, model-provider system 122, and/or data-provider system 124 is not the first party and thus does not know the private key n_B. The first party may generate a public key P_B in accordance with equation (9).
P_B = n_B·G = Σ_(i=1..n_B) G   (9)
The first party may then transmit the public key P_B to a second party, such as one or more of the data/model processing system 120, model-provider system 122, and/or data-provider system 124. The first party may similarly transmit encryption key data corresponding to domain parameters (p, a, b, G, n, h). The second party may then encrypt data m using the public key P_B. The second party may first encode the data m; if m is greater than zero, the second party may encode it in accordance with mG; if m is less than zero, the second party may encode it in accordance with (−m)G^(−1). If G=(x,y), then G^(−1)=(x,−y). In the below equations, however, the encoded data is represented as mG for clarity. The second party may perform the encoding using, for example, a doubling-and-adding method, in O(log(m)) time.
To encrypt the encoded data mG, the second party may select a random number c, wherein c is greater than zero and less than a finite field prime number p. The second party may thereafter determine and send encrypted data in accordance with the below equation (10).
H(m) = {cG, mG + cP_B}   (10)
A corresponding decryption function H^(−1) may be used to decrypt data encrypted in accordance with the encryption function of equation (10). The decrypted value of H(m) is m, regardless of the choice of the large random number c. The first party may receive the encrypted data from the second party and may first determine a product of the random number c and the public key P_B in accordance with equation (11).
cP_B = c(n_B·G) = n_B(cG)   (11)
The first party may then determine a product of the data m and the generator point G in accordance with the below equation (12).
mG = (mG + cP_B) − n_B(cG)   (12)
Finally, the first party may decode mG to determine the data m. This decoding, which may be referred to as solving the elliptic curve discrete logarithm, may be performed using, for example, a baby-step giant-step algorithm in O(√m) time.
In various embodiments, data encrypted using the encryption function H(m) is additively homomorphic. That is, the value of H(m1+m2) may be expressed as shown below in equation (13).
H(m1 + m2) = {cG, (m1 + m2)G + cP_B}   (13)
The value of H(m1)+H(m2) may be expressed as shown below in equations (14) and (15).
H(m1) + H(m2) = {c1G, m1G + c1P_B} + {c2G, m2G + c2P_B}   (14)

H(m1) + H(m2) = {(c1 + c2)G, (m1 + m2)G + (c1 + c2)P_B}   (15)
Therefore, H(m1+m2)=H(m1)+H(m2). Similarly, if m is negative, H(m) may be expressed in accordance with equation (16).
H(m) = {cG, (−m)G^(−1) + cP_B}   (16)
H(m1)−H(m2) may thus be expressed as below in accordance with equation (17).

H(m1) − H(m2) = {(c1 + c2)G, (m1 − m2)G + (c1 + c2)P_B}   (17)
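By way of a non-limiting illustration, the following is a minimal sketch of the elliptic-curve scheme of equations (7)-(17) over a small toy curve; the curve parameters, generator, keys, and messages are illustrative assumptions (not NIST P-521 or values from the present disclosure), and a brute-force decode stands in for the baby-step giant-step algorithm.

```python
# Toy elliptic-curve scheme over y^2 = x^3 + a*x + b mod p (equation (7)); None is the
# point at infinity. Parameters are illustrative only; not production cryptography.
p, a, b = 9739, 497, 1768
G = (1804, 5368)                         # assumed generator point on the toy curve

def ec_add(P, Q):
    """Add two points on the curve."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                      # P + (-P) = point at infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    """Scalar multiplication k*P by doubling-and-adding."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def ec_neg(P):
    return None if P is None else (P[0], (-P[1]) % p)

n_B = 1829                               # first party's private key (toy value)
P_B = ec_mul(n_B, G)                     # public key P_B = n_B*G (equation (9))

def encrypt(m, c):
    """H(m) = {c*G, m*G + c*P_B} (equation (10)); c is the second party's random number."""
    return (ec_mul(c, G), ec_add(ec_mul(m, G), ec_mul(c, P_B)))

def decrypt(ct):
    """Recover m*G = (m*G + c*P_B) - n_B*(c*G) (equations (11)-(12)), then decode m."""
    cG, body = ct
    mG = ec_add(body, ec_neg(ec_mul(n_B, cG)))
    acc, m = None, 0
    while acc != mG:                     # brute-force decode standing in for baby-step giant-step
        acc = ec_add(acc, G)
        m += 1
    return m

# Componentwise addition of two ciphertexts (equations (13)-(15)) decrypts to m1 + m2.
ct1, ct2 = encrypt(11, 777), encrypt(31, 555)
summed = (ec_add(ct1[0], ct2[0]), ec_add(ct1[1], ct2[1]))
assert decrypt(summed) == 11 + 31
```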
Referring first to
Each data 304, 306 of each data provider 124a, 124b may include a number of vectors of data having dimension N, which may be the same dimension as the model input data 302. That is, each data provider 124a, 124b determines data that represents each of the inputs of the model 128. Thus, a single data provider 124 may provide all the inputs necessary to the model 128 in order for the model to begin processing data. This arrangement of data may thus be referred to as horizontally partitioned data. Each data provider 124 may determine any number of vectors of dimension N for processing by the model 128.
Referring to
Referring first to
One or more input layer(s) 404 may process input data 402 in accordance with input layer(s) parameter data 406 to determine feature data 408. In some embodiments, the input layer(s) 404 are disposed on a data-provider system 124. The input data 402 may comprise one or more vectors of N values corresponding to data collected from one or more data sources 126. The feature data 408 may be processed by the transform layer(s) 410 in accordance with transform layer(s) parameter data 412 to determine transformed data 414. The transformed data 414 may be processed using output layer(s) 416 in accordance with output layer(s) parameter data 418 to determine output data 420. As described herein, the input layer(s) 404 and output layer(s) 416 may be disposed on a data-provider system 124, and the transform layer(s) 410 may be disposed on a secure-processing component 204.
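By way of a non-limiting illustration, the following is a minimal sketch of this split arrangement using the PyTorch library; the layer sizes and layer types are illustrative assumptions rather than the actual model of the present disclosure.

```python
# Minimal PyTorch sketch (assumed layer sizes/types) of the split described above:
# input layer(s) 404 and output layer(s) 416 on the data-provider system 124,
# transform layer(s) 410 on the secure-processing component 204.
import torch
import torch.nn as nn

N, F, T, O = 16, 32, 32, 1                         # assumed input/feature/transformed/output sizes

input_layers = nn.Sequential(nn.Linear(N, F), nn.ReLU())       # disposed on data-provider system 124
transform_layers = nn.Sequential(nn.Linear(F, T), nn.ReLU())   # disposed on secure-processing component 204
output_layers = nn.Sequential(nn.Linear(T, O))                 # disposed on data-provider system 124

input_data = torch.randn(8, N)                     # vectors of N values from the data source(s) 126
feature_data = input_layers(input_data)            # feature data 408, sent to the secure-processing component
transformed_data = transform_layers(feature_data)  # transformed data 414, returned to the data-provider system
output_data = output_layers(transformed_data)      # output data 420: prediction of the event
```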
With reference to
Similarly, output layer(s) 438 may process the feature data 436 to determine output data 442. Each data-provider system 124 may include output layer(s) 438 configured to process feature data 436 in accordance with output layer(s) parameter data 440 corresponding to that data-provider system 124; the secure-processing component 204 may include further output layer(s) 438 for processing feature data 436 in accordance with output layer(s) parameter data 440 corresponding to multiple data-provider systems 124.
In some embodiments, the model-provider system 122a may, upon receipt of the request, send a corresponding acknowledgement (504) indicating acceptance of the request. The acknowledgement may indicate that the model-provider system is capable of enabling prediction of occurrence of the event (within, in some embodiments, the desired duration of time). In some embodiments, however, the model-provider system 122a may send, to the data-provider system, response data. This response data may include a request for further information identifying the component (such as additional description of the component and/or further information identifying the component, such as a make and/or model number). The data-provider system 124a may then send, in response to the request, the additional information, and the model-provider system 122a may then send the acknowledgement in response.
The response data may further include an indication of a period of time corresponding to the prediction of the event different from the period of time requested by the data-provider system 124a. For example, the data-provider system 124a may request that the prediction corresponds to a period of time approximately equal to two weeks before failure of the component. The model-provider system 122a may be incapable of enabling this prediction; the model-provider system 122a may therefore send, to the data-provider system 124a, an indication of a prediction that corresponds to a period of time approximately equal to one week before failure of the component. The data-provider system 124a may accept or reject this indication and may send further data to the model-provider system 122a indicating the acceptance or rejection; the model-provider system 122a may send the acknowledgement in response. The model-provider system 122a may further send, to the data/model processing system 120a and/or the secure processing component 204a, a notification (506) indicating the initiation of processing. Upon receipt, the data/model processing system 120a and/or secure processing component 204a may create or otherwise enable use of the secure processing component 204a, which may be referred to as a container, data silo, and/or sandbox. The secure processing component 204a may thus be associated with computing and/or software resources capable of performing processing using one or more layer(s) of a model, as described herein, without making the details of said processing, such as parameters associated with the layer(s), known to at least one other system (such as the data-provider system 124a).
The model-provider system 122a may then select a model 128 corresponding to the request (502) and/or data-provider system 124a and determine parameters associated with the model 128. The parameters may include, for one or more nodes in the model, neural-network weights, neural-network offsets, or other such parameters. The parameters may include a set of floating-point or other numbers representing the weights and/or offsets.
The model-provider system 122a may select a model 128 previously trained (or partly trained) in response to a previous request similar to the request 502 and/or data from a previous data-provider system 124 similar to the data-provider system 124a. For example, if the data-provider system 124a is an energy-provider company, the model-provider system 122a may select a model 128 trained using data from other energy-provider companies. Similarly, if the request 502 is associated with a particular component, the model-provider system 122a may select a model 128 trained using data associated with the component. The model-provider system 122a may then determine (508) initial parameter data associated with the selected model 128. In other embodiments, the model-provider system 122a selects a generic model 128 and determines default and/or random parameters for the generic model 128.
The model-provider system 122a may then send, to the data provider system 124, input layer(s) initial parameter data (510) and output layer(s) initial parameter data (512). The model-provider system 122a may similarly send, to the secure-processing component 204a, transform layer(s) initial parameter data (514). This sending of the initial data 510, 512, 514 may be performed once for each data-provider system 124a and/or secure-processing component 204a (and then, as described below, multiple training steps may be performed using these same sets of initial data 510, 512, 514). In other embodiments, the model-provider system 122a may determine and send different sets of initial data 510, 512, 514 (and/or model layer(s)) for each training step and/or sets of training steps.
In some embodiments, if the data-provider system 124a and/or the secure-processing component 204a does not possess or otherwise have access to the input layer(s) 404, transformation layer(s) 410, and/or output layer(s) 416, the model-provider system 122a may further send, to the data-provider system 124a, the input layer(s) 404 and/or output layer(s) 416 (and/or indication(s) thereof) and send, to the secure-processing component 204, the transformation layer(s) 410 (and/or an indication thereof).
Referring to
The secure-processing component 204a, upon receipt of the initial feature data (522), may similarly process (524) the initial feature data (522) using the transformation layer(s) 410 and the transformation layer(s) initial parameter data (514) to determine initial transformed data 526, which may similarly be the output of the transformation layer(s) 410. The secure-processing component 204a may similarly send the initial transformed data (526) to the data-provider system 124.
Referring to
The data-provider system 124a may further determine (534) updated transformed data (536), which it may send to the secure-processing component 204a. The data-provider system 124a may make this determination using the output layer(s) and the updated output layer(s) parameter data (530), as determined above, by holding the parameters constant and back-propagating output data through the output layer(s) 416. This back-propagation may be referred to as a coarse-grained back-propagation. In greater detail, the loss function may be used to compare the output data, determined using the initial output layer(s) parameter data (512), and the target data, and the updated transformed data (536) may be determined in accordance with the partial derivative of the output of the loss function with respect to the transformed data (526). This operation is illustrated below in Equation (18). Determination of the updated output layer(s) parameter data (530) and of the updated transformed data (536) may be performed simultaneously (e.g., in the same SGD loop) or separately.
input_update = input_init − η{∂(L(output; target))/∂(input)}   (18)
In the above Equation (18), η denotes a multiplicative factor that corresponds to the learning rate.
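By way of a non-limiting illustration, the following is a minimal sketch of this coarse-grained back-propagation using PyTorch autograd; the loss function, layer sizes, and data shown are illustrative assumptions.

```python
# Minimal PyTorch sketch of the coarse-grained back-propagation of equation (18): the output
# layer(s) parameters are held constant, and the gradient of an assumed loss with respect to the
# transformed data yields the updated transformed data. Sizes, loss, and data are assumptions.
import torch
import torch.nn as nn

T, O = 32, 1
eta = 0.1                                          # the multiplicative factor η of equation (18)
loss_fn = nn.MSELoss()                             # assumed loss function

output_layers = nn.Sequential(nn.Linear(T, O))     # output layer(s) 416
for param in output_layers.parameters():
    param.requires_grad_(False)                    # hold the output layer(s) parameters constant

transformed_data = torch.randn(8, T, requires_grad=True)   # stands in for the initial transformed data (526)
target_data = torch.randn(8, O)                            # stands in for the annotated target data

loss = loss_fn(output_layers(transformed_data), target_data)
loss.backward()                                    # partial derivative of the loss w.r.t. the transformed data

with torch.no_grad():
    updated_transformed_data = transformed_data - eta * transformed_data.grad   # equation (18)
```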
Referring to
The secure-processing component 204a further determines (544) updated feature data (546) and sends the updated feature data (546) to the data-provider system 124a. Similar to the above, the secure-processing component 204a may perform a coarse-grained back-propagation using the updated transformed data (536) and the initial transformed data (526) to determine the updated feature data (546). In greater detail, the loss function may be used to compare the updated transformed data (536) and the initial transformed data (526), and the updated feature data (546) may be determined in accordance with the partial derivative of the output of the loss function with respect to the feature data, as shown above in Equation (18).
Referring to
The above discussion relates to embodiments of the present disclosure in which one or more of the input layer(s) 404, transform layer(s) 410, and/or output layer(s) 416 may be trained. During runtime operation (using, e.g., out-of-sample data), the data-provider system 124 may determine feature data 408 using out-of-sample input data 402 and may send the feature data 408 to the secure-processing component 204a. The secure-processing component 204a may process the feature data 408 using the transform layer(s) 410 to determine transformed data 414, which it may send back to the data-provider system 124a. The data-provider system 124a may then process, using the output layer(s) 416, the transformed data 414 to determine output data 420. The output data 420 may correspond to a prediction of an event corresponding to the input data 402.
Referring first to
The model-provider system 122b may then select a model 128 corresponding to the requests (602), (604) and/or data-provider systems 124b, 124c and determine parameters associated with the model 128. The parameters may include, for one or more nodes in the model, neural-network weights, neural-network offsets, or other such parameters. The parameters may include a set of floating-point or other numbers representing the weights and/or offsets.
The model-provider system 122b may select a model 128 previously trained (or partly trained) in response to a previous request similar to the requests 602, 604 and/or data from a previous data-provider system 124 similar to the data-provider systems 124b, 124c. The model-provider system 122b may then determine (612) initial parameter data associated with the selected model 128. In other embodiments, the model-provider system 122b selects a generic model 128 and determines default and/or random parameters for the generic model 128.
The model-provider system 122b may then send, to first data-provider system A 124b, first initial parameter data (614), to the second data-provider system 124c, second initial parameter data (616), and to the secure-processing component 204b, third initial parameter data (618). This sending of the initial data 614, 616, 618 may be performed once for each data-provider system 124b, 124c and/or secure-processing component 204b (and then, as described below, multiple training steps may be performed using these same sets of initial data 614, 616, 618). In other embodiments, the model-provider system 122b may determine and send different sets of initial data 614, 616, 618 (and/or model layer(s)) for each training step and/or sets of training steps.
Referring to
Referring to
The first data-provider system A 124b may then determine (638a) updated input layer(s) parameter data by comparing the updated feature data A with target feature data, and the second data-provider system B 124c may determine (638b) updated input layer(s) parameter data by comparing the updated feature data B with target feature data. The first and/or second data-provider systems 124 may then use the updated parameter data to process further (e.g., out-of-sample) input data to determine a prediction of an event corresponding to the input data.
In various embodiments, the secure-processing component 204b selects a subset (e.g., a sample) of the output layer(s) parameter data 440 to determine the updated feature data. The subset may be, for example, a single latest-determined set of values of the updated feature data. The subset may instead or in addition correspond to a weighted average of a set of latest-determined values of the updated feature data, in which later-determined values have a higher weight than earlier-determined values. In other embodiments, the secure-processing component 204b determines a distribution (such as a marginal distribution and/or Gaussian distribution) that represents values of the parameter data 440 and samples the distribution to determine the subset.
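By way of a non-limiting illustration, the following is a minimal sketch of such subset selection; the recency weighting and per-parameter Gaussian shown are illustrative assumptions.

```python
# Minimal sketch (assumed recency weighting and per-parameter Gaussian) of selecting a subset
# of the parameter data 440: a weighted average of the latest-determined values, or a sample
# drawn from a distribution fitted to those values.
import torch

history = torch.randn(5, 32)                    # several latest-determined sets of values (assumed shape)

weights = torch.linspace(0.1, 1.0, steps=5)     # later-determined values receive higher weight
weights = weights / weights.sum()
weighted_average = (weights[:, None] * history).sum(dim=0)

mean, std = history.mean(dim=0), history.std(dim=0)   # fit a per-parameter Gaussian distribution
sampled_subset = torch.normal(mean, std)              # sample the distribution to determine the subset
```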
In some embodiments, each data-provider system 124 may further determine updated parameter values for its corresponding output layer(s). This determination may be referred to as fine-tuning the output layer(s) (e.g., modifying the parameters of the output layer(s) in accordance with target data corresponding to a particular data-provider system 124). A data-provider system 124 may thus use the updated parameters of the output layer(s) 440, as well as the updated parameters of the input layer(s) 432, as described above, to process input data 430 to determine output data 442 corresponding to prediction of an event.
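By way of a non-limiting illustration of the vertically partitioned arrangement described above, the following minimal sketch shows each data-provider system applying its own input layer(s) to its portion of the inputs and the secure-processing component combining the resulting feature data before applying shared output layer(s); the dimensions and the concatenation-based combination are illustrative assumptions.

```python
# Minimal PyTorch sketch (assumed sizes, concatenation-based combination) of the vertically
# partitioned arrangement: each data-provider system applies its own input layer(s) to its portion
# of the inputs, and the secure-processing component combines the feature data for the output layer(s).
import torch
import torch.nn as nn

NA, NB, F, O = 10, 6, 16, 1      # provider A inputs, provider B inputs, feature size, output size (assumed)

input_layers_a = nn.Sequential(nn.Linear(NA, F), nn.ReLU())   # on data-provider system A 124b
input_layers_b = nn.Sequential(nn.Linear(NB, F), nn.ReLU())   # on data-provider system B 124c
output_layers = nn.Sequential(nn.Linear(2 * F, O))            # on the secure-processing component 204b

input_a = torch.randn(8, NA)     # provider A's portion of the vertically partitioned input data
input_b = torch.randn(8, NB)     # provider B's portion; never shared with provider A

feature_a = input_layers_a(input_a)                             # only feature data leaves each provider
feature_b = input_layers_b(input_b)
combined_features = torch.cat([feature_a, feature_b], dim=-1)   # combined feature data
output_data = output_layers(combined_features)                  # prediction of the event
```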
As mentioned above, a neural network may be trained to perform some or all of the computational tasks described herein. The neural network, which may include input layer(s) 404, 432, transform layer(s) 410, and/or output layer(s) 416, 438, may include nodes within the input layer(s) 404, 432, transform layer(s) 410, and/or output layer(s) 416 that are further organized as an input layer, one or more hidden layers, and an output layer. The input layer of each of the input layer(s) 404, 432, transform layer(s) 410, and/or output layer(s) 416 may include m nodes, the hidden layer(s) may include n nodes, and the output layer may include o nodes, where m, n, and o may be any numbers and may represent the same or different numbers of nodes for each layer. Each node of each layer may include computer-executable instructions and/or data usable for receiving one or more input values and for computing an output value. Each node may further include memory for storing the input, output, or intermediate values. One or more data structures, such as a long short-term memory (LSTM) cell or other cells or layers may additionally be associated with each node for purposes of storing different values. Nodes of the input layer may receive input data, and nodes of the output layer may produce output data. In some embodiments, the input data corresponds to data from a data source, and the outputs correspond to model output data. Each node of the hidden layer may be connected to one or more nodes in the input layer and one or more nodes in the output layer. Although the neural network may include a single hidden layer, other neural networks may include multiple hidden layers; in these cases, each node in a hidden layer may connect to some or all nodes in neighboring hidden (or input/output) layers. Each connection from one node to another node in a neighboring layer may be associated with a weight or score. A neural network may output one or more outputs, a weighted set of possible outputs, or any combination thereof.
In some embodiments, a neural network is constructed using recurrent connections such that one or more outputs of the hidden layer of the network feeds back into the hidden layer again as a next set of inputs. Each node of the input layer connects to each node of the hidden layer(s); each node of the hidden layer(s) connects to each node of the output layer. In addition, one or more outputs of the hidden layer(s) is fed back into the hidden layer for processing of the next set of inputs. A neural network incorporating recurrent connections may be referred to as a recurrent neural network (RNN). An RNN or other such feedback network may allow a network to retain a “memory” of previous states and information that the network has processed.
Processing by a neural network may be determined by the learned weights on each node input and the structure of the network. Given a particular input, the neural network determines the output one layer at a time until the output layer of the entire network is calculated. Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network. As examples in the training data are processed by the neural network, an input may be sent to the network and compared with the associated output to determine how the network performance compares to the target performance. Using a training technique, such as backpropagation, the weights of the neural network may be updated to reduce errors made by the neural network when processing the training data.
The model(s) discussed herein may be trained and operated according to various machine learning techniques. Such techniques may include, for example, neural networks (such as deep neural networks and/or recurrent neural networks), inference engines, trained classifiers, etc. Examples of trained classifiers include Support Vector Machines (SVMs), neural networks, decision trees, AdaBoost (short for “Adaptive Boosting”) combined with decision trees, and random forests. Focusing on SVM as an example, SVM is a supervised learning model with associated learning algorithms that analyze data and recognize patterns in the data, and which are commonly used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier. More complex SVM models may be built with the training set identifying more than two categories, with the SVM determining which category is most similar to input data. An SVM model may be mapped so that the examples of the separate categories are divided by decision boundaries. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gaps they fall on. Classifiers may issue a “score” indicating which category the data most closely matches. The score may provide an indication of how closely the data matches the category.
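By way of a non-limiting illustration, the following is a minimal sketch of such an SVM classifier using the scikit-learn library; the toy examples and the linear kernel are illustrative assumptions.

```python
# Minimal scikit-learn sketch (toy examples, linear kernel assumed) of an SVM classifier that
# assigns new examples to one of two categories and issues a score relative to the decision boundary.
from sklearn import svm

X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]   # toy training examples
y_train = [0, 0, 1, 1]                                        # each example marked with its category

classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)

print(classifier.predict([[0.8, 0.9]]))            # predicted category for a new example
print(classifier.decision_function([[0.8, 0.9]]))  # signed score: distance from the decision boundary
```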
In order to apply the machine learning techniques, the machine learning processes themselves need to be trained. Training a machine learning component such as, in this case, one of the first or second models, may require establishing a “ground truth” for the training examples. In machine learning, the term “ground truth” refers to an expert-defined label for a training example. Machine learning algorithms may use datasets that include “ground truth” information to train a model and to assess the accuracy of the model. Various techniques may be used to train the models including backpropagation, statistical learning, supervised learning, semi-supervised learning, stochastic learning, stochastic gradient descent, or other known techniques. Thus, many different training examples may be used to train the classifier(s)/model(s) discussed herein. Further, as training data is added to, or otherwise changed, new classifiers/models may be trained to update the classifiers/models as desired. The model may be updated by, for example, back-propagating the error data from output nodes back to hidden and input nodes; the method of back-propagation may include gradient descent.
In some embodiments, the trained model is a deep neural network (DNN) that is trained using distributed batch stochastic gradient descent; batches of training data may be distributed to computation nodes where they are fed through the DNN in order to compute a gradient for that batch. The secure processor 204 may update the DNN by computing a gradient by comparing results predicted using the DNN to training data and back-propagating error data based thereon. In some embodiments, the DNN includes additional forward pass targets that estimate synthetic gradient values and the secure processor 204 updates the DNN by selecting one or more synthetic gradient values.
A variety of components may be connected through the input/output device interfaces 702. For example, the input/output device interfaces 702 may be used to connect to the network 170. Further components include keyboards, mice, displays, touchscreens, microphones, speakers, and any other type of user input/output device. The components may further include USB drives, removable hard drives, or any other type of removable storage.
The controllers/processors 704 may process data and computer-readable instructions and may include a general-purpose central-processing unit, a specific-purpose processor such as a graphics processor, a digital-signal processor, an application-specific integrated circuit, a microcontroller, or any other type of controller or processor. The memory 708 may include volatile random access memory (RAM), non-volatile read only memory (ROM), non-volatile magnetoresistive memory (MRAM), and/or other types of memory. The storage 706 may be used for storing data and controller/processor-executable instructions on one or more non-volatile storage types, such as magnetic storage, optical storage, solid-state storage, etc.
Computer instructions for operating the server 700 and its various components may be executed by the controller(s)/processor(s) 704 using the memory 708 as temporary “working” storage at runtime. The computer instructions may be stored in a non-transitory manner in the memory 708, storage 706, and/or an external device(s). Alternatively, some or all of the executable instructions may be embedded in hardware or firmware on the respective device in addition to or instead of software.
The above aspects of the present disclosure are meant to be illustrative. They were chosen to explain the principles and application of the disclosure and are not intended to be exhaustive or to limit the disclosure. Many modifications and variations of the disclosed aspects may be apparent to those of skill in the art. Persons having ordinary skill in the field of computers and data processing should recognize that components and process steps described herein may be interchangeable with other components or steps, or combinations of components or steps, and still achieve the benefits and advantages of the present disclosure. Moreover, it should be apparent to one skilled in the art that the disclosure may be practiced without some or all of the specific details and steps disclosed herein.
Aspects of the disclosed system may be implemented as a computer method or as an article of manufacture such as a memory device or non-transitory computer readable storage medium. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure. The computer readable storage medium may be implemented by a volatile computer memory, non-volatile computer memory, hard drive, solid-state memory, flash drive, removable disk, and/or other media. In addition, components of one or more of the modules and engines may be implemented in firmware or hardware, which comprises, among other things, analog and/or digital filters (e.g., filters configured as firmware to a digital signal processor (DSP)).
Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without other input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. As used in this disclosure, the term “a” or “one” may include one or more items unless specifically stated otherwise. Further, the phrase “based on” is intended to mean “based at least in part on” unless specifically stated otherwise.
In various embodiments, a computer-implemented method comprises processing, by a first system using an input layer of a neural-network model, first input data to determine first feature data, the input layer corresponding to first neural-network parameters; sending, from the first system to a second system, the first feature data;
receiving, at the first system from the second system, first transformed data corresponding to the first feature data and determined by a transformation layer of the neural-network model; processing, by the first system, the first transformed data using an output layer of the neural-network model to determine first output data;
determining, by the first system, second transformed data corresponding to the first output data and target output data; sending, from the first system to the second system, the second transformed data; receiving, at the first system from the second system, second feature data corresponding to the second transformed data and target transformed data; determining, by the first system, second neural-network parameters corresponding to the second feature data and target feature data; and processing, by the first system using the input layer and the second neural-network parameters, second input data corresponding to an event to determine third feature data corresponding to a prediction of the event.
Determining the second transformed data may comprise determining, using a loss function, a difference between the first output data and the target output data; and determining a partial derivative of the difference with respect to the second transformed data.
Determining the second feature data may comprise determining, using a loss function, a difference between the second transformed data and the target transformed data; and determining a partial derivative of the difference with respect to the second feature data.
The method may further comprise sending, to a third system, the second neural-network parameters; sending, from the third system to a fourth system, data based at least in part on the second neural-network parameters; and processing, by the fourth system using the data, third input data to determine fourth feature data.
The method may further comprise sending, to the second system, the third feature data; receiving, at the first system from the second system, third transformed data corresponding to the third feature data; and processing, by the first system using the output layer of the neural-network model, the third transformed data to determine output data representing the prediction.
The event may correspond to failure of a component corresponding to the first system, and the first input data may correspond to operational data corresponding to the component.
The event may correspond to a change in a network corresponding to the first system, and the first input data may correspond to operational data corresponding to the network.
The method may further comprise processing, by the second system, the first feature data using a transformation layer of the neural-network model to determine the first transformed data; and determining, by the second system, the second feature data corresponding to the first output transformed data and target output transformed data.
Processing the first feature data may be based at least in part on an affine transformation.
The method may further comprise determining, by a third system, third neural-network parameters corresponding to the transformation layer, the third neural-network parameters based at least in part on a random value; and sending, from the third system to the second system, the third neural-network parameters.
In various embodiments, a computer-implemented method may comprise receiving, from a first data-provider system at a second system, first feature data determined by a first input layer of a first neural-network model, the first feature data corresponding to a first subset of inputs to an output layer of the first neural-network model; receiving, from a second data-provider system at the second system, second feature data determined by a second input layer of a second neural-network model, the second feature data corresponding to a second subset of inputs to the output layer; determining, by the second system, first combined feature data corresponding to the first feature data and the second feature data; processing, by the second system using an output layer corresponding to the first neural-network model and the second neural-network model, the first combined feature data to determine output data; determining, by the second system, second combined feature data corresponding to the first combined feature data and target feature data; sending, from the second system to the first data-provider system, third feature data corresponding to the second combined feature data and the first subset; and processing, by the first data-provider system using the first neural-network model and based at least in part on the third feature data, input data corresponding to an event to determine fourth feature data representing a prediction of the event.
The event may correspond to a change in a first network corresponding to the first data-provider system, wherein the first feature data corresponds to first operational data corresponding to the first network, and wherein the second feature data corresponds to second operational data corresponding to a second network different from the first network.
Determining the second combined feature data may comprise processing the second combined feature data using an output layer of a third neural-network to determine second output data; determining, using a loss function, a difference between the second output data and the target output data; determining a partial derivative of the difference with respect to the second combined feature data; and determining the third feature data based at least in part on the partial derivative.
The method may further comprise determining parameter data corresponding to an output layer of a fourth neural-network; and determining a sample of the parameter data, wherein the output data corresponds to the sample.
Determining the sample may comprise at least one of: determining a weighted average corresponding to the parameter data; or determining a distribution representing the parameter data.
The method may further comprise receiving, from a third data-provider system at the second system, fifth feature data determined by a third input layer of a third neural-network model, the fifth feature data corresponding to the first subset of inputs and to the second subset of inputs; and processing, by the second system using the output layer corresponding to the first neural-network model and the second neural-network model, the fifth feature data to determine second output data.
The method may further comprise processing, by the first data-provider system, the fourth feature data by an output layer of the first neural-network model to determine output data representing the prediction of the event.
The method may further comprise sending, from the second system to the second data-provider system, fifth feature data corresponding to the second combined feature data and the second subset; and processing, by the second data-provider system using the second neural-network model and based at least in part on the fifth feature data, second input data corresponding to a second event to determine sixth feature data representing a prediction of the second event.
The method may further comprise processing, by the first data-provider system, fifth feature data by an output layer of the first neural-network model to determine output data; determining a difference, using a loss function, between the output data and target output data; and determining, by the first data-provider system, neural-network parameters corresponding to the output layer based at least in part on the difference.
The method may further comprise sending, from the first data-provider system to the second data-provider system, encryption data; receiving, from the second data-provider system at the first data-provider system, encrypted data corresponding to the encryption data; and decrypting the encrypted data in accordance with the encryption data to determine second data.
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 62/916,512, filed Oct. 17, 2019, and entitled “Learning Network Modules Over Vertically Partitioned Data Sets,” in the names of John Christopher Muddle, et al.; U.S. Provisional Patent Application No. 62/916,825, filed Oct. 18, 2019, and entitled “TAC Learning of Models to Protect AP's IP from DO,” in the names of Mathew Rogers, et al.; and U.S. Provisional Patent Application No. 62/939,045, filed Nov. 22, 2019, and entitled “TAC Learning of Models to Protect AP's IP from DO,” in the names of Mathew Rogers, et al. The above provisional applications are herein incorporated by reference in their entireties.